Saturday 14 September 2013

How to create 4,294,967,296 rows of data from 1 duplicate in 5 easy moves


Like this:

You load data from a source with an allegedly unique key.

Then you need to do some calculations on it, using data from other sources say, or perhaps do some calculations in the script (to make charts faster in the app) so you join the table back onto itself.

Imagine there's a duplicate, 2 lines with the same key in. When you join it onto itself, each line will replicate X the number of rows it matches in the table that you join it with. IE 2 rows will turn into 2 X 2 = 4 rows:

Data
Data_2
Data - after join 
Key# Data
Key# Data
Key# Data
1 a
1 a
1 a
2 b
2 b
2 b
3 c
3 c
3 c
4 d
4 d
4 d
4 e
4 e
4 e






4 d






4 e

If you were to then join the table onto itself again, each row with key #4 will join back onto itself 4 times, creating 4 X 4rows = 16.

So far so obvious.