23 September 2007

SQL Reloaded: 6 out of 49 (Part 1: Getting the data)

I was thinking to model the 6 out of 49 lottery drawing, so common across EU countries. To study numbers’ occurrences we need to import the historical data, though during my first attempts this proved to be challenging. Each lottery has a website from which data can be taken drawing by drawing or when possible, as a data dump (see notes). Moreover, some of the lotteries have variations for 6/49.       

To avoid any legal litigations, I will use a self-made set of data via the RAND function (which unfortunately is not a perfect randomizer). To store the generated data, I've created a table called LottoExtracts: 

-- cleaning the table
DROP TABLE IF EXISTS dbo.LottoExtracts

--creating the LottoExtracts table 
CREATE TABLE dbo.LottoExtracts(     
   DrawingDate date NULL
 , N1 smallint NULL
 , N2 smallint NULL
 , N3 smallint NULL
 , N4 smallint NULL
 , N5 smallint NULL
 , N6 smallint NULL)

This format will be used to upload the data in a staging table which is used later as source for our data model. As can be seen it has a Drawing Date which identifies uniquely an extraction, while in N1, …, N6 are stored the drawings’ numbers in the order they were picked. 

--creating random data 
DECLARE @index, @counter int 
DECLARE @N1, @N2, @N3, @N4, @N5, @N6 smallint 
  
SET @index = 0 
WHILE @index < 2000 
BEGIN 
  -- creating the first number 
  SET @N1 = Cast(1000 * RAND() as int) % 48 + 1 

  -- creating the second number 
  SET @counter = 1 
  WHILE @counter < 50 or @N1 = @N2 
  BEGIN 
	  SET @N2 = Cast(1000 * RAND() as int) % 48 + 1;
	  SET @counter = @counter + 1 
  END;

  -- creating the third number
  SET @counter = 1 
  WHILE @counter < 50 or @N3 IN (@N1, @N2) 
  BEGIN 
	  SET @N3 = Cast(1000 * RAND() as int) % 48 + 1 
	  SET @counter = @counter + 1 
  END 

  -- creating the fourth number 
  SET @counter = 1 
  WHILE @counter < 50 or @N4 IN (@N1, @N2, @N3) 
  BEGIN 
	  SET @N4 = Cast(1000 * RAND() as int) % 48 + 1 
	  SET @counter = @counter + 1 
  END 

  -- creating the fifth number 
  SET @counter = 1 
  WHILE @counter < 50 or @N5 IN (@N1, @N2, @N3, @N4) 
  BEGIN 
	  SET @N5 = Cast(1000 * RAND() as int) % 48 + 1 
	  SET @counter = @counter + 1 
  END 

  -- creating the sixth number 
  SET @counter = 1 
  WHILE @counter < 50 or @N6 IN (@N1, @N2, @N3, @N4, @N5) 
  BEGIN 
	  SET @N6 = Cast(1000 * RAND() as int) % 48 + 1 
	  SET @counter = @counter + 1 
  END 

  --inserting a new drawing 
  INSERT LottoExtracts(DrawingDate, N1, N2, N3, N4, N5, N6) 
  VALUES (DateAdd(d, @index*7, '01/01/2000'), @N1, @N2, @N3, @N4, @N5, @N6) 
  SET @index = @index + 1 
END 

I've created only 200 records, which equates with almost 4 years of history, and should be enough for testing purposes. Of course, one can increase or decrease the number of drawings as needed! 
 
--checking the data 
SELECT * 
FROM dbo.LottoExtracts   

In what concerns the code, the following statement creates a random number in 1 … 49 range:
 
-- creating first number 
SET @N1 = Cast(1000 * RAND() as int) % 48 + 1    

Thus, can be created all 6 numbers, however because the same number can't be extracted twice in the same drawing, I had to add duplication checks:
 
-- creating third number 
SET @counter = 1 
WHILE @counter < 50 or @N3 IN (@N1, @N2) 
BEGIN 
	SET @N3 = Cast(1000 * RAND() as int) % 48 + 1 
	SET @counter = @counter + 1 
END    

Notes:
(1) The code is verbose and probably suboptimal, though it's a must because of the behavior of RAND function as part of transactions.
(2) The below links could change in time. I provided them just to make easier the search. (*) denotes that historical data can be downloaded.
Previous Post <<||>> Next Post

No comments:

Related Posts Plugin for WordPress, Blogger...

About Me

My photo
IT Professional with more than 24 years experience in IT in the area of full life-cycle of Web/Desktop/Database Applications Development, Software Engineering, Consultancy, Data Management, Data Quality, Data Migrations, Reporting, ERP implementations & support, Team/Project/IT Management, etc.