Select random elements - SQL Server

0

I need to select 10 random elements from a table.I know how to do that, else this question have been answered a million times here on SO. By my problem is the randomisation isn't good enough

I've sat up the test case showing my problem :

DECLARE @Random TABLE
(
  Id int, 
  [Count] int  
)

DECLARE @TestData TABLE
(
  Id int
)

declare @runs int = 0;

 WHILE (@runs <=800)
 begin
   insert into @TestData values(@runs)
   set @runs = @runs +1 
 end;

 set @runs = 0 

 WHILE (@runs <=100)
 begin
     MERGE @Random AS target  
--      USING (SELECT ID FROM @TestData  where 0.01 >= CAST(CHECKSUM(NEWID(), id) & 0x7fffffff AS float) / CAST (0x7fffffff AS int) ) 
--      USING (SELECT top 10 ID FROM @TestData order by newid()) 
        USING (SELECT top 10 ID FROM @TestData order by abs(checksum(newid())) % 100)

AS SOURCE
        ON (target.id = source.id)  
        WHEN MATCHED THEN               
            UPDATE SET Target.[Count]  = Target.[Count] + 1  
        WHEN NOT MATCHED THEN  
            INSERT (ID, [Count])  VALUES (source.ID, 1);
    set @runs = @runs +1 
end

 select [count], count(*) "count(*)" from @Random group by [count] order by 1 desc 

As you can see I've tried several methods in order of randomisation. But every time I end up with a result like this :

enter image description here

So in short how do I select really random elements from a table?

Scope: SQL Server 2017, so every language feature is acceptable

sql-server
random
sql-server-2012
asked on Stack Overflow Jan 9, 2018 by Jens Borrisholt • edited Jan 9, 2018 by DineshDB

1 Answer

3

The issue is with your output query I believe, although this answer doesn't verify the randomness it should show that it's pretty random.

Firstly, don't use Keywords such as COUNT as column names if you can help it. That's what has confused your output here.

Run this sample with 10000 runs and you should get a random set or results, but I'm not claiming it's fully randomised:

DECLARE @Random TABLE
    (
        Id INT ,
        Occurences INT
    );

DECLARE @TestData TABLE
    (
        Id INT
    );

DECLARE @runs INT = 0;

WHILE ( @runs <= 800 )
    BEGIN
        INSERT INTO @TestData
        VALUES ( @runs );
        SET @runs = @runs + 1;
    END;

SET @runs = 0;

WHILE ( @runs <= 10000 )
    BEGIN
        MERGE @Random AS target
        USING (   SELECT   TOP 10 Id
                  FROM     @TestData
                  ORDER BY ABS(CHECKSUM(NEWID())) % 100 ) AS SOURCE
        ON ( target.Id = SOURCE.Id )
        WHEN MATCHED THEN
            UPDATE SET target.Occurences = target.Occurences + 1
        WHEN NOT MATCHED THEN INSERT ( Id ,
                                       Occurences )
                              VALUES ( SOURCE.Id, 1 );
        SET @runs = @runs + 1;
    END;

SELECT   Id ,
         Occurences
FROM     @Random
ORDER BY Id;

NOTE: this should help you further your investigation but is not proof of the randomness. Further testing should be carried out.

answered on Stack Overflow Jan 9, 2018 by Tanner

User contributions licensed under CC BY-SA 3.0