SQL Query for Random 10% with a minimum of 20 rows

I have been tasked with generating a report that randomly picks 10% of the unique IDs, unless 10% is fewer than 20 items, in which case the report should pick 20 random IDs. I have been using NEWID() to generate the 10%, but that isn't ideal because it gives variable results (i.e., more or fewer than 10%).

The code also includes my attempts to get a total count of the results:

select  UniqueID, TotalCount = Count(*) Over(), SUM(COUNT(UniqueID)) OVER() 
AS total_count 
from table 
where 0.15 >= CAST(CHECKSUM(NEWID(), UniqueID) & 0x7fffffff AS float) / CAST (0x7fffffff AS int)
group by UniqueID

Tags: sql, sql-server, reporting-services, sql-server-2008-r2, ssrs-2008
asked on Stack Overflow Feb 1, 2019 by Monny • edited Feb 1, 2019 by Gordon Linoff

4 Answers

I would generate a row number ordered by NEWID(), then work out how many rows (N) to take; because the numbering is random, rows 1 through N are a random sample. If you have fewer than 20 rows, you get all of them.

Whatever your query needs to be, put it inside the CTE, as in WITH CTE AS ("your query here"),

and add row_number() over (order by newid()) as x to the list of selected columns:

WITH CTE AS
  (SELECT *, ROW_NUMBER() OVER (ORDER BY NEWID()) AS x FROM istasks)
  SELECT *
  FROM CTE
  CROSS APPLY
       (SELECT MAX(C2.x) AS MX, ROUND(.1 * MAX(C2.x), 0) AS P  -- MX is the total row count; P is the number of rows needed under the 10% rule
        FROM CTE C2
       ) DQ
  WHERE CTE.x <= CASE WHEN P < 20 THEN 20 ELSE P END  -- take 20 rows if P < 20
answered on Stack Overflow Feb 1, 2019 by Cato • edited Feb 1, 2019 by Cato

IF ((SELECT COUNT(*) FROM (SELECT TOP 10 PERCENT * FROM [table]) t) < 20)
BEGIN
    SELECT TOP 20 * FROM [table] ORDER BY NEWID()
END
ELSE
BEGIN
    SELECT TOP 10 PERCENT * FROM [table] ORDER BY NEWID()
END
answered on Stack Overflow Feb 1, 2019 by ShruS

If you don't need to know the total number of rows, then the construct below is probably the fastest approach for your "10%, or at least 20" requirement:

IF (SELECT COUNT(*)
    FROM (SELECT TOP 200 *  -- if 10% = 20, then 100% = 200
          FROM [table]) AS top200
   ) < 200
BEGIN
  SELECT TOP 20 ...
END
ELSE
BEGIN
  SELECT TOP 10 PERCENT ...
END

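Note that there is no ORDER BY in the inner query, so the row-count check does not have to sort the table. The two outer queries still need ORDER BY NEWID() to make the selection random.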
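
For illustration only, here is a sketch of how the pattern might look once the elided SELECT lists are filled in. The table name dbo.YourTable and the single UniqueID column are assumptions, not taken from the question; substitute your own query.

```sql
-- Assumed names: dbo.YourTable and UniqueID are placeholders for illustration.
IF (SELECT COUNT(*)
    FROM (SELECT TOP (200) UniqueID      -- probe at most 200 rows, no sort needed
          FROM dbo.YourTable) AS top200
   ) < 200
BEGIN
    -- Fewer than 200 rows: 10% would be below 20, so take 20 random rows
    -- (or every row, if the table holds fewer than 20).
    SELECT TOP (20) UniqueID
    FROM dbo.YourTable
    ORDER BY NEWID();
END
ELSE
BEGIN
    -- 200 rows or more: 10% is at least 20, so take a random 10%.
    SELECT TOP (10) PERCENT UniqueID
    FROM dbo.YourTable
    ORDER BY NEWID();
END
```

The probe query only decides which branch runs; the random ordering happens once, in whichever branch is chosen.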

answered on Stack Overflow Feb 1, 2019 by Gert-Jan

If there was a UniqueID in your table where COUNT(UniqueID) was greater than 1 (per UniqueID), then that UniqueID wasn’t unique, so the TotalCount and total_count columns in your query should return the same values.

To select a random 10 percent (or at least 20) of your records, you could calculate a random row number (using NEWID()) and filter on it together with the total number of records. Both the row number and the total number of records can be calculated using window functions (with an OVER clause). Since window functions can't be used in a WHERE clause (they are evaluated after the regular result set has been generated), the calculation must happen in a subquery, which can be written as a CTE. My suggestion is to try it like this:

WITH
  cte AS (
    SELECT UniqueID,
      RowNumber = ROW_NUMBER() OVER (ORDER BY NEWID()),
      MaxNumber = COUNT(*) OVER ()
    FROM YourTable
  )
SELECT UniqueID
FROM cte
WHERE RowNumber <= MaxNumber/10 OR RowNumber <= 20;

If you need to round the 10% value up for fractional values (like TOP x PERCENT does), try an alternative WHERE clause:

WHERE RowNumber <= (MaxNumber+9)/10 OR RowNumber <= 20;
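
As a quick check of the integer arithmetic behind the two WHERE clauses, the sketch below uses a hypothetical row count of 205 (the variable @MaxNumber exists only for this illustration). Both operands are integers, so SQL Server truncates the division; adding 9 first rounds any fractional result up.

```sql
-- Hypothetical row count, chosen so that 10% is not a whole number.
DECLARE @MaxNumber int = 205;

SELECT @MaxNumber / 10       AS RoundedDown,  -- integer division truncates: 20
       (@MaxNumber + 9) / 10 AS RoundedUp;    -- adding 9 first rounds up: 21
```

With 205 rows, the first WHERE clause would keep 20 rows and the alternative would keep 21.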
answered on Stack Overflow Feb 2, 2019 by Wolfgang Kais

User contributions licensed under CC BY-SA 3.0