I have been tasked with generating a report that randomly picks 10% of the unique IDs, unless 10% is fewer than 20 items, in which case the report should pick 20 random IDs.
I have been using NEWID() to generate the 10%, but that isn't ideal because it gives me variable results (i.e., more or fewer than 10%).
The code below also includes my attempts to get a total count of the results:
SELECT UniqueID,
       TotalCount = COUNT(*) OVER (),
       SUM(COUNT(UniqueID)) OVER () AS total_count
FROM table
WHERE 0.15 >= CAST(CHECKSUM(NEWID(), UniqueID) & 0x7fffffff AS float) / CAST(0x7fffffff AS int)
GROUP BY UniqueID
I would generate a row number ordered by NEWID(), then work out how many rows (N) you need and take rows 1 through N. Because the numbering is random, those N rows are a random sample. If you have fewer than 20 rows, you get all of them.
Whatever your query needs to be, put it in the CTE, i.e. WITH CTE AS ("your query here"),
and add ROW_NUMBER() OVER (ORDER BY NEWID()) AS x to the list of selected columns:
WITH CTE AS
(select *,row_number() over (order by newid()) as x from istasks )
SELECT *
FROM CTE
CROSS APPLY
(SELECT MAX(c2.X) MX, ROUND(.1* MAX(c2.X),0) P --P is the number of rows needed according to the 10% rule
FROM CTE C2
) DQ
WHERE CTE.X <= CASE WHEN P < 20 THEN 20 ELSE P END --take 20 rows if P < 20
IF ((SELECT COUNT(*) FROM (SELECT TOP 10 PERCENT * FROM [table]) t) < 20)
BEGIN
SELECT TOP 20 * FROM [table] ORDER BY NEWID()
END
ELSE
BEGIN
SELECT TOP 10 PERCENT * FROM [table] ORDER BY NEWID()
END
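As a possible variation on the IF/ELSE above, the branching can be folded into a single SELECT, because TOP accepts an expression in parentheses. The following is only a sketch under stated assumptions: the table variable @Demo, the variables @Total and @Take, and the use of sys.all_objects as a convenient row source are all hypothetical names chosen for illustration, and the sample size is rounded up to mirror what TOP ... PERCENT does.

```sql
-- Hypothetical demo table; replace @Demo with your own table.
DECLARE @Demo TABLE (UniqueID int PRIMARY KEY);

-- Fill it with 350 sample IDs (sys.all_objects is used only as a row source).
INSERT INTO @Demo (UniqueID)
SELECT TOP (350) ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
FROM sys.all_objects;

-- 10% of the row count, rounded up, but never fewer than 20.
DECLARE @Total int = (SELECT COUNT(*) FROM @Demo);
DECLARE @Take int = CASE WHEN (@Total + 9) / 10 < 20
                         THEN 20
                         ELSE (@Total + 9) / 10
                    END;

-- Random sample of @Take rows; with 350 rows this asks for 35.
SELECT TOP (@Take) UniqueID
FROM @Demo
ORDER BY NEWID();
```

If the table holds fewer than 20 rows, TOP (@Take) simply returns every row, which matches the behaviour of the TOP 20 branch.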
If you don't need to know the total number of rows, then the construct below is probably the fastest approach for your "10% or first 20" requirement:
If (SELECT COUNT(*) FROM (
SELECT TOP 200 * -- if 10% = 20, then 100% = 200
FROM [table] ) AS top200
) < 200
BEGIN
SELECT TOP 20 ...
END
ELSE
BEGIN
SELECT TOP 10 PERCENT ...
END
Note that there is no ORDER BY in the inner query, so the count only needs to read up to 200 rows rather than sort the whole table.
If there were a UniqueID in your table where COUNT(UniqueID) was greater than 1 (per UniqueID), then that UniqueID wouldn't be unique, so the TotalCount and total_count columns in your query should return the same values.
To select the top 10 percent (or at least 20) of your records, you could calculate a random row number (using NEWID()) and apply a filter to it that takes the total number of records into account. Both the row number and the total number of records can be calculated using window functions (with an OVER clause), but since these functions can't be used in a WHERE clause (they provide additional information after the regular result set has been generated), the calculation must happen in a subquery (which can be written as a CTE). My suggestion is to try it like this:
WITH
cte AS (
SELECT UniqueID,
RowNumber = ROW_NUMBER() OVER (ORDER BY NEWID()),
MaxNumber = COUNT(*) OVER ()
FROM YourTable
)
SELECT UniqueID
FROM cte
WHERE RowNumber <= MaxNumber/10 OR RowNumber <= 20;
If you need to round the 10% value up for fractional values (like TOP x PERCENT does), try an alternative WHERE clause:
WHERE RowNumber <= (MaxNumber+9)/10 OR RowNumber <= 20;
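To see why the two WHERE clauses differ, it may help to compare the two expressions directly. Dividing one int by another in T-SQL discards the fractional part, so MaxNumber/10 rounds down, while adding 9 first rounds up. The row counts in the VALUES list below are made-up sample inputs, not figures from the question.

```sql
-- Compare the rounded-down and rounded-up 10% for a few sample row counts.
SELECT n,
       n / 10       AS RoundedDown,  -- integer division truncates
       (n + 9) / 10 AS RoundedUp     -- ceiling of n/10 for non-negative n
FROM (VALUES (5), (199), (200), (201), (250)) AS v(n);

-- n    RoundedDown  RoundedUp
-- 5    0            1
-- 199  19           20
-- 200  20           20
-- 201  20           21
-- 250  25           25
```

Up to 200 rows the OR RowNumber <= 20 condition decides the result either way, so the choice of rounding only matters for larger tables whose row count is not a multiple of 10.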