I am using SQL and obtained the following table:
userID| time | location A10 | 20130801| 1000 A10 | 20130802| 1002 A10 | 20130806| 1008 B21 | 20130803| 1000 B21 | 20130801| 1099 C11 | 20130802| 1000 D33 | 20130802| 1002 D33 | 20130806| 1877 E01 | 20130801| 1765 E01 | 20130801| 1000 E01 | 20130802| 1000
where userID is String, Time is YYYYMM, and Location is location ID (numeric) for each userID in YYYYMM.
In this example, I have 5 unique userIDs (A10,B21,C11,D33,E01). I would like to write query that randomly sample X percent of unique userIDs (for example, if X=80, randomly sample 4 unique userIDs from 5).
Select time,location, count(DISTINCT userID) as n_uu from( --- --- here, I construct the example table --- ) as maintable where 0.8 >= CAST(CHECKSUM(NEWID(), userID) & 0x7fffffff AS float)/CAST (0x7fffffff AS int) group by time, location
where I finally intend to count the number of userIDs for randomly selected samples (i.e., 80% of unique userIDs). That is, I try to get following table in this example (now suppose that B21 is not sampled):
time | location | n_uu 20130801| 1000 | 2 20130801| 1765 | 1 20130802| 1002 | 2 20130802| 1000 | 2 20130806| 1008 | 1 20130806| 1877 | 1
Yet, this seems not to randomly sample unique userIDs, rather it randomly select rows.
How to fix it, or can I do this in faster by using other query? Any advice would be very appreciated.
Select time,location, count(DISTINCT userID) as n_uu from( (Select --- here, I construct the example table --- ) as maintable Select maintable.* from maintable join (Select top 80 percent userID from (Select Distinct userID from maintable) newtable order by NEWID() ) newtable on maintable.userID = newtable.userID ) group by time, location
Hmmm. If I understand correctly, you want to sample within a hierarchy. So, get the user ids you want with a subquery and then join in the rest of the information:
select t.* from t join (select top 80 percent userid from (select distinct userid from t) u order by newid() ) u on t.userid = u.userid;
User contributions licensed under CC BY-SA 3.0