Random IDs in JavaScript

I'm generating random IDs in JavaScript that serve as unique message identifiers for an analytics suite.

When checking the data (more than 10MM records), I see a few collisions for some IDs for various reasons (network retries, robots faking data, etc.), but one ID in particular has an intriguing number of collisions: akizow-dsrmr3-wicjw1-3jseuy.

The collision rate for the above ID is around 0.0037%, while the rate for the other colliding IDs is under 0.00035% (about 10 times lower), out of a sample of 111MM records from the same day. The other colliding IDs vary from day to day, but this one stays the same, so over a longer period the difference is likely larger than 10x.

This is what the distribution of the top ID collisions looks like: [image]

This is the algorithm used to generate the random IDs:

function generateUUID() {
    return [
        generateUUID4(), generateUUID4(), generateUUID4(), generateUUID4()
    ].join("-");
}

function generateUUID4() {
    return Math.abs(Math.random() * 0xFFFFFFFF | 0).toString(36);
}

I reversed the algorithm, and it seems that for akizow-dsrmr3-wicjw1-3jseuy the browser's Math.random() is returning the following four numbers in this order: 0.1488114111471948, 0.19426893796638328, 0.45768366415465334, 0.0499740378116197. I don't see anything special about them. Also, from the other data I collected, this ID seems to appear especially after a redirect/preload (e.g. Google results, ad clicks, etc.).
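
For reference, the reversal can be sketched roughly as below. The helper names `candidateRandoms` and `segmentFromRandom` are made up for this illustration. Because `| 0` converts the product to a signed 32-bit integer and `Math.abs` then folds negative results back, each segment is consistent with two ranges of Math.random() output, one below 0.5 and one above it, so the four numbers quoted above are only the lower candidates.

```javascript
// Same arithmetic as generateUUID4(), but with the random value passed in
// so that a candidate can be checked (hypothetical helper).
function segmentFromRandom(r) {
    return Math.abs(r * 0xFFFFFFFF | 0).toString(36);
}

// Recover the integer behind one ID segment and the two Math.random()
// values (interval midpoints) that would produce it (hypothetical helper).
function candidateRandoms(segment) {
    const n = parseInt(segment, 36);

    // Case 1: the product stayed below 2^31, so `| 0` only truncated it.
    const low = (n + 0.5) / 0xFFFFFFFF;

    // Case 2: the product was 2^31 or more, `| 0` wrapped it to a negative
    // int32, and Math.abs() flipped the sign back.
    const wrapped = (0x100000000 - n + 0.5) / 0xFFFFFFFF;
    const high = wrapped < 1 ? wrapped : null; // Math.random() is always < 1

    return { n, low, high };
}

for (const segment of "akizow-dsrmr3-wicjw1-3jseuy".split("-")) {
    const { n, low, high } = candidateRandoms(segment);
    console.log(segment, n, low, high);
}
```

Feeding either candidate back through `segmentFromRandom` should reproduce the original segment, which is a quick way to sanity-check the reversal.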

So I have 3 hypotheses:

  1. There's a statistical problem with the algorithm that causes this specific collision
  2. Redirects/preloads are somehow messing with the seed of the pseudo-random generator
  3. A robot is smart enough to fake all the other data but for some reason keeps the random ID the same. The data comes from different user agents, IPs, countries, etc.
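
As a rough sanity check on hypothesis 1, here is a birthday-bound estimate of how many accidental collisions to expect. It assumes Math.random() is uniform and that the four segments are independent, which is exactly what hypothesis 2 questions, so treat it as a sketch rather than a proof. The function name `expectedCollidingPairs` is made up for this illustration. Each segment is the absolute value of a signed 32-bit integer, so it takes roughly 2^31 distinct values.

```javascript
// Birthday approximation: expected number of colliding pairs when
// `sampleSize` IDs are drawn uniformly from `idSpace` possible values.
function expectedCollidingPairs(sampleSize, idSpace) {
    return (sampleSize * (sampleSize - 1)) / (2 * idSpace);
}

// Assumption: about 2^31 values per segment, four independent segments.
const segmentValues = 2 ** 31;
const idSpace = segmentValues ** 4; // about 2^124

// 111MM records, the one-day sample size mentioned in the question.
const pairs = expectedCollidingPairs(111e6, idSpace);
console.log(pairs); // many orders of magnitude below 1
```

Under these assumptions a single accidental collision in a day of data would already be extremely unlikely, so the same ID recurring across days does not look like a purely statistical effect of the ID format.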

Any idea what could cause this collision?

javascript
math
random
asked on Stack Overflow Aug 3, 2020 by icenac • edited Aug 4, 2020 by icenac

0 Answers

Nobody has answered this question yet.


User contributions licensed under CC BY-SA 3.0