How to generate a predictable shuffling of a sequence without generating the whole sequence in advance?

5

The following python code describes exactly what I want to achieve for a sequence of arbitrary size (population):

import random
fixed_seed = 1 #generate the same sequence every time with a fixed seed
population = 1000
sample_count = 5 #demonstration number
num_retries = 3  #just enough to show the repeatable behaviour
for trynum in xrange(num_retries):
    #generate the fresh/ordered sequence (0->population)...
    seq = range(population)
    #seed the random number generator the same way every time...
    random.seed(fixed_seed)
    #shuffle the sequence...
    random.shuffle(seq)
    #display results for this try...
    sample_sequence = [str(x) for x in seq[:sample_count]]
    print "try %s: %s..." % (trynum + 1, ", ".join(sample_sequence))
#Sample output...
#try 1: 995, 721, 62, 326, 541...
#try 2: 995, 721, 62, 326, 541...
#try 3: 995, 721, 62, 326, 541...

The problem with that method is that it requires generating the entire sequence in memory first. This can be a problem for huge populations.

Note that a potentially big advantage of this method is that you can pick off any array position at any time.

Now - If the problem at hand happens to let you set the population size to a power of two (minus 1), a Linear Feedback Shift Register can be used to get the predictable random sequence. LFSRs are neat, and explained pretty well in the wikipedia article on them.

The python code below demonstrates this (and I did a pile of uniqueness testing to ensure it works as advertised). See the wikipedia article again for an explanation of how the code works (Galois configuration).

TAP_MASKS = { #only one needed, but I included 3 to make the code more useful
    10: 0x00000240, #taps at 10, 7
    16: 0x0000B400, #taps at 16, 14, 13, 11
    32: 0xE0000200, #taps at 32, 31, 30, 10
}

def MaxLengthLFSR(seed, register_length):
    "Gets next value from seed in max-length LFSR using Galois configuration."
    lsb = seed & 1
    next_val = seed >> 1
    if lsb == 1:
        mask = TAP_MASKS[register_length]
        next_val ^= mask
    return next_val

reglen = 16  #number of bits in register
population = (2**reglen) - 1 #not used, just showing it
fixed_seed = 1   #seed == startval in this case (could randomize in population)
sample_count = 5 #demonstration number
num_retries = 3  #just enough to show the repeatable behaviour
for trynum in xrange(num_retries):
    next_val = fixed_seed
    seq = [fixed_seed, ]
    for x in xrange(sample_count - 1):
        next_val = MaxLengthLFSR(next_val, reglen)
        seq.append(next_val)
    seq = [str(x) for x in seq]
    print "try %s: %s..." % (trynum + 1, ", ".join(seq))
#Sample output...
#try 1: 1, 46080, 23040, 11520, 5760...
#try 2: 1, 46080, 23040, 11520, 5760...
#try 3: 1, 46080, 23040, 11520, 5760...

This is nice because you can have a HUGE population and easily calculate a repeatable non-repeating random number sequence without using a big chunk of memory.

The drawbacks are a) that it is limited to a "shuffling" sequences of size (2**N - 1), and b) that you cannot determine what the value of a particular position in the random sequence is at an arbitrary location. You need to know the value at a particular point and walk the sequence from there.

The latter (b) is mostly ok since most of the time you'll generate the sequence in order, so you just need to remember the last value. The power of 2 limitation (a) is kind of a deal killer,though... depending on the application.

How do you achieve maximum-length-LFSR-like non-repeating results for arbitrary sequence lengths?

As a bonus, it would be nice to have a solution where you are able to know the number at a given sequence position without needing to walk through the sequence to that position.


Note: if you want a good starting set of LFSR tap locations for maximum-length LFSRs (ones that generate the whole register population without repeating once), this link is quite good and has a huge number of tap locations per register size (up to 32 bits, anyway).

Also please note that I've seen many questions closely related to my question and shuffling/LFSR, but none of them exactly relate to what I'm after (predictable shuffle of arbitrary size linear sequence). Or at least as far as I have been able to understand them, anyway.

I've recently been looking into Linear Congruential Generators, which seem promising, but I haven't been able to get them to work yet. Rather than sitting on the question further I'll ask it, and post the answer if I figure it out and they work.

python
algorithm
random
lcg
asked on Stack Overflow Oct 11, 2010 by Russ

2 Answers

4

I've actually written about this before: Secure Permutations with Block Ciphers. In a nutshell:

  1. Yes, you can use an LFSR to generate permutations with a length that's a power of 2. You can also use any block cipher. With a block cipher, you can also find the element at index n, or the index for element n.
  2. To generate a permutation with arbitrary length l, create one with the smallest power of 2 length greater than l. When you want to find the nth permutation element, apply the permutation function, and if it generates a number outside the desired range, apply it again; repeat until the number is in the acceptable range.

The number of iterations required for step 2 will average no more than 2; the worst case is high, but extremely unlikely to occur.

answered on Stack Overflow Oct 11, 2010 by Nick Johnson
1

First off, note that this isn't a random sequence. It only generates a single, fixed, repeating sequence, and the seed chooses where in the sequence you start. That's the same as all PRNGs, of course, but normally the cycle of a PRNG is much larger than 16-bit or 32-bit. The way you've described using it, the cycle length is equal to the number of items you're iterating over, so all it'll do is take a single "shuffled" order and change where you start. The "seed" value is more like a starting index than a seed.

It's not the most satisfactory answer, but it's probably practical: you can pad the length to the next power of two, and skip any indexes above the actual maximum. Thus, if you have 5000 items, generate a sequence over 8192 items, and discard any results between [5000,8191]. The overhead from this sounds ugly, but in perspective it's not that bad: since this can at most double the length of the list, on average you'll have to discard one out of two results, so the worst-case average overhead is doubling the amount of work.

The following code demonstrates this (as well as showing a cleaner way to implement it). The third parameter to MaxLengthLFSR, if given, is the actual maximum value. You'd probably want to fill in TAP_MASKS for a larger number of sizes and then choose the smallest register size that fits the requested sequence length; here we just use the one requested, which works but will cause much more overhead if the length of the sequence is much larger than it needs to be.

TAP_MASKS = { # only one needed, but I included 3 to make the code more useful
    10: 0x00000240, # taps at 10, 7
    16: 0x0000B400, # taps at 16, 14, 13, 11
    32: 0xE0000200, # taps at 32, 31, 30, 10
}

def MaxLengthLFSR(next_val, reglen, max_value=None):
    """Iterate values from seed in max-length LFSR using Galois configuration."""
    # Ensure that max_value isn't 0, or we'll infinitely loop without yielding any values.
    if max_value is not None:
        assert max_value > 0

    while True:
        if max_value is None or next_val < max_value:
            yield next_val

        lsb = next_val & 1
        next_val = next_val >> 1
        if lsb == 1:
            mask = TAP_MASKS[reglen]
            next_val ^= mask

sample_count = 5 # demonstration number
num_retries = 3  # just enough to show the repeatable behaviour
for trynum in xrange(num_retries):
    it = MaxLengthLFSR(1, 16, 2000)
    seq = []
    for x in xrange(sample_count):
        seq.append(next(it))
    seq = [str(x) for x in seq]
    print "try %s: %s..." % (trynum + 1, ", ".join(seq))
answered on Stack Overflow Oct 11, 2010 by Glenn Maynard

User contributions licensed under CC BY-SA 3.0