Random but Evenly Distributed Sets of Numbers

April 17, 2010 - 7 minute read -

Let's say one is a computer programmer and let's say one's wife (or roommate; or significant other) does social science research (a totally hypothetical scenario of course). When doing social science research one needs to create randomized groups of participants.

So in a group you might be testing a variable X and you need to divide the group in half to test that variable with a number of control subjects. In this scenario, one needs to create groups A and B (or 1 and 2) such that there are the same number of A and B in a given set.

Now normally when generating random numbers you can just generate an arbitrary set of random numbers. But if you need a set of 10 numbers with 5 1's and 5 2's then that's a different problem. You could of course generate a set and test to make sure it's evenly distributed and then throw it away if it doesn't meet this constraint. That of course is an inefficient way of generating your sets because you would likely have to throw a lot of them away.

The solution to this conundrum is to use random numbers not to create your set, but to shuffle it. Luckily other smart people have proven ways to do this already. The Fisher-Yates shuffle is the technique that I used.

from random import randrange
items = [1,1,1,1,1,2,2,2,2,2]</p>
<p># Fisher-Yates shuffle, Durstenfeld in-place implementation
n = len(items)
while n > 1:
    k = randrange(n)  # 0..n-1
    n = n - 1
    items[k], items[n] = items[n], items[k]</p>
<p>print items    # e.g. [2, 1, 2, 1, 1, 2, 1, 2, 2, 1]

Basically the Fisher-Yates shuffle shuffle picks a random item and puts it at the end of the list by swapping the end with the randomly selected item. It then continues with the 'unpicked' numbers and puts them at the end of the unpicked set until it reaches the beginning of the list. Once it's traveled through, you have a randomly sorted set.

That's fine if you're a programmer and want to run python from the command line and change your items set if you need a different size, etc. But if you want an end-user who's not a programmer to use it, you better come up with something a bit configurable. So I need to allow for some variation to the sets that will be generated and shuffled.

Based on this information I created a simple class that would allow you to specify 3 properties that would define the random sets to generate: blocks, the number of individual sets; size, the number of numbers in each block; and groups, the number of variants within each block. So now you could generate a set of 10 blocks, with 12 numbers in each containing 1, 2, and 3 evenly distributed (i.e. 4 of each number).

from random import randrange</p>
<p>class Shuffler(object):
    def __init__(self, blocks, blockSize, groups):
        self.blocks = blocks
        self._groups = groups
        self._blockSize = blockSize
        self.deck = self.make_deck()</p>
<p>    @property
    def blockSize(self):
        return self._blockSize</p>
<p>    # BlockSize setter also initializes deck
    @blockSize.setter
    def blockSize(self, value):
        self._blockSize = value
        self.deck = self.make_deck()</p>
<p>    @property
    def groups(self):
        return self._groups</p>
<p>    # Groups setter also initializes deck
    @groups.setter
    def groups(self, value):
        self._groups = value
        self.deck = self.make_deck()</p>
<p>    def is_valid_group_size(self):
        return self.blockSize % self.groups == 0</p>
<p>    # Fisher-Yates shuffle, Durstenfeld in-place implementation
    def shuffle(self):
        items = self.deck[:]    # copy deck for in-place shuffle
        n = len(items)
        while n > 1:
            k = randrange(n)  # 0..n-1
            n = n - 1
            items[k], items[n] = items[n], items[k]
        return items</p>
<p>    def make_deck(self):
        result = []
        for i in range(self.blockSize):
            result.append(i % self.groups + 1)
        return result

Finally to wrap the shuffler in a nice command-line interface I created a simple specialization of the Shuffler class that would take command line arguments to generate blocks of specific properties.

#!/usr/bin/env python
"""
Usage: %(program)s [options] ... [-b <number>] [-s <number>] [-g <number>]
Options:
  -h, --help    This help message
  -b, --blocks  The number of blocks (default: 20)
  -s, --size    The number of participants in each group (default: 10)
  -g, --groups  The number of groups in each block (default: 2)
"""</p>
<p>import random, sys, getopt
import shuffle</p>
<p>program = sys.argv[0]</p>
<p>class CliShuffler(shuffle.Shuffler):
    def __init__(self):
        shuffle.Shuffler.__init__(self, 20, 10, 2)</p>
<p>    def print_results(self):
        for i in  range(self.blocks):
            print ' '.join([str(i) for i in self.shuffle()])</p>
<p>    def usage(self):
        print >> sys.stderr, __doc__ % globals()
        sys.exit()</p>
<p>    def run(self, argv):
        try:
            opts, args = getopt.getopt(sys.argv[1:], "hg:s:b:", ["help", "groups=", "size=", "blocks="])
        except getopt.GetoptError, err:
            # print help information and exit:
            print str(err) # will print something like "option -a not recognized"
            self.usage()
            sys.exit(2)</p>
<p>        for opt, arg in opts:
            if opt in ("-b", "--blocks"):
                try:
                    self.blocks = int(arg)
                except ValueError:
                    self.usage()
            elif opt in ("-s", "--size"):
                try:
                    self.blockSize = int(arg)
                except ValueError:
                    self.usage()
            elif opt in ("-g", "--groups"):
                try:
                    self.groups = int(arg)
                except ValueError:
                    self.usage()
            elif opt in ("-h", "--help"):
                self.usage()
            else:
                assert False, "unhandled option"</p>
<p>        if not self.is_valid_group_size():
            print("Block Size must be evenly divisible by groups to get an even grouping.")
            self.usage()
            sys.exit()</p>
<p>        self.print_results()</p>
<p>if __name__ == "__main__":
    shuffler = CliShuffler()
    shuffler.run(sys.argv[1:])

Happy shuffling...