Shuffle, Part 2

Thursday, 6 December 2007 03:49 PM

Onward mit de shuffling. Eric kindly pointed out the flaws in my original algorithms, altho I am somewhat mollified that my some of instincts about some of the algorithms were (unusually) somewhat correct.

Jeff used the occasion of the post to remind folks that the Random class does not produce real randomization; it's a pseudo-random generator that will produce the same "random" sequence, given the same seed. (Eric has made the same point a number of times, including in his interesting exchange with Greg Stemp.) I knew this, actually.

I feel like I need to point out that the discussion about randomizing is, haha, randomizing. It was not the original focus of my interest in shuffling. Eric effectively nailed the issue when he discussed the characteristics of different approaches to reordering arrays. It's the shuffly-ness of the shuffling that I was after, which is why I am more interested in Eric's comment on bias than I am in the discussion of randomness.

As an aside on the aside, an issue that has not been addressed, IMO, is the issue of random enough. Because I was screwing around with a card game, I used the example of a deck of cards. This tended to focus discussion on the intricacy of and need for super-duper-random shuffling for things like online poker sites. And everyone knows (even me) that true randomness is critical to decent cryptography. (Bruce Schneier: "As long as you don't write your own algorithm, secure encryption is easy.") It would perhaps have been wise of me to declare in advance my parameters for "good enough" randomness. Assume, for example, that I were writing a Solitaire game (I'm not) that I would only play myself. No money, no big stakes, nothing like that. Maybe I'd even want to be able to replay specific games. For that purpose, I think -- and I'm sure I'll hear about it if I'm wrong -- the randomness of the randomness is probably not that important, and certainly not critical, given a large enough pool of seeds to choose from randomly (ahem). I mean, if the task were not shuffling but instead to assign numbers to kindergartners to determine the order in which they go to lunch, I assume we would not be quite as focused on how good the randomizer is.

Back to algorithms. I used the Random class, which ok, fine, is not ideal. Given orders to find a crypto-quality randomizer, I found the RNGCryptoServiceProvider class, whose GetBytes method purports to fill a byte array "with a cryptographically strong sequence of random values."

Eric (among others) noted that the best likely performance[1] involves sorting after assigning a (true) random number to each element. Eric specifically says to use a real number between 0.0 and 1.0. (Jeff suggests GUIDs, which sure look random, but at least one commenter on his entry notes that GUID generation might not be random enough. See previous point.)

I'm sure that Eric has a very specific reason for recommending a real number between 0.0 and 1.0. Here again my lack of CS depth fails me, because it's not obvious to me why this data type and that numeric range is significant. I'm going to assume that it has to do with the distribution of bits in the byte representation of real numbers, but that's as far as I'll speculate. Perhaps Mr. Lippert can be coaxed into further explanation.

Ok, so GetBytes produces a byte array, which I want to use as the basis for sorting my "deck" of cards. I can use the BitConverter class to turn a byte array into various types of numbers. Which to use? Golly, so many choices. Likely-looking options include converting an 8-byte array into a double or a 4-byte array into an Int32. I went with double. (I also tested Int32s on the theory that it required only half the number of random bytes to be generated, but didn't find a lot of speed difference, so the choice would have more to do with the distribution of a sort based on different types rather than on the speed of generating the actual typed numbers.)

For purposes of fooling around with this, I added a SortOrder property (actually a series of them -- SortOrderInt32, SortOrderInt64, SortOrderDouble, etc.) to my Card class to store a number upon which the sort operation can be performed. And a IComparable.CompareTo method. (It feels wrong to me to add the sort order property to the Card class, but we'll leave it for now.) I'm assuming -- correct me if I'm wrong -- that the Sort method of the System.Collections.Generic.List class won't sort byte arrays.

Here's the code:

Protected Function Shuffle_CryptoSort(ByVal deck As List(Of Card)) _
        As List(Of Card)
    Dim randomNumber(8) As Byte
    Dim Gen As New System.Security.Cryptography.RNGCryptoServiceProvider()
    For i As Integer = 0 To deck.Count - 1
        Gen.GetBytes(randomNumber)
        deck(i).SortOrderDouble = BitConverter.ToDouble(randomNumber, 0)
    Next
    deck.Sort()
    Return deck
End Function

The only notable thing about this particular variation is that it's substantially slower than the other approaches, by, like, a factor of 8 to 10. Some investigation suggests that the biggest hit is in generating the random bytes and converting them to a double. Sorting the array certainly costs, but it's about 20% of the cost of the number generation, assuming I get the math right. Mind you, we're talking fractions of a second here, total.

Ok, let's see how this sits. I figure I must be pretty close to a solution for my original problem, which I'll remind myself is not the need to produce perfect shuffling, just good enough. PGS -- Pretty Good Shuffling. I imagine I'll hear about it if not. :-)

[1] For a good explanation of performance and the "traditional hierarchy of algorithms", have a look at O(n log n) by Tommy McGuire. Good explanation with only a little math. :-)

[categories] aspnet

comment [2] | link

mike's web log