Threaded Mode | Linear Mode

DavidM · 06-09-2017, 03:15 PM

A common method for obtaining a pseudo-random number in RPL applications is to use something like the following.

Method 1: RAND <num> * CEIL

...where the integer <num> represents the largest value in the defined range of possible results (1..<num>). Assuming <num> is less than 10^11, an alternative method might be something like this:

Method 2: RAND MANT 1E11 * <num> MOD 1 +

The two methods will clearly give different results, but I'm wondering if the second approach reduces the quality of the randomization in any way. My inclination is to believe that the second method should be at least as good as (and possibly better than) the first, but I know better than to rely on a gut feel for this sort of thing. Are there problems with the second approach that would weaken the randomization when compared to the first? How could I test this?

I'm aware that the second method is very slightly slower than the first in a UserRPL context. The application I'm considering this for is a combination of SysRPL and Saturn code, though. In that context, the second method is about 40% faster. So I'd like to be able to use the second method as long as it doesn't compromise the result. I'm just not sure how to go about validating its effectiveness.

pier4r · (This post was last modified: 06-09-2017 03:47 PM by pier4r.)

Just dropping some links that I remember from this forum (I had to search again).

In particular I remember than one of the many explorations of Namir focused on random generators and he had to test them, mentioning the tests. So what I was able to find:

- http://www.hpmuseum.org/forum/thread-7196.html discussion where staring from post #20/30 people talk about how to test a random generator.

- http://www.hpmuseum.org/cgi-sys/cgiwrap/...ead=189451 short discussion pointing to tests for random generators

- I cannot find the forum post of Namir about random generator tests but I believe those are in the documentation listed here: http://www.namirshammas.com/NEW/mainNEW.htm

Paging Namir (and others) for more details.

Paul Dale · (This post was last modified: 06-10-2017 11:50 PM by Paul Dale.)

Both of these will be biassed. Which one is closer to uniform will depend on <num>.

Off the top of my head, something like this would be better:

Code:

<< 1E11 -> n l 

    <<

        l n IQUOT n *         # Largest value below which the roll is fair

        DO RAND l * IP        # Roll once

        UNTIL DUP2 < END      # Check within range for fair roll

        NIP n MOD 1 +         # Clean up the stack and produce the result

    >>

>>

I've not checked this code, so there are likely errors.

I've fixed the point David noted below (IQUOT instead of IDIV).

Pauli

Joe Horn · 06-10-2017, 12:03 AM

(06-09-2017 03:15 PM)DavidM Wrote: ... I'm wondering if the second approach reduces the quality of the randomization in any way. My inclination is to believe that the second method should be at least as good as (and possibly better than) the first, but I know better than to rely on a gut feel for this sort of thing. Are there problems with the second approach that would weaken the randomization when compared to the first?

My strong gut feeling is that using MANT on random X's is a Bad Thing, because it SEEMS like it simply multiplies X by 10, but that's only if X>=0.1, which it isn't 10% of the time. And 1% of the time X<0.01, which is even worse. And 0.1% of the time X<0.001, et cetera. In effect, MANT(X) where X=RAND returns X*10^(-XPON(X)). This will not return uniformly random numbers at all.

Quote:How could I test this?

I'd suggest writing a loop on the HP 50g which turns on random pixels on the screen until you abort it by pressing a key. Write one version of the program using (RAND,RAND) scaled to the screen, to see what it should look like. Then write another version using (MANT,MANT) scaled to the screen. I'll bet a wooden nickel that you'll see a lack of uniformity quite clearly. I haven't tried it, so I might be wrong, in which case I'll have to go find a wooden nickel. Big Grin

DavidM · 06-10-2017, 01:45 AM

(06-10-2017 12:03 AM)Joe Horn Wrote: My strong gut feeling is that using MANT on random X's is a Bad Thing, because it SEEMS like it simply multiplies X by 10, but that's only if X>=0.1, which it isn't 10% of the time. And 1% of the time X<0.01, which is even worse. And 0.1% of the time X<0.001, et cetera. In effect, MANT(X) where X=RAND returns X*10^(-XPON(X)). This will not return uniformly random numbers at all.

@Joe, you've identified where my attempt to translate what I'm testing into a UserRPL equivalent failed miserably. I agree with your assessment, MANT is a bad idea due to the issues you mention.

I'll attempt to give a better/more detailed description of my actual approach:

- A call to (SysRPL) %RAN is issued so that a new RNSEED is created. The result is dropped from the stack as it won't be used; it's the internal seed that I need.
- A Saturn routine is passed a BINT parameter that represents the range of the random number desired.
- The Saturn routine makes a local copy of the 12 digits stored at RNSEED+3 (which are in decimal form) and converts them to hex form.
- The converted RNSEED is then divided by the BINT passed as a parameter; the remainder of this division becomes the zero-based form of the result.
- 1 is added to make the result 1-based.

If you've followed the above, you can see that the pitfall you identified doesn't come into play here. The decimal form of RNSEED contains leading zeros (it isn't normalized), so the magnitude of the seed is maintained appropriately.

The specific application I have in mind is for use in a "shuffle" command to be included in a library of extended list processing utilities. In this case, the range of the expected result would be limited to whatever size list is being shuffled. Realistically, probably no more than 3000. Processing larger lists on the 50g isn't very practical due to memory and performance issues. Garbage collection rears its ugly head pretty quickly when exploding/imploding lists that large.

@Pauli:

Given the more detailed description (and scope), does this change your assessment of the risk of bias?

@anyone:

My intent with this is simply to retain as much of the "quality" of RAND as I can for the purpose of randomizing a list of items that's probably no larger than 2-3 thousand items. Does using the RNSEED as I have described above create a significant bias?

Paul Dale · 06-10-2017, 05:18 AM

Yes your actual approach is still biassed. To get an unbiased output you need to deal with the issue of the number of random values (here 10[sup]12[sup]) modulo the size of the integer you want (n in my code) is not zero. This is what my code attempted to do. This largest possible random value needs to be a multiple of the output range (n) and for larger random values, the process is repeated. i.e. repeat the process while the random number is greater than \( n \lfloor{\frac{10^{12}}{n}}\rfloor \). The bias involved isn't large per sample, but repeated lots and it will build up. A shuffle will repeat it lots.

I didn't remember exactly what MANT did, Joe is correct it is horrible.

If you've not picked an algorithm for the shuffle yet, use Fisher-Yates. This would be well suited to sys-RPL implementation. Deconstruct the list onto the stack, loop n-1 times swapping the chosen value with the nth stack level.

- Pauli

pier4r · (This post was last modified: 06-10-2017 08:58 AM by pier4r.)

(06-10-2017 12:03 AM)Joe Horn Wrote: I'd suggest writing a loop on the HP 50g which turns on random pixels on the screen until you abort it by pressing a key. Write one version of the program using (RAND,RAND) scaled to the screen, to see what it should look like. Then write another version using (MANT,MANT) scaled to the screen. I'll bet a wooden nickel that you'll see a lack of uniformity quite clearly. I haven't tried it, so I might be wrong, in which case I'll have to go find a wooden nickel.

OT: What a nice idea (in general, not only for a test)

John Keith · 06-10-2017, 12:57 PM

As a side note, since you are going to the trouble of writing SysRPL/assembly code for random number generation, it occurs to me that the HP50's internal PRNG code is beginning to look a bit old-fashioned. The XORSHIFT+ algorithm seems to be fast and robust, and looks fairly easy to implement in Saturn assembly.

John

pier4r · (This post was last modified: 06-10-2017 02:08 PM by pier4r.)

(06-10-2017 12:57 PM)John Keith Wrote: HP50's internal PRNG code is beginning to look a bit old-fashioned.

While I welcome your suggestion to david that is quite skilled in sysRPL, I don't get the "old fashioned" part. Given that the 50g is not used for security (well, I could save my passwords there... nah, the keyboard is not so fast), its PRNG should be good enough for all the needed applications (math, math explorations, games, etc..) or not?

And if not, why not?

Given also the fact that one won't realistically generate a lot of random numbers in series (I guess an upper limit would be 50 thousand or 100 thousand, that should take many days to be produced).

lrdheat · 06-10-2017, 02:28 PM

Back in they day with my hp-67 (wish I still had it), I made a primitive random number generator simply by using the 10th through 12th digits of sin (X) where X was in radians, and X incremented up by 1 for each new random number generation. Seemed good for my purposes (had made a baseball game simulation). Wonder how good this method really is...?!

DavidM · 06-10-2017, 03:51 PM

(06-10-2017 05:18 AM)Paul Dale Wrote: To get an unbiased output you need to deal with the issue of the number of random values (here 10^12) modulo the size of the integer you want is not zero.

Thanks for the explanation, that helps to clarify what you meant to do in your code (yes, I saw the warning Smile

). I believe you probably meant to use IQUOT where you currently have IDIV. When I first looked at the code, I thought IDIV was a typo for IDIV2, which returns both the quotient and the remainder. So I didn't catch the full meaning the first time around.

(06-10-2017 05:18 AM)Paul Dale Wrote: If you've not picked an algorithm for the shuffle yet, use Fisher-Yates. This would be well suited to sys-RPL implementation. Deconstruct the list onto the stack, loop n-1 times swapping the chosen value with the nth stack level.

Fisher-Yates (more specifically the "modern algorithm" version) is exactly the approach I had already used, and my previous implementation simply used the equivalent of "RAND n * IP" to select the random swap index. My revision was an attempt to speed up the process by keeping the random index selection in the integer domain and doing the math involved in Saturn code instead of SysRPL.

The extra range check for the seed will slow things down, but obviously not for every iteration. So there's still likely to be a performance boost, just not as much as I had originally hoped.

(06-10-2017 12:57 PM)John Keith Wrote: As a side note, since you are going to the trouble of writing SysRPL/assembly code for random number generation, it occurs to me that the HP50's internal PRNG code is beginning to look a bit old-fashioned. The XORSHIFT+ algorithm seems to be fast and robust, and looks fairly easy to implement in Saturn assembly.

At one point I played around with an alternative method that used a combination of TIMER bits with the current CRC, and it was the fastest thing I tested. While the results appeared nicely shuffled, I realized that I lacked the knowledge and tools to validate them. Then there's also the issue of non-repeatability... it was a good exercise, but not something I'd pursue. I ultimately decided to stick with RAND for the very reason that I didn't want to reinvent the PRNG wheel (and fall into all the traps inherent with that). XORSHIFT+ does look interesting, but I'll put that on my list of things to revisit later.

DavidM · (This post was last modified: 06-10-2017 05:21 PM by DavidM.)

(06-10-2017 12:03 AM)Joe Horn Wrote: I'd suggest writing a loop on the HP 50g which turns on random pixels on the screen until you abort it by pressing a key. Write one version of the program using (RAND,RAND) scaled to the screen, to see what it should look like. Then write another version using (MANT,MANT) scaled to the screen. I'll bet a wooden nickel that you'll see a lack of uniformity quite clearly. I haven't tried it, so I might be wrong, in which case I'll have to go find a wooden nickel.

It's interesting that you brought this up. I did this exact thing several years ago when I was first playing around with using the CRC as a source of random numbers for some Saturn code. The code runs too fast for me to actually see any patterns, but the result is somewhat hypnotic. I also used the CRC to randomly turn on/off the annunciators, but I do remember having to slow them down to even see what was going on. I might have to blow the dust off of this for some more experiments!

Note: this app is only compatible with 49g+/50g systems.

Paul Dale · 06-10-2017, 11:49 PM

(06-10-2017 03:51 PM)DavidM Wrote: I believe you probably meant to use IQUOT where you currently have IDIV. When I first looked at the code, I thought IDIV was a typo for IDIV2, which returns both the quotient and the remainder.

I'll fix this. I've not used RPL much for quite a few years now...

Quote:The extra range check for the seed will slow things down, but obviously not for every iteration.

It will almost always not require a second iteration, so the cost is the comparison & the computation of the bound.

Pauli

Paul Dale · 06-11-2017, 12:00 AM

(06-10-2017 02:28 PM)lrdheat Wrote: Back in they day with my hp-67 (wish I still had it), I made a primitive random number generator simply by using the 10th through 12th digits of sin (X) where X was in radians, and X incremented up by 1 for each new random number generation.

Taking the middle order digits of \( x_n = frac\left(e^{2 x_{n-1} + 1}\right) \) starting with \( x_0 = frac\left(\pi\right) \) produces a fairly well distributed random number, at least in binary. This passes the dieharder tests and only fails a couple of the TestU01 suite. It is a good example of a simple function that appears to produce random output, yet is completely and obviously deterministic. It is also straightforward to show that rounding is not coming into play.

Pauli

ttw · 06-11-2017, 12:21 AM

One way to improve problems with pRNGs is to use very long cycle with a large state. The Fischer-Yates shuffle has the additional problem in that one needs to sample from various values of n. If shuffling a deck of cards, the first choice needs to sample from 51 items then from 50 items then from 49, 48,...3,2 items. There are four obvious choices.

First, one could use a big integer generator (Lehmer, shift-register, Fibonacci, etc.) which has a cycle length divisible by LCM(2*3*4*...50*51). This uses lots of big arithmetic.

Second, one could use 51 different RNGs with cycles divisible by a large multiple of the index of the generator. That is, the first is a multiple of a large power of 51, the second divisible by a large power of 50, and the last divisible by a large power of 2. The cycle length of each generator should be a multiple of a large different prime (perhaps 53,59,61,....) to keep each generator (sort of) independent of the others. This is probably the nicest in theory.

Third, one could use a very long cycle generator. The error from lack divisibility does decrease with cycle length. This is probably the most practical solution.

Fourth, one could use a simple RNG and not worry about the error. This I probably the most common solution.

Paul Dale · 06-11-2017, 12:38 AM

Fifth, one can detect if the RNG output is in the bias region and get a new RNG output that isn't Smile

Pauli

DavidM · 06-11-2017, 01:11 AM

(06-10-2017 11:49 PM)Paul Dale Wrote:
(06-10-2017 03:51 PM)DavidM Wrote: I believe you probably meant to use IQUOT where you currently have IDIV...

I'll fix this. I've not used RPL much for quite a few years now...

No worries. I know that RPL isn't your normal wheelhouse. I'm just glad that you provided your insight into the topic at hand!

(06-10-2017 11:49 PM)Paul Dale Wrote:
Quote:The extra range check for the seed will slow things down, but obviously not for every iteration.

It will almost always not require a second iteration, so the cost is the comparison & the computation of the bound.

My first stab at the new range check version of the shuffle seems to be averaging a 1000-element list in about 4.5 seconds. That's still better than the previous 6-second version, so it hasn't been a wasted effort. And I've yet to clean it up, so there might still be a little performance boost available.

DavidM · 06-11-2017, 01:39 AM

(06-11-2017 12:21 AM)ttw Wrote: Third, one could use a very long cycle generator. The error from lack divisibility does decrease with cycle length. This is probably the most practical solution.

What are your thoughts on the XORSHIFT+ PRNG that John brought up earlier? Does this have a long enough cycle for a practical use on calculator-based apps?

(06-11-2017 12:21 AM)ttw Wrote: Fourth, one could use a simple RNG and not worry about the error. This I probably the most common solution.

I have no data to back this up, but I'd be willing to bet that the "RAND n * CEIL" approach is used far more frequently than any other in calculator programs. Has this actually caused problems for anyone? I suspect we'll never really know.

Paul Dale · 06-11-2017, 05:04 AM

(06-11-2017 01:11 AM)DavidM Wrote: I know that RPL isn't your normal wheelhouse.

It used to be ... back when the 28S and 48SX were new. Almost thirty years ago now!

Pauli

Paul Dale · 06-11-2017, 05:19 AM

(06-11-2017 01:39 AM)DavidM Wrote: Does this have a long enough cycle for a practical use on calculator-based apps?

I think that pretty much any RNG has a long enough cycle. How many random numbers is a calculator likely to need? A few hundred? Thousand? Million? Most modern RNG have periods many orders of magnitude beyond what's required.

It is more important to provide a good RNG that generates uniform numbers that pass the various statistical test suites. This doesn't mean the results are random of course, but it is indicative that they aren't needlessly biassed.

Quote:I have no data to back this up, but I'd be willing to bet that the "RAND n * CEIL" approach is used far more frequently than any other in calculator programs.

I certainly wouldn't be accepting this bet Smile

Quote:Has this actually caused problems for anyone? I suspect we'll never really know.

It is impossible to know. Quite a few of the early RNGs were likely to have been poorly distributed and non-uniform. I'm guilty of this when I've had to reduce the step count in programs.

I've gone back and looked at the 34S RNG again. It looks fine in integer mode and double precision real mode. I suspect it will show some bias in single precision real mode.

Pauli