50g: an interesting RAND anomaly
|
03-17-2018, 05:02 PM
(This post was last modified: 03-17-2018 05:05 PM by DavidM.)
Post: #1
|
|||
|
|||
50g: an interesting RAND anomaly
Consider the following RPL program, which produces a list of 200 pseudo-random integers in the range 0..3:
Code: \<< The method used may seem a bit strange, but it's the simplest stripped-down UserRPL equivalent to the actual algorithm I'm using that I could put together. The original is actually a Saturn code object, so it uses integer math instead of floating point for performance reasons. The original also does some safety checks to make sure that the seed is in an acceptable range, but that wasn't needed for this demonstration. R→I is the only optional command here, and is simply used to make the final list easier to view. In particular, what initially caused me to take notice of the result is that it has an interesting property: each possible number occurs exactly 50 times. I thought that was odd enough, but further investigation showed something else: running the above program repeatedly will always give exactly the same result list, provided there are no intermediate calls to RAND made. Calling RAND in between invocations will cause the result list to change, but it still repeats if the program is called repeatedly (and it still has a perfect distribution). Changing the loop count to any number that is a multiple of 200 has shown similar results, so it appears that a cycle of 600 RAND invocations is pertinent. I also noted that changing the MOD value to 2 or 8 had similar results, but no other power of 2 I tried did. This represents a much smaller cycle than I had expected for RAND used in this manner, though it seems to be specific to these parameters. I haven't seen similar results outside of those mentioned (yet ). |
|||
03-17-2018, 05:43 PM
Post: #2
|
|||
|
|||
RE: 50g: an interesting RAND anomaly
The same happens using the HP-71B ancestor.
Using the Basic code: 0 ! --- RND test 20 DIM A(200) 30 FOR I=1 TO 200 40 A(I)=MOD(IP(RND*1.E+12),4) 50 X=RND @ X=RND 60 NEXT I 70 N=0 @ GOSUB 130 80 N=1 @ GOSUB 130 90 N=2 @ GOSUB 130 100 N=3 @ GOSUB 130 110 FOR I=1 TO 200 @ PRINT A(I); @ NEXT I 120 END 130 X=0 140 FOR I=1 TO 200 150 IF A(I)=N THEN X=X+1 160 NEXT I 170 PRINT X;"occurrences of";N 180 RETURN and three successive runs: 50 occurrences of 0 50 occurrences of 1 50 occurrences of 2 50 occurrences of 3 3 3 1 2 2 2 1 2 2 1 3 0 1 1 1 1 0 2 0 2 2 0 2 2 1 2 2 1 1 1 1 0 3 1 2 1 3 2 1 2 0 0 3 2 3 2 3 1 0 2 2 3 2 2 3 1 2 3 2 1 3 0 0 1 2 3 0 0 3 2 3 3 1 2 0 0 1 3 0 1 2 1 1 1 1 3 2 0 2 2 0 2 0 3 1 0 3 3 1 1 1 1 3 0 0 0 3 0 0 3 1 2 3 3 3 3 2 0 2 0 0 2 0 0 3 0 0 3 3 3 3 2 1 3 0 3 1 0 3 0 2 2 1 0 1 0 1 3 2 0 0 1 0 0 1 3 0 1 0 3 1 2 2 3 0 1 2 2 1 0 1 1 3 0 2 2 3 1 2 3 0 3 3 3 3 1 0 2 0 0 2 0 2 1 3 2 1 1 3 3 50 occurrences of 0 50 occurrences of 1 50 occurrences of 2 50 occurrences of 3 3 3 1 2 2 2 1 2 2 1 3 0 1 1 1 1 0 2 0 2 2 0 2 2 1 2 2 1 1 1 1 0 3 1 2 1 3 2 1 2 0 0 3 2 3 2 3 1 0 2 2 3 2 2 3 1 2 3 2 1 3 0 0 1 2 3 0 0 3 2 3 3 1 2 0 0 1 3 0 1 2 1 1 1 1 3 2 0 2 2 0 2 0 3 1 0 3 3 1 1 1 1 3 0 0 0 3 0 0 3 1 2 3 3 3 3 2 0 2 0 0 2 0 0 3 0 0 3 3 3 3 2 1 3 0 3 1 0 3 0 2 2 1 0 1 0 1 3 2 0 0 1 0 0 1 3 0 1 0 3 1 2 2 3 0 1 2 2 1 0 1 1 3 0 2 2 3 1 2 3 0 3 3 3 3 1 0 2 0 0 2 0 2 1 3 2 1 1 3 3 50 occurrences of 0 50 occurrences of 1 50 occurrences of 2 50 occurrences of 3 3 3 1 2 2 2 1 2 2 1 3 0 1 1 1 1 0 2 0 2 2 0 2 2 1 2 2 1 1 1 1 0 3 1 2 1 3 2 1 2 0 0 3 2 3 2 3 1 0 2 2 3 2 2 3 1 2 3 2 1 3 0 0 1 2 3 0 0 3 2 3 3 1 2 0 0 1 3 0 1 2 1 1 1 1 3 2 0 2 2 0 2 0 3 1 0 3 3 1 1 1 1 3 0 0 0 3 0 0 3 1 2 3 3 3 3 2 0 2 0 0 2 0 0 3 0 0 3 3 3 3 2 1 3 0 3 1 0 3 0 2 2 1 0 1 0 1 3 2 0 0 1 0 0 1 3 0 1 0 3 1 2 2 3 0 1 2 2 1 0 1 1 3 0 2 2 3 1 2 3 0 3 3 3 3 1 0 2 0 0 2 0 2 1 3 2 1 1 3 3 J-F |
|||
03-17-2018, 06:13 PM
Post: #3
|
|||
|
|||
RE: 50g: an interesting RAND anomaly
(03-17-2018 05:43 PM)J-F Garnier Wrote: The same happens using the HP-71B ancestor. Thanks for that confirmation! I suspect all of the HP calculators that use the same RAND algorithm will show these results. Admittedly, MOD 2/4/8 is simply using the last 1-3 bits of the seed, but I would have still thought the period would have been bigger. I'm wondering if performing a right-shift on the seed of 1-2 nibbles prior to the MOD would give better results. The range of input values for my application would leave plenty of overhead for that. But I have no idea if that would simply change the parameters with no real benefit. Does anyone know if there's a documented issue with simply using a few of the least significant bits of the RAND seed? Or have I simply stumbled upon a special case that resonates with the RAND seed generator? |
|||
03-17-2018, 06:41 PM
Post: #4
|
|||
|
|||
RE: 50g: an interesting RAND anomaly
I'm no expert in RNG but as far as I understand, the properties of the generated number don't necessarily apply to individual bits.
By properties I mean bias, distribution, and recurrence period. Since an algorithm manipulates each bit differently, you are not supposed to isolate individual bits. It seems you found that the last 3 bits (hard to define bits on a decimal number, but bear with me) have no bias and good uniform distribution (you got a nice 50/50/50/50), but have a short period, and you even happened to find out what that period is. Using a different nibble will give you different properties, not necessarily any better. |
|||
03-17-2018, 07:38 PM
Post: #5
|
|||
|
|||
RE: 50g: an interesting RAND anomaly
(03-17-2018 06:41 PM)Claudio L. Wrote: ...It seems you found that the last 3 bits (hard to define bits on a decimal number, but bear with me) have no bias and good uniform distribution (you got a nice 50/50/50/50), but have a short period, and you even happened to find out what that period is. I actually convert the native BCD seed to binary form before taking the MOD in the Saturn version, so it does actually end up being the last few bits in this case. And I'm obviously no expert in PRNGs, either, but I believe your assessment is spot on. |
|||
03-17-2018, 09:17 PM
Post: #6
|
|||
|
|||
RE: 50g: an interesting RAND anomaly
It is well known that the least significant bits of the output of a PRNG have a shorter period than the more significant bits. There was a lo-o-o-o-ng discussion of this issue on comp.sys.hp48 but I can't find the link at the moment.
Is there a way to sample the most significant bits or would that not fit your application? |
|||
03-17-2018, 10:23 PM
Post: #7
|
|||
|
|||
RE: 50g: an interesting RAND anomaly
(03-17-2018 09:17 PM)John Keith Wrote: It is well known that the least significant bits of the output of a PRNG have a shorter period than the more significant bits. There was a lo-o-o-o-ng discussion of this issue on comp.sys.hp48 but I can't find the link at the moment. It wouldn't be hard to do in this case, but knowing that zeros are more likely to creep into that side of the seed, I'd prefer to simply shift the seed to the right 2-3 nibbles. I tested this earlier with some success, but while that was successful at improving the results of this specific scenario, I have to wonder if it's simply trading the current problem for a different one elsewhere. Any insight is appreciated! |
|||
03-17-2018, 10:43 PM
Post: #8
|
|||
|
|||
RE: 50g: an interesting RAND anomaly
Intersting find! I wonder how frequent is that one ends up in problems that return such quirks, especially on a calculator.
I guess I got one discussion: https://groups.google.com/d/topic/comp.s...discussion a second https://groups.google.com/d/topic/comp.s...discussion a third, less interesting https://groups.google.com/forum/#!search...ehG6DozT8J fourth ultra long, a bit chaotic. https://groups.google.com/d/topic/comp.s...discussion and then J Meyers alone https://groups.google.com/forum/#!search..._1PsnteVMJ another long one https://groups.google.com/forum/#!search...QY47HsbnkJ and then the algorithm that should be the same since the 15C https://groups.google.com/forum/#!search...Y8QQe8XDYJ Also interesting to see comp.sys.hp48 active. Thanks to the new forum here, I forget that of the many existing communities in the past, at least 3 are still active (and hopefully with all the posts). The hp official forum, that I frequented long ago and from time to time someone posted interesting questions. Comp.sys.hp48, that I should check more and this forum. I wonder how many there were -at least in English - when there were more calculator users. Wikis are great, Contribute :) |
|||
03-18-2018, 02:03 AM
Post: #9
|
|||
|
|||
RE: 50g: an interesting RAND anomaly
For Lehmer type generator in base 2 (the most usual and useful case) there are several interesting properties that explain some of this. For a full-length generator (like X(j+1)=41309*X(j)+1 (or any other odd number) modulo 65535, one has that the last bit (1's) alternating between 1 and 0; the second (2's ) bit having a cycle of 4, the third (8's) a cycle of 8, etc. Thus the entire system has a cycle of 2^16. This may be bad (if just using low bits) but the analysis shows one how to do well. One can take a large (256 bit or longer modulus) and use the leading bits and throw away the lower bits. For example one could imitate a 16 bit generator using a 256 bit generator and discarding the last 240 bits (a bit wasteful or maybe 240 bits wasteful) and the resulting 32 bits will have a cycle of 2^256 with the lowest bit having a cycle of 2^241 instead of 2^0.
I'm guessing that powers of other primes behave similarly. It is known (see Knuth vol2) which is the product of the cycles of the various P^k bit generators that make up the sequence. |
|||
03-18-2018, 03:44 AM
Post: #10
|
|||
|
|||
RE: 50g: an interesting RAND anomaly
(03-17-2018 05:02 PM)DavidM Wrote: Consider the following RPL program, which produces a list of 200 pseudo-random integers in the range 0..3: The RNG in question is documented here: https://groups.google.com/forum/m/#!msg/...tzMtZhlGoJ It multiplies the seed by a large prime and takes the redult modulo 1e15, and then cuts off the least significant three digits from that, divides it by 1e15, and returns that. Note there's no additive constant. In multiplying the seed by the prime, the least significant n digits of the seed are completely determined by the least significant n digits of the previous seed, and the least significant n digits of the prime factor. Looking at the least significant 2 bits of the seed-without-its-least-significant-three-digits thus gives a maximum cycle of 4000, and in practice it'll be less because not all of those 4000 values are covered by any combination of starting seed and prime. TL;DR: if you want a random integer between 0 and 3 inclusive, do RAND 4 * IP for maximum cycle length. Stay away from the least significant digits! |
|||
03-18-2018, 04:08 PM
Post: #11
|
|||
|
|||
RE: 50g: an interesting RAND anomaly
(03-18-2018 03:44 AM)Thomas Okken Wrote: ... if you want a random integer between 0 and 3 inclusive, do RAND 4 * IP for maximum cycle length. Stay away from the least significant digits! The UserRPL sample was just intended to show the issue to a larger audience as opposed to being the final application. The application in question is actually in the form of a Saturn code object, whose purpose is to randomize a list. Performance is a key concern, so I'd like to stick with integer operations for this if possible. One alternative I'm considering: - Execute the internal RAND function - Convert the resulting seed from BCD to HEX - Split the 12-digit HEX seed into two halves and XOR them - MOD the result by the needed value (which will always be well within the limits of a 6-digit hex value) There would still be a bias toward the least significant digits, but in this case they would have also been XOR'd with the upper half of the seed. Does this present any obvious concerns? Initial testing from a speed standpoint is good, but that of course says nothing about efficacy. Note that the standard here is not DieHard(er) or NIST certification. I'm just looking for something that would be comparable in quality to "<num> RAND * IP 1 +". I've yet to find a reasonable way to test the results, though. |
|||
03-18-2018, 10:52 PM
Post: #12
|
|||
|
|||
RE: 50g: an interesting RAND anomaly
I'd use a Tausworthe generator, long period, good properties, small state and pretty quick.
Pauli |
|||
03-19-2018, 02:23 PM
Post: #13
|
|||
|
|||
RE: 50g: an interesting RAND anomaly
I'm rather fond of Xorshift+ because it is robust and well-tested, and is fast and simple to implement on processors that lack hardware multiply. I'm not sure that David wants to implement a new PRNG, though, just to randomize a list as efficiently as possible with the built-in one.
|
|||
03-19-2018, 02:57 PM
Post: #14
|
|||
|
|||
RE: 50g: an interesting RAND anomaly
(03-19-2018 02:23 PM)John Keith Wrote: I'm rather fond of Xorshift+ because it is robust and well-tested, and is fast and simple to implement on processors that lack hardware multiply. I'm not sure that David wants to implement a new PRNG, though, just to randomize a list as efficiently as possible with the built-in one. John's right. I'm perfectly happy using the built-in RAND seed, which also has the side-benefit of allowing the user to influence the outcome with a call to RDZ. The current problem with doing that is determining the best way to convert that seed to a rather smaller integer in the range 1..<list size>. This would also be an issue with any other method that produces a large seed, which may have been one of the reasons Paul recommended a Tausworthe generator. As I've mentioned, doing this with integer operations is preferred for performance reasons. For what it's worth, my earlier suggestion to "fold" the hex version of the RAND seed with an XOR of the two halves was inspired by looking at the XORSHIFT+ code. Much simpler than the full implementation, of course, but given that the seed had already gone through the RAND LCG, I was hoping a single XOR might be enough. While there is some anecdotal evidence that it may be comparable to the RAND standard (RAND * CEIL), I'm at a loss for a reasonable way to test it. |
|||
03-19-2018, 03:04 PM
Post: #15
|
|||
|
|||
RE: 50g: an interesting RAND anomaly
Can you release the attempt using XOR, as a side program, so I can try to test it a bit?
My 50g can run for a while during the week. Wikis are great, Contribute :) |
|||
03-19-2018, 03:26 PM
Post: #16
|
|||
|
|||
RE: 50g: an interesting RAND anomaly | |||
03-19-2018, 06:31 PM
Post: #17
|
|||
|
|||
RE: 50g: an interesting RAND anomaly
One must be careful with randomly programmed random number generators. When I was in grad school, one of my professor came running down the hall, flagged some of us down and showed his newest result. He and another prof ha proved that the average length of all mappings from the integers 0 to N-1 mod N was Sqrt(N).
A practical application came about 30 years later. A group designing some type of password stuff came to me with their modification of DES and asked me to determine how many different keys would result if the put in 10 digit passwords. I also got to do something I had wanted to do; that was, write a Fortran code that simulated DES and kept track of where each bit went. I could figure out which bits depended on which plaintext and which key bits. I told the group that if they just fiddled around with 10 digit passwords and put them through something that wasn't designed to handle 10 digit objects, they would get about 100,000 different outputs. Needless to say they wanted a simulation (which I had) so I ran a few million test cases. My simulation code was nearly as fast as many streamlined DES implementations. The idea was to feed the plaintext back into system as cypher text and look for a repeat. The average cycle length was 10^5. I do not know if they changed anything based on this result. There was the theory and simulation (actually, a bit-for-bit emulation) of their system. These two answers agreed. |
|||
03-19-2018, 07:08 PM
Post: #18
|
|||
|
|||
RE: 50g: an interesting RAND anomaly
ttw thanks for sharing!
I use your story as input to note that a solution is good or bad according to the need or the context. One cannot demand cryptographic secure randomness for something that is not designed to encrypt anything. Even less in a relatively simple calculator. Or better one can demand it, but the request has not much practical relevance. I perceive a bit too often (not only here, in general in internet) the approach "either I have the state of the art spaceship - that should be able to move between galaxies with no effort - to move from my home to the grocery store, a couple of blocks away; or I do not even bother to go there". The tool has to fit the context, otherwise it is just an activity to push higher the demands for the sake of pushing higher the demands. Something of no use. As a note, the middle square method was used, according to "n1", to simulate some physical events for the Manhattan project. We all know that due to that the US is now a japanese region. So much resources allotted to that project that then failed miserably due a subpar random number generator sequence. n1: https://books.google.de/books?id=3wlxf5C...wQ6AEIGDAB Wikis are great, Contribute :) |
|||
03-19-2018, 07:35 PM
Post: #19
|
|||
|
|||
RE: 50g: an interesting RAND anomaly
RNG failures can be obvious or subtle. I'll describe two cases I've seen in the literature. First, one simulation was with a 45-bit Lehmer generator. Good enough for small stuff. In this case, simulations were being done on a 512x512x512 cube. Using 1 RNG per site, we see that one has 27 repeated bits per cycle so that there are 2^18 rather than 2^45 or larger number of states being sampled. I would claim that the investigator should have known about this ahead of time; he did take precautions long before publication though.
A second and more subtle case came with a 31-bit generator on a 10001x10001 simulation. Note that this system has 2^31*10001^2 different states in the system. Then the investigator took line averages giving 10001 different results. (These were spatial averages.) In the final part of the problem, he took differences of these averages (A1-A2, A2-A3, A3-A4; A1-A3, A2-A4....). Note that one can average these differences to improve accuracy even though one isn't permitted to use these averages to improve variance computations. What happens is that he saw a "signal" that increased with spacings of powers of 2. At 8192 (the biggest), the cross stream (x1,x2,x3...) generators look good but the down stream becomes (x1-x8193, x2-x8194...). The random numbers share their 13 lower bits. Whatever the guy did detected this. Rather subtle trap. |
|||
03-20-2018, 06:03 AM
Post: #20
|
|||
|
|||
RE: 50g: an interesting RAND anomaly
Hello,
Here is the random generator source code. This should give you ALL the info you may want (including the reason why you see what you are seeing). HP_Real fRand(uint64_t *Seed) { if (*Seed==0LL) *Seed= BaseSeed; HP_Real r; r.M= 0LL; uint64_t c= 0x2851130928467LL; for (int i= 13; --i>=0; i--) { r.M= dcb_muladd(r.M, *Seed, ((uint32_t)c)&15); // r.M= r.M+seed*low nibble of c *Seed<<= 4; c>>= 4; } *Seed= r.M&0x0fffffffffffffffLL; // update seed for next time r.e= -1; r.s= SignPos; // result's exponent is -1 and sign is pos while((r.M&0xf00000000000000LL)==0) { r.M <<= 4; r.e--; } // Normalize real r.M &= 0x0ffffffffffff000LL; // round to 12 significant digits return r; } Cheers, Cyrille Although I work for the HP calculator group, the views and opinions I post here are my own. I do not speak for HP. |
|||
« Next Oldest | Next Newest »
|
User(s) browsing this thread: 4 Guest(s)