Post Reply 
Mersenne Twister implentation
02-04-2017, 04:19 PM
Post: #1
Mersenne Twister implentation
Based on the discussion in the random number thread, here is an implementation of the Mersenne Twister.

To use, call Twister which will set up the state in the list L9(). I used a list instead of a matrix since a list can store integers. After the initial Setup you can call:

MTSEED(number) to reset the seed.
RANDMT() to get a uniform number from 0 to 1
RANDMTI() to get the raw hex numbers
RANDMTN() to get normally distributed numbers with a mean of 0 and SD of 1.

Observations:
The output matches the reference implementations exactly.
Bit twiddling is not the Prime's strong suit. (Dogs walking on their hind legs come to mind)
It is much slower than the built in functions. My random pi program is slowed down by a factor of about 20, 21.5 sec for 10000 iterations vs 1.2 sec for 10000 iterations using the built-in rand.
Code:

//
// Twister
//An Implementation of the Mersenne
//Twister Random Number Algorithm by 
//Makoto Matsumoto and Takuji Nishimura
//Normalizer from Abramowitz and Stegun
//
// Basic flow cribbed from Stephan Brumme
//
// Keith Barkley, Feb 2017
//
//DECLARES
MTSEED();
TWIST();
RANDMTI();
RANDMT();
RANDMTN();

EXPORT Twister()
BEGIN
//THE STATE SIZE IS 624
// 625 FOR INDEX
// 626 FOR NORMAL "PHASE"
//INIT
LOCAL i;
FOR i FROM 1 TO 624 DO
L9(i):=#0d;
END;
L9(625):=625;
L9(626):=0;

MTSEED(1);
END;

EXPORT MTSEED(S)
BEGIN
LOCAL i;
L9(1):=R→B(S);
FOR i FROM 2 TO 624 DO
L9(i):=(#1812433253d *BITXOR(L9(i-1),BITSR(L9(i-1),30)))+R→B(i-1);
END;
TWIST();
L9(625):=1;
END;

TWIST()
BEGIN
LOCAL b,b1,b2,i;
//TWIST
FOR i FROM 1 TO 227 DO
b:=BITOR(BITAND(L9(i),#80000000h),BITAND(L9(i+1),#7FFFFFFFh));
b1:=BITSR(R→B(b),1);
b2:=BITAND(R→B(b),#1h) * #9908B0DFh;
L9(i):=BITXOR(L9(i+397),R→B(b1),R→B(b2));
END;

FOR i FROM 228 TO 623 DO
b:=BITOR(BITAND(L9(i),#80000000h),BITAND(L9(i+1),#7FFFFFFFh));
b1:=BITSR(R→B(b),1);
b2:=BITAND(R→B(b),#1h) * #9908B0DFh;
L9(i):=BITXOR(L9(i-227),R→B(b1),R→B(b2));
END;

b:=BITOR(BITAND(L9(624),#80000000h),BITAND(L9(1),#7FFFFFFFh));
b1:=BITSR(R→B(b),1);
b2:=BITAND(R→B(b),#1h) * #9908B0DFh;
L9(624):=BITXOR(L9(397),R→B(b1),R→B(b2));

END;

EXPORT RANDMTI()
BEGIN
LOCAL x;

IF (L9(625)=625) THEN 
TWIST();
L9(625):=1;
END;
x:=L9(L9(625));

x:= BITXOR(R→B(x),BITSR(R→B(x),11));
x:= BITXOR(R→B(x),BITAND(BITSL(R→B(x),7),#9D2C5680h));
x:= BITXOR(R→B(x),BITAND(BITSL(R→B(x),15),#EFC60000h));
x:= BITXOR(R→B(x),BITSR(R→B(x),18));

L9(625):=L9(625)+1;
RETURN x;
END;

EXPORT RANDMT()
BEGIN
RETURN B→R(RANDMTI())/B→R(#FFFFFFFFh);
END;
EXPORT RANDMTN()
BEGIN
LOCAL n,u1,u2;
IF (L9(626)≠0) THEN
n:=L9(626);
L9(626):=0;
RETURN n;
END;

u1:=RANDMT();
u2:=RANDMT();
L9(626):=(−2*LN(u1))^0.5*COS(2*π*u2);
RETURN ((−2*LN(u1))^0.5*SIN(2*π*u2));
END;
Find all posts by this user
Quote this message in a reply
02-05-2017, 05:37 PM
Post: #2
RE: Mersenne Twister implentation
(02-04-2017 04:19 PM)KeithB Wrote:  
Code:
{snip}
LOCAL i;
FOR i FROM 1 TO 624 DO
L9(i):=#0d;
END;
{snip}

Just wondering out loud here...

Is the above quicker than:
Code:
L9:=MAKELIST(#0d,A,1,624);

Somehow I'm guessing that the loop involved in MAKELIST, running at system level, is going to be faster than a FOR loop in user code. Also, no need for the local variable but it will clobber the 'A' home variable and anything in L9 beyond the 624th element, although the latter is a non-problem if this is an initialization function anyway.

Just curious...

Godwin Stewart (happy HP Prime owner since Wednesday)
Find all posts by this user
Quote this message in a reply
02-05-2017, 07:00 PM
Post: #3
RE: Mersenne Twister implentation
(02-05-2017 05:37 PM)grsbanks Wrote:  
(02-04-2017 04:19 PM)KeithB Wrote:  
Code:
{snip}
LOCAL i;
FOR i FROM 1 TO 624 DO
L9(i):=#0d;
END;
{snip}

Just wondering out loud here...

Is the above quicker than:
Code:
L9:=MAKELIST(#0d,A,1,624);

Somehow I'm guessing that the loop involved in MAKELIST, running at system level, is going to be faster than a FOR loop in user code. Also, no need for the local variable but it will clobber the 'A' home variable and anything in L9 beyond the 624th element, although the latter is a non-problem if this is an initialization function anyway.

Just curious...

Godwin Stewart (happy HP Prime owner since Wednesday)

Write 2 similar programs, each with the 2 optional styles, and time them. You may have to run this inside an outer loop for 10 (or 100, etc) times to really see the difference. Let us all know.

--Bob Prosperi
Find all posts by this user
Quote this message in a reply
02-05-2017, 09:53 PM
Post: #4
RE: Mersenne Twister implentation
It's 10pm here and I have to be out of bed at 5am so I'll post the code I used tomorrow.

I've observed MAKELIST to be consistently faster than the FOR loop by a factor of 12-14.
Find all posts by this user
Quote this message in a reply
02-06-2017, 07:58 AM
Post: #5
RE: Mersenne Twister implentation
This is what I used to benchmark the two methods. No comments in the code because it is fairly self-explanatory:
Code:
MethA();
MethB();
TimeMethA(n);
TimeMethB(n);

EXPORT LISTTIME(n)
BEGIN
LOCAL a, b;
 a:=TimeMethA(n);
 b:=TimeMethB(n);
 STRING(a)+" / "+STRING(b)+" = "+STRING(a/b);
END;

EXPORT TimeMethA(n)
BEGIN
 LOCAL b4:=TICKS, after, j;
 FOR j FROM 1 TO n DO
  MethA(); 
 END;
 after:=TICKS;
 RETURN after-b4; 
END;

EXPORT TimeMethB(n)
BEGIN
 LOCAL b4:=TICKS, after, j;
 FOR j FROM 1 TO n DO
  MethB();
 END;
 after:=TICKS;
 RETURN after-b4; 
END;

MethA()
BEGIN
 LOCAL k;
 L9:={};
 FOR k FROM 1 TO 624 DO
  L9(k):=#0d;
 END;
END;

MethB()
BEGIN
 L9:=MAKELIST(#0d,A,1,624);
END;
On a real device with display set to FIX 2:

LISTTIME(1) yields: "91.00 / 4.00 = 22.75"
LISTTIME(10) yeilds: "791.00 / 55.0 = 14.38
LISTTIME(100) yields: "7709.00 / 561.00 = 13.74"
LISTTIME(1000) yields: "77041.00 / 5617.00 = 13.72"

On an emulated Prime, also with display set to FIX 2:

LISTTIME(1) gives a division by zero because method B completes in less than a millisecond.
LISTTIME(10): "16.00 / 1.00 = 16.00"
LISTTIME(100): "159.00 / 10.00 = 15.90"
LISTTIME(1000): "1580.00 / 98.00 = 16.12"
LISTTIME(10000): "15528.00 / 975.00 = 15.93"
Find all posts by this user
Quote this message in a reply
02-06-2017, 04:42 PM
Post: #6
RE: Mersenne Twister implentation
Thanks for that, but since it is only done once as an initialization, it really only affects the user experience just a little bit.

It is cleaner code though. When I wrote that I was much more concerned about the bit-twiddling.
Find all posts by this user
Quote this message in a reply
02-06-2017, 06:24 PM
Post: #7
RE: Mersenne Twister implentation
(02-06-2017 04:42 PM)KeithB Wrote:  Thanks for that, but since it is only done once as an initialization, it really only affects the user experience just a little bit.

It is cleaner code though. When I wrote that I was much more concerned about the bit-twiddling.

No problem. Of course, it doesn't really matter in the grand scheme of things. It's just that I cut my teeth in programming computers nearly 40 years ago, back in the days of 8-bit machines with RAM measured in KB, not in GB, so I can't help myself sometimes when I see something that it must be possible to optimise somewhere Smile

Just call me an old fart. That explains away many things!
Find all posts by this user
Quote this message in a reply
02-06-2017, 07:08 PM
Post: #8
RE: Mersenne Twister implentation
If you use an AVR based Arduino Uno you can get back to that.
They often have only 2 KB of RAM - though you have a lot more flash for program memory!

I also take to heart Kernigham and Plauger's advice from "The Elements of Programming Style" from 1978:
"Everyone one knows that debugging is twice as hard as writing a program in the first place. So, if you are as clever as you can be when you write it, how can you ever hope to debug it?"

Also, in the new movie "Hidden Figures" Dorothy Vaughn steals a FORTRAN programming book from the library (It was in the white section of the library) to teach herself programming. They only showed it for a second, but I forgot to see if it was written by Daniel McCracken, who seems to have written most of the FORTRAN programming texts. 8^)
Find all posts by this user
Quote this message in a reply
02-06-2017, 10:18 PM
Post: #9
RE: Mersenne Twister implentation
There is also the TEVAL function. Call you function using it and get back the time of execution (at least if all you care about is total execution time).

TEVAL(your_func(your_inputs)) -> 10.453s or similar

TW

Although I work for HP, the views and opinions I post here are my own.
Find all posts by this user
Quote this message in a reply
02-06-2017, 11:23 PM
Post: #10
RE: Mersenne Twister implentation
(02-06-2017 07:58 AM)grsbanks Wrote:  On a real device with display set to FIX 2:

LISTTIME(1) yields: "91.00 / 4.00 = 22.75"
LISTTIME(10) yeilds: "791.00 / 55.0 = 14.38
LISTTIME(100) yields: "7709.00 / 561.00 = 13.74"
LISTTIME(1000) yields: "77041.00 / 5617.00 = 13.72"

On an emulated Prime, also with display set to FIX 2:

LISTTIME(1) gives a division by zero because method B completes in less than a millisecond.
LISTTIME(10): "16.00 / 1.00 = 16.00"
LISTTIME(100): "159.00 / 10.00 = 15.90"
LISTTIME(1000): "1580.00 / 98.00 = 16.12"
LISTTIME(10000): "15528.00 / 975.00 = 15.93"

Thanks for sharing the results.

While one should naturally expect that the actual times on the physical Prime and virtual Prime are quite different, one would not expect the relative times (the ratios on the device compared to the ratios on the emulator) to be this different (about 15%).

Tim/Cyril - why would the relative speed differences for these 2 types of commands vary this much between the device and virtual Prime? Internally, if they are presumably running the same code, the relative speeds should also be the same, right?

Higher ratios on the emulator implies the FOR...LOOP generated code is better optimized on the PC?

Just curious...

--Bob Prosperi
Find all posts by this user
Quote this message in a reply
02-07-2017, 08:24 AM (This post was last modified: 02-07-2017 08:48 AM by grsbanks.)
Post: #11
RE: Mersenne Twister implentation
(02-06-2017 10:18 PM)Tim Wessman Wrote:  There is also the TEVAL function.

I've only been using the Prime for less than a week and hadn't found out about TEVAL by the time I wrote these tests Smile If I were to rewrite the benchmark now, that's what I'd use!

(02-06-2017 11:23 PM)rprosperi Wrote:  Tim/Cyril - why would the relative speed differences for these 2 types of commands vary this much between the device and virtual Prime? Internally, if they are presumably running the same code, the relative speeds should also be the same, right?

Higher ratios on the emulator implies the FOR...LOOP generated code is better optimized on the PC?

To be honest, I don 't know exactly how the Prime emulator works. If it's similar to Emu48 et al. then it emulates the hardware and uses the original machine's ROM. The precise manner in which the emulator handles the ARM instruction set will therefore have a direct influence on execution time and, more to the point, relative execution time between various algorithms.

It's one possible explanation anyway...

Edit: surely a higher FOR-to-MAKELIST execution time ratio on the PC means that the FOR loop is better implemented on the Prime (and/or MAKELIST not so well implemented) if it's closer to the MAKELIST execution time?
Find all posts by this user
Quote this message in a reply
Post Reply 




User(s) browsing this thread: 3 Guest(s)