Post Reply 
FFT Multiplication (HP-48G/GX/G+)
02-15-2014, 09:41 PM
Post: #21
RE: FFT Multiplication (HP-48G/GX/G+)
(02-15-2014 08:26 PM)Gerson W. Barbosa Wrote:  
(02-14-2014 10:37 PM)Raymond Del Tondo Wrote:  
Code:

* Interface  :    [%] [%]'  -->  [%]''
On my HP-48GX (and on the emulator) things go like this: [%] [%]' --> ENTER --> [%]". Is this behavior normal?
No, the stack diagram usually lists the inputs, the way and the output.
The program expects two arrays, and returns another array.


(02-15-2014 08:26 PM)Gerson W. Barbosa Wrote:  Here is the running times table, times in seconds:

Code:

# elements     HP-71B     hp 49g      HP-48GX     hp 50g   48GX(SysRPL)
     16         24.11      10.42         9.96       5.57        4.4  (*)
     32         53.41      22.58        21.91      11.90        9.2
     64        116.74      47.86        47.64      25.16       20.6
    128        269.89     110.14       106.45      47.54       43.6
    256          --       255.87       254.00     104.42       95.4

(*) SysRPL times have been measured with a chronometer because I wasn't able to do it programmatically.

The HP-48GX times have decreased because now I have 54 K RAM free (previously I had only 11 K).
You could use a simple TICKS wrapper, very similar to the tEVAL I use.

Thanks for the correction about the element count. For some reason I thought the example arrays consisted of 128 elements, but it was 64 only. My fault.

So the timings in my last posts actually were for two arrays of 64 elements each.
Still not bad for the brave old HP 48, and still slightly faster than the 50g;-)

Cheers

-- Ray
Find all posts by this user
Quote this message in a reply
02-15-2014, 10:19 PM (This post was last modified: 02-15-2014 10:20 PM by Gerson W. Barbosa.)
Post: #22
RE: FFT Multiplication (HP-48G/GX/G+)
(02-15-2014 09:41 PM)Raymond Del Tondo Wrote:  
(02-15-2014 08:26 PM)Gerson W. Barbosa Wrote:  On my HP-48GX (and on the emulator) things go like this: [%] [%]' --> ENTER --> [%]". Is this behavior normal?
No, the stack diagram usually lists the inputs, the way and the output.
The program expects two arrays, and returns another array.

This is what I get, when I enter two arrays, for instance:

[1234]
[5678]
FFTMLTS2 -->

3: 7006652
2: 0
1: <2d>

Upon pressing ENTER, I finally get

[ 700 6652 ]

Even calling FFTMLTS2 inside a program, I always get a screen like that. That's why I can't time it by using TICKS or by any other means. Something wrong? Thanks!

Gerson.
Find all posts by this user
Quote this message in a reply
02-15-2014, 11:23 PM
Post: #23
RE: FFT Multiplication (HP-48G/GX/G+)
Hi Gerson,

(02-15-2014 10:19 PM)Gerson W. Barbosa Wrote:  Something wrong?
Thanks for the hint. The binary in the attachment still has some debug info (like the stack view included). I updated the attachment by including a binary without debug info. Seems it was too late in the morning:-)

Please re-download the zip file.

Cheers

-- Ray
Find all posts by this user
Quote this message in a reply
02-15-2014, 11:53 PM (This post was last modified: 02-16-2014 12:31 AM by Gerson W. Barbosa.)
Post: #24
RE: FFT Multiplication (HP-48G/GX/G+)
(02-15-2014 11:23 PM)Raymond Del Tondo Wrote:  Hi Gerson,

(02-15-2014 10:19 PM)Gerson W. Barbosa Wrote:  Something wrong?
Thanks for the hint. The binary in the attachment still has some debug info (like the stack view included). I updated the attachment by including a binary without debug info. Seems it was too late in the morning:-)

Please re-download the zip file.

Cheers

It's ok now, thanks! Now I get 22.25 seconds for two 64-element arrays, close to what you've reported. Just one more question: does your SysRPL do a List->Array conversion at the end? (I don't understand the code, but it appears to do that from the behavior of your previous compiled program. My latest HP-48GX version, as of yesterday, uses lists only once, to do the Hadamard multiplication. This has improved performance significantly. Would it do the same in SysRPL? (Assuming of course your SysRPL version corresponds to the older User-RPL version).

Cheers,

Gerson.

P.S.: Here is the updated table:

Code:

# elements     HP-71B     hp 49g      HP-48GX     hp 50g   48GX(SysRPL)
     16         24.11      10.42         9.96       5.57       5.44
     32         53.41      22.58        21.91      11.90       9.89
     64        116.74      47.86        47.64      25.16      22.22
    128        269.89     110.14       106.45      47.54      49.18
    256          --       255.87       254.00     104.42     110.11

The difference is most likely due to the final conversion to array, which wasn't being taken into account in the manual timing.
Find all posts by this user
Quote this message in a reply
02-16-2014, 01:51 AM
Post: #25
RE: FFT Multiplication (HP-48G/GX/G+)
(02-15-2014 11:53 PM)Gerson W. Barbosa Wrote:  It's ok now, thanks! Now I get 22.25 seconds for two 64-element arrays, close to what you've reported. Just one more question: does your SysRPL do a List->Array conversion at the end? (I don't understand the code, but it appears to do that from the behavior of your previous compiled program. My latest HP-48GX version, as of yesterday, uses lists only once, to do the Hadamard multiplication. This has improved performance significantly. Would it do the same in SysRPL? (Assuming of course your SysRPL version corresponds to the older User-RPL version).

The difference is most likely due to the final conversion to vector, which wasn't being taken into account in the manual timing.
The SySRPL version doesn't use lists for the array/vector at all.
Starting after the 2nd fft call the result vector is dissolved during the multiplication loop.
After that loop the input vector for the ifft call is generated via XEQ>VEC.
After the ifft call the result vector is dissolved again during the rounding/cutting loop.

Following that loop, the stack contains a meta object of sizeA+sizeB entries (SysRPL version).

The final loop always works on the first two stack levels (elements n and n+1) of the meta object, and finally rotates out the level 1 ob. Since the meta ob will be worked on from the end to the front, one final rotation is needed after the loop to retain the correct order of elements.

The conversion to a vector at the end of the program takes about 81ms for a 256 elements meta ob, so that's rather negligible.

Cheers

-- Ray
Find all posts by this user
Quote this message in a reply
02-16-2014, 02:25 AM
Post: #26
RE: FFT Multiplication (HP-48G/GX/G+)
(02-16-2014 01:51 AM)Raymond Del Tondo Wrote:  
(02-15-2014 11:53 PM)Gerson W. Barbosa Wrote:  It's ok now, thanks! Now I get 22.25 seconds for two 64-element arrays, close to what you've reported. Just one more question: does your SysRPL do a List->Array conversion at the end? (I don't understand the code, but it appears to do that from the behavior of your previous compiled program. My latest HP-48GX version, as of yesterday, uses lists only once, to do the Hadamard multiplication. This has improved performance significantly. Would it do the same in SysRPL? (Assuming of course your SysRPL version corresponds to the older User-RPL version).

The difference is most likely due to the final conversion to vector, which wasn't being taken into account in the manual timing.
The SySRPL version doesn't use lists for the array/vector at all.
Starting after the 2nd fft call the result vector is dissolved during the multiplication loop.
After that loop the input vector for the ifft call is generated via XEQ>VEC.
After the ifft call the result vector is dissolved again during the rounding/cutting loop.

Following that loop, the stack contains a meta object of sizeA+sizeB entries (SysRPL version).

The final loop always works on the first two stack levels (elements n and n+1) of the meta object, and finally rotates out the level 1 ob. Since the meta ob will be worked on from the end to the front, one final rotation is needed after the loop to retain the correct order of elements.

Thanks for the detailed explanations. Although

(02-16-2014 01:51 AM)Raymond Del Tondo Wrote:  The conversion to a vector at the end of the program takes about 81ms for a 256 elements meta ob, so that's rather negligible.

Yes, I remember now the conversion was immediate after pressing ENTER in your debug version. I can't explain the 15 seconds difference compared to my manual timing, though. I'm sure I didn't make any mistake while taking time measurements, also my Casio wristwatch is accurate enough: only 133 seconds in advance since I bought it in Jul/2008 :-) (Time to adjust it for the first time now that our DST has just ended)

Thanks again for your fast SysRPL version (Finally the sixfold increase in speed when compared to the HP-71B !). Only thing left is checking whether 5-digit groups will yield accurate results -- will do that later.

Best regards,

Gerson.
Find all posts by this user
Quote this message in a reply
02-16-2014, 05:21 AM (This post was last modified: 02-19-2014 12:39 PM by Gerson W. Barbosa.)
Post: #27
RE: FFT Multiplication (HP-48G/GX/G+)
Ever been curious about all digits of 253! ? Out of 500, the HP-48GX gives the first 12 only: 517346099264. Just run the program below with single argument 253 and wait for about one hour and 45 minutes. This is by no means an efficient way to do this, but might be a way to check the accuracy of the algorithm after 253 multiplications, even though one on the operands is small. (Better use Emu48 and get the result under 60 seconds or less).

Code:

%%HP: T(3)A(D)F(,);
\<< 1 -
[ 1 ]
1 ROT ROT 1 ROT
  START OVER 1 +
DUP 1 \->ARRY ROT
FFTMLTS2 TRIM ROT
DROP
  NEXT SWAP DROP
\>>

where FFTMLTS2 is Raymond del Tondo's SysRPL program above and TRIM is the following program, slightly adapted from TRIM, by Wayne H. Scott, part of his POLY package:
Code:

%%HP: T(3)A(D)F(,);                 ;  
\<< OBJ\-> OBJ\-> DROP \->          ; This trims off the leading zeros of the array.
n                                   ; For example,
  \<< n                             ;
    WHILE ROLL DUP                  ; [ 0 0 0 1234 0 5789 ]  -->  [ 1234 0 5789 ]
ABS NOT n 1 - AND                   ;
    REPEAT DROP 'n'
DECR
    END n ROLLD n 1
\->LIST \->ARRY
  \>>
\>>

-->

[ 5173 4609 9264 78 9218 433 899 7295 2746 9542 3561 2720 6639 9607 4846 3616 3134 3029 313 41 2383 1443 7828 1112 1374 4932 5428 7661 7296 3169 484 977 8527 4435 4743 3640 9654 4589 6311 9980 576 3521 219 7345 934 790 1685 4446 6163 7384 4451 7144 4589 2498 2615 9309 2898 1062 2514 4818 9873 9824 3499 6567 2944 9381 9909 5203 1087 3152 8570 9655 6175 4517 6766 2603 4976 5427 6777 1987 6267 959 7099 9373 2257 7683 9082 7849 7279 3284 6880 6763 5727 3110 3332 7966 9572 6049 2114 9638 6749 6804 5622 1513 5307 5201 4396 1440 1249 2800 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ]

As far as I have checked, all digits are correct, considering the hidden leading zeros in some 4-digit elements.

-----

Update: I've run the User-RPL program on the emulator (50g) and formatted the output to integer. The result was exactly the same I got when computing 253! on the hp 50g in exact mode.

Also, an exact match was obtained for 997! (2559 digits). This was computed using arrays with 5-digit elements. Here they are, in case someone wants to check them out :-)

Code:

40359724461645390234292652765290740211090335246139389143030797373519663190106842​37263758833853587102137006631984661977193944113181265519616824478081989249680516​43330792790545975658652366984953410102994729193397927862707860663312211428139657​33919921028883982924596589308458677218884794980135443761645075224506666559889800​94175577967376951675213432494794136314145342021847264214793926157307811731645263​93982880263279118925406206180689438683308644696334133955184235540077242451165903​81101827719832180031595827989994138156615149091737998105454985248322329275243898​11980802708882543991975745364605704734303715958724031694867571661542949412580453​11241382930836862005052391967478429035362983199050663230586866257612402804942403​44233166394434168335073220412356534986944621623211159899567872446218256850113174​63838577067904001075072667390026316129311241122279096729357421049685332780747960​00335855930432060517447195226436187301231195091058916141500005034486568847599649​00494067769318523221837865944485464570390882493401514455003570460531797737862031​18550953567694888922171302000112504911516415310901200837651592219697553144378802​09281708574493693840125338722070514029362985801732618715060934298236579096167095​85950405331060872571119845720022654435044594115773486340742853243112648568667878​84661486819750191740104532976390040068195207044638407735286912244552652299854897​64356909675383800245969276679872407757924211918488179598530382266647547907226165​47980297654789693965688881325682653906791569527887851625739692098351138902956310​11123253723954647397831433613628798725785501475711681360833913542427351428039887​35616917749898060073075542403509536490539404444972668319521415425667918323473675​96656633239099325959195904942407038086186468220698646372928155733874746654662785​92062875719964916067979790641428194695892008126790265612881240871363598309598670​34513441434850212864818601504529520195828528045600869420646442863720485414968365​31269052383502650854577265971210516113769359526291937135884001947338380202834453​11816794177165630135012424772911390424228141663696011522232935969575275309346520​46662174154235850073391729650007182794396630407081318880947107940245036774649857​42937922077663735689021159654000934909225598804790941759477837570572384191816766​30262770090339396547856717150451221853157302493936160447379021701169807360000000​00000000000000000000000000000000000000000000000000000000000000000000000000000000​00000000000000000000000000000000000000000000000000000000000000000000000000000000​0000000000000000000000000000000000000000000000000000000000000000000000000000000


Update:

It appears 5-digit group arrays yield reliable results up to 128 elements only (1280-digit results, 5*256). 4^2^11 has 1234 digits and is correctly computed using the program below. The next doubling (4^2^12) is not.

Code:


11 

%%HP: T(3)A(D)F(.);
\<< [ 4 ] 1 ROT
  START DUP FFTMULT TRIM
  NEXT A2I
\>>

EVAL

'4^2^11'              

EVAL  -

-->   0 




FFTMULT:

%%HP: T(3)A(D)F(.);
\<< DUP2 SIZE OBJ\-> DROP SWAP SIZE OBJ\-> DROP DUP2 MAX UNROT + 4.
ROLLD LN 2. LN / DUP FP NOT NOT SWAP IP + 2. SWAP ^ DUP + DUP ROT SWAP
1. \->LIST RDM UNROT 1. \->LIST RDM FFT OBJ\-> 1. GET \->LIST SWAP FFT
OBJ\-> 1. GET \->LIST
  \<< *
  \>> DOLIST OBJ\-> 1. \->LIST \->ARRY IFFT 1. ROT SUB RE 0. RND DUP
SIZE OBJ\-> DROP 1. - 1.
  FOR i i GETI 100000. MOD PICK3 ROT DUP UNROT GET ROT + PUT i GETI
100000. / IP PICK3 ROT GET 100000. / IP + i SWAP PUTI OVER OVER GET
100000. MOD PUT -1.
  STEP
\>>

TRIM:

%%HP: T(3)A(D)F(.);
\<< OBJ\-> OBJ\-> DROP \-> n
  \<< n
    WHILE ROLL DUP ABS NOT n 1. - AND
    REPEAT DROP 'n' DECR
    END n ROLLD n 1. \->LIST \->ARRY
  \>>
\>>

A2I:

%%HP: T(3)A(D)F(.);
\<< DUP "" UNROT SIZE OBJ\-> SWAP
  FOR i DUP i GET R\->I \->STR FMT ROT SWAP + SWAP
  NEXT DROP OBJ\->
\>>

FMT:

%%HP: T(3)A(D)F(.);
\<< DUP SIZE 5 - NOT NOT
  IF
  THEN 5. OVER SIZE - 1. SWAP
    START "0" SWAP +
    NEXT
  END
\>>
Find all posts by this user
Quote this message in a reply
Post Reply 




User(s) browsing this thread: 17 Guest(s)