Threaded Mode | Linear Mode

PeterP · 04-15-2019, 03:58 PM

I found this article on Quanta Magazine (my favorite guilty pleasure read). For the MCODE gurus here, it seems one might be able to 'update' the OS routine for multiplication when doing large multiplications, ie for Prime Search or for multi-precision algorithms, or Pi-Spigots, etc.

I wonder what time-improvement one could get using an updated "*" in FOCAL programs for large numbers. This could be implemented in the 41CL for example.

It was just generally a fascinating read to learn something as basic as multiplication can still be improved.

Cheers

PeterP

Albert Chan · 04-15-2019, 05:07 PM

(04-15-2019 03:58 PM)PeterP Wrote: I found this article on Quanta Magazine (my favorite guilty pleasure read).

Thanks for the link ! Many other nice articles Smile

Too bad the article never touch any details about the fastest method, except O(n log(n))

If anyone want to see real code for other mentioned method, I have some code examples.
Note: MAPM number does based-100, and based-10000 for FFT calculations.

Grade school method: mapm_mul.c
Karatsuba’s method : mapm_fmul.c
Fast Fourier Transform: mapm_fft.c

Valentin Albillo · 04-15-2019, 07:50 PM

.
Hi, PeterP, long time no see:

(04-15-2019 03:58 PM)PeterP Wrote: For the MCODE gurus here, it seems one might be able to 'update' the OS routine for multiplication when doing large multiplications.

That's the key, large multiplications, because otherwise the overhead of calling a "*" in XROM would make it slower than the main "*" routine. I read many years ago an article (perhaps in CHHU) where someone coded addition "+" as XROM, to demonstrate the considerable overhead, and it executed way slower than using the main "+".

Quote:It was just generally a fascinating read to learn something as basic as multiplication can still be improved.

Yes, it is. For the record, I wrote an article titled Fantastic FOUR (which you can read/download in PDF format here) discussing FFT (Fast Fourier Transform) multiplication and including a 10-line subprogram for the HP-71B implementing it, with examples and timings.

Regards.
V.
.

Albert Chan · 04-15-2019, 10:22 PM

Knowing how multiplication is done internally, we can design better algorithm.
Example, doing factorial by multiply 1 number at a time is bad.

Assuming FFT multiply available, shell method is a fast way to do factorial.

Code:

def fac(n): return 1 if n<2 else m(n%2+1, n, (n//2)-1)

def m(a, b, n):

    'return (a b) * (a+1)(b-1) *  ... (a+n)(b-n), n >= 0'

    if n==0: return a*b

    k = n//2 + 1

    return m(a, b, k-1) * m(a+k, b-k, n-k)

fac(100) = m(1, 100, 49)

m(1,100,49) = m(1,100,24) * m(26,75,24)
m(1,100,24) = 58349563223699444377128670554435213870263955676382762341761024 * 10^12
m(26,75,24) = 1599432974093600067896009239156107676288209506371901842134782334820417536 * 10^12

m(1,100,24) = m(1,100,12) * m(14,87,11)
m(1,100,12) = 275716888847455562868038565888 * 10^6
m(14,87,11) = 211628542116555657261011122126848 * 10^6

m(26,75,24) = m(26,75,12) * m(39,62,11)
m(26,75,12) = 26582152251962634809969206238386323456 * 10^6
m(39,62,11) = 60169430937463291680684196401709056 * 10^6
...

Factor pairs are about the same size, even the recursive ones.

Gerson W. Barbosa · (This post was last modified: 04-16-2019 02:00 AM by Gerson W. Barbosa.)

(04-15-2019 07:50 PM)Valentin Albillo Wrote: For the record, I wrote an article titled Fantastic FOUR (which you can read/download in PDF format here) discussing FFT (Fast Fourier Transform) multiplication and including a 10-line subprogram for the HP-71B implementing it, with examples and timings.

Hello, Valentin,

I would like to point you out to an old thread inspired in your great article, in case you aren’t aware of it yet:

FFT Multiplication (HP-48G/GX/G+)

Best regards,

Gerson.

Leviset · 04-16-2019, 03:09 PM

Excuse me being out of date at 70 but what is the programming language being used on the Quanta page that you link to for the original source code please?

PeterP · 04-16-2019, 07:07 PM

Albert, this is great material! I wish I had been in contact with you when I wrote my MultiPrecision library in MCode a few years (we'll, more like many years) back. Given that MPL was designed for numbers with 100-1000 digits, this could have some meaningful impact. And in a later post you even show Fact as well. Thank you for sharing!

(04-15-2019 05:07 PM)Albert Chan Wrote:
(04-15-2019 03:58 PM)PeterP Wrote: I found this article on Quanta Magazine (my favorite guilty pleasure read).

Thanks for the link ! Many other nice articles

Too bad the article never touch any details about the fastest method, except O(n log(n))

If anyone want to see real code for other mentioned method, I have some code examples.
Note: MAPM number does based-100, and based-10000 for FFT calculations.

Grade school method: mapm_mul.c
Karatsuba’s method : mapm_fmul.c
Fast Fourier Transform: mapm_fft.c

ijabbott · (This post was last modified: 04-16-2019 07:35 PM by ijabbott.)

(04-16-2019 03:09 PM)Leviset Wrote: Excuse me being out of date at 70 but what is the programming language being used on the Quanta page that you link to for the original source code please?

I couldn't see any source code on the Quanta page, but the algorithm on the original paper was written in the language of mathematics!

(EDIT: I linked to the wrong paper. Fixed.)

Leviset · 04-16-2019, 10:29 PM

Language of Mathematics - very droll!
I meant this stuff:-
Do you know what programming language this is?

#include "m_apm.h"
#include "mapm_np2.c" // next powers of 2
/*
* FFT_MUL = 0 -> div-and-conq ONLY
* FFT_MUL = 1 -> fft mul ONLY, fatal if overflow
* FFT_MUL = 2 -> fft mul AND div-and-conq
*/
/*
* MAX_FFT_BYTES *must* be an exact power of 2
*
* **WORST** case input numbers (all 9's) has shown that
* the FFT math will overflow if the MAX_FFT_BYTES >= 1048576
*
* the define below allows the FFT algorithm to multiply two
* 524,288 digit numbers yielding a 1,048,576 digit result.
*/
#if FFT_MUL
#define MAX_FFT_BYTES 524288
void M_fast_mul_fft(UCHAR *, UCHAR *, UCHAR *, int);
#else
#define MAX_FFT_BYTES 8 /* NOT using FFT */
#define M_fast_mul_fft(r,a,b,sz) M_4_byte_mul(r, a, b)
static void M_4_byte_mul(UCHAR *, UCHAR *, UCHAR *);
#endif
#define DATA(r) r->m_apm_data
#if FFT_MUL != 1
void M_fmul_div_conq(UCHAR *, UCHAR *, UCHAR *, int);
static void M_fmul_dc2(UCHAR *, UCHAR *, UCHAR *, UCHAR *, int);
static void M_fmul_add(UCHAR *, UCHAR *, UCHAR *, UCHAR *, int);
static int M_fmul_sub(UCHAR *, UCHAR *, UCHAR *, int);
#endif

Leave the Planet (tlalticpac) in a better shape than when you first found it.

Albert Chan · 04-16-2019, 11:04 PM

Hi, Leviset

This is plain C code.

Although FFT is fast, it started to get inaccurate when numbers of digits go huge.
For product of over 2 million digits, it might return garbage results.

To be on the safe side, I cut that limit by half.
Div-and-Conquer code handled what FFT cannot.

These code are kinda heavy, and some application might not need the code bloat.
That was the reason for different versions, FFT_MUL = 0, 1, 2

Valentin Albillo · 04-17-2019, 11:24 PM

.
Hi, Gerson:

(04-16-2019 01:59 AM)Gerson W. Barbosa Wrote: I would like to point you out to an old thread inspired in your great article, in case you aren’t aware of it yet:

FFT Multiplication (HP-48G/GX/G+)

Thanks for your interest and kind words re my article, and also thank you very much for the link to the old thread. I'm quite sure that I must've read it at the time it was posted but nowadays I didn't remember most of it so it's been a really great read once more and you certainly did an amazing job, as expected of you. Smile

Have a nice Easter and best regards.
V.
.