On Convergence Rates of Root-Seeking Methods
03-18-2018, 03:06 PM (This post was last modified: 03-18-2018 07:34 PM by emece67.)
Post: #51
RE: On Convergence Rates of Root-Seeking Methods
(03-05-2018 03:52 AM)Paul Dale Wrote:  
(03-02-2018 01:00 PM)Mike (Stgt) Wrote:  to compute square roots up to 100'000 digits and more?

[...]
It might be worth looking into Halley's method, which has cubic convergence, but needs the second derivative which is a constant here. However, I doubt the extra computation would be worthwhile. I think higher order Householder methods will fail because all of the higher order derivatives are zero.

You could look for Karatsuba and FFT square roots. I remember them being Newton's method with improvements to the basic arithmetic operations.

I finally computed square roots in Python up to 1 billion bits (~300 million decimal digits), comparing 8 methods. Halley was always the fastest, with speedups as high as 2.5x over the slowest one (Kung-Traub). Against Newton or Ostrowski the speedups are more moderate, around 20 %. The simplicity of Halley's method, and the fact that the 2nd derivative is constant here, made it the winner for this case.
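For reference, here is a minimal sketch of Halley's iteration for f(x) = x² − a (this is not the code used for the runs above; it uses exact rational arithmetic, so it is only practical for modest precisions). The update x_{k+1} = x_k(x_k² + 3a)/(3x_k² + a) has the exact error recurrence e_{k+1} = e_k³/(3x_k² + a), which is the cubic convergence being discussed:

```python
from fractions import Fraction

def halley_sqrt(a, iterations=6):
    """Approximate sqrt(a) by Halley's method on f(x) = x**2 - a.

    Update: x <- x * (x**2 + 3a) / (3*x**2 + a).
    Since (x - s)**3 = x**3 - 3s*x**2 + 3a*x - s*a for s = sqrt(a),
    the error satisfies e_next = e**3 / (3*x**2 + a): cubic convergence.
    """
    x = Fraction(a, 2) if a > 1 else Fraction(1)
    for _ in range(iterations):
        x = x * (x * x + 3 * a) / (3 * x * x + a)
    return x
```

Starting from x₀ = a/2, each step roughly triples the number of correct digits, so a handful of iterations already exceeds double precision.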

But the real accelerator was the core library used to perform the arithmetic. I started with the bare mpmath library (which uses the default Python machinery for arithmetic). The time needed by this library to perform multiplications and divisions seems to grow quadratically with the number of digits, so the algorithm used for such operations may be the schoolbook one.
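One can check the growth empirically with a quick (and admittedly crude) timing sketch like the one below; `mul_time` is just a made-up helper name. Note that plain Python integers already switch to Karatsuba above a small size threshold, so on recent CPython the measured exponent for raw int multiplication is typically below 2:

```python
import time

def mul_time(bits):
    """Time one multiplication of two `bits`-bit integers."""
    a = (1 << bits) - 1
    b = (1 << bits) - 3
    t0 = time.perf_counter()
    a * b
    return time.perf_counter() - t0

# Quadruple the operand size and compare: schoolbook would give ~16x,
# Karatsuba ~ 4**1.58 ~ 9x.
t1 = mul_time(1 << 20)
t2 = mul_time(1 << 22)
```

Repeating each measurement a few times and taking the minimum gives more stable numbers than a single run.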

Then I switched to mpmath+gmpy (which includes the GMP library). This gave speedups of up to 35x at small digit counts and up to 1200x at big digit counts. The time required for mul/div grew slower than quadratically. This way, computing sqrt(17) to ~300 million digits with Halley's method took less than 20 minutes on a laptop.
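As an independent cross-check of results like these (without installing mpmath or gmpy), the standard library's `math.isqrt` (Python ≥ 3.8) can produce the leading decimal digits of a square root exactly by scaling; `sqrt_digits` is just an illustrative helper name, not part of the setup described above:

```python
from math import isqrt

def sqrt_digits(n, digits):
    """floor(sqrt(n) * 10**digits), i.e. sqrt(n) with `digits` digits
    after the leading integer part, as an exact integer."""
    return isqrt(n * 10 ** (2 * digits))

# First digits of sqrt(17): 4.1231056256...
leading = sqrt_digits(17, 10)
```

`isqrt` is itself fast enough for surprisingly large precisions, so it makes a handy sanity check for the first few thousand digits of a long run.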

The gmpy library (in fact, the underlying GMP library) selects different multiplication/division algorithms based on the number of digits: at low digit counts it uses the schoolbook method, then switches to Karatsuba, then to different variants of Toom-Cook, and finally to FFT.
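To illustrate the middle tier of that ladder, here is a toy Karatsuba multiply in pure Python (a sketch only — GMP's real implementation works on machine-word limbs with carefully tuned thresholds). It splits each operand in half and trades one of the four half-size products for a few additions:

```python
def karatsuba(x, y):
    """Toy Karatsuba multiplication for non-negative Python ints."""
    # Below ~64 bits, fall back to the builtin (the "schoolbook" tier).
    if x < 2 ** 64 or y < 2 ** 64:
        return x * y
    n = max(x.bit_length(), y.bit_length()) // 2
    xh, xl = x >> n, x & ((1 << n) - 1)   # x = xh*2**n + xl
    yh, yl = y >> n, y & ((1 << n) - 1)   # y = yh*2**n + yl
    a = karatsuba(xh, yh)                 # high product
    d = karatsuba(xl, yl)                 # low product
    # Middle term from one extra product instead of two:
    e = karatsuba(xh + xl, yh + yl) - a - d
    return (a << (2 * n)) + (e << n) + d
```

Three recursive half-size multiplies instead of four is what drops the cost from O(n²) to O(n^1.585).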

As expected, Pauli's insight was spot on ;-)

Regards.

EDIT: it was 1 billion bits, not digits — my mistake when switching to gmpy. That is ~300 million digits, not a billion. Also fixed the speedup factors.

