Threaded Mode | Linear Mode

J-F Garnier · 03-22-2022, 12:33 PM

I may have identified an accuracy flaw in the power operation in Free42/Decimal.

This came from the last [VA] SRC #010 - Pi Day 2022 Special thread where a calculation done by two methods gave significantly different results, with no clear explanation.

I found a problem with this calculation:
(1+1/X)^X [1]
with X=1E8.
A comparison with the expended and mathematically equivalent expression:
exp( X * ln(1+1/X) ) [2]
gives the following results
[1] 2.718281814867636217652977244308419
[2] 2.718281814867636217652977243009177

Although we could expect expression [2] to be less accurate due to the intermediate calculations, it turns out that result [2] is correct (according to Wolfram reference) and result [1] is in error in its last 7 digits.

Checked on latest Free42 3.0.10.

J-F

Albert Chan · 03-22-2022, 04:31 PM

https://github.com/thomasokken/free42/bl...at.cc#L649

Code:

        // Integral power. bid128_pow doesn't handle these very well,

        // so I'm using repeated squaring instead. This way at least

        // we get exact results for integral powers of ten!

        if (x < -2147483647.0 || x > 2147483647.0)

            // For really huge exponents, the repeated-squaring

            // algorithm for integer exponents loses its accuracy

            // and speed advantage, and we switch to the

            // library's bid128_pow() instead.

            goto noninteger_exponent;

I would have reduce integer exponent range a bit, possibly remove it.
For exponent 1e8, that is already 27 squaring's, plus many multiplies !
No wonder the result is off.

Let's bypass the integer squaring part, and see what happen.

1 1e-8 + 32e8 Y^X → 78962947548608.16088504664995794553
√ √ √ √ √ → 2.718281814867636217652977243009177

Thomas Okken · 03-22-2022, 05:18 PM

It looks like the base being so close to 1 is the problem. I'm seeing similarly bad results in Free42 Binary. Free42 Decimal does actually get it right if I disable the repeated-squaring algorithm it uses for integer exponents up to 2^31-1. And both versions get it right if you calculate 1E-8 LN1+X 1E8 * E^X.

So maybe it should check the base for proximity to 1, and use log1p() and expm1() in that case.

The repeated squaring is not the issue by itself, you're not losing 7 digits of precision from calculating 39 multiplications unless something else is wrong. Note that if you calculate this using LN instead of LN1+X, you end up way off as well.

Thomas Okken · (This post was last modified: 03-22-2022 05:43 PM by Thomas Okken.)

I'm reluctant to simply remove the repeated-squaring logic. That logic is there for a reason, bid128_pow() is not very good when raising integers to integer powers, and the 2.0u2 revision of the library even includes a patch that does the same thing as the work-around I'm applying to 2.0u1. But it does make sense that bid128_pow() without the work-around works well for (1+1e-8)^1e8, since it does seem to specifically handle bases close to 1. So the question to me is when to use the repeated-squaring work-around and when not to, and that question is relevant to the binary version as well, since the standard library's pow() has the same problem, so there's the question when to use pow() and when to use log1p() and expm1().

Albert Chan · 03-22-2022, 07:11 PM

(03-22-2022 05:18 PM)Thomas Okken Wrote: The repeated squaring is not the issue by itself, you're not losing 7 digits of precision from calculati≈ng 39 multiplications unless something else is wrong. Note that if you calculate this using LN instead of LN1+X, you end up way off as well.

When we keep squaring, the errors also blows up exponentially.

x = 1e8 = 0b101 111 101 011 110 000 100 000 000
b = 1+1/x

Free42 (internally only do squarings)
b^(2^08) = 1.000002560003264002763521747927282
b^(2^13) = 1.000081923355125194292436470243333
...
b^(2^26) = 1.956365933428064586618947538663749

All terms to multiply has errors on the *same* side; losing 7 digits precision is normal.

Note that b^(2^8) only lost 1 ULP here.
If the base is worse, it might even lose log10(x) = 8 digits precision.

Quote:That logic is there for a reason, bid128_pow() is not very good when raising integers to integer powers, and the 2.0u2 revision of the library even includes a patch that does the same thing as the work-around I'm applying to 2.0u1.

Just because the new revision has the patch does not make it right.
Unless bid128_pow() loses *millions* of ULP's, we should limit exponent range.

Patch should only help, stop if it pass the break-even point.

Thomas Okken · 03-22-2022, 07:38 PM

Okay, that makes sense. Maybe I should replace it with special-case logic for exact powers of 10 only, I'm sure those were a problem and that's why I added this logic in the first place. And for positive integer powers of other small integers it might also be useful, but I'm not sure about that. Getting exact results for powers of 10 is the main thing.

Thomas Okken · 03-22-2022, 10:08 PM

10^2 = 99.99999999999999999999999999999999

That's just the first one of many where bid128_pow() returns an inexact result for an easy case... Figuring out where the point is where repeated squaring becomes inferior will be the next challenge. And looking at powers of other small integers.

Werner · (This post was last modified: 03-23-2022 07:41 AM by Werner.)

(03-22-2022 07:38 PM)Thomas Okken Wrote: Getting exact results for powers of 10 is the main thing.

Or for any other integer result. So use the squaring algorithm only for integer powers of integers?
Werner

update: Ah, no I see: then 0.1^2 would probably not equal 10^2/10000. So the rule should be (but how to check it): as long as the result can be exact, use squaring ;-)

J-F Garnier · (This post was last modified: 03-23-2022 08:56 AM by J-F Garnier.)

History may help here.
In classic HP BCD machines, there are these special cases to ensure "nice" results for powers of ten:
- LOG10: check for exact power of 10,
- 10^X: check for integer,
- Y^X: check for y-mantissa=1 and x=integer.

(03-22-2022 10:08 PM)Thomas Okken Wrote: 10^2 = 99.99999999999999999999999999999999

Strangely, the Intel lib power function seems to have more problems with small numbers than with large numbers.
100 0.5 y^x ---> 9.999999999999999999999999999999999
0.01 0.5 y^x --> 9.999999999999999999999999999999998e-2
1E6000 0.5 y^x -> 1.e3000 (exact)
It seems counter-intuitive for me.

[added:]
This is the opposite of HP machines, we start to get slightly inaccurate y^x for large numbers
e.g. 1E480 0.5 y^x --> 9.99999999999E239

J-F

Thomas Okken · 03-23-2022, 08:05 AM

0.1^2=9.999999999999999999999999999999996e-3

Estimate for exactness: take the number of significant digits, that is, the mantissa length minus the number of trailing zeroes, and multiply that by the exponent; if <= 34, use squaring?

The estimate doesn't have to be sharp: multiply the mantissa by such a power of 10 that you get the smallest possible integer, then perform the repeated-squaring algorithm, and if the result is < 10^35, fix the exponent and return it, and otherwise, let bid128_pow() do its thing.

Werner · 03-23-2022, 08:49 AM

(03-23-2022 08:05 AM)Thomas Okken Wrote: Estimate for exactness: take the number of significant digits, that is, the mantissa length minus the number of trailing zeroes, and multiply that by the exponent; if <= 34, use squaring?

That doesn't work for single-digits eg 8^35. And even 11^32 < 1e34

Quote:The estimate doesn't have to be sharp: multiply the mantissa by such a power of 10 that you get the smallest possible integer, then perform the repeated-squaring algorithm, and if the result is < 10^35, fix the exponent and return it, and otherwise, let bid128_pow() do its thing.

Yes, even if you then do some superfluous work. But better that than accuracy loss ;-)
Doesn't the lib have an exact flag, like in the 71B? As long as the result is exact, keep on squaring. Once it isn't, go back and do the normal e^(x*LN(y))
Werner

Thomas Okken · (This post was last modified: 03-23-2022 05:15 PM by Thomas Okken.)

I just uploaded a set of Plus42 builds with a new pow(Phloat, Phloat) implementation. If this proves satisfactory, I'll make the same change in Free42 as well.

(03-23-2022 08:49 AM)Werner Wrote: Doesn't the lib have an exact flag, like in the 71B? As long as the result is exact, keep on squaring.

I can't find one, but my new implementation does the same thing:

https://github.com/thomasokken/plus42/bl...at.cc#L609

Thomas Okken · (This post was last modified: 03-23-2022 06:12 PM by Thomas Okken.)

There are a few optimizations that could (and should) be added: after scaling the mantissa, check if it is equal to 1, and if so, skip the whole squaring-and-multiplication loop; check if the exponent is greater than 112, and if so, jump directly to the inexact case, since no power greater than 112 of a nontrivial integer base fits in 34 digits; and if the scale times the exponent is greater than 33, jump directly to the inexact case.

That last one is based on the scale as defined by the bid128_ilogb() function, which is basically the power of 10, such that a scale of 0 means a number greater than or equal to 1 and less than 10, in other words, floor(log10(fabs(x))). When raising to a positive integer power n, the scale is guaranteed to grow by at least a factor n.

Thomas Okken · (This post was last modified: 03-23-2022 07:46 PM by Thomas Okken.)

I just uploaded a new set of Plus42 builds, with the new algorithm and the aforementioned performance optimizations in place.

Source:
https://github.com/thomasokken/plus42/bl...at.cc#L609

Albert Chan · 03-23-2022, 07:50 PM

If you don't mind cost of fma, we could square with more precisions.

Code:

def normalize(x, y):

    z = x + y

    return complex(z, x-z+y)

def sqr(z):

    x, y = z.real, z.imag

    x2 = x*x

    return normalize(x2, fma(x,x,-x2) + 2*x*y)

Above use complex part to carry the error.

(x+ε)^2 ≈ x^2 + 2xε , where x^2 = float(x^2) + fma(x, x, -float(x^2))

Lets do (1+123/2^26) ^ (2^26)

>>> from gmpy2 import *
>>> z = 1 + 123/2**26
>>> t1 = t2 = z
>>> for i in range(26): t1=t1*t1; t2=sqr(t2)
...
>>> t1
2.6192220598552021e+53
>>> t2
(2.6192220641937287e+53+1.3516279600598801e+36j)

Real part result is correct, with digits to spare.

Albert Chan · 03-23-2022, 09:11 PM

Continued on from previous post, lets add mul, and ipow

Code:

def mul(z1, z2):

    x1, y1 = z1.real, z1.imag

    x2, y2 = z2.real, z2.imag

    x3 = x1*x2

    return normalize(x3, fsum([fma(x1,x2,-x3), x1*y2, x2*y1]))

def ipow(x,n):  # assumed integer n > 0

    if n <= 1: return x

    y = ipow(sqr(x), n//2)

    return y if n%2==0 else mul(x,y)

Lets do (1+1/1e8) ^ 1e8

To represent decimal base more accurately:

exact(1+1/1e8) - float(1+1/1e8) ≈ 6.07747097092215e-017

>>> from gmpy2 import *
>>> ipow((1+1/1e8) + 6.07747097092215e-017j, 1e8)
(2.7182818148676362+1.4094245018370637e-17j)

Add parts together, convert to decimal base, we matched true result 24 digits.
Or, considering 2 doubles have about 30 digits precision, we lost 6.

2.718281814867636217652976740666852

J-F Garnier · 03-24-2022, 10:20 AM

(03-22-2022 07:11 PM)Albert Chan Wrote: x = 1e8
b = 1+1/x

b^(2^08) = 1.000002560003264002763521747927282
b^(2^13) = 1.000081923355125194292436470243333
...
b^(2^26) = 1.956365933428064586618947538663749

All terms to multiply has errors on the *same* side; losing 7 digits precision is normal.

I can't find any similar problem with Free42 Binary (still 3.0.10):

x = 2^27
b = 1+1/x
b^(2^08) = 1.000001907350445
b^(2^13) = 1.000061037018706
b^(2^26) = 1.648721267629145 (all correct)

Any reason why the Decimal and Binary versions behave differently?

J-F

Thomas Okken · 03-24-2022, 11:02 AM

The binary version simply uses the standard C library's pow() function, without any separate special case handling. So the behavior you're seeing there is whatever the standard pow() provides. (Which may not be the same everywhere!)

J-F Garnier · 03-24-2022, 01:02 PM

Ok, I see, it's clear now.
Thanks Thomas for the quick fix of the decimal version, and more generally for the good work.

J-F

Albert Chan · 03-24-2022, 04:53 PM

(03-23-2022 12:07 PM)Thomas Okken Wrote:
(03-23-2022 08:49 AM)Werner Wrote: Doesn't the lib have an exact flag, like in the 71B? As long as the result is exact, keep on squaring.

I can't find one, but my new implementation does the same thing:

Can I assume new implementation is never worse than internal bid128_pow() ?