Threaded Mode | Linear Mode

Albert Chan · 02-07-2024, 12:47 AM

(04-10-2023 06:47 PM)Albert Chan Wrote: I considered Kahan's clog(z), but Python's cmath.log code is simpler.

Code:

function complex.log(z) local x, y = I.real(z), I.imag(z) local h = hypot(x, y) if 0.71 <= h and h <= 1.73 then if abs(x) < abs(y) then x,y = y,x end h = log1p((x+1)*(x-1) + y*y)/2 else h = log(h) end return I.new(h, I.arg(z)) end

I found a better way to calculate (x^2-1) = (x-1)*2 + (x-1)^2
If x is slightly bigger than 1, RHS is sum of 2 positive terms, no cancellation errors.
If x is slightly below 1, statistically, errors is much smaller than doing (x+1)*(x-1)

How to calculate 1-x·x in floating-point
To SSE, or not to SSE, that is the question

I might as well fix when (x,y) both finite, yet hypot(x,y) overflowed.
Also, optimized a bit, not to use hypot() until really needed.

Clog code for lua lcomplex.c

Code:

static Complex Clog(Complex z) {

    double x = Real(z), y = Imag(z);

    double h = x*x + y*y;

    double a = atan2(y,x);

    if (0.5 < h && h < 3.0) {

        if ((h=fabs(x)) < (y=fabs(y))) {h=y; y=x;}

        h -= 1; /* x^2-1 = 2*(x-1) + (x-1)^2 */

        h = log1p(2*h + h*h + y*y)/2;

    } else if (isnormal(h)) {

        h = log(h)/2;

    } else if (isinf(h)) {

        h = log(hypot(x*.5,y*.5)) + M_LN2;

    } else

        h = log(hypot(x,y));

    return CMPLX(h, a);

}

Albert Chan · (This post was last modified: 02-07-2024 01:22 AM by Albert Chan.)

Code:

dd = require'dd' -- David Bailey's double-double

function ratio_relerr(n, lo, hi)

    local e1, e2, x = 0, 0

    for i = 1,n do

        repeat x = lo + (hi-lo)*random() until x~=1

        local g = dd.new(x)

        g = g*g-1; e1 = e1 + abs(((g-(x+1)*(x-1))/g):tonumber())

        x = x - 1; e2 = e2 + abs(((g-(2*x + x*x))/g):tonumber())

    end

    return e1/e2

end

.5 < h < 3
min(|x|) = sqrt(.5/2) = 0.5
max(|x|) = sqrt(3/1) ≈ 1.732

lua> ratio_relerr(1e6, .5, 1)
1.1272738159812026
lua> ratio_relerr(1e6, 1, 1.732)
1.4886689516224851
lua> ratio_relerr(1e6, .5, 1.732)
1.3195779095764026

OP Clog, (h=|x|-1, 2*h + h*h) statistically more accurate than (x+1)*(x-1)

Albert Chan · (This post was last modified: 02-07-2024 12:35 PM by Albert Chan.)

I was confused how giorgiomcmlx produce plots in How to calculate 1-x·x in floating-point
Then, I figure this out ... we are measuring different things!

Previous post compared ratio of relative errors, measure against exact (x^2-1)

Plots does absolute errors against correctly-rounded (x^2-1)
(negated, but error have same plot, |exact-approx| = |(-exact)-(-approx)|)

We can't get any better than correctly rounded!

We don't need double-double (~106 bits), just fma(x,x,-1) for the reference.

Code:

function ratio_relerr(n, lo, hi)

    local e1, e2, x, y = 0, 0

    for i = 1,n do

        repeat x = lo + (hi-lo)*random() until x~=1

        y = fma(x,x,-1)

        e1 = e1 + abs((y-(x+1)*(x-1))/y)

        x = x-1

        e2 = e2 + abs((y-(2*x + x*x))/y)

    end

    return e1/e2

end

lua> ratio_relerr(1e6, .5, 1)
1.3800592174099267
lua> ratio_relerr(1e6, 1, 1.732)
2.9941118786964216
lua> ratio_relerr(1e6, .5, 1.732)
2.3113799166880664

lua> ratio_relerr(1e6, 0.9, 1.1)
18.32637500303419
lua> ratio_relerr(1e6, 0.99, 1.01)
192.75860912929932
lua> ratio_relerr(1e6, 0.999, 1.001)
2008.010676441317

This now matched the plot when x ≈ 1
[Image: omxsq__erraccum__www.png?w=712]

Albert Chan · (This post was last modified: 02-11-2024 04:33 PM by Albert Chan.)

(02-07-2024 12:49 AM)Albert Chan Wrote: .5 < h < 3
min(|x|) = sqrt(.5/2) = 0.5
max(|x|) = sqrt(3/1) ≈ 1.732

We can always scale z to z', with its h = |z'|^2 within this range.
Clog can be coded using only log1p formula, without other branches.

ln(z) = ln(z') + b*ln2 , where z' = z/2^b
ln2 = ln2_hi + ln2_lo

b = binary exponent of z parts = -1074 .. 1023 +1 (*)
if ln2_hi at most 53-11 = 42 bits, then b*ln2_h is exact.

Update: I use constants from https://www.netlib.org/fdlibm/e_log.c
ln2_hi has only 32 bits, which is ok (we have ln2 of 32+53 = 85 bits)
This give more room for future use, say, with fractional b's.

Code:

static const double

ln2_hi  =  6.93147180369123816490e-01,    /* 3fe62e42 fee00000 */

ln2_lo  =  1.90821492927058770002e-10;    /* 3dea39ef 35793c76 */

static double shift_ln2(double x, double b) {    

    double y = b*ln2_hi, z = b*ln2_lo;

    double s = x + y;

    return z - (s-y-x) + s; // Kahan sum x + b*ln2

}

Clog, version 2

Code:

static Complex Clog(Complex z) {      

    double x = Real(z), y = Imag(z);

    double a = atan2(y,x), b;

    if ((x=fabs(x)) < (y=fabs(y))) {b=x; x=y; y=b;}

    if (isinf(b=logb(x))) return CMPLX(b,a);

    x = ldexp(x, -b);

    y = ldexp(y, -b);

    if (x*x+y*y > 2.718281828) {x*=.5; y*=.5; ++b;}

    x -= 1;

    x = log1p(2*x + x*x + y*y)/2;

    return CMPLX(shift_ln2(x,b), a);

}

Which versions are best depends on usage pattern.
It also depends on speed/accuracy of built-in log1p vs log.

(*) b might be +1 higher, to reduce h down to below e
Code doesn't use h=x²+y², but h = [1,8) → [e/4,e) → x ≈ [.5829, 1.649)

Why e, not 3 Wrote:Let r = ln(|z'|) = ln(h)/2

h = [1, e)      --> r = [0, .5)
h = [e, 8)      --> r = [.5, 1.040)
h = [e/4, 2)   --> r = [-.1931, .3466)

if h < e, then (ulp of r) ≤ (ulp of 1/4)

Albert Chan · 02-08-2024, 09:04 PM

Clog version 1 has an issue if |z| is subnormal, does not have full bits precision.
This implied z complex parts also subnormal, calculated |z| may be inaccurate. (*)

--> if |z| is subnormal, ln(|z|) for re(ln(z)) may be inaccurate.

We can fix version 1, but it look ugly (see Python c_log edge case branches)

Python c_log re(ln(z)) issues Wrote:1. z complex parts subnormal --> |z| may be subnormal.
2. z complex parts finite, but |z| overflowed.
3. |z| ≈ 1
4. |z| = 0

Clog version 2 is so much cleaner, fixed all 4 issues at the same time!

Yes, it has scaling cost, but it also produced more accurate real part.
For huge or tiny |z|, real part is practically correctly rounded!

p3> from cmath import *
p3> t = float.fromhex('0x1p-1074')
p3> log(complex(3*t,5*t))
(-742.6768916590732+1.0303768265243125j)

lua> t = 0x1p-1074
lua> I(3*t, 5*t) :log() -- version 2
(-742.6768916590731+1.0303768265243125*I)

(*) subnormal |z| can be calculated with scaled z, getting result to full precision.
However, it then had to un-scale before returning results, losing whatever gain it got.

Above example, t = 0x1p-1074 is minimum size of IEEE float. All float must be multiples of t

z = I(3*t, 5*t) → |z| = √(34) t, rounded up to 6t

For version 1, we would get ln(|z|) = ln(6t) = ln(6) - 1074*ln(2) ≈ -742.648, only 4 digits accuracy.

Albert Chan · (This post was last modified: 02-11-2024 02:56 PM by Albert Chan.)

Here is updated OP version 1, fixed its subnormal accuracy issue.

Instead of LN2 shift for overflowed h, (DBL_MANT_DIG*M_LN2) for subnormal, I combined them.

relative error = (exact - approx) / exact

relerr(ln(2)) ≈ 3.346E-17
relerr(ln(2)*53) ≈ 1.835E-17
relerr(ln(2)*61) ≈ -6.781E-19

Last constant has 59 bits accuracy (~ 18 digits), the most accurate for our domain.

float(ln(2)*61) = 0x1.52417db067f38p+5 = 42.28197801415666390...

Clog Version 1b

Code:

static Complex Clog(Complex z) {

    double x = Real(z), y = Imag(z);

    double h = x*x + y*y;

    double a = atan2(y,x);

    if (0.5 < h && h < 3.0) {

        if ((h=fabs(x)) < (y=fabs(y))) {h=y; y=x;}

        h -= 1; /* x^2-1 = 2*(x-1) + (x-1)^2 */

        h = log1p(2*h + h*h + y*y)/2;

    } else if (isnormal(h)) {

        h = log(h)/2;

    } else {

        double m = 0x1p61, b = -M_LN2*61;

        if (isinf(h)) {m = 1/m; b = -b;};

        h = log(hypot(m*x,m*y)) + b;

    }

    return CMPLX(h, a);        

}

(02-08-2024 09:04 PM)Albert Chan Wrote: lua> t = 0x1p-1074
lua> I(3*t, 5*t) :log() -- version 2
(-742.6768916590731+1.0303768265243125*I)

We get exactly the same answer for version 1b

Update: previously, I used shift of (M_LN2*732) to avoid hypot() function.
With ULP of log part different than final result, answer is always double rounded.
Double rounding is acceptable if log part ULP is much smaller, but it isn't.

Also, for big |log(x)|, error caused by hypot() inside log may be ignored.

log(x*(1-ε)) ≈ log(x) + (-xε) / x = log(x) - ε

if relerr(x) is ε, then relerr(log(x)) is ε/log(x), much smaller than ε

Albert Chan · 02-11-2024, 06:32 PM

(02-10-2024 09:55 PM)Albert Chan Wrote: Update: previously, I used shift of (M_LN2*732) to avoid hypot() function.
With ULP of log part different than final result, answer is always double rounded.

Version 1b with smaller shift (M_LN2*61) avoid most double rounding problem, but not all.
There are still holes where log part and final result ULP size are different.

This version traded hypot() for frexp(), and have no performance penalty.
I still use 732 for rough shift, but any number that get finite (x²+y²) is OK

(x²+y²) = h * 2^n
log(x²+y²)/2 = log(h)/2 + (n/2)*log(2)

With h = [0.5, 1), log(h)/2 = [-.3466, -0). max ulp = (ulp of 1/4), same as version 2

Clog Version 1c

Code:

static Complex Clog(Complex z) {

    double x = Real(z), y = Imag(z);

    double h = x*x + y*y;

    double a = atan2(y,x);

    if (0.5 < h && h < 3.0) {

        if ((h=fabs(x)) < (y=fabs(y))) {h=y; y=x;}

        h -= 1; /* x^2-1 = 2*(x-1) + (x-1)^2 */

        h = log1p(2*h + h*h + y*y)/2;

    } else if (isnormal(h)) {

        h = log(h)/2;

    } else {

        double m = 0x1p732, b = -732;

        if (isinf(h)) {m = 1/m; b = -b;};

        x *= m; y *= m;

        int n;

        h = frexp(x*x+y*y, &n); 

        h = shift_ln2(log(h)/2, b + .5*n);

    }

    return CMPLX(h, a);        

}