Threaded Mode | Linear Mode

Paul Dale · 12-14-2020, 05:16 AM

(12-13-2020 06:41 PM)Albert Chan Wrote: It would be nice if Free42 exposed FMA(a,b,c) to the user ...

An alternative to FMA is an accurate cross product: ab - cd (or equally ab + cd). This can be used for a number of things including FMA.

Pauli

Werner · (This post was last modified: 12-14-2020 07:59 AM by Werner.)

(corrected)
using floor(q) now, and the 'correct or not' approach:
(replace all NOPs with LBL 00 on a real 42S, or the DM42 for the time being)

Code:

 { 45-Byte Prgm } @     X       Y       Z       T

 LBL "DIV" @            b       a

 RCL ST Y

 RCL ST Y

 /

 IP @                  q        b       a       a

 LASTX

 X<Y?

 DSE ST Y

 NOP @                        floor(q)  b       a

 R^

 STO ST Y @            a        a       q       b

 R^

 MOD @          b      a%b      a       q       q

 ISG ST L @     c

 NOP

 X<>Y

 LASTX @                c       a       a%b     q 

 MOD

 STO- ST Y @    c       a%c     a'      q       q

 X<> ST T @     c       q       a'      q

 LASTX @                c       q       a'      q

 MOD @          c       b'      a'      q       q

 X#Y?

 RCL- ST L

 X#Y?

 DSE ST T

 NOP

 R^

 END

Cheers, Werner

Werner · 12-14-2020, 07:01 AM

(12-14-2020 05:16 AM)Paul Dale Wrote: An alternative to FMA is an accurate cross product: ab - cd (or equally ab + cd). This can be used for a number of things including FMA.

Yes, but an FMA is built into the Intel library, and extended precision is not..
Werner

Albert Chan · (This post was last modified: 12-14-2020 08:08 AM by Albert Chan.)

(12-14-2020 06:59 AM)Werner Wrote: using floor(q) now, and the 'correct or not' approach:

I started with q = floor(a/b), then correct, but it is not necessary.
You can use round(a/b), ceil(a/b), IP(a/b), ... it does not matter.

The correction, r is still limited to 0, -1

Example, say we use ceil(a/b) for initial q

a/b = 10 - ε → q = 10 → r = -1
a/b = 10 +ε → q = 11 → r = -1

What if a/b = 10 ± ε, because of round-to-nearest, get rounded to exactly 10 ?
It is OK. For interger, round/ceil/floor/IP does nothing to it.

Since edge case work, so is the rest ...

Werner · 12-14-2020, 09:10 AM

That simplifies things:

Code:

 { 37-Byte Prgm } @     X       Y       Z       T

 LBL "DIV" @            b       a

 RCL ST Y

 RCL ST Y

 /

 IP @                  q        b       a       a

 Rv

 MOD @          b      a%b      a       q       q

 ISG ST L @     c

 NOP

 X<>Y

 LASTX @                c       a       a%b     q 

 MOD

 STO- ST Y @    c       a%c     a'      q       q

 X<> ST T @     c       q       a'      q

 LASTX @                c       q       a'      q

 MOD @          c       b'      a'      q       q

 X#Y?

 RCL- ST L

 X#Y?

 DSE ST T

 NOP

 R^

 END

Werner

Werner · 12-14-2020, 10:10 AM

(12-13-2020 06:41 PM)Albert Chan Wrote: Bonus: it removed the integer arguments requirement.

At the risk of being proven wrong again..
as long as b+1 is exact, (and q is not too large), the code for idiv3 works for fractional a and b, too, no?

function idiv3(a,b) -- assumed b > 0
local q, c = floor(a/b), b+1
a, b = a%b - a%c, q%c -- a-b = [2-2c, c-2]
if a==b or a==b-c then return q end
return q-1
end

Werner

Albert Chan · 12-14-2020, 04:12 PM

(12-14-2020 10:10 AM)Werner Wrote: At the risk of being proven wrong again..
as long as b+1 is exact, (and q is not too large), the code for idiv3 works for fractional a and b, too, no?

function idiv3(a,b) -- assumed b > 0
local q, c = floor(a/b), b+1
a, b = a%b - a%c, q%c -- a-b = [2-2c, c-2]
if a==b or a==b-c then return q end
return q-1
end

Yes, fractions still work! Big Grin

a = (a//b)*b + (a%b) = q*(c-1) + (a%b)

→ a%b - a%c - q%c ≡ 0 (mod c)

If c=b+1 is exact, and q is correct, above identity holds.

And, optimization for (mod c) by 2 comparisons also hold.
Checking range of each terms (note, I use opened end intervals here):

a1 = (a%b) - (a%c) = [0, c-1) + (-c, 0] = (-c, c-1)
b1 = (q%c) = [0, c)

a1 - b1 = (-c,c-1) + (-c,0] = (-2c, c-1)

a1 - b1 ≡ 0 (mod c) → a1 - b1 = 0, or -c

But, a1 - b1 = -c implied we need room, such that c+ulp(c) ≠ c
So, we switched it around, and test: (a1==b1) or (a1==b1-c)

---

c=b+1 (exactly) is equivalent to test: (b+1-1-b == 0) and (b+1-b-1 == 0)
With b>0, ulp(b) = ulp(c) = some multiples of machine epsilon.

This is lua 5.4 implmentation for floor-mod, from llimits.h

PHP Code:

#define luai_nummod(L,a,b,m)  \
  { (void)L; (m) = l_mathop(fmod)(a,b); \
    if (((m) > 0) ? (b) < 0 : ((m) < 0 && (b) > 0)) (m) += (b); }
#endif 

For our situation, m and b have the same sized ulp.
Even if it needed sign correction, (m) += (b), there no no cancellation errors.

ULP consideration is needed. Unlike truncated fmod, floor-mod might not be exact.
Eli Bendersky's post showed why fmod is exact: Computing Remainders by Doubling

We need to watch out for cancellation errors, when floor-mod fix the sign.

lua> eps = 1e-10
lua> x, y = eps%1, -eps%1
lua> x, y
1e-10 0.9999999999
lua> x+y, 1-y-x -- x+y rounded to 1, but not exactly equal 1
1 8.274037096265818e-018

Albert Chan · 12-14-2020, 06:01 PM

(12-14-2020 04:12 PM)Albert Chan Wrote: a = (a//b)*b + (a%b) = q*(c-1) + (a%b)

→ a%b - a%c - q%c ≡ 0 (mod c)

If c=b+1 is exact, and q is correct, above identity holds.

I forget to show what happened if q is not correct.
In other words, if q is correct, show:

a%b - a%c - (q±1)%c ≠ 0 (mod c)

Assumed what is to proof is actually true, subtract, we get:

±1 ≠ 0 (mod c)

Since b > 0, c = b+1 > 1, this is always true ⇒ assumption is true too.

Albert Chan · 12-15-2020, 10:37 AM

(12-14-2020 04:12 PM)Albert Chan Wrote: ULP consideration is needed. Unlike truncated fmod, floor-mod might not be exact.

For idiv3(a, b), with tiny negative a, indeed we may hit with cancellation bug.

lua> a = -0.09
lua> b = 1.1 - 1 -- so c = b+1 is exact
lua> idiv3(a , b) -- should be -1
-2

For negative tiny a = -ε

a1 = a%b - a%c = (b-ε) - (c-ε) = -1 + error
b1 = q%c = q = 0, or -1

It is this error that make idiv3 failed.

If the problem scaled to integers, the problem goes away.

lua> A, B = a*0x1p56, b*0x1p56
lua> C = B+1
lua> A%B - A%C
-1
lua> idiv3(A,B)
-1

Werner · 12-15-2020, 12:43 PM

Yes, but that is only with binary floating point, not with decimal, as in HP calcs ;-)
Werner

Albert Chan · (This post was last modified: 12-16-2020 08:30 PM by Albert Chan.)

(12-15-2020 12:43 PM)Werner Wrote: Yes, but that is only with binary floating point, not with decimal, as in HP calcs ;-)

Not true. When floor-mod "fix" sign, subtraction cancellation may occur, binary or decimal.

But, the fix is easy. (your code already fixed this, by using IP)

The trick is to start with q = ceil(a/b), or IP(a/b)

For a<0, |a|<b: min(a/b) = min(ulp(b)-b)/b) = min(ulp(b)/b) - 1 > -1

Confirm:
Free42 Binary: 2 53 X^Y 1 - 1/X 1 - → -9.999999999999999e-1
Free42 Decimal: 1E34 1 - 1/X 1 - → -9.999999999999999999999999999999999e-1

This guaranteed q start with 0, for negative tiny a.

(12-15-2020 10:37 AM)Albert Chan Wrote: For negative tiny a = -ε

a1 = a%b - a%c = (b-ε) - (c-ε) = -1 + error
b1 = q%c = q = 0, or -1

It is this error that make idiv3 failed.

If error = 0, this always work, nothing to fix.
If error ≠ 0, and q = 0, we get this:

a1 == b1 test : -1 + error == 0 → error == 1
a1==b1-c test: -1 + error == -c → error == -b

Since error is tiny, error(b-ε) ≤ ½ulp(b), error(c-ε) ≤ ½ulp(c), both tests failed.
This force correction, returning corrected q-1 = -1

This is revised idiv3 should work with non-integer inputs, if c = b+1 is exact.
I also added a fast-path, to speed things up.

PHP Code:

function idiv3c(a,b)        -- assumed b > 0
    local q, c = ceil(a/b), b+1
    if q*b > a then return q-1 end
    a, b = a%b - a%c, q%c   -- a-b = (-2c, c-1)
    if a==b or a==b-c then return q end
    return q-1
end 

Albert Chan · (This post was last modified: 12-17-2020 12:35 AM by Albert Chan.)

(12-13-2020 07:43 PM)Thomas Okken Wrote:
(12-13-2020 06:41 PM)Albert Chan Wrote: It would be nice if Free42 exposed FMA(a,b,c) to the user ...

Noted...

Free42 FMA already work in progress ! Smile

https://github.com/thomasokken/free42

With FMA, idiv simplified to this. (code optimized for speed, and removed b>0 requirement)

PHP Code:

function idiv5(a,b)
    local c = a/b
    local q = floor(c)
    if b > 0 then b=-b else a=-a end
    if q == c and fma(q,b,a) < 0 then return q-1 end
    return q
end 

Werner · 12-17-2020, 08:53 AM

(12-15-2020 02:51 PM)Albert Chan Wrote: I also added a fast-path, to speed things up.

PHP Code:

function idiv3c(a,b) -- assumed b > 0 local q, c = ceil(a/b), b+1 if q*b > a then return q-1 end a, b = a%b - a%c, q%c -- a-b = (-2c, c-1) if a==b or a==b-c then return q end return q-1 end

That fast path wouldn't be useful in my case: I wanted to use integer division to correctly split
Cc00 = Qq*Xx + Rr, where each letter is a half-length integer and Cc<Xx

For instance, in a 2-digit calculator:
2300 = 57*40 + 20, but 2300/40=58
2700 = 37*72 + 36

What I did up till now is determine Q and q separately with
Cc0 = Q*Xx + Aa
Aa0 = q*Xx + Rr

and each split Yy0=Q*Xx+Rr is carried out as
Q := Yy0/Xx;
Rr := MOD(Yy0,Xx);
if FRC(Q) >0
then Q := INT(Q);
else
if Rr>0 then
Q := ROUND(Q - Rr/Xx,0);
end;
end;

sadly, that still is faster than the integer division - mainly because the rounding part is almost never reached, certainly not in the 34-digit Free42..

Werner

Werner · (This post was last modified: 12-17-2020 10:50 AM by Werner.)

the rounding part I do as

if R>X-R then Q:= Q-1;

for inputs Cc0 and Xx, and Cc<Xx, if FP(Cc0/Xx)=0, R # X-R (round-to-even does not occur, while in Cc00/Xx it does)

Werner

Werner · (This post was last modified: 12-17-2020 07:28 PM by Werner.)

Actually that code works for Cc00 = Qq*Xx + Rr, too, save for the round-to-even cases, that are very rare.
So I can do

Code:

Qq := Cc00/Xx;

Rr := MOD(Cc00,Xx);

if FRC(Qq) >0

then Qq := INT(Qq);

else

  if Rr>0 then

    if Rr>Xx-Rr then Qq:= Qq - 1;

    elsif Rr = Xx-Rr then

    c := Xx+1;

    b := MOD(Qq,c);

    a := Rr - MOD(CC00,c);

    if NOT(a=b or a=b-c ) then Qq:=Qq - 1;

  end;

end;

and the test for Rr>0 is not necessary either

Werner

Albert Chan · 12-17-2020, 06:41 PM

(12-17-2020 12:42 PM)Werner Wrote: Actually that code works for Cc00 = Qq*Xx + Rr, too, save for the round-to-even cases, that are very rare.

Code:

Half-way-case correction ... elsif Rr = Xx-Rr then c := Xx+1; b := INT(Cc00/c); a := Rr - MOD(CC00,c); if NOT(a=b or a=b-c ) then Qq:=Qq - 1; end; end;

Your half-way-case had a typo. It should be b := Qq % c
If Qq is correct, we have:

Cc00 = Qq*Xx + Rr = Qq*c - Qq + Rr
Qq = Qq*c + Rr - Cc00

We take (mod c) for both side, b = LHS, a = RHS, and do the test.

---

If remainder() is available, we can replace (mod c) test with it.

PHP Code:

function remainder(x, y)            -- assumed x >=0, y > 0
    local r, halfy = fmod(x, y+y), y/2
    if r <= halfy then return r end -- q = (x-r)/y must be even
    r = r - y                       -- q is odd, r = (-y/2, y)
    return r < halfy and r or r - y -- halfway case: r = -y/2, q is even
end 

remainder() guaranteed that halfway case have even quotient.

For half-way case, Qq must be even (example, 11.5 -> 12, 12.5 -> 12)
We can use sign of remainder, for the half-way case correction:

→ if remaider(Cc00, Xx) < 0 then Qq := Qq - 1;

---

If Cc00/Xx fractional part is 0.5, it implied Cc000 / Xx is integer, ends in 5.

To "remove" 3 0's, and turn last digit to 5, Xx = 8*k
(Cc00 ± 100) / Xx will *not* be half-way case, since 100/8 = 12.5, not an integer.

We can use gap test to check for half-way case. This work on a 2-digits calculator:

→ if (Cc00+100)/Xx - Qq < Qq - (Cc00-100)/Xx then Qq := Qq - 1;

Example, if Cc00 = 900, Xx = 40, Qq = round(22.5) = 22

LHS = high gap = 25 - 22 = 3
RHS = low gap = 22 - 20 = 2

This meant Qq were already rounded-down, and no correction needed for it.

Werner · 12-17-2020, 07:50 PM

(12-17-2020 06:41 PM)Albert Chan Wrote: Your half-way-case had a typo. It should be b := Qq % c

Thanks, I corrected it.

Quote:If Cc00/Xx fractional part is 0.5, it implied Cc000 / Xx is integer, ends in 5.

No.. each 0 is the length of a half integer, so for a 12 digit machine it is 000000 and for Free42/DM42 it's 17 zeroes ;-)

Cc00 = Qq*Xx + Rr is the first part of dividing a multiprecision number stored in sequential registers by a number not exceeding a single register:

R1 R2 R3 R4 R5
Aa Bb Cc Dd Ee ... / Xx

At each step you have the carry from the previous step, Cc, and so you have to do
CcYY = Qq*Xx + Rr

with Rr next step's Cc.

We can only work with 2 letters at the same time, so first

Cc00 = Qq*Xx + Rr (as previously covered)
t := Yy-Xx+Rr; (because Yy+Rr may overflow)
Cc := MOD(t,Xx);
IF t>0 then
Qq := Qq + (t-Cc)/Xx + 1;
end;
Yy := Qq;

Werner

Albert Chan · (This post was last modified: 12-17-2020 10:29 PM by Albert Chan.)

(12-17-2020 06:41 PM)Albert Chan Wrote: If Cc00/Xx fractional part is 0.5, it implied Cc000 / Xx is integer, ends in 5.

To "remove" 3 0's, and turn last digit to 5, Xx = 8*k
(Cc00 ± 100) / Xx will *not* be half-way case, since 100/8 = 12.5, not an integer.

We can use gap test to check for half-way case. This work on a 2-digits calculator:

→ if (Cc00+100)/Xx - Qq < Qq - (Cc00-100)/Xx then Qq := Qq - 1;

(12-17-2020 07:50 PM)Werner Wrote: No.. each 0 is the length of a half integer, so for a 12 digit machine it is 000000 and for Free42/DM42 it's 17 zeroes ;-)

I misunderstood. "0" = h zeroes, not 1 zero (*)
But, the logic holds. Just replace "hidden" 1 with h

\(\large{CcOO \over Xx}\) is half-way cases → \(Xx = 2^{2h+1} \cdot k\)

\(\large{CcOO\;±\;IOO \over Xx}\) will *not* be half-way case, since \(
\large{10^{2h} \over 2^{2h+1}} = {5^{2h} \over 2}\), not an integer.

Gap test work with with "half-integers" too.

(*) I had assumed "0" = 1 zero, on a 2-digit calculator, and checked how the code perform.

Code:

Cases      %   Types

  404     8.2  FRAC(Qq) > 0, return Qq

 2420    48.9  Mod test, return Qq

 2050    41.4  Mod test, return Qq - 1

   76     1.5  Half-way cases (evenly split 38/38)

-------------

 4950     100

Albert Chan · (This post was last modified: 12-19-2020 12:56 AM by Albert Chan.)

remainder() based idiv is very elegant.

PHP Code:

function idiv6(a, b)    -- assumed b > 0
    local r = a/b
    local q = floor(r)
    if q == r and remainder(a, b) < 0 then q=q-1 end
    return q
end 

Above is similar to this (6a version possibly faster, only tried to get remainder sign)

PHP Code:

function idiv6a(a, b)   -- assumed b > 0
    local r = a/b
    local q = floor(r)
    if q ~= r then return q end
    r = a % (b+b)
    local R = r < b and r or r-b
    local h = 0.5*b    -- halfway
    if R < h or r == h then return q end
    return q - 1
end 

All the code for mod/half-way test is the last if line.

R < h:

if residual is less than half of b, q had rounded-down. For floor divide, no need for correction.

r == h:

if residual is half of b, q (because of round-to-even) is even.
But, r also guaranteed even q -> no need for correction.

Update: slightly faster version 6b, removed R, and r shifted, with range = [-2h, 2h)

PHP Code:

function idiv6b(a, b)   -- assumed b > 0
    local r = a/b
    local q = floor(r)
    if q ~= r then return q end
    r = a % (b+b) - b
    local h = 0.5*b     -- halfway
    if r <= -h or (r < h and r >= 0) then return q end
    return q - 1
end 

Albert Chan · (This post was last modified: 12-20-2020 11:09 AM by Albert Chan.)

The chance of hitting half-way cases are small.
We can do the remainder check by then. This made the code simpler too.

PHP Code:

function idiv6c(a, b)   -- assumed b > 0
    local r = a/b
    local q = floor(r)
    if q ~= r then return q end
    r = (a%b) * 2
    if r == b then r = a % (b+b) end
    return r > b and q-1 or q
end 

Example, for 2-digit calculator, above/below halfway decides how to correct q

a/b = 700/22 = 31.81..., rounded to q = 32
ulp = (a%b)/b = 18/22 ≈ 0.81 ULP, above halfway, return q-1 = 31

a/b = 700/23 = 30.43..., rounded to q = 30
ulp = (a%b)/b = 10/23 ≈ 0.43 ULP, below halfway, return q = 30

---

For halfway case, r is updated, r = a % (b+b)
Previously, r = b, this implied updated r = b/2, or 3b/2
To maintain even q, and remainder range of ±b, 3b/2 got shifted down 2b, to -b/2

Only -b/2 remainder require correction: a/b = q - 1/2, rounded-up to q, floor-divide should be q-1

Example, for 2-digit calculator

a/b = 460/40 = 11.5, rounded to q = 12 (half-way to even)

r = (a%b)*2 = 40 = b --> signal halfway case
r = a%(b+b) = 60 --> signal remainder = -20, floor-divide should return q-1 = 11

Reuse the code for non-halfway: return r > b and q-1 or q

For above example, r=60, b=40, thus it return q-1 = 11, as expected

Update: removed variable h=b/2. This made the code more robust.

Previously, we had r = a%b, followed by r = a%(b+b) if halfway.
Last line was return r > h and q-1 or q

For halfway case, we may hit with *slight* floor-mod rounding.

If a is negative, a % (b+b) = (b+b) - |a| % (b+b) = 4h - 3h = h
But, 3h term might not be exact, thus last h may have *slight* rounding error.

Update 2: fix a typo for halfway test, r == 0 should be r == b