Threaded Mode | Linear Mode

Dieter · (This post was last modified: 05-04-2014 05:33 PM by Dieter.)

(05-04-2014 06:29 AM)Paul Dale Wrote: I can only agree with you here. I'm using a fairly generic Newton based solver for all the quantile functions except the normal QF. I know it can be very slow, that is the price for having a single general solver.

Even a simple Newton solver should converge quadratically. I think the slow execution of the current 34s quantile implementations is due to an error in the initial guess for the solver. It should be within approx. 10% of the true quantile. So even a Newton solver should not require much more than five iterations, which means 6 - 10 s in user code.

Quote:The 34S certainly doesn't have space for a C implementation of these. I originally implemented all the distribution functions in C but had to rewrite them as keystroke programs for space reasons. Adding additional custom solvers is likewise going to consume precious bytes. Thus, the 34S is unlikely to see any distribution speed ups. However, the 31S quite possibly will.

Great. And many things are easier to do as there is no DP mode. ;-)

Quote:I'd like to rewrite the distribution code there in native C -- I can't just drop in the original distribution code, we moved on algorithmically.
So, yes I'd love to hear your thoughts on improving these functions.

I recently tried some improvements, especially with the solver. The following ideas refer to the Student distribution, but they should be applicable for the Chi² and Fisher cases as well. The following thoughts assume the quantile is ≥ 0 and p is the upper tail probability (i.e. p ≤ 0.5). As usual, the rest is done with a simple transformation, e.g. QF(1–p) = –QF(p).

So, what can be done?

1. Use a Halley solver. It converges very fast and it can be implemented easily since the second derivative is a function of the first (i.e. the pdf).

Student:
f"(x) = -(n+1)x / (n+x^2) * pdf

Chi²:
f"(x) = (n-x-2)/(2x) * pdf

Fisher:
f"(x) = -(n(m(x-1)+2) + 2mx) / (2x(mx+n)) * pdf

where n (resp. m and n) are the degrees of freedom.

So f"(x) simply is the pdf times a factor. This way a Halley iteration e.g. for the Student case can be written as follows:

r := (StudentUpperCDF(x) - p) / StudentPDF
d := r / ( r * (n+1)x / 2(n+x^2) - 1 )
x := x - d

Due to the roughly cubic convergence, the iteration usually may quit as soon as |d| < 1E–10*x for 30+ digit accuracy. In my 34s program I tried a conservative CNVG 00 (i.e. rel. error < 1E–14) which usually returns results that are as good as a user code program gets when running in DP mode (approx 30-34 digits).

The same idea can be used with the Chi² and Fisher quantile.

2. I slightly modified the initial guess for the Chi² quantile and I tried a new approach in the Student case, based on a 1992 paper on bounds of various quantiles by Fujikoshi and Mukaihata. The idea is a simple transformation of the normal quantile z:

t = sqrt(n * (e^{a * z^2/n} - 1 ))

Where a is close to 1 and a function of n (or simply 1 so that it can be omitted). I used a = 1 + 1/(e*n) which works very well with the Normal estimate I used (simply the one in the Normal quantile function).

This works fine for the center but less so for the distribution tails. For all p < 12^–n I use a slight modification of the tail approximation suggested years ago:

u = 2 * p * n * sqrt(pi / (2 * n - 0.75))
t = sqrt(n) / u ^ (1 / n)

The 0.75 originally was a 1, but changing this value improves the results for low n.

Although the Student estimate originally was intended for n≥3 it also works well for n=2 or even n=1. Usually it converges within three iterations, here and there in maybe four. This means that the code for n=1 and n=2 (direct solutions) may be omitted.

For x close to 0 (i.e. p near 0.5) the expression t_u(x) – p loses accuracy due to digit cancellation. So I used the same idea as in the Normal quantile routine and had this value calculated differently for small t, using the incomplete Beta function. Yes, that's why I found the bug discussed in this thread. ;-)

Code:

LBL 'TQF'  ' T Quantile Function

ENTER      ' probability in X, degrees of freedom in register J

+/-

INC X

MIN

CF 00

x!=? L

SF 00       ' set flag 00 if p > 0.5

CF 01       ' clear error flag

STO 00      ' save p in R00

# 005

STO 01      ' do not more than 5 iterations

# 012

RCL J

+/-

y^x

RCL 00

x>? Y

GTO 00

RCL J       ' estimate for small p

STO+ X

×

# pi

RCL L

# 1/2

x²

RCL+ L    ' = 0.75

-

/

sqrt

×

RCL J

xrooty

RCL J

sqrt

x<> Y

/

GTO 01

LBL 00     ' estimate for low and moderate t

XEQ 'GNQ'  ' get guess for the normal quantile

x²

# eE

RCL× J

1/x

INC X

×

RCL/ J

e^x-1

RCL× J

sqrt

LBL 01     ' iteration starts here

FILL

# 1/2

x>? Y

GTO 02

DROP

t_u(x)

RCL- 00

GTO 03

LBL 02

DROP

x²

ENTER

RCL+ J

/

# 1/2

RCL× J

RCL L

IBeta

RCL× L

+/-

# 1/2

RCL- 00

+          ' cdf(t) - p = (0,5-p) - 1/2 IBeta(x=t^2/(n+t^2), a=1/2, b=n/2)

LBL 03

RCL T

t_p(x)

/

ENTER

RCL× T

RCL J

INC X

×

RCL T

x²

RCL+ J

STO+ X

/

DEC X

/

-

CNVG? 00

SKIP 003

DSE 01

GTO 01

SF 01       ' Raise error flag if no convergence after 5 iterations

FS?C 00

+/-         ' adjust sign

FS?C 01

ERR 20      ' if error, display "no root found" and exit with last approximation

END

LBL 'GNQ'   ' input: p =< 0.5

# 232       ' output: Normal estimate > 0

SDR 003

x<>Y

x>? Y

GTO 00

FILL        'Normal estimate for p up to 0.232

LN

STO+ X

+/-

ENTER

DEC X

# pi

×

STO+ X

sqrt

RCL× T

LN

STO+ X

+/-

sqrt

x<>Y

# 004

×

1/x

+

RTN

LBL 00   ' Normal estimate for p close to the center

+/-

# 1/2

+

# pi

STO+ X

sqrt

×

ENTER

x³

# 006

/

+

RTN

END

For best accuracy this should run in DP mode. The program exits if the last two approximations agree in approx. 14 digits. At this point the result usually carries 30+ valid digits.

Here are some examples:

Code:

10 STO J

   0,1 XEQ"TQF" => -1,372183641110335627219156967662554 in 3,2 s

   exact result:   -1,37218364111033562721915696766255392

 1E-20 XEQ"TQF" => -256,4346993185261855315362349874343 in 3,1 s

   exact result:   -256.434699318526185531536234987434334

 0,5 ENTER 1E-16 -

       XEQ"TQF" => -2,569978034930492409497513483729480 E-16 in 1,9 s

   exact result:   -2,56997803493049240949751348372947856 E-16

 1 STO J

 0,05  XEQ"TQF" => -6,313751514675043098979464244768186 in 3 s

   exact result:   -6,3137515146750430989794642447681860594

 1E-10 XEQ"TQF" => -3183098861,837906715272955512330630 in 1,9 s

   exact result:   -3183098861.837906715272955512330627466

100 STO J

 0,025 XEQ"TQF" => -1,983971518523552286595184867990389 in 5,1 s

   exact result:   -1,983971518523552286595184867990339165

Please note that the results for n=1 are exact to 33 resp. 34 digits while the current implementation (that calculates the result directly) gets only 32 resp. 24 (!) digits right. Either the internal tangent function is not that accurate or the current implementation does not evaluate the quantile as 1/tan(1E–10*180°) which would yield a nearly perfect result. ;-)

Dieter

Paul Dale · 05-05-2014, 02:20 AM

(05-04-2014 05:26 PM)Dieter Wrote: ... 24 (!) digits right. Either the internal tangent function is not that accurate or the current implementation does not evaluate the quantile as 1/tan(1E–10*180°) which would yield a nearly perfect result. ;-)

The one df case is implemented as tan(pi(x-1/2)) which loses digits in cases like these. I missed the transformation to -cot(pi x) when I implemented this Sad

I'm pretty sure TAN is up to the task.

More usefully, your QF implementation isn't much longer than the current one and it avoids the generic solver. I might see if I can substitute it on the 34S too. It isn't all that straightforward because several functions you use are implemented in XROM and can be used directly. Rather, they have to be called using XEQ and they'll mess the stack up.

- Pauli

Paul Dale · 05-05-2014, 08:26 AM

After a bit of debugging, the new Student-T quantile function code appears to be working on both the 34S and the 31S. The next builds will get this Smile

- Pauli

Paul Dale · 05-05-2014, 11:02 AM

Is it worthwhile using CNVG? 03 which will choose 1e-24 or 1e-32 for the convergence criteria (this command is more conservative in XROM than in user code)? This will mean an additional iteration but will give better result in DP mode. I'd probably also increase the iteration limit to 6 or 7 to avoid the error case.

I don't see why it would be worthwhile given the fast convergence of Halley's method. We'd only be getting at most a couple of extra digits in DP mode.

- Pauli

Paul Dale · 05-05-2014, 11:15 AM

The normal QF estimate used in your Student's-T is slightly different to the one I used in the normal.wp34s file. Only on the < .23 branch in the code.

Is the difference significant?

- Pauli

Dieter · 05-05-2014, 08:14 PM

(05-05-2014 11:15 AM)Paul Dale Wrote: The normal QF estimate used in your Student's-T is slightly different to the one I used in the normal.wp34s file. Only on the < .23 branch in the code.
Is the difference significant?

No. I just simplified the last term 0,254/u to 1/4u. The original version even is a tiny bit better. So you should leave it the way it is. I remember it was also fine-tuned to make sure the Normal quantile safely returns 32...34 digits in just two iterations.

If all the distributions get reworked we could also switch to a simple rational approximation for the Normal estimate. I suppose a (2;2) type, i.e. with five real constants, should be sufficiently accurate. Since the initial guesses for the other distributions rely on a Normal guess we just have do decide now whether to switch or not. Designing such an approximation would require some time. On the other hand, everything also works fine with the current solution.

Dieter

Dieter · 05-05-2014, 08:30 PM

(05-05-2014 11:02 AM)Paul Dale Wrote: Is it worthwhile using CNVG? 03 which will choose 1e-24 or 1e-32 for the convergence criteria (this command is more conservative in XROM than in user code)? This will mean an additional iteration but will give better result in DP mode.

If it really improves the result in DP mode and DP mode is available (i.e. on the 34s) it should be done - "accuracy is not optional". ;-) However, if the proposed routine runs in DP mode I could not find a case where the result had less than 30 correct digits.

Quote:I'd probably also increase the iteration limit to 6 or 7 to avoid the error case.

During my tests the result always converged within 3, sometimes 4 iterations. A possible fifth loop was added "just to be sure". The error message was essentially included for testing to get a feedback if something really goes wrong. It should never show up, and during my tests it never did.

Quote:I don't see why it would be worthwhile given the fast convergence of Halley's method. We'd only be getting at most a couple of extra digits in DP mode.

If (!) we can get some additional digits and (!) DP mode is available (34s) another iteration is fine by me. But honestly I do not think that this will happen. Do you have an example where a fifth iteration yields better accuracy?

FTR: for my own tests I used a threshold of 1E-8 for a SP result. I think 1E-14 is on the safe side, even in DP mode.

Dieter

Dieter · (This post was last modified: 05-05-2014 09:45 PM by Dieter.)

(05-05-2014 02:20 AM)Paul Dale Wrote: The one df case is implemented as tan(pi(x-1/2)) which loses digits in cases like these.

If this really got implemented it should quit with an error message as soon as x resp. p is small enough so that 0,5–p equals 0,5. E.g. p=1E-50 becomes tan(pi/2) which is not defined. But since there is no exact n-digit value for pi/2, the result becomes meaningless: my 34s returns t^–1(1E–50, 1) as +1,7924 E+33. Yes, a positive result. For comparison, the true result is –3,1831 E+49. Actually +1,7924 E+33 is returned for any (!) input below 1E-34. Arguments below 0,1 lose one digit per magnitude. That's why t^–1(1E–10, 1) has merely 34–10 = 24 valid digits. And so the result is bogus for any input below 1E–34. In other words, the current implementation is not just inaccurate, it's simply buggy. Sorry. ;-)

Quote:I missed the transformation to -cot(pi x) when I implemented this I'm pretty sure TAN is up to the task.

That's exactly why the Normal and now the Student QF code use two different ways of evaluating CDF(x) – p. A special one for cases close to the center and the straightforward method otherwise.

Likewise, if the Student dof=1 case is calculated directly, two different methods should be used as well:

if p ≤ p_crit: t = –cot(pi · p)
if p > p_crit: t = tan(pi · (p – 1/2))

Where p_crit can be anything between 0,1 and 0,4.

Quote:More usefully, your QF implementation isn't much longer than the current one and it avoids the generic solver. I might see if I can substitute it on the 34S too. It isn't all that straightforward because several functions you use are implemented in XROM and can be used directly. Rather, they have to be called using XEQ and they'll mess the stack up.

I see. Please also do your own tests. And I would love to know how fast this might run in compiled C code, e.g. on a 31s. ;-)

Dieter

Paul Dale · 05-05-2014, 09:40 PM

Thanks for the clarification. I'll use the existing normal QF estimate in the Student's T code to save some bytes.

- Pauli

Paul Dale · 05-05-2014, 09:49 PM

(05-05-2014 08:30 PM)Dieter Wrote: However, if the proposed routine runs in DP mode I could not find a case where the result had less than 30 correct digits.

The XROM code runs in double precision regardless of the user's setting. This is good for rounding back to single precision.

Changing the convergence check will pretty much force an extra iteration to detect the convergence. Thus it will cause a slow down. So an increase to the maximum to 6 seems reasonable "just to be sure".

No, I don't have an example that increases accuracy. I'm leaning towards the minimum of thirty digits being sufficient in DP, since we don't guarantee accuracy there.

- Pauli

Paul Dale · 05-05-2014, 09:57 PM

(05-05-2014 09:13 PM)Dieter Wrote: If this really got implemented it should quit with an error message as soon as ...

I'm slightly embarrassed that this bit of code made it in and has been unnoticed for so long Sad

I don't know what I was thinking when I implemented it -- probably rushing through recoding all the distributions.

The good news is the modified Student's T code is in both the 31S and the 34S. Should be a nice speed up and bypasses the poor implementation of the one degree of freedom case.

Quote:And I would love to know how fast this might run in compiled C code, e.g. on a 31s. ;-)

I suspect not as much faster as we'd like. The incomplete beta function, which is the heart of of the CDF, is in C already. Likewise for LogGamma in the PDF. The rest is predominately shuffling and scaling values which would speed up in C but I doubt they are the bulk of the time.

C would give five more digits in intermediates. This would justify an extra iteration in DP mode I think.

- Pauli

Dieter · (This post was last modified: 05-05-2014 10:05 PM by Dieter.)

(05-05-2014 09:49 PM)Paul Dale Wrote: Changing the convergence check will pretty much force an extra iteration to detect the convergence. Thus it will cause a slow down. So an increase to the maximum to 6 seems reasonable "just to be sure".

Changing the convergence check from 1E-14 to 1E-24 equals roughly half an iteration (on average). ;-) I am not sure if I understand this correctly, but do you say that CNVG? 00 which usually tests a relative error < 1E–14 automatically switches to 1E–24 if run in XROM? I think that 1E–14 is on the safe side.

Quote:No, I don't have an example that increases accuracy. I'm leaning towards the minimum of thirty digits being sufficient in DP, since we don't guarantee accuracy there.

Yes, that makes sense since this is what can be achieved with 34-digit user code. The one or other last digit may be off. For my earlier humble contributions to the 34s project I tried to get 32 digits +/– 1 ULP.

Dieter

Paul Dale · 05-05-2014, 10:16 PM

(05-05-2014 10:04 PM)Dieter Wrote: I am not sure if I understand this correctly, but do you say that CNVG? 00 which usually tests a relative error < 1E–14 automatically switches to 1E–24 if run in XROM? I think that 1E–14 is on the safe side.

No. CNVG? 00 will always check against 1E-14. Depending on mode, CNVG? 03 chooses either 1E-14 or 1E-32 in user code but 1E-24 or 1E-32 in XROM.

- Pauli

Dieter · (This post was last modified: 05-05-2014 11:10 PM by Dieter.)

(05-05-2014 10:16 PM)Paul Dale Wrote: CNVG? 00 will always check against 1E-14. Depending on mode, CNVG? 03 chooses either 1E-14 or 1E-32 in user code but 1E-24 or 1E-32 in XROM.

I now did the following test: After the quantile has been computed, a relative error of 1E–12 was added back to the result and the iteration restarted. Some code was added to display the relative error of the current approximation. As expected, the next iteration showed 1E–12. The following one mostly displayed a plain zero (no correction, result is considered exact) or an error near 1...5E–34. In two cases a correction near 1...3E–33 was applied.

In other words, even if the last correction was 1E–12, it looks like we can expect 32...34 valid digits. So I assume a test against 1E–14 should be on the safe side.

Dieter

Paul Dale · 05-06-2014, 12:54 AM

(05-05-2014 11:09 PM)Dieter Wrote: In other words, even if the last correction was 1E–12, it looks like we can expect 32...34 valid digits. So I assume a test against 1E–14 should be on the safe side.

Wouldn't we have to force one additional iteration after achieving the convergence precision to guarantee this level of accuracy? A value that differed from the previous by, say, 1E-15, would pass the convergence test yet not be good.

I'm thinking along the lines of:

Code:

FS? .02

JMP exit

CNVG? 00

SF .02

instead of the:

Code:

CNVG? 00

JMP exit

- Pauli

Dieter · (This post was last modified: 05-06-2014 11:08 PM by Dieter.)

(05-06-2014 12:54 AM)Paul Dale Wrote: Wouldn't we have to force one additional iteration after achieving the convergence precision to guarantee this level of accuracy? A value that differed from the previous by, say, 1E-15, would pass the convergence test yet not be good.

If the last correction was 1E-15, this was the error of the previous approximation. Since Halley's method converges roughly cubic, the expected remaining error after this correction is below 1E-40.

Quote:I'm thinking along the lines of:

Code:

FS? .02 JMP exit CNVG? 00 SF .02

instead of the:

Code:

CNVG? 00 JMP exit

That's the method that is used in Lambert's W function as well as (sort of) in the Integrate routine. But I think we can save this additional iteration.

I set up a test program for the emulator and tested a dozen of different input values from 0,4999999999999999 to 1E-300, with 1 to 10000 degrees of freedom (including every single value between 1 and 100). The results showed an interesting pattern:

1. The number of uncertain (last) digits in a perfectly converged result is quite uniformly the number of digits in the degrees of freedom. In other words: for n=1...9 you can expect 33 valid digits, for 10...99 it's 32 digits, for n=100..999 it's 31 digits, and so on. I assume this is because of the term x² + n within the iteration loop.

2. This level of accuracy is reached as soon as the first 12 digits agree. So with 14 matching digits we should be on the safe side, as already mentioned. I think that this should be sufficient even for a 39-digit C function.

3. and most important: further iterations do not seem to guarantee a better result. Unless CDF(t) – p is exactly zero, they just cycle through various permutations of the last digits.

Example: p=0,1 with 20 degrees of freedom.

Initial guess for t: 1,312400768070987055677223263645284
After 1st iteration: 1,325340121782327835195769644183172
After 2nd iteration: 1,325340706985046343045588015602724
After 3rd iteration: 1,325340706985046343099953113344022
*** at this point CNVG 00 considers the quantile converged ***
After 4th iteration: 1,325340706985046343099953113344020
After 5th iteration: 1,325340706985046343099953113344022
After 6th iteration: 1,325340706985046343099953113344020
...

True 34-digit value: 1,325340706985046343099953113344018

4. There are cases where an additional iteration corrects the last digit(s) and then the approximation does not change any more. This does not neccessarily mean that the added iteration improves the result.

Example: p=0,05 with 20 degrees of freedom

After 3rd iteration: 1,724718242920787272832061725382084

This is the exact 34-digit result. Further iterations change this to ...082 and then stay there.

That's why I think that CNVG 00 should be fine here.

Dieter

Paul Dale · 05-07-2014, 12:12 AM

(05-06-2014 11:01 PM)Dieter Wrote: If the last correction was 1E-15, this was the error of the previous approximation. Since Halley's method converges roughly cubic, the expected remaining error after this correction is below 1E-40.

I understand now.
We'll stick with CNVG? 00.

- Pauli

Dieter · 05-08-2014, 06:26 PM

(05-07-2014 12:12 AM)Paul Dale Wrote: I understand now.
We'll stick with CNVG? 00.

I now tend to think the accuracy limit is not in the quantile code but in the Student CDF itself (resp. in the probably underlying Incomplete Beta function). Comparing some results with the 34s t_u function showed the well-known pattern:

Code:

t=1, dof=5

true:  0.1816087338245613128000742599987684

WP34s: 0.1816087338245613128000742599987684   = 34 correct digits

t=2, dof=5

true:  0.05096973941492917812268055292114372

WP34s: 0.05096973941492917812268055292114375  = 33 correct digits

t=1, dof=20

true:  0.1646282885858545321288508309266107

WP34s: 0.1646282885858545321288508309266112   = 32 correct digits

t=2, dof=20

true:  0.02963276772328523648480977051479628

WP34s: 0.02963276772328523648480977051479624  = 33 correct digits

t=2, dof=100

true:  0.02410608936556683980047800955979969

WP34s: 0.02410608936556683980047800955980203  = 31 correct digits

t=2, dof=300

true:  0.02320076049157454520121192249335514

WP34s: 0.02320076049157454520121192249334880  = 31 correct digits

t=1, dof=2000

true:  0.1587157390503716353822190042123339

WP34s: 0.1587157390503716353822190042125858   = 30 correct digits

t=2, dof=2000

true:  0.02281763665584740680547795791001989

WP34s: 0.02281763665584740680547795791000417  = 30 correct digits

t=1, dof = 20000

true:  0.1586613031239535484612795731910335

WP34s: 0.1586613031239535484612795731911468   = 30 correct digits

This pattern is consistent with information on the accuracy of the one or other Incomplete Beta algorithm included in mathematical software libraries. As a rule of thumb the number of potentially inaccurate digits seems to match the number of digits in the degrees of freedom.

Which leads to the question if an upper limit in the dof value would make sense. For instance a maximum of 1000. This would apply to the Student as well as the Chi² and F distribution.

Dieter

Paul Dale · 05-08-2014, 10:28 PM

(05-08-2014 06:26 PM)Dieter Wrote: Which leads to the question if an upper limit in the dof value would make sense. For instance a maximum of 1000. This would apply to the Student as well as the Chi² and F distribution.

Given we're only promising accuracy for single precision, there are many orders of magnitude left before the risk of digit loss becomes significant. I don't think I'd want to impose a limit suitable for DP that restricts results in SP, likewise I don't want a computation that errors out in DP but not SP.

For Student's-T and Chi² distributions we could approximate with a normal for very high degrees of freedom I guess. I don't remember the asymptotic properties of the F distribution when one or both degrees of freedom get large.

- Pauli

Dieter · 05-08-2014, 10:58 PM

(05-08-2014 10:28 PM)Paul Dale Wrote: Given we're only promising accuracy for single precision, there are many orders of magnitude left before the risk of digit loss becomes significant.

The user may enter dof values that lead to problems and strange results. At the moment I am looking at the Chi² quantile. For Chi² = n the CDF should be near 0,5 (slightly larger). Try this with n=2000 and you get 0,5042... (correct). Then try n=3000 and the 34s returns zero (!).

Quote:For Student's-T and Chi² distributions we could approximate with a normal for very high degrees of freedom I guess. I don't remember the asymptotic properties of the F distribution when one or both degrees of freedom get large.

I think that indeed very large dof values will be required until the Normal CDF matches the others only for SP accuracy. ;-)

Dieter