HP Forums
Accuracy of Quadratic Regression - Printable Version

+- HP Forums (https://www.hpmuseum.org/forum)
+-- Forum: HP Calculators (and very old HP Computers) (/forum-3.html)
+--- Forum: HP Prime (/forum-5.html)
+--- Thread: Accuracy of Quadratic Regression (/thread-20349.html)



Accuracy of Quadratic Regression - bernhardSA - 08-20-2023 05:07 PM

Dear HP Prime User Thread

There seems to be a subtlety with the quadratic regression that I cannot figure out: the HP Prime results for a sample problem are subtly different from those from MS Excel, an on line tool (https://ezcalc.me/quadratic-regression-calculator/) and TI-84

x y
2003 813
2004 941
2005 962
2006 1053
2007 1132
2008 1194
2009 1205
2010 1244
2011 1254
2012 1262

-> Excel, and other tools calculate the quadratic fit as:
y = -5.46969697 X^2 + 22010.28788 X -22141315.33
-> HP calculates it is:
y = -5.47571307 X^2 + 22034.44253 X -22165560.54

It is close ... but annoyingly different. I played with HOME/CAS settings and set home number format to standard and floating 11 and CAS format to standard 12 to no avail.

I am not sure if this is related to floating point accuracy (although I thought that HP Prime has better accuracy than TI-84), or I may be missing a special setting.

I am a long time HP28S and HP4G user and HP fan in general and recently purchased my daughter an HP Prime. I would appreciate help/insight into this scenario

Thx

PS: The linear fit matches the other calculators exactly


RE: Accuracy of Quadratic Regression - parisse - 08-21-2023 10:03 AM

The correct coefficients are [-361/66,1452679/66,-66423946/3], you can get them from CAS with the following commands:
Code:
m:=[[2003,813],[2004,941],[2005,962],[2006,1053],[2007,1132],[2008,1194],[2009,1205],[2010,1244],[2011,1254],[2012,1262]];
polynomial_regression(m,2);
They match the numerical values from Excel and co. The HP home values are indeed inaccurate.


RE: Accuracy of Quadratic Regression - Albert Chan - 08-21-2023 05:52 PM

(08-20-2023 05:07 PM)bernhardSA Wrote:  I am not sure if this is related to floating point accuracy

Yes, it is due to floating point precision. (HOME have only 12 decimals)

[Image: Snip20191206_1.png]

To solve for quadratic best fit, we need upto Σx²y (top-right, m=2)
Update: Worse, for quadratic fit, we need Σx4 (top-left, m=2)

Generalized mean, M4 ≥ M1

OP example, Σx4 ≥ ((2003+2012)/2)^4 * 10 ≈ 1.624E14      // 15 digits!

To get best fit coefficients, it may be better to reduce both x and y from its mean.

We can simply knock down year 2000, (2003 .. 2012) --> (3 .. 12)
APPS quadratic fits get Y = -5.4696969697*X^2+131.5*X+472.545454545

CAS> symb2poly(-5.4696969697*X^2+131.5*X+472.545454545 | X=x-2000

[−5.4696969697, 22010.2878788, −22141315.3333]

This matched Excel results, y = -5.46969697 x^2 + 22010.28788 x -22141315.33


RE: Accuracy of Quadratic Regression - Steve Simpkin - 08-21-2023 08:10 PM

Interesting that the TI-89 calculates the quadratic fit very close to the Excel and other tools values. Internally, the TI-89 calculates and retains all decimal results with up to 14 significant digits (although a maximum of 12 are displayed).

[attachment=12438]

Update: Oddly the HP 50g does not calculate Quadratic regression as a built-in regression type. The program PREGR appears to be able to do this (and more).
https://www.hpcalc.org/details/3151


RE: Accuracy of Quadratic Regression - Steve Simpkin - 08-21-2023 09:21 PM

For reference, the least expensive calculator that I have that can perform Quadratic Regression is the Casio fx-991EX (about $22 USD). Its answer matched the Excel results exactly. For Statistics, this model will remember the data until you switch out of Statistics mode.

See photo of results at:
https://drive.google.com/file/d/1JUvxsNxl9W1VlcLARqhq9bugZ1pOUIu9/view?usp=sharing


Update:
Here are the results of the Quadratic Regression calculation on the TI-36X Pro (about $20 USD). I used the on-line TI-36X Pro Emulator for this example. Its answer also matched the Excel results exactly.

See photo of results at:
https://drive.google.com/file/d/14BazIaV-vaiLC_Yus98BqK49f_2kdz1j/view?usp=sharing


RE: Accuracy of Quadratic Regression - bernhardSA - 08-21-2023 09:33 PM

Dear Thread Responders

I appreciate the quick and insightful answers. This has been very useful. The first set of "less accurate" results were via the Statistics2Var app. I have been able to get the same high accuracy results with CAS command line programming, and the modified x-vector in the statistics app as suggested.

It is a little surprising that HP chose to segregate the lower precision for HOME and higher precision for CAS. Why not simply do all the math at highest precision and display in most accurate floating point format or super snazzy symbolic/fraction/exact format as required by user. A high school student may not care (or appreciate HOME vs CAS subtlety) - they just need to get the same answer as the teacher's solution key, and using the statistics app may seem faster more attractive than explicitly typing in matrices and commands ... it stands to reason that they might expect the same answer regardless of the path.

As a longstanding old school HP user and fan, and one who was trying to convert his high school freshman child to HP and RPN, this has been an unexpected speedbump.

Again thx for the quick help (I will keep the HP and my high schooler will now get a TI)


RE: Accuracy of Quadratic Regression - parisse - 08-22-2023 06:35 AM

(08-21-2023 09:33 PM)bernhardSA Wrote:  As a longstanding old school HP user and fan, and one who was trying to convert his high school freshman child to HP and RPN, this has been an unexpected speedbump.

Again thx for the quick help (I will keep the HP and my high schooler will now get a TI)
I believe this is a disappointing conclusion. Getting exactly the same answer as the teacher should not be that important. This difference could be a starting point to have a deeper understanding of the math behind :
1/ Floating point is not exact, and depending on the representation you will get different results for the same computation. For bad conditionned problems, this might be very different. Here if you define y1:=-5.46969697 x^2 + 22010.28788 x -22141315.33 and y2:= -5.47571307 x^2 + 22034.44253 x -22165560.54 and plot(y2-y1,x=2003..2010), you see that the absolute difference is less than about 0.05 for data about 1e3, that's a relative error of less than 0.01%, something that is negligible.
2/ Accuracy can sometimes be improved, here by translating the data.
3/ It is a good idea to check how the answer changes with slightly modified data, this will spot badly conditionned problems. Quadratic regression is more sensitive than a linear regression (and higher degree regressions would be much more sensitive). It should not be used to extrapolate far from the initial data x interval.


RE: Accuracy of Quadratic Regression - Werner - 08-22-2023 07:27 AM

(08-21-2023 09:33 PM)bernhardSA Wrote:  A high school student may not care [...]

A high school student should be aware of what they are doing.
Calculation devices have limits, and we run into one here.
My regular MLR/POLR regression programs for the 42S fail utterly here, not even getting a single digit correct; the version with orthogonal factorization gets 5-6 digits correct (and is thus already better than the Prime's result in HOME).
But the ultimate lesson is what Albert shows us here (and in a zillion other posts): have a look at the input data, and remove possible causes of numerical instability or inaccuracy.

Cheers, Werner


RE: Accuracy of Quadratic Regression - Joe Horn - 08-22-2023 11:00 AM

(08-21-2023 05:52 PM)Albert Chan Wrote:  To get best fit coefficients, it may be better to reduce both x and y from its mean.
For your example, we can simply knock down year 2000, (2003 .. 2012) --> (3 .. 12)

(08-22-2023 06:35 AM)parisse Wrote:  2/ Accuracy can sometimes be improved, here by translating the data.

Most of the manuals for the older HP calculators explained this idea of "normalizing" data to avoid roundoff errors. For example, here's what the HP-32S manual says:

HP-32S Owner's Handbook Wrote:
Limitations on Precision of Data

Since the calculator uses finite precision (12 to 15 digits), it follows that there are limitations to calculations due to rounding.

Normalizing Close, Large Numbers.
The calculator might be unable to correctly calculate the standard deviation and linear regression for a variable whose data values differ by a relatively small amount. To avoid this, normalize the data by entering each value as the difference from one central value (such as the mean). For normalized x-values, this difference must then be added back to the calculation of x-bar and x-hat, and y-hat and b must also be adjusted. For example, if your x-values were 7776999, 7777000, and 7777001, you should enter the data as -1, 0, and 1; then add 7777000 back to x-bar and x-hat. For b, add back 7777000 × m. To calculate y-hat, be sure to supply an x-value that is less 7777000.

A cursory search did not find a similar explanation in the HP 50g or Prime owner's manuals.


RE: Accuracy of Quadratic Regression - Steve Simpkin - 08-22-2023 12:22 PM

(08-21-2023 09:21 PM)Steve Simpkin Wrote:  For reference, the least expensive calculator that I have that can perform Quadratic Regression is the Casio fx-991EX (about $22 USD). Its answer matched the Excel results exactly. For Statistics, this model will remember the data until you switch out of Statistics mode.

See photo of results at:
https://drive.google.com/file/d/1JUvxsNxl9W1VlcLARqhq9bugZ1pOUIu9/view?usp=sharing


Update:
Here are the results of the Quadratic Regression calculation on the TI-36X Pro (about $20 USD). I used the on-line TI-36X Pro Emulator for this example. Its answer also matched the Excel results exactly.

See photo of results at:
https://drive.google.com/file/d/14BazIaV-vaiLC_Yus98BqK49f_2kdz1j/view?usp=sharing

Not only did the Casio fx-991EX come up with an accurate answer, its QR code feature displayed all of the data in a table, the quadratic coefficients and a graph of the data on the following web page:
Example of Quadratic Regression problem

[attachment=12441]

The web page allows you to edit or add additional data to the data in the table, zoom or scroll on the graph, add notes and change many aspects of the results.
Pretty neat for a $22 calculator.


RE: Accuracy of Quadratic Regression - KeithB - 08-22-2023 02:15 PM

I might have done something wrong, but the CFIT program in the curve fitting pack on my HP71 could not even converge using the POLY program!


RE: Accuracy of Quadratic Regression - Albert Chan - 08-22-2023 10:56 PM

If we use data relative to its mean, and X is uniformly spaced, by symmetry, Σ(x^(2k+1)) = 0
This saved a lot of calculations!

X = [2003,2004,2005,2006,2007,2008,2009,2010,2011,2012]
Y = [813,941,962,1053,1132,1194,1205,1244,1254,1262]

x = X - X̅ = X - 2007.5
y = Y - Y̅ = Y - 1106.0

a*Σx³ + b*Σx² + c*Σx = Σxy         // uniform X spacing, a, c terms goes away

b = Σxy / Σx² = 4080 / 82.5 = 544/11

a*Σx² + b*Σx + c*Σ1 = Σy = 0      // uniform X spacing, b terms goes away
a*Σx4 + b*Σx³ + c*Σx² = Σx²y

a = (Σy Σx² - Σ1 Σx²y) / (Σx² Σx² - Σ1 Σx4) = (0 - 10*-2888) / (82.5^2 - 10*1208.625) = -361/66

c = (Σy - b*Σx - a*Σx²) / Σ1 = -a*Σx² / Σ1 = 361/66 * 82.5 / 10 = 361/8

(Y-1106.0) = (-361/66)*(X-2007.5)^2 + (544/11)*(X-2007.5) + 361/8

--> Y ≈ -5.4696969697*X^2 + 22010.2878788*X - 22141315.3333


RE: Accuracy of Quadratic Regression - Albert Chan - 08-23-2023 11:45 AM

(08-22-2023 10:56 PM)Albert Chan Wrote:  c = (Σy - b*Σx - a*Σx²) / Σ1

This is how Casio FX-3650P get quadratic regression constant term.
With c out of the way, we simply solve for 2 equations with 2 unknown.

\(\begin{bmatrix}
Σ1\;ΣX^2 - ΣX\;ΣX & Σ1\;ΣX^3 - ΣX\;ΣX^2\\
Σ1\;ΣX^3 - ΣX\;ΣX^2 & Σ1\;ΣX^4 - ΣX^2\;ΣX^2
\end{bmatrix}
\begin{bmatrix}
b \\
a \end{bmatrix}
=
\begin{bmatrix}
Σ1\;ΣXY - ΣX\;ΣY\\
Σ1\;ΣX^2Y - ΣX^2\;ΣY
\end{bmatrix}
\)

Divide both side by n = Σ1, we have Casio's notation

\(\begin{bmatrix}
SXX & SXX^2\\
SXX^2 & SX^2X^2
\end{bmatrix}
\begin{bmatrix}
b \\
a \end{bmatrix}
=
\begin{bmatrix}
SXY\\
SX^2Y
\end{bmatrix}
\)


RE: Accuracy of Quadratic Regression - Albert Chan - 08-25-2023 12:06 AM

(08-22-2023 07:27 AM)Werner Wrote:  My regular MLR/POLR regression programs for the 42S fail utterly here, not even getting a single digit correct; the version with orthogonal factorization gets 5-6 digits correct (and is thus already better than the Prime's result in HOME).

X = 2003 .. 2012 = 2007.5 ± 4.5
With huge center, tiny gap, all the equations are similar, if scaled by \(\bar{X}\)

\(\begin{bmatrix}
1.00001228271 & 1.00000614135 & 1.00000204712\\
1.00000614135 & 1.00000204712 & 1.0\\
1.00000204712 & 1.0 & 1.0
\end{bmatrix}
\begin{bmatrix}
a\;\bar{X}^2 \\ b\;\bar{X} \\ c
\end{bmatrix}
=
\begin{bmatrix}
1106.40866817 \\ 1106.20323786 \\ 1106.0
\end{bmatrix}
\)

This make the matrix ill-conditioned. Tiniest of rounding errors may cause trouble.

Starting from original matrix m, we pre-treat it a bit before solving it.

\(\left[\begin{array}{cccc}162415528660693&80903876075&40300645&44588891682\\80903876075&40300645&20075&22207030\\40300645&20075&10&11060\end{array}\right]\)

Cas> m := float(m);                 // work with float ... no cheating
Cas> m(1) -= 2007 * m(2);
Cas> m(2) -= 2007 * m(3);

\(\left(\begin{array}{cccc}
41449378168 & 20481560.0 & 10120.0 & 19382472.0 \\
20481560.0 & 10120.0 & 5.0 & 9610.0 \\
40300645.0 & 20075.0 & 10.0 & 11060.0
\end{array}\right) \)

Cas> m(1) -= 2023 * m(2);      // once more, for good measure

\(\left(\begin{array}{cccc}
15182288.0 & 8800.0 & 5.0 & -58558.0 \\
20481560.0 & 10120.0 & 5.0 & 9610.0 \\
40300645.0 & 20075.0 & 10.0 & 11060.0
\end{array}\right) \)

Equations are now very different, easy to solve.

Cas> rref(m)

\(\left(\begin{array}{cccc}
1.0 & 0.0 & 0.0 & -5.46969696961 \\
0.0 & 1.0 & 0.0 & 22010.2878784 \\
0.0 & 0.0 & 1.0 & -22141315.333
\end{array}\right)\)


RE: Accuracy of Quadratic Regression - Hollerith - 09-05-2023 11:28 AM

HP did better job with the HP38G in 1995, 18 years before the Prime was released.

The HP38G (and HP40GS) results :-

-5.46969698931 x^2 + 22010.2879576 x - 2214135.4125

Not as good as the "Excel results" but a lot better than the Prime.

The HP38G like the Prime uses BCD arithmetic with a 12 digit mantissa and 3 digit exponent.

I suspect there is an issue with the recoding of the BCD math routines from HP38/9/40 System RPL to platform-independent C for the HP39Gii which was then used in the Prime. The HP39Gii gives the same result as the Prime.


RE: Accuracy of Quadratic Regression - jte - 09-05-2023 11:28 PM

(08-21-2023 05:52 PM)Albert Chan Wrote:  
Yes, it is due to floating point precision. (HOME have only 12 decimals)

For conceptual simplicity / implementation ease, I used higher precision arithmetics for the curve fitting in the Function app’s ”Sketch” feature. (129-bit sign-magnitude for quadratic fits and 161-bit sign-magnitude for cubic fits.)

My intention to use higher precision there was not just based on wanting to rule out any round-off during the production of the entries of the design matrix, but to also bound the round-off involved when solving for the coefficients (of the quadratic / cubic fit).


RE: Accuracy of Quadratic Regression - Hollerith - 09-06-2023 06:18 PM

Why do people keep on saying the low accuracy is due to the Home floating point precision being 12 digits when it is clear that this is not the case? Please explain how HP managed to do it better with 12 digit precision on the HP38G from 28 years ago.