qreg() quadratic regression with r
12-31-2017, 12:50 AM (This post was last modified: 12-31-2017 01:21 AM by TheKaneB.)
Hi,

I noticed that the HP Prime Statistics 2Var app has the very useful quadratic regression fit, but as far as I can tell it doesn't calculate the correlation coefficient "r" (it just gives NaN).
So I thought it would be a fun exercise to reimplement quadratic regression from scratch, including the r calculation as well.
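In case it helps to read the code, these are the quantities both versions below compute (bars denote sample means over the n data points; they follow the formulas linked under Sources at the end of the post):

\[
\begin{aligned}
S_{xx} &= \overline{x^2} - \bar{x}^2, \qquad
S_{xy} = \overline{xy} - \bar{x}\,\bar{y}, \qquad
S_{xx^2} = \overline{x^3} - \bar{x}\,\overline{x^2},\\
S_{x^2x^2} &= \overline{x^4} - \overline{x^2}^{\,2}, \qquad
S_{x^2y} = \overline{x^2 y} - \overline{x^2}\,\bar{y},\\
B &= \frac{S_{xy}\,S_{x^2x^2} - S_{x^2y}\,S_{xx^2}}{S_{xx}\,S_{x^2x^2} - S_{xx^2}^2}, \qquad
C = \frac{S_{x^2y}\,S_{xx} - S_{xy}\,S_{xx^2}}{S_{xx}\,S_{x^2x^2} - S_{xx^2}^2}, \qquad
A = \bar{y} - B\,\bar{x} - C\,\overline{x^2},\\
r &= \sqrt{1 - \frac{\sum_i \bigl(y_i - (A + B x_i + C x_i^2)\bigr)^2}{\sum_i (y_i - \bar{y})^2}}.
\end{aligned}
\]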

Code:

EXPORT qreg(data)
BEGIN
  LOCAL n := SIZE(data);
  LOCAL n_inv := 1.0 / n;
  LOCAL sigmaX := 0, sigmaY := 0, sigmaX2 := 0, sigmaX3 := 0, sigmaX4 := 0, sigmaXY := 0, sigmaX2Y := 0;

  // Accumulate the raw sums in a single pass over the data
  LOCAL i, x, y, x2, x3, x4, xy, x2y;
  FOR i FROM 1 TO n DO
    x := data[i, 1];
    y := data[i, 2];
    x2 := x * x;
    x3 := x2 * x;
    x4 := x2 * x2;
    xy := x * y;
    x2y := x2 * y;
    sigmaX := sigmaX + x;
    sigmaY := sigmaY + y;
    sigmaX2 := sigmaX2 + x2;
    sigmaX3 := sigmaX3 + x3;
    sigmaX4 := sigmaX4 + x4;
    sigmaXY := sigmaXY + xy;
    sigmaX2Y := sigmaX2Y + x2y;
  END;

  // Sample means of x, x^2 and y
  LOCAL x_m := sigmaX * n_inv;
  LOCAL x2_m := sigmaX2 * n_inv;
  LOCAL x_m2 := x_m * x_m;
  LOCAL y_m := sigmaY * n_inv;

  // Centered second-order moments
  LOCAL Sxx := sigmaX2 * n_inv - x_m2;
  LOCAL Sxy := sigmaXY * n_inv - x_m * y_m;
  LOCAL Sxx2 := sigmaX3 * n_inv - x_m * x2_m;
  LOCAL Sx2x2 := sigmaX4 * n_inv - x2_m * x2_m;
  LOCAL Sx2y := sigmaX2Y * n_inv - x2_m * y_m;

  // Coefficients of the parabola y = A + B*x + C*x^2
  LOCAL comm := 1.0 / (Sxx * Sx2x2 - Sxx2 * Sxx2);
  LOCAL B := (Sxy * Sx2x2 - Sx2y * Sxx2) * comm;
  LOCAL C := (Sx2y * Sxx - Sxy * Sxx2) * comm;
  LOCAL A := y_m - B * x_m - C * x2_m;

  // Correlation coefficient: r = sqrt(1 - SSres / SStot)
  LOCAL sigmaUp := 0;
  LOCAL sigmaDown := 0;
  FOR i FROM 1 TO n DO
    x := data[i, 1];
    y := data[i, 2];
    sigmaUp := sigmaUp + (y - (A + B * x + C * x * x))^2;
    sigmaDown := sigmaDown + (y - y_m)^2;
  END;

  LOCAL r := sqrt(1.0 - sigmaUp / sigmaDown);

  RETURN {C, B, A, r};
END;

Example usage:
Code:

L1 := {{1,1}, {2.1, 4}, {3.98, 16.2}}
L2 := qreg(L1)

The returned list gives the coefficients of a parabola with the formula f(x) = A + Bx + Cx^2. The order of the coefficients is chosen to be compatible with the built-in function "polynomial_regression".
The fourth element of the list, L2(4), is the correlation coefficient.

You can run both the built-in polynomial_regression(L1, 2) and qreg(L1) and compare the results. From a few totally non-scientific tests that I made, the differences seem to be around 1e-10 on average.

Sources:
Formula used http://keisan.casio.com/exec/system/14059932254941

I believe that the coefficients are correct to at least 8 significant digits, but I'm not 100% sure.
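If you want a second, independent cross-check outside the Prime, numpy's polyfit (not the built-in mentioned above, just a quick sketch using a library I'm assuming you have around) returns the degree-2 coefficients highest power first, i.e. in the same {C, B, A} order as qreg():
Code:

import numpy as np

# Same toy data as the usage example above, as (x, y) pairs
pts = [(1, 1), (2.1, 4), (3.98, 16.2)]
xs = [p[0] for p in pts]
ys = [p[1] for p in pts]

# np.polyfit with deg=2 returns [C, B, A], highest power first,
# so these should agree with the first three elements of qreg(L1)
C, B, A = np.polyfit(xs, ys, 2)
print('C = {}, B = {}, A = {}'.format(C, B, A))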

Here is another implementation I did in Python, to check that the algorithm was correct (I find it easier to do a prototype in Python and then translate it to HP PPL later).
Code:
import math

# The data in pairs (mm, GB)
data = [
    (45, 0),
    (90, 2.39),
    (117, 4.7),
    (58, 0.34),
    (97, 2.68),
    (114, 4.18),
    (106, 3.41),
    (86, 1.93)
]

# Accumulate all the sums
n = len(data)
n_inv = 1.0 / n
sigmaX = sigmaY = sigmaX2 = sigmaX3 = sigmaX4 = sigmaXY = sigmaX2Y = 0
for x,y in data:
    x2 = x * x
    x3 = x2 * x
    x4 = x2 * x2
    xy = x * y
    x2y = x2 * y
    sigmaX += x
    sigmaY += y
    sigmaX2 += x2
    sigmaX3 += x3
    sigmaX4 += x4
    sigmaXY += xy
    sigmaX2Y += x2y

x_m = sigmaX * n_inv
x2_m = sigmaX2 * n_inv
x_m2 = x_m * x_m
y_m = sigmaY * n_inv

# Calculate the coefficients
Sxx = sigmaX2 * n_inv - x_m2
Sxy = sigmaXY * n_inv - x_m * y_m
Sxx2 = sigmaX3 * n_inv - x_m * x2_m
Sx2x2 = sigmaX4 * n_inv - x2_m * x2_m
Sx2y = sigmaX2Y * n_inv - x2_m * y_m

# Calculate the parabola y = A + Bx + Cx^2
comm = 1.0 / (Sxx * Sx2x2 - Sxx2 * Sxx2)
B = (Sxy * Sx2x2 - Sx2y * Sxx2) * comm
C = (Sx2y * Sxx - Sxy * Sxx2) * comm
A = y_m - B * x_m - C * x2_m

def f(x):
    return A + B * x + C * x * x

# calculate the correlation coefficient r
sigmaUp = sigmaDown = 0
for x,y in data:
    sigmaUp += (y - f(x)) ** 2
    sigmaDown += (y - y_m) ** 2

r = math.sqrt(1.0 - sigmaUp / sigmaDown)

print('f(x) = {} x^2 + {} x + {}'.format(C, B, A))
print('r = {}'.format(r))

The Python version has embedded test data and prints the results. As I said, it's just a quick prototype, not intended for real use, so I didn't bother adding proper argument parsing from the command line.

I hope you find it useful. I plan on translating it to the HP 35S later on, if I have time.

EDIT: about the example data in the Python code. It's a list of measured diameters (in millimeters) and sizes (in GB) of the reflective layer of several DVD-Rs that I've had around my house. As you may know, you can easily see the occupied area of a DVD-R, so I thought I would work out a formula to approximate the space left on a disc just by measuring the diameter of the written (lighter) area. Since the area grows with the square of the diameter, a quadratic regression fit seemed appropriate for this task.
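If it helps to see what I mean, here is a rough sketch of that calculation, appended to the Python prototype above (the helper name and the 4.7 GB single-layer capacity are my own assumptions, not part of the prototype):
Code:

# Hypothetical helper: estimate the free space on a 4.7 GB single-layer
# DVD-R from the measured diameter (in mm) of the written area, using the
# fitted f(x) defined in the prototype above.
DISC_CAPACITY_GB = 4.7  # assumption: single-layer DVD-R

def free_space_gb(diameter_mm):
    used_gb = f(diameter_mm)  # fitted GB written for this diameter
    # Clamp to the physically meaningful range [0, DISC_CAPACITY_GB]
    return min(DISC_CAPACITY_GB, max(0.0, DISC_CAPACITY_GB - used_gb))

# Example: a disc whose written (lighter) area measures 100 mm across
print('free space at 100 mm: {:.2f} GB'.format(free_space_gb(100)))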



Software Failure: Guru Meditation

--
Antonio
IU2KIY