Small Solver Program
02-20-2019, 05:31 AM
Post: #21
RE: Small Solver Program
(02-19-2019 08:39 AM)Csaba Tizedes Wrote:  I have another idea: what if we just walk along the function's curve until the sign changes, then turn back, decrease the step size and walk again? When the sign of the function changes, we turn back again, decrease the size of the steps, and so on... If the step size is less than a small value, we have found the root.

Or we start with two points whose function values have opposite signs and then evaluate the function at the midpoint.
Based on its sign, the next step is just a jump to the left or a jump to the right.



Bang: we've just invented binary search.
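
In Python the whole idea fits in a few lines (a throwaway sketch; f, a, b and eps are placeholder names, and f(a), f(b) are assumed to have opposite signs):

Code:
def bisect(f, a, b, eps=1e-10):
    fa = f(a)
    while abs(b - a) > eps:
        m = (a + b) / 2
        fm = f(m)
        if fa * fm > 0:      # f(a) and f(m) share a sign: root is in [m, b]
            a, fa = m, fm
        else:                # sign change in [a, m]: jump to the left
            b = m
    return (a + b) / 2

# e.g. bisect(lambda x: x*x - 2, 1, 2) ≈ 1.4142135623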

Cheers
Thomas
02-20-2019, 07:22 AM (This post was last modified: 02-20-2019 08:39 AM by Thomas Klemm.)
Post: #22
RE: Small Solver Program
(02-18-2019 10:22 PM)Thomas Klemm Wrote:  Which I still do not understand in detail.

(02-19-2019 01:10 AM)Albert Chan Wrote:  I think the code adds slope data, like this:

Data points (0,90) and (30, 90.2), add to linear regression: (30+0)/2, (90.2-90)/(30-0)
Next point (60, 86.4), add to linear regression: (60+30)/2, (86.4-90.2)/(60-30)
...
N points (sorted) thus generate N-1 slopes to regress.

If we know two points \((x_1, y_1)\) and \((x_2, y_2)\) on a parabola \(y=ax^2+bx+c\) then we can calculate the slope \(y'=2ax+b\) at their midpoint \(\frac{x_1 + x_2}{2}\):

\(\begin{matrix}
y_1 = ax_1^2+bx_1+c \\
y_2 = ax_2^2+bx_2+c
\end{matrix}\)

\(\begin{align*}
y_2-y_1 &= a(x_2^2-x_1^2)+b(x_2-x_1) \\
&= a(x_2+x_1)(x_2-x_1)+b(x_2-x_1)
\end{align*}\)

\(\begin{align*}
\frac{y_2-y_1}{x_2-x_1} &= a(x_2+x_1)+b \\
&= 2a\frac{x_2+x_1}{2}+b
\end{align*}\)

This is a bit surprising: a parabola is determined by three points, yet the result is independent of the third point.
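
A quick numerical check of this midpoint property (a throwaway sketch with arbitrary coefficients):

Code:
# chord slope between two points on y = ax² + bx + c
# equals the tangent slope y' = 2ax + b at their midpoint
a, b, c = 2.0, -3.0, 1.0
f = lambda x: a*x*x + b*x + c
x1, x2 = 0.5, 4.0
chord   = (f(x2) - f(x1)) / (x2 - x1)
tangent = 2*a*(x1 + x2)/2 + b
print(chord, tangent)   # both 6.0; c never enters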

With linear regression we try to minimise the distance of the vector

\(y=\begin{bmatrix}
y_1\\
\cdot\\
\cdot\\
\cdot\\
y_n
\end{bmatrix}\)

to the 3-dimensional sub-space spanned by

\(x^2=\begin{bmatrix}
x_1^2\\
\cdot\\
\cdot\\
\cdot\\
x_n^2
\end{bmatrix}\, , \,
x=\begin{bmatrix}
x_1\\
\cdot\\
\cdot\\
\cdot\\
x_n
\end{bmatrix}\, , \,
e=\begin{bmatrix}
1\\
\cdot\\
\cdot\\
\cdot\\
1
\end{bmatrix}\)

The solution is the orthogonal projection.
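
In NumPy this projection is an ordinary least-squares solve against the three basis vectors (a sketch; the first three points echo Albert's example above, the fourth is invented):

Code:
import numpy as np

x = np.array([0.0, 30.0, 60.0, 90.0])
y = np.array([90.0, 90.2, 86.4, 78.8])
A = np.column_stack([x**2, x, np.ones_like(x)])   # basis x², x, e
coef, *_ = np.linalg.lstsq(A, y, rcond=None)      # orthogonal projection
print(coef)                                        # best-fit a, b, c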

Now we take Csaba's transformation:

\(\Delta=
\begin{bmatrix}
-\frac{1}{x_2-x_1} & \frac{1}{x_2-x_1} & 0 & 0 & \cdots & 0 \\
0 & -\frac{1}{x_3-x_2} & \frac{1}{x_3-x_2} & 0 & \cdots & 0 \\
\cdot & \cdot & \cdot & \cdot & & \cdot \\
\cdot & \cdot & \cdot & \cdot & & \cdot \\
\cdot & \cdot & \cdot & \cdot & & \cdot \\
0 & 0 & 0 & \cdots & -\frac{1}{x_n-x_{n-1}} & \frac{1}{x_n-x_{n-1}}
\end{bmatrix}\)

and apply it to all the vectors

\(\Delta y=\begin{bmatrix}
\frac{y_2-y_1}{x_2-x_1} \\
\cdot\\
\cdot\\
\cdot\\
\frac{y_n-y_{n-1}}{x_n-x_{n-1}}
\end{bmatrix}
\)

\(\Delta x^2=\begin{bmatrix}
\frac{x_2^2-x_1^2}{x_2-x_1} \\
\cdot\\
\cdot\\
\cdot\\
\frac{x_n^2-x_{n-1}^2}{x_n-x_{n-1}}
\end{bmatrix}=\begin{bmatrix}
x_2+x_1\\
\cdot\\
\cdot\\
\cdot\\
x_n+x_{n-1}
\end{bmatrix}\)

\(\Delta x=\begin{bmatrix}
\frac{x_2-x_1}{x_2-x_1} \\
\cdot\\
\cdot\\
\cdot\\
\frac{x_n-x_{n-1}}{x_n-x_{n-1}}
\end{bmatrix}=\begin{bmatrix}
1\\
\cdot\\
\cdot\\
\cdot\\
1
\end{bmatrix}\)

\(\Delta e=\begin{bmatrix}
0\\
\cdot\\
\cdot\\
\cdot\\
0
\end{bmatrix}\)

Since \(\Delta e = 0\) we realise that we project along the vector \(e\) into an \((n-1)\)-dimensional subspace.
Here we try to minimise the distance of the vector \(\Delta y\) to the 2-dimensional sub-space spanned by \(\Delta x^2\) and \(\Delta x\).
Again the solution is the orthogonal projection.
But clearly the transformation \(\Delta\) isn't orthogonal.
Thus the two solutions aren't mapped onto each other exactly.

What reduces this error?
Csaba suggested sorting the points by their x-coordinates.
In his example this leads to an "isotropic" transformation \(\Delta\), since all entries of the matrix are the same, except those that are 0, of course.
Is that why it works so well?
Is it enough to keep the points in order, or do we also need equal spacing between them?

I could imagine that this leads to a completely wrong result if the transformation \(\Delta\) is a distortion.
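
One way to probe these questions (my own quick experiment, not from the thread): fit the same noisy parabola both ways and compare. With sorted but unevenly spaced points the two solutions come out close, yet not identical:

Code:
import numpy as np

rng = np.random.default_rng(1)
x = np.sort(rng.uniform(0.0, 10.0, 20))      # sorted, unevenly spaced
y = 2*x**2 - 3*x + 1 + rng.normal(0, 0.5, x.size)

# direct fit: project y onto span{x², x, e}
a1, b1, c1 = np.linalg.lstsq(
    np.column_stack([x**2, x, np.ones_like(x)]), y, rcond=None)[0]

# slope regression: slopes against midpoints, then a = m/2
mid   = (x[1:] + x[:-1]) / 2
slope = np.diff(y) / np.diff(x)
m, b2 = np.polyfit(mid, slope, 1)             # slope ≈ 2a·mid + b
print(a1, b1)      # direct projection
print(m/2, b2)     # Δ-transformed fit: close, but not identical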

Cheers
Thomas
02-24-2019, 09:21 AM
Post: #23
RE: Small Solver Program
(02-18-2019 10:22 PM)Thomas Klemm Wrote:  Quadratic fit without linear algebra (32SII) - NOT only for statistics lovers - LONG

(02-19-2019 08:39 AM)Csaba Tizedes Wrote:  But as a programming question this was an interesting "game" for me.

Since the regression is linear, you can remove the division by 2 at two places in your program.
First, when calculating the midpoints:
Code:
RCL X
RCL+ U    #x_k + x_(k+1)
2
/         #midpoint; this division can be dropped
Σ+        #accumulate into the regression

And then later on when calculating A:
Code:
LBL C     #Calculating A, B and C
m         #slope of the regression line
2
/         #this division can be dropped as well
STO A

We can rewrite \(y=m\frac{x_k+x_{k+1}}{2}+b\) as \(y=\frac{m}{2}(x_k+x_{k+1})+b\).
But since \(A=\frac{m}{2}\) this becomes \(y=A(x_k+x_{k+1})+b\).

Cheers
Thomas
02-25-2019, 08:39 PM
Post: #24
RE: Small Solver Program
(02-20-2019 05:31 AM)Thomas Klemm Wrote:  Or we start with two points whose function values have opposite signs and then evaluate the function at the midpoint.
Based on its sign, the next step is just a jump to the left or a jump to the right.
Bang: we've just invented binary search.

Not really, because my idea doesn't need to bracket the root. The algorithm identifies the right direction, then makes one step in that direction. When the sign changes, the direction reverses and the step size is reduced. Repeat until the step size is small enough (at the next sign change).

First I wrote an extremum finder in a similar way for the CASIO fx-4000P; later I reduced its length and coded it on the fx-3650P (see the attachment; variables: A: stop condition, the minimal step size; B: previous function value; C: step direction, +1/-1; D (>0!): actual step size; X: actual x; Y: actual y; M: subroutine return address; the function: 150×9.81×0.48/(cos(x)+0.48×sin(x))).
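
A minimal Python sketch of this walk-and-turn scheme (my own rendering; the shrink factor 10 and the starting direction are arbitrary choices, and the initial direction must head towards a sign change):

Code:
def walk_root(f, x0, step=1.0, eps=1e-9):
    x, y = x0, f(x0)
    d = 1                        # step direction: +1 / -1
    while step > eps:
        x = x + d*step
        y_new = f(x)
        if y * y_new < 0:        # sign changed: turn back, shrink step
            d = -d
            step /= 10
        y = y_new
    return x

# e.g. walk_root(lambda x: x*x - 2, 0.0) ≈ 1.414213562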

And of course, later I found the very same thing implemented on an HP-9100.

I have never studied computer science; maybe my methods were documented and invented many years ago, and of course it is possible to improve them (or more effective ones are available), but my goal is to find the simplest workable method that can be coded immediately.

Csaba


Attached File(s): thumbnail of the fx-3650P program
02-25-2019, 11:00 PM (This post was last modified: 02-25-2019 11:18 PM by Thomas Klemm.)
Post: #25
RE: Small Solver Program
(02-25-2019 08:39 PM)Csaba Tizedes Wrote:  Not really, because my idea doesn't need to bracket the root.

There's nothing wrong with a digit-by-digit algorithm.
However, after the first change of sign you have already bracketed the root.

So let's assume that this interval has length \(I\).
And we want to bring it down to a small value, say \(\epsilon\).

Digit-by-digit method

With each digit we divide the interval by 10.
After \(k\) digits we end up with:

\(\epsilon=\frac{I}{10^k}\)

Thus \(k=\frac{\log I - \log \epsilon}{\log 10}\).
But on average we need 5 steps until we can decide on the digit.
All in all we use about \(\frac{5}{\log 10}(\log I - \log \epsilon)\) steps.

Binary search

We divide the interval consecutively by 2.
After \(k\) bisections we end up with:

\(\epsilon=\frac{I}{2^k}\)

Thus \(k=\frac{\log I - \log \epsilon}{\log 2}\).
We only need 1 step to decide which interval to keep.
All in all we use \(\frac{1}{\log 2}(\log I - \log \epsilon)\) steps.

Conclusion

Now compare \(\frac{5}{\log 10} \approx 2.1715\) with \(\frac{1}{\log 2} \approx 1.4427\) to see that the binary search is about \(1.5051\) times faster than the digit-by-digit method.

The only advantage of a digit-by-digit method is when the function value at the next step can be calculated more easily from the function values of the previous steps.
That's why it's used for calculating square roots or the trigonometric functions (CORDIC).

However, I do not see any way to do this with an arbitrary function like the one given in the video:

\(f(x)=\left (\frac{e \ln(\pi)}{\ln(x)} \right )^x-e^\pi\)
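
To put numbers on the comparison, a quick sketch counting function evaluations for both approaches on this very f (the bracket [3, 4] is my choice; the root is x = π):

Code:
import math

E, PI = math.e, math.pi
f = lambda x: (E * math.log(PI) / math.log(x))**x - E**PI

def bisection_evals(a, b, eps=1e-6):
    fa, n = f(a), 1
    while b - a > eps:
        m = (a + b) / 2
        fm = f(m); n += 1
        if fa * fm > 0: a, fa = m, fm
        else:           b = m
    return n

def digit_evals(a, b, eps=1e-6):
    # walk one tenth of the bracket at a time, keep the
    # sub-interval with the sign change, repeat per digit
    fa, n = f(a), 1
    while b - a > eps:
        step = (b - a) / 10
        while True:
            n += 1
            if f(a + step) * fa <= 0: break
            a += step
        b = a + step
    return n

print(bisection_evals(3, 4), digit_evals(3, 4))   # about 21 vs 29 here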

Cheers
Thomas
11-03-2019, 03:14 PM (This post was last modified: 11-07-2019 01:07 AM by Albert Chan.)
Post: #26
RE: Small Solver Program
(02-16-2019 03:58 AM)Thomas Klemm Wrote:  
(02-15-2019 09:16 PM)Albert Chan Wrote:  Amazingly, my rate formula is same as Aitken extrapolation formula !

That's not really a surprise, is it?
It's essentially the same idea: geometric series

\(1\,+\,r\,+\,r^{2}\,+\,r^{3}\,+\,\cdots \;=\;{\frac {1}{1-r}}\)

Except for r = 1, the rate formula works even if the iteration diverges (|r| > 1)

Example: solve x = f(x) = 5 - x^3, guess = 1.5

With this setup the iteration diverges: 1.5 → 1.625 → 0.708984375 → 4.643622734 ...

From the first 3 numbers, r = (0.708984375 - 1.625) / (1.625 - 1.5) ≈ -7.328

Iterate with x = g(x) = p(5 - x^3) + (1-p)x, where p = 1/(1-r) ≈ 12.0%:

1.5
1.515000000
1.515928095
1.515977481
1.515980083
1.515980220
1.515980227
1.515980228 ← converged

We can speed up the convergence by using r = f'(guess) = -3 * 1.5^2 = -6.75
Iterate with the same g(x), but p = 1/(1-r) ≈ 12.9%:

1.5
1.516125000
1.515977551
1.515980277
1.515980227
1.515980228 ← converged
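
A sketch reproducing the first run in Python (variable names are mine):

Code:
f = lambda x: 5 - x**3

x0 = 1.5
x1 = f(x0)                     # 1.625
x2 = f(x1)                     # 0.708984375
r  = (x2 - x1) / (x1 - x0)     # ≈ -7.328
p  = 1 / (1 - r)               # ≈ 0.120

x = x0
for _ in range(8):
    x = p*f(x) + (1 - p)*x     # relaxed iteration g(x)
    print(x)                   # converges to 1.515980228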
11-10-2019, 07:02 PM
Post: #27
RE: Small Solver Program
(11-03-2019 03:14 PM)Albert Chan Wrote:  Except for r = 1, the rate formula works even if the iteration diverges (|r| > 1)

Proof:

Assuming the rate r = Δx_{i+1} / Δx_i is constant, we extrapolate to the converged value. Write Δx0 = x1 - x0.

if |r| < 1: x∞ = x0 + (x1 - x0) + (x2 - x1) + ... = x0 + Δx0 (1 + r + r² + ...) = x0 + Δx0/(1-r)

if |r| > 1: we can visualize it as iterating back in time, with a rate of 1/r:

x0 - x-∞ = (x0 - x-1) + (x-1 - x-2) + (x-2 - x-3) + ... = (Δx0/r) (1 + 1/r + 1/r² + ...) = Δx0/(r-1)

x-∞ = x0 - Δx0/(r-1) = x0 + Δx0/(1-r)

x∞ and x-∞ have the same expression!

x0 + Δx0/(1-r) = (x0 (1-r) + Δx0) / (1-r) = (x1 - r x0) / (1-r) = (f(x0) - r x0) / (1-r)

rate iteration formula: x ← (f(x) - r x) / (1-r)
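
For reference, with \(p=\frac{1}{1-r}\) this is the same update as the g(x) used in post #26:

\(\frac{f(x)-r\,x}{1-r}=\frac{1}{1-r}f(x)+\left(1-\frac{1}{1-r}\right)x=p\,f(x)+(1-p)\,x\)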
12-01-2019, 12:13 AM (This post was last modified: 01-04-2020 09:58 PM by Albert Chan.)
Post: #28
RE: Small Solver Program
(11-10-2019 07:02 PM)Albert Chan Wrote:  rate iteration formula: x ← (f(x) - r x) / (1-r)

The formula actually approximates Newton's method. :)

Let g(x) = x - f(x) → solving x = f(x) is the same as solving g(x) = 0

g'(x) = 1 - f'(x) ≈ 1 - r

x ← (f(x) - r x) / (1-r) ≈ ((x - g(x)) - (1 - g'(x))*x) / g'(x) = x - g(x) / g'(x)
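
A sketch of the consequence (my own illustration, not from the thread): if we refresh r = f'(x) at every step, the rate iteration becomes Newton's method on g(x) = x - f(x) and converges much faster than with a fixed r:

Code:
f  = lambda x: 5 - x**3
fp = lambda x: -3*x**2          # f'(x)

x = 1.5
for _ in range(6):
    r = fp(x)                   # refresh the rate each step
    x = (f(x) - r*x) / (1 - r)  # = x - g(x)/g'(x)
    print(x)                    # 1.516129..., then 1.515980228 quickly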
01-04-2020, 07:49 PM
Post: #29
RE: Small Solver Program
Just realized the rate formula is equivalent to the secant method!

Post #6 showed the rate formula is equivalent to Aitken's formula.
Post #27 showed we can visualize the sequence running backwards, getting the same formula.

Assume an iteration formula f, with b = f(a) and c = f(b):

\(Aitken(a,b,c) = Aitken(c,b,a) = a - {(a-b)^2 \over (a-b)-(b-c)}\)

If we form the two points \((x_1,y_1) = (a,\,a-b)\) and \((x_2, y_2) = (b,\,b-c)\), we have:

\(Aitken(a,b,c) = x_1 - y_1 \left({x_1 - x_2 \over y_1 - y_2}\right)\)

This is the formula for the secant method, interpolating for y = x - f(x) = 0.

QED :D
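
A quick numerical check of the equivalence (throwaway sketch, reusing the f from post #26):

Code:
f = lambda x: 5 - x**3          # iteration formula
a = 1.5
b = f(a); c = f(b)

aitken = a - (a - b)**2 / ((a - b) - (b - c))

# one secant step on y = x - f(x) through (a, a-b) and (b, b-c)
x1, y1, x2, y2 = a, a - b, b, b - c
secant = x1 - y1 * (x1 - x2) / (y1 - y2)

print(aitken, secant)           # identical values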