Thanks to the RCL59 emulator on my iPhone I have finally, after 40 years, been able to inspect the Master Library Pgm 02 code for solving simultaneous linear equations and matrix inversion. All I can say is that apparently not much time has been spent on it, so I set out to see if I could improve upon it.
Some thoughts:
dot products
dot products are inherently faster than axpy operations, because they use less STO instructions, and the way the axpy operation was implemented was very inefficient.
Pgm 02 contained two dot product subroutines (one forward, one backward), but they are not used in the LU-factorisation.
My version uses a single dot routine, used everywhere.
unrolled loops
.max order = 9, so max dot length = 8
.speed improvement is about 11%, while it requires 100 extra steps. That does not seem worthwhile.
permutation register
.max order = 9, so permutations can be kept in 1 reg
.i.o. 1..n, keep delta
.this allows solving a 9x9 system, and a faster inversion algorithm
UXL inversion
Pgm 02 calculates the inverse from the LU-factorisation as inv(U)*inv(L).
So it essentially performs three steps:
- invert U
- invert L
- multiply U*L
The fourth step (re-arranging the columns according to the permutations) is done reading out the columns of the inverted matrix, and is not included in the timing estimates of Pgm 02
UXL inversion calc's X=inv(A) straight away, a row and col at a time, making max use of the dot product's speed and efficiency. It uses a work vector of length n, effectively overwriting the vector b if present
HIR
HIR's 7 and 8 are used. Moreover, the determinant is no longer in REG 06, but in HIR 08, as REG 06 now contains the permutations. As before, though, R07 contains n and R08 the first element of the matrix.
interface
In keeping with the 'minimal' interface, I decided to simplify it even further.
A: input order n
B: input column. column 0 or n+1 is considered the vector b. This allows
entering A and b together, then executing D' to solve.
B' output column (idem for col 0 or n+1)
C: determinant
D: solve, returns 0
D': det+solve, returns 0
E: invert, returns 1
E': det + inverse, returns det
D' and E' will return a flashing 0 when the determinant is zero.
D, D' and E allow reading out the solution x or the inverse matrix by R/S
results
- 640 steps instead of 898, through code re-use
- linear equations up to and including 9x9 systems
- Determinant and LU-decomposition more than 3x faster
- Inversion 2x faster
Speed improvements due to inner loop optimisation.
This does not even take into account that the original code did not put the columns in order after linear equation solve or inversion; rather, the output code showed the entries in the correct order. My version applies the permutations as well, before readout.
The speeds are measured with RCL59 on my iPhone and may bear no relation to the true speeds. (ie. it is not certain that on a real 58/59 the speed improvements would be the same)
See the attached Excel for the code and some colourful aids if you want to understand the code in detail.