[VA] SRC #015 - HP-15C & clones: Big NxN Matrix Inverse & Determinant
|
08-05-2023, 08:24 PM
(This post was last modified: 08-23-2023 06:18 PM by Valentin Albillo.)
Post: #1
|
|||
|
|||
[VA] SRC #015 - HP-15C & clones: Big NxN Matrix Inverse & Determinant
Hi all, Welcome to this new SRC #15 I've just created to commemorate the recent general availability of the HP-15C Collector's Edition, which boasts enormously increased processing speed (up to 180x) and memory (up to 3x more RAM,) relative to the original. However, there's a significant problem which needs addressing.
The problem is that although the memory is vastly increased in most 15C clones (physical or virtual), allowing the dimensioning and use of matrices larger than 8x8, some very important advanced matrix operations (namely matrix inversion, system solving and determinant) can't be confidently performed using such large matrices (unreliable results, blocked calculator) because of firmware limitations having to do with the internal LU decomposition, which are too difficult or even impossible to overcome via ad-hoc patches. The Sleuthing Fully aware of this, I investigated what could I do about it, and first of all I looked at my own NxN Matrix Inversion program for the HP-41C which I wrote and published 40 years ago, in order to try and convert it to run on the CE, but I sadly realized that the conversion wasn't viable because the 41C program is too long (170 steps, 40 registers) and needs 11 numbered data registers to work, for a grand total of 51 registers. This would severely limit the maximum size of the matrices to process. Secondly, the 41C program uses a lot of flags (e.g. 13+1 for 13x13 matrices), which the CE doesn't have, and additionally it uses a lot of indirect register storage and recall using several index registers, while the CE only has one, register I, so this would considerably complicate and slow the conversion, needing special care with the stack contents. This would further increase program size and running time significantly, as the program uses nested loops and thus executes many hundreds or even thousands of RPN instructions to perform the inversion. In short, inviable. Now, I investigated whether it would be possible to write a program to create and use the LU decomposition as the firmware does but without the dreaded 8x8 limitation, and though this would possibly do, it would nevertheless result in long, complicated code which would further need additional registers to work, thus limiting seriously the size of the matrices it could process. It would also need to execute hundreds or thousands of RPN instructions, not firmware microcode. Viable but difficult and with suboptimal performance, so What to do ? (cue dramatic pause ...) My Solution To adopt an entirely new strategy, that's what. Come to think of it, the real problem here is that the HP-15C's firmware can't obtain the LU decomposition (an so invert/system-solve/compute the determinant) of a matrix larger than 8x8. Well, let's avoid it altogether by using some strategy which allows us to use the built-in microcoded inversion instruction for speed, but making sure it never has to invert a matrix larger than 8x8 ! Enter partitioned matrices. A partitioned matrix is a matrix considered to be partitioned into a number of submatrices, which we'll call blocks. The blocks can be of arbitrary sizes, for example, this 5x5 matrix M can be partitioned into four unequal blocks A (4x4), B (4x1), C (1x4) and D (1x1), like this: │ 3 1 4 1 5 │ │ 3 1 4 1 │ │ 5 │ │ 9 2 6 5 3 │ │ 9 2 6 5 │ │ 3 │ M = │ 5 8 9 7 9 │ = | A B | , A = │ 5 8 9 7 │ , B = │ 9 │ , C = │ 6 2 6 4 │ , D = │ 3 │ │ 3 2 3 8 4 │ | C D | │ 3 2 3 8 │ │ 4 │ │ 6 2 6 4 3 │ and of course the idea is that there are efficient algorithms to deal with such partitioned matrices by interacting directly with their blocks, in particular for inverting the matrix or computing its determinant, without ever needing to process any block larger than the largest one, whose size we can choose. Note that regardless of their sizes, every set of blocks represents the same original matrix and operations performed using them will result in the same outcome, but some sizes are better one way or another. For instance, my implementation here of the algorithm to invert a matrix (similarly for computing its determinant) requires that it must be partitioned into 4 blocks and that block A must be a square matrix no larger than 8x8 (which forces block D to also be a square matrix,) because it eventually needs to compute the inverse of A using the built-in [1/x] instruction. Also very important, to achieve maximum speed it's essential that my routine spends most of its time executing microcode, so that's why block A must be dimensioned to be as large as possible without exceeding the 8x8 limit and thus for original NxN matrices larger than 8x8 we'll dimension A to be exactly 8x8, which forces the sizes of all remaining blocks: B will then be 8x(N-8), C will be (N-8)x8, D will be (N-8)x(N-8) and as long as N ≤ 16 no block will be larger than 8x8, as required. And last but not least, all the code here will run on any HP-15C version, including the original, the Limited Edition (patched for extra RAM or not), the Collector's Edition (in its default or extended RAM modes), the DM15's various versions and firmwares, as well as assorted true emulators.
The implementation Part 1: Matrix Inversion This 29-step, 33-byte subroutine will invert in place an NxN partitioned matrix for N ≤ 16 (subject to available memory). It takes no inputs but the caller (the user or another program) must have previously dimensioned and populated the four blocks with the elements of the original matrix. Once it returns, the original values in the blocks will have been replaced with those of the computed inverse matrix, which the user or the caller program may proceed to output or use as desired. In other words, this subroutine can be called from the keyboard or another program (in which case it could be directly embedded into it if it's called just once) but it doesn't perform any input or output operations, it just does the inversion in place, period. Program listing LBL E 001- 42,21,15 RESULT A 002- 42,26,11 RCL MATRIX A 003- 45,16,11 RCL MATRIX B 004- 45,16,12 RCL MATRIX C 005- 45,16,13 RCL MATRIX A 006- 45,16,11 1/x 007- 15 RESULT E 008- 42,26,15 x 009- 20 STO MATRIX C 010- 44,16,13 RESULT D 011- 42,26,14 RCL MATRIX B 012- 45,16,12 MATRIX 6 013- 42,16, 6 1/x 014- 15 RESULT E 015- 42,26,15 x 016- 20 RESULT B 017- 42,26,12 x 018- 20 CHS 019- 16 RESULT A 020- 42,26,11 RCL MATRIX C 021- 45,16,13 MATRIX 6 022- 42,16, 6 RESULT E 023- 42,26,15 RCL MATRIX D 024- 45,16,14 RCL MATRIX C 025- 45,16,13 x 026- 20 CHS 027- 16 STO MATRIX C 028- 44,16,13 RTN 029- 43,32 Notes:
Worked examples The following examples will show you how to use the subroutine and further assume that we're using the newly released HP-15C CE (Collector's Edition) in its default mode, i.e. with just 96 registers available to store matrix elements. 1.- Worked 5x5 Toy Example
│ 3 1 4 1 | 5 │ │ 3 1 4 1 │ │ 5 │ │ 9 2 6 5 | 3 │ │ 9 2 6 5 │ │ 3 │ M = │ 5 8 9 7 | 9 │ , A = │ 5 8 9 7 │ , B = │ 9 │ , C = │ 6 2 6 4 │ , D = │ 3 │ │ 3 2 3 8 | 4 │ │ 3 2 3 8 │ │ 4 │ │ 6 2 6 4 | 3 │ - Initialize and dimension the blocks: A is 4x4, B is 4x1, C is 1x4 and D is 1x1: CF 8, FIX 4 {ensure disabled complex stack and set 4 decimal places} 1, DIM (i), MATRIX 0 {now MEM: 01 91 05-2} 4, ENTER, DIM A {now MEM: 01 75 05-2} 1, DIM B {now MEM: 01 71 05-2} 4, DIM C {now MEM: 01 67 05-2} 1, ENTER, DIM D {now MEM: 01 66 05-2} USER, MATRIX 1 {set USER mode, reset row/col indexes to first element} - Store the 16 elements of block A: 3, STO A, 1, STO A, 4, STO A, 1, STO A, 9, STO A, 2, STO A, 6, STO A, 5, STO A, 5, STO A, 8, STO A, 9, STO A, 7, STO A, 3, STO A, 2, STO A, 3, STO A, 8, STO A - Store the 4 elements of block B: 5, STO B, 3, STO B, 9, STO B, 4, STO B - Store the 4 elements of block C: 6, STO C, 2, STO C, 6, STO C, 4, STO C - Store the single element of block D: 3, STO D - Compute the inverse matrix: E -> E 1 4 {nearly instantaneous} - Recall the inverse matrix elements from blocks A, B, C, D: RCL A -> 0.0265, RCL A -> 0.3591, ..., RCL A -> 0.1638 {16 elements} RCL B -> -0.3685, RCL B -> -0.3254, ..., RCL B -> 0.1054 { 4 elements} RCL C -> 0.2747, RCL C -> 0.1004, ..., RCL C -> 0.0585 { 4 elements} RCL D -> -0.3227 { 1 element } so we've got the following inverse matrix blocks A', B', C', D': │ 0.0265 0.3591 0.0127 -0.0546 │ │ -0.3685 │ A' = │ -0.2101 0.2124 0.2118 -0.1291 │ , B' = │ -0.3254 │ │ -0.0408 -0.4286 -0.0612 -0.0408 │ │ 0.7347 │ │ -0.0794 -0.0772 -0.0381 0.1638 │ │ 0.1054 │ C' = │ 0.2747 0.1004 0.0066 0.0585 │ , D' = │ -0.3227 │ and thus the 5x5 inverse matrix M' is: │ 0.0265 0.3591 0.0127 -0.0546 -0.3685 │ │ -0.2101 0.2124 0.2118 -0.1291 -0.3254 │ M' = │ -0.0408 -0.4286 -0.0612 -0.0408 0.7347 │ │ -0.0794 -0.0772 -0.0381 0.1638 0.1054 │ │ 0.2747 0.1004 0.0066 0.0585 -0.3227 │ 2.- Worked 9x9 Full-fledged Example Invert the following 9x9 matrix M: │ 5 3 4 7 8 0 1 2 | 6 │ │ 6 7 2 0 5 3 4 8 | 1 │ │ 1 0 8 4 2 5 6 7 | 3 │ │ 8 5 0 6 1 4 2 3 | 7 │ M = │ 4 2 6 5 3 7 0 1 | 8 │ │ 7 1 3 2 4 8 5 6 | 0 │ │ 0 6 1 3 7 2 8 4 | 5 │ │ 2 8 7 1 0 6 3 5 | 4 │ │ 3 4 5 8 6 1 7 0 | 2 │ - The A, B, C, D blocks are: │ 5 3 4 7 8 0 1 2 │ │ 6 │ │ 6 7 2 0 5 3 4 8 │ │ 1 │ │ 1 0 8 4 2 5 6 7 │ │ 3 │ A = │ 8 5 0 6 1 4 2 3 │ , B = │ 7 │ , C = │ 3 4 5 8 6 1 7 0 │ , D = │ 2 │ │ 4 2 6 5 3 7 0 1 │ │ 8 │ │ 7 1 3 2 4 8 5 6 │ │ 0 │ │ 0 6 1 3 7 2 8 4 │ │ 5 │ │ 2 8 7 1 0 6 3 5 │ │ 4 │ - Initialize and dimension the blocks: A is 8x8, B is 8x1, C is 1x8, D is 1x1: CF 8, FIX 4 {ensure disabled complex stack and set 4 decimal places} 1, DIM (i), MATRIX 0 {now MEM: 01 91 05-2} 8, ENTER, DIM A {now MEM: 01 27 05-2} 1, DIM B {now MEM: 01 19 05-2} 8, DIM C {now MEM: 01 11 05-2} 1, ENTER, DIM D {now MEM: 01 10 05-2} USER, MATRIX 1 {set USER mode, reset row/col indexes to first element} - Store the 64 elements of block A: 5, STO A, 3, STO A, 4, STO A, 7, STO A, 8, STO A, 0, STO A, 1, STO A, 2, STO A, 6, STO A, 7, STO A, 2, STO A, 0, STO A, 5, STO A, 3, STO A, 4, STO A, 8, STO A, 1, STO A, 0, STO A, 8, STO A, 4, STO A, 2, STO A, 5, STO A, 6, STO A, 7, STO A, 8, STO A, 5, STO A, 0, STO A, 6, STO A, 1, STO A, 4, STO A, 2, STO A, 3, STO A, 4, STO A, 2, STO A, 6, STO A, 5, STO A, 3, STO A, 7, STO A, 0, STO A, 1, STO A, 7, STO A, 1, STO A, 3, STO A, 2, STO A, 4, STO A, 8, STO A, 5, STO A, 6, STO A, 0, STO A, 6, STO A, 1, STO A, 3, STO A, 7, STO A, 2, STO A, 8, STO A, 4, STO A, 2, STO A, 8, STO A, 7, STO A, 1, STO A, 0, STO A, 6, STO A, 3, STO A, 5, STO A - Store the 8 elements of block B: 6, STO B, 1, STO B, 3, STO B, 7, STO B, 8, STO B, 0, STO B, 5, STO B, 4, STO B - Store the 8 elements of block C: 3, STO C, 4, STO C, 5, STO C, 8, STO C, 6, STO C, 1, STO C, 7, STO C, 0, STO C - Store the single element of block D: 2, STO D - Compute the inverse matrix: E -> E 1 8] {~ 0.8 sec.} - Recall the inverse matrix elements from blocks A, B, C, D: RCL A -> -0.8879, RCL A -> 1.1441, ..., RCL A -> 0.5356 {64 elements} RCL B -> 0.4953, RCL B -> -0.1670, ..., RCL B -> -0.3966 { 8 elements} RCL C -> -0.6907, RCL C -> 0.8420, ..., RCL C -> -0.6703 { 8 elements} RCL D -> 0.2930 { 1 element } so we've got the following inverse matrix blocks A', B', C', D': │ -0.8879 1.1441 0.1934 -0.0150 0.9321 -0.7169 -0.2699 -0.8474 │ │ 0.4953 │ │ 0.3822 -0.4559 -0.1674 0.0274 -0.4299 0.2991 0.0893 0.4501 │ │ -0.1670 │ │ -0.5514 0.7337 0.1725 -0.1023 0.6292 -0.5107 -0.2008 -0.4859 │ │ 0.3434 │ A' = │ 1.2107 -1.5120 -0.2495 0.1405 -1.2987 0.9816 0.2347 1.0938 │ , B' = │ -0.5735 │ │ 0.2241 -0.1713 -0.0954 -0.0868 -0.1196 0.1774 0.0830 0.1148 │ │ -0.0983 │ │ 0.6076 -0.8621 -0.2155 0.0166 -0.6233 0.6452 0.1842 0.6312 │ │ -0.3560 │ │ -0.9092 1.0035 0.2297 -0.0243 0.8488 -0.6830 -0.1314 -0.7941 │ │ 0.4877 │ │ 0.6424 -0.6943 -0.0413 0.0732 -0.6939 0.4720 0.1307 0.5356 │ │ -0.3966 │ C' = │ -0.6907 0.8420 0.2012 -0.0015 0.7832 -0.6369 -0.0921 -0.6703 │ , D' = │ 0.2930 │ and thus the 9x9 inverse matrix M' is: │ -0.8879 1.1441 0.1934 -0.0150 0.9321 -0.7169 -0.2699 -0.8474 0.4953 │ │ 0.3822 -0.4559 -0.1674 0.0274 -0.4299 0.2991 0.0893 0.4501 -0.1670 │ │ -0.5514 0.7337 0.1725 -0.1023 0.6292 -0.5107 -0.2008 -0.4859 0.3434 │ │ 1.2107 -1.5120 -0.2495 0.1405 -1.2987 0.9816 0.2347 1.0938 -0.5735 │ M' = │ 0.2241 -0.1713 -0.0954 -0.0868 -0.1196 0.1774 0.0830 0.1148 -0.0983 │ │ 0.6076 -0.8621 -0.2155 0.0166 -0.6233 0.6452 0.1842 0.6312 -0.3560 │ │ -0.9092 1.0035 0.2297 -0.0243 0.8488 -0.6830 -0.1314 -0.7941 0.4877 │ │ 0.6424 -0.6943 -0.0413 0.0732 -0.6939 0.4720 0.1307 0.5356 -0.3966 │ │ -0.6907 0.8420 0.2012 -0.0015 0.7832 -0.6369 -0.0921 -0.6703 0.2930 │ Notes:
The implementation Part 2: Determinant This 15-step, 18-byte subroutine (half the size of the Matrix Inversion one and three times as fast) will compute and display the determinant of an NxN partitioned matrix for N ≤ 16 (subject to available memory). It takes no inputs but the caller (the user or another program) must have previously dimensioned and populated the four blocks with the elements of the original matrix. Once it returns, the computed determinant is left in the X stack register. In other words, this subroutine can be called from the keyboard or another program (in which case it could be directly embedded into it if it's called just once) but it doesn't perform any input operations, it just computes and leaves the determinant in the X stack register (i.e. the display.) Program listing LBL D 001- 42,21,14 RESULT D 002- 42,26,14 RCL MATRIX D 003- 45,16,14 MATRIX 9 004- 42,16, 9 RCL MATRIX B 005- 45,16,12 LASTX 006- 43 36 1/x 007- 15 RESULT E 008- 42,26,15 RCL MATRIX C 009- 45,16,13 x 010- 20 RESULT A 011- 42,26,11 MATRIX 6 012- 42,16, 6 MATRIX 9 013- 42,16, 9 x 014- 20 RTN 015- 43 32 Notes: All the Notes for the Matrix Inversion subroutine exactly apply, with the following changes:
All the Requirements for the Matrix Inversion subroutine apply exactly, with the following changes:
Worked examples The following examples will show you how to use the subroutine and further assume that we're using the newly released HP-15C CE (Collector's Edition) in its default mode, i.e. with just 96 registers available to store matrix elements. 1.- Worked 5x5 Toy Example The 5x5 matrix used in this example is the same as the one used in the corresponding example for Matrix Inversion above, the only difference being that now you're asked to compute its determinant instead of its inverse, so just exactly follow the initialization and input instructions indicated there until you are about to perform the computation part, where you should now do the following: - Compute the determinant: D -> -1,813.0000 {nearly instantaneous} As no inverse matrix es computed, its output instructions don't apply here. 2.- Worked 9x9 Full-fledged Example The 9x9 matrix used in this example is the same as the one used in the corresponding example for Matrix Inversion above, the only difference being that now you're asked to compute its determinant instead of its inverse, so just exactly follow the initialization and input instructions indicated there until you are about to perform the computation part, where you should now do the following: - Compute the determinant: D -> -10,278,575.95 {~ 0.2 sec.; exact is -10,278,576} As no inverse matrix es computed, its output instructions don't apply here. Notes: You might consider applying the same finalization instructions as described in the final Notes for Matrix Inversion above. The implementation Part 3: All together now There are two million-dollar questions: 1.- Can both subroutines fit at the same time in the default CE/96 mode ? Yes, both are completely standalone (neither one requires the other or its results) but the Inversion routine is 33 bytes long while the Determinant routine is 18 bytes long, so they would seem to occupy 51 bytes together. However, if you change the final RTN instruction at step 029 in the former by an R/S instruction and delete the initial LBL D at step 001 and the RTN at step 015 in the latter, you can concatenate both and the combination will fit in 49 bytes, which is 7 registers, which is exactly all the memory that is available for the code if you want to process a 9x9 matrix in the standard CE/96 mode. But if using the extended CE/192 mode, the question is moot as both routines can be present at the same time, unmodified, and there's plenty of room for larger matrices or additional programs. 2.- Is it possible to compute both Inverse and Determinant without having to reintroduce the original data ? Yes, it is quite possible by using either one of two simple tricks. Assuming both routines are present in program memory, say you want to compute the inverse and the determinant of a large matrix without having to reintroduce the matrix elements. Do as follows:
- Execute subroutine E to compute the inverse matrix. - Recall and write down the elements of the inverse matrix - Now you have two options to avoid reintroducing the original matrix again, namely either
b) Execute subroutine D to compute the inverse's determinant, then (out of User mode) press 1/x and you'll get the determinant of the original matrix, as DET(A) = 1 / DET(Inverse of A). That's all, hope you enjoyed it and find it useful for your own purposes. Constructive comments welcome. V. Edit: corrected an omission; the reported values of the sample 9x9 determinant were missing the minus sign, as both are negative. The routine itself is Ok. Thanks, Bert ! All My Articles & other Materials here: Valentin Albillo's HP Collection |
|||
« Next Oldest | Next Newest »
|
User(s) browsing this thread: 2 Guest(s)