(42S) matrix permanent
|
01-15-2024, 12:33 PM
(This post was last modified: 01-15-2024 04:24 PM by Werner.)
Post: #1
|
|||
|
|||
(42S) matrix permanent
Thanks to:
usage: with the square matrix in stack register X, XEQ "PER". The permanent is returned in X, and the original, unchanged matrix is in Y. This is important for low-memory conditions on the 42S. variables used: REGS : v (sum of rows) n : matrix order i : 2^(n-1) k : 1..i-1 p : permanent 00 { 147-Byte Prgm } 01▸LBL "PER" 02 ENTER 03 RSUM 04 X=Y? 05 DET 06 REAL? 07 RTN 08 STO "REGS" 09 DIM? 10 R↓ 11 STO "n" 12 2 13 X<>Y 14 DSE ST X 15 Y^X 16 STO "i" 17 STO "k" 18 R↓ 19 EDIT 20 CLX 21 GTO 14 22▸LBL 10 23 RCL "i" 24 RCL- "k" 25 1 26▸LBL 11 27 RCL ST Y 28 4 29 MOD 30 GTO IND ST X 31▸LBL 01 32▸LBL 03 33 STO+ ST X @ (1,3) -> (-2,2) 34 LASTX 35 - 36 GTO 04 37▸LBL 00 38 X<> ST L 39 STO÷ ST Z @ k := k/4; 40 SQRT @ j := j + 2; 41 + 42 GTO 11 43▸LBL 02 44 SIGN @ j := j + 1; 45 + 46 X<>Y 47 8 48 MOD @ k MOD 8 is 2 or 6 49 4 @ (2,6) -> (-2,2) 50 - 51▸LBL 04 52 RCL "n" 53 1 54 R^ 55 STOIJ 56 R↓ 57 GETM 58 × 59 STO+ "REGS" 60 RCL "p" 61▸LBL 14 62 RCL "n" 63 RCL 00 64 DSE ST Y 65▸LBL 13 66 RCL× IND ST Y 67 DSE ST Y 68 GTO 13 69 RCL- ST Z 70 STO "p" 71 DSE "k" 72 GTO 10 73 RCL÷ "i" 74 +/- 75 RCLEL 76 EXITALL 77 X<>Y 78 END Timing/testing the program is easy, once you realize that the permanent of a matrix of order n, filled with all 1's, is n!. timing: order 42S DM42 5 0:18 0.05s 7 1:31 0.22s 9 7:10 1.03s 12 1:06:15 9.39s The timing of order 12 for the 42S is but an estimate, fitting a logarithmic curve through the three data points (x=time/n, y=n), yielding a .99999.. correlation, so should be pretty good. Cheers, Werner [updated with a shorter version... I seem to always find simple ways of shortening programs, moments after I posted them] 41CV†,42S,48GX,49G,DM42,DM41X,17BII,15CE,DM15L,12C,16CE |
|||
01-15-2024, 02:24 PM
Post: #2
|
|||
|
|||
RE: (42S) matrix permanent
Hi, Werner
Perhaps you can add some comments to code? Example, why does it use SQRT ? Does it have something to do with j = bit.ffs(k) ? With decimal machine, getting bit.ffs (Find First Set) is expensive. My guess is it would cost way more than these lines (j = f[1]) for next loop. f[1] = 1 f[i] = f[i+1] f[i+1] = i+1 |
|||
01-15-2024, 02:45 PM
(This post was last modified: 01-16-2024 04:54 PM by Werner.)
Post: #3
|
|||
|
|||
RE: (42S) matrix permanent
Hi, Albert!
The SQRT is a quick way to turn a 4 into a 2 ;-) The only part, I think, that requires some explanation is lines 23-51. There, I determine j and dj (+/-2) from k, as John Keith explained. So, j is the place of the least significant bit in k. Let >>k be k shifted right till the least significant bit is 1, then dj=2 if >>k MOD 4 equals 3, else dj=-2 (I *add* dj*Mj, so my signs are swapped). I also technically calculate the permanent of the transposed matrix, which is of course the same - to save a TRANS(M) which on a 42S will duplicate the matrix. I subsequently fetch the j-th *column* of M instead of the row. Instead of trying to find the least significant bit of k, successively dividing by 2, I perform k MOD 4 right away, and use GTO IND ST X to branch off: - half of the time, it is either 1 or 3 and we're done (j=1 and dj=-2 for 1 and 2 for 3). - if it is 0, divide k by 4, add 2 to j and loop - if it is 2, add 1 to j, and k MOD 8 is now 2 or 6, and dj -2 or 2, respectively I had a version using v,f and d vectors (even one using only the stack), but this latest version runs almost twice as fast, also because I store the vector v in REGS, so that I don't need to switch indexing/editing matrices, and the calculation of product(v) is faster. Cheers, Werner 41CV†,42S,48GX,49G,DM42,DM41X,17BII,15CE,DM15L,12C,16CE |
|||
01-15-2024, 04:06 PM
(This post was last modified: 01-15-2024 04:17 PM by Albert Chan.)
Post: #4
|
|||
|
|||
RE: (42S) matrix permanent
(01-15-2024 02:45 PM)Werner Wrote: dj=2 if >>k MOD 4 equals 3, else dj=-2 (I *add* dj*Mj, so my signs are swapped). If matrix cells are non-negative, product of initial v (column sums) is biggest. This is the reason my code does v[j] = v[j] - d[i]*M[i][j] Trivia: If n×n matrix are all ones, its' permanent = n! First term alone give upper limit: n^n / 2^(n-1) = 2 * (n/2)^n For comparison, Stirling's approximation: n! ~ √(2*pi*n) * (n/e)^n n=1 --> 2*(n/2)^n = 1 = 1! n=2 --> 2*(n/2)^n = 2 = 2! n=3 --> 2*(n/2)^n = 6.75 > (6 = 3!) n=4 --> 2*(n/2)^n = 32 > (24 = 4!) ... Quote:Instead of trying to find the least significant bit of k, successively dividing by 2, I perform k MOD 4 right away ... Nice optimizing trick! Quote:I had a version using v,f and d vectors (even one using only the stack), but this latest version runs almost twice as fast. Interestingly, LuaJIT code is faster (almost twice as fast) using v,f and d vectors. This despite I already use C to code bit.ffs and bit.test. It is perhaps due to LuaJIT have only 1 kind of number = IEEE double. Bit manipulations required conversion of double to integer, then back to double. Guessing which way is faster is hard. We have to code it to know it. |
|||
01-15-2024, 04:22 PM
Post: #5
|
|||
|
|||
RE: (42S) matrix permanent
(01-15-2024 04:06 PM)Albert Chan Wrote: If matrix cells are non-negative, product of initial v (column sums) is biggest.? That's what I do as well, but the sign of the d[i]'s are inversed in my code - and btw you don't have a choice, you have to loop over all prod(sum(+/-M[i][j])), and d[j] being -2 means you swap a +M[i][j] for a -M[i][j]. Quote:Guessing which way is faster is hard. We have to code it to know it. In the case of the 42S, having to access 4 arrays (M,f,v,d) meant a lot of indexing, as you can have only one matrix indexed/edited at a time - save if you store one in REGS and access it with RCL IND .. Cheers, W. 41CV†,42S,48GX,49G,DM42,DM41X,17BII,15CE,DM15L,12C,16CE |
|||
01-16-2024, 01:00 AM
Post: #6
|
|||
|
|||
RE: (42S) matrix permanent
(01-15-2024 12:33 PM)Werner Wrote: timing: XCas> Digits := 5 XCas> N := float([5,7,9]) XCas> T := float([18,91,430]) XCas> correlation(N, ln(T/N)) → 0.99999 XCas> c := linear_regression(N, ln(T/N)) → (0.64641,-1.954) ln(t/n) = c[0]*n + c[1] t/n = exp(c[1]) * exp(c[0])^n --> time (sec) ≈ 0.14170 * n * 1.9087^n --> n=12 time ≈ 3976 sec = 1:06:16 Fit is excellent, but formula does not make sense. We would expect time proportional to number of terms = 2^(n-1) Each term, time to update v's and get its product, plus getting j and d. Below does not produce as good correlation, but make more sense. XCas> correlation(N, T/2^N) → 0.99917 XCas> c := linear_regression(N, T/2^N) → (0.069336,0.21908) t/2^n = c[0]*n + c[1] = c[0] * (n + c[1]/c[0]) --> time (sec) ≈ 0.069336 * (n + 3.1596) * 2^n --> n=12 time ≈ 4305 sec = 1:11:45 |
|||
01-16-2024, 08:47 AM
Post: #7
|
|||
|
|||
RE: (42S) matrix permanent
(01-16-2024 01:00 AM)Albert Chan Wrote: Fit is excellent, but formula does not make sense. With only 3 sample points, you can't expect miracles. I won't burn through a set of batteries to see which estimate proves to be better, probably yours ;-) Werner 41CV†,42S,48GX,49G,DM42,DM41X,17BII,15CE,DM15L,12C,16CE |
|||
01-16-2024, 12:24 PM
Post: #8
|
|||
|
|||
RE: (42S) matrix permanent
We can use DM42 timings, without n=12 numbers.
Then, we predict n=12 timing, compare against actual time (9.39 sec) N := [5,7,9] T := [.05, .22, 1.03] Linear regression, N vs ln(T/N) --> correlation = 0.99939 --> time (sec) = 0.00046355 * n * 1.8393^n --> n=12 time ≈ 8.34 sec Linear regression, N vs T/2^N --> correlation = 0.98491 --> time (sec) = 0.00011230 * (n + 8.7101) * 2^n --> n=12 time ≈ 9.53 sec |
|||
01-17-2024, 08:17 PM
Post: #9
|
|||
|
|||
RE: (42S) matrix permanent
(01-15-2024 12:33 PM)Werner Wrote: Thanks to: ... and based on a Python/Numpy program posted by GitHub user "lesshaste", who deserves credit as well. Nice to see a version for Free42, I will try your optimization on my RPL program when i get the chance. |
|||
01-26-2024, 10:28 PM
Post: #10
|
|||
|
|||
RE: (42S) matrix permanent
(01-17-2024 08:17 PM)John Keith Wrote: ... I will try your optimization on my RPL program when i get the chance. Well, I tried it but it made the program substantially larger and not much faster. RPL doesn't have anything like GTO IND ST X- the closest would be either a CASE statement or nested IF..THEN..ELSE structures. The method might work well on the 71B using ON..GOTO or ON..GOSUB though. |
|||
« Next Oldest | Next Newest »
|
User(s) browsing this thread: 3 Guest(s)