[VA] SRC #013 - Pi Day 2023 Special
|
03-21-2023, 06:06 PM
(This post was last modified: 03-22-2023 01:03 AM by Valentin Albillo.)
Post: #15
|
|||
|
|||
RE: [VA] SRC #013 - Pi Day 2023 Special
Hi, all, About a week has elapsed since I posted this \(\pi\) Day 2023 Special challenge, thank you very much to those few of you who contributed your valuable solutions and/or comments, namely 2old2randr, J-F Garnier, vaklaff and John Keith, really much appreciated. On the one hand, it's a pity that it failed to attract the attention of other people usually interested in my productions but that's life. On the other hand, this time I got RPL (and even Sys RPL ! ) solutions, which were conspicuously absent from past challenges. Now's the time for my detailed sleuthing process and resulting original solutions, plus additional comments: My sleuthing process First of all, if naively taking at face value the problem's statement, you'd might be tempted to just program a simple loop from 1 to N, compute the factorization for each number and tally up the ones which have repeated factors. This is as simple as it gets but for large N (say 1 million) it's grossly inefficient and here we're most interested in raw speed. What to do ? Search in references or the Internet for a better algorithm, that's what, in particular one which is better than O(N), as any such loop will take ages for large N, even if nothing were done inside the loop. We need something which is at least O(√N), thus transforming a 1,000,000-cycle loop into a 1,000-cycle one. To search for that, we notice that asking for numbers whose factorization has no repeated prime factors is tantamount to asking for numbers who aren't divisible by any square other than 1, which are usually called square-free integers, and a quick Google search for the term reveals many relevant links, among them one at Wolfram Research's MathWorld, where we find a formula with the required order (i.e. the upper limit of the summation goes up to √N,) equivalent to this one: which also appears in text form as a(n) = Sum_{k = 1..floor(sqrt(n))} mu(k)*floor(n/k^2) at OEIS A013928 Number of (positive) squarefree numbers < n, and many other sites and documents. The simple proof of this formula can be obtained using the inclusion-exclusion principle. As for the Möbius function, it's a very important number-theoretical function which is easily computed in various ways, like this simple HP-71B user-defined function FNM (which should be placed at the end of any program using it): 1 DEF FNM(N) @ IF N=1 THEN FNM=1 @ END ELSE F=1 2 D=PRIM(N) @ IF NOT D THEN FNM=(-1)^F ELSE IF MOD(N,D*D) THEN F=F+1 @ N=N/D @ GOTO 2 >FOR N=96673 TO 96686 @ FNM(N); @ NEXT N @@ ► 1 1 0 0 0 0 0 0 1 1 1 0 -1 -1 >S=0 @ FOR N=1 TO 1109 @ S=S+FNM(N) @ NEXT N @ S ► -15 { Mertens(1109) } but the important thing here is that we only need √N evaluations of it instead of N, which makes all the difference in the world. My original solutions Yes, solutions, plural, as I created two of them, optimized for different cases. My first solution is optimized for speed, regardless of memory use, and uses the S(n) formula above, but instead of computing each individual value for the Möbius function inside the loop, it does compute all √N values en masse before starting the summation loop, which can be done without factoring, multiplications or divisions by using a trivial sieve procedure similar to the well-known ancient Sieve of Eratosthenes algorithm used to find all prime numbers up to a limit. The sieve procedure used here runs fast, ultimately winning over for large N, and all needed values of Möbius(n) are left stored in an array to be later used inside the summation loop. First solution: speed This little 4-line, 217-byte HP-71B program implements the procedure, first sieving and storing in INTEGER array M all √N Möbius values needed, then computing the summation and outputting results and timings: (uses Math and JPC ROMs) 1 DESTROY ALL @ INPUT T @ SETTIME 0 @ N=SQR(T) @ INTEGER M(N) @ MAT M=CON @ P=2 @ WHILE P<=N 2 S=P*P @ FOR K=S TO N STEP S @ M(K)=0 @ NEXT K @ FOR K=P TO N STEP P @ M(K)=-M(K) @ NEXT K 3 P=FPRIM(P+1) @ END WHILE @ S=T @ FOR K=2 TO N @ S=S+M(K)*(T DIV (K*K)) @ NEXT K 4 DISP USING "2(3DC3DC3DC3D,2X),2(Z.8D,X),5DZ.2D";T,S,SQR(6*T/S),ABS(PI-RES),TIME
@128x @976x HP-71B ---------------------------------------------------------------------------- 12,345 7,503 3.14198205 0.00038939 0.10" 0.01" 13" 100,000 60,794 3.14155933 0.00003333 0.28" 0.04" 36" 567,890 345,237 3.14158684 0.00000582 0.69" 0.09" 1' 28" 1,000,000 607,926 3.14159550 0.00000285 0.93" 0.12" 1' 59" 10,000,000 6,079,291 3.14158749 0.00000516 3.03" 0.40" 6' 28" 25,000,000 15,198,180 3.14159240 0.00000025 4.87" 0.64" 10' 23" 33,000,000 20,061,593 3.14159276 0.00000011 5.61" 0.74" 11' 58" 1E8 60,792,694 3.14159307 0.00000042 9.93" 1.30" 21' 11" 1E9 607,927,124 3.14159260 0.00000006 32.45" 4.26" 69' 14"
- For N = 1E9, the program uses ~ 94,927 bytes of RAM (M has 31,623 elements, 94,869 bytes) Larger values of N are problematic, e.g. N = 1E10 would require in excess of 300 Kb of RAM, at the very limit of what the HP-71B can manage, which leads us to my second solution, where I'll turn from computing all Möbius values en masse to computing them individually, thus avoiding the large memory requirements, albeit at the cost of speed. Second solution: memory This even smaller 3-line, 178-byte HP-71B program uses in-lined Möbius function code (as oposed to an user-defined function) for speed, and needs only ~ 63 bytes of RAM to run no matter the size of N, all the way to a trillion (1012) - 1, i.e. 999,999,999,999, which is the largest integer the HP-71B can represent: (uses JPC ROM) 1 DESTROY ALL @ INPUT T @ SETTIME 0 @ U=-1 @ S=T @ FOR K=2 TO SQR(T) @ N=K @ F=1 2 D=PRIM(N) @ IF NOT D THEN S=S+U^F*IP(T/(K*K)) ELSE IF MOD(N,D*D) THEN F=F+1 @ N=N/D @ GOTO 2 3 NEXT K @ DISP USING "2(3DC3DC3DC3D,2X),2(Z.8D,X),5DZ.2D";T,S,SQR(6*T/S),ABS(PI-RES),TIME
@128x @976x HP-71B ---------------------------------------------------------------------------- 12,345 7,503 3.14198205 0.00038939 0.09" 0.01" 12" 100,000 60,794 3.14155933 0.00003333 0.26" 0.03" 33" 567,890 345,237 3.14158684 0.00000582 0.67" 0.09" 1' 26" 1,000,000 607,926 3.14159550 0.00000285 0.91" 0.12" 1' 56" 10,000,000 6,079,291 3.14158749 0.00000516 3.15" 0.41" 6' 43" 25,000,000 15,198,180 3.14159240 0.00000025 5.15" 0.68" 10' 59" 33,000,000 20,061,593 3.14159276 0.00000011 5.98" 0.78" 12' 45" 1E8 60,792,694 3.14159307 0.00000042 10.81" 1.42" 23' 4" 1E9 607,927,124 3.14159260 0.00000006 36.93" 4.84" 78' 47" 1E10 6,079,270,942 3.14159267 0.00000002 2' 8" 16.82" 4h 33' 1E11 60,792,710,280 3.14159265 0.00000000* 7' 37" 59.95" 16h 15' 1E12-1 607,927,102,274 3.14159265 0.00000000* 28' 9" 3' 41" 60h 2'
Thus, although this second solution can achieve the maximum possible integer range 1 .. 1012 - 1 with no memory problems and even seems to be a little faster for N up to 1 million, this is only because the HP-71B hardware and BASIC language are optimized for fast mathematical operations performed entirely in assembler without user loops or branching, which are pretty slow and thus slow down the sieving procedure. This is why a matrix inversion (MAT INV) or a Fast Fourier Transform (FOUR) are performed at speeds competing with much faster CPUs, while an empty FOR-NEXT construct does only ~100 loops/sec. on a physical HP-71B. Even so, the second solution will get increasingly slower than the first one for large N. Additional Comments ● In the very distant past (2004), I posted the pair of threads:
HP Challenge VA115 - The 71B Turtle and 49Gp Hare Harder 2nd Half ● You may be wondering why I selected N = 33,000,000 as a test case. It's an innocuous number in and of itself but I found it while reading this truly excellent book eons ago:
(464 pages, ISBN-10: 9780395929681)
The "expected number" is 33.000.000*(1 -6/\(\pi\)2). which works out to 12,938.405.6. They call this "an astonishingly close fit, better than we deserved." A nonrigorous argument has predicted a mathematical result to 8 place accuracy. In physics or chemistry. experimental agreement with theory to 8 place accuracy would be regarded as a very strong confirmation of the theory. Here, also, it is impossible to believe that such agreement is accidental. The principle by which the calculation was made must be right. " Both timings would be reduced at least an order of magnitude if rewritten in assembler and run in the same hardware (perhaps J-F Garnier would obligue and create a MAT M=MOB matrix statement which would fill up array M with the results of the sieving procedure implemented in BASIC in my first solution ... ) By the way, the aforementioned 1967 paper by I. J. Good and R. F. Churchhouse is:
● These are the exact counts S(10N) for selected N:
----------------------------------------- 0 1 1 7 2 61 3 608 4 6083 5 60794 6 607926 7 6079291 8 60792694 9 607927124 10 6079270942 11 60792710280 12 607927102274 13 6079271018294 14 60792710185947 16 6079271018540405 18 607927101854022750 24 607927101854026628773299 30 607927101854026628663278087296 36 607927101854026628663276779463775476 6/\(\pi\)2 = 0.607927101854026628663276779258365833... That's all for now, I hope you enjoyed it. Regrettably, due to a number of factors (pun intended,) I'm now taking a leave of absence from further challenges for a while. Cya. V. Edit: rephrased one sentence. All My Articles & other Materials here: Valentin Albillo's HP Collection |
|||
« Next Oldest | Next Newest »
|
User(s) browsing this thread: 2 Guest(s)