Threaded Mode | Linear Mode

Valentin Albillo · (This post was last modified: 03-22-2023 01:03 AM by Valentin Albillo.)

Hi, all,

About a week has elapsed since I posted this \(\pi\) Day 2023 Special challenge, thank you very much to those few of you who contributed your valuable solutions and/or comments, namely 2old2randr, J-F Garnier, vaklaff and John Keith, really much appreciated.

On the one hand, it's a pity that it failed to attract the attention of other people usually interested in my productions but that's life. On the other hand, this time I got RPL (and even Sys RPL ! ) solutions, which were conspicuously absent from past challenges.

Now's the time for my detailed sleuthing process and resulting original solutions, plus additional comments:

My sleuthing process

First of all, if naively taking at face value the problem's statement, you'd might be tempted to just program a simple loop from 1 to N, compute the factorization for each number and tally up the ones which have repeated factors. This is as simple as it gets but for large N (say 1 million) it's grossly inefficient and here we're most interested in raw speed.

What to do ? Search in references or the Internet for a better algorithm, that's what, in particular one which is better than O(N), as any such loop will take ages for large N, even if nothing were done inside the loop. We need something which is at least O(√N), thus transforming a 1,000,000-cycle loop into a 1,000-cycle one.

To search for that, we notice that asking for numbers whose factorization has no repeated prime factors is tantamount to asking for numbers who aren't divisible by any square other than 1, which are usually called square-free integers, and a quick Google search for the term reveals many relevant links, among them one at Wolfram Research's MathWorld, where we find a formula with the required order (i.e. the upper limit of the summation goes up to √N,) equivalent to this one:
[Image: SRC-13-1-3-jkfkjf.jpg]

which also appears in text form as a(n) = Sum_{k = 1..floor(sqrt(n))} mu(k)*floor(n/k^2) at OEIS A013928 Number of (positive) squarefree numbers < n, and many other sites and documents. The simple proof of this formula can be obtained using the inclusion-exclusion principle.

As for the Möbius function, it's a very important number-theoretical function which is easily computed in various ways, like this simple HP-71B user-defined function FNM (which should be placed at the end of any program using it):

   1  DEF FNM(N) @ IF N=1 THEN FNM=1 @ END ELSE F=1
   2  D=PRIM(N) @ IF NOT D THEN FNM=(-1)^F ELSE IF MOD(N,D*D) THEN F=F+1 @ N=N/D @ GOTO 2

      >FOR N=96673 TO 96686 @ FNM(N); @ NEXT N @@       ►  1 1 0 0 0 0 0 0 1 1 1 0 -1 -1
      >S=0 @ FOR N=1 TO 1109 @ S=S+FNM(N) @ NEXT N @ S ►  -15   { Mertens(1109) }

but the important thing here is that we only need √N evaluations of it instead of N, which makes all the difference in the world.

My original solutions

Yes, solutions, plural, as I created two of them, optimized for different cases. My first solution is optimized for speed, regardless of memory use, and uses the S(n) formula above, but instead of computing each individual value for the Möbius function inside the loop, it does compute all √N values en masse before starting the summation loop, which can be done without factoring, multiplications or divisions by using a trivial sieve procedure similar to the well-known ancient Sieve of Eratosthenes algorithm used to find all prime numbers up to a limit.

The sieve procedure used here runs fast, ultimately winning over for large N, and all needed values of Möbius(n) are left stored in an array to be later used inside the summation loop.

First solution: speed

This little 4-line, 217-byte HP-71B program implements the procedure, first sieving and storing in INTEGER array M all √N Möbius values needed, then computing the summation and outputting results and timings: (uses Math and JPC ROMs)

1  DESTROY ALL @ INPUT T @ SETTIME 0 @ N=SQR(T) @ INTEGER M(N) @ MAT M=CON @ P=2 @ WHILE P<=N
2  S=P*P @ FOR K=S TO N STEP S @ M(K)=0 @ NEXT K @ FOR K=P TO N STEP P @ M(K)=-M(K) @ NEXT K
3  P=FPRIM(P+1) @ END WHILE @ S=T @ FOR K=2 TO N @ S=S+M(K)*(T DIV (K*K)) @ NEXT K
4  DISP USING "2(3DC3DC3DC3D,2X),2(Z.8D,X),5DZ.2D";T,S,SQR(6*T/S),ABS(PI-RES),TIME

\(\pi\)

go71b

Emu71/Win

Physical

@128x @976x HP-71B

3.141

98205

13"

3.1415

5933

36"

3.1415

8684

1' 28"

3.14159

550

1' 59"

6' 28"

10' 23"

11' 58"

21' 11"

69' 14"

The caveat is, storing all Möbius values in INTEGER array M does require large amounts of memory if N is huge, e.g.:

N

1E8

30,055 bytes

RAM

M

10,000

30,000 bytes

N

1E9

94,927 bytes

RAM

M

31,623

94,869 bytes

Larger values of N are problematic, e.g. N = 1E10 would require in excess of 300 Kb of RAM, at the very limit of what the HP-71B can manage, which leads us to my second solution, where I'll turn from computing all Möbius values en masse to computing them individually, thus avoiding the large memory requirements, albeit at the cost of speed.

Second solution: memory

This even smaller 3-line, 178-byte HP-71B program uses in-lined Möbius function code (as oposed to an user-defined function) for speed, and needs only ~ 63 bytes of RAM to run no matter the size of N, all the way to a trillion (10¹²) - 1, i.e. 999,999,999,999, which is the largest integer the HP-71B can represent: (uses JPC ROM)

1 DESTROY ALL @ INPUT T @ SETTIME 0 @ U=-1 @ S=T @ FOR K=2 TO SQR(T) @ N=K @ F=1
2 D=PRIM(N) @ IF NOT D THEN S=S+U^F*IP(T/(K*K)) ELSE IF MOD(N,D*D) THEN F=F+1 @ N=N/D @ GOTO 2
3 NEXT K @ DISP USING "2(3DC3DC3DC3D,2X),2(Z.8D,X),5DZ.2D";T,S,SQR(6*T/S),ABS(PI-RES),TIME

\(\pi\)

go71b

Emu71/Win

Physical

@128x @976x HP-71B

3.141

98205

12"

3.1415

5933

33"

3.1415

8684

1' 26"

3.14159

550

1' 56"

6' 43"

10' 59"

12' 45"

23' 4"

78' 47"

4h 33'

16h 15'

60h 2

3.14159265115

0.00000000244

3.14159265250

0.00000000109

Thus, although this second solution can achieve the maximum possible integer range 1 .. 10¹² - 1 with no memory problems and even seems to be a little faster for N up to 1 million, this is only because the HP-71B hardware and BASIC language are optimized for fast mathematical operations performed entirely in assembler without user loops or branching, which are pretty slow and thus slow down the sieving procedure.

This is why a matrix inversion (MAT INV) or a Fast Fourier Transform (FOUR) are performed at speeds competing with much faster CPUs, while an empty FOR-NEXT construct does only ~100 loops/sec. on a physical HP-71B. Even so, the second solution will get increasingly slower than the first one for large N.

Additional Comments

● In the very distant past (2004), I posted the pair of threads:

HP Challenge VA114 - The Turtle (HP-71B) and the Hare (HP49G+)

HP Challenge VA115 - The 71B Turtle and 49Gp Hare Harder 2nd Half

where I pitted the HP-71B vs. the HP49G+, executing various quite hard computations, and the resulting times showed that the HP49G+ was usually about 4.6x to 6.4x faster than the HP-71B, so any timings must take this into account in order to ascertain the relative performance of the various solutions running on different hardware.

● You may be wondering why I selected N = 33,000,000 as a test case. It's an innocuous number in and of itself but I found it while reading this truly excellent book eons ago:

The Mathematical Experience

Phillip J. Davis

Reuben Hersh

ISBN-10:

where they say:

"A more refined piece of natural scientific research into prime Numbers was reported in a paper by I. J. Good and R. F. Churchhouse in 1967 [...]"

and they elaborate in page 367, I quote:

"To check their probabilistic reasoning, Good and Churchhouse did some numerical work. [...] In a separate calculation, they found that the total number of zeros of μ(n) [VA: the Möbius function] for n between 0 and 33,000,000 is 12,938,407.

The "expected number" is 33.000.000*(1 -6/\(\pi\)²). which works out to 12,938.405.6. They call this "an astonishingly close fit, better than we deserved." A nonrigorous argument has predicted a mathematical result to 8 place accuracy. In physics or chemistry. experimental agreement with theory to 8 place accuracy would be regarded as a very strong confirmation of the theory. Here, also, it is impossible to believe that such agreement is accidental. The principle by which the calculation was made must be right.

I was sure it took them considerable time to get the tally for 33,000,000 using some 1967-era computer and wanted to know how long it would take a 1984-era handheld HP-71B to get the result, which is less than 12 min. for a physical machine and less than a second for a modern emulation (Emu71/Win).

Both timings would be reduced at least an order of magnitude if rewritten in assembler and run in the same hardware (perhaps J-F Garnier would obligue and create a MAT M=MOB matrix statement which would fill up array M with the results of the sieving procedure implemented in BASIC in my first solution ... Smile

)

By the way, the aforementioned 1967 paper by I. J. Good and R. F. Churchhouse is:

The Riemann Hypothesis and Pseudo-random Features of the Möbius Sequence

where among other very interesting things, they say:

"Thus the Möbius sequence contains arbitrarily long runs of zeros, but these long runs presumably occur extremely rarely."

I showed above such a run of zeros from N = 96675 to 96680, six consecutive zeros in all. It would make for an interesting challenge to locate longer runs.

● These are the exact counts S(10^N) for selected N:

      N  S(10^N)
-----------------------------------------
      0  1
      1  7
      2  61
      3  608
      4  6083
      5  60794
      6  607926
      7  6079291
      8  60792694
      9  607927124
     10  6079270942
     11  60792710280
     12  607927102274

     13  6079271018294
     14  60792710185947
     16  6079271018540405
     18  607927101854022750
     24  607927101854026628773299
     30  607927101854026628663278087296
     36  607927101854026628663276779463775476
6/\(\pi\)² = 0.607927101854026628663276779258365833...

So you see, you only need to tally up numbers up to 10³⁶ with no repeated prime factors and you'll get an approximation to \(\pi\) correct to 27 decimal digits ! Smile

That's all for now, I hope you enjoyed it. Regrettably, due to a number of factors (pun intended,) I'm now taking a leave of absence from further challenges for a while. Cya.

V.
Edit: rephrased one sentence.