RPN83P: RPN calculator for TI-83+ TI-84+ inspired by HP-42S

Threaded Mode | Linear Mode

RPN83P: RPN calculator for TI-83+ TI-84+ inspired by HP-42S

11-21-2023, 05:24 PM (This post was last modified: 11-21-2023 05:31 PM by bxparks.)

Post: #7

bxparks Offline

Member

Posts: 119
Joined: Nov 2023

RE: RPN83P: RPN calculator for TI-83+ TI-84+ inspired by HP-42S

(11-21-2023 09:26 AM)Gerald H Wrote: Very nice work, bxparks, bravo!

PRIM actually finds a factor, otherwise returning 1 for prime input.

PRIM works very fast - What algorithm have you used?

Thanks! Early versions of PRIM just returned a boolean (0 or 1) to indicate prime, but it was more useful to return the smallest prime factor. I have updated the README.md and USER_GUIDE.md to make that more clear. I don't know if you saw it, but I explained how to get all the prime factors of a number N (< 2^32) iteratively (https://github.com/bxparks/rpn83p/blob/d...me-factors), because PRIM lifts the original N to Y, then pushes the prime factor into X. So you can just hit / to get the next number to factor, and hit PRIM again.

The largest prime number <= 2^16 is 65521, so the longest execution time of PRIM is the factorization of 65521*65221=4 293 001 441. That takes 33 seconds on a TI-84+SE. I spent far too much time optimizing the PRIM function. :-) I just added notes about how PRIM works in the DEVELOPER.md notes here: https://github.com/bxparks/rpn83p/blob/d...VELOPER.md . I am copying the info below to save a click:

The basics of the algorithm is to test all the candidate prime factors from 2 to sqrt(N).
We could simply start at 3 and increment by 2 to test every odd number to sqrt(N). But we can do slightly better. All prime numbers >=5 are of the form 6k-1 and 6k+1. So each iteration can increment by 6, but perform 2 checks. This effectively means that we step by 3 through the candidate prime factors, instead of just by 2 (for all odd numbers), which makes the loop 50% faster.
We use integer operations instead of TI-OS floating point ops. If I recall, this makes it about 2-3X faster (I thought it would be more, but floating point ops in TI-OS are surprisingly fast).
Z80 does not support integer division operations in hardware, so we have to write our own in software. The integer size of N is limited to 32 bits, so we need to write a div(u32, u32) routine.
But the loop only needs to go up to sqrt(N), so we actually only need a div(u32, u16) routine, which if I recall is about 2X faster. This is because the bit-wise loop is reduced by 2X, but also because the dividend can be stored in a 16-bit Z80 register, instead of stored in 4 bytes of RAM.
Finally, we actually don't need a full div() operation for the PRIM function. We don't need the quotient, we need only the remainder. So we implement a custom mod(u32, u16) function which is about 25% faster than the full div(u32, u16) function.

I think there are additional micro-optimizations left on the table that could make the PRIM function maybe 1.5X to 2X faster, without resorting to a completely different algorithm. But I suspect that the resulting code would be difficult to understand and maintain. So I decided to stop here.