New Saturn asm "add loop" benchmark for the HP48G
11-04-2023, 06:26 PM
Post: #21
What also confused me into thinking that dividing by two was necessary is that there are several disparate instruction cycle count lists: one by Mika Heiskanen, the official one from SASM.DOC, and the cycle times reported by CTIM of the HACK library. All of these list lower cycle counts. (Also, "Cycles du Saturn" lists ~7 MHz to ~8 MHz clock speeds, which made me think they were talking about the HFO.) Also, CTIM reports half and quarter cycles, which are absent from "Cycles du Saturn". It makes me think that there are no completely accurate cycle counts out there for the Yorke Saturn.

Regards,

Jonathan

Aeternitas modo est. Longa non est, paene nil.
11-04-2023, 07:45 PM
Post: #22
(10-26-2023 06:11 PM)Xorand Wrote:  I realize this isn't the Prime forum, but I was curious what a Prime G2 could do with just a straight PPL program:

EXPORT ADDD()
BEGIN
A:=0;
WHILE 1 DO
A:=A+1;
END;
END;

Result was 4,979,849

There is a program here (likely run on the G1) that does it a bit faster: http://www.wiki4hp.com/doku.php?id=benchmarks:addloop (6,646,300).

(I have likely missed some updates over time, since I don't keep track of everything, but anyone can contribute to the wiki if they register!)

Wikis are great, Contribute :)
11-04-2023, 07:47 PM
Post: #23
(10-27-2023 12:56 AM)Valentin Albillo Wrote:  
(10-26-2023 06:11 PM)Xorand Wrote:  [...] I was curious what a Prime G2 could do with just a straight PPL program:[...] Result was 4,979,849

Speaking of curiosity, running the equivalent program written in plain old RPN on an old Free42 version running on a 12-year-old iPad 2 gives 5,235,602.

V.

!

Though I would expect the iPad 2 to be much beefier than the Prime (G1 or G2), and of course it could add even more if Free42 could run multiple times (one instance per core).

I wonder if there is a version of the Prime for iOS. I know there is one for Android.

Wikis are great, Contribute :)
11-04-2023, 08:26 PM
Post: #24
(11-04-2023 07:47 PM)pier4r Wrote:  I wonder if there is a version of the Prime for iOS. I know there is one for Android.

There is.

https://apps.apple.com/us/app/hp-prime-pro/id1064702857

--Bob Prosperi
11-05-2023, 02:49 PM
Post: #25
Assuming the cycle counts from "Cycles du Saturn" are accurate ( which I doubt as they don't contain any half or quarter cycles due to Saturn bus NSTR cycle stretching ), then an estimated count for Werner's original non-unrolled add loop would be :

\( \dfrac{3900000 \cdot 60}{49} \approx 4775510 \)

This is 392325 larger than the actual count. Again, assuming the cycle counts I've been referencing are accurate, this means something is stealing about (392325/60)*49 ~ 320398 cycles/sec. I know TIMER2, when running, steals a small number of cycles for the purposes of keyboard polling every 1 ms. Also, the card detect circuitry steals cycles while it's enabled. I'm dubious as to whether the aforementioned cumulative overhead steals ~320398 cycles/sec, but I'm not sure. If the lower performance of my code can't be explained by taking the above into account, then we have a mystery.
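
The arithmetic above can be reproduced with a short Python sketch. The 3.9 MHz clock, the 49 cycles per iteration and the 392325 shortfall are the figures from this post; the function and constant names are only illustrative, and the result is only as good as the cycle counts it assumes.

```python
# Figures taken from the post; all of them are assumptions about the hardware.
CLOCK_HZ = 3_900_000     # assumed CPU clock
CYCLES_PER_ITER = 49     # per-iteration cost according to "Cycles du Saturn"
SECONDS = 60             # benchmark duration


def estimated_count(clock_hz, cycles_per_iter, seconds=SECONDS):
    """Iterations expected in `seconds` if every cycle goes to the loop."""
    return clock_hz * seconds / cycles_per_iter


def stolen_cycles_per_sec(shortfall, cycles_per_iter, seconds=SECONDS):
    """Cycles per second that a shortfall of `shortfall` iterations represents."""
    return shortfall / seconds * cycles_per_iter


if __name__ == "__main__":
    print("estimated count:", estimated_count(CLOCK_HZ, CYCLES_PER_ITER))
    print("stolen cycles/sec:", stolen_cycles_per_sec(392_325, CYCLES_PER_ITER))
```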

Regards,

Jonathan

Aeternitas modo est. Longa non est, paene nil.
11-06-2023, 12:10 PM
Post: #26
Maybe you will find the correct cycle counts here:
https://www.hpcalc.org/hp48/docs/programming/cycles.zip

This document contains quarter cycles and was compiled by Mika Heiskanen.
Judging from the title of the document, it is for the 48GX model.

br
Gjermund
11-06-2023, 04:30 PM
Post: #27
(11-06-2023 12:10 PM)Gjermund Skailand Wrote:  Maybe you will find the correct cycle counts here:
https://www.hpcalc.org/hp48/docs/programming/cycles.zip

This document contains quarter cycles and was compiled by Mika Heiskanen.
Judging from the title of the document, it is for the 48GX model.

br
Gjermund

Yes, I know about Mika's cycle count doc :) The problem, I think, is that the cycle counts only take into account the number of NSTR cycles taken up by the Saturn bus and the internal execution time of the CPU, disregarding the time taken up by the memory controllers. The same goes for SASM.DOC.

If we use Mika's cycle counts, we have :
  • C=C+1 A 4.75 cycles
  • GONC ( branch taken ) 8 cycles

This means ( at least at even addresses ) that, according to Mika's counts, Werner's inner add loop takes 12.75 cycles which means :

(3900000*60)/12.75 ~ 18352941

which is obviously wrong.

Also, with SASM.DOC :
  • C=C+1 A 7 cycles
  • GONC ( branch taken ) 10 cycles

which is a total of 17 cycles, and therefore we have :

(3900000*60)/17 ~ 13764705

which is also obviously wrong.
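
Both estimates can be checked with a small Python sketch. The per-instruction counts are the ones quoted above from Mika's list and SASM.DOC, and the 3.9 MHz clock is the same assumption used throughout this thread; the names are only illustrative.

```python
CLOCK_HZ = 3_900_000   # assumed CPU clock, as elsewhere in the thread
SECONDS = 60           # benchmark duration

# (C=C+1 A, GONC with branch taken) cycle counts per source, as quoted above.
CYCLE_TABLES = {
    "Mika Heiskanen": (4.75, 8),
    "SASM.DOC": (7, 10),
}


def additions_per_minute(cycles_per_iter, clock_hz=CLOCK_HZ, seconds=SECONDS):
    """Loop iterations in `seconds` if the loop costs `cycles_per_iter` cycles."""
    return clock_hz * seconds / cycles_per_iter


if __name__ == "__main__":
    for source, (add_cycles, branch_cycles) in CYCLE_TABLES.items():
        total = add_cycles + branch_cycles
        print(f"{source}: {total} cycles/iteration ->",
              int(additions_per_minute(total)))
```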

AFAIK, there are no completely accurate cycle counts that take into account the memory controllers, the Saturn bus and the CPU execution time.

Regards,

Jonathan

Aeternitas modo est. Longa non est, paene nil.
11-06-2023, 05:26 PM
Post: #28
(11-05-2023 02:49 PM)Jonathan Busby Wrote:  [snip]...something is stealing about (392325/60)*49 ~ 320398 cycles/sec. I know TIMER2, when running, steals a small number of cycles for the purposes of keyboard polling every 1 ms.

I think keyboard polling may be the culprit : (320398/1024) ~ 313 -- which means that each keyboard poll takes about 313 cycles.
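
As a rough Python sketch of that division: the 320398 cycles/sec figure and the 1024 polls per second are the numbers used above, the 3.9 MHz clock is the assumption used throughout the thread, and the names are only illustrative.

```python
STOLEN_CYCLES_PER_SEC = 320_398   # estimated shortfall from the earlier post
POLLS_PER_SEC = 1024              # polling rate assumed in this post
CLOCK_HZ = 3_900_000              # assumed CPU clock


def cycles_per_poll(stolen=STOLEN_CYCLES_PER_SEC, polls=POLLS_PER_SEC):
    """Average cycles each poll would cost if polling explained the whole shortfall."""
    return stolen / polls


def stolen_fraction(stolen=STOLEN_CYCLES_PER_SEC, clock_hz=CLOCK_HZ):
    """Share of the assumed clock that the shortfall represents."""
    return stolen / clock_hz


if __name__ == "__main__":
    print("cycles per poll:", cycles_per_poll())
    print("fraction of clock:", stolen_fraction())
```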

Quote:Also, the card detect circuitry steals cycles while it's enabled.

Actually, although I'm not completely sure, I think this isn't the case.

Quote:I'm dubious as to whether the aforementioned cumulative overhead steals ~320398 cycles/sec, but I'm not sure. If the lower performance of my code can't be explained by taking the above into account, then we have a mystery.

The only way to determine this for sure is to use a hardware Saturn bus analyzer.

Regards,

Jonathan

Aeternitas modo est. Longa non est, paene nil.
11-07-2023, 06:12 PM
Post: #29
(11-05-2023 02:49 PM)Jonathan Busby Wrote:  Assuming the cycle counts from "Cycles du Saturn" are accurate ( which I doubt as they don't contain any half or quarter cycles due to Saturn bus NSTR cycle stretching ), then an estimated count for Werner's original non-unrolled add loop would be :

\( \dfrac{3900000 \cdot 60}{49} \approx 4775510 \)

This is 392325 larger than the actual count. Again, assuming the cycle counts I've been referencing are accurate, this means something is stealing about (392325/60)*49 ~ 320398 cycles/sec. I know TIMER2, when running, steals a small number of cycles for the purposes of keyboard polling every 1 ms. Also, the card detect circuitry steals cycles while it's enabled. I'm dubious as to whether the aforementioned cumulative overhead steals ~320398 cycles/sec, but I'm not sure. If the lower performance of my code can't be explained by taking the above into account, then we have a mystery.

Regards,

Jonathan

Have you thought about the UMA architecture sharing the main RAM between the CPU and the display controller in connection with the Clarke and Yorke chips?

Keyboard polling and card detection are not the only things that need extra cycles. Quote from the Clarke document: "The CPU is halted for 22-23 uS every 244 uS to read the data."

To avoid this, switch off the display during program execution.
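
To put a number on the display overhead, here is a rough Python sketch. The 22-23 us and 244 us values come from the Clarke quote; applying them to a 3.9 MHz clock is an assumption carried over from the earlier posts, and the names are only illustrative.

```python
HALT_US_MIN = 22.0     # shortest halt per period, from the Clarke quote
HALT_US_MAX = 23.0     # longest halt per period, from the Clarke quote
PERIOD_US = 244.0      # interval between halts, from the Clarke quote
CLOCK_HZ = 3_900_000   # assumed CPU clock, as elsewhere in the thread


def halted_fraction(halt_us, period_us=PERIOD_US):
    """Fraction of wall-clock time during which the CPU is halted."""
    return halt_us / period_us


def lost_cycles_per_sec(halt_us, clock_hz=CLOCK_HZ, period_us=PERIOD_US):
    """Cycles per second lost to display reads at the assumed clock."""
    return halted_fraction(halt_us, period_us) * clock_hz


if __name__ == "__main__":
    for halt in (HALT_US_MIN, HALT_US_MAX):
        print(f"{halt} us halt: {halted_fraction(halt):.4f} of the time,",
              f"{lost_cycles_per_sec(halt):.0f} cycles/sec")
```

With the display on this works out to roughly 9% of the CPU time, which is the same order of magnitude as the ~320398 cycles/sec shortfall quoted above.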

(11-05-2023 02:49 PM)Jonathan Busby Wrote:  Assuming the cycle counts from "Cycles du Saturn" are accurate ( which I doubt as they don't contain any half or quarter cycles due to Saturn bus NSTR cycle stretching )

This is an extract of the HP17BII =BPUTL source code. This code was designed to run on the Saturn ROM inside the Lewis chip or, with different timing constants, on an external 8-bit ROM chip.

The Saturn cycles are in [], whereas the MEMC cycles are in ().
Code:

=BPUTL
...

*                                                 ---
BP10    B=A    S        (6)      Init f-counter      |     [4]
        OUT=C          (10)      Toggle beeper       |     [6]
        CPEX   2       (10)                          |     [6]
*                                   ---              |
BP20    CSL    W       (22)  NOP20     |             |    [20]
        CSR    W       (24)  NOP20     |             |    [20]
        B=B-1  S        (6)  Freq cntl |             |     [4]
        GONC   BP20  (14/7)            |             |  [10/3]
*                                   ---              |
        B=B+1  X        (8)     Incr dd*16; Done?    |     [6]
        GONC   BP10  (14/7)                          |  [10/3]
*                                                 ---
...

For those who may ask: sorry, I don't have the complete HP17BII source code, only the =BPUTL and =CHECKSUM implementations. I also have no further information about the NSTR cycle stretching. The HP17BII entry point table is part of the Emu42 installer distribution, so when running an HP17BII inside Emu42 you can easily jump to the =BPUTL code entry point in the debugger.
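
As a sketch of how the two cycle columns in the =BPUTL listing compare, the following Python snippet sums one pass through the inner BP20 loop. It assumes that the first number in "14/7" and "10/3" is the branch-taken cost, which the listing does not state explicitly; the names are only illustrative.

```python
# (mnemonic, MEMC cycles, Saturn cycles) for one pass through the BP20 loop.
# Assumption: for GONC the first value of "14/7" and "10/3" is the taken branch.
INNER_LOOP = [
    ("CSL W", 22, 20),
    ("CSR W", 24, 20),
    ("B=B-1 S", 6, 4),
    ("GONC BP20", 14, 10),
]


def loop_totals(rows):
    """Return (MEMC total, Saturn total) for the given instruction rows."""
    memc = sum(memc_cycles for _, memc_cycles, _ in rows)
    saturn = sum(saturn_cycles for _, _, saturn_cycles in rows)
    return memc, saturn


if __name__ == "__main__":
    memc_total, saturn_total = loop_totals(INNER_LOOP)
    print("MEMC cycles per pass:", memc_total)
    print("Saturn cycles per pass:", saturn_total)
    print("ratio MEMC/Saturn:", memc_total / saturn_total)
```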
11-08-2023, 06:26 PM
Post: #30
(11-07-2023 06:12 PM)Christoph Giesselink Wrote:  Have you thought about the UMA architecture sharing the main RAM between the CPU and the display controller in connection with the Clarke and Yorke chips?

Keyboard polling and card detection are not the only things that need extra cycles. Quote from the Clarke document: "The CPU is halted for 22-23 uS every 244 uS to read the data."

To avoid this, switch off the display during program execution.

At the very beginning of the program the display is turned off via =DispOff :)

Also, I thought the card detection circuitry ran independently of the CPU?

Quote:Also no further information about the NSTR cycle stretching.

This seems to be undocumented and a big mystery.

Regards,

Jonathan

Aeternitas modo est. Longa non est, paene nil.
11-09-2023, 10:51 AM (This post was last modified: 11-11-2023 11:03 AM by dlidstrom.)
Post: #31
This program runs until a key is pressed and calculates the number of additions per minute. Using iHP48 on my iPhone Xs it reports about 5-7 million (!) additions. Imagine a physical device at that speed!

Code:

::
 SysTime
 %0
 BEGIN
 ZERO
 # 1000 ( try different values here )
 #1+_ONE_DO
 #1+
 LOOP
 UNCOERCE
 %+
 GETTOUCH
 UNTIL
 DROP
 SysTime
 ROT
 bit-
 HXS>%
 % 8192
 %/
 2DUP
 %/
 % 60
 %*
;
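
The final scaling steps of the listing (elapsed ticks divided by 8192, then additions per second times 60) can be sketched in Python as follows. Only the 8192 divisor and the factor 60 come from the code above; the function name and the sample tick values are hypothetical.

```python
TICKS_PER_SECOND = 8192   # divisor used in the listing above


def additions_per_minute(additions, start_ticks, end_ticks,
                         ticks_per_second=TICKS_PER_SECOND):
    """Scale a raw addition count to a per-minute rate from two tick readings."""
    elapsed_seconds = (end_ticks - start_ticks) / ticks_per_second
    if elapsed_seconds <= 0:
        raise ValueError("end_ticks must be later than start_ticks")
    return additions / elapsed_seconds * 60


if __name__ == "__main__":
    # Hypothetical sample: 1000 additions in exactly one second of ticks.
    print(additions_per_minute(1000, 0, 8192))
```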

2xHP48GX, HP 50g, two Retrotronik ram cards, DM42
/Daniel Lidström