Threaded Mode | Linear Mode

HP67 · (This post was last modified: 06-23-2014 07:08 AM by HP67.)

This program seems like it takes much longer than it should to run. How long should it take? I have no idea, but not as long as it is taking. I have another program (not shown) that does some string substitution (replacement) and the seat of the pants feeling is it's faster than it should be. I would have thought that scanning for substrings and replacing them with unlike-sized substrings would be one of the more expensive operations. But this simple piece of code that just concatenates a few strings can't get out of its own way.

As you can see there is not a whole lot going on in this code. 1000 tests of converting 2014 (5 while loop executions) to roman numerals in a loop take 970.3 seconds on a 50g on USB power.

Code:

%%HP: T(3)A(R)F(.);

@

@ ( real or int ==> roman numerals )

@

\<< "" SWAP

  WHILE DUP 0 > REPEAT

     CASE 

        DUP 1000 \>= THEN 1000 "M"  END

        DUP  900 \>= THEN  900 "CM" END

        DUP  500 \>= THEN  500 "D"  END 

        DUP  400 \>= THEN  400 "CD" END 

        DUP  100 \>= THEN  100 "C"  END

        DUP   90 \>= THEN   90 "XC" END

        DUP   50 \>= THEN   50 "L"  END

        DUP   40 \>= THEN   40 "XL" END

        DUP   10 \>= THEN   10 "X"  END

        DUP    9 \>= THEN    9 "IX" END

        DUP    5 \>= THEN    5 "V"  END

        DUP    4 \>= THEN    4 "IV" END

        DUP    1 \>= THEN    1 "I"  END

    END

    ROT ROT -

    ROT ROT +

    SWAP

  END DROP

What is the source of the expression "dog slow?" The dogs I had were anything but slow. And even a slow dog is faster than a man can run.

Dave Britten · 06-11-2014, 06:58 PM

First observation: make sure you use reals rather than ints (e.g. 1000. rather than 1000) as they're handled much faster.

I'll play with it on my 48 and see if I can spot anything else. I have a couple hunches for what to look at.

HP67 · (This post was last modified: 06-11-2014 07:11 PM by HP67.)

Thank you, Dave. I knew that. But for some reason (playing around with CAS today) I left it in = mode and that seems to cause numbers typed in to default to ints. I did not notice this happened until you mentioned it. Will run the test again and update!

YOWZA! 142.6449 seconds with reals. Couldn't see the forest for the trees! Thanks again.

Dave Britten · 06-11-2014, 07:51 PM

Huh, I figured the WHILE/CASE anti-pattern would be slowing it down, but I've tried several different approaches to pulling it apart, and all the locally defined subroutine tricks are actually slower than the extraneous CASE evaluations.

I've got a couple other weird ideas to try, though.

Dave Britten · 06-11-2014, 08:23 PM

Alright, this was as fast as I could get it. It runs in about half the time, but it's also about 140 bytes larger. Anything I tried involving local variables and subroutines actually ran slower. :/ I'm guessing it could be even faster if you get rid of the A subroutine and unroll it wherever it's used, but then the program will get even more bloated.

Code:

\<<

  \<< ROT ROT - ROT ROT + SWAP \>> \-> A

  \<< "" SWAP

    WHILE DUP 1000 \>= REPEAT 1000 "M" A EVAL END

    WHILE DUP 900 \>= REPEAT 900 "CM" A EVAL END

    WHILE DUP 500 \>= REPEAT 500 "D" A EVAL END

    WHILE DUP 400 \>= REPEAT 400 "CD" A EVAL END

    WHILE DUP 100 \>= REPEAT 100 "C" A EVAL END

    WHILE DUP 90 \>= REPEAT 90 "XC" A EVAL END

    WHILE DUP 50 \>= REPEAT 50 "L" A EVAL END

    WHILE DUP 40 \>= REPEAT 40 "XL" A EVAL END

    WHILE DUP 10 \>= REPEAT 10 "X" A EVAL END

    WHILE DUP 9 \>= REPEAT 9 "IX" A EVAL END

    WHILE DUP 5 \>= REPEAT 5 "V" A EVAL END

    WHILE DUP 4 \>= REPEAT 4 "IV" A EVAL END

    WHILE DUP 1 \>= REPEAT 1 "I" A EVAL END

    DROP

  \>>

\>>

I tried it with IF instead of WHILE for the multi-letter terms (since they can only be appended once), but it was slightly slower. I can't explain that. Maybe extra parsing overhead looking for an ELSE or something.

HP67 · (This post was last modified: 06-12-2014 11:50 AM by HP67.)

(06-11-2014 08:23 PM)Dave Britten Wrote: I'm guessing it could be even faster if you get rid of the A subroutine and unroll it wherever it's used, but then the program will get even more bloated.

That was actually my first version since I develop on the calc and don't have enough screen real estate and not enough experience with RPL to hit a solution I like on the first try. Then I went through it and factored out the redundant stuff, and moved it outside the CASE structure. Unfortunately, I didn't time it. I just knew it was redundant and got rid of it. It wasn't exactly the same as the subroutine, actually it was less instructions because i could do things more in order, closer to where I had the numerical value and symbol. Next time I'll be a little more careful, since it's obvious I can't tell a good performing RPL program from a bad one just by looking at it.

Dave Britten Wrote:the extraneous CASE evaluations.

Are you talking about inside the CASE statement where the debugger appears to be stepping over all the unselected clauses one by one? If so, I noticed this and I don't understand it. Obviously CASE does work, and once a clause is selected the other clauses are certainly not selected. But sometimes (always?) the debugger appears to be testing all the clauses when you STEP.

Thanks for looking at this. Some interesting stuff for me in what you did. I had not seen a subroutine made local like you mentioned possibly trying earlier and showed in your example here. And I didn't think about getting rid of the while/case and using a bunch of whiles. I'll play around with this a little more.

Dave Britten · 06-12-2014, 12:23 PM

(06-12-2014 11:44 AM)HP67 Wrote: Are you talking about inside the CASE statement where the debugger appears to be stepping over all the unselected clauses one by one? If so, I noticed this and I don't understand it. Obviously CASE does work, and once a clause is selected the other clauses are certainly not selected. But sometimes (always?) the debugger appears to be testing all the clauses when you STEP.

Yeah, the CASE statement would have to evaluate each condition sequentially until it finds the correct block to execute. As the remaining value gets smaller, you'll see it taking longer to work its way down to the bottom conditions. The 48 CASE statement is really more if an IF/ELSE IF/ELSE structure than the case structure from C, etc. In other words, you're getting a lot of extra DUP >= checks when the remaining value is small.

Thomas Klemm · 06-12-2014, 01:48 PM

Just moved the WHILE-loop to the subroutine A. But I didn't check whether this is faster or slower.

Code:

\<<

  "" SWAP

  \<< \-> k r

    \<<

      WHILE k DUP2 \>=

      REPEAT - SWAP r + SWAP

      END DROP

    \>>

  \>> \-> A

  \<<

    1000 "M" A EVAL

    900 "CM" A EVAL

    500 "D" A EVAL

    400 "CD" A EVAL

    100 "C" A EVAL

    90 "XC" A EVAL

    50 "L" A EVAL

    40 "XL" A EVAL

    10 "X" A EVAL

    9 "IX" A EVAL

    5 "V" A EVAL

    4 "IV" A EVAL

    1 "I" A EVAL

  \>>

  DROP

\>>

Cheers
Thomas

Dave Britten · 06-12-2014, 01:53 PM

(06-12-2014 01:48 PM)Thomas Klemm Wrote: Just moved the WHILE-loop to the subroutine A. But I didn't check whether this is faster or slower.

Code:

\<< "" SWAP \<< \-> k r \<< WHILE k DUP2 \>= REPEAT - SWAP r + SWAP END DROP \>> \>> \-> A \<< 1000 "M" A EVAL 900 "CM" A EVAL 500 "D" A EVAL 400 "CD" A EVAL 100 "C" A EVAL 90 "XC" A EVAL 50 "L" A EVAL 40 "XL" A EVAL 10 "X" A EVAL 9 "IX" A EVAL 5 "V" A EVAL 4 "IV" A EVAL 1 "I" A EVAL \>> DROP \>>

Cheers
Thomas

I think I tried that too (I did about 6 different modified versions), and it was a tiny bit slower, despite the code being cleaner.

Also, doing it with binary ints rather than reals was a little bit slower as well.

Thomas Klemm · 06-12-2014, 02:04 PM

(06-12-2014 01:53 PM)Dave Britten Wrote: Also, doing it with binary ints rather than reals was a little bit slower as well.

Sorry, I did that on an HP-48.

Dave Britten · 06-12-2014, 02:49 PM

(06-12-2014 02:04 PM)Thomas Klemm Wrote:
(06-12-2014 01:53 PM)Dave Britten Wrote: Also, doing it with binary ints rather than reals was a little bit slower as well.

Sorry, I did that on an HP-48.

Me too; I meant, e.g., #1000d vs. 1000. Sorry, wasn't directly referring to your code there, just throwing out something else I tried.

I'm not too surprised that ran slower, given how the Saturn and RPL are built for BCD. I just tried it because of a vague, nagging memory that I had something that ran faster with binary rather than real.

HP67 · (This post was last modified: 06-12-2014 05:57 PM by HP67.)

(06-12-2014 12:23 PM)Dave Britten Wrote:
(06-12-2014 11:44 AM)HP67 Wrote: Are you talking about inside the CASE statement where the debugger appears to be stepping over all the unselected clauses one by one? If so, I noticed this and I don't understand it. Obviously CASE does work, and once a clause is selected the other clauses are certainly not selected. But sometimes (always?) the debugger appears to be testing all the clauses when you STEP.

Yeah, the CASE statement would have to evaluate each condition sequentially until it finds the correct block to execute. As the remaining value gets smaller, you'll see it taking longer to work its way down to the bottom conditions. The 48 CASE statement is really more if an IF/ELSE IF/ELSE structure than the case structure from C, etc. In other words, you're getting a lot of extra DUP >= checks when the remaining value is small.

Oh, that's not what I meant. When debugging CASE statements in other code, the debugger seemed to be evaluating other clauses after the clause that was selected. It's easy to prove the code doesn't do that but in the debugger it sometimes looks like it does.

As far as what I think you're talking about now, having to DUP DUP DUP did bother me because it doesn't look very sano. But at the same time DUP is cheap, probably one of the cheapest ops in RPL so in practice it shouldn't do anything bad. And, if the code has to check for a certain condition, CASE and IF/THEN/ELSE are all the same. There are always going to be a bunch of redunant checks until you get to condition you want. The only way around it would be to code a very complicated CASE or IF/THEN/ELSE that would repeat only relevant tests. I thought about it, but didn't want to try it badly enough to actually code it for this.

Thanks Dave, Thomas for your comments and code.

HP67 · (This post was last modified: 06-12-2014 06:13 PM by HP67.)

Dave, your replacement with the local subroutine screams on my 50g. Time is 102 seconds average for two runs of 1,000 conversions. Nice work. Lots to be said for local subroutines.

I'll try Thomas' and post the 50g time later.

Ok, Thomas' version takes 157 seconds. Although as is usual for him the code is very clean I would guess calling the subroutine whether you need to or not accounts for the extra time. I think the fact that Dave turned my CASE into a bunch of descending whiles was pure genius. Thanks for the insights guys!

Thomas Klemm · 06-12-2014, 09:17 PM

Yet another variant without using a subroutine:

Code:

\<< \-> n

  \<<

    {

      { ""      n }

      { "M"  1000 }

      { "CM"  900 }

      { "D"   500 }

      { "CD"  400 }

      { "C"   100 }

      { "XC"   90 }

      { "L"    50 }

      { "XL"   40 }

      { "X"    10 }

      { "IX"    9 }

      { "V"     5 }

      { "IV"    4 }

      { "I"     1 }

    }

    \<<

      + OBJ\-> DROP

      \-> r k

      \<<

        WHILE DUP k \>=

        REPEAT

          k - SWAP

          r + SWAP

        END

        2 \->LIST

      \>>

    \>>

    STREAM HEAD

  \>>

\>>

But I doubt that it will be faster with all that creating and destroying of lists.

Cheers
Thomas

PS: I tried using DOLIST but I failed. However I noticed something I didn't know:
{ 1 2 3 }
1
\<< DUP \>>
DOLIST

This produces:
{ 1 1 2 2 3 3 }

Whatever the program produces will be added to the list. I thought this will lead to:
1
2
3
{ 1 2 3 }

Dave Britten · (This post was last modified: 06-12-2014 09:51 PM by Dave Britten.)

(06-12-2014 09:17 PM)Thomas Klemm Wrote: PS: I tried using DOLIST but I failed. However I noticed something I didn't know:
{ 1 2 3 }
1
\<< DUP \>>
DOLIST

This produces:
{ 1 1 2 2 3 3 }

Whatever the program produces will be added to the list. I thought this will lead to:
1
2
3
{ 1 2 3 }

Yeah, DOLIST (and I think maybe DOSUBS) is a little quirky. If the program leaves anything on the stack, the outputs will be collected in a list. Otherwise DOLIST doesn't have any final output.

This example produces just {1 2 3}:

Code:

{ 1 2 3 }

1

\<< \>>

DOLIST

The empty program does nothing and just leaves its one input as an output, which DOLIST obediently collects back into a list.

DavidM · 06-13-2014, 01:01 AM

(06-12-2014 09:17 PM)Thomas Klemm Wrote: Yet another variant without using a subroutine:
...
PS: I tried using DOLIST...

I kept thinking there would be a DOLIST option for this as well. I'm not sure how the following compares with the rest speed-wise, but I did finally find a way to shoehorn a DOLIST (and a sigmaLIST) into the mix.

Code:

\<<

    { } SWAP

    { 1000 900 500 400 100 90 50 40 10 9 5 4 1 }    

    1 13 FOR n

        DUP n GET

        ROT SWAP IDIV2

        ROT ROT 4 ROLL

        SWAP + UNROT

    NEXT

    DROP2

    { "M" "CM" "D" "CD" "C" "XC" "L" "XL" "X" "IX" "V" "IV" "I" } SWAP

    2 \<< NDUPN DROP \>> DOLIST

    \GSLIST

\>>

Thomas Klemm · 06-13-2014, 08:12 AM

(06-13-2014 01:01 AM)DavidM Wrote: I did finally find a way to shoehorn a DOLIST (and a sigmaLIST) into the mix.

Very nice! My only suggestion is to use a local variable L for the list. This makes the FOR-loop simpler:

Code:

\<<

  { 1000 900 500 400 100 90 50 40 10 9 5 4 1 } \-> L

  \<<

    { } SWAP

    1 13 FOR n

      L n GET

      IDIV2 UNROT + SWAP

    NEXT

    DROP

  \>>

  { "M" "CM" "D" "CD" "C" "XC" "L" "XL" "X" "IX" "V" "IV" "I" } SWAP

  2 \<< NDUPN DROP \>> DOLIST

  \GSLIST

\>>

However I can't test this at the moment.

Cheers
Thomas

Dave Britten · (This post was last modified: 06-13-2014 12:43 PM by Dave Britten.)

(06-13-2014 08:12 AM)Thomas Klemm Wrote:
(06-13-2014 01:01 AM)DavidM Wrote: I did finally find a way to shoehorn a DOLIST (and a sigmaLIST) into the mix.

Very nice! My only suggestion is to use a local variable L for the list. This makes the FOR-loop simpler:

Code:

\<< { 1000 900 500 400 100 90 50 40 10 9 5 4 1 } \-> L \<< { } SWAP 1 13 FOR n L n GET IDIV2 UNROT + SWAP NEXT DROP \>> { "M" "CM" "D" "CD" "C" "XC" "L" "XL" "X" "IX" "V" "IV" "I" } SWAP 2 \<< NDUPN DROP \>> DOLIST \GSLIST \>>

However I can't test this at the moment.

Cheers
Thomas

Very nice. IDIV2 isn't present on the 48, however. I think that was new on the 49. You could probably get away with some combination of DUP2/MOD/IP since this program would only need to deal with positive numbers.

EDIT: Simple numeric IDIV2 replacement for anybody that wants to run it on a 48.

Code:

\<< DUP2 / FLOOR SWAP OVER * ROT SWAP - \>>

DavidM · 06-13-2014, 05:15 PM

(06-13-2014 08:12 AM)Thomas Klemm Wrote: My only suggestion is to use a local variable L for the list. This makes the FOR-loop simpler...
However I can't test this at the moment.

The simplification of the "stackrobatics" (I love that word, Dave) made up for the performance penalty of the local var, and it brought the overall run time down slightly.

I was curious as to how these various algorithms performed, so I did some testing and came up with the following for cycle timings:

HP1 (HP67's original): 0.11592 sec
DB1: 0.07968 sec
TK1: 0.13363 sec
TK2: 5.02521 sec
DM1: 0.91066 sec
TK3: 0.89567 sec

These were all timed on a 50g, converting the range of numbers from 1000-1499. The one exception was TK2 - I ran out of patience and reduced it to 10 iterations just to get a reasonable estimate.

Dave's algorithm appears to be the fastest, with HP67's and Thomas' first attempts being not too far behind. The algorithms (mine and Thomas') that use list processing are, at best, an order of magnitude slower. At worst, well we might have to find a different animal...

While doing the timings, I also found a bug in my app. If the resulting string only has one character (as in "M" for 1000), \GSLIST will error out proclaiming an invalid dimension. So I had to insert an empty string into the list ("" +) just before \GSLIST was called to make sure it always had something to add.

- David

P.S. My first attempt at this app was actually a SysRPL version, which is more in my comfort zone. I'd be happy to share it if anyone's interested, but it seemed to be off-topic so I didn't want to introduce it to the mix.

**rprosperi** · 06-13-2014, 05:40 PM

(06-13-2014 05:15 PM)DavidM Wrote: My first attempt at this app was actually a SysRPL version, which is more in my comfort zone. I'd be happy to share it if anyone's interested, but it seemed to be off-topic so I didn't want to introduce it to the mix.

Please do share it. The folks that know SysRPL will enjoy it (and likely attempt to improve it) and the folks that don't will get a taste of what SysRPL is like, to compare. Also, please post the analogous performance numbers as an additional incentive for the non-SysRPL folks.

Interesting thread to watch the techniques evolve, my thanks to contributors. It seems that using DOLIST always has the exact same 2 effects; makes code much cleaner/shorter and arguably easier to follow, and always slows the run-times.