Post Reply 
Optimized Stack Operations
02-26-2019, 10:40 AM
Post: #21
RE: Optimized Stack Operations
You are right: Every ROTup command can be substituted by a SWAP (which is slightly faster than a ROT), but I don't see any saving of stack operation steps.
Find all posts by this user
Quote this message in a reply
02-26-2019, 10:52 AM
Post: #22
RE: Optimized Stack Operations
(02-25-2019 06:35 AM)deetee Wrote:  every operation costs 2 bytes of flash memory

Do you have more than 256 operations defined?
That's a lot for a small memory device.

If there are more operations than this, they could be packed. Two twelve bit codes per three bytes e.g. This could save a fair bit of flash.


As for stack operations, it might be worth implementing a few extra ones natively. The shuffle on the WP 34S is very handy, it allows any combination of the first four stack levels to be rearranged, duplicated and dropped. I suspect you'll not have space for quite as rich a command, but something might be possible. Two bits could specify the top four levels as a PICK. Four bits could function as two picks from the top four levels.


Pauli
Find all posts by this user
Quote this message in a reply
02-26-2019, 11:21 AM
Post: #23
RE: Optimized Stack Operations
(02-26-2019 10:40 AM)deetee Wrote:  You are right: Every ROTup command can be substituted by a SWAP

Except for the last one. That needs to be a ROTup.

Quote:(which is slightly faster than a ROT),

That's why I asked.

Quote:but I don't see any saving of stack operation steps.

Not sure if understand that correctly, but I replaced CHS + by and INV * by /.
Therefore 2 steps are saved.

(02-26-2019 10:52 AM)Paul Dale Wrote:  As for stack operations, it might be worth implementing a few extra ones natively.

I'd vote for OVER. That's useful and expensive to implement only with ENTER, SWAP, ROTup and ROTdown.

Cheers
Thomas
Find all posts by this user
Quote this message in a reply
02-26-2019, 03:00 PM (This post was last modified: 02-27-2019 05:10 AM by deetee.)
Post: #24
RE: Optimized Stack Operations
@Thomas: Thanks - you are great!

@ Paul: Thanks for involving (I waited for your comment, because you helped me with my ScArY project very much) - and - as you are in a "Guru-League" - I need some time to understand each of your valuable words. Till now I was very engaged with CORDIC (very impressive).
Find all posts by this user
Quote this message in a reply
02-27-2019, 11:04 AM
Post: #25
RE: Optimized Stack Operations
I tried to test CORDIC and developed the following code for sin(x) ... calculates cos(x) too:

Code:

#define CORDICSTEPS 16 // (>8)
#define AIMAX 8
double const ai[AIMAX] = {
  0.7853982,
  0.4636476,
  0.2449787,
  0.1243550,
  0.0624188,
  0.0312398,
  0.0156237,
  0.0078123
};
static double _sin(double f) { // Calculate sin
  long i2 = 1;
  int8_t sign;
  double tani, a, angle = 0.0, x = 1.0, y = 0.0, xtmp;
  for (byte i = 0; i < CORDICSTEPS; i++) {
    tani = 1.0 / i2;
    a = i < AIMAX ? ai[i] : tani;
    sign = angle > f ? -1 : 1;
    angle += sign * a;
    xtmp = x - sign * tani * y; // Rotating vector
    y = sign * tani * x + y;
    x = xtmp;
    i2 *= 2;
  }
  return (y * 0.607253);
}

Unfortunately this (surely not optimized) code seems less precise (personal feeling; even if I extend the loop to 20) and needs more memory than my old Taylor routine for calculating exp, sin and asin.

What did I miss? ... to many doubles?
How can I calculate exp or asin?
Find all posts by this user
Quote this message in a reply
02-27-2019, 11:13 AM
Post: #26
RE: Optimized Stack Operations
You need to ditch the "double" format altogether and use your own internal format for numbers. As long as you use "double" or "float" anywhere, the stdlib math libraries will be linked to your project and weigh it down.
Find all posts by this user
Quote this message in a reply
03-03-2019, 08:25 AM
Post: #27
RE: Optimized Stack Operations
I like CORDIC - but I can't bring the horsepowers to the ground.

So far I avoided doubles and multiplications and used long variables (shifting values with 1E6):

Code:

#define CSTEPS 20 // Cordic steps
#define CMULT 1e6 // To convert long to double - 6 digit precision
long const ai[CSTEPS] = { // Cordic A-values (atan(2^-m)) in degrees - 6 digit precision
  45000000,
  26565051,
  14036243,
  7125016,
  3576334,
  1789911,
  895174,
  447614,
  223811,
  111906,
  55953,
  27976,
  13988,
  6994,
  3497,
  1749,
  874,
  437,
  219,
  109
};
static double _cordic(double f) { // Calculate sin
  byte m = 0;
  long a = ai[0];
  long z = f * CMULT;
  long c = 607253, s = 0;
  for (byte i = 1; i < CSTEPS; i++) {
    long ctmp = c;
    if (z >= 0) {
      c -= s >> m;
      s += ctmp >> m;
      z = z - a;
    }
    else {
      c += s >> m;
      s -= ctmp >> m;
      z +=  a;
    }
    a = ai[i];
    m++;
  }
  return (s / CMULT);
}

But finally my compiler claims almost 500 bytes. It seems to me that avoiding doubles and bit shifting (instead of divisions) even bloats the memsize.
And involving the intrinsic function for sin(x) needs approx. 250 bytes.
My combined function for _exp_sin_asin uses roughly 600 bytes.
???
Find all posts by this user
Quote this message in a reply
03-03-2019, 08:32 AM
Post: #28
RE: Optimized Stack Operations
The multiplication and division using CMULT will pull in the maths routines.

Better would be to decompose and recompose the double into its mantissa and exponent using a union or byte operations. A shift by 20 bits will give the six digits you're after.

Alternatively, store your "reals" as pairs of integers:

Code:
struct real_st {
    long int mantissa;
    short int exponent;
};

and use integer operations exclusively.


Pauli
Find all posts by this user
Quote this message in a reply
03-05-2019, 08:36 AM
Post: #29
RE: Optimized Stack Operations
Hi Pauli!

Thanks for your (always) inspiring words. To use an own format for floating figures sounds very challenging - even it is far beyond my programming skills.

So far it seems to me that using CORDIC needs in total more flash memory (at least for the existing program structure).

But do you mind to take a look at my source code? I guess you will find many places to optimize or have a lot of cool ideas.
Please have a look at: SCOTT

Thanks in advance
deetee
Find all posts by this user
Quote this message in a reply
03-06-2019, 08:50 PM
Post: #30
RE: Optimized Stack Operations
I've been working for years on a system RPL implementation in C++. After getting sick of "stackrobatics" I implemented these two generalized stack manipulation commands. You might want to do something similar in your implementation. These can be used in place of multiple stack manipulations. You could also implement many of the existing stack manipulation routines using these, at a cost in compute time but a possible savings in space.

// New comand SHUFFLE. This shuffles up to 8
// levels of the stack. For example # 2314 SHUFFLE does
// ob4 ob3 ob2 ob1 -> ob2 ob3 ob1 ob4
// In other words, each hex digit specifies the old stack level that
// should go in that position. The "old" level is the level before
// the command executes (but after popping the arg).
// If a hex digit is zero it means "no change in this position."
// This means # 132 SHUFFLE is the same as UNROT
//
// Larger digits will specify larger (original) stack levels, so you
// can actually access up to level 15. E.g., # FFFF SHUFFLE will
// copy level 15 to levels 1-4.

The next command is REARRANGE. It's like SHUFFLE, but lets you add or remove levels from the stack at the same time:

# delta # shuffleSpec -> new stack
// Okay, there are commands and commands that manipulate the stack. Here
// is an efficient way to generalize it. Delta is the number of items
// to add to (delta>0) or remove from (delta<0) the stack.
// ShuffleSpec is a number that says how to shuffle the stack, similar
// to the SHUFFLE command. Note that the values in the spec refer to
// levels BEFORE the delta is applied.

// Example: 2DUP is the same as 2 # 2121 REARRANGE
Find all posts by this user
Quote this message in a reply
Post Reply 




User(s) browsing this thread: 1 Guest(s)