HP Forums
Quicksort vs. Shell - Printable Version

+- HP Forums (https://www.hpmuseum.org/forum)
+-- Forum: Not HP Calculators (/forum-7.html)
+--- Forum: Not quite HP Calculators - but related (/forum-8.html)
+--- Thread: Quicksort vs. Shell (/thread-1643.html)

Pages: 1 2


Quicksort vs. Shell - Claudio L. - 06-16-2014 09:38 PM

Does anyone has experience with these two sort methods on calculators?
Quicksort is theoretically faster for large sets, but that wouldn't apply to calculators, where we are talking of sorting lists with a few objects.
I've seen some benchmarks where for less than 500 elements, Shell sort is actually faster (is it true in general?).
Shell sort seems more consistent for all cases, while quicksort is faster in average, but it hits bad cases quite often (but all this is general talk for large sets).
The use case is to sort lists in RPL. The algorithm itself would be sorting only the pointers to the objects, and generic objects will have to be compared with the RPL operator '>'. So this use case could be considered a fast-copy, slow-compare case.

I wonder if anyone knows of a good comparison for small sets of both algorithms. Most of the benchmarks found online are for large sets, where quicksort has the advantage, but it's outside of the normal calculator use.

Claudio


RE: Quicksort vs. Shell - Paul Dale - 06-16-2014 10:18 PM

There is always this sort.

- Pauli


RE: Quicksort vs. Shell - Brad Barton - 06-17-2014 02:51 AM

(06-16-2014 10:18 PM)Paul Dale Wrote:  There is always this sort.

Very nice. Does this also mean that the 43S is already created...and that the creators have no need to evolve the hardware or firmware? Should I order one, or just wait for it to flash into existence?


RE: Quicksort vs. Shell - Katie Wasserman - 06-17-2014 04:25 AM

There have been some discussions of sorting on calculators here in the past. I don't specifically recall a quicksort vs shell sort discussion, however. But I think that you're correct that quicksort is not a good choice for small data sets. For really small sets bubble or insertion sort is often a good choice.

Trying not to hijack your thread completely.... this reminded me of a column in Communications of the ACM recently about how to sort using only 2 LIFO stacks, a single register and no other memory. This was presented as a reader challenge here and answered in the following month's column.


RE: Quicksort vs. Shell - Paul Dale - 06-17-2014 05:22 AM

The 34S built in register sort command uses quick sort.
Comparisons are relatively very expensive so reducing the number of these is a win.


- Pauli


RE: Quicksort vs. Shell - walter b - 06-17-2014 06:24 AM

(06-17-2014 02:51 AM)Brad Barton Wrote:  Does this also mean that the 43S is already created...and that the creators have no need to evolve the hardware or firmware? Should I order one, or just wait for it to flash into existence?

Just wait until it drops into your lap (or lab?). At least that's what I do now. Sigh!

d:-/


RE: Quicksort vs. Shell - Werner - 06-17-2014 09:22 AM

In RPL, there's no beating (drumroll) .. binary insertion.
Since insertion is a stack roll (very fast) and binary insertion uses the minimum number of comparisons, there you are.
BTW this is what the 48/49/50 internal SORT command uses, as it is written in SysRPL, where the same conditions hold.

Cheers, Werner


RE: Quicksort vs. Shell - Claudio L. - 06-17-2014 03:03 PM

(06-17-2014 05:22 AM)Paul Dale Wrote:  The 34S built in register sort command uses quick sort.
Comparisons are relatively very expensive so reducing the number of these is a win.
- Pauli

How did you manage the recursion part?

My main concern on using quicksort (besides the fact that it may be slower for some cases) is that it's recursive. Our C stack will be tiny and limited, so a recursive algorithm might never be able to work on larger sets (larger within the context of the calculator, with let's say 1000 elements). If the list is already sorted, standard quicksort will recurse a worst-case of 999 times (less if it's the median-of-3 version), needing about 16kbytes of RAM in the stack (assuming there's no local variables at all, just the return address saved and 3 arguments per call).
Another option would be to do the recursion in RPL, where we have all RAM for the stack to grow, but we'll take a big hit in performance.
There's non-recursive versions of quicksort, but they use temporary memory to operate as a simulated stack, so I don't see any advantage, unless you have a hardware-limited return stack like in the Saturn.

I found some performance benchmarks here, which were done for strings. Strings have slow comparison, so it is similar to our use case.
The comparison is old, so it doesn't use the best sequence for Shell sort (there should be a slight improvement), and also it's a 5000 elements and 10000 elements test, which is a bit too many for our target.
But it shows some trends: Shell sort seems to be the fastest of the insertion methods (better than binary insertion, as suggested by Werner). Shell is also one of the simplest to implement, that's why I chose it for this comparison.
The only other real contenders are heap sort and quick sort. Heap sort is more complex than shell sort, and about the same performance, so it's not worth the effort.
That leaves us with Quicksort vs Shell, hence the title of this thread.


RE: Quicksort vs. Shell - Marcus von Cube - 06-17-2014 05:12 PM

(06-17-2014 03:03 PM)Claudio L. Wrote:  How did you manage the recursion part?

Our C stack is small (4K minus working memory) but the sort is limited to a maximum of 100 registers. If I'm not mistaken, the recursion cannot get deeper than ceil(log2(100))=7 levels.


RE: Quicksort vs. Shell - Claudio L. - 06-17-2014 06:07 PM

(06-17-2014 05:12 PM)Marcus von Cube Wrote:  
(06-17-2014 03:03 PM)Claudio L. Wrote:  How did you manage the recursion part?

Our C stack is small (4K minus working memory) but the sort is limited to a maximum of 100 registers. If I'm not mistaken, the recursion cannot get deeper than ceil(log2(100))=7 levels.

Ideally, if you partition exactly in half every time, you get log2(N), but you don't know where your pivot falls within the list. If you have a list that's already sorted, and pick the last element as a pivot, for example, you get the worst case, where after each recursion, your partitions are not half the size, but merely (n-1) elements on one side and zero in the other, with 'n'= size of the previous partition.
So log2(N) is a best-case scenario, N-1 is the worst-case, and in reality it's somewhere in between. Even with more advanced pivot picking techniques, there's always the chance that a wickedly-sorted list makes you hit the worst case.
If you sort 100 registers, and happen to hit the worst case, you'll need 99*sizeof(int)*(1+3) (assuming 1 for return address, and 3 arguments, but your implementation could have just 2 arguments), which is less than 1600 bytes. Your 4 kbytes of stack should be plenty.
I'm thinking of using 4 kbytes of stack as well, but we have no plans to put a limit on the list size, so the possibility of stack overrun is always there.

Claudio


RE: Quicksort vs. Shell - Werner - 06-17-2014 06:51 PM

(06-17-2014 06:07 PM)Claudio L. Wrote:  So log2(N) is a best-case scenario, N-1 is the worst-case, and in reality it's somewhere in between. Even with more advanced pivot picking techniques, there's always the chance that a wickedly-sorted list makes you hit the worst case.
If you sort 100 registers, and happen to hit the worst case, you'll need 99*sizeof(int)*(1+3) (assuming 1 for return address, and 3 arguments, but your implementation could have just 2 arguments), which is less than 1600 bytes. Your 4 kbytes of stack should be plenty.

Claudio

At every step you split the set in two halves, and if you sort the smaller half first you'll need at most log2(N) stack size.
Werner


RE: Quicksort vs. Shell - Werner - 06-17-2014 06:53 PM

(06-17-2014 03:03 PM)Claudio L. Wrote:  Shell sort seems to be the fastest of the insertion methods (better than binary insertion, as suggested by Walter). Shell is also one of the simplest to implement, that's why I chose it for this comparison.

You were specifically talking about RPL, and believe me, there's no beating simple binary insertion in RPL. Shell sort may not be hard to implement in regular languages, but in RPL it is at least an order of magnitude harder than implementing binary insertion, as is Quicksort.

Werner


RE: Quicksort vs. Shell - Claudio L. - 06-17-2014 08:32 PM

(06-17-2014 06:51 PM)Werner Wrote:  
(06-17-2014 06:07 PM)Claudio L. Wrote:  So log2(N) is a best-case scenario, N-1 is the worst-case, and in reality it's somewhere in between. Even with more advanced pivot picking techniques, there's always the chance that a wickedly-sorted list makes you hit the worst case.
If you sort 100 registers, and happen to hit the worst case, you'll need 99*sizeof(int)*(1+3) (assuming 1 for return address, and 3 arguments, but your implementation could have just 2 arguments), which is less than 1600 bytes. Your 4 kbytes of stack should be plenty.

Claudio

At every step you split the set in two halves, and if you sort the smaller half first you'll need at most log2(N) stack size.
Werner

Yes, I just read about that trick a few moments ago. It's also recommended not to recurse the second half, but do a tail call, so no stack is used. So it's not so bad after all, I can live with log2(N). I'll give it a shot.


RE: Quicksort vs. Shell - Claudio L. - 06-17-2014 09:15 PM

(06-17-2014 06:53 PM)Werner Wrote:  
(06-17-2014 03:03 PM)Claudio L. Wrote:  Shell sort seems to be the fastest of the insertion methods (better than binary insertion, as suggested by Walter). Shell is also one of the simplest to implement, that's why I chose it for this comparison.

You were specifically talking about RPL, and believe me, there's no beating simple binary insertion in RPL. Shell sort may not be hard to implement in regular languages, but in RPL it is at least an order of magnitude harder than implementing binary insertion, as is Quicksort.

Werner

Maybe I wasn't clear in my post. The purpose was to sort lists in RPL, I didn't say the algorithm was going to be written in RPL. The project I'm working on is precisely reimplementing all RPL commands in C.
Binary insertion is still a very good algorithm for small sets. Shell is just an improvement over the standard insertion to "extend its life" to larger sets.
Now you made me wonder: what would happen if we use binary insertion instead of plain insertion on each gap run? I can't find anybody investigating that, so it's probably a bad idea, but would be nice to know.

BTW: Sorry I renamed you Walter in one of my posts above, I fixed that.


RE: Quicksort vs. Shell - Paul Dale - 06-17-2014 09:45 PM

(06-17-2014 03:03 PM)Claudio L. Wrote:  How did you manage the recursion part?

It isn't. Recursion is the naïve implementation. The amount of state required at each level is two indexes. Put these in an array instead.


Quote:My main concern on using quicksort (besides the fact that it may be slower for some cases) is that it's recursive. Our C stack will be tiny and limited, ...

As Marcus notes, we've got 4kb for the stack and an amount of global state (but not all of it). This is limited but still vastly more space than we need here. Vastly as in many orders of magnitude larger than needed -- we could sort all digital memory on the planet with this space and still have headroom.


Quote:If the list is already sorted, standard quicksort will recurse a worst-case of 999 times (less if it's the median-of-3 version), needing about 16kbytes of RAM in the stack (assuming there's no local variables at all, just the return address saved and 3 arguments per call).

The unnecessary extra state is exactly why quicksort shouldn't actually be implemented recursively in low memory environments. I'd almost go as far as saying it shouldn't be recursive in any environment.

The important point you are missing is the worst case depth. As Marcus notes, this is indeed O(log n). Why? Remember that you have a choice as to which partition to sort first. Choose wisely and the worst case depth is guaranteed to be small.


Quote:Heap sort is more complex than shell sort, and about the same performance, so it's not worth the effort.

Unless you want a heap for something else. e.g. managing alarms is best done via a heap.


Quote:That leaves us with Quicksort vs Shell, hence the title of this thread.

Quicksort all the way Smile The sorting wars of the 60s have been won. Quicksort was the victor. Pretty much everything uses a flavour of quicksort these days.


Why the need for these questions? The 34S source code is available and a search for quicksort will find you the implementation.


- Pauli


RE: Quicksort vs. Shell - Paul Dale - 06-17-2014 09:50 PM

(06-17-2014 06:07 PM)Claudio L. Wrote:  I'm thinking of using 4 kbytes of stack as well, but we have no plans to put a limit on the list size, so the possibility of stack overrun is always there.

I would have a larger stack on the 34S if I could. For some of the statistical distributions and our matrix support, we hit the limit. If you're planning to implement these in RPL, don't worry but in C more space would be beneficial.

I guess it comes down to the RPL vs C trade off. The more you do in RPL the smaller the C stack can be. A pure RPL implementation could possibly get away with a hundred bytes or less of stack space. The more in C the more concerns you'll have over stack space. I've had to manually track stack use through a number of call chains on the 34S and that isn't fun.


- Pauli


RE: Quicksort vs. Shell - Paul Dale - 06-17-2014 10:02 PM

(06-17-2014 03:03 PM)Claudio L. Wrote:  Strings have slow comparison, so it is similar to our use case.

Strings are typically a lot faster than software floating point.


- Pauli


RE: Quicksort vs. Shell - Werner - 06-18-2014 05:10 AM

(06-17-2014 09:15 PM)Claudio L. Wrote:  Binary insertion is still a very good algorithm for small sets. Shell is just an improvement
Now you made me wonder: what would happen if we use binary insertion instead of plain insertion on each gap run? I can't find anybody investigating that, so it's probably a bad idea, but would be nice to know.

Straight insertion is best for small sets and almost ordered sets, and that's exactly what shell sort exploits: the gaps are large at the beginning, so you sort small subsets, and smaller at the end, but by then the sets are almost in order. Binary insertion will likely degrade shellsort's performance instead of improving it.

Werner


RE: Quicksort vs. Shell - David Hayden - 06-18-2014 12:27 PM

(06-16-2014 09:38 PM)Claudio L. Wrote:  Quicksort is theoretically faster for large sets, but that wouldn't apply to calculators, where we are talking of sorting lists with a few objects.
What are the requirements? Since we're talking about small sets, the speed difference may be negligible. In that case, code size may be more important.

Also, should we guarantee that the sort is stable?

Dave


RE: Quicksort vs. Shell - Claudio L. - 06-18-2014 04:29 PM

(06-18-2014 12:27 PM)David Hayden Wrote:  
(06-16-2014 09:38 PM)Claudio L. Wrote:  Quicksort is theoretically faster for large sets, but that wouldn't apply to calculators, where we are talking of sorting lists with a few objects.
What are the requirements? Since we're talking about small sets, the speed difference may be negligible. In that case, code size may be more important.

Also, should we guarantee that the sort is stable?

Dave

That's a good point. The speed difference between Shell and Quicksort seems to be around 30% to 80 % longer time for Shell, for lists up to 5000 elements (which is what I could find on the internet). However, for 20 to 50 elements we won't even be able to measure the difference.
As of now, we don't know how long it will take the 50g to sort a 1000 element list, considering the RPL overhead, so how do we know if the difference in speed is negligible?

I found this:
http://www.hpcalc.org/details.php?id=2828

It's a quicksort implementation written by Werner (is it the same Werner that prefers binary insertion? if so, what made you change your mind over the years?)
It's implemented in Saturn assembly, and somebody clocked 0.58 seconds for 1000 elements on a 50g. Considering a significant speedup due to ARM C, you might be right: we don't need to worry about speed that much.

Regarding stability... do we really need to? Doesn't matter for sorting numbers. It might matter if sorting some kind of custom list with data (same key but different content). On the other hand, we can make any method stable by introducing the original order as a secondary sorting index. Since we'll be sorting pointers to objects within the same list, the pointers themselves have the original order of the elements. If an element is equal to other, we can compare their pointers, and pick the lowest pointer as the lowest element. That won't add much overhead and needs no extra memory.

Claudio