Post Reply 
Sorting Strings
07-10-2017, 12:18 AM
Post: #1
Sorting Strings
Hello,

I'm starting to work on a project for the HP Prime calculator, and I quickly realized that it would be quite useful to have a "natural order" sort for lists of strings for this project, see http://www.davekoelle.com/alphanum.html. As far as I know and can tell, this is not currently implemented with the SORT command.

Before I take a crack at implementing the algorithm, I wanted to check if anyone had made such an effort already? If not, it might be an interesting challenge to collaborate on with anyone who is interested. At first glance it looks like it could be somewhat involved because the PPL language doesn't have the wealth of string functionality built in that a language like Python does, for example.

My thoughts so far:
1. Input would be a list of strings
2. Flag to sort by case-sensitive or not, for example A, a, B, b ... instead of A, B ... a, b ...
3. Return sorted list, or original list if less than 2 elements in list

Any comments or interest?

Jacob
Visit this user's website Find all posts by this user
Quote this message in a reply
07-10-2017, 04:58 AM
Post: #2
RE: Sorting Strings
Have you tried to use the SORT command with a list of strings? If that doesn't work, then you might convert each string to a list of characters, so you'll have a list of lists and use the SORT command with that.

Actually, I just tried it with:
L1:={"HELLO","BAT","AXE","CAT","DOG"}
SORT(L1)

This gives me the result of:
{"AXE","BAT","CAT","DOG","HELLO"}

Sure looks like it works to me....???
Find all posts by this user
Quote this message in a reply
07-10-2017, 07:22 AM
Post: #3
RE: Sorting Strings
(07-10-2017 04:58 AM)webmasterpdx Wrote:  Have you tried to use the SORT command with a list of strings? ...
Sure looks like it works to me....???

Ordinary "alphabetical" sorting is not what Jacob is asking for. He wants so-called "natural sorting" of numbers inside strings. Example: right now, SORT({"1", "2", "11", "22"}) returns {"1", "11", "2", "22"} which is correctly alphabetized but is NOT the correct numerical order which is what is being sought here, namely, {"1", "2", "11", "22"}.

Jacob, it's unclear to me what kinds of inputs you want to handle. If the strings contain only digits, it's an easy task (just convert the whole list to numbers, sort them, then convert back to strings). Or do you want inputs like the example in your linked article, with digits mixed with non-digit characters? If the latter, will the digits have to be hunted for? Specific examples would help.

<0|ɸ|0>
-Joe-
Visit this user's website Find all posts by this user
Quote this message in a reply
07-10-2017, 05:02 PM
Post: #4
RE: Sorting Strings
A sample list:
{"TOP1", "TOP2", "TOP3", "TOP14", "TOP26", "TOE4", "TOE14", "TOE22", "BOT1", "bot2", "BOT230", "1", "2", "33", "999", "A1", "1A", "5000-B", "M234", "77", "B25", "111C", "1000", "A9", "A10", "A11", "100", "CP1", "CP2", "CP3", "CP4", "CP5"}

Sorted result:
{"1", "1A", "2", "33", "77", "100", "111C", "999", "1000", "5000-B", "A1", "A9", "A10", "A11", "B25", "BOT1", "bot2", "BOT230", "CP1", "CP2", "CP3", "CP4", "CP5", "M234", "TOE4", "TOE14", "TOE22", "TOP1", "TOP2", "TOP3", "TOP14", "TOP26"}

Inputs could be a mix of numbers and other characters. The solution would have to break up the input strings into numerical and non-numerical chunks, then sort the numerical chunks by value, non-numerical chunks by alphabetical order.
Visit this user's website Find all posts by this user
Quote this message in a reply
07-11-2017, 05:03 AM
Post: #5
RE: Sorting Strings
Bonjour

J'ai personnellement déjà créer une fonction de tri de chaînes par la longueur,
par exemple : {"an","ca","bal","sel"}
Dans vôtre cas je pense qu'il faut utiliser un caractère inutilisable dans vos chaînes qui vient
avant dans l'ordre ASCII et mettre à la même longueur les chaînes par la droite avec ce caractère.
Ensuite Sort pour trier, puis supprimer les caractères inutilisables des chaînes.

Une idée que je testerai quand j'en aurai le temps, si vous ne le faîtes pas.
Espérant vous aider.

Hello

I personally already create a function of sorting strings by length,
For example: {"an", "ca", "bal", "salt"}
In your case I think you have to use an unusable character in your strings that comes
Before in the ASCII order and set the strings to the same length by the right with this character.
Then Sort to sort, and then remove the unusable characters from the strings.

An idea that I will test when I have time, if you do not.
Hoping to help you.

Sorry for my english
Find all posts by this user
Quote this message in a reply
07-11-2017, 06:06 PM
Post: #6
RE: Sorting Strings
Oubliez cela, mauvaise idée, désolé.

Forget that, bad idea, sorry.

Sorry for my english
Find all posts by this user
Quote this message in a reply
07-11-2017, 06:21 PM
Post: #7
RE: Sorting Strings
I am currently researching some methods already implemented for other languages to decide on a path that best fits PPL. There are numerous interesting discussions online on this topic, including https://blog.codinghorror.com/sorting-fo...ort-order/ I will try to get a functional alpha version done in the next few weeks, but it is the summer holidays months so it doesn't help my spare time right now. I think once a working version is complete, the process of optimizing could be the most interesting part.
Visit this user's website Find all posts by this user
Quote this message in a reply
07-11-2017, 11:32 PM (This post was last modified: 07-12-2017 12:13 AM by Didier Lachieze.)
Post: #8
RE: Sorting Strings
Here is a first version doing this natural string sorting. It is based on the CAS sort function with a custom string comparison function as described in English here, and with more details in French here.

Note: this version is limited to integer numbers without any period or comma which are considered as strings. For example "12.6" will be sorted before "12.49".

Code:
// Alphanum: sort a list of strings_containing 
// a mix of letters and numbers. Given strings 
// of mixed characters and numbers, it sorts 
// the numbers in value order, while sorting 
// the non-numbers in ASCII order. 
// The end result is a natural sorting order.

Chunk();
Comp();

EXPORT Alphanum(List)
BEGIN
  LOCAL s:=SIZE(List);
  IF s==1 OR ΣLIST(MAKELIST(TYPE(List(I))≠2,I,1,s)) THEN 
    RETURN List; 
  ELSE
    RETURN CAS("sort(List,(x,y)→Comp(x,y))");
  END;
END;

// Compare two strings
// returns 0 if s1≥s2
// returns 1 if s1<s2

Comp(s1,s2)
BEGIN
  LOCAL l1,l2,s,j;
  l1:=Chunk(s1);
  l2:=Chunk(s2);
  s:=MIN(SIZE(l1),SIZE(l2));
  j:=1;
  WHILE IFTE(TYPE(l1(j))==TYPE(l2(j)),l1(j)==l2(j),0) AND j<s DO
    j:=j+1;
  END;
  IF TYPE(l1(j))==0 AND TYPE(l2(j))==2 THEN
    RETURN 1;
  END;
  IF TYPE(l1(j))==2 AND TYPE(l2(j))==0 THEN
    RETURN 0;
  END;
  IF l1(j)>l2(j) THEN
    RETURN 0;
  ELSE
    IF l1(j)==l2(j) AND j==s AND s<SIZE(l1) THEN
      RETURN 0;
    END;
    RETURN 1;
  END;
END;

// Breaks strings into chunks, where a chunk contains 
// either all alphabetic characters, or all numeric characters.

Chunk(s)
BEGIN
  LOCAL n:=DIM(s),j,t,m;
  t:=48≤s(1)≤57;
  IF n>1 THEN
    FOR j FROM 2 TO n DO
      IF (48≤s(j)≤57)≠t THEN 
        m:=IFTE(t,EXPR(LEFT(s,j-1)),LEFT(s,j-1));
        RETURN CONCAT(m,Chunk(RIGHT(s,n-j+1))); 
      END;
    END;
  END;
  RETURN IFTE(t,{EXPR(s)},{s});
END;

With the following test program it generates the expected sorted list:
Code:
EXPORT TEST()
BEGIN
LOCAL l1,l2,j;

l1:={"1000X Radonius Maximus","10X Radonius",
"200X Radonius","20X Radonius","20X Radonius Prime",
"30X Radonius","40X Radonius","Allegia 50 Clasteron",
"Allegia 500 Clasteron","Allegia 51 Clasteron",
"Allegia 51B Clasteron","Allegia 52 Clasteron",
"Allegia 60 Clasteron","Alpha 100","Alpha 2",
"Alpha 200","Alpha 2A","Alpha 2A-8000","Alpha 2A-900",
"Callisto Morphamax","Callisto Morphamax 500",
"Callisto Morphamax 5000","Callisto Morphamax 600",
"Callisto Morphamax 700","Callisto Morphamax 7000",
"Callisto Morphamax 7000 SE","Callisto Morphamax 7000 SE2",
"QRS-60 Intrinsia Machine","QRS-60F Intrinsia Machine",
"QRS-62 Intrinsia Machine","QRS-62F Intrinsia Machine",
"Xiph Xlater 10000","Xiph Xlater 2000","Xiph Xlater 300",
"Xiph Xlater 40","Xiph Xlater 5","Xiph Xlater 50",
"Xiph Xlater 500","Xiph Xlater 5000","Xiph Xlater 58",
"Allegia 6R Clasteron"};

l2:=Alphanum(l1);
PRINT();
FOR j FROM 1 TO  SIZE(l2) DO
  PRINT(l2(j));
END;
END;

Code:
10X Radonius
20X Radonius
20X Radonius Prime
30X Radonius
40X Radonius
200X Radonius
1000X Radonius Maximus
Allegia 6R Clasteron
Allegia 50 Clasteron
Allegia 51 Clasteron
Allegia 51B Clasteron
Allegia 52 Clasteron
Allegia 60 Clasteron
Allegia 500 Clasteron
Alpha 2
Alpha 2A
Alpha 2A-900
Alpha 2A-8000
Alpha 100
Alpha 200
Callisto Morphamax
Callisto Morphamax 500
Callisto Morphamax 600
Callisto Morphamax 700
Callisto Morphamax 5000
Callisto Morphamax 7000 SE
Callisto Morphamax 7000 SE2
Callisto Morphamax 7000
QRS-60 Intrinsia Machine
QRS-60F Intrinsia Machine
QRS-62 Intrinsia Machine
QRS-62F Intrinsia Machine
Xiph Xlater 5
Xiph Xlater 40
Xiph Xlater 50
Xiph Xlater 58
Xiph Xlater 300
Xiph Xlater 500
Xiph Xlater 2000
Xiph Xlater 5000
Xiph Xlater 10000
Find all posts by this user
Quote this message in a reply
07-12-2017, 12:32 AM (This post was last modified: 07-12-2017 12:32 AM by Han.)
Post: #9
RE: Sorting Strings
How should the natural sorting program sort: 1.2.3 versus 1.20.3 versus 1.02.3. In other words, when there are more than two decimal points, which "side" (left/right) has higher priority?

1.2.3 == 1.20.3 since .2 and .20 are the same
1.2.3 < 1.20.3 since 2 is less than 20
1.2.3 > 1.20.3 since .2 and .20 are equal, but .2 is shorter in characters

Similar thoughts for comparing with 1.02.3

Graph 3D | QPI | SolveSys
Find all posts by this user
Quote this message in a reply
07-12-2017, 05:10 AM
Post: #10
RE: Sorting Strings
@Didier Lachieze - It works as expected, amazing job! I just tried it quickly and the results are good. I will look at your solution in depth later this week.

@Han - How should the natural sorting program sort: 1.2.3 versus 1.20.3 versus 1.02.3. - For my purposes integer sorting is sufficient, however in this case i would say that:
1.2.3 < 1.20.3 since 2 is less than 20, general rule

1.02.3 and 1.2.3 would essentially be equal, in this case I would say 1.02.3 is before 1.2.3 because of the leading zero but it really is splitting hairs at that stage.
Visit this user's website Find all posts by this user
Quote this message in a reply
07-12-2017, 08:28 PM
Post: #11
RE: Sorting Strings
(07-11-2017 11:32 PM)Didier Lachieze Wrote:  Here is a first version doing this natural string sorting. It is based on the CAS sort function with a custom string comparison function as described in English here, and with more details in French here.

Chapeau Didier. Entre de bonnes mains cette HP PRIME est une machine remarquable ;D

In English : waoo !!
Find all posts by this user
Quote this message in a reply
07-14-2017, 07:02 AM
Post: #12
RE: Sorting Strings
I have reviewed Didier Lachieze's code more in depth and I am very impressed with it, I have not come up with any way to improve it yet, unfortunately. I say unfortunately because on a real calculator a speed test on a list of 1000 items ran at ~30 seconds. On a virtual calculator on my PC it is closer to ~1 second.

One thing I tried was to replace the CAS sort with a custom QuickSort algorithm that uses the Comp function from Didier. The result is essentially exactly the same as CAS sort for time when tested with the 1000 element list.

Possible ways to improve speed:
- Reduce the number of times the Chunk() function is called. Ideally just once for each element in the list, but overhead to maintain the indices of separate lists has looked to be too expensive. May return to this idea because I have a hunch that roughly half the time might be consumed by the repeated Chunk() calls at each iteration of the sort.
- Other?
Visit this user's website Find all posts by this user
Quote this message in a reply
07-14-2017, 08:11 AM (This post was last modified: 07-14-2017 08:24 AM by Didier Lachieze.)
Post: #13
RE: Sorting Strings
Well my first version was more a proof of concept and was not optimized.

As you mention, calling Chunk() in the comparison function is suboptimal as each string will be split multiple times, each time the sort algorithm will compare it to another string. So we need to move the "chunkification" out of the sort function, to do it just once. This can be done by replacing each string by a list composed of the string followed by the chunks, and applying the sort function to this list of lists which requires just minimal changes to Comp(). Then extracting the first element of each sublist will return the original strings sorted.

Here is the second and faster version of Alphanum implementing this change :

Code:
// Alphanum: sort a list of strings_containing 
// a mix of letters and numbers. Given strings 
// of mixed characters and numbers, it sorts 
// the numbers in value order, while sorting 
// the non-numbers in ASCII order. 
// The end result is a natural sorting order.

Chunk();
Comp();

EXPORT Alphanum(List)
BEGIN
  LOCAL s:=SIZE(List),l;
  IF s==1  OR ΣLIST(MAKELIST(TYPE(List(I))≠2,I,1,s)) THEN 
    RETURN List; 
  ELSE
    l:=MAKELIST(CONCAT(List(I),Chunk(List(I))),I,1,s);
    l:=CAS("sort(l,(x,y)→Comp(x,y))");
    RETURN MAKELIST(l(I,1),I,1,s);
  END;
END;

// Compare two custom lists l1, l2
// each list is composed of a string followed by it's chunks 
// returns 0 if l1>l2
// returns 1 if l1≤l2

Comp(l1,l2)
BEGIN
  LOCAL s,j;
  s:=MIN(SIZE(l1),SIZE(l2));
  j:=2;
  WHILE IFTE(TYPE(l1(j))==TYPE(l2(j)),l1(j)==l2(j),0) AND j<s DO
    j:=j+1;
  END;
  IF TYPE(l1(j))==0 AND TYPE(l2(j))==2 THEN
    RETURN 1;
  END;
  IF TYPE(l1(j))==2 AND TYPE(l2(j))==0 THEN
    RETURN 0;
  END;
  IF l1(j)>l2(j) THEN
    RETURN 0;
  ELSE
    IF l1(j)==l2(j) AND j==s AND s<SIZE(l1) THEN
      RETURN 0;
    END;
    RETURN 1;
  END;
END;

// Breaks strings into chunks, where a chunk contains 
// either all alphabetic characters, or all numeric characters.

Chunk(s)
BEGIN
  LOCAL n:=DIM(s),j,t,m;
  t:=48≤s(1)≤57;
  IF n>1 THEN
   FOR j FROM 2 TO n DO
    IF (48≤s(j)≤57)≠t THEN 
      m:=IFTE(t,EXPR(LEFT(s,j-1)),LEFT(s,j-1));
      RETURN CONCAT(m,Chunk(RIGHT(s,n-j+1))); 
    END;
   END;
  END;
  RETURN IFTE(t,{EXPR(s)},{s});
END;

Another thing hitting the performance is that Comp() is written as a HOME program. So there is a penalty for the conversion from CAS to HOME to CAS environment every time sort() is calling Comp(). The solution is to rewrite Comp as a CAS function. But right now I have some troubles with the CAS "while" statement. I've converted all other instructions from HOME to CAS functions but "while" is not working, I'm getting an error message: "Error: Bad argument count"

Code:
// Compare two custom lists l1, l2
// each list is composed of a string followed by it's chunks 
// returns 0 if l1>l2
// returns 1 if l1≤l2
#cas
Comp(l1,l2):=
BEGIN
  local s,j;
  s:=min(size(l1),size(l2));
  j:=2;
  while l1(j)==l2(j) and j<s do
    j:=j+1;
  end;
  if type(l1(j))==DOM_INT and type(l2(j))==DOM_STRING then
    return 1;
  end;
  if type(l1(j))==DOM_STRING and type(l2(j))==DOM_INT then
    return 0;
  end;
  if l1(j)>l2(j) then
    return 0;
  else
    if l1(j)==l2(j) and j==s and s<size(l1) then
      return 0;
    end;
    return 1;
  end;
end;
#end
Find all posts by this user
Quote this message in a reply
07-15-2017, 03:00 AM
Post: #14
RE: Sorting Strings
Again brilliant, I am learning a lot from reviewing your code as I'm quite new to programming on the HP Prime. I have a lot of System RPL experience on the HP 50g, but of course this is completely different.

With your revisions, 1000 elements sort now in ~12 seconds compared to ~30 seconds, on a real HP Prime. That is a huge improvement and of course makes sense why it is so much faster.

As for the CAS function, this is a completely new area for me and I'll be doing a lot of research trying to get more familiar with these two separate environments for programming.
Visit this user's website Find all posts by this user
Quote this message in a reply
Post Reply 




User(s) browsing this thread: 1 Guest(s)