Sorting Strings
|
07-10-2017, 12:18 AM
Post: #1
|
|||
|
|||
Sorting Strings
Hello,
I'm starting to work on a project for the HP Prime calculator, and I quickly realized that it would be quite useful to have a "natural order" sort for lists of strings for this project, see http://www.davekoelle.com/alphanum.html. As far as I know and can tell, this is not currently implemented with the SORT command. Before I take a crack at implementing the algorithm, I wanted to check if anyone had made such an effort already? If not, it might be an interesting challenge to collaborate on with anyone who is interested. At first glance it looks like it could be somewhat involved because the PPL language doesn't have the wealth of string functionality built in that a language like Python does, for example. My thoughts so far: 1. Input would be a list of strings 2. Flag to sort by case-sensitive or not, for example A, a, B, b ... instead of A, B ... a, b ... 3. Return sorted list, or original list if less than 2 elements in list Any comments or interest? Jacob |
|||
07-10-2017, 04:58 AM
Post: #2
|
|||
|
|||
RE: Sorting Strings
Have you tried to use the SORT command with a list of strings? If that doesn't work, then you might convert each string to a list of characters, so you'll have a list of lists and use the SORT command with that.
Actually, I just tried it with: L1:={"HELLO","BAT","AXE","CAT","DOG"} SORT(L1) This gives me the result of: {"AXE","BAT","CAT","DOG","HELLO"} Sure looks like it works to me....??? |
|||
07-10-2017, 07:22 AM
Post: #3
|
|||
|
|||
RE: Sorting Strings
(07-10-2017 04:58 AM)webmasterpdx Wrote: Have you tried to use the SORT command with a list of strings? ... Ordinary "alphabetical" sorting is not what Jacob is asking for. He wants so-called "natural sorting" of numbers inside strings. Example: right now, SORT({"1", "2", "11", "22"}) returns {"1", "11", "2", "22"} which is correctly alphabetized but is NOT the correct numerical order which is what is being sought here, namely, {"1", "2", "11", "22"}. Jacob, it's unclear to me what kinds of inputs you want to handle. If the strings contain only digits, it's an easy task (just convert the whole list to numbers, sort them, then convert back to strings). Or do you want inputs like the example in your linked article, with digits mixed with non-digit characters? If the latter, will the digits have to be hunted for? Specific examples would help. <0|ɸ|0> -Joe- |
|||
07-10-2017, 05:02 PM
Post: #4
|
|||
|
|||
RE: Sorting Strings
A sample list:
{"TOP1", "TOP2", "TOP3", "TOP14", "TOP26", "TOE4", "TOE14", "TOE22", "BOT1", "bot2", "BOT230", "1", "2", "33", "999", "A1", "1A", "5000-B", "M234", "77", "B25", "111C", "1000", "A9", "A10", "A11", "100", "CP1", "CP2", "CP3", "CP4", "CP5"} Sorted result: {"1", "1A", "2", "33", "77", "100", "111C", "999", "1000", "5000-B", "A1", "A9", "A10", "A11", "B25", "BOT1", "bot2", "BOT230", "CP1", "CP2", "CP3", "CP4", "CP5", "M234", "TOE4", "TOE14", "TOE22", "TOP1", "TOP2", "TOP3", "TOP14", "TOP26"} Inputs could be a mix of numbers and other characters. The solution would have to break up the input strings into numerical and non-numerical chunks, then sort the numerical chunks by value, non-numerical chunks by alphabetical order. |
|||
07-11-2017, 05:03 AM
Post: #5
|
|||
|
|||
RE: Sorting Strings
Bonjour
J'ai personnellement déjà créer une fonction de tri de chaînes par la longueur, par exemple : {"an","ca","bal","sel"} Dans vôtre cas je pense qu'il faut utiliser un caractère inutilisable dans vos chaînes qui vient avant dans l'ordre ASCII et mettre à la même longueur les chaînes par la droite avec ce caractère. Ensuite Sort pour trier, puis supprimer les caractères inutilisables des chaînes. Une idée que je testerai quand j'en aurai le temps, si vous ne le faîtes pas. Espérant vous aider. Hello I personally already create a function of sorting strings by length, For example: {"an", "ca", "bal", "salt"} In your case I think you have to use an unusable character in your strings that comes Before in the ASCII order and set the strings to the same length by the right with this character. Then Sort to sort, and then remove the unusable characters from the strings. An idea that I will test when I have time, if you do not. Hoping to help you. Sorry for my english |
|||
07-11-2017, 06:06 PM
Post: #6
|
|||
|
|||
RE: Sorting Strings
Oubliez cela, mauvaise idée, désolé.
Forget that, bad idea, sorry. Sorry for my english |
|||
07-11-2017, 06:21 PM
Post: #7
|
|||
|
|||
RE: Sorting Strings
I am currently researching some methods already implemented for other languages to decide on a path that best fits PPL. There are numerous interesting discussions online on this topic, including https://blog.codinghorror.com/sorting-fo...ort-order/ I will try to get a functional alpha version done in the next few weeks, but it is the summer holidays months so it doesn't help my spare time right now. I think once a working version is complete, the process of optimizing could be the most interesting part.
|
|||
07-11-2017, 11:32 PM
(This post was last modified: 07-12-2017 12:13 AM by Didier Lachieze.)
Post: #8
|
|||
|
|||
RE: Sorting Strings
Here is a first version doing this natural string sorting. It is based on the CAS sort function with a custom string comparison function as described in English here, and with more details in French here.
Note: this version is limited to integer numbers without any period or comma which are considered as strings. For example "12.6" will be sorted before "12.49". Code: // Alphanum: sort a list of strings_containing With the following test program it generates the expected sorted list: Code: EXPORT TEST() Code: 10X Radonius |
|||
07-12-2017, 12:32 AM
(This post was last modified: 07-12-2017 12:32 AM by Han.)
Post: #9
|
|||
|
|||
RE: Sorting Strings
How should the natural sorting program sort: 1.2.3 versus 1.20.3 versus 1.02.3. In other words, when there are more than two decimal points, which "side" (left/right) has higher priority?
1.2.3 == 1.20.3 since .2 and .20 are the same 1.2.3 < 1.20.3 since 2 is less than 20 1.2.3 > 1.20.3 since .2 and .20 are equal, but .2 is shorter in characters Similar thoughts for comparing with 1.02.3 Graph 3D | QPI | SolveSys |
|||
07-12-2017, 05:10 AM
Post: #10
|
|||
|
|||
RE: Sorting Strings
@Didier Lachieze - It works as expected, amazing job! I just tried it quickly and the results are good. I will look at your solution in depth later this week.
@Han - How should the natural sorting program sort: 1.2.3 versus 1.20.3 versus 1.02.3. - For my purposes integer sorting is sufficient, however in this case i would say that: 1.2.3 < 1.20.3 since 2 is less than 20, general rule 1.02.3 and 1.2.3 would essentially be equal, in this case I would say 1.02.3 is before 1.2.3 because of the leading zero but it really is splitting hairs at that stage. |
|||
07-12-2017, 08:28 PM
Post: #11
|
|||
|
|||
RE: Sorting Strings
(07-11-2017 11:32 PM)Didier Lachieze Wrote: Here is a first version doing this natural string sorting. It is based on the CAS sort function with a custom string comparison function as described in English here, and with more details in French here. Chapeau Didier. Entre de bonnes mains cette HP PRIME est une machine remarquable ;D In English : waoo !! |
|||
07-14-2017, 07:02 AM
Post: #12
|
|||
|
|||
RE: Sorting Strings
I have reviewed Didier Lachieze's code more in depth and I am very impressed with it, I have not come up with any way to improve it yet, unfortunately. I say unfortunately because on a real calculator a speed test on a list of 1000 items ran at ~30 seconds. On a virtual calculator on my PC it is closer to ~1 second.
One thing I tried was to replace the CAS sort with a custom QuickSort algorithm that uses the Comp function from Didier. The result is essentially exactly the same as CAS sort for time when tested with the 1000 element list. Possible ways to improve speed: - Reduce the number of times the Chunk() function is called. Ideally just once for each element in the list, but overhead to maintain the indices of separate lists has looked to be too expensive. May return to this idea because I have a hunch that roughly half the time might be consumed by the repeated Chunk() calls at each iteration of the sort. - Other? |
|||
07-14-2017, 08:11 AM
(This post was last modified: 07-14-2017 08:24 AM by Didier Lachieze.)
Post: #13
|
|||
|
|||
RE: Sorting Strings
Well my first version was more a proof of concept and was not optimized.
As you mention, calling Chunk() in the comparison function is suboptimal as each string will be split multiple times, each time the sort algorithm will compare it to another string. So we need to move the "chunkification" out of the sort function, to do it just once. This can be done by replacing each string by a list composed of the string followed by the chunks, and applying the sort function to this list of lists which requires just minimal changes to Comp(). Then extracting the first element of each sublist will return the original strings sorted. Here is the second and faster version of Alphanum implementing this change : Code: // Alphanum: sort a list of strings_containing Another thing hitting the performance is that Comp() is written as a HOME program. So there is a penalty for the conversion from CAS to HOME to CAS environment every time sort() is calling Comp(). The solution is to rewrite Comp as a CAS function. But right now I have some troubles with the CAS "while" statement. I've converted all other instructions from HOME to CAS functions but "while" is not working, I'm getting an error message: "Error: Bad argument count" Code: // Compare two custom lists l1, l2 |
|||
07-15-2017, 03:00 AM
Post: #14
|
|||
|
|||
RE: Sorting Strings
Again brilliant, I am learning a lot from reviewing your code as I'm quite new to programming on the HP Prime. I have a lot of System RPL experience on the HP 50g, but of course this is completely different.
With your revisions, 1000 elements sort now in ~12 seconds compared to ~30 seconds, on a real HP Prime. That is a huge improvement and of course makes sense why it is so much faster. As for the CAS function, this is a completely new area for me and I'll be doing a lot of research trying to get more familiar with these two separate environments for programming. |
|||
« Next Oldest | Next Newest »
|
User(s) browsing this thread: 1 Guest(s)