newRPL: Adding string processing commands?
09-21-2016, 02:47 AM
Lots of ideas!...

(09-20-2016 04:04 PM)Didier Lachieze Wrote:  The Prime commands ASC and CHAR are pretty handy:
- ASC(string) returns a list with the ASCII codes of the string characters
- CHAR(list) does the opposite

ASC("abc") -> {97,98,99}
CHAR({97,98,99}) -> "abc"

I like it. Since strings in newRPL are UTF-8, these commands would convert Unicode codepoints into a string and vice versa; in other words, they would encode/decode UTF-8.
ASC is perhaps not the most appropriate name, since it's not ASCII anymore. Alternative name suggestions are welcome ("UTF→" and "→UTF", or "STR2LST" and "LST2STR").

"abc" UTF→ -> { 97 98 99 }
{ 97 98 99 } →UTF -> "abc"

What to do with composite Unicode characters? Should they be split into their individual codepoints, or perhaps returned as a list of lists?
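
To make the codepoint question concrete, here is a minimal Python sketch of what the two commands would do. The function names `utf_to_list` and `list_to_utf` are only stand-ins for the proposed UTF→ and →UTF, not existing newRPL commands. It also shows that a composite character (a base letter plus a combining mark) is more than one codepoint unless it is normalized first.

```python
import unicodedata

def utf_to_list(s):
    """Stand-in for the proposed UTF→: string to list of Unicode codepoints."""
    return [ord(ch) for ch in s]

def list_to_utf(codes):
    """Stand-in for the proposed →UTF: list of codepoints back to a string."""
    return "".join(chr(c) for c in codes)

print(utf_to_list("abc"))          # [97, 98, 99]
print(list_to_utf([97, 98, 99]))   # abc

# A composite character: "e" followed by COMBINING ACUTE ACCENT (U+0301).
decomposed = "e\u0301"
print(utf_to_list(decomposed))     # [101, 769]

# NFC normalization folds the pair into the precomposed U+00E9.
composed = unicodedata.normalize("NFC", decomposed)
print(utf_to_list(composed))       # [233]
```

Splitting into individual codepoints keeps the round trip lossless; a list of lists would need a rule for deciding where one visible character ends, which this sketch does not attempt.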

(09-20-2016 05:08 PM)Vtile Wrote:  Linecount - Counts lines of text object
Wordcount - Counts words of text object ( separator as char )
ExtractWord - Extract the N-th word out of a string. ( separator as char )
WordPosition - Search position of Nth word in a string.
StrTrimSet - Remove any leading or trailing characters in a set from a string and returns the result

??? I must admit that I'm not fully aware of the capabilities of the uRPL command set and what can and cannot be done to strings with it. Also, some of these could just be overloaded functionality of list commands like POS, SIZE, GET, PUT

Above commands are mainly from FPC RTL http://www.freepascal.org/docs-html/rtl/...dex-5.html

These are good too. In RPL slang it would be something like this (I'm renaming words into tokens to make it more generic, other name suggestions are welcome):

"STR" NLINES -> N (count of lines in a text)
"STR" N NTHLINE -> "LINE" (extract the nth line of text)
"STR" N NTHLINEPOS -> POS (position of the nth line within STR)

"STR" "SEP" NTOKENS -> N (count of tokens in "STR", separated by "SEP")
"STR" "SEP" N NTHTOKEN -> "TOKEN" (extract the nth token in STR)
"STR" "SEP" N NTHTOKENPOS -> POS (position of the nth token within the string)

Notice how the line versions are the same as the token versions, just using newline as the separator. I think the line versions may not need to be included, just the TOKEN versions.
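
A rough Python sketch of the token commands, to show that the line versions fall out for free. Everything here is an assumption for illustration: "SEP" is treated as a set of separator characters, runs of separators are collapsed (so empty tokens, and therefore empty lines, are not counted), and positions are 1-based as in RPL.

```python
def tokens_with_pos(s, seps):
    """Return (1-based position, token) pairs; any char in seps separates tokens."""
    out = []
    start = None
    for i, ch in enumerate(s):
        if ch in seps:
            if start is not None:          # a token just ended
                out.append((start + 1, s[start:i]))
                start = None
        elif start is None:                # a token just started
            start = i
    if start is not None:                  # token running to the end of the string
        out.append((start + 1, s[start:]))
    return out

def ntokens(s, seps):
    return len(tokens_with_pos(s, seps))

def nthtoken(s, seps, n):
    return tokens_with_pos(s, seps)[n - 1][1]

def nthtokenpos(s, seps, n):
    return tokens_with_pos(s, seps)[n - 1][0]

text = "one two\nthree  four"
print(ntokens(text, " \n"))         # 4
print(nthtoken(text, " \n", 3))     # three
print(nthtokenpos(text, " \n", 3))  # 9

# The "lines" versions are the same calls with newline as the only separator.
print(ntokens(text, "\n"))          # 2
print(nthtoken(text, "\n", 2))      # three  four
```

If empty lines have to count, the collapsing rule would need to change, which may be an argument for keeping NLINES separate after all.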

To trim a string:

"STR" "WHITES" TRIM -> "TRIMMED" (removes any characters present in "WHITES" from the end of "STR")
"STR" "WHITES" RTRIM -> "TRIMMED" (same as TRIM, but removes at the beginning of the string)

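The trimming behaviour described above maps directly onto Python's `str.rstrip` and `str.lstrip`, which also take a set of characters to remove. A minimal sketch, following the descriptions as written (TRIM works on the end, RTRIM on the beginning); the lowercase function names are just stand-ins for the proposed commands.

```python
def trim(s, whites):
    """Proposed TRIM: drop trailing characters that appear in whites."""
    return s.rstrip(whites)

def rtrim(s, whites):
    """Proposed RTRIM: drop leading characters that appear in whites."""
    return s.lstrip(whites)

s = "  hello \t\n"
print(repr(trim(s, " \t\n")))   # '  hello'
print(repr(rtrim(s, " \t\n")))  # 'hello \t\n'
```

Trimming both ends is then just one command applied after the other.
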
(09-20-2016 10:02 PM)David Hayden Wrote:  Definitely SREV - reverse a string.
Convert string to list. In fact, I think it would be handy to have a function to convert any container sort of object into a list of the components. This would include arrays, matrices and, of course, composites.

Or maybe a more generalized version of DOLIST and DOSUB - one that would iterate over any type of container object.

Looking at the C++ string class, I see the following that might be handy and aren't(?) currently in RPL:
back() - return the last character in a string.
pop_back() - remove the last character
rfind() - find the last occurrence of the arg
find_first_not_of() - find the first occurrence of a char that is NOT in the arg.
find_last_not_of() - find the last occurrence of a char that is not in the arg.

I see some good ones here too:

"STR" SREV -> "RTS" (reverse a string)
"STR" RHEAD -> "R" (last character, the name RHEAD is for consistent naming with HEAD/TAIL)
"STR" RTAIL -> "ST" (all but last character, reverse of TAIL)

find_first_not_of() is the same as NTHTOKENPOS above if you request the 1st token and pass all your forbidden characters as the separator characters.
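
For reference, a small Python sketch of the commands in this reply. The names are stand-ins for the proposed commands, positions are assumed to be 1-based with 0 meaning "not found", and the operations work on codepoints, so reversing a string would detach combining marks from their base letters.

```python
def srev(s):
    """Proposed SREV: reverse the string, codepoint by codepoint."""
    return s[::-1]

def rhead(s):
    """Proposed RHEAD: last character (empty string if s is empty)."""
    return s[-1:]

def rtail(s):
    """Proposed RTAIL: everything but the last character."""
    return s[:-1]

def find_first_not_of(s, chars):
    """1-based position of the first character not in chars, 0 if none."""
    for i, ch in enumerate(s, start=1):
        if ch not in chars:
            return i
    return 0

def find_last_not_of(s, chars):
    """1-based position of the last character not in chars, 0 if none."""
    for i in range(len(s), 0, -1):
        if s[i - 1] not in chars:
            return i
    return 0

print(srev("STR"))    # RTS
print(rhead("STR"))   # R
print(rtail("STR"))   # ST

# The first character outside the set is where the 1st token starts.
print(find_first_not_of("  \thello ", " \t"))  # 4
print(find_last_not_of("  \thello ", " \t"))   # 8
```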


(09-21-2016 12:27 AM)DavidM Wrote:  
(09-20-2016 03:27 PM)Claudio L. Wrote:  The latest HHC contest made me think that perhaps SUB and POS are insufficient to properly handle strings ... a reverse POS?

Having just gone through the exercise of attempting to convert 3298's SysRPL code to UserRPL, I can definitely see the advantage of a reverse POS, and I think the SysRPL parameters for both POS$ and POSCHRREV equivalents would be good (namely, being able to choose the starting position for the search).

OK, here we go:

"STR" "SEARCH" RPOS -> pos (find the last occurrence of "SEARCH" within "STR", same as POS from the end)

"STR" "SEARCH" N NPOS -> pos (first occurrence of "SEARCH", but start from position N)
"STR" "SEARCH" N NRPOS -> pos (last occurrence of "SEARCH", but start from position N towards the first character)

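A possible reading of these three commands, sketched in Python with `str.find` and `str.rfind`. The assumptions are mine: positions are 1-based, 0 means "not found" (as with POS), N is at least 1, and for NRPOS a match counts if it starts at or before position N.

```python
def pos(s, search):
    """Existing POS, for comparison: first occurrence, 0 if absent."""
    return s.find(search) + 1

def rpos(s, search):
    """Proposed RPOS: last occurrence, 0 if absent."""
    return s.rfind(search) + 1

def npos(s, search, n):
    """Proposed NPOS: first occurrence starting at position n or later."""
    return s.find(search, n - 1) + 1

def nrpos(s, search, n):
    """Proposed NRPOS: last occurrence starting at position n or earlier."""
    return s.rfind(search, 0, n - 1 + len(search)) + 1

s = "abcabc"
print(pos(s, "bc"))       # 2
print(rpos(s, "bc"))      # 5
print(npos(s, "bc", 3))   # 5
print(nrpos(s, "bc", 4))  # 2
print(nrpos(s, "bc", 1))  # 0
```

Repeated calls to NPOS with N set just past the previous result walk through every occurrence, which is the main use case for the extra argument.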

(09-21-2016 01:42 AM)compsystems Wrote:  commands to send strings to a printer

That's I/O, not really string manipulation. In the future there will be a command to send strings over the serial port, and if I ever write an IrDA driver, perhaps over infrared too.
That said, to send things to a printer in the 21st century, newRPL should perhaps be able to render text and graphics to a PDF file more than anything.



Finally, I need to add a couple of commands that are a necessary evil due to multibyte characters:

"STR" STRLEN -> N (get the length in Unicode characters, SIZE is in bytes)

There's also perhaps the need to have "byte" versions of all the commands, to treat strings as a stream of bytes rather than a Unicode string (???)
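
To illustrate why the character count and the byte count differ, and what a "byte" version of a search would return, here is a short Python sketch. The function names are stand-ins, and `byte_pos` is only one guess at what a byte-oriented POS could look like.

```python
def strlen(s):
    """Proposed STRLEN: length in Unicode codepoints."""
    return len(s)

def size_bytes(s):
    """Length of the UTF-8 encoding in bytes."""
    return len(s.encode("utf-8"))

def byte_pos(s, search):
    """Hypothetical byte-oriented POS: 1-based byte offset, 0 if absent."""
    return s.encode("utf-8").find(search.encode("utf-8")) + 1

s = "añ€"   # 1-byte, 2-byte and 3-byte characters in UTF-8
print(strlen(s))          # 3
print(size_bytes(s))      # 6
print(s.find("€") + 1)    # 3  (position counted in characters)
print(byte_pos(s, "€"))   # 4  (position counted in bytes)
```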


Messages In This Thread
RE: newRPL: Adding string processing commands? - Claudio L. - 09-21-2016 02:47 AM


