newRPL: Adding string processing commands?
09-20-2016, 03:27 PM
Post: #1
newRPL: Adding string processing commands?
The latest HHC contest made me think that perhaps SUB and POS are insufficient to properly handle strings, and since newRPL provides the ability to read/write arbitrary text files to the SD card, it would make sense to provide a more powerful set of string processing commands (although the winner got an impressive solution using SUB and POS only, kudos to him).

What are text processing commands that you would like to see implemented in RPL?

Perhaps something to split text into tokens? A reverse POS?

Any ideas welcome.
09-20-2016, 04:04 PM
Post: #2
RE: newRPL: Adding string processing commands?
The Prime commands ASC and CHAR are pretty handy:
- ASC(string) returns a list with the ASCII codes of the string characters
- CHAR(list) does the opposite

ASC("abc") -> {97,98,99}
CHAR({97,98,99}) -> "abc"
09-20-2016, 05:08 PM
Post: #3
RE: newRPL: Adding string processing commands?
Linecount - Counts lines of text object
Wordcount - Counts words of text object ( separator as char )
ExtractWord - Extract the N-th word out of a string. ( separator as char )
WordPosition - Search position of Nth word in a string.
StrTrimSet - Remove any leading or trailing characters in a set from a string and returns the result

??? I must admit that I'm not fully aware of the capabilities of the uRPL set of commands and what can and cannot be done to strings with them. Also, some of these could just be overloaded functionality of the list commands like POS, SIZE, GET, PUT.

Above commands are mainly from FPC RTL http://www.freepascal.org/docs-html/rtl/...dex-5.html
09-20-2016, 10:02 PM
Post: #4
RE: newRPL: Adding string processing commands?
Definitely SREV - reverse a string.
Convert string to list. In fact, I think it would be handy to have a function to convert any container sort of object into a list of the components. This would include arrays, matrices and, of course, composites.

Or maybe a more generalized version of DOLIST and DOSUB - one that would iterate over any type of container object.

Looking at the C++ string class, I see the following that might be handy and aren't(?) currently in RPL:
back() - return the last character in a string.
pop_back() - remove the last character
rfind() - find the last occurrence of the arg
find_first_not_of() - find the first occurrence of a char that is NOT in the arg.
find_last_not_of() - find the last occurrence of a char that is not in the arg.
09-21-2016, 12:27 AM
Post: #5
RE: newRPL: Adding string processing commands?
(09-20-2016 03:27 PM)Claudio L. Wrote:  The latest HHC contest made me think that perhaps SUB and POS are insufficient to properly handle strings ... a reverse POS?

Having just gone through the exercise of attempting to convert 3298's SysRPL code to UserRPL, I can definitely see the advantage of a reverse POS, and I think the SysRPL parameters for both POS$ and POSCHRREV equivalents would be good (namely, being able to choose the starting position for the search).
09-21-2016, 02:47 AM
Post: #6
RE: newRPL: Adding string processing commands?
Lots of ideas!...

(09-20-2016 04:04 PM)Didier Lachieze Wrote:  The Prime commands ASC and CHAR are pretty handy:
- ASC(string) returns a list with the ASCII codes of the string characters
- CHAR(list) does the opposite

ASC("abc") -> {97,98,99}
CHAR({97,98,99}) -> "abc"

I like it, especially since strings in newRPL are UTF-8: these commands would convert a string into its Unicode code points and vice versa, in other words decode/encode UTF-8.
ASC is perhaps not the most appropriate name, since it's not ASCII anymore. Alternative name suggestions are welcome ("UTF→" and "→UTF", or "STR2LST" and "LST2STR").

"abc" UTF→ -> { 97 98 99 }
{ 97 98 99 } →UTF -> "abc"

What to do with composite Unicode characters? Should they be split into their individual code points, or perhaps returned as a list of lists?
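To make the code point conversion and the composite-character question concrete, here is a minimal Python sketch. The function names are hypothetical stand-ins for the proposed UTF→ and →UTF commands, not newRPL code, and the NFC step only shows one possible way of handling a combining sequence.

```python
import unicodedata

def str_to_codepoints(s):
    """Return the list of Unicode code points in s (stand-in for the proposed UTF→)."""
    return [ord(ch) for ch in s]

def codepoints_to_str(points):
    """Build a string from a list of code points (stand-in for the proposed →UTF)."""
    return "".join(chr(p) for p in points)

print(str_to_codepoints("abc"))         # [97, 98, 99]
print(codepoints_to_str([97, 98, 99]))  # abc

# A composite character: "e" followed by U+0301 COMBINING ACUTE ACCENT.
decomposed = "e\u0301"
print(str_to_codepoints(decomposed))    # [101, 769]

# NFC normalization composes the pair into the single code point U+00E9.
composed = unicodedata.normalize("NFC", decomposed)
print(str_to_codepoints(composed))      # [233]
```

Without normalization the composite character simply shows up as two code points in a flat list, which is the simplest of the options raised above.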

(09-20-2016 05:08 PM)Vtile Wrote:  Linecount - Counts lines of text object
Wordcount - Counts words of text object ( separator as char )
ExtractWord - Extract the N-th word out of a string. ( separator as char )
WordPosition - Search position of Nth word in a string.
StrTrimSet - Remove any leading or trailing characters in a set from a string and returns the result

??? I must admit that I'm not fully aware of the capabilities of the uRPL set of commands and what can and cannot be done to strings with them. Also, some of these could just be overloaded functionality of the list commands like POS, SIZE, GET, PUT.

Above commands are mainly from FPC RTL http://www.freepascal.org/docs-html/rtl/...dex-5.html

These are good too. In RPL slang it would be something like this (I'm renaming words into tokens to make it more generic, other name suggestions are welcome):

"STR" NLINES -> N (count of lines in a text)
"STR" N NTHLINE -> "LINE" (extract the nth line of text)
"STR" N NTHLINEPOS -> POS (position of the nth line within STR)

"STR" "SEP" NTOKENS -> N (count of tokens in "STR", separated by "SEP")
"STR" "SEP" N NTHTOKEN -> "TOKEN" (extract the nth token in STR)
"STR" "SEP" N NTHTOKENPOS -> POS (position of the nth token within the string)

Notice how the lines version is the same as tokens, just using newlines as the separator. I think they may not need to be included, just the TOKEN versions.
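As a rough illustration of the token commands sketched above, here is a Python model of their behaviour. It rests on assumptions the post does not spell out: every character of the separator string counts as a separator, consecutive separators do not produce empty tokens, and positions are 1-based. The Python names are hypothetical and only mirror the proposed command names.

```python
def token_spans(s, seps):
    """Return (start, end) index pairs of the tokens in s.

    Assumption: any character contained in seps is a separator,
    and runs of separators never yield empty tokens.
    """
    spans, start = [], None
    for i, ch in enumerate(s):
        if ch in seps:
            if start is not None:
                spans.append((start, i))
                start = None
        elif start is None:
            start = i
    if start is not None:
        spans.append((start, len(s)))
    return spans

def _span(s, seps, n):
    """Return the span of the n-th token (n is 1-based)."""
    spans = token_spans(s, seps)
    if not 1 <= n <= len(spans):
        raise IndexError("token number out of range")
    return spans[n - 1]

def ntokens(s, seps):
    return len(token_spans(s, seps))

def nthtoken(s, seps, n):
    start, end = _span(s, seps, n)
    return s[start:end]

def nthtokenpos(s, seps, n):
    return _span(s, seps, n)[0] + 1  # convert to a 1-based position

text = "one two\nthree"
print(ntokens(text, " \n"))         # 3
print(nthtoken(text, " \n", 2))     # two
print(nthtokenpos(text, " \n", 3))  # 9
print(ntokens(text, "\n"))          # 2 (the line count, using newline as separator)
```

The last call shows the point made above: the line commands fall out of the token commands when the separator is a newline.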

To trim a string:

"STR" "WHITES" TRIM -> "TRIMMED" (removes any characters present in "WHITES" from the end of "STR")
"STR" "WHITES" RTRIM -> "TRIMMED" (same as TRIM, but removes at the beginning of the string)
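A short Python sketch of the two trim commands exactly as described above (TRIM works on the end of the string, RTRIM on the beginning). The function names are hypothetical; Python's own rstrip and lstrip already accept a set of characters, so they serve as the model.

```python
def trim(s, whites):
    """TRIM as described above: remove characters in whites from the END of s."""
    return s.rstrip(whites)

def rtrim(s, whites):
    """RTRIM as described above: remove characters in whites from the BEGINNING of s."""
    return s.lstrip(whites)

whites = " \t\n"
print(repr(trim("  hello \n", whites)))   # '  hello'
print(repr(rtrim("  hello \n", whites)))  # 'hello \n'
```

Trimming both sides is then just the composition of the two calls.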

(09-20-2016 10:02 PM)David Hayden Wrote:  Definitely SREV - reverse a string.
Convert string to list. In fact, I think it would be handy to have a function to convert any container sort of object into a list of the components. This would include arrays, matrices and, of course, composites.

Or maybe a more generalized version of DOLIST and DOSUB - one that would iterate over any type of container object.

Looking at the C++ string class, I see the following that might be handy and aren't(?) currently in RPL:
back() - return the last character in a string.
pop_back() - remove the last character
rfind() - find the last occurrence of the arg
find_first_not_of() - find the first occurrence of a char that is NOT in the arg.
find_last_not_of() - find the last occurrence of a char that is not in the arg.

I see some good ones here too:

"STR" SREV -> "RTS" (reverse a string)
"STR" RHEAD -> "R" (last character, the name RHEAD is for consistent naming with HEAD/TAIL)
"STR" RTAIL -> "ST" (all but last character, reverse of TAIL)

find_first_not_of() is the same as NTHTOKENPOS above if you request the 1st token and pass all your forbidden characters as the white spaces (separators).


(09-21-2016 12:27 AM)DavidM Wrote:  
(09-20-2016 03:27 PM)Claudio L. Wrote:  The latest HHC contest made me think that perhaps SUB and POS are insufficient to properly handle strings ... a reverse POS?

Having just gone through the exercise of attempting to convert 3298's SysRPL code to UserRPL, I can definitely see the advantage of a reverse POS, and I think the SysRPL parameters for both POS$ and POSCHRREV equivalents would be good (namely, being able to choose the starting position for the search).

OK, here we go:

"STR" "SEARCH" RPOS -> pos (find the last occurrence of "SEARCH" within "STR", same as POS from the end)

"STR" "SEARCH" N NPOS -> pos (first occurrence of "SEARCH", but start from position N)
"STR" "SEARCH" N NRPOS -> pos (last occurrence of "SEARCH", but start from position N towards the first character)
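The search variants above can be modelled in a few lines of Python. This is only a sketch under assumptions: positions are 1-based, 0 means "not found" (as with POS on the 50g), and for NRPOS a match counts if it starts at or before position N. The function names are hypothetical.

```python
def pos(s, search):
    """1-based position of the first occurrence of search, 0 if absent."""
    return s.find(search) + 1

def rpos(s, search):
    """1-based position of the last occurrence of search, 0 if absent."""
    return s.rfind(search) + 1

def npos(s, search, n):
    """First occurrence that starts at or after position n (1-based)."""
    return s.find(search, max(n - 1, 0)) + 1

def nrpos(s, search, n):
    """Last occurrence that starts at or before position n (1-based).

    Assumption: the match may extend past position n, it only has to start there.
    """
    if n < 1:
        return 0
    return s.rfind(search, 0, n - 1 + len(search)) + 1

s = "abcabcabc"
print(pos(s, "bc"))       # 2
print(rpos(s, "bc"))      # 8
print(npos(s, "bc", 3))   # 5
print(nrpos(s, "bc", 7))  # 5
print(npos(s, "x", 1))    # 0
```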


(09-21-2016 01:42 AM)compsystems Wrote:  commands to send strings to a printer

That's I/O, not really string manipulation. In the future there will be a command to send strings over the serial port, and if I ever write an IrDA driver, perhaps over infrared too.
That said, to send things to a printer in the 21st century, newRPL should perhaps be able to render text and graphics to a PDF file more than anything.



Finally, I need to add a couple of commands that are a necessary evil due to multibyte characters:

"STR" STRLEN -> N (get the length in Unicode characters, SIZE is in bytes)
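The difference between a length in characters and a size in bytes is easy to see in Python, where a string's length counts code points and the encoded form counts bytes. A minimal sketch with hypothetical function names, taking the statement above that SIZE is in bytes at face value:

```python
def strlen(s):
    """Length in Unicode code points (what STRLEN is described as returning)."""
    return len(s)

def size_in_bytes(s):
    """Length of the UTF-8 encoding in bytes (what SIZE is described as returning)."""
    return len(s.encode("utf-8"))

s = "a→b"
print(strlen(s))         # 3
print(size_in_bytes(s))  # 5, because → takes three bytes in UTF-8
```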

There's also perhaps the need to have "byte" versions of all the commands, to treat strings as a stream of bytes rather than a Unicode string (???)
09-21-2016, 11:36 AM
Post: #7
RE: newRPL: Adding string processing commands?
A pair of functions split & rsplit working a la Python.

I suppose that including support for regular expressions is overkill...
09-21-2016, 02:01 PM (This post was last modified: 09-21-2016 02:03 PM by Vtile.)
Post: #8
RE: newRPL: Adding string processing commands?
(09-21-2016 11:36 AM)emece67 Wrote:  A pair of functions split & rsplit working a la Python.

I suppose that including support for regular expressions is overkill...

Regular expressions?
Find/search is implemented in the vanilla HP 50g, so I would assume that Claudio has planned something for it anyway?

I would personally propose a "less is more" philosophy when it comes to string manipulation, trying to make the commands more multipurpose where possible: provide only the basic building blocks, which then allow building libraries of broader string/text object utilities.
09-21-2016, 04:32 PM
Post: #9
RE: newRPL: Adding string processing commands?
(09-21-2016 02:01 PM)Vtile Wrote:  
(09-21-2016 11:36 AM)emece67 Wrote:  A pair of functions split & rsplit working a la Python.

I suppose that including support for regular expressions is overkill...

Regular expressions?
Find/search is implemented in the vanilla HP 50g, so I would assume that Claudio has planned something for it anyway?

I would personally propose a "less is more" philosophy when it comes to string manipulation, trying to make the commands more multipurpose where possible: provide only the basic building blocks, which then allow building libraries of broader string/text object utilities.

I'd say no regular expressions. They are almost impossible for humans to understand without a reference manual in hand, and would only make RPL even harder to read.

I just learned about split/rsplit. I think the proposed NTOKENS, NTHTOKEN and NTHTOKENPOS cover the same functionality without explicitly creating a list with all tokens. If such a list were desired, it's not hard to implement in RPL:

"STR" "sep" SPLIT -> { "tok1" ... "tokN" }
can easily be implemented as:
Code:

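<<
  'sep' LSTO 'str' LSTO          @ store both arguments in local variables
  str sep NTOKENS 'ntok' LSTO    @ count the tokens
  1 ntok FOR K
    str sep K NTHTOKEN           @ push the K-th token onto the stack
  NEXT
  ntok ->LIST                    @ collect the tokens into a list
>>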

Search is already given with POS, and we are proposing to complement it with RPOS, NPOS and NRPOS per post above.

In general, I agree on a small subset of powerful commands rather than a global do-it-all engine like regexp.
09-22-2016, 05:12 PM
Post: #10
RE: newRPL: Adding string processing commands?
"Hello Earthlings" =)

1: a function to convert to an algebraic expression

"A+B+12" -> 'A+B+12'

another to RPN

"A+B+12" -> "A B + 12 +"

2: expand the function of the operator |

'X+3|(X=6)' EVAL -> 9

"X+Y"|("X"="W") -> "W+Y"

3: get with [] or ()

"X+Y"[1] -> "X"

"X+Y"[2] -> "+"

"X+Y"[3] -> "Y"
09-22-2016, 06:28 PM
Post: #11
RE: newRPL: Adding string processing commands?
(09-22-2016 05:12 PM)compsystems Wrote:  "Hello Earthlings" =)

1: a function to convert to an algebraic expression

"A+B+12" -> 'A+B+12'
This is just a simple compilation (STR->); adding the single quotes is trivial.

(09-22-2016 05:12 PM)compsystems Wrote:  another to RPN

"A+B+12" -> "A B + 12 +"

What exactly would this achieve? If you are planning to evaluate the symbolic, there's the symbolic object; the string won't help much.


(09-22-2016 05:12 PM)compsystems Wrote:  2: expand the function of the operator |

'X+3|(X=6)' EVAL -> 9

"X+Y"|("X"="W") -> "W+Y"

3: get with [] or ()

"X+Y"[1] -> "X"

"X+Y"[2] -> "+"

"X+Y"[3] -> "Y"
When we get to symbolic manipulation, we'll talk about some commands to do some primitives like extracting identifiers and operators out of a symbolic. Symbolic "rules" is probably what you are looking for and it's half-implemented already, but disabled until I get back to symbolic manipulation (a lot of work to do before then) to finish it.

In any case, that's off-topic here as this is for string manipulation.
08-31-2017, 01:29 AM
Post: #12
RE: newRPL: Adding string processing commands?
It looks like in the latest ROM (0.9-Alpha-Build 849) the following string commands are available:
STR->
->STR
OBJ->
SIZE
POS
REPL
SUB
HEAD
TAIL

But the following (in 50g stock ROM) are not yet there:
NUM
CHR

Are there others that I'm missing? Or is this still an active area of development?

-Steve
08-31-2017, 03:15 AM (This post was last modified: 08-31-2017 03:22 AM by Claudio L..)
Post: #13
RE: newRPL: Adding string processing commands?
(08-31-2017 01:29 AM)smartin Wrote:  It looks like in the latest ROM (0.9-Alpha-Build 849) the following string commands are available:
STR->
->STR
OBJ->
SIZE
POS
REPL
SUB
HEAD
TAIL

But the following (in 50g stock ROM) are not yet there:
NUM
CHR

Are there others that I'm missing? Or is this still an active area of development?

-Steve

String management is complete. Those 2 commands will not be implemented, they are being replaced with →UTF8 and UTF8→. The main reason to rename them is to remind people that you are no longer reading a byte from the string, but decoding/encoding a Unicode code point.
*EDIT*: I just recalled that I renamed them also because CHR creates a DOCHAR object, which doesn't exist in newRPL.
Other commands are:
SREPL
→UTF8
UTF8→
SREV
NTOKENS
NTHTOKEN
NTHTOKENPOS
TRIM
RTRIM
STRLEN
STRLENCP
→NFC
POS
POSREV
NPOS
NPOSREV
09-02-2017, 07:37 PM
Post: #14
RE: newRPL: Adding string processing commands?
(08-31-2017 03:15 AM)Claudio L. Wrote:  String management is complete. Those 2 commands will not be implemented, they are being replaced with →UTF8 and UTF8→. The main reason to rename them is to remind people that you are no longer reading a byte from the string, but decoding/encoding a Unicode code point.
*EDIT*: I just recalled that I renamed them also because CHR creates a DOCHAR object, which doesn't exist in newRPL.
Other commands are:
SREPL
→UTF8
UTF8→
SREV
NTOKENS
NTHTOKEN
NTHTOKENPOS
TRIM
RTRIM
STRLEN
STRLENCP
→NFC
POS
POSREV
NPOS
NPOSREV

I noticed that →UTF8 and its inverse behave opposite to what I'd expect based on →STR and its inverse. For example:
25.6 →STR yields "25.6"

so similarly I'd expect:
#61h UTF8→ to yield "a", but instead it throws an error, since it currently works opposite to what I'd expect:
#61h →UTF8 yields "a"

I intuitively substitute the following when using the STR commands:
→STR, convert object to string
STR→, convert string to object
but this doesn't work for the UTF8 commands. I'd rather it be:
→UTF8, convert object to UTF8
UTF8→, convert UTF8 to object

-Steve
09-02-2017, 07:44 PM
Post: #15
RE: newRPL: Adding string processing commands?
smartin, I cannot follow your last request.

If I have a number in userRPL and I want the string, I do \->STR. In newRPL I would do →UTF8, and indeed it works.

Wikis are great, Contribute :)
09-02-2017, 08:16 PM
Post: #16
RE: newRPL: Adding string processing commands?
(09-02-2017 07:44 PM)pier4r Wrote:  smartin, I cannot follow your last request.

If I have a number in userRPL and I want the string, I do \->STR. In newRPL I would do →UTF8, and indeed it works.

My thinking is as follows:

#61h is the UTF8 encoding for the printable character "a" (compatible with the smaller subset of the older ASCII encoding, which is what we have on the 50g stock ROM). So if I want to convert TO the UTF8 encoding, wouldn't I want the command to be →UTF8? And if I wanted to convert to the equivalent character from the UTF8 encoding, UTF8→?
09-05-2017, 08:51 PM
Post: #17
RE: newRPL: Adding string processing commands?
(09-02-2017 08:16 PM)smartin Wrote:  My thinking is as follows:

#61h is the UTF8 encoding for the printable character "a" (compatible with the smaller subset of the older ASCII encoding, which is what we have on the 50g stock ROM). So if I want to convert TO the UTF8 encoding, wouldn't I want the command to be →UTF8? And if I wanted to convert to the equivalent character from the UTF8 encoding, UTF8→?

Correct reasoning, minor conceptual error. UTF8 is the encoded string, while the numbers are Unicode code points: those are the characters, which can be encoded in UTF8, UTF16, UTF32, etc. Therefore →UTF8 converts code points into UTF8-encoded strings, and UTF8→ does the reverse.
#61h is the code point; the UTF8-encoded string is "a" (UTF8 encodes ASCII as plain ASCII; only non-ASCII characters like → get a multi-byte encoding).
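The distinction between a code point and its UTF-8 encoding can be checked quickly in Python. This is only an illustration of the concept, not of the newRPL commands themselves, and the helper name is hypothetical.

```python
def encode_codepoint(cp):
    """Return the UTF-8 bytes that encode a single Unicode code point."""
    return chr(cp).encode("utf-8")

# ASCII code points encode to a single, identical byte.
print(encode_codepoint(0x61))          # b'a'

# The arrow character is code point U+2192 and needs three bytes in UTF-8.
print(encode_codepoint(0x2192).hex())  # e28692

# Decoding those three bytes gives back the single code point.
print(hex(ord(b"\xe2\x86\x92".decode("utf-8"))))  # 0x2192
```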
09-06-2017, 12:48 AM
Post: #18
RE: newRPL: Adding string processing commands?
(09-05-2017 08:51 PM)Claudio L. Wrote:  
(09-02-2017 08:16 PM)smartin Wrote:  My thinking is as follows:

#61h is the UTF8 encoding for the printable character "a" (compatible with the smaller subset of the older ASCII encoding, which is what we have on the 50g stock ROM). So if I want to convert TO the UTF8 encoding, wouldn't I want the command to be →UTF8? And if I wanted to convert to the equivalent character from the UTF8 encoding, UTF8→?

Correct reasoning, minor conceptual error. UTF8 is the encoded string, while the numbers are Unicode code points: those are the characters, which can be encoded in UTF8, UTF16, UTF32, etc. Therefore →UTF8 converts code points into UTF8-encoded strings, and UTF8→ does the reverse.
#61h is the code point; the UTF8-encoded string is "a" (UTF8 encodes ASCII as plain ASCII; only non-ASCII characters like → get a multi-byte encoding).

Is it obvious that I don't deal with UTF8 regularly?

Thanks for the clarification and explanation, it makes sense now.

Steve
09-06-2017, 01:29 AM
Post: #19
RE: newRPL: Adding string processing commands?
Rather than regular expressions which are awful to read, what about a parsing expression grammar? I quite like the LPEG naming.

Pauli
09-14-2017, 05:07 PM
Post: #20
RE: newRPL: Adding string processing commands?
The above-mentioned RHEAD and RTAIL would be great, for lists as well as strings.