asc("≤") -> {32,8804,32} ?
07-23-2020, 07:08 PM
Post: #12
RE: asc("≤") -> {32,8804,32} ?
(07-17-2020 08:27 AM)ijabbott Wrote:  Preferable (IMHO):

asc("πi") -> {960, 105} # UCS sequence U+03C0, U+0069
char({960, 105}) -> "πi"
ord("πi") -> 960 # UCS U+03C0
char(960) -> "π"
It is somewhat better, and I implemented that in newRPL, but...
Unfortunately, it doesn't solve the whole issue. Some characters are composed of more than one code point, so a string with 2 symbols may return a list with 3 codes, and list(2) doesn't necessarily hold the second character. That alone renders the list-of-codes format pretty much useless as a way of accessing string characters.
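A small sketch of that indexing problem, written in Python purely for illustration (it is not newRPL or HP Prime code, and the names `text` and `codes` are made up for the example). Python indexes from 0, so `codes[1]` plays the role of list(2) above:

```python
# Two visible symbols: "é" written as e + combining acute accent, then "i".
text = "e\u0301i"

# Rough equivalent of asc(): one entry per code point, not per visible symbol.
codes = [ord(ch) for ch in text]

print(codes)           # [101, 769, 105]
print(hex(codes[1]))   # 0x301, the combining accent, not "i"
```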
To make it even more complex, in some cases the same symbol may be represented either as a single code point or as a sequence of code points, so 2 strings may look exactly the same but produce 2 different lists of codes.
Ideally you would have Unicode-aware routines that let you do string(2) and are guaranteed to return the second character (which may in itself be a string of several codes).
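As a rough idea of what such a routine could look like, here is a Python sketch. The helper name `symbols` is hypothetical, and the grouping rule is a deliberate simplification: it only attaches combining marks to the preceding base character, whereas full grapheme segmentation (Unicode UAX #29) has more rules, for example for emoji sequences and Hangul.

```python
import unicodedata

def symbols(text):
    """Group each base character with the combining marks that follow it.

    Simplified on purpose: only characters with a nonzero canonical
    combining class are attached to the previous symbol.
    """
    groups = []
    for ch in text:
        if groups and unicodedata.combining(ch) != 0:
            groups[-1] += ch      # attach the mark to the previous symbol
        else:
            groups.append(ch)     # start a new symbol
    return groups

parts = symbols("e\u0301i")       # "é" (as 2 code points) followed by "i"
print(len(parts))                 # 2
print(parts[1])                   # i
```

With this kind of grouping, the second element really is the second visible symbol, even though the first one occupies two code points.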
Then you also need a Unicode-aware comparison that applies NFC normalization, so it can detect when the same characters are expressed with different code sequences.
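A minimal Python sketch of such a comparison, using the standard `unicodedata.normalize` function. The helper name `same_text` is hypothetical, and this only illustrates the idea, not how newRPL does it:

```python
import unicodedata

composed = "\u00e9"        # "é" as a single code point
decomposed = "e\u0301"     # "é" as e + combining acute accent

def same_text(a, b):
    """Compare two strings after bringing both to NFC form."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)

print([ord(c) for c in composed])        # [233]
print([ord(c) for c in decomposed])      # [101, 769]
print(composed == decomposed)            # False
print(same_text(composed, decomposed))   # True
```

The two strings display identically and produce different code lists, yet compare equal once both are normalized.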
Converting to a list to use generic list functions will never give you a perfect answer, hence there's not a lot of effort put into that conversion.
ASCII was easy; Unicode is not.


Messages In This Thread
asc("≤") -> {32,8804,32} ? - compsystems - 07-14-2020, 01:56 AM
RE: asc("≤") -> {32,8804,32} ? - parisse - 07-14-2020, 05:59 PM
RE: asc("≤") -> {32,8804,32} ? - parisse - 07-17-2020, 08:05 PM
RE: asc("≤") -> {32,8804,32} ? - DrD - 07-14-2020, 06:56 PM
RE: asc("≤") -> {32,8804,32} ? - parisse - 07-17-2020, 05:01 AM
RE: asc("≤") -> {32,8804,32} ? - parisse - 07-17-2020, 08:09 PM
RE: asc("≤") -> {32,8804,32} ? - Claudio L. - 07-23-2020, 07:08 PM


