newRPL - build 1255 released! [updated to 1299]
05-16-2019, 11:05 PM
Post: #430
RE: newRPL - build 1089 released! [update:build 1158]
(05-16-2019 06:05 PM)JoJo1973 Wrote:
IIRC, Unicode tables which you derive data from have flags to classify the code points: you could limit valid characters for identifiers to 0-9, letter-like symbols and a few selected symbols - underscore, dot (except a single dot character) and the like.

Right now it only has a list of forbidden characters (the operators and separators used in symbolic expressions); anything else is valid. Since there are thousands of valid characters and only a few forbidden ones, that made sense at the time.

(05-16-2019 06:05 PM)JoJo1973 Wrote:
Alternatively, you could limit the letter-like characters by blocks: Basic Latin, Greek, Cyrillic and Hebrew blocks have consecutive code points and cover 99.999% of practical uses.

newRPL has only the minimum information needed to perform NFC normalization of a string. Beyond that and properly decoding/encoding UTF-8 strings, it does not really deal with Unicode. So the information needed to tell whether a character is a punctuation symbol or a letter isn't there (it probably wouldn't fit in the entire flash!).

Separating by blocks would be OK if the blocks didn't also include a lot of punctuation marks and symbols. A lookup table covering a few limited blocks, with one bit per character indicating whether it's a letter or not, would perhaps be doable. I'll have to think about it some more.
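
To make the per-block lookup table idea concrete, here is a minimal sketch in C. It is not newRPL code: the names (letter_block, letter_table_init, is_letter) are made up for illustration, and the ranges marked as letters are only a small subset chosen as an example (basic Latin, Greek, Russian Cyrillic and Hebrew letters), not the full Unicode classification. The table is filled at startup to keep the example short; in firmware it would presumably be precomputed as constant data in flash.

```c
#include <stddef.h>
#include <stdint.h>

/* One bit per code point, for a handful of blocks only (hypothetical layout). */
typedef struct {
    uint32_t first;     /* first code point of the block */
    uint32_t last;      /* last code point of the block, inclusive */
    uint8_t  bits[32];  /* bit i set => code point (first + i) is a letter */
} letter_block;

/* Blocks named in the discussion; none is larger than 256 code points. */
static letter_block blocks[] = {
    { 0x0000, 0x007F, {0} },  /* Basic Latin */
    { 0x0370, 0x03FF, {0} },  /* Greek and Coptic */
    { 0x0400, 0x04FF, {0} },  /* Cyrillic */
    { 0x0590, 0x05FF, {0} },  /* Hebrew */
};
#define NUM_BLOCKS (sizeof blocks / sizeof blocks[0])

/* Set the "letter" bit for every code point in lo..hi.
   The range must lie entirely inside one of the blocks above. */
static void mark_range(uint32_t lo, uint32_t hi)
{
    for (size_t b = 0; b < NUM_BLOCKS; b++) {
        if (lo < blocks[b].first || hi > blocks[b].last)
            continue;
        for (uint32_t cp = lo; cp <= hi; cp++) {
            uint32_t i = cp - blocks[b].first;
            blocks[b].bits[i >> 3] |= (uint8_t)(1u << (i & 7));
        }
    }
}

/* Illustrative subset only; a real table would be generated from Unicode data. */
void letter_table_init(void)
{
    mark_range(0x0041, 0x005A);  /* A-Z */
    mark_range(0x0061, 0x007A);  /* a-z */
    mark_range(0x0391, 0x03A1);  /* Greek capitals Alpha-Rho */
    mark_range(0x03A3, 0x03A9);  /* Greek capitals Sigma-Omega (U+03A2 is unassigned) */
    mark_range(0x03B1, 0x03C9);  /* Greek small alpha-omega */
    mark_range(0x0410, 0x044F);  /* Cyrillic A-ya (basic Russian alphabet) */
    mark_range(0x05D0, 0x05EA);  /* Hebrew alef-tav */
}

/* Returns 1 if cp is marked as a letter, 0 otherwise.
   Anything outside the listed blocks is treated as "not a letter". */
int is_letter(uint32_t cp)
{
    for (size_t b = 0; b < NUM_BLOCKS; b++) {
        if (cp >= blocks[b].first && cp <= blocks[b].last) {
            uint32_t i = cp - blocks[b].first;
            return (blocks[b].bits[i >> 3] >> (i & 7)) & 1;
        }
    }
    return 0;
}
```

In this sketch each block costs at most 32 bytes of bitmap, so four blocks stay well under 200 bytes including the range bounds. Digits, underscore and dot would still need their own rule on top of the letter test, and whether code points outside the listed blocks should be rejected or fall back to the existing forbidden-character list is a separate decision.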