ToUpper() ?
03-12-2015, 04:19 PM (This post was last modified: 03-12-2015 04:58 PM by Claudio L..)
Post: #19
RE: ToUpper() ?
(03-10-2015 06:28 AM)cyrille de brébisson Wrote:  Hello,

There are ~65536 chars in Prime; your solution would use 128KB of table...
Clearly the fastest solution, but not the most memory friendly one...

Cyrille

Here's my solution in C (can somebody translate it to Prime? Han, perhaps?). I prepared these tables from the Unicode standard, with help from a public document about case folding. I did it a few years back, so more symbols or ranges may have been added to the standard since then.

Code:

static const struct {
   unsigned short start;
   unsigned short end;
   signed int diff;
} folding_table16[] = {
{0x0041,0x005A,32},
{0x00B5,0x00B5,775},
{0x00C0,0x00D6,32},
{0x00D8,0x00DE,32},
{0x0100,0x012E,1},
{0x0132,0x0136,1},
{0x0139,0x0147,1},
{0x014A,0x0176,1},
{0x0178,0x0178,-121},
{0x0179,0x017D,1},
{0x017F,0x017F,-268},
{0x0181,0x0181,210},
{0x0182,0x0184,1},
{0x0186,0x0186,206},
{0x0187,0x0187,1},
{0x0189,0x018A,205},
{0x018B,0x018B,1},
{0x018E,0x018E,79},
{0x018F,0x018F,202},
{0x0190,0x0190,203},
{0x0191,0x0191,1},
{0x0193,0x0193,205},
{0x0194,0x0194,207},
{0x0196,0x0196,211},
{0x0197,0x0197,209},
{0x0198,0x0198,1},
{0x019C,0x019C,211},
{0x019D,0x019D,213},
{0x019F,0x019F,214},
{0x01A0,0x01A4,1},
{0x01A6,0x01A6,218},
{0x01A7,0x01A7,1},
{0x01A9,0x01A9,218},
{0x01AC,0x01AC,1},
{0x01AE,0x01AE,218},
{0x01AF,0x01AF,1},
{0x01B1,0x01B2,217},
{0x01B3,0x01B5,1},
{0x01B7,0x01B7,219},
{0x01B8,0x01B8,1},
{0x01BC,0x01BC,1},
{0x01C4,0x01C4,2},
{0x01C5,0x01C5,1},
{0x01C7,0x01C7,2},
{0x01C8,0x01C8,1},
{0x01CA,0x01CA,2},
{0x01CB,0x01DB,1},
{0x01DE,0x01EE,1},
{0x01F1,0x01F1,2},
{0x01F2,0x01F4,1},
{0x01F6,0x01F6,-97},
{0x01F7,0x01F7,-56},
{0x01F8,0x021E,1},
{0x0220,0x0220,-130},
{0x0222,0x0232,1},
{0x023A,0x023A,10795},
{0x023B,0x023B,1},
{0x023D,0x023D,-163},
{0x023E,0x023E,10792},
{0x0241,0x0241,1},
{0x0243,0x0243,-195},
{0x0244,0x0244,69},
{0x0245,0x0245,71},
{0x0246,0x024E,1},
{0x0345,0x0345,116},
{0x0370,0x0372,1},
{0x0376,0x0376,1},
{0x0386,0x0386,38},
{0x0388,0x038A,37},
{0x038C,0x038C,64},
{0x038E,0x038F,63},
{0x0391,0x03A1,32},
{0x03A3,0x03AB,32},
{0x03C2,0x03C2,1},
{0x03CF,0x03CF,8},
{0x03D0,0x03D0,-30},
{0x03D1,0x03D1,-25},
{0x03D5,0x03D5,-15},
{0x03D6,0x03D6,-22},
{0x03D8,0x03EE,1},
{0x03F0,0x03F0,-54},
{0x03F1,0x03F1,-48},
{0x03F4,0x03F4,-60},
{0x03F5,0x03F5,-64},
{0x03F7,0x03F7,1},
{0x03F9,0x03F9,-7},
{0x03FA,0x03FA,1},
{0x03FD,0x03FF,-130},
{0x0400,0x040F,80},
{0x0410,0x042F,32},
{0x0460,0x0480,1},
{0x048A,0x04BE,1},
{0x04C0,0x04C0,15},
{0x04C1,0x04CD,1},
{0x04D0,0x0526,1},
{0x0531,0x0556,48},
{0x10A0,0x10C5,7264},
{0x10C7,0x10C7,7264},
{0x10CD,0x10CD,7264},
{0x1E00,0x1E94,1},
{0x1E9B,0x1E9B,-58},
{0x1E9E,0x1E9E,-7615},
{0x1EA0,0x1EFE,1},
{0x1F08,0x1F0F,-8},
{0x1F18,0x1F1D,-8},
{0x1F28,0x1F2F,-8},
{0x1F38,0x1F3F,-8},
{0x1F48,0x1F4D,-8},
{0x1F59,0x1F59,-8},
{0x1F5B,0x1F5B,-8},
{0x1F5D,0x1F5D,-8},
{0x1F5F,0x1F5F,-8},
{0x1F68,0x1F6F,-8},
{0x1F88,0x1F8F,-8},
{0x1F98,0x1F9F,-8},
{0x1FA8,0x1FAF,-8},
{0x1FB8,0x1FB9,-8},
{0x1FBA,0x1FBB,-74},
{0x1FBC,0x1FBC,-9},
{0x1FBE,0x1FBE,-7173},
{0x1FC8,0x1FCB,-86},
{0x1FCC,0x1FCC,-9},
{0x1FD8,0x1FD9,-8},
{0x1FDA,0x1FDB,-100},
{0x1FE8,0x1FE9,-8},
{0x1FEA,0x1FEB,-112},
{0x1FEC,0x1FEC,-7},
{0x1FF8,0x1FF9,-128},
{0x1FFA,0x1FFB,-126},
{0x1FFC,0x1FFC,-9},
{0x2126,0x2126,-7517},
{0x212A,0x212A,-8383},
{0x212B,0x212B,-8262},
{0x2132,0x2132,28},
{0x2160,0x216F,16},
{0x2183,0x2183,1},
{0x24B6,0x24CF,26},
{0x2C00,0x2C2E,48},
{0x2C60,0x2C60,1},
{0x2C62,0x2C62,-10743},
{0x2C63,0x2C63,-3814},
{0x2C64,0x2C64,-10727},
{0x2C67,0x2C6B,1},
{0x2C6D,0x2C6D,-10780},
{0x2C6E,0x2C6E,-10749},
{0x2C6F,0x2C6F,-10783},
{0x2C70,0x2C70,-10782},
{0x2C72,0x2C72,1},
{0x2C75,0x2C75,1},
{0x2C7E,0x2C7F,-10815},
{0x2C80,0x2CE2,1},
{0x2CEB,0x2CED,1},
{0x2CF2,0x2CF2,1},
{0xA640,0xA66C,1},
{0xA680,0xA696,1},
{0xA722,0xA72E,1},
{0xA732,0xA76E,1},
{0xA779,0xA77B,1},
{0xA77D,0xA77D,-35332},
{0xA77E,0xA786,1},
{0xA78B,0xA78B,1},
{0xA78D,0xA78D,-42280},
{0xA790,0xA792,1},
{0xA7A0,0xA7A8,1},
{0xA7AA,0xA7AA,-42308},
{0xFF21,0xFF3A,32},
{0,0,0}
};
static const struct {
   int start;
   int end;
   signed int diff;
} folding_table32[] = {
{0x10400,0x10427,40},
{0,0,0}
};


Each table entry holds the start and end codes of a range, plus an offset value. If your character falls within that range (both ends included), add the offset to obtain the equivalent lowercase character (this table is for case folding, i.e. tolower()).
To use it for toupper(), add the offset to the start and end values to get the lowercase range, check whether your character is within it, and subtract the offset instead of adding it.
EDIT: Forgot to mention: when the offset is 1, the uppercase and lowercase symbols alternate within the range (an uppercase code at every even position counting from the start, its lowercase right after it), so only every other code gets the offset. For example, in the range 0x0100-0x012E, 0x0100 folds to 0x0101, while 0x0101 is already lowercase and stays as it is. See the example code below.

There are actually 2 tables: one for 16-bit Unicode values and a second one for 32-bit Unicode characters (only one foldable range is defined above 16 bits).

Code:

#include <stdint.h>

uint32_t casefold(uint32_t character)
{
// THESE TABLES ARE FOR PROPER UNICODE CASE-INSENSITIVE COMPARISON
// TABLES ADD ABOUT 1400 BYTES TO LIBRARY
// folding_table.h is assumed to hold the two tables listed above
#include "folding_table.h"
    int idx;

    if(character<0x10000) {
        // 16-bit code: scan the ranges up to the {0,0,0} terminator
        for(idx=0;folding_table16[idx].start!=0;++idx)
        {
            // ranges are sorted by start, so no later range can match
            if(character<folding_table16[idx].start) return character;
            if(character<=folding_table16[idx].end) {
                if(folding_table16[idx].diff==1) {
                    // alternating range: odd positions are already lowercase
                    if( (character-folding_table16[idx].start)&1) return character;
                }
                return character+folding_table16[idx].diff;
            }
        }
        return character;
    }
    // code above 0xFFFF: same scan over the 32-bit table
    for(idx=0;folding_table32[idx].start!=0;++idx)
    {
        if(character<folding_table32[idx].start) return character;
        if(character<=folding_table32[idx].end) {
            if(folding_table32[idx].diff==1) {
                if( (character-folding_table32[idx].start)&1) return character;
            }
            return character+folding_table32[idx].diff;
        }
    }

    return character;
}
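
Since the ranges are sorted by start code and do not overlap, the linear scan could in principle be swapped for a binary search. What follows is only a sketch of that idea, not part of the original code: the names fold_range, fold_excerpt and casefold_bsearch are made up for illustration, and the table is a short excerpt of the 16-bit table above (no {0,0,0} terminator, the length is computed instead).

```c
#include <stddef.h>
#include <stdint.h>

/* One folding range, laid out like the entries of the table above. */
typedef struct {
    uint16_t start;  /* first code of the range */
    uint16_t end;    /* last code of the range (inclusive) */
    int32_t  diff;   /* offset to add to reach the lowercase code */
} fold_range;

/* ASSUMPTION: short excerpt of the 16-bit table, kept in sorted order. */
static const fold_range fold_excerpt[] = {
    {0x0041, 0x005A, 32},
    {0x00B5, 0x00B5, 775},
    {0x00C0, 0x00D6, 32},
    {0x00D8, 0x00DE, 32},
    {0x0100, 0x012E, 1},
    {0x0178, 0x0178, -121},
    {0x0391, 0x03A1, 32},
    {0x03A3, 0x03AB, 32},
    {0x0410, 0x042F, 32},
};

/* Hypothetical variant of casefold(): binary search over the sorted ranges. */
uint32_t casefold_bsearch(uint32_t character)
{
    size_t lo = 0;
    size_t hi = sizeof fold_excerpt / sizeof fold_excerpt[0];  /* window is [lo, hi) */

    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        const fold_range *r = &fold_excerpt[mid];

        if (character < r->start) {
            hi = mid;               /* look in the lower half */
        } else if (character > r->end) {
            lo = mid + 1;           /* look in the upper half */
        } else {
            /* alternating range: odd positions are already lowercase */
            if (r->diff == 1 && ((character - r->start) & 1u))
                return character;
            return (uint32_t)((int32_t)character + r->diff);
        }
    }
    return character;               /* not inside any range: unchanged */
}
```

With the full table (about 170 ranges) this would need at most 8 comparisons per character instead of a scan that can run through the whole table, at the cost of slightly more code.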

I'll leave it to the Prime gurus to make proper Unicode-compliant ToUpper() and ToLower() functions. In C the tables take only about 1400 bytes; I'm not sure how much space you'd need on the Prime. It's not the fastest method, but it's the most embeddable one I could find. Feel free to use it.
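
As a starting point for the toupper() direction described earlier (shift each range by its offset, then subtract the offset from the character), here is a rough C sketch. It is not a finished ToUpper(): fold_range, fold_excerpt and toupper_sketch are names invented for this example, and the table is a short excerpt of the 16-bit table. One caveat applies to the full table: the fold is many-to-one (both 0x00B5 and 0x039C fold to 0x03BC, for instance), so a first-match inverse like this one can return the wrong uppercase for a few codes unless those entries are skipped or reordered.

```c
#include <stddef.h>
#include <stdint.h>

/* One folding range, laid out like the entries of the table above. */
typedef struct {
    uint16_t start;  /* first uppercase code of the range */
    uint16_t end;    /* last uppercase code of the range (inclusive) */
    int32_t  diff;   /* offset added by the fold (upper -> lower) */
} fold_range;

/* ASSUMPTION: short excerpt of the 16-bit table, for illustration only. */
static const fold_range fold_excerpt[] = {
    {0x0041, 0x005A, 32},
    {0x00C0, 0x00D6, 32},
    {0x00D8, 0x00DE, 32},
    {0x0100, 0x012E, 1},
    {0x0178, 0x0178, -121},
    {0x0391, 0x03A1, 32},
    {0x0410, 0x042F, 32},
};

/* Hypothetical helper: inverse lookup, lowercase -> uppercase. */
uint32_t toupper_sketch(uint32_t character)
{
    const size_t count = sizeof fold_excerpt / sizeof fold_excerpt[0];
    size_t idx;
    int32_t c;

    if (character > 0xFFFF) return character;  /* excerpt covers 16-bit codes only */
    c = (int32_t)character;

    /* The shifted (lowercase) ranges are not sorted, so there is no early exit. */
    for (idx = 0; idx < count; ++idx) {
        const fold_range *r = &fold_excerpt[idx];
        int32_t low_start = r->start + r->diff;  /* lowercase range = range + offset */
        int32_t low_end   = r->end + r->diff;

        if (c < low_start || c > low_end) continue;
        if (r->diff == 1) {
            /* alternating range: lowercase codes sit at odd positions */
            if (((c - r->start) & 1) == 0) continue;
        }
        return (uint32_t)(c - r->diff);          /* subtract instead of add */
    }
    return character;                            /* no match: unchanged */
}
```

For example, toupper_sketch(0x0101) lands in the shifted range 0x0101-0x012F at an odd position and returns 0x0100, while toupper_sketch(0x0100) is left alone.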

Claudio