Unicode char comparison puzzle
05-30-2018, 12:50 PM
Post: #1
Unicode char comparison puzzle
I'd like to print out all the juicy non '⛾' unicode characters between #2600h and #26FFh as per http://www.unicode-symbol.com/block/Misc_Symbols.html

Code:

EXPORT UNIFUN()
BEGIN
LOCAL cp, S:="";
PRINT();
FOR cp FROM #2600h TO #26FFh DO
  IF CHAR(cp) ≠ "⛾" THEN  // doesn't work 
    S := S + CHAR(cp);
  END;
END;
PRINT(S);
END;

The output on the Prime prints most characters as '⛾', so I tried to filter those chars out with the conditional expression
Code:
IF CHAR(cp) ≠ "⛾" THEN
but this doesn't work, as '⛾' seems to refer to one specific Unicode character/code point rather than to every Unicode character that happens to look like '⛾'. Question 1: how do I effectively filter out the chars that look like '⛾'?

When pasting the terminal output of the prime code above, from the prime emulator on my iPad to this thread, the output was surprising. More of the characters are rendered and there are many fewer '⛾' characters.

☀☁☂☃☄★☆☇☈☉☊☋☌☍☎☏☐☑☒☓☔☕☖☗☘☙☚☛☜☝☞☟☠☡☢☣☤☥☦☧☨☩☪☫☬☭☮☯☰☱☲☳☴☵☶☷☸☹☺☻☼☽☾☿♀♁♂♃♄♅♆♇♈♉♊♋♌♍♎♏♐♑♒♓♔♕♖♗♘♙♚♛♜♝♞♟♠♡♢♣♤♥♦♧♨♩♪♫♬♭♮♯♰♱♲♳♴♵♶♷♸♹♺♻♼♽♾♿⚀⚁⚂⚃⚄⚅⚆⚇⚈⚉⚊⚋⚌⚍⚎⚏⚐⚑⚒⚓⚔⚕⚖⚗⚘⚙⚚⚛⚜⚝⚞⚟⚠⚡⚢⚣⚤⚥⚦⚧⚨⚩⚪⚫⚬⚭⚮⚯⚰⚱⚲⚳⚴⚵⚶⚷⚸⚹⚺⚻⚼⚽⚾⚿⛀⛁⛂⛃⛄⛅⛆⛇⛈⛉⛊⛋⛌⛍⛎⛏⛐⛑⛒⛓⛔⛕⛖⛗⛘⛙⛚⛛⛜⛝⛞⛟⛠⛡⛢⛣⛤⛥⛦⛧⛨⛩⛪⛫⛬⛭⛮⛯⛰⛱⛲⛳⛴⛵⛶⛷⛸⛹⛺⛻⛼⛽⛾⛿

Thus it looks like the Prime cannot visually represent / print out all Unicode characters, only a limited subset of them. Question 2: does anyone know the limits of the Prime's Unicode support?
05-30-2018, 01:40 PM
Post: #2
RE: Unicode char comparison puzzle
(05-30-2018 12:50 PM)tcab Wrote:  Thus it looks like the Prime cannot visually represent / print out all unicode characters, only a limited subset of them

Devices don't decide which Unicode symbols they can print; that's what fonts are for. Different fonts have glyphs for different character sets. I don't know of any font that covers "all" of Unicode: some are more complete than others, but each covers only a fraction, chosen for its intended use.
Back to your problem: code-wise, a Unicode character doesn't change just because the font has no glyph for it, so there is nothing in the text itself to filter on. What you can do is take the font in which you want to display the text, build a catalog of which characters have a glyph and which fall back to the default one, and filter against that table. Basically you need one table per font (or a program to "extract" that table from any font), plus filter code that uses those tables.
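As a rough illustration of the table idea, here is a minimal sketch in Python (not Prime code, just so it can be run anywhere). The `SUPPORTED` set and the `renderable` helper are made up for the example; a real table would have to be extracted from the actual font.

```python
# Hypothetical coverage table: code points the target font has a glyph for.
# These values are placeholders, NOT data extracted from any real font.
SUPPORTED = {0x2600, 0x2601, 0x2605, 0x2660, 0x2663, 0x2665, 0x2666}

def renderable(start, finish, supported):
    """Return the characters in [start, finish] that the table lists as having a glyph."""
    return "".join(chr(cp) for cp in range(start, finish + 1) if cp in supported)

result = renderable(0x2600, 0x26FF, SUPPORTED)
print(result)
```

The filter never looks at how a character is drawn; it only consults the table, which is why one table per font is needed.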
05-30-2018, 03:34 PM
Post: #3
RE: Unicode char comparison puzzle
The font used by the Prime is called Prime Sans, and it's a modified version of Google's free Droid Sans font. The ttf file is in the Fonts folder of the HP Prime Virtual Calculator installation, so you can look at that with your favorite font analysis tool to see what characters it supports. A tool I use shows 51280 different characters, but it doesn't seem to have any of the emoji characters.
05-31-2018, 01:24 AM
Post: #4
RE: Unicode char comparison puzzle
Any chance the font set of the physical Prime can be altered, or the set updated via a loadable code page? I haven't seen anything in the documentation for that, so I presume it can't be done. Poking around in the bin file would probably come with the usual Void The Warranty admonition. I'd hate to do my own font creation and placement via graphics commands just to get a few unsupported characters!

~Mark

Remember kids, "In a democracy, you get the government you deserve."
05-31-2018, 01:34 AM
Post: #5
RE: Unicode char comparison puzzle
Thanks for the responses.

So the problem is that I want to filter out all the Unicode chars/code points that cannot be rendered. I just want to see all the "juicy" characters that the Prime is capable of rendering, and none of the noise; viz., I don't want:
[Image: unicode_fun2_noisy.png?raw=1]
I want:
[Image: unicode_fun2.png?raw=1]

I stumbled upon this post about comparing the bitmaps of characters, and managed to write the Prime code to do it. My new character comparison function renders the character as a bitmap, which I then convert into a list of 1s and 0s; comparing those lists compares the characters as rendered.

Code:

EXPORT bit_signature_for_char(c)
BEGIN
  local result={}, x, y, pix;
  TEXTOUT_P(c,G1,0,0,2,#000000,100,#FFFFFF);
  for y from 0 to 5 do  // should probably compare more rows of pixels
    for x from 0 to 100 do
      if GETPIX_P(G1,x, y) = #FFFFFF then
        pix := 0;
      else
        pix := 1;
      end;
      result := concat(result, pix);
    end;
  end;
  return result;
END;

EXPORT same_char(c1, c2)
BEGIN
  return string(bit_signature_for_char(c1)) = string(bit_signature_for_char(c2));
END;

EXPORT UNIFUN()
BEGIN
LOCAL cp, S:="";
PRINT();
FOR cp FROM #2600h TO #26FFh DO
  //IF CHAR(cp) ≠ "⛾" THEN  // doesn't work 
  IF NOT same_char(CHAR(cp), CHAR(#2600)) THEN  // works :-)
    S := S + CHAR(cp);
  END;
END;
PRINT(S);
END;

Running across a longer range of chars takes time, but gives:
[Image: unicode_fun1.png?raw=1]

A couple of programming issues I ran into:
  • I used the offscreen bitmap G1 but for some reason couldn't use G2 to do the textout renderings of bitmaps?
  • I couldn't find a function to compare lists, so had to convert the lists to strings in order to compare them.
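For readers without a Prime at hand, the same idea can be sketched in Python. This is only an illustration under stated assumptions: `render` is a hypothetical stand-in for drawing with TEXTOUT_P and reading back with GETPIX_P, and the 3x3 bitmaps are invented.

```python
WHITE, BLACK = 0xFFFFFF, 0x000000

def render(name):
    """Hypothetical stand-in for drawing a character and reading its pixels back."""
    box = [[BLACK, BLACK, BLACK],
           [BLACK, WHITE, BLACK],
           [BLACK, BLACK, BLACK]]   # invented placeholder glyph for unsupported characters
    star = [[WHITE, BLACK, WHITE],
            [BLACK, BLACK, BLACK],
            [WHITE, BLACK, WHITE]]  # invented glyph for one supported character
    return {"star": star}.get(name, box)

def bit_signature(bitmap):
    """Flatten a bitmap row by row into 1 (ink) / 0 (background)."""
    return [0 if pixel == WHITE else 1 for row in bitmap for pixel in row]

def same_char(a, b):
    """Two characters count as the same if their rendered signatures match."""
    # Python lists compare element-wise, so no string conversion is needed here.
    return bit_signature(render(a)) == bit_signature(render(b))

print(same_char("unknown1", "unknown2"))  # both fall back to the placeholder glyph
print(same_char("star", "unknown1"))
```

Two different unsupported characters produce identical signatures because both are drawn with the same fallback glyph, which is exactly what the filter relies on.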
05-31-2018, 04:22 AM
Post: #6
RE: Unicode char comparison puzzle
(05-31-2018 01:34 AM)tcab Wrote:  
  • I couldn't find a function to compare lists, so had to convert the lists to strings in order to compare them.

EQ() will do it. You'll find it in [Toolbox] > Math > List
05-31-2018, 04:29 AM
Post: #7
RE: Unicode char comparison puzzle
(05-31-2018 01:34 AM)tcab Wrote:  A couple of programming issues I ran into:
  • I used the offscreen bitmap G1 but for some reason couldn't use G2 to do the textout renderings of bitmaps?
  • I couldn't find a function to compare lists, so had to convert the lists to strings in order to compare them.

Hello,

To use the graphics variables G1...G9 you must first dimension them with DIMGROB.
Since I don't see that instruction in your code, G1 had probably already been dimensioned by another program run earlier, and G2 had not.

To compare two lists, use EQ(list1, list2).

Hope that helps.
05-31-2018, 01:23 PM
Post: #8
RE: Unicode char comparison puzzle
Thanks for the tips re comparing lists with EQ and DIMGROB usage. I've incorporated those changes, though then decided to render onto the main screen G0 to serve as a visual progress indicator, together with a percentage done.

Code:

// Unicode dump of unique chars
// Version 2, Andy Bulka (tcab) 2018

LOCAL BLACK:=#000000, WHITE:=#FFFFFF, YELLOW:=#FFFF00;
LOCAL CHAR_WIDTH:=100, CHAR_HEIGHT:=5;

bit_signature_for_char(c)
BEGIN
  local result={}, x, y, pix;
  TEXTOUT_P(c,G0,0,0,2,#000000,100,#FFFFFF);
  for y from 0 to CHAR_HEIGHT do  // should probably compare more rows of pixels
    for x from 0 to CHAR_WIDTH do
      result := concat(result, GETPIX_P(G0,x, y));
    end;
  end;
  return result;
END;

same_char(c1, c2)
BEGIN
  return EQ(bit_signature_for_char(c1),
            bit_signature_for_char(c2));
END;

disp_current(cp,perc)
BEGIN
  RECT_P(G0,20,0,95,12,YELLOW);
  TEXTOUT_P(cp,G0,20,0,2,BLACK,200,WHITE); // disp candidate codepoint number
  TEXTOUT_P(perc+"%",G0,70,0,2,BLACK,200,YELLOW); // disp percent done
END;

EXPORT UNIFUN()
BEGIN
LOCAL cp, S:="", start:=#2600h, finish:=#26FFh, perc;
PRINT();
FOR cp FROM start TO finish DO
  perc:=IP((cp-start)/(finish-start)*100);
  disp_current(cp,perc);
  //IF CHAR(cp) ≠ "⛾" THEN  // doesn't work 
  IF NOT same_char(CHAR(cp), CHAR(#2600)) THEN  // works :-)
    S := S + CHAR(cp);
  END;
END;
PRINT(S);
END;

More programming questions:
  • Is there a way to TEXTOUT an integer in hex?
  • Do we know the exact height and width of characters in pixels? (Both in font size 1 & 2)
06-01-2018, 04:30 AM
Post: #9
RE: Unicode char comparison puzzle
(05-31-2018 01:23 PM)tcab Wrote:  
  • Is there a way to TEXTOUT an integer in hex?
  • Do we know the exact height and width of characters in pixels? (Both in font size 1 & 2)

Hello,
You can write var:=TEXTOUT_P(.......
var will then contain the x coordinate of the last pixel used to display the message or value.
06-01-2018, 04:53 AM
Post: #10
RE: Unicode char comparison puzzle
Hello,

TEXTOUT_P(R→B(value), 0,0) will display hex (if hex is the current mode).
TEXTOUT_P returns the x position of the NEXT pixel where to write the next character...

so, textout_p("part2", textout_p("part1", 0, 0), 0)
will write part1part2 on the screen
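To make the chaining explicit, here is a small Python simulation. It is a sketch only: `text_out` is a hypothetical stand-in for TEXTOUT_P, and the fixed 12-pixel advance per character is an assumption for the example (real advances vary per glyph).

```python
ADVANCE = 12  # assumed fixed advance per character in pixels (illustrative only)

def text_out(text, x, y, screen):
    """Hypothetical stand-in for TEXTOUT_P: record the text, return the next x."""
    screen.append((x, y, text))
    return x + ADVANCE * len(text)

screen = []
# The inner call runs first; its return value becomes the x of the outer call.
next_x = text_out("part2", text_out("part1", 0, 0, screen), 0, screen)
print(screen, next_x)
```

Because each call hands back the position where the next piece should start, the caller never has to measure the text itself.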

cyrille

Although I work for the HP calculator group, the views and opinions I post here are my own. I do not speak for HP.
06-01-2018, 07:48 AM
Post: #11
RE: Unicode char comparison puzzle
Thanks guys - very helpful.
Am now working on an optimised algorithm that:
  • Caches the 'signature' of the glyph that represents a boring/non juicy character
  • Scans only a portion of the pixels of a rendered character
as it turns out that I get just as accurate results scanning only a few pixel rows/cols of a character, and it speeds up scanning enormously. It will be interesting to discover how low I can go and still get accurate results!
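The two optimisations can be sketched in Python like this. Everything here is invented for illustration: the 4x4 bitmaps, the `GLYPHS` table, and the helper names are assumptions, not Prime data or Prime functions.

```python
def signature(bitmap, rows=2, cols=3):
    """Sample only the top-left rows x cols pixels of a rendered bitmap."""
    return tuple(pixel for row in bitmap[:rows] for pixel in row[:cols])

def renderable(bitmaps, boring_bitmap):
    """Return the keys whose sampled signature differs from the boring glyph's."""
    boring_sig = signature(boring_bitmap)  # computed once, reused for every candidate
    return [key for key, bmp in bitmaps.items() if signature(bmp) != boring_sig]

# Invented 4x4 bitmaps (1 = ink, 0 = background).
BORING = [[1, 1, 1, 1], [1, 0, 0, 1], [1, 0, 0, 1], [1, 1, 1, 1]]
GLYPHS = {
    0x2600: BORING,
    0x2605: [[0, 1, 1, 0], [1, 1, 1, 1], [0, 1, 1, 0], [1, 0, 0, 1]],
    0x2606: BORING,
}

found = renderable(GLYPHS, BORING)
print([hex(cp) for cp in found])
```

The trade-off is that a real glyph which happens to match the boring glyph inside the sampled region would be filtered out by mistake, so the sample size sets a floor on accuracy.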

I measured the width of a character font size 2 using the return value of TEXTOUT_P as being 12 pixels. Is there any similar technique to determine a character's height?
06-01-2018, 07:56 AM
Post: #12
RE: Unicode char comparison puzzle
Not really. Remember that characters can have different widths and heights, and can even flow into neighboring characters in some cases. It is a very different thing than just plain "bitmap" fonts.

What is actually being returned as the "width" is the stroke advancement.

TW

Although I work for HP, the views and opinions I post here are my own.
06-01-2018, 08:03 AM (This post was last modified: 06-01-2018 08:05 AM by Tim Wessman.)
Post: #13
RE: Unicode char comparison puzzle
Also, note that we use a 16-bit-per-character encoding and currently have no plans to support any characters higher than that. So you can stop at #FFFE...
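As a quick way to see what a 16-bit limit means for a scan range, here is a small Python check. It assumes that "16 bits per character" corresponds to code points U+0000..U+FFFF with no surrogate pairs, which is an interpretation of the statement above rather than something confirmed here; the helper names are made up.

```python
def fits_in_16_bits(cp):
    """True if the code point lies in the range U+0000..U+FFFF."""
    return 0 <= cp <= 0xFFFF

def utf16_units(cp):
    """Number of 16-bit code units UTF-16 needs for this code point."""
    return len(chr(cp).encode("utf-16-be")) // 2

for cp in (0x2600, 0x26FF, 0x1F600):
    print(hex(cp), fits_in_16_bits(cp), utf16_units(cp))
```

Code points such as U+1F600 need two 16-bit units in UTF-16, so under that assumption they are out of reach of a one-unit-per-character encoding.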

I'm a bit curious what is wrong or missing from the existing character browser? What are you trying to accomplish that isn't done there?

I'm the one primarily responsible for the unicode stuff in general, so I'm the one that may be able to provide more details in specific areas.

TW

06-01-2018, 08:29 AM (This post was last modified: 06-01-2018 08:32 AM by Martin Hepperle.)
Post: #14
RE: Unicode char comparison puzzle
Some statistics:

PrimeSansMono.ttf ("Prime Sans Mono") has 613 glyphs
PrimeSansBold.ttf ("Prime Sans Bold") has 606 glyphs
PrimeSansFull.ttf ("Prime Sans") has 51279 glyphs
Printing out a character table of the latter is left as an exercise to the reader ;-)

And a small Java program (sorry no Prime here) to create a list of the code points:
Code:
import java.awt.Font;
import java.awt.FontFormatException;
import java.io.File;
import java.io.IOException;

public class CheckChars
{
   public static void main (String[] args) throws FontFormatException, IOException
   {
      checkChars();
   }

   static void checkChars () throws FontFormatException, IOException
   {
      Font theFont = Font.createFont(Font.TRUETYPE_FONT,
            new File("d:/tmp/PrimeSansMono.ttf"));
      // or, if font is installed:
      // Font theFont = Font.decode("Prime Sans Mono");

      int nGlyphs = theFont.getNumGlyphs();

      // Emit the result as a Prime list literal, headed by a comment line
      System.out.println("// " + theFont.getFontName() + " has " + nGlyphs
            + " glyphs and can display:");
      System.out.println("glyphs := {");

      int i = 0;
      // <= so that the last code point (U+10FFFF) is tested as well
      for ( int codePoint = 0; codePoint <= Character.MAX_CODE_POINT; codePoint++ )
      {
         if ( theFont.canDisplay(codePoint) )
         {
            // The separator goes before every entry except the first, so the
            // list cannot end in a trailing comma whatever the glyph count is
            if ( i > 0 )
               System.out.print( (i % 8) == 0 ? ",\n" : ", " );  // 8 per line

            System.out.print(codePoint);
            i++;
         }
      }
      System.out.println();
      System.out.println("};");
   }
}
06-01-2018, 10:35 AM
Post: #15
RE: Unicode char comparison puzzle
Ok here is the new faster version - to view different ranges of characters just edit the code.

Code:

// Unicode dump of unique chars
// Version 3, Andy Bulka (tcab) 2018

LOCAL boring_glyph_sig:={};
LOCAL BLACK:=#000000, WHITE:=#FFFFFF, YELLOW:=#FFFF00;
//LOCAL CHAR_WIDTH:=12, CHAR_HEIGHT:=5;  // char width seems to be about 12 but TW says it varies
LOCAL CHAR_WIDTH:=3, CHAR_HEIGHT:=2;  // cheaper comparison, don't scan all pixels
 
bit_signature_for_char(c)
BEGIN
  local result={}, x, y, pix,width;
  TEXTOUT_P(c,G0,0,0,2,#000000,100,#FFFFFF);
  for y from 0 to CHAR_HEIGHT do
    for x from 0 to CHAR_WIDTH do
      result := concat(result, GETPIX_P(G0,x, y));
    end;
  end;
  return result;
END;

same_char(c1, c2)
BEGIN
  return EQ(bit_signature_for_char(c1),
                 bit_signature_for_char(c2));
END;

can_render(c1)
BEGIN
  return NOT EQ(bit_signature_for_char(c1),
                           boring_glyph_sig);
END;

disp_progress(cp,perc)
BEGIN
  RECT_P(G0,20,0,95,12,YELLOW);
  TEXTOUT_P(R→B(cp),G0,20,0,2,BLACK,200,WHITE); // disp candidate codepoint number
  TEXTOUT_P(perc+"%",G0,70,0,2,BLACK,200,YELLOW); // disp percent done
END;

EXPORT UNIFUN()
BEGIN
  //LOCAL cp, S:="", start:=#2600h, finish:=#260Ah, perc;  // a short range of chars
  LOCAL cp, S:="", start:=#2600h, finish:=#26FFh, perc;    // larger range of chars
  //LOCAL cp, S:="", start:=#2500h, finish:=#28FFh, perc;    // even larger range of chars
  PRINT();
  boring_glyph_sig:=bit_signature_for_char(CHAR(#2600));
  FOR cp FROM start TO finish DO
    perc:=IP((cp-start)/(finish-start)*100);
    disp_progress(cp,perc);
    //IF NOT same_char(CHAR(cp), CHAR(#2600)) THEN  // slow
    IF can_render(CHAR(cp)) THEN  // faster
      S := S + CHAR(cp);
    END;
  END;
  PRINT(S);
END;

The progress indicator now displays in hex, too.

Quote: I'm a bit curious what is wrong or missing from the existing character browser? What are you trying to accomplish that isn't done there?

The Prime character browser is great - nothing missing there. I just happened to be looping through some characters one day on the Prime and wondered how I could filter out dud characters, and, well, one thing led to another. :-) Just learning... about unicode, pixels, coordinate systems, grobs, hex format, comparing lists, TEXTOUT, character widths & optimisation. Love it - the Prime sure is a powerful calculator - a well thought out design appropriate to the times we live in (touchscreen, unicode, grobs, algebraic, structured programming, rechargeable, free connectivity kit/editor/emulator, iPhone versions etc.).

Andy