
tcab · 05-30-2018, 12:50 PM

I'd like to print out all the juicy non-'⛾' Unicode characters between #2600h and #26FFh, as listed at http://www.unicode-symbol.com/block/Misc_Symbols.html

Code:

EXPORT UNIFUN()
BEGIN
  LOCAL cp, S:="";
  PRINT();
  FOR cp FROM #2600h TO #26FFh DO
    IF CHAR(cp) ≠ "⛾" THEN  // doesn't work 
      S := S + CHAR(cp);
    END;
  END;
  PRINT(S);
END;

On the Prime, most characters print as '⛾', so I tried to filter those characters out with the conditional expression

Code:

IF CHAR(cp) ≠ "⛾" THEN

but this doesn't work, because '⛾' refers to one specific Unicode code point rather than to every character that merely renders as '⛾'. Question 1: how do I effectively filter out the characters that look like '⛾'?

When I pasted the terminal output of the code above from the Prime emulator on my iPad into this thread, the result was surprising: many more of the characters are rendered, and there are far fewer '⛾' characters.

☀☁☂☃☄★☆☇☈☉☊☋☌☍☎☏☐☑☒☓☔☕☖☗☘☙☚☛☜☝☞☟☠☡☢☣☤☥☦☧☨☩☪☫☬☭☮☯☰☱☲☳☴☵☶☷☸☹☺☻☼☽☾☿♀♁♂♃♄♅♆♇♈♉♊♋♌♍♎♏♐♑♒♓♔♕♖♗♘♙♚♛♜♝♞♟♠♡♢♣♤♥♦♧♨♩♪♫♬♭♮♯♰♱♲♳♴♵♶♷♸♹♺♻♼♽♾♿⚀⚁⚂⚃⚄⚅⚆⚇⚈⚉⚊⚋⚌⚍⚎⚏⚐⚑⚒⚓⚔⚕⚖⚗⚘⚙⚚⚛⚜⚝⚞⚟⚠⚡⚢⚣⚤⚥⚦⚧⚨⚩⚪⚫⚬⚭⚮⚯⚰⚱⚲⚳⚴⚵⚶⚷⚸⚹⚺⚻⚼⚽⚾⚿⛀⛁⛂⛃⛄⛅⛆⛇⛈⛉⛊⛋⛌⛍⛎⛏⛐⛑⛒⛓⛔⛕⛖⛗⛘⛙⛚⛛⛜⛝⛞⛟⛠⛡⛢⛣⛤⛥⛦⛧⛨⛩⛪⛫⛬⛭⛮⛯⛰⛱⛲⛳⛴⛵⛶⛷⛸⛹⛺⛻⛼⛽⛾⛿

So it looks like the Prime cannot display every Unicode character, only a limited subset of them. Question 2: does anyone know the limits of the Prime's Unicode support?

Claudio L. · 05-30-2018, 01:40 PM

(05-30-2018 12:50 PM) tcab wrote: Thus it looks like the Prime cannot visually represent / print out all unicode characters, only a limited subset of them

Devices don't decide which Unicode symbols they can print; that's what fonts are for. Different fonts have glyphs for different character sets. I don't know of any font that has "all" Unicode characters; some are more complete than others, but they all cover only a fraction, depending on their intended use.
Back to your problem: in code, a Unicode character doesn't change just because the font has no glyph to display it, so there's no way to filter the text by character value alone. Instead, take the font in which you want to display the text, build a catalog of which characters have a glyph and which will fall back to the default one, and use that table to do the filtering. Basically you need one table per font (or a program that "extracts" that table from any font), plus filter code that uses those tables.
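A minimal Java sketch of the table-driven filtering described above. The "has a glyph" test is passed in as an `IntPredicate`, so it could be backed by a per-font table or, on a desktop JVM, by `java.awt.Font`'s `canDisplay(int)`; the class name `GlyphTableFilter` and the three-entry table in `main` are hypothetical, for illustration only.

```java
import java.util.Set;
import java.util.function.IntPredicate;

class GlyphTableFilter {
    // Keep only the code points in [start, finish] for which hasGlyph reports a real glyph.
    static String filter(int start, int finish, IntPredicate hasGlyph) {
        StringBuilder sb = new StringBuilder();
        for (int cp = start; cp <= finish; cp++) {
            if (hasGlyph.test(cp)) {
                sb.appendCodePoint(cp);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Hypothetical per-font table: pretend the font covers only these three code points.
        Set<Integer> table = Set.of(0x2600, 0x2602, 0x2605);
        String juicy = filter(0x2600, 0x26FF, cp -> table.contains(cp));
        System.out.println(juicy.codePointCount(0, juicy.length()));  // 3
    }
}
```

The point is that the filter never inspects the character itself, only the per-font lookup, which is why one table per font is needed.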

Eric Rechlin · 05-30-2018, 03:34 PM

The font used by the Prime is called Prime Sans, and it's a modified version of Google's free Droid Sans font. The ttf file is in the Fonts folder of the HP Prime Virtual Calculator installation, so you can look at that with your favorite font analysis tool to see what characters it supports. A tool I use shows 51280 different characters, but it doesn't seem to have any of the emoji characters.

mfleming · 05-31-2018, 01:24 AM

Any chance the font set of the physical Prime can be altered, or the set updated via a loadable code page? I haven't seen anything in the documentation for that, so I presume it can't be done. Poking around in the bin file would probably come with the usual Void The Warranty admonition. I'd hate to do my own font creation and placement via graphics commands just to get a few unsupported characters!

~Mark

tcab · 05-31-2018, 01:34 AM

Thanks for the responses.

So the problem is that I want to filter out all the Unicode characters/code points that cannot be rendered. I just want to see all the "juicy" characters that the Prime is capable of rendering, and none of the noise. In other words, I don't want:
[Image: unicode_fun2_noisy.png?raw=1]

I want:

I stumbled upon this post about comparing the bitmaps of characters and managed to write Prime code to do it. My new character comparison function renders the character as a bitmap, converts that into a list of 1s and 0s, and uses this list to compare characters as rendered.

Code:

EXPORT bit_signature_for_char(c)
BEGIN
  local result={}, x, y, pix;
  TEXTOUT_P(c,G1,0,0,2,#000000,100,#FFFFFF);
  for y from 0 to 5 do  // should probably compare more rows of pixels
    for x from 0 to 100 do
      if GETPIX_P(G1,x, y) = #FFFFFF then
        pix := 0;
      else
        pix := 1;
      end;
      result := concat(result, pix);
    end;
  end;
  return result;
END;

EXPORT same_char(c1, c2)
BEGIN
  return string(bit_signature_for_char(c1)) = string(bit_signature_for_char(c2));
END;

EXPORT UNIFUN()
BEGIN
  LOCAL cp, S:="";
  PRINT();
  FOR cp FROM #2600h TO #26FFh DO
    //IF CHAR(cp) ≠ "⛾" THEN  // doesn't work 
    IF NOT same_char(CHAR(cp), CHAR(#2600)) THEN  // works :-)
      S := S + CHAR(cp);
    END;
  END;
  PRINT(S);
END;

Running across a longer range of chars takes time, but gives:
[Image: unicode_fun1.png?raw=1]

A couple of programming issues I ran into:

I used the offscreen bitmap G1 but for some reason couldn't use G2 to do the textout renderings of bitmaps?
I couldn't find a function to compare lists, so had to convert the lists to strings in order to compare them.
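For readers without a Prime, the bitmap-signature idea above can be sketched in Java with `java.awt.image.BufferedImage`: draw the character in black on a white canvas, flatten the pixels row by row to 0/1, and compare the arrays element by element with `Arrays.equals` (no string conversion needed). The class name `GlyphSignature`, the 32×32 canvas and the baseline placement are assumptions, and the rendered result depends on which fonts the JVM has available.

```java
import java.awt.Color;
import java.awt.Font;
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;
import java.util.Arrays;

class GlyphSignature {
    // Assumed canvas size: large enough for one glyph at the chosen font size.
    static final int W = 32, H = 32;

    // Draw s in black on a white canvas (roughly what TEXTOUT_P into a grob does).
    static BufferedImage render(String s, Font font) {
        BufferedImage img = new BufferedImage(W, H, BufferedImage.TYPE_INT_RGB);
        Graphics2D g = img.createGraphics();
        g.setColor(Color.WHITE);
        g.fillRect(0, 0, W, H);
        g.setColor(Color.BLACK);
        g.setFont(font);
        g.drawString(s, 0, g.getFontMetrics().getAscent());  // baseline just below the top edge
        g.dispose();
        return img;
    }

    // Flatten the bitmap row by row: 0 for a white pixel, 1 for anything else.
    static int[] signature(BufferedImage img) {
        int w = img.getWidth(), h = img.getHeight();
        int[] sig = new int[w * h];
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                boolean white = (img.getRGB(x, y) & 0xFFFFFF) == 0xFFFFFF;
                sig[y * w + x] = white ? 0 : 1;
            }
        }
        return sig;
    }

    // Two strings "look the same" in this font if their signatures match element by element.
    static boolean sameChar(String a, String b, Font font) {
        return Arrays.equals(signature(render(a, font)), signature(render(b, font)));
    }
}
```

Anti-aliased grey edge pixels count as 1 here, which is harmless as long as both characters are rendered with the same settings.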

Didier Lachieze · 05-31-2018, 04:22 AM

(05-31-2018 01:34 AM) tcab wrote:

I couldn't find a function to compare lists, so had to convert the lists to strings in order to compare them.

EQ() will do it. You'll find it in [Toolbox] > Math > List

Tyann · 05-31-2018, 04:29 AM

(05-31-2018 01:34 AM) tcab wrote: A couple of programming issues I ran into:

I used the offscreen bitmap G1 but for some reason couldn't use G2 to do the textout renderings of bitmaps?

I couldn't find a function to compare lists, so had to convert the lists to strings in order to compare them.

Hello

To use the graphics variables G1...G9, you must first dimension them with DIMGROB.
Since I don't see that instruction in your code, G1 had probably already been dimensioned by another program run earlier, whereas G2 had not.

To compare two lists, use EQ(list1, list2).

I hope this helps.

tcab · 05-31-2018, 01:23 PM

Thanks for the tips on comparing lists with EQ and on DIMGROB usage. I've incorporated those changes, but then decided to render onto the main screen G0 instead, so that it serves as a visual progress indicator, together with a percentage done.

Code:

// Unicode dump of unique chars
// Version 2, Andy Bulka (tcab) 2018

LOCAL BLACK:=#000000, WHITE:=#FFFFFF, YELLOW:=#FFFF00;
LOCAL CHAR_WIDTH:=100, CHAR_HEIGHT:=5;

bit_signature_for_char(c)
BEGIN
  local result={}, x, y, pix;
  TEXTOUT_P(c,G0,0,0,2,#000000,100,#FFFFFF);
  for y from 0 to CHAR_HEIGHT do  // should probably compare more rows of pixels
    for x from 0 to CHAR_WIDTH do
      result := concat(result, GETPIX_P(G0,x, y));
    end;
  end;
  return result;
END;

same_char(c1, c2)
BEGIN
  return EQ(bit_signature_for_char(c1),
            bit_signature_for_char(c2));
END;

disp_current(cp,perc)
BEGIN
  RECT_P(G0,20,0,95,12,YELLOW);
  TEXTOUT_P(cp,G0,20,0,2,BLACK,200,WHITE); // disp candidate codepoint number
  TEXTOUT_P(perc+"%",G0,70,0,2,BLACK,200,YELLOW); // disp percent done
END;

EXPORT UNIFUN()
BEGIN
  LOCAL cp, S:="", start:=#2600h, finish:=#26FFh, perc;
  PRINT();
  FOR cp FROM start TO finish DO
    perc:=IP((cp-start)/(finish-start)*100);
    disp_current(cp,perc);
    //IF CHAR(cp) ≠ "⛾" THEN  // doesn't work 
    IF NOT same_char(CHAR(cp), CHAR(#2600)) THEN  // works :-)
      S := S + CHAR(cp);
    END;
  END;
  PRINT(S);
END;

More programming questions:

Is there a way to TEXTOUT an integer in hex?
Do we know the exact height and width of characters in pixels? (Both in font size 1 & 2)

Tyann · 06-01-2018, 04:30 AM

(05-31-2018 01:23 PM) tcab wrote: Is there a way to TEXTOUT an integer in hex?
Do we know the exact height and width of characters in pixels? (Both in font size 1 & 2)

Hello
You can write var:=TEXTOUT_P(.......
var will then contain the x coordinate of the last pixel used to display the message or value.

cyrille de brébisson · 06-01-2018, 04:53 AM

Hello,

TEXTOUT_P(R->B(value), 0, 0) will display hex (if hex is the current mode).
TEXTOUT_P returns the x position of the NEXT pixel where to write the next character...

so, textout_p("part2", textout_p("part1", 0, 0), 0)
will write part1part2 on the screen

cyrille
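The return-value convention described above (TEXTOUT_P returns the x position where the next character should go) is the usual "pen advance" pattern. A minimal Java sketch, assuming a hypothetical `textOut` helper that draws nothing and only computes positions, and a fixed 12-pixel advance per character (the width tcab reports measuring for font size 2; real advances vary per glyph). The hex label uses `String.format` as a rough Java counterpart of the `#...h` display.

```java
import java.util.function.IntUnaryOperator;

class PenAdvance {
    // Hypothetical stand-in for TEXTOUT_P's return value: nothing is drawn, we only
    // compute where the next string should start (x plus each character's advance).
    static int textOut(String s, int x, IntUnaryOperator advance) {
        return x + s.codePoints().map(advance).sum();
    }

    public static void main(String[] args) {
        IntUnaryOperator fixed12 = cp -> 12;  // assumption: every glyph advances 12 px
        // Chained like textout_p("part2", textout_p("part1", 0, 0), 0):
        int end = textOut("part2", textOut("part1", 0, fixed12), fixed12);
        System.out.println(end);  // 120
        // A hex label for a code point, in the #...h style used in this thread.
        System.out.println(String.format("#%Xh", 0x2600));  // #2600h
    }
}
```

Passing the advance as a function keeps the sketch honest about variable-width fonts: swap in a per-character table instead of the constant.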

tcab · 06-01-2018, 07:48 AM

Thanks guys - very helpful.
Am now working on an optimised algorithm that:

Caches the 'signature' of the glyph that represents a boring/non juicy character
Scans only a portion of the pixels of a rendered character

as it turns out that I get just as accurate results scanning only a few pixel rows/cols of a character, and it speeds up scanning enormously. It will be interesting to discover how low I can go and still get accurate results!

I measured the width of a character font size 2 using the return value of TEXTOUT_P as being 12 pixels. Is there any similar technique to determine a character's height?
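The two optimisations listed above (compute the fallback glyph's signature once, and compare only a small corner of each rendered glyph) can be sketched in Java as follows. The renderer is passed in as a hypothetical hook, so the sketch makes no assumption about fonts; the 4×3-pixel window mirrors CHAR_WIDTH:=3, CHAR_HEIGHT:=2 with inclusive loop bounds, and `SampledScan`/`juicy` are illustrative names, not anything from the Prime.

```java
import java.awt.image.BufferedImage;
import java.util.Arrays;
import java.util.function.IntFunction;

class SampledScan {
    // Sampling window, inclusive bounds as in PPL's FOR ... TO ...: x in 0..3, y in 0..2.
    static final int SAMPLE_W = 3, SAMPLE_H = 2;

    // Read only the top-left corner of the rendered glyph, keeping raw RGB values like GETPIX_P.
    static int[] sampledSignature(BufferedImage img) {
        int[] sig = new int[(SAMPLE_W + 1) * (SAMPLE_H + 1)];
        int i = 0;
        for (int y = 0; y <= SAMPLE_H; y++) {
            for (int x = 0; x <= SAMPLE_W; x++) {
                sig[i++] = img.getRGB(x, y) & 0xFFFFFF;
            }
        }
        return sig;
    }

    // Keep code points whose sampled signature differs from the cached fallback signature.
    // 'renderer' is a hypothetical hook that draws one code point into an image.
    static String juicy(int start, int finish, int fallbackCp, IntFunction<BufferedImage> renderer) {
        int[] boring = sampledSignature(renderer.apply(fallbackCp));  // computed once, then reused
        StringBuilder out = new StringBuilder();
        for (int cp = start; cp <= finish; cp++) {
            if (!Arrays.equals(sampledSignature(renderer.apply(cp)), boring)) {
                out.appendCodePoint(cp);
            }
        }
        return out.toString();
    }
}
```

How small the window can get is an empirical question: a window that misses the pixels where two glyphs differ will report them as identical, so it is worth checking a reduced window against a full-size scan on a known range.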

Tim Wessman · 06-01-2018, 07:56 AM

Not really. Remember that characters can have different widths and heights, and can even flow into neighboring characters in some cases. It is a very different thing than plain "bitmap" fonts.

What is actually being returned as the "width" is the stroke advancement.

Tim Wessman · (This post was last modified: 06-01-2018 08:05 AM by Tim Wessman.)

Also, note that we use a 16-bit-per-character encoding and currently have no plans to support any characters higher than that. So you can stop at #FFFE...

I'm a bit curious what is wrong or missing from the existing character browser? What are you trying to accomplish that isn't done there?

I'm the one primarily responsible for the unicode stuff in general, so I'm the one that may be able to provide more details in specific areas.
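If the Prime only uses 16-bit character codes, a scan never needs to go past #FFFE, whereas the Java program in the next post loops up to `Character.MAX_CODE_POINT` (U+10FFFF). A small sketch of a clamped candidate range; `BmpRange` is an illustrative name, and skipping the surrogate block U+D800..U+DFFF is an added assumption (in Unicode those values are not characters on their own), not something stated in the thread.

```java
import java.util.stream.IntStream;

class BmpRange {
    // Upper bound for a 16-bit character encoding, per the post above.
    static final int PRIME_MAX = 0xFFFE;

    // Code points in [start, finish], clamped to PRIME_MAX, with the surrogate block
    // U+D800..U+DFFF skipped (assumption: they are never useful scan candidates).
    static IntStream candidates(int start, int finish) {
        return IntStream.rangeClosed(start, Math.min(finish, PRIME_MAX))
                .filter(cp -> !Character.isSurrogate((char) cp));
    }

    public static void main(String[] args) {
        // 65535 values in 0..0xFFFE, minus 2048 surrogates.
        System.out.println(candidates(0, Character.MAX_CODE_POINT).count());  // 63487
    }
}
```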

Martin Hepperle · (This post was last modified: 06-01-2018 08:32 AM by Martin Hepperle.)

Some statistics:

PrimeSansMono.ttf ("Prime Sans Mono") has 613 glyphs
PrimeSansBold.ttf ("Prime Sans Bold") has 606 glyphs
PrimeSansFull.ttf ("Prime Sans") has 51279 glyphs
Printing out a character table of the latter is left as an exercise to the reader ;-)

And a small Java program (sorry, no Prime here) to create a list of the code points:

Code:

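import java.awt.Font;
import java.awt.FontFormatException;
import java.io.File;
import java.io.IOException;

class PrimeGlyphs
{
   static void checkChars () throws FontFormatException, IOException
   {
      Font theFont = Font.createFont(Font.TRUETYPE_FONT,
            new File("d:/tmp/PrimeSansMono.ttf"));
      // or, if font is installed:
      // Font theFont = Font.decode("Prime Sans Mono");
      int nGlyphs = theFont.getNumGlyphs();
      System.out.println(theFont.getFontName() + " has " + nGlyphs
            + " glyphs and can display:");
      System.out.println();
      System.out.println("// " + theFont.getFontName() + " has " + nGlyphs
            + " glyphs:");
      System.out.println("glyphs := {");
      int i = 0;
      // <= so that Character.MAX_CODE_POINT itself is also checked
      for ( int codePoint = 0; codePoint <= Character.MAX_CODE_POINT; codePoint++ )
      {
         if ( theFont.canDisplay(codePoint) )
         {
            // Separator goes before every entry except the first: the number of
            // displayable code points need not equal getNumGlyphs(), so the glyph
            // count cannot tell us which entry is the last one.
            if ( i > 0 )
               System.out.print( (i % 8) == 0 ? ",\n" : ", " );  // 8 code points per line
            System.out.print(codePoint);
            i++;
         }
      }
      System.out.println();
      System.out.println("};");
   }

   public static void main (String[] args) throws FontFormatException, IOException
   {
      checkChars();
   }
}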

tcab · 06-01-2018, 10:35 AM

OK, here is the new, faster version. To view different ranges of characters, just edit the code.

Code:

// Unicode dump of unique chars
// Version 3, Andy Bulka (tcab) 2018

LOCAL boring_glyph_sig:={};
LOCAL BLACK:=#000000, WHITE:=#FFFFFF, YELLOW:=#FFFF00;
//LOCAL CHAR_WIDTH:=12, CHAR_HEIGHT:=5;  // char width seems to be about 12 but TW says it varies
LOCAL CHAR_WIDTH:=3, CHAR_HEIGHT:=2;  // cheaper comparison, don't scan all pixels

bit_signature_for_char(c)
BEGIN
  local result={}, x, y, pix,width;
  TEXTOUT_P(c,G0,0,0,2,#000000,100,#FFFFFF);
  for y from 0 to CHAR_HEIGHT do
    for x from 0 to CHAR_WIDTH do
      result := concat(result, GETPIX_P(G0,x, y));
    end;
  end;
  return result;
END;

same_char(c1, c2)
BEGIN
  return EQ(bit_signature_for_char(c1),
            bit_signature_for_char(c2));
END;

can_render(c1)
BEGIN
  return NOT EQ(bit_signature_for_char(c1),
                boring_glyph_sig);
END;

disp_progress(cp,perc)
BEGIN
  RECT_P(G0,20,0,95,12,YELLOW);
  TEXTOUT_P(R→B(cp),G0,20,0,2,BLACK,200,WHITE); // disp candidate codepoint number
  TEXTOUT_P(perc+"%",G0,70,0,2,BLACK,200,YELLOW); // disp percent done
END;

EXPORT UNIFUN()
BEGIN
  //LOCAL cp, S:="", start:=#2600h, finish:=#260Ah, perc;  // a short range of chars
  LOCAL cp, S:="", start:=#2600h, finish:=#26FFh, perc;    // larger range of chars
  //LOCAL cp, S:="", start:=#2500h, finish:=#28FFh, perc;    // even larger range of chars
  PRINT();
  boring_glyph_sig:=bit_signature_for_char(CHAR(#2600));
  FOR cp FROM start TO finish DO
    perc:=IP((cp-start)/(finish-start)*100);
    disp_progress(cp,perc);
    //IF NOT same_char(CHAR(cp), CHAR(#2600)) THEN  // slow
    IF can_render(CHAR(cp)) THEN  // faster
      S := S + CHAR(cp);
    END;
  END;
  PRINT(S);
END;

The progress indicator now displays in hex, too.

Quote: I'm a bit curious what is wrong or missing from the existing character browser? What are you trying to accomplish that isn't done there?

The Prime character browser is great; nothing is missing there. I just happened to be looping through some characters on the Prime one day and wondered how I could filter out the dud characters, and, well, one thing led to another. :-) I'm just learning about Unicode, pixels, coordinate systems, grobs, hex format, comparing lists, TEXTOUT, character widths and optimisation. Love it: the Prime sure is a powerful calculator, with a well-thought-out design appropriate to the times we live in (touchscreen, Unicode, grobs, algebraic, structured programming, rechargeable, free connectivity kit/editor/emulator, iPhone versions etc.).

Andy