HP Forums
Questions regarding HP Roman character set - Printable Version



Questions regarding HP Roman character set - matthiaspaul - 08-30-2016 04:02 PM

Hi,

I recently wrote an article about the "HP Roman" character set, its history, its properties, the variants in existence, and how they are (or were) used in their various environments:

HP Roman

While the article already covers the main aspects, there are still quite a few details I would like to see included for completeness and accuracy. Since some of you are much more familiar with the history of Hewlett-Packard and its products than I am, I hope you can provide valuable information.

Here's the short list of remaining open questions:

- When exactly was "HP Roman Extension" introduced? The earliest documents using this name I could find are dated 1979, but the machines they describe (HP 300 and HP 250) seem to have been announced or even have been available in 1978 already. Before that date, I could find references to "extended international character sets" etc., but from the very incomplete info I found about them, they seem to have had no similarity with "HP Roman Extension" (different characters, different codepoints). Anyone?

- The earliest character map for "HP Roman Extension" I could find dates from 1982 (which does not mean that this particular definition wasn't already in use earlier), and it still had many codepoints undefined. When was the 1982 definition actually introduced? Does someone know of even earlier representations? Which characters were lacking?

- I have found some hints that an HP document named "2640 Series Character Set Generation 13245-90001" might contain character maps of an earlier revision of "HP Roman Extension", but it could also be for another character set, given that the document is apparently dated 1975 (this would precede the earliest date I could find for "HP Roman Extension" by four years). So far, I have been unable to find this document. Does someone own a copy of it?

- I found 1984 definitions of both "HP Roman Extension" and "HP Roman-8", which still had 6(+2) characters undefined. This changed with the 1985 revision of "HP Roman Extension" and "HP Roman-8", which added 6 characters and changed the definition of one character (228) from a stroked d to an eth. These 1985 definitions are also in line with what IBM standardized as codepages 1050 and 1051 in 1989, and except for the "modified HP Roman-8" variant used on early HP RPL calculators in 1986-1988, I am not aware of any other changes later on (except for unofficial "euro" extensions and the later "HP Roman-9"). Are you aware of any revisions of "HP Roman Extension" or "HP Roman-8" after 1985? The pattern of undefined characters before the 1985 revision looks a bit strange to me (two groups 177-178 and 242-245) - why were these areas left undefined in the middle of the code space? (Well, this question applies even more so to the 1982 definition, which left many gaps instead of filling the character set from the bottom in a more orderly way. I assume that this might be down to internal negotiations and back-and-forth changes - perhaps there is interesting history to be told here as well?)

- Regarding the 1984 definition of "HP Roman-8", it is my assumption that it is the same as the original(?) 1983 definition of "HP Roman-8", right? Or have there been earlier variants with more characters lacking or different before 1984?

- Regarding the 1984 definition of "HP Roman Extension", it obviously differs from the definition in 1982 (which, however, might have been introduced earlier). Is it safe to assume that there was a 1983 revision, which introduced "HP Roman-8" and brought "HP Roman Extension" to the same level?

- When exactly was the name "HP Roman-8" introduced? The earliest reference using this name I could find was in 1983, but there might have been earlier sources. Before 1983, I only found references to the 7-bit and 8-bit "HP Roman Extension" character set, a kind of precursor.

- When exactly was HP Roman-9 introduced? I could find documents in 2001 which imply that some of the earliest printers to support it were produced in 1998, but I could not find any kind of announcement, so I stated "around 1999".

- Regarding the modified HP Roman-8 character set used on early HP RPL calculators like the HP-28C/HP-28S (and the HP-18C) and the HP 82240A printer, does someone know an official name for that variant? Some of the added characters are very odd, others duplicate already existing characters. There certainly is some history to this particular extension. Why were these characters chosen, when exactly and by whom? Which other calculators used this character set? Did printers other than the HP 82240A/B use it? AFAIK the HP-28 could only display characters up to 147, probably because of limited ROM space, but why did it duplicate the guillemets at 146/147 (or are they different from 251/253), and what is the purpose of the "LF" character at 144? Why did the printers support such a large number of strange superscript and subscript indices? And what is the purpose of the strange graphical symbol at 148?

- Are there any official mappings from HP Roman-8 to Unicode? Or does someone have an overview of the different mappings introduced in HP and third-party software products (including emulators)? If possible I would like to document the specific differences.
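
As one point of comparison for the Unicode mapping question, Python's standard library ships an `hp_roman8` codec. The sketch below only shows how to inspect what that particular codec does with a few codepoints; it is one third-party mapping among several, not an official HP definition, and it may differ from other implementations for individual characters. The helper names `roman8_to_unicode` and `describe` are made up for this illustration.

```python
import unicodedata


def roman8_to_unicode(data: bytes) -> str:
    """Decode HP Roman-8 bytes using Python's built-in 'hp_roman8' codec.

    Bytes the codec has no mapping for become U+FFFD instead of raising.
    """
    return data.decode("hp_roman8", errors="replace")


def describe(code: int) -> str:
    """Return a one-line description of where a single codepoint ends up."""
    ch = roman8_to_unicode(bytes([code]))
    name = unicodedata.name(ch, "<unnamed>")
    return f"0x{code:02X} -> U+{ord(ch):04X} {name}"


if __name__ == "__main__":
    # A few accented letters, plus 228 (0xE4), whose definition changed in 1985.
    for code in (0xA1, 0xC5, 0xC8, 0xDE, 0xE4):
        print(describe(code))
```

Running the same loop over 0xA0-0xFF and comparing the output with another tool's table is a quick way to list the specific differences between two mappings.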

Thanks and greetings,

Matthias


RE: Questions regarding HP Roman character set - rprosperi - 08-30-2016 08:15 PM

(08-30-2016 04:02 PM)matthiaspaul Wrote:  Hi,

I recently wrote an article about the "HP Roman" character set....

Nice work, the kind of Wikipedia page I enjoy reading when I run across one like this. Rich tables and nice reference section pull me in and make me want to read the whole thing, rather than scan for the TL;DR summary.


RE: Questions regarding HP Roman character set - Chris Dreher - 08-31-2016 03:23 AM

Droid48 Reader has the following article about how it maps the HP48 character set to Unicode. It includes images generated from how the HP48 renders these characters to the screen (it does not discuss the printer). It also discusses the rationale for choosing one possible Unicode codepoint over others when there is more than one character to choose from, and it cites alternative mappings that other people/software have chosen. If you hover your mouse over the images of empty boxes and bad renders, you can see further details on which tool rendered the character incorrectly.
http://www.drehersoft.com/mapping-hp48-text-to-unicode/

One correction to make on the Wikipedia article is that the HP48 uses character 160 as NBSP, not character 128, at least for screen display purposes. I might be misunderstanding what the article means by "... because this is the codepoint the former non-breaking space (in the HP 48 series) will be transformed to" when it was discussing character 128.

Other than that, this article looks great. I can see a lot of hard work was put into it. Thanks!

(disclaimer: I wrote the Droid48 Reader article)


RE: Questions regarding HP Roman character set - Chris Dreher - 08-31-2016 03:45 AM

FYI, I still have my HP 48SX manuals, if that is of any help.

Page 609 talks about how characters are mapped with the 82240A printer. It is clear about what characters are unavailable on the 82240A but is then vague about which HP48 characters need remapping via the OLDPRT command (probably because HP's OLDPRT code handles it for the user).

Pages 626 and 694 talk about how characters 160-255 are ISO 8859 Latin 1. It's interesting they don't mention the slight deviations they took in the ASCII range (0-127). Pages 627 and 695-696 have the character tables. Page 627 includes the translation escape codes such as \<), \<<, and \PI.

Some of this might not directly apply to the Wikipedia article, but I figured it might be interesting.


RE: Questions regarding HP Roman character set - matthiaspaul - 09-04-2016 11:16 PM

Thanks for the feedback already. It is nice to know it was worth the effort.

(08-31-2016 03:23 AM)Chris Dreher Wrote:  It also discusses the rationale for choosing one possible Unicode codepoint over others when there is more than one character to choose from. It also cites alternative mappings that other people/software have chosen. If you hover your mouse over the images for empty boxes and bad renders, you can see further details on what tool incorrectly rendered the character.
http://www.drehersoft.com/mapping-hp48-text-to-unicode/
Yes, I am aware of your article. But it does not really apply to the HP Roman article, more to the RPL character set article, where it is already among the citations. ;-)
Quote:One correction to make on the Wikipedia article is that the HP48 uses character 160 as NBSP, not character 128, at least for screen display purposes. I might be misunderstanding what the article means by "... because this is the codepoint the former non-breaking space (in the HP 48 series) will be transformed to" when it was discussing character 128.
In the later ISO 8859-1 based HP-48 character set, the angle symbol ∡ is at codepoint 128 (0x80) and NBSP at codepoint 160 (0xA0).

In the older HP Roman-8 based HP-28 character set, the angle symbol and NBSP are swapped; the angle symbol was located at codepoint 160 and NBSP at 128.
I assumed (but haven't found the time to verify) that this is also the translation vector used by OLDPRT/PRTPAR for printing on an HP 82240A printer.
If this assumption is correct and the vector hasn't changed on the HP-49/50, the euro character at codepoint 160 would end up at codepoint 128 as well.

I won't have the time to investigate the translation vector in the next couple of weeks (and I only have a 48GX and 50g to look at anyway). I could not find this being discussed explicitly in the past, but if someone has this info available in some form already, please provide it... ;-)
Did the translation vector change from the 38/48 to the 49/50 series or to the 39/40 series? After all, the 49 added the euro symbol, and the 39 the -1 index. Can the 39/40 still print on the HP 82240A at all? When exactly (which model) were the glyphs for codepoints 28, 29, 30 and 31 added, and what's the difference between left and right ellipsis?

(08-31-2016 03:45 AM)Chris Dreher Wrote:  I still have my HP 48SX manuals, if that is of any help.
If the SX manuals contain any details not also mentioned in the GX manuals (which I own in paper and electronic form), this would be of help.

Greetings,

Matthias


RE: Questions regarding HP Roman character set - Chris Dreher - 09-05-2016 04:21 PM

Thanks. I think I may have cross-wired some parts of the RPL character set with the HP Roman character set, while keeping them separate in other parts.

Regarding the Wikipedia sentence that contains "... because this is the codepoint the former non-breaking space (in the HP 48 series) will be transformed to", I now see what you are saying there. However, I misread it the first several times I read it. I recommend rewording it since the first sentence is very long with multiple facts. While this isn't the best, here's one suggestion:
Quote:There is no official codepoint definition for the euro sign in this modified character set. The HP 49/50 series of calculators use a different character set based on ECMA-94 / ISO 8859-1 which includes the euro symbol. When printing to the HP 82240A printers via OLDPRT/PRTPAR, the euro sign may be translated to codepoint 128 (0x80). This is because the HP 48 series OLDPRT/PRTPAR transforms the former non-breaking space codepoint to codepoint 128 (0x80). Mapping the euro sign to code point 186 (0xBA) as in HP Roman-9 would be another choice.

That said, I'm not certain that the HP 48 translates the RPL NBSP (0xA0) to the HP Roman NBSP (0x80). It appears that it is translated to a regular ASCII space (0x20). I might be reading the vector wrong. Anyway, here is the vector from the PRTPAR variable, since HP's code trumps any documentation they may have written for the 48-series. I extracted this from the SX and GX and confirmed that the vector is the same on both calculators. Each row covers 16 source codepoints (0x80-0xFF), and each entry is the codepoint sent to the printer.
Code:
      _0 _1 _2 _3 _4 _5 _6 _7 _8 _9 _A _B _C _D _E _F  

8_    A0 7F 7F 83 84 85 86 87 88 89 8A 8B 8C 8D 8E 76 
9_    5E 7F 7F 7F 7F 7F 7F 7F 7F 7F 7F 7F 7F 7F FC 7F 
A_    20 B8 BF AF BA BC 7C BD AB 63 F9 92 7E 2D 52 B0 
B_    B3 FE 97 98 A8 8F F4 F2 2C 31 FA 93 F7 F8 F5 B9 
C_    A1 E0 A2 E1 D8 D0 D3 B4 A3 DC A4 A5 E6 E5 A6 A7 
D_    E3 B6 E8 E7 DF E9 DA 82 D2 AD ED AE DB B1 F0 DE 
E_    C8 C4 C0 E2 CC D4 D7 B5 C9 C5 C1 CD D9 D5 D1 DD 
F_    E4 B7 CA C6 C2 EA CE 81 D6 CB C7 C3 CF B2 F1 EF
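
For readers who want to experiment with the vector above, here is a minimal Python sketch that transcribes the table and looks up individual codepoints. The names `VECTOR`, `translate`, and `translate_bytes` are made up for this illustration, and passing codes below 0x80 through unchanged is an assumption, since the posted vector only covers 0x80-0xFF. It illustrates a table lookup only and is not the calculator's actual OLDPRT code.

```python
# PRTPAR translation vector as posted above, rows 8_ through F_.
VECTOR_HEX = """
A0 7F 7F 83 84 85 86 87 88 89 8A 8B 8C 8D 8E 76
5E 7F 7F 7F 7F 7F 7F 7F 7F 7F 7F 7F 7F 7F FC 7F
20 B8 BF AF BA BC 7C BD AB 63 F9 92 7E 2D 52 B0
B3 FE 97 98 A8 8F F4 F2 2C 31 FA 93 F7 F8 F5 B9
A1 E0 A2 E1 D8 D0 D3 B4 A3 DC A4 A5 E6 E5 A6 A7
E3 B6 E8 E7 DF E9 DA 82 D2 AD ED AE DB B1 F0 DE
C8 C4 C0 E2 CC D4 D7 B5 C9 C5 C1 CD D9 D5 D1 DD
E4 B7 CA C6 C2 EA CE 81 D6 CB C7 C3 CF B2 F1 EF
"""

# Strip all whitespace before parsing so the layout above can stay readable.
VECTOR = bytes.fromhex("".join(VECTOR_HEX.split()))
assert len(VECTOR) == 128  # one entry per source codepoint 0x80-0xFF


def translate(code: int) -> int:
    """Map one HP 48 codepoint to the codepoint sent to the printer."""
    if not 0 <= code <= 0xFF:
        raise ValueError("codepoint must be in 0..255")
    if code < 0x80:
        return code  # assumption: the lower half is not remapped
    return VECTOR[code - 0x80]


def translate_bytes(data: bytes) -> bytes:
    """Apply the vector to every byte of a string."""
    return bytes(translate(b) for b in data)


if __name__ == "__main__":
    for code in (0x80, 0xA0, 0xE9):
        print(f"0x{code:02X} -> 0x{translate(code):02X}")
```

With this table, 0xA0 (NBSP on the HP 48) comes out as 0x20, and 0x80 (the angle symbol on the HP 48) comes out as 0xA0.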



RE: Questions regarding HP Roman character set - matthiaspaul - 09-09-2016 09:30 PM

(09-05-2016 04:21 PM)Chris Dreher Wrote:  That said, I'm not certain that the HP 48 translates the RPL NBSP (0xA0) to HP Roman NBSP (0x80). It appears that it is translated to a regular ASCII space (0x20). I might be reading the vector wrong.
No, you are reading it correctly, so my assumption was invalid.

Thanks a lot, I have incorporated your feedback.

Greetings,

Matthias


RE: Questions regarding HP Roman character set - Garth Wilson - 09-10-2016 09:12 PM

Last night I got my yellow monochrome monitor out and my HP92198 80-column, 24-line HPIL video interface which I had not used in many years. There was a note on the bottom of the HP92198 saying I had refreshed the EPROMs in 2004. I thought I should do it again now, but I couldn't tell the brand so that the EPROM programmer would be sure to use the right algorithm. I probably used the generic non-CMOS one before, 50ms and 25V. I did read the EPROMs into files though, so I can program new EPROMs later if necessary. (I do lots of backups, so there shouldn't be any way I'll lose it.) For now, the video is working. I just listed a program and key assignments and flags from my HP-41.

While I was in the programmer software, I decided to look at the data. It was pretty clear that U2 has the program material and U12 has the characters, i.e., the info telling what dots to use in each field to form each character. It didn't take long to realize I could change the characters if desired. It always bugged me that I could have the ROMAN8 character set loaded in my HP71 to use with the Thinkjet printer, but the monitor did not show the special characters and instead showed inverse video of the normal characters for the ones with the high bit set. It might take a few hours, but the ROMAN8 set could be put in the HP92198. If you had to go back sometimes to the inverse video (like for spreadsheets where you're moving from one active field to another), you could have more than one ROM, and put a switch on it to select which one you want at the time. The switch would only have to affect the output-enable or chip-select pin, not all the data pins, let alone address pins.

I realize the HP92198 80-column interface is rare; but I expect things work similarly with the other, more common HP video interface which only did 40 columns.

Only partially related, but maybe someone knows: I was looking through my manuals for a way to print (in this case to the HP92198) HP-41cx text files, but found nothing. I could write a program to pull lines into ALPHA, up to 24 characters at a time, and use ACA (accumulate alpha) to piece longer lines together on the monitor or printer, but it seems like I shouldn't have to. [Edit: While looking for something else, I found such a program that I wrote 30 years ago; but again, it seems like there must be an already-provided way to do it.] Am I forgetting a built-in function? I have a 41cx, HPIL, and Extended I/O module (plus ZENROM and Advantage modules).