Post Reply 
monic part 6: the mC programming language and compiler
12-11-2021, 12:29 AM (This post was last modified: 05-29-2022 06:08 AM by F-73P.)
Post: #1
monic part 6: the mC programming language and compiler
Previous: monic part 5: writing C programs on the calculator

Next: monic part 7: a little CAS

In the previous thread I introduced the mC ("monic C") high-level programming language for the monic calculator. In this thread I'll document the development of a compiler that translates the language into bytecode for execution by a virtual machine.

mC does not conform to any C standard. Features will include:
  • recursive functions
  • local variables
  • multi-dimensional arrays and pointers
Below is the EBNF grammar (v1.0) for mC (see the previous thread for information on grammars and EBNF notation):

program \(\rightarrow\) code ENDFILE

code \(\rightarrow\) { var-declaration } { function-declaration } main-function-declaration

var-declaration \(\rightarrow\) ARRAY ID dimensions SEMICOLON | [ "*" ] ID SEMICOLON

dimensions \(\rightarrow\) LEFTBRACKET NUM RIGHTBRACKET { LEFTBRACKET NUM RIGHTBRACKET }

function-declaration \(\rightarrow\) ID [ LEFTPARENTHESIS params RIGHTPARENTHESIS ] LEFTBRACE local-declarations statement-list RIGHTBRACE

main-function-declaration \(\rightarrow\) MAIN LEFTBRACE local-declarations statement-list RIGHTBRACE

params \(\rightarrow\) [ param { COMMA param } ]

param \(\rightarrow\) ID { LEFTBRACKET NUM RIGHTBRACKET } | "*" ID

local-declarations \(\rightarrow\) { LOCAL var-declaration }

statement-list \(\rightarrow\) { statement }

statement \(\rightarrow\) compound-stmt | if-stmt | while-stmt | do-stmt | for-stmt| switch-stmt | return-stmt | read-stmt | write-stmt | assignment-stmt | expression-stmt | goto-stmt | label-stmt | nop-stmt

compound-stmt\(\rightarrow\) LEFTBRACE statement-list RIGHTBRACE

if-stmt \(\rightarrow\) IF LEFTPARENTHESIS Boolean-expression RIGHTPARENTHESIS statement [ ELSE statement ]

while-stmt \(\rightarrow\) WHILE LEFTPARENTHESIS Boolean-expression RIGHTPARENTHESIS statement

do-stmt \(\rightarrow\) DO statement WHILE LEFTPARENTHESIS Boolean-expression RIGHTPARENTHESIS SEMICOLON

for-stmt \(\rightarrow\) FOR LEFTPARENTHESIS [ assignment ] SEMICOLON Boolean-expression SEMICOLON [ addditive-expression ] RIGHTPARENTHESIS statement

switch-stmt \(\rightarrow\) SWITCH LEFTPARENTHESIS var RIGHTPARENTHESIS LEFTBRACE { CASE NUM COLON statement-list BREAK SEMICOLON } [ DEFAULT COLON statement-list ] RIGHTBRACE

var \(\rightarrow\) ID { LEFTBRACKET additive-expression RIGHTBRACKET } | "*" ID

return-stmt \(\rightarrow\) RETURN [ expression ] SEMICOLON

read-stmt \(\rightarrow\) READ var SEMICOLON

write-stmt \(\rightarrow\) WRITE expression SEMICOLON

assignment-stmt \(\rightarrow\) assignment SEMICOLON

expression-stmt \(\rightarrow\) [ expression ] SEMICOLON

goto-stmt \(\rightarrow\) GOTO ID SEMICOLON

label-stmt \(\rightarrow\) ID COLON statement

nop-stmt \(\rightarrow\) NOPR SEMICOLON

assignment \(\rightarrow\) var "=" expression

expression \(\rightarrow\) Boolean-expression | additive-expression

Boolean-expression \(\rightarrow\) logical-expression { boolop logical-expression }

logical-expression \(\rightarrow\) additive-expression relop additive-expression | var | TRUE | FALSE

additive-expression \(\rightarrow\) term { addop term }

term \(\rightarrow\) [ NEGATE ] factor { mulop [ NEGATE ] factor }

factor \(\rightarrow\) LEFTPARENTHESIS additive-expression RIGHTPARENTHESIS | NUM | var | ID LEFTPARENTHESIS args RIGHTPARENTHESIS | var "++" | var "--" | "&" ID { LEFTBRACKET additive-expression RIGHTBRACKET }

args \(\rightarrow\) [ expression { COMMA expression } ]

boolop \(\rightarrow\) "and" | "or" | "not"

relop \(\rightarrow\) "<=" | "<" | ">" | ">=" | "==" | "!="

addop \(\rightarrow\) "+" | "-"

mulop \(\rightarrow\) "*" | "/"

The C language combines all the power of assembly language with all the ease-of-use of assembly language
Find all posts by this user
Quote this message in a reply
12-12-2021, 06:57 PM
Post: #2
RE: monic part 6: the mC programming language and compiler
Nice work! I really like your project and I'm very much looking forward to learn more about it. Looks like you put a lot of effort into this to cover most of the C language.

I am curious how you plan to handle logical expressions as "short circuit" code or not. I noticed that RETURN, WRITE and a few other constructs allow 'expression' as an argument, but 'factor' for example uses 'additive_expression' instead of 'expression'. To implement RETURN and WRITE this way requires "short circuit logic" to return Boolean values without evaluating unnecessary AND and OR operands. This can be tricky to implement, but not too hard to do using backpatch lists to jump on "true" and "false" conditions to the right VM opcode targets.

I think I spotted a problem with the 'function_declaration' starting with an ID just like 'var_declaration', so there is no way of telling which is which with one token lookahead. Making the '(' and ')' optional in 'function_declaration' makes it even harder to distinguish from 'var_declaration' because you have to look ahead all the way to the first '(' to see if this is a function, not a variable declaration. Perhaps insist that functions should be declared with parenthesis?

Also 'expression' is defined to be either a 'Boolean_expression' or a 'additive_expression' but you have no way of telling which one applies by just looking at the lookahead token.

Why is SWITCH restricted to a 'var' argument and not an `additive_expression'?

I've converted the grammar to Yacc to give it a go with Bison and fixed 'function_declaration' to require parenthesis and MAIN with optional parameters for consistence and just in case if you'd like to pass arguments to the main program. However, this grammar has 47 shift/reduce conflicts which suggests that several (cascading) ambiguities exist:
Code:
%token ARRAY
%token ID
%token NUM
%token MAIN
%token LOCAL
%token RETURN
%token READ
%token WRITE
%token GOTO
%token NOPR
%token SWITCH
%token CASE
%token BREAK
%token DEFAULT
%token FOR
%token IF
%token ELSE
%token WHILE
%token DO
%token NEGATE
%token LE
%token GE
%token EQ
%token NE
%token AND
%token OR
%token NOT
%token INC       // ++
%token DEC       // --
%token TRUE
%token FALSE
%token ENDFILE
%%
program
: code ENDFILE
;
code
: var_declarations function_declarations main_function_declaration
;
var_declarations
: var_declaration var_declarations
|
;
var_declaration
: ARRAY ID dimensions ';'
| '*' ID ';'
| ID ';'
;
dimensions
: '[' NUM ']' dimensions
| '[' NUM ']'
;
function_declarations
: function_declaration function_declarations
|
;
function_declaration
: ID '(' params_opt ')' '{' local_declarations statement_list '}'
;
main_function_declaration
: MAIN '(' params_opt ')' '{' local_declarations statement_list '}'
;
params
: param ',' params
| param
;
params_opt
: params
|
;
param
: ID nums
| '*' ID
;
nums
: '[' NUM ']' nums
|
;
local_declarations
: LOCAL var_declaration local_declarations
|
;
statement_list
: statement statement_list
|
;
statement
: compound_stmt
| if_stmt
| while_stmt
| do_stmt
| for_stmt| switch_stmt
| return_stmt
| read_stmt
| write_stmt
| assignment_stmt
| expression_stmt
| goto_stmt
| label_stmt
| nop_stmt 
;
compound_stmt
: '{' statement_list '}'
;
if_stmt
: IF '(' Boolean_expression ')' statement
| IF '(' Boolean_expression ')' statement ELSE statement
;
while_stmt
: WHILE '(' Boolean_expression ')' statement
;
do_stmt
: DO statement WHILE '(' Boolean_expression ')' ';'
;
for_stmt
: FOR '(' assignment_opt ';' Boolean_expression ';' additive_expression_opt ')' statement 
;
switch_stmt
: SWITCH '(' var ')' '{' cases ';' default_opt '}' 
;
cases
: CASE NUM ':' statement_list BREAK ';' cases
|
;
default_opt
: DEFAULT ':' statement_list
|
;
var
: ID indices
| '*' ID 
;
indices
: '[' additive_expression ']' indices
|
;
return_stmt
: RETURN expression_opt ';' 
;
read_stmt
: READ var ';' 
;
write_stmt
: WRITE expression ';' 
;
assignment_stmt
: assignment ';'
;
expression_stmt
: expression_opt ';'
;
goto_stmt
: GOTO ID ';'
;
label_stmt
: ID ':' statement
;
nop_stmt
: NOPR ';'
;
assignment
: var '=' expression 
;
assignment_opt
: assignment
|
;
expression
: Boolean_expression
| additive_expression
;
expression_opt
: expression
|
;
Boolean_expression
: logical_expression boolop Boolean_expression
| logical_expression
;
logical_expression
: additive_expression relop additive_expression
| var
| TRUE
| FALSE 
;
additive_expression
: term addop additive_expression
| term
;
additive_expression_opt
: additive_expression
|
;
term
: neg_opt factor mulop term
| neg_opt factor
;
neg_opt
: NEGATE
|
;
factor
: '(' additive_expression ')'
| NUM
| var
| ID '(' args_opt ')'
| var INC
| var DEC
| '&' ID indices
;
args
: expression ',' args
| expression
;
args_opt
: args
|
;
boolop
: AND
| OR
| NOT
;
relop
: LE
| '<'
| '>'
| GE
| EQ
| NE
;
addop
: '+'
| '-'
;
mulop
: '*'
| '/'
;
%%
Run bison -v grammar.y to produce a grammar.output report.

I didn't have time to verify the conflicts to see what constructs might actually become problematic to implement as a parser based on these two conceptually identical grammar versions, but the grammar is not LR(1) and LALR. Note that there should only be one shift/reduce conflict for the IF-ELSE ambiguity that always arrises in C. I haven't checked the grammar with ANTLR, which may reveal more useful information about potential grammar ambiguity issues to implement a recursive-descent parser, since ANTLR is a LL(1) generator. ANTLR also accepts grammars beyond LL(1).

Hope this is helpful. Looks like a fun and useful project to work on.

- Rob

"I count on old friends to remain rational"
Visit this user's website Find all posts by this user
Quote this message in a reply
12-13-2021, 03:39 AM (This post was last modified: 12-17-2021 04:11 AM by F-73P.)
Post: #3
RE: monic part 6: the mC programming language and compiler
(12-12-2021 06:57 PM)robve Wrote:  Nice work! I really like your project and I'm very much looking forward to learn more about it. Looks like you put a lot of effort into this to cover most of the C language.

Thanks, I've started using a development board with a 4.3" display and i.MX RT1062 processor (Cortex-M7, 600MHz, 4MB FLASH, 32MB RAM) and am rewriting the firmware, so I thought this would be a good time to document the project. Here's a screen shot of a test program:

[Image: 51743164643_c0f2edc8f2.jpg]

Some aspects have been very challenging but I'm learning a lot and it has been fun too. I would like to add support for a programming language similar to FORTRAN as well:

[Image: 51742101942_dc63d72959.jpg][Image: 51743812475_47cf6ec3c2.jpg]

(12-12-2021 06:57 PM)robve Wrote:  Why is SWITCH restricted to a 'var' argument and not an `additive_expression'?
Thanks, I'll fix that. I've only ever used a variable argument with SWITCH and just didn't think of it. Great to have a fresh pair of eyes go over the grammar Smile

(12-12-2021 06:57 PM)robve Wrote:  I am curious how you plan to handle logical expressions as "short circuit" code or not. I noticed that RETURN, WRITE and a few other constructs allow 'expression' as an argument, but 'factor' for example uses 'additive_expression' instead of 'expression'. To implement RETURN and WRITE this way requires "short circuit logic" to return Boolean values without evaluating unnecessary AND and OR operands. This can be tricky to implement, but not too hard to do using backpatch lists to jump on "true" and "false" conditions to the right VM opcode targets.
To be honest I never heard about "short circuit" code until you mentioned it Smile. I thought to just evaluate the entire Boolean expression, but I now see programs can run faster with "short circuit" code. Something to think about!

Thanks for pointing out inconsistencies in the use of 'expression' and 'additive-expression'. I'll amend the grammar and post it soon.

(12-12-2021 06:57 PM)robve Wrote:  I think I spotted a problem with the 'function_declaration' starting with an ID just like 'var_declaration', so there is no way of telling which is which with one token lookahead. Making the '(' and ')' optional in 'function_declaration' makes it even harder to distinguish from 'var_declaration' because you have to look ahead all the way to the first '(' to see if this is a function, not a variable declaration. Perhaps insist that functions should be declared with parenthesis?
Yes you are right. I wanted to make parentheses optional for functions without parameters to reduce keystrokes but your suggestion is better, and I'll employ an extra token lookahead to determine if it's a function or variable declaration.

(12-12-2021 06:57 PM)robve Wrote:  Also 'expression' is defined to be either a 'Boolean_expression' or a 'additive_expression' but you have no way of telling which one applies by just looking at the lookahead token.
Well spotted, I'll post an amended grammar soon.

(12-12-2021 06:57 PM)robve Wrote:  I've converted the grammar to Yacc to give it a go with Bison and fixed 'function_declaration' to require parenthesis and MAIN with optional parameters for consistence and just in case if you'd like to pass arguments to the main program
Great idea, I thought to only use "read" to bring in values from the stack. Your way allows one to write e.g. "main(a,b)" and save two read statements.

(12-12-2021 06:57 PM)robve Wrote:  Hope this is helpful. Looks like a fun and useful project to work on.
Thanks Rob, this is very helpful. I really appreciate you taking the time to check the grammar and provide valuable feedback, which I'll use to improve the project.

The C language combines all the power of assembly language with all the ease-of-use of assembly language
Find all posts by this user
Quote this message in a reply
12-16-2021, 10:14 AM (This post was last modified: 03-17-2022 01:20 AM by F-73P.)
Post: #4
RE: monic part 6: the mC programming language and compiler
Below is v1.1 of the EBNF grammar, which hopefully solves the problems Rob found in the previous version:

program \(\rightarrow\) code ENDFILE

code \(\rightarrow\) { var-declaration } { function-declaration } main-function-declaration

var-declaration \(\rightarrow\) ARRAY ID dimensions SEMICOLON | [ "*" ] ID SEMICOLON

dimensions \(\rightarrow\) LEFTBRACKET NUM RIGHTBRACKET { LEFTBRACKET NUM RIGHTBRACKET }

function-declaration \(\rightarrow\) ID LEFTPARENTHESIS params RIGHTPARENTHESIS LEFTBRACE local-declarations statement-list RIGHTBRACE

main-function-declaration \(\rightarrow\) MAIN LEFTPARENTHESIS params RIGHTPARENTHESIS LEFTBRACE local-declarations statement-list RIGHTBRACE

params \(\rightarrow\) [ param { COMMA param } ]

param \(\rightarrow\) ID { LEFTBRACKET NUM RIGHTBRACKET } | "*" ID

local-declarations \(\rightarrow\) { LOCAL var-declaration }

statement-list \(\rightarrow\) { statement }

statement \(\rightarrow\) compound-stmt | if-stmt | while-stmt | do-stmt | for-stmt| switch-stmt | return-stmt | read-stmt | write-stmt | assignment-stmt | expression-stmt | goto-stmt | label-stmt | nop-stmt

compound-stmt\(\rightarrow\) LEFTBRACE statement-list RIGHTBRACE

if-stmt \(\rightarrow\) IF LEFTPARENTHESIS expression RIGHTPARENTHESIS statement [ ELSE statement ]

while-stmt \(\rightarrow\) WHILE LEFTPARENTHESIS expression RIGHTPARENTHESIS statement

do-stmt \(\rightarrow\) DO statement WHILE LEFTPARENTHESIS expression RIGHTPARENTHESIS SEMICOLON

for-stmt \(\rightarrow\) FOR LEFTPARENTHESIS [ assignment ] SEMICOLON [ expression ] SEMICOLON [ expression ] RIGHTPARENTHESIS statement

switch-stmt \(\rightarrow\) SWITCH LEFTPARENTHESIS expression RIGHTPARENTHESIS LEFTBRACE { CASE NUM COLON statement-list BREAK SEMICOLON } [ DEFAULT COLON statement-list ] RIGHTBRACE

var \(\rightarrow\) ID { LEFTBRACKET expression RIGHTBRACKET } | "*" ID

return-stmt \(\rightarrow\) RETURN [ expression ] SEMICOLON

read-stmt \(\rightarrow\) READ var SEMICOLON

write-stmt \(\rightarrow\) WRITE expression SEMICOLON

assignment-stmt \(\rightarrow\) assignment SEMICOLON

expression-stmt \(\rightarrow\) [ expression ] SEMICOLON

goto-stmt \(\rightarrow\) GOTO ID SEMICOLON

label-stmt \(\rightarrow\) ID COLON statement-list

nop-stmt \(\rightarrow\) NOPR SEMICOLON

assignment \(\rightarrow\) var "=" expression

expression \(\rightarrow\) simple-expression { boolop simple-expression }

simple-expression \(\rightarrow\) additive-expression [ relop additive-expression ]

additive-expression \(\rightarrow\) term { addop term }

term \(\rightarrow\) [ NEGATE ] factor { mulop [ NEGATE ] factor }

factor \(\rightarrow\) LEFTPARENTHESIS expression RIGHTPARENTHESIS | NUM | var | var "++" | var "--" | "&" ID { LEFTBRACKET expression RIGHTBRACKET } | "true" | "false" | ID LEFTPARENTHESIS args RIGHTPARENTHESIS

args \(\rightarrow\) [ expression { COMMA expression } ]

boolop \(\rightarrow\) "and" | "or" | "not"

relop \(\rightarrow\) "<=" | "<" | ">" | ">=" | "==" | "!="

addop \(\rightarrow\) "+" | "-"

mulop \(\rightarrow\) "*" | "/"

I've written a parser recogniser for this grammar. Below are some screen shots. The source code is displayed in green after "compile" is pressed if no syntax errors are found:

[Image: 51752732405_5b578c8ccc.jpg][Image: 51752503359_5901aebbbb.jpg]

otherwise the offending token is printed in red and an error message displayed at the bottom of the screen:

[Image: 51752731935_1e3857a15e.jpg][Image: 51751029642_31974f540b.jpg]

The C language combines all the power of assembly language with all the ease-of-use of assembly language
Find all posts by this user
Quote this message in a reply
01-16-2022, 10:01 PM
Post: #5
RE: monic part 6: the mC programming language and compiler
(12-13-2021 03:39 AM)F-73P Wrote:  I've started using a development board with a 4.3" display and i.MX RT1062 processor (Cortex-M7, 600MHz, 4MB FLASH, 32MB RAM) and am rewriting the firmware, so I thought this would be a good time to document the project. Here's a screen shot of a test program:

[Image: 51743164643_c0f2edc8f2.jpg]

Some aspects have been very challenging but I'm learning a lot and it has been fun too. I would like to add support for a programming language similar to FORTRAN as well:

[Image: 51742101942_dc63d72959.jpg][Image: 51743812475_47cf6ec3c2.jpg]

Looks like this is becoming a really nice small but capable system! I wonder how you handle keyboard input? With QWERTY keys or with a menu system to enter commands? Or both? I prefer QWERTY like the Sharp and Casio pocket computers, the HP 71B and Ti Voyage 200, especially the 200 because its keyboard and menu system helps to write programs quickly without getting bogged down by awkward keypress combinations to write code.

To implement interpreters from scratch, Crafting Interpreters is an indispensable resource. Personally, I'm more on the compiler-compiler side of the programming language implementation spectrum, but Crafting Compilers makes for a good reading to learn the trade without requiring compiler-compiler tools or a deep understanding of the theoretical foundations of these.

- Rob

"I count on old friends to remain rational"
Visit this user's website Find all posts by this user
Quote this message in a reply
01-21-2022, 03:49 AM
Post: #6
RE: monic part 6: the mC programming language and compiler
(01-16-2022 10:01 PM)robve Wrote:  I wonder how you handle keyboard input? With QWERTY keys or with a menu system to enter commands? Or both?

Both, but without a QWERTY keypad:

[Image: 51626443852_cc1801b291_b.jpg]

I'll try to incorporate a QWERTY keypad on the next prototype:

[Image: 51833753930_b72eb48c1f_b.jpg]

(01-16-2022 10:01 PM)robve Wrote:  To implement interpreters from scratch, Crafting Interpreters is an indispensable resource.

Thanks, I'll check it out!

The C language combines all the power of assembly language with all the ease-of-use of assembly language
Find all posts by this user
Quote this message in a reply
01-28-2022, 10:20 AM
Post: #7
RE: monic part 6: the mC programming language and compiler
I've written a debugger that supports single-stepping through the source code. The parser creates two bytecode programs - one for execution ("release") and the other for debugging ("debug"). The debug bytecode program contains instructions that halt execution and enable debugging.

As an example, suppose one wishes to step through the program to calculate 5!:

[Image: 51847618123_b194e38cf0.jpg]

The debugger is comprised of the following screens:

1. Source code with arrow pointing at the line to be executed next:

[Image: 51847557446_7ed21d8c7d.jpg]

2. Stack with the line to be executed next at the top:

[Image: 51846583922_8676dd2572.jpg]

3. Global variables with the line to be executed next at the top. The image below gives the values of the variables after stepping through the while loop once:

[Image: 51847613953_95d24082f4.jpg]

4. Parameters with the line to be executed next at the top (not yet active).
5. Local variables with the line to be executed next at the top (not yet active).

6. "Release" bytecode with arrow pointing at the line to be executed next:

[Image: 51847875319_2b2a99fc88.jpg][Image: 51847555326_a90f3971c5.jpg]

7. "Debug" bytecode with arrow pointing at the line to be executed next:

[Image: 51847615023_0cdb0c21f5.jpg][Image: 51847614458_b5aa2660f9.jpg]

Pressing "step" executes a line of the C source code and the user can see which corresponding bytecode instructions are executed in the bytecode debugging screens.

After pressing "step" once more at the end of the program we return to the home screen. The result is on stack level 1:

[Image: 51848200450_80f63589dd.jpg][Image: 51847871149_99186b1df9.jpg]

The C language combines all the power of assembly language with all the ease-of-use of assembly language
Find all posts by this user
Quote this message in a reply
05-29-2022, 05:51 AM (This post was last modified: 12-27-2022 12:04 AM by F-73P.)
Post: #8
RE: monic part 6: the mC programming language and compiler
I've added support for functions. Below is a bisection method program:

[Image: 52105355057_3743dc53d9.jpg][Image: 52106376761_c815d1a2d9.jpg]

Parameter values can be viewed in the debugger:

[Image: 52106631524_ff84dab33f.jpg]

I've replaced the i.MX RT1010 board with an i.MX RT1050 board, which has considerably more memory (64MB vs 16MB Flash and 32MB vs 128kB RAM):

[Image: 52105353052_02f8854f1a.jpg]

The C language combines all the power of assembly language with all the ease-of-use of assembly language
Find all posts by this user
Quote this message in a reply
06-21-2022, 11:31 PM
Post: #9
RE: monic part 6: the mC programming language and compiler
I've completed all of the control statements. In addition to the if, while, do, and for statements there are now conditional operators:

[Image: 52162933992_6ace19f7d8.jpg]

and switch statements:

[Image: 52162933682_61b57396ba.jpg]

as well as continue and break statements.

And of course no programming language would be complete without a goto statement Smile:

[Image: 52164001288_cd10e069a6.jpg]

All of the control statements behave like their C counterparts, except for switch which has an implied break at the end of a list of statements, e.g. the following are equivalent:

[Image: 52164196479_48d6429ed2.jpg][Image: 52164196259_33e9251c1a.jpg]

The C language combines all the power of assembly language with all the ease-of-use of assembly language
Find all posts by this user
Quote this message in a reply
06-21-2022, 11:53 PM
Post: #10
RE: monic part 6: the mC programming language and compiler
Great work!

(06-21-2022 11:31 PM)F-73P Wrote:  All of the control statements behave like their C counterparts, except for switch which has an implied break at the end of a list of statements, e.g. the following are equivalent:
I am not sure this is a good idea, C programmers expect the code to fall thru between cases when there is no break instruction.

Sylvain
Find all posts by this user
Quote this message in a reply
06-22-2022, 01:50 AM (This post was last modified: 06-23-2022 04:29 AM by F-73P.)
Post: #11
RE: monic part 6: the mC programming language and compiler
Thanks Sylvain, you are right. Implementing switch is quite a challenge - I would like it to conform to standard C but I just can't figure it out. Below is the code for switch in the parser, there's already a lot going on!

Code:

void switch_stmt(void){
    uint16_t jumpArray1ReleaseIndexLocal;
    uint16_t jumpArray2ReleaseIndexLocal;
    uint16_t jumpArray3ReleaseIndexLocal; //to jump to statement-list when match in multiple labels in switch statement

    uint16_t jumpArray1DebugIndexLocal;
    uint16_t jumpArray2DebugIndexLocal;
    uint16_t jumpArray3DebugIndexLocal; //to jump to statement-list when match in multiple labels in switch statement

    _Bool flag=false;

    /*set breakpoint for debugging*/
    set_statement_to_line_number_array(statementTolineNoArrayIndex,get_line_no(​));
    set_prg_index_virtual_array(statementTolineNoArrayIndex,prgReleaseIndex);
    set_prg_debug(prgDebugIndex++,BKPT+statementTolineNoArrayIndex++);
    /****************/

    breakStackIndex++;
    breakStack[breakStackIndex-1].index=jumpArray1ReleaseIndexGlobal; //increment breakStack so if "break" statement in source jump out of switch statement
    breakStack[breakStackIndex-1].loop=SWITCHSTATEMENT;

    jumpArray1ReleaseIndexLocal = jumpArray1ReleaseIndexGlobal++;
    jumpArray1DebugIndexLocal = jumpArray1DebugIndexGlobal++;

    advance();
    match_check_advance(LPAREN)
    expression(); if(get_compiler_error_flag()) return;
    match_check_advance(RPAREN)
    match_check_advance(LBRACE)
    while(token==CASE) {
        if(flag) { //do not set jumpArray2 on first iteration
            set_jump_array2_release(jumpArray2ReleaseIndexLocal,prgReleaseIndex-1); //program jumps here if no match
            set_jump_array2_debug(jumpArray2DebugIndexLocal,prgDebugIndex-1);
        }
        else flag=true;
        /*DUP*/
        set_prg_release(prgReleaseIndex++,DUP);
        set_prg_debug(prgDebugIndex++,DUP);
        /************************************/
        advance();
        match(NUM); if(get_compiler_error_flag()) return;
        save_literal(); //convert tokenString to double and add to literal table
        /*push literal onto the stack*/
        set_prg_release(prgReleaseIndex++,PUSHLITERAL+literalTableIndex-1);
        set_prg_debug(prgDebugIndex++,PUSHLITERAL+literalTableIndex-1);
        /************************************/
        /*check if equal*/
        set_prg_release(prgReleaseIndex++,TESTEQ);
        set_prg_debug(prgDebugIndex++,TESTEQ);
        /************************************/
        advance();
        match_check_advance(COLON);
        if (token==CASE) { //multiple labels
            jumpArray3ReleaseIndexLocal = jumpArray3ReleaseIndexGlobal++;
            jumpArray3DebugIndexLocal = jumpArray3DebugIndexGlobal++;

            while(token==CASE) {
                /*jump to jumpArray3[jumpArray3IndexLocal] (statement-list) if match*/
                set_prg_release(prgReleaseIndex++,TRUEJUMPRELEASE2+jumpArray3ReleaseIndexLo​cal);
                set_prg_debug(prgDebugIndex++,TRUEJUMPDEBUG2+jumpArray3DebugIndexLocal);
                /************************************/
                /*DUP*/
                set_prg_release(prgReleaseIndex++,DUP);
                set_prg_debug(prgDebugIndex++,DUP);
                /************************************/
                advance();
                match(NUM); if(get_compiler_error_flag()) return;
                save_literal(); //convert tokenString to double and add to literal table
                /*push literal onto the stack*/
                set_prg_release(prgReleaseIndex++,PUSHLITERAL+literalTableIndex-1);
                set_prg_debug(prgDebugIndex++,PUSHLITERAL+literalTableIndex-1);
                /************************************/
                /*check if equal*/
                set_prg_release(prgReleaseIndex++,TESTEQ);
                set_prg_debug(prgDebugIndex++,TESTEQ);
                /************************************/
                advance();
                match_check_advance(COLON);
            }

            /*jump to jumpArray2[jumpArray2IndexLocal] if no match*/
            jumpArray2ReleaseIndexLocal = jumpArray2ReleaseIndexGlobal++;
            jumpArray2DebugIndexLocal = jumpArray2DebugIndexGlobal++;
            set_prg_release(prgReleaseIndex++,FALSEJUMPRELEASE+jumpArray2ReleaseIndexLo​cal);
            set_prg_debug(prgDebugIndex++,FALSEJUMPDEBUG+jumpArray2DebugIndexLocal);
            /************************************/

            set_jump_array3_release(jumpArray3ReleaseIndexLocal,prgReleaseIndex-1); //jump to statement-list if match
            set_jump_array3_debug(jumpArray3DebugIndexLocal,prgDebugIndex-1); //jump to statement-list if match

            /*DROP*/
            set_prg_release(prgReleaseIndex++,DROP);
            set_prg_debug(prgDebugIndex++,DROP);
            /************************************/
            statement_list(); if(get_compiler_error_flag()) return;

            /*jump to jumpArray1[jumpArray1Index] to exit*/ //implied "break" statement
            set_prg_release(prgReleaseIndex++,JUMPRELEASE+breakStack[breakStackIndex-1].index);
            set_prg_debug(prgDebugIndex++,JUMPDEBUG+breakStack[breakStackIndex-1].index);
            /************************************/
        }
        else { //not multiple labels
            /*jump to jumpArray2[jumpArray2IndexLocal] if no match*/
            jumpArray2ReleaseIndexLocal = jumpArray2ReleaseIndexGlobal++;
            jumpArray2DebugIndexLocal = jumpArray2DebugIndexGlobal++;
            set_prg_release(prgReleaseIndex++,FALSEJUMPRELEASE+jumpArray2ReleaseIndexLo​cal);
            set_prg_debug(prgDebugIndex++,FALSEJUMPDEBUG+jumpArray2DebugIndexLocal);
            /************************************/
            /*DROP*/
            set_prg_release(prgReleaseIndex++,DROP);
            set_prg_debug(prgDebugIndex++,DROP);
            /************************************/
            statement_list(); if(get_compiler_error_flag()) return;

            /*jump to jumpArray1[jumpArray1Index] to exit*/ //implied "break" statement
            set_prg_release(prgReleaseIndex++,JUMPRELEASE+breakStack[breakStackIndex-1].index);
            set_prg_debug(prgDebugIndex++,JUMPDEBUG+breakStack[breakStackIndex-1].index);
            /************************************/
        }
    }

    if(token==DEFAULT) {
        advance();
        match_check_advance(COLON)

        set_jump_array2_release(jumpArray2ReleaseIndexLocal,prgReleaseIndex-1); //program jumps here if no match
        set_jump_array2_debug(jumpArray2DebugIndexLocal,prgDebugIndex-1);

        /*DROP*/
        set_prg_release(prgReleaseIndex++,DROP);
        set_prg_debug(prgDebugIndex++,DROP);
        /************************************/

        statement_list(); if(get_compiler_error_flag()) return;
    }
    else {
        set_jump_array2_release(jumpArray2ReleaseIndexLocal,prgReleaseIndex-1); //program jumps here if no match
        set_jump_array2_debug(jumpArray2DebugIndexLocal,prgDebugIndex-1);

        /*DROP*/
        set_prg_release(prgReleaseIndex++,DROP);
        set_prg_debug(prgDebugIndex++,DROP);
        /************************************/
    }

    match_check_advance(RBRACE)

    set_jump_array1_release(jumpArray1ReleaseIndexLocal,prgReleaseIndex-1); //program jumps here if match
    set_jump_array1_debug(jumpArray1DebugIndexLocal,prgDebugIndex-1);

    breakStackIndex--; //decrement breakStack
}

It may be a little clearer how my implementation of switch works if we look at the bytecode into which program "switch test 5" (in my previous post) is translated:

[Image: 52163303047_57a2a36c6a.jpg][Image: 52164327116_120b4769cf.jpg]

Now if the implied break code generation instructions in the parser are removed the bytecode program becomes:

[Image: 52163775757_0f6284da29.jpg][Image: 52164800231_9be5f7d850.jpg]

There are two problems with this. The first is that in my implementation the switch expression ("n" in this example) is left on the stack (hence the "DUP" and "DROP" instructions). Without an implied break the program ends up just deleting "n". This can be fixed by saving the switch expression to a variable and pushing it onto the stack before testing the cases, so that "DUP" and "DROP" are not required.

But the other problem remains: when there's a match with one of the first two cases (i.e. the user entered "1" or "2") the program displays "10" but it won't fall through and display "20" as well.

Back to the drawing board Smile

The C language combines all the power of assembly language with all the ease-of-use of assembly language
Find all posts by this user
Quote this message in a reply
06-25-2022, 07:12 AM (This post was last modified: 06-25-2022 09:12 AM by F-73P.)
Post: #12
RE: monic part 6: the mC programming language and compiler
The switch statement now works as expected when there is no break statement. I modified the branch instructions so that the code falls through when there has been a previous case match:

[Image: 52171265988_70668211fd.jpg][Image: 52170242747_899e126977.jpg]

I've also added support for arrays:

[Image: 52171742060_885b0243c4.jpg]

The C language combines all the power of assembly language with all the ease-of-use of assembly language
Find all posts by this user
Quote this message in a reply
06-25-2022, 02:04 PM
Post: #13
RE: monic part 6: the mC programming language and compiler
Looks like a great update to the project!

(06-22-2022 01:50 AM)F-73P Wrote:  Thanks Sylvain, you are right. Implementing switch is quite a challenge - I would like it to conform to standard C but I just can't figure it out. Below is the code for switch in the parser, there's already a lot going on!

One of the nice things about switch is the fall-through, so multiple cases can be combined that would otherwise break, like so:
case A: case B: case C:
  <code>

This would not work if a break is placed between case A and case B and between case B and case C

Furthermore, the default arm can be placed before cases, like so:
case A:
  <code>
  break;
default:
  <code>
  // fall-through
case B:
  <code>

The default arm will be taken when cases A and B do not match.

Why write code like this? Well, the fall-through from the default arm to case B is useful!

Some switcheroo, in case you've not seen it before Smile

- Rob

"I count on old friends to remain rational"
Visit this user's website Find all posts by this user
Quote this message in a reply
06-25-2022, 06:33 PM
Post: #14
RE: monic part 6: the mC programming language and compiler
In addition to the case/default structuring, because switch is a block, one can declare locals like so:
switch (<expr>)
{
  double x; // local and visible to all cases, this is legal in C
  case <A>:
    <code>
  ...
}

While this is valid C, there is no widespread use as far as I can tell and few know about this. Initializers of locals in this switch block are also just ignored. Also, declaring variables with initializers in case blocks require braces. On the other hand, the PC-G850 doesn't permit variable declarations anywhere except at the start of a function. Simplified C is still very workable!

- Rob

"I count on old friends to remain rational"
Visit this user's website Find all posts by this user
Quote this message in a reply
06-27-2022, 04:37 AM (This post was last modified: 07-05-2022 04:09 AM by F-73P.)
Post: #15
RE: monic part 6: the mC programming language and compiler
(06-25-2022 02:04 PM)robve Wrote:  Some switcheroo, in case you've not seen it before Smile
- Rob

Thanks Rob, I haven't. Something to think about for a future version.

BTW, I've managed to run the remarkable program you recently posted for computing the digits of pi:

Code:

1 unsigned long a=10000,d;
2 unsigned b,c,e,*f,g;
3 main(){
4  printf("digits?");scanf("%u",&c);
5  c=7*c/2;c-=c%14;f=malloc(4*c+4);
6  for(;b-c;)f[b++]=a/5;
7  for(;d=0,g=c*2;c-=14,printf("%.4lu",e+d/a),e=d%a)
8   for(b=c;d+=f[b]*a,f[b]=d%--g,d/=g--,--b;d*=b);
9 }

It is interesting to compare the lengths of the C code and the translated bytecode, 15 lines vs 101:

[Image: 52186665417_fda3557de8.jpg]

[Image: 52186651987_1545c79465_n.jpg][Image: 52186651897_a9d222d99b_n.jpg][Image: 52187920694_85da5f9d62_n.jpg]
[Image: 52187671356_07421091fa_n.jpg][Image: 52186651612_6b914ef413_n.jpg][Image: 52187920489_d5f02e7b32_n.jpg]

I'll now add some display functions to the language in order to display the digits as the program runs.

The C language combines all the power of assembly language with all the ease-of-use of assembly language
Find all posts by this user
Quote this message in a reply
07-05-2022, 04:46 AM (This post was last modified: 01-31-2023 01:03 AM by F-73P.)
Post: #16
RE: monic part 6: the mC programming language and compiler
I've added clear_disp() and print() functions to display the digits of pi as the program runs. Below are the first 1000 digits of pi. The first screen takes about 7 seconds to fill and the second screen about 2 seconds.

[Image: 52193812897_ca2f8d7e0a.jpg][Image: 52195080604_45a1740217.jpg]

The C language combines all the power of assembly language with all the ease-of-use of assembly language
Find all posts by this user
Quote this message in a reply
07-05-2022, 08:32 PM
Post: #17
RE: monic part 6: the mC programming language and compiler
(07-05-2022 04:46 AM)F-73P Wrote:  I've added clear_disp() and print() functions to display the digits of pi as the program runs. Below are the first 1000 digits of pi. The first screen takes about 7 seconds to fill and the second screen about 2 seconds.

[Image: 52193812897_ca2f8d7e0a.jpg][Image: 52195080604_45a1740217.jpg]

Oooh, that's nice Smile

"I count on old friends to remain rational"
Visit this user's website Find all posts by this user
Quote this message in a reply
02-01-2023, 11:44 PM
Post: #18
RE: monic part 6: the mC programming language and compiler
In order to bring the language closer to C, I've started writing a new compiler using a grammar based on the ANSI C grammar given in the second edition of The C Programming Language. Data types, declarations and definitions are now supported:

[Image: 52663175522_ea9d014989.jpg]

The C language combines all the power of assembly language with all the ease-of-use of assembly language
Find all posts by this user
Quote this message in a reply
Post Reply 




User(s) browsing this thread: 1 Guest(s)