monic part 6: the mC programming language and compiler
12-11-2021, 12:29 AM (This post was last modified: 03-18-2022 05:20 AM by F-73P.)
Post: #1
 F-73P Junior Member Posts: 29 Joined: May 2020
monic part 6: the mC programming language and compiler
Previous: monic part 5: writing C programs on the calculator

In the previous thread I introduced the mC ("monic C") high-level programming language for the monic calculator. In this thread I'll document the development of a compiler that translates the language into bytecode for execution by a virtual machine.

mC does not conform to any C standard. Features will include:
• recursive functions
• local variables
• multi-dimensional arrays and pointers
Below is the EBNF grammar (v1.0) for mC (see the previous thread for information on grammars and EBNF notation):

program $$\rightarrow$$ code ENDFILE

code $$\rightarrow$$ { var-declaration } { function-declaration } main-function-declaration

var-declaration $$\rightarrow$$ ARRAY ID dimensions SEMICOLON | [ "*" ] ID SEMICOLON

dimensions $$\rightarrow$$ LEFTBRACKET NUM RIGHTBRACKET { LEFTBRACKET NUM RIGHTBRACKET }

function-declaration $$\rightarrow$$ ID [ LEFTPARENTHESIS params RIGHTPARENTHESIS ] LEFTBRACE local-declarations statement-list RIGHTBRACE

main-function-declaration $$\rightarrow$$ MAIN LEFTBRACE local-declarations statement-list RIGHTBRACE

params $$\rightarrow$$ [ param { COMMA param } ]

param $$\rightarrow$$ ID { LEFTBRACKET NUM RIGHTBRACKET } | "*" ID

local-declarations $$\rightarrow$$ { LOCAL var-declaration }

statement-list $$\rightarrow$$ { statement }

statement $$\rightarrow$$ compound-stmt | if-stmt | while-stmt | do-stmt | for-stmt| switch-stmt | return-stmt | read-stmt | write-stmt | assignment-stmt | expression-stmt | goto-stmt | label-stmt | nop-stmt

compound-stmt$$\rightarrow$$ LEFTBRACE statement-list RIGHTBRACE

if-stmt $$\rightarrow$$ IF LEFTPARENTHESIS Boolean-expression RIGHTPARENTHESIS statement [ ELSE statement ]

while-stmt $$\rightarrow$$ WHILE LEFTPARENTHESIS Boolean-expression RIGHTPARENTHESIS statement

do-stmt $$\rightarrow$$ DO statement WHILE LEFTPARENTHESIS Boolean-expression RIGHTPARENTHESIS SEMICOLON

for-stmt $$\rightarrow$$ FOR LEFTPARENTHESIS [ assignment ] SEMICOLON Boolean-expression SEMICOLON [ addditive-expression ] RIGHTPARENTHESIS statement

switch-stmt $$\rightarrow$$ SWITCH LEFTPARENTHESIS var RIGHTPARENTHESIS LEFTBRACE { CASE NUM COLON statement-list BREAK SEMICOLON } [ DEFAULT COLON statement-list ] RIGHTBRACE

var $$\rightarrow$$ ID { LEFTBRACKET additive-expression RIGHTBRACKET } | "*" ID

return-stmt $$\rightarrow$$ RETURN [ expression ] SEMICOLON

read-stmt $$\rightarrow$$ READ var SEMICOLON

write-stmt $$\rightarrow$$ WRITE expression SEMICOLON

assignment-stmt $$\rightarrow$$ assignment SEMICOLON

expression-stmt $$\rightarrow$$ [ expression ] SEMICOLON

goto-stmt $$\rightarrow$$ GOTO ID SEMICOLON

label-stmt $$\rightarrow$$ ID COLON statement

nop-stmt $$\rightarrow$$ NOPR SEMICOLON

assignment $$\rightarrow$$ var "=" expression

expression $$\rightarrow$$ Boolean-expression | additive-expression

Boolean-expression $$\rightarrow$$ logical-expression { boolop logical-expression }

logical-expression $$\rightarrow$$ additive-expression relop additive-expression | var | TRUE | FALSE

additive-expression $$\rightarrow$$ term { addop term }

term $$\rightarrow$$ [ NEGATE ] factor { mulop [ NEGATE ] factor }

factor $$\rightarrow$$ LEFTPARENTHESIS additive-expression RIGHTPARENTHESIS | NUM | var | ID LEFTPARENTHESIS args RIGHTPARENTHESIS | var "++" | var "--" | "&" ID { LEFTBRACKET additive-expression RIGHTBRACKET }

args $$\rightarrow$$ [ expression { COMMA expression } ]

boolop $$\rightarrow$$ "and" | "or" | "not"

relop $$\rightarrow$$ "<=" | "<" | ">" | ">=" | "==" | "!="

addop $$\rightarrow$$ "+" | "-"

mulop $$\rightarrow$$ "*" | "/"
12-12-2021, 06:57 PM
Post: #2
 robve Member Posts: 289 Joined: Sep 2020
RE: monic part 6: the mC programming language and compiler
Nice work! I really like your project and I'm very much looking forward to learn more about it. Looks like you put a lot of effort into this to cover most of the C language.

I am curious how you plan to handle logical expressions as "short circuit" code or not. I noticed that RETURN, WRITE and a few other constructs allow 'expression' as an argument, but 'factor' for example uses 'additive_expression' instead of 'expression'. To implement RETURN and WRITE this way requires "short circuit logic" to return Boolean values without evaluating unnecessary AND and OR operands. This can be tricky to implement, but not too hard to do using backpatch lists to jump on "true" and "false" conditions to the right VM opcode targets.

I think I spotted a problem with the 'function_declaration' starting with an ID just like 'var_declaration', so there is no way of telling which is which with one token lookahead. Making the '(' and ')' optional in 'function_declaration' makes it even harder to distinguish from 'var_declaration' because you have to look ahead all the way to the first '(' to see if this is a function, not a variable declaration. Perhaps insist that functions should be declared with parenthesis?

Also 'expression' is defined to be either a 'Boolean_expression' or a 'additive_expression' but you have no way of telling which one applies by just looking at the lookahead token.

Why is SWITCH restricted to a 'var' argument and not an additive_expression'?

I've converted the grammar to Yacc to give it a go with Bison and fixed 'function_declaration' to require parenthesis and MAIN with optional parameters for consistence and just in case if you'd like to pass arguments to the main program. However, this grammar has 47 shift/reduce conflicts which suggests that several (cascading) ambiguities exist:
Code:
%token ARRAY
%token ID
%token NUM
%token MAIN
%token LOCAL
%token RETURN
%token WRITE
%token GOTO
%token NOPR
%token SWITCH
%token CASE
%token BREAK
%token DEFAULT
%token FOR
%token IF
%token ELSE
%token WHILE
%token DO
%token NEGATE
%token LE
%token GE
%token EQ
%token NE
%token AND
%token OR
%token NOT
%token INC       // ++
%token DEC       // --
%token TRUE
%token FALSE
%token ENDFILE
%%
program
: code ENDFILE
;
code
: var_declarations function_declarations main_function_declaration
;
var_declarations
: var_declaration var_declarations
|
;
var_declaration
: ARRAY ID dimensions ';'
| '*' ID ';'
| ID ';'
;
dimensions
: '[' NUM ']' dimensions
| '[' NUM ']'
;
function_declarations
: function_declaration function_declarations
|
;
function_declaration
: ID '(' params_opt ')' '{' local_declarations statement_list '}'
;
main_function_declaration
: MAIN '(' params_opt ')' '{' local_declarations statement_list '}'
;
params
: param ',' params
| param
;
params_opt
: params
|
;
param
: ID nums
| '*' ID
;
nums
: '[' NUM ']' nums
|
;
local_declarations
: LOCAL var_declaration local_declarations
|
;
statement_list
: statement statement_list
|
;
statement
: compound_stmt
| if_stmt
| while_stmt
| do_stmt
| for_stmt| switch_stmt
| return_stmt
| write_stmt
| assignment_stmt
| expression_stmt
| goto_stmt
| label_stmt
| nop_stmt
;
compound_stmt
: '{' statement_list '}'
;
if_stmt
: IF '(' Boolean_expression ')' statement
| IF '(' Boolean_expression ')' statement ELSE statement
;
while_stmt
: WHILE '(' Boolean_expression ')' statement
;
do_stmt
: DO statement WHILE '(' Boolean_expression ')' ';'
;
for_stmt
: FOR '(' assignment_opt ';' Boolean_expression ';' additive_expression_opt ')' statement
;
switch_stmt
: SWITCH '(' var ')' '{' cases ';' default_opt '}'
;
cases
: CASE NUM ':' statement_list BREAK ';' cases
|
;
default_opt
: DEFAULT ':' statement_list
|
;
var
: ID indices
| '*' ID
;
indices
|
;
return_stmt
: RETURN expression_opt ';'
;
;
write_stmt
: WRITE expression ';'
;
assignment_stmt
: assignment ';'
;
expression_stmt
: expression_opt ';'
;
goto_stmt
: GOTO ID ';'
;
label_stmt
: ID ':' statement
;
nop_stmt
: NOPR ';'
;
assignment
: var '=' expression
;
assignment_opt
: assignment
|
;
expression
: Boolean_expression
;
expression_opt
: expression
|
;
Boolean_expression
: logical_expression boolop Boolean_expression
| logical_expression
;
logical_expression
| var
| TRUE
| FALSE
;
| term
;
|
;
term
: neg_opt factor mulop term
| neg_opt factor
;
neg_opt
: NEGATE
|
;
factor
| NUM
| var
| ID '(' args_opt ')'
| var INC
| var DEC
| '&' ID indices
;
args
: expression ',' args
| expression
;
args_opt
: args
|
;
boolop
: AND
| OR
| NOT
;
relop
: LE
| '<'
| '>'
| GE
| EQ
| NE
;
: '+'
| '-'
;
mulop
: '*'
| '/'
;
%%
Run bison -v grammar.y to produce a grammar.output report.

I didn't have time to verify the conflicts to see what constructs might actually become problematic to implement as a parser based on these two conceptually identical grammar versions, but the grammar is not LR(1) and LALR. Note that there should only be one shift/reduce conflict for the IF-ELSE ambiguity that always arrises in C. I haven't checked the grammar with ANTLR, which may reveal more useful information about potential grammar ambiguity issues to implement a recursive-descent parser, since ANTLR is a LL(1) generator. ANTLR also accepts grammars beyond LL(1).

Hope this is helpful. Looks like a fun and useful project to work on.

- Rob

"I count on old friends" -- HP 71B,Prime|Ti VOY200,Nspire CXII CAS|Casio fx-CG50...|Sharp PC-G850,E500,2500,1500,14xx,13xx,12xx...
12-13-2021, 03:39 AM (This post was last modified: 12-17-2021 04:11 AM by F-73P.)
Post: #3
 F-73P Junior Member Posts: 29 Joined: May 2020
RE: monic part 6: the mC programming language and compiler
(12-12-2021 06:57 PM)robve Wrote:  Nice work! I really like your project and I'm very much looking forward to learn more about it. Looks like you put a lot of effort into this to cover most of the C language.

Thanks, I've started using a development board with a 4.3" display and i.MX RT1062 processor (Cortex-M7, 600MHz, 4MB FLASH, 32MB RAM) and am rewriting the firmware, so I thought this would be a good time to document the project. Here's a screen shot of a test program:

Some aspects have been very challenging but I'm learning a lot and it has been fun too. I would like to add support for a programming language similar to FORTRAN as well:

(12-12-2021 06:57 PM)robve Wrote:  Why is SWITCH restricted to a 'var' argument and not an additive_expression'?
Thanks, I'll fix that. I've only ever used a variable argument with SWITCH and just didn't think of it. Great to have a fresh pair of eyes go over the grammar

(12-12-2021 06:57 PM)robve Wrote:  I am curious how you plan to handle logical expressions as "short circuit" code or not. I noticed that RETURN, WRITE and a few other constructs allow 'expression' as an argument, but 'factor' for example uses 'additive_expression' instead of 'expression'. To implement RETURN and WRITE this way requires "short circuit logic" to return Boolean values without evaluating unnecessary AND and OR operands. This can be tricky to implement, but not too hard to do using backpatch lists to jump on "true" and "false" conditions to the right VM opcode targets.
To be honest I never heard about "short circuit" code until you mentioned it . I thought to just evaluate the entire Boolean expression, but I now see programs can run faster with "short circuit" code. Something to think about!

Thanks for pointing out inconsistencies in the use of 'expression' and 'additive-expression'. I'll amend the grammar and post it soon.

(12-12-2021 06:57 PM)robve Wrote:  I think I spotted a problem with the 'function_declaration' starting with an ID just like 'var_declaration', so there is no way of telling which is which with one token lookahead. Making the '(' and ')' optional in 'function_declaration' makes it even harder to distinguish from 'var_declaration' because you have to look ahead all the way to the first '(' to see if this is a function, not a variable declaration. Perhaps insist that functions should be declared with parenthesis?
Yes you are right. I wanted to make parentheses optional for functions without parameters to reduce keystrokes but your suggestion is better, and I'll employ an extra token lookahead to determine if it's a function or variable declaration.

(12-12-2021 06:57 PM)robve Wrote:  Also 'expression' is defined to be either a 'Boolean_expression' or a 'additive_expression' but you have no way of telling which one applies by just looking at the lookahead token.
Well spotted, I'll post an amended grammar soon.

(12-12-2021 06:57 PM)robve Wrote:  I've converted the grammar to Yacc to give it a go with Bison and fixed 'function_declaration' to require parenthesis and MAIN with optional parameters for consistence and just in case if you'd like to pass arguments to the main program
Great idea, I thought to only use "read" to bring in values from the stack. Your way allows one to write e.g. "main(a,b)" and save two read statements.

(12-12-2021 06:57 PM)robve Wrote:  Hope this is helpful. Looks like a fun and useful project to work on.
Thanks Rob, this is very helpful. I really appreciate you taking the time to check the grammar and provide valuable feedback, which I'll use to improve the project.
12-16-2021, 10:14 AM (This post was last modified: 03-17-2022 01:20 AM by F-73P.)
Post: #4
 F-73P Junior Member Posts: 29 Joined: May 2020
RE: monic part 6: the mC programming language and compiler
Below is v1.1 of the EBNF grammar, which hopefully solves the problems Rob found in the previous version:

program $$\rightarrow$$ code ENDFILE

code $$\rightarrow$$ { var-declaration } { function-declaration } main-function-declaration

var-declaration $$\rightarrow$$ ARRAY ID dimensions SEMICOLON | [ "*" ] ID SEMICOLON

dimensions $$\rightarrow$$ LEFTBRACKET NUM RIGHTBRACKET { LEFTBRACKET NUM RIGHTBRACKET }

function-declaration $$\rightarrow$$ ID LEFTPARENTHESIS params RIGHTPARENTHESIS LEFTBRACE local-declarations statement-list RIGHTBRACE

main-function-declaration $$\rightarrow$$ MAIN LEFTPARENTHESIS params RIGHTPARENTHESIS LEFTBRACE local-declarations statement-list RIGHTBRACE

params $$\rightarrow$$ [ param { COMMA param } ]

param $$\rightarrow$$ ID { LEFTBRACKET NUM RIGHTBRACKET } | "*" ID

local-declarations $$\rightarrow$$ { LOCAL var-declaration }

statement-list $$\rightarrow$$ { statement }

statement $$\rightarrow$$ compound-stmt | if-stmt | while-stmt | do-stmt | for-stmt| switch-stmt | return-stmt | read-stmt | write-stmt | assignment-stmt | expression-stmt | goto-stmt | label-stmt | nop-stmt

compound-stmt$$\rightarrow$$ LEFTBRACE statement-list RIGHTBRACE

if-stmt $$\rightarrow$$ IF LEFTPARENTHESIS expression RIGHTPARENTHESIS statement [ ELSE statement ]

while-stmt $$\rightarrow$$ WHILE LEFTPARENTHESIS expression RIGHTPARENTHESIS statement

do-stmt $$\rightarrow$$ DO statement WHILE LEFTPARENTHESIS expression RIGHTPARENTHESIS SEMICOLON

for-stmt $$\rightarrow$$ FOR LEFTPARENTHESIS [ assignment ] SEMICOLON [ expression ] SEMICOLON [ expression ] RIGHTPARENTHESIS statement

switch-stmt $$\rightarrow$$ SWITCH LEFTPARENTHESIS expression RIGHTPARENTHESIS LEFTBRACE { CASE NUM COLON statement-list BREAK SEMICOLON } [ DEFAULT COLON statement-list ] RIGHTBRACE

var $$\rightarrow$$ ID { LEFTBRACKET expression RIGHTBRACKET } | "*" ID

return-stmt $$\rightarrow$$ RETURN [ expression ] SEMICOLON

read-stmt $$\rightarrow$$ READ var SEMICOLON

write-stmt $$\rightarrow$$ WRITE expression SEMICOLON

assignment-stmt $$\rightarrow$$ assignment SEMICOLON

expression-stmt $$\rightarrow$$ [ expression ] SEMICOLON

goto-stmt $$\rightarrow$$ GOTO ID SEMICOLON

label-stmt $$\rightarrow$$ ID COLON statement-list

nop-stmt $$\rightarrow$$ NOPR SEMICOLON

assignment $$\rightarrow$$ var "=" expression

expression $$\rightarrow$$ simple-expression { boolop simple-expression }

simple-expression $$\rightarrow$$ additive-expression [ relop additive-expression ]

additive-expression $$\rightarrow$$ term { addop term }

term $$\rightarrow$$ [ NEGATE ] factor { mulop [ NEGATE ] factor }

factor $$\rightarrow$$ LEFTPARENTHESIS expression RIGHTPARENTHESIS | NUM | var | var "++" | var "--" | "&" ID { LEFTBRACKET expression RIGHTBRACKET } | "true" | "false" | ID LEFTPARENTHESIS args RIGHTPARENTHESIS

args $$\rightarrow$$ [ expression { COMMA expression } ]

boolop $$\rightarrow$$ "and" | "or" | "not"

relop $$\rightarrow$$ "<=" | "<" | ">" | ">=" | "==" | "!="

addop $$\rightarrow$$ "+" | "-"

mulop $$\rightarrow$$ "*" | "/"

I've written a parser recogniser for this grammar. Below are some screen shots. The source code is displayed in green after "compile" is pressed if no syntax errors are found:

otherwise the offending token is printed in red and an error message displayed at the bottom of the screen:

01-16-2022, 10:01 PM
Post: #5
 robve Member Posts: 289 Joined: Sep 2020
RE: monic part 6: the mC programming language and compiler
(12-13-2021 03:39 AM)F-73P Wrote:  I've started using a development board with a 4.3" display and i.MX RT1062 processor (Cortex-M7, 600MHz, 4MB FLASH, 32MB RAM) and am rewriting the firmware, so I thought this would be a good time to document the project. Here's a screen shot of a test program:

Some aspects have been very challenging but I'm learning a lot and it has been fun too. I would like to add support for a programming language similar to FORTRAN as well:

Looks like this is becoming a really nice small but capable system! I wonder how you handle keyboard input? With QWERTY keys or with a menu system to enter commands? Or both? I prefer QWERTY like the Sharp and Casio pocket computers, the HP 71B and Ti Voyage 200, especially the 200 because its keyboard and menu system helps to write programs quickly without getting bogged down by awkward keypress combinations to write code.

To implement interpreters from scratch, Crafting Interpreters is an indispensable resource. Personally, I'm more on the compiler-compiler side of the programming language implementation spectrum, but Crafting Compilers makes for a good reading to learn the trade without requiring compiler-compiler tools or a deep understanding of the theoretical foundations of these.

- Rob

"I count on old friends" -- HP 71B,Prime|Ti VOY200,Nspire CXII CAS|Casio fx-CG50...|Sharp PC-G850,E500,2500,1500,14xx,13xx,12xx...
01-21-2022, 03:49 AM
Post: #6
 F-73P Junior Member Posts: 29 Joined: May 2020
RE: monic part 6: the mC programming language and compiler
(01-16-2022 10:01 PM)robve Wrote:  I wonder how you handle keyboard input? With QWERTY keys or with a menu system to enter commands? Or both?

Both, but without a QWERTY keypad:

I'll try to incorporate a QWERTY keypad on the next prototype:

(01-16-2022 10:01 PM)robve Wrote:  To implement interpreters from scratch, Crafting Interpreters is an indispensable resource.

Thanks, I'll check it out!
01-28-2022, 10:20 AM
Post: #7
 F-73P Junior Member Posts: 29 Joined: May 2020
RE: monic part 6: the mC programming language and compiler
I've written a debugger that supports single-stepping through the source code. The parser creates two bytecode programs - one for execution ("release") and the other for debugging ("debug"). The debug bytecode program contains instructions that halt execution and enable debugging.

As an example, suppose one wishes to step through the program to calculate 5!:

The debugger is comprised of the following screens:

1. Source code with arrow pointing at the line to be executed next:

2. Stack with the line to be executed next at the top:

3. Global variables with the line to be executed next at the top. The image below gives the values of the variables after stepping through the while loop once:

4. Parameters with the line to be executed next at the top (not yet active).
5. Local variables with the line to be executed next at the top (not yet active).

6. "Release" bytecode with arrow pointing at the line to be executed next:

7. "Debug" bytecode with arrow pointing at the line to be executed next:

Pressing "step" executes a line of the C source code and the user can see which corresponding bytecode instructions are executed in the bytecode debugging screens.

After pressing "step" once more at the end of the program we return to the home screen. The result is on stack level 1:

03-18-2022, 04:59 AM (This post was last modified: 03-27-2022 02:02 AM by F-73P.)
Post: #8
 F-73P Junior Member Posts: 29 Joined: May 2020
RE: monic part 6: the mC programming language and compiler
I've completed the first stage of the compiler for the grammar in post #4. It's a stripped down version of C with support for:
• global variables only (no arrays)
• subroutines (i.e. functions without arguments)
• if-then-else, while, do-while and for statements
If anyone would like to try it:

2) Install Keil uVision5 (MDK-ARM)
3) Create a new project "Project -> New uVision Project"
4) Select "ARM Cortex M7 -> ARMCM7", press "OK" twice
5) Click "Target 1", right-click "Source Group 1 -> Add Existing Files to Group 'Source Group 1'..."

Select one of the example programs in "moniC_programs.h" or enter your own (see the example below). Then on lines 31 and 32 in "simulator.c" enter the values to be pushed onto the stack. Place a breakpoint at line 89, where you can step through the parser and virtual machine. Place another breakpoint at line 97, where you can pop the returned value off the stack to confirm the program executed correctly.

e.g. how to enter a program to find the zero of x-e^(-0.5x^2) in [a,b] with an error tolerance tol using the bisection method:

1) Type the program in Notepad++

Code:
/*bisection method*/

func()
{ return(x-exp(-0.5*x*x));
}

main()
x=a;
fa=func();
x=b;
fb=func();
m=(a+b)/2;
while(b-m>=tol)
{ x=m;
fm=func();
if(fm*fb<0) a=m;
else
{ b=m;
fb=fm;
}
m=(a+b)/2;
}
write m;
}

2) Convert the program to a single line by highlighting it and going to Edit->Line Operations->Join Lines

3) Copy the joined lines and paste them in the "char testSource[]" array in "moniC_programs.h".

Because the free version of uVision5 is code limited to 32kB, I was only able to include the transcendental functions ln, exp, sin, cos and tan from the C math library (taking the size to 29.5kB). But the actual calculator has plenty of memory so this is not an issue.

Next: monic part 7: a little CAS
 « Next Oldest | Next Newest »

User(s) browsing this thread: 1 Guest(s)