Post Reply 
monic part 6: the mC programming language and compiler
12-12-2021, 06:57 PM
Post: #2
RE: monic part 6: the mC programming language and compiler
Nice work! I really like your project and I'm very much looking forward to learn more about it. Looks like you put a lot of effort into this to cover most of the C language.

I am curious how you plan to handle logical expressions as "short circuit" code or not. I noticed that RETURN, WRITE and a few other constructs allow 'expression' as an argument, but 'factor' for example uses 'additive_expression' instead of 'expression'. To implement RETURN and WRITE this way requires "short circuit logic" to return Boolean values without evaluating unnecessary AND and OR operands. This can be tricky to implement, but not too hard to do using backpatch lists to jump on "true" and "false" conditions to the right VM opcode targets.

I think I spotted a problem with the 'function_declaration' starting with an ID just like 'var_declaration', so there is no way of telling which is which with one token lookahead. Making the '(' and ')' optional in 'function_declaration' makes it even harder to distinguish from 'var_declaration' because you have to look ahead all the way to the first '(' to see if this is a function, not a variable declaration. Perhaps insist that functions should be declared with parenthesis?

Also 'expression' is defined to be either a 'Boolean_expression' or a 'additive_expression' but you have no way of telling which one applies by just looking at the lookahead token.

Why is SWITCH restricted to a 'var' argument and not an `additive_expression'?

I've converted the grammar to Yacc to give it a go with Bison and fixed 'function_declaration' to require parenthesis and MAIN with optional parameters for consistence and just in case if you'd like to pass arguments to the main program. However, this grammar has 47 shift/reduce conflicts which suggests that several (cascading) ambiguities exist:
Code:
%token ARRAY
%token ID
%token NUM
%token MAIN
%token LOCAL
%token RETURN
%token READ
%token WRITE
%token GOTO
%token NOPR
%token SWITCH
%token CASE
%token BREAK
%token DEFAULT
%token FOR
%token IF
%token ELSE
%token WHILE
%token DO
%token NEGATE
%token LE
%token GE
%token EQ
%token NE
%token AND
%token OR
%token NOT
%token INC       // ++
%token DEC       // --
%token TRUE
%token FALSE
%token ENDFILE
%%
program
: code ENDFILE
;
code
: var_declarations function_declarations main_function_declaration
;
var_declarations
: var_declaration var_declarations
|
;
var_declaration
: ARRAY ID dimensions ';'
| '*' ID ';'
| ID ';'
;
dimensions
: '[' NUM ']' dimensions
| '[' NUM ']'
;
function_declarations
: function_declaration function_declarations
|
;
function_declaration
: ID '(' params_opt ')' '{' local_declarations statement_list '}'
;
main_function_declaration
: MAIN '(' params_opt ')' '{' local_declarations statement_list '}'
;
params
: param ',' params
| param
;
params_opt
: params
|
;
param
: ID nums
| '*' ID
;
nums
: '[' NUM ']' nums
|
;
local_declarations
: LOCAL var_declaration local_declarations
|
;
statement_list
: statement statement_list
|
;
statement
: compound_stmt
| if_stmt
| while_stmt
| do_stmt
| for_stmt| switch_stmt
| return_stmt
| read_stmt
| write_stmt
| assignment_stmt
| expression_stmt
| goto_stmt
| label_stmt
| nop_stmt 
;
compound_stmt
: '{' statement_list '}'
;
if_stmt
: IF '(' Boolean_expression ')' statement
| IF '(' Boolean_expression ')' statement ELSE statement
;
while_stmt
: WHILE '(' Boolean_expression ')' statement
;
do_stmt
: DO statement WHILE '(' Boolean_expression ')' ';'
;
for_stmt
: FOR '(' assignment_opt ';' Boolean_expression ';' additive_expression_opt ')' statement 
;
switch_stmt
: SWITCH '(' var ')' '{' cases ';' default_opt '}' 
;
cases
: CASE NUM ':' statement_list BREAK ';' cases
|
;
default_opt
: DEFAULT ':' statement_list
|
;
var
: ID indices
| '*' ID 
;
indices
: '[' additive_expression ']' indices
|
;
return_stmt
: RETURN expression_opt ';' 
;
read_stmt
: READ var ';' 
;
write_stmt
: WRITE expression ';' 
;
assignment_stmt
: assignment ';'
;
expression_stmt
: expression_opt ';'
;
goto_stmt
: GOTO ID ';'
;
label_stmt
: ID ':' statement
;
nop_stmt
: NOPR ';'
;
assignment
: var '=' expression 
;
assignment_opt
: assignment
|
;
expression
: Boolean_expression
| additive_expression
;
expression_opt
: expression
|
;
Boolean_expression
: logical_expression boolop Boolean_expression
| logical_expression
;
logical_expression
: additive_expression relop additive_expression
| var
| TRUE
| FALSE 
;
additive_expression
: term addop additive_expression
| term
;
additive_expression_opt
: additive_expression
|
;
term
: neg_opt factor mulop term
| neg_opt factor
;
neg_opt
: NEGATE
|
;
factor
: '(' additive_expression ')'
| NUM
| var
| ID '(' args_opt ')'
| var INC
| var DEC
| '&' ID indices
;
args
: expression ',' args
| expression
;
args_opt
: args
|
;
boolop
: AND
| OR
| NOT
;
relop
: LE
| '<'
| '>'
| GE
| EQ
| NE
;
addop
: '+'
| '-'
;
mulop
: '*'
| '/'
;
%%
Run bison -v grammar.y to produce a grammar.output report.

I didn't have time to verify the conflicts to see what constructs might actually become problematic to implement as a parser based on these two conceptually identical grammar versions, but the grammar is not LR(1) and LALR. Note that there should only be one shift/reduce conflict for the IF-ELSE ambiguity that always arrises in C. I haven't checked the grammar with ANTLR, which may reveal more useful information about potential grammar ambiguity issues to implement a recursive-descent parser, since ANTLR is a LL(1) generator. ANTLR also accepts grammars beyond LL(1).

Hope this is helpful. Looks like a fun and useful project to work on.

- Rob

"I count on old friends" -- HP 71B,Prime|Ti VOY200,Nspire CXII CAS|Casio fx-CG50...|Sharp PC-G850,E500,2500,1500,14xx,13xx,12xx...
Visit this user's website Find all posts by this user
Quote this message in a reply
Post Reply 


Messages In This Thread
RE: monic part 6: the mC programming language and compiler - robve - 12-12-2021 06:57 PM



User(s) browsing this thread: 1 Guest(s)