Lab06 - Parsing
Due:
Tue, Apr 5, 2022 at 11:59 PM
to Github Classroom Assignment
Language Parsing
- This lab asks you to expand on the given parser, adding code and data structures to fully implement the scanning and parsing of the language specified in the EBNF
- It is still a future topic to interpret the parse tree and do something useful with it. This lab covers only parser output
- Parsing refers to the process of identifying acceptable structures as defined by a language. For example, our scanner from Lab05 can accept the following:
1 + 2
This results in the following sequence of tokens:
TK_INTLIT("1") TK_OP("+") TK_INTLIT("2") TK_EOT("")
- We now need to parse these tokens to see if they represent a valid expression. Parsing will:
- Determine if a sequence of tokens is a valid program
- Construct a data structure from the tokens that can later be used for interpretation or compilation
-
The token sequence above can be represented as the following tree:
- We call this a parse tree (or technically an abstract syntax tree or AST).
- The nodes in the tree represent different parts of the structure of a valid program
- In this case, the program consists of a single expression
- This expression is an operator with two arguments (or operands). The expression operator is a PLUS (“+”) and the two operands are both
intval
- We can represent this tree structure as a string:
EXPR OPER2 PLUS ..EXPR INTVAL 1 ..EXPR INTVAL 2
- Consider the following program:
(1 + 2) * 3
The scan tokens are:
TK_LPAREN("(") TK_INTLIT("1") TK_PLUS("+") TK_INTLIT("2") TK_RPAREN(")") TK_MULT("*") TK_INTLIT("3") TK_EOT("")
The parse tree for that token sequence is:
The string representation of that parse tree is:
EXPR OPER2 MULT ..EXPR OPER2 PLUS ....EXPR INTVAL 1 ....EXPR INTVAL 2 ..EXPR INTVAL 3
- Notice that in both the tree and the string output there are no parentheses. This is because the tree form implicitly represents grouping. We discard the parens themselves, but not what they represent in terms of program structure.
Here is the EBNF for the scanner of our language:
tokenlist = (token)* token = integer | symbol integer = digit (digit)* symbol = '+' | '-' | '*' | '/' | '(' | ')' digit = '0' | ... | '9' # Ignore whitespace = (' ' | '\t') (' ' | '\t')*
- We can also represent the structure of valid programs with the EBNF:
program = expression EOT expression = operand (operator operand)* operand = integer | '-' operand | '(' expression ')'
- We will limit valid programs to expessions. For example, the following are valid expressions:
1 + 2 (1 + 2) * 3 4 * (10 / 5) (2 * (3 + (1 + 1))) -4 + 3 -4 + -4
- Just like with scanning, the EBNF specifies valid programs in terms of tokens
- An expression can be an operand followed by an operator and another operand
- An operand can be an integer, or a ‘-‘ then an operand, or another expression within parens
C Implementation
- We need a data structure in C that will represent the tree structures given above. Here is the
struct
we will use:enum parse_expr_enum {EX_INTVAL, EX_OPER1, EX_OPER2}; enum parse_oper_enum {OP_PLUS, OP_MINUS, OP_MULT, OP_DIV}; struct parse_node_st { enum parse_expr_enum type; union { struct { int value; } intval; struct { enum parse_oper_enum oper; struct parse_node_st *expr; } oper1; struct { enum parse_oper_enum oper; struct parse_node_st *left; struct parse_node_st *right; } oper2; }; };
- We will derive a recursive descent parser from the EBNF for expressions given above to construct a tree of
parse_node_st
structs. As with scanning, we will write a function for each production in the grammar. - We will have two parse functions:
struct parse_node_st * parse_operand(struct parse_table_st *pt, struct scan_table_st *st); struct parse_node_st * parse_expression(struct parse_table_st *pt, struct scan_table_st *st);
Given a sequence of tokens in a
scan_table_st
, we will recursively construct a parse tree by allocatingparse_node_st
structs.
Requirements
- You will implement a parser, called
lab06
, generated by aMakefile
- Your parser will:
- Take an input expression on the command line after the “-e” option
- Scan the input expression string into tokens
- Parse the tokens into a tree of
parse_node_st
- Walk the tree to generate indented output as shown
- Your parser will to base conversion from
TK_BINLIT
andTK_HEXLIT
into anint
forintval
operands
autograder
test cases will be provided
Example Output
./lab06 -e "1 + 2"
EXPR OPER2 PLUS
..EXPR INTVAL 1
..EXPR INTVAL 2
./lab06 -e "(1 + 2) * 3"
EXPR OPER2 MULT
..EXPR OPER2 PLUS
....EXPR INTVAL 1
....EXPR INTVAL 2
..EXPR INTVAL 3
./lab06 -e "4 * (10 / 5)"
EXPR OPER2 MULT
..EXPR INTVAL 4
..EXPR OPER2 DIV
....EXPR INTVAL 10
....EXPR INTVAL 5
./lab06 -e "(2 * (3 + (1 + 1)))"
EXPR OPER2 MULT
..EXPR INTVAL 2
..EXPR OPER2 PLUS
....EXPR INTVAL 3
....EXPR OPER2 PLUS
......EXPR INTVAL 1
......EXPR INTVAL 1
./lab06 -e "-4 + 3"
EXPR OPER2 PLUS
..EXPR OPER1 MINUS
....EXPR INTVAL 4
..EXPR INTVAL 3
./lab06 -e "-4 + -4"
EXPR OPER2 PLUS
..EXPR OPER1 MINUS
....EXPR INTVAL 4
..EXPR OPER1 MINUS
....EXPR INTVAL 4
./lab06 -e "1 + 0xa"
EXPR OPER2 PLUS
..EXPR INTVAL 1
..EXPR INTVAL 10
./lab06 -e "1 + 0xff"
EXPR OPER2 PLUS
..EXPR INTVAL 1
..EXPR INTVAL 255