Lab06 - Parsing

Due: Tue, Apr 5, 2022 at 11:59 PM to Github Classroom Assignment

Language Parsing

  1. This lab asks you to expand on the given parser, adding code and data structures to fully implement the scanning and parsing of the language specified in the EBNF
  2. It is still a future topic to interpret the parse tree and do something useful with it. This lab covers only parser output
  3. Parsing refers to the process of identifying acceptable structures as defined by a language. For example, our scanner from Lab05 can accept the following:
     1 + 2
    

    This results in the following sequence of tokens:

     TK_INTLIT("1")
     TK_OP("+")
     TK_INTLIT("2")
     TK_EOT("")
    
  4. We now need to parse these tokens to see if they represent a valid expression. Parsing will:
    • Determine if a sequence of tokens is a valid program
    • Construct a data structure from the tokens that can later be used for interpretation or compilation
  5. The token sequence above can be represented as the following tree:

    parse-tree-1

  6. We call this a parse tree (or technically an abstract syntax tree or AST).
    • The nodes in the tree represent different parts of the structure of a valid program
    • In this case, the program consists of a single expression
    • This expression is an operator with two arguments (or operands). The expression operator is a PLUS (“+”) and the two operands are both intval
    • We can represent this tree structure as a string:
       EXPR OPER2 PLUS
       ..EXPR INTVAL 1
       ..EXPR INTVAL 2
      
  7. Consider the following program:
     (1 + 2) * 3
    

    The scan tokens are:

     TK_LPAREN("(")
     TK_INTLIT("1")
     TK_PLUS("+")
     TK_INTLIT("2")
     TK_RPAREN(")")
     TK_MULT("*")
     TK_INTLIT("3")
     TK_EOT("")
    

    The parse tree for that token sequence is:

    parse-tree-2

    The string representation of that parse tree is:

     EXPR OPER2 MULT
     ..EXPR OPER2 PLUS
     ....EXPR INTVAL 1
     ....EXPR INTVAL 2
     ..EXPR INTVAL 3
    
  8. Notice that in both the tree and the string output there are no parentheses. This is because the tree form implicitly represents grouping. We discard the parens themselves, but not what they represent in terms of program structure. Here is the EBNF for the scanner of our language:
     tokenlist   = (token)*
     token       = integer | symbol
     integer     = digit (digit)*
     symbol      = '+' | '-' | '*' | '/' | '(' | ')'
     digit       = '0' | ... | '9'
     # Ignore
     whitespace  =  (' ' | '\t') (' ' | '\t')*
    
  9. We can also represent the structure of valid programs with the EBNF:
     program     = expression EOT
     expression  = operand (operator operand)*
     operand     = integer | '-' operand | '(' expression ')'
    
  10. We will limit valid programs to expessions. For example, the following are valid expressions:
     1 + 2
     (1 + 2) * 3
     4 * (10 / 5)
     (2 * (3 + (1 + 1)))
     -4 + 3
     -4 + -4
    
  11. Just like with scanning, the EBNF specifies valid programs in terms of tokens
    • An expression can be an operand followed by an operator and another operand
    • An operand can be an integer, or a ‘-‘ then an operand, or another expression within parens

C Implementation

  1. We need a data structure in C that will represent the tree structures given above. Here is the struct we will use:
     enum parse_expr_enum {EX_INTVAL, EX_OPER1, EX_OPER2};
     enum parse_oper_enum {OP_PLUS, OP_MINUS, OP_MULT, OP_DIV};
     struct parse_node_st {    
         enum parse_expr_enum type;
         union {
             struct {
                 int value;
             } intval;
             struct {
                 enum parse_oper_enum oper;
                 struct parse_node_st *expr;
             } oper1;
             struct {
                 enum parse_oper_enum oper;
                 struct parse_node_st *left;
                 struct parse_node_st *right;
             } oper2;
         };
     };
    
  2. We will derive a recursive descent parser from the EBNF for expressions given above to construct a tree of parse_node_st structs. As with scanning, we will write a function for each production in the grammar.
  3. We will have two parse functions:
     struct parse_node_st * parse_operand(struct parse_table_st *pt, 
                                          struct scan_table_st *st);
     struct parse_node_st * parse_expression(struct parse_table_st *pt, 
                                             struct scan_table_st *st);
    

    Given a sequence of tokens in a scan_table_st, we will recursively construct a parse tree by allocating parse_node_st structs.

Requirements

  1. You will implement a parser, called lab06, generated by a Makefile
  2. Your parser will:
    • Take an input expression on the command line after the “-e” option
    • Scan the input expression string into tokens
    • Parse the tokens into a tree of parse_node_st
    • Walk the tree to generate indented output as shown
    • Your parser will to base conversion from TK_BINLIT and TK_HEXLIT into an int for intval operands
  3. autograder test cases will be provided

Example Output

./lab06 -e "1 + 2"
EXPR OPER2 PLUS
..EXPR INTVAL 1
..EXPR INTVAL 2

./lab06 -e "(1 + 2) * 3"
EXPR OPER2 MULT
..EXPR OPER2 PLUS
....EXPR INTVAL 1
....EXPR INTVAL 2
..EXPR INTVAL 3

./lab06 -e "4 * (10 / 5)"
EXPR OPER2 MULT
..EXPR INTVAL 4
..EXPR OPER2 DIV
....EXPR INTVAL 10
....EXPR INTVAL 5

./lab06 -e "(2 * (3 + (1 + 1)))"
EXPR OPER2 MULT
..EXPR INTVAL 2
..EXPR OPER2 PLUS
....EXPR INTVAL 3
....EXPR OPER2 PLUS
......EXPR INTVAL 1
......EXPR INTVAL 1

./lab06 -e "-4 + 3"
EXPR OPER2 PLUS
..EXPR OPER1 MINUS
....EXPR INTVAL 4
..EXPR INTVAL 3

./lab06 -e "-4 + -4"
EXPR OPER2 PLUS
..EXPR OPER1 MINUS
....EXPR INTVAL 4
..EXPR OPER1 MINUS
....EXPR INTVAL 4

./lab06 -e "1 + 0xa"
EXPR OPER2 PLUS
..EXPR INTVAL 1
..EXPR INTVAL 10

./lab06 -e "1 + 0xff"
EXPR OPER2 PLUS
..EXPR INTVAL 1
..EXPR INTVAL 255