If you are struggling with using Rust (e.g. using an editor that doesn’t highlight it, getting weird errors when you try to compile, you’re totally lost, etc.) PLEASE get in touch now. Stop giving yourself a harder time than necessary.

In this project, you’ll write almost all of the Truss parser, continuing from project 1.

You do not have to have completed project 1 successfully to do this project.

Again, I’ve written a bunch of the support code for the parser (AST type definitions, the Parser object itself, helper methods, tests) and you’ll finish it by writing all the interesting parsing methods.

Quick start guide:

  1. You should have received a GitHub invitation link in your email. Click it!
    • Now you should have access to your repo.
  2. git clone it to your computer.
  3. Edit the parser source as detailed below so that cargo test parse succeeds on all 22 tests.
  4. As you work, make use of git commits to keep track of your progress.

    You can also use git push origin main at any time to ensure that there’s a safe copy of your project on the GitHub servers, or to transfer your code between computers (by pushing it from one and pulling it on the other).

  5. When you’re done (or when you’re not done but it’s due):
    1. Update the README.txt to include your name and username and any information you think will be useful to the grader.
    2. Also update this line in Cargo.toml:
      • authors = ["YOUR NAME <YOUR PITT EMAIL>"]
    3. Commit those changes, and…
    4. git push origin main to submit it.
      • the last commit before the late due date is what we will grade.

What you’ve been given

A tour of the source:


Running the tests

Like the lexer project, your goal is to get the parser tests working.

If you just run cargo test, it will run all the tests in the whole compiler, including the lexer tests from project 1!

Instead, you can run a subset of the tests: cargo test parse runs just the parser tests, and you can narrow things further with e.g. cargo test exps for only the expression tests.

Reading the tests and their errors

The tests work similarly to how they did in project 1, but I split them into two categories:

Be sure to actually look at the source of the tests in src/parse/tests.rs. Keep it open while you work, so you can compare the test failures to what the test actually says.

Just like project 1, you can run a single test by naming it on the command line:

$ cargo test compound_exps

It produces this output:

---- parse::tests::compound_exps stdout ----
thread 'parse::tests::compound_exps' panicked at '
on this input:
0:  Id("x")      <-- you gave an error here: BadExpression(Id("x"))
1:  Dot
2:  Id("y")
3:  Plus
4:  Id("z")

but it should have given:
(x.y + z)
', src/parse/tests.rs:970:13
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

It shows the input tokens one per line, with the token index on the left. That’s what 0: Id("x") means: token 0 is an identifier x.

It also shows that your code gave an error at token index 0, a BadExpression(Id("x")) error.

Then it shows what your code should have given:

but it should have given:
(x.y + z)

This is an AST! It’s just being printed out in a human-readable form. If you actually look at the failing test on line 270 of src/parse/tests.rs, you’ll see this:

    // x.y + z
    // if this is failing because your postfix parsing is expecting Eof, well, it
    // shouldn't be expecting Eof. See the instructions.
    test_ok_exp(&[id("x"), Dot, id("y"), Plus, id("z")],
        Exp::new_binary(
            Exp::new_field(
                Exp::new_id("x"),
                Ident::new("y")),
            BinOp::Add,
            Exp::new_id("z")));

The comment above each test shows what a human would type to produce these tokens, and some tests (like this one) have extra comments about mistakes I expect people to make. Then, you can see the actual AST in Ugly Form (all the Exp:: stuff).


Getting Started

Open up these files in your editor, because you’ll probably need to refer to them:

In addition, I really recommend looking at the parsing_lisp and parsing_math examples in the examples repo. Most of what you’re doing is just… that, but more.

The Parser object and the methods I’ve given you

The Parser object works a bit like the Lexer object from project 1, and has these helper methods defined in src/parse/parser.rs:

The rest of Parser’s methods are spread across exp.rs, stmt.rs, and decl.rs.

Getting a feel for the code you’ll write

In src/parse/decl.rs, some methods are filled in (and others are just stubbed out). Have a look at parse_type at the bottom.

    pub(crate) fn parse_type(&mut self) -> ParseResult<Box<AstType>> {
        match self.cur() {
            TokenKind::LParen => {
                self.next();
                self.expect(TokenKind::RParen)?;
                Ok(AstType::new_void())
            }
            // etc...
            _ => Err(ParseError::expected_type(self.cur_loc(), self.cur())),
        }
    }

Your task

You will write the rest of the parsing methods in these three files: exp.rs, stmt.rs, and decl.rs (all under src/parse/).

For reference, it took me about 310 lines of code to do this. If your code is going way past that, maybe you’re making it harder than it needs to be and you should get some help.

There are four kinds of AST nodes: expressions (Exp), statements (Stmt), declarations (Decl), and types (AstType).

AstType is named that and not just Type because in the future semantic analysis, Type will refer to the real types that underlie the program, while AstType is just “whatever the programmer typed in”.

Because declarations contain statements, and statements contain expressions, it makes sense to do expressions first, then do statements, then finish with declarations.


Parsing expressions

I forgot boolean literals in the original Primary expression grammar in the comments in src/parse/ast.rs. I have since fixed it, but if you started the project before I fixed it, refer to the grammar below instead.

Here is the grammar for expressions:

	Exp:       Term (BinOp Term)*
	Term:      UnOp* Primary PostfixOp*
	Primary:   Id | IntLit | StrLit | TrueLit | FalseLit | Ctor | '(' Exp ')'

	BinOp:     '+'|'-'|'*'|'/'|'%'|'<<'|'>>'|'<'|'>'|'<='|'>='|'=='|'!='|'and'|'or'
	UnOp:      '-'|'not'

	PostfixOp: FuncCall | Field
	FuncCall:  '(' (Exp (',' Exp)*)? ')'
	Field:     '.' Id

	Id:        <TokenKind::Id>
	IntLit:    <TokenKind::IntLit>
	StrLit:    <TokenKind::StrLit>
	TrueLit:   'true'
	FalseLit:  'false'
	Ctor:      'new' Id '(' (Exp (',' Exp)*)? ')'

Some notes:

The tests

Here are the relevant tests, in the order I recommend you tackle them. Remember you can use cargo test exps to only run the expression tests, or e.g. cargo test good_primary_exps to run a single test. Run them now and see how they fail.

Writing some code

In exp.rs I’ve given you parse_exp and parse_binops methods that are the same as in parsing_math. So, you kind of get binary operators “for free.” You’re welcome ;)

    fn parse_term(&mut self) -> ParseExpResult {
        match self.cur() {
            // TODO: the UnOp operators are parsed here

            _ => {
                let pri = self.parse_primary()?;
                self.parse_postfix(pri)
            }
        }
    }

    fn parse_primary(&mut self) -> ParseExpResult {
        match self.cur() {
            // TODO: all the Primary expressions

            _ => Err(ParseError::bad_expression(self.cur_loc(), self.cur())),
        }
    }

    fn parse_postfix(&mut self, mut lhs: Box<Exp>) -> ParseExpResult {
        loop {
            match self.cur() {
                // TODO: the Postfix operators are parsed here

                _ => break,
            }
        }

        Ok(lhs)
    }

Implementing parse_term

parse_term will look very much like the one from parsing_math. This is the code from parsing_math:

    Token::Minus => {
        self.next();
        let operand = self.parse_term()?;
        Ok(AstNode::neg(operand))
    }

Your code will look very similar but with different names for things:

You will also have a second case for TokenKind::Not, and the code for that will look almost exactly the same as the TokenKind::Minus case, just using a different UnOp.
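
Here is a rough sketch of what those two cases might look like. The names Exp::new_unary, UnOp::Neg, and UnOp::Not are placeholders — check ast.rs and exp.rs for whatever the unary constructor and the UnOp variants are actually called.

    // Sketch only: the constructor and UnOp variant names are placeholders.
    TokenKind::Minus => {
        self.next();
        let operand = self.parse_term()?;
        Ok(Exp::new_unary(UnOp::Neg, operand))
    }
    TokenKind::Not => {
        self.next();
        let operand = self.parse_term()?;
        Ok(Exp::new_unary(UnOp::Not, operand))
    }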

Implementing parse_primary

The grammar for Primary expressions is this:

Primary:   Id | IntLit | StrLit | TrueLit | FalseLit | Ctor | '(' Exp ')'

Remember the | means OR, so this reads as, “a primary expression is an Id OR an IntLit OR a StrLit OR…”

Let’s start with just implementing identifiers. The grammar says:

Id:        <TokenKind::Id>

This says, “an Id expression consists of a single token, TokenKind::Id.”

In impl Exp there is a constructor, Exp::new_id(). So, the code for that in the match of parse_primary will look like this:

	TokenKind::Id(name) => { self.next(); Ok(Exp::new_id(name)) }

Remember that TokenKind::Id(name) is a pattern which declares a variable named name, and that variable will contain the string that was embedded inside the TokenKind::Id token. (Again, remember when you created this kind of token in project 1?)

Stop. Don’t write any more code. cargo build and fix any errors. Then cargo test exps and see what happens. You should get a different failure on good_primary_exps than you did before, complaining about an IntLit(123) or something now.

Now carry on by writing code to handle these primary expressions: IntLit, StrLit, TrueLit (Exp::new_bool_lit), and FalseLit. They all work pretty similarly, and you should be able to compile and test after implementing each.
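
If it helps, here is the rough shape of a couple of those cases. Exp::new_bool_lit is the real constructor (it’s named above), but Exp::new_int_lit and the exact TokenKind name for the 'true' token are placeholders here — use whatever ast.rs and the lexer actually define.

    // Sketch: new_int_lit and the 'true' token name are placeholders.
    TokenKind::IntLit(val) => { self.next(); Ok(Exp::new_int_lit(val)) }
    TokenKind::True => { self.next(); Ok(Exp::new_bool_lit(true)) }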

Then we get to the more interesting ones: Ctor and '(' Exp ')'.

Parenthesized expressions '(' Exp ')'

parsing_math has code for this, but let’s examine the rule closely. It says:

  1. expect a left parenthesis (so, use self.next() when you see a TokenKind::LParen)
  2. recursively parse an Exp (so, call self.parse_exp()?)
  3. expect a right parenthesis (so, use self.expect(TokenKind::RParen)?)

Look at the code for this in parsing_math. Sure enough, it does exactly what the grammar rule says. So implement this rule in parse_primary too.
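
Here’s roughly what that case looks like in parse_primary, assuming (as in parsing_math) that a parenthesized expression just becomes the inner expression, with no extra AST node:

    // '(' Exp ')'
    TokenKind::LParen => {
        self.next();                      // '(' -- we already know it's there
        let exp = self.parse_exp()?;      // Exp
        self.expect(TokenKind::RParen)?;  // ')' -- gives an error if it's missing
        Ok(exp)
    }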

Constructors Ctor

This kind of expression looks like new Object() or new Point(3, 8). It’s used to construct struct objects similarly to how class objects are constructed in Java.

Exp::new_ctor takes two arguments: an AstType that represents the type that comes right after new, and a vector of Exps that represent the arguments in the parentheses.

Here is the grammar rule:

Ctor:      'new' Id '(' (Exp (',' Exp)*)? ')'

You read grammar rules from left to right, as a sequence of instructions. This one says:

  1. expect a new token.
  2. parse the name of the struct as an identifier (using self.parse_ident()?).
  3. expect a left parenthesis.
  4. parse 0 or more expressions, in a comma-separated list. (this bit is tricky)
  5. expect a right parenthesis.

To help you with steps 3-5, here is a helper method you can add to exp.rs. Study how this method works; you will be using a similar pattern of “an if with a while inside, pushing things into a Vec” in multiple places throughout the project.

// '(' (Exp (',' Exp)*)? ')'
fn parse_arg_list(&mut self) -> ParseResult<Vec<Box<Exp>>> {
    // '('
    self.expect(TokenKind::LParen)?;

    let mut args = Vec::new();

    // this 'if' handles the '?' bit of the grammar rule
    if self.cur() != TokenKind::RParen {
        // Exp
        args.push(self.parse_exp()?);

        // (',' Exp)*
        while self.cur() == TokenKind::Comma {
            self.next();
            args.push(self.parse_exp()?);
        }
    }

    // ')'
    self.expect(TokenKind::RParen)?;
    Ok(args)
}

This simplifies parsing ctors to:

  1. expect a new token.
  2. parse the name of the struct as an identifier.
  3. parse the arguments using parse_arg_list.

There is one last detail in implementing this rule: Exp::new_ctor’s first argument is an AstType, not an Ident. So, when you construct it, you’ll use something like this:

	// where 'name' is the ident you parsed right after the 'new',
	// and 'args' is what was returned by 'parse_arg_list'.
	Ok(Exp::new_ctor(AstType::new_struct(name), args))
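
Putting those pieces together, the Ctor case might look something like this sketch. I’m writing the 'new' keyword token as TokenKind::New here; check the lexer’s TokenKind for its real name.

    // 'new' Id '(' (Exp (',' Exp)*)? ')'
    TokenKind::New => {                     // placeholder name for the 'new' token
        self.next();                        // 'new'
        let name = self.parse_ident()?;     // Id
        let args = self.parse_arg_list()?;  // '(' (Exp (',' Exp)*)? ')'
        Ok(Exp::new_ctor(AstType::new_struct(name), args))
    }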

Implementing parse_postfix

For this, look closely at the way the code in parsing_math works:

    Token::LParen => {
        // .... some stuff goes here ....

        // vvvvvvvv THIS vvvvvvvvv is important
        lhs = AstNode::call(lhs, arg);
    }

Unlike parse_term and parse_primary, you do not just Ok(whatever) at the ends of the cases. Instead, you replace the lhs variable with a new AST node. Your code will do something similar.

These are the two postfix operators you have to parse:

	PostfixOp: FuncCall | Field
	FuncCall:  '(' (Exp (',' Exp)*)? ')'
	Field:     '.' Id

Hey, that FuncCall rule looks exactly the same as what parse_arg_list does, huh? Yay for reusing code instead of copying and pasting it!

FuncCall will use Exp::new_func_call and Field will use Exp::new_field.
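
As a sketch, the two cases inside the loop might look like this. I’m assuming Exp::new_func_call takes the callee first and the argument vector second; double-check its actual signature in ast.rs.

    // FuncCall: '(' (Exp (',' Exp)*)? ')'
    TokenKind::LParen => {
        let args = self.parse_arg_list()?;    // parse_arg_list eats the parens too
        lhs = Exp::new_func_call(lhs, args);  // assuming (callee, args) order
    }

    // Field: '.' Id
    TokenKind::Dot => {
        self.next();                          // '.'
        let name = self.parse_ident()?;       // Id
        lhs = Exp::new_field(lhs, name);
    }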

And with that, you’re done with expressions!


Parsing statements

The grammar for statements is in src/parse/ast.rs starting on line 200.

Here are the relevant tests, in the order I recommend you tackle them:

Statements are still pretty straightforward, but they have some more parts than expressions, so the code can get longer.

Important notes:

Block statements and returns

In Java, if you have a method that returns a value (like int foo()), the compiler will perform some complex control flow analysis to determine if you actually used a return in all possible cases. Control flow analysis is something we’ll talk about much later in the course.

This is a little too heavyweight for what is supposed to be a toy language. Instead, Truss uses a sort of syntactic hack to ensure that every non-void function returns something. Basically, if there is a return statement, it must be the last statement in a block, and this rule applies recursively.

To explain better, here are examples of valid code:

And here are examples of code that would be syntactically valid, but which breaks this rule:

Fortunately, this is actually really easy to implement.

All Stmts have a .returns() method which determines if that statement returns. It’s determined recursively (see src/parse/ast.rs line 278).

Block statements have this grammar: '{' Stmt* '}'. So you’ll have a loop there, parsing statements and pushing them into a vector, right?

All YOU have to do is this: in that loop, if a statement’s .returns() is true, break out of the loop.

It’s made a little tricky because of Rust’s borrowing rules (you have to call stmt.returns() before pushing it, but you have to push it before deciding to break) but that’s all you have to do.
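
Here is a sketch of that loop, under some assumptions: that the brace tokens are called TokenKind::LBrace/TokenKind::RBrace and that the method for a single statement is parse_stmt. Substitute whatever names stmt.rs actually uses.

// '{' Stmt* '}' -- a sketch; the token and method names are placeholders
self.expect(TokenKind::LBrace)?;
let mut stmts = Vec::new();

while self.cur() != TokenKind::RBrace {
    let stmt = self.parse_stmt()?;
    let returns = stmt.returns(); // ask *before* pushing (borrowing rules)
    stmts.push(stmt);             // push unconditionally...
    if returns { break; }         // ...then stop if that statement returns
}

self.expect(TokenKind::RBrace)?; // errors if anything follows a return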


Parsing declarations

Finally we come to declarations, which are the things that appear at the top level of a Truss source code file. This includes functions, structs, and global variables.

The grammar for declarations is in src/parse/ast.rs starting on line 114.

Here are the relevant tests, in the order I recommend you tackle them:

Variable declarations should be super simple for you by now. Implementing them will also make the let_stmts test succeed (if you implemented the parsing for let statements).

There is one caveat: parse_var_decl will return a value that looks like Ok(VarDecl { name, init }), where name and init are local variables within parse_var_decl. (The reasons for doing this instead of returning a Decl are tedious, so just go with it, okay?)


Function decls

This is where things start getting hairy. Function decls have a lot of parts, but don’t let that intimidate you. Just start parsing at the beginning.

The return type: (':' Type)?

In the Truss language, a function’s return type is written after a colon after the arguments:

fn f(): int { return 5; }

A function that returns nothing can be written implicitly (by not putting the return type) or explicitly (by writing a pair of empty parens):

// both of these are void-returning functions
fn a() {}
fn b(): () {}
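
Parsing that optional return type might look like this sketch (I’m calling the colon token TokenKind::Colon; check the lexer for its real name):

// (':' Type)? -- no ':' means the function implicitly returns void
let return_type = if self.cur() == TokenKind::Colon {
    self.next();
    self.parse_type()?
} else {
    AstType::new_void()
};

The explicit fn b(): () form comes out the same way, because parse_type already turns () into the void type.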

After parsing a function’s code, we have a final check to do to ensure that non-void functions return a value. Assuming you have local variables return_type, code, start_loc, and name, you will have to do:

if *return_type != AstType::Void && !code.returns() {
    return Err(ParseError::no_return(start_loc, name.name.clone()));
}

start_loc should be the location of the fn token that started this function. (Remember string literals?)


Struct decls

This is it! The last piece.

Struct decls in Truss look like a weird offspring of Rust structs and Java classes. The grammar is intentionally limited to make them easier to parse. Here are some clarifications as to what the hell the struct decl rules are trying to say.

A struct can have zero or more fields, declared just like in Rust:

// Truss structs
struct S {}
struct T {
    x: int,
    y: int,
}

It’s possible to have a trailing comma after the last field, like in Rust. So the code here won’t be exactly the same shape as, say, function decl arguments. But it will almost be the same.
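
For what it’s worth, here is one way that field loop could go, as a sketch only. The ':' token name and the way a field gets stored are placeholders; use whatever decl.rs actually expects.

// Field* with an optional trailing comma -- a sketch, not the real names
let mut fields = Vec::new();

while matches!(self.cur(), TokenKind::Id(_)) {
    let name = self.parse_ident()?;
    self.expect(TokenKind::Colon)?; // placeholder name for the ':' token
    let ty = self.parse_type()?;
    fields.push((name, ty));        // placeholder for however fields are stored

    // a comma separates fields, and a trailing one after the last is fine
    if self.cur() == TokenKind::Comma {
        self.next();
    } else {
        break;
    }
}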

Truss structs can also inherit from another struct, kind of like class inheritance in Java:

// Truss struct inheritance
struct S {}
struct T: S {} // S is the base... struct.

After any fields come methods:

// A Truss struct with a method
struct Cat {
    age: int,
    fn meow() {
        println("meow!");
    }
}

These are parsed as func decls, but! when you parse them, you will pass the struct’s name as an argument to parse_func_decl:

    // right after the `struct` token...
    let name = self.parse_ident()?;

    // ...later on, when you're in a loop parsing methods...
    let method = self.parse_func_decl(Some(&name))?;

Otherwise, the test at src/parse/tests.rs, line 757 will fail.

Finally, like with parse_func_decl, you will return a value that looks like:


Victory lap

If everything else is working, the full_program test should work too!

If it doesn’t, you are making a mistake in one of the parsing methods you wrote. If you’re stuck, get help understanding the test failure message.


Submission and Grading

Project submissions will only be accepted through GitHub. If you are having trouble using git, please get in touch sooner rather than at 11:58 PM on the due date.

The project is due at midnight on the due date, but there is some wiggle room there. (We can see the time that you pushed to GitHub.)

You can turn it in for late credit until midnight on the day after the due date. You’ll get 10% off your grade.

Grading rubric: