Parsing / Recursive Descent Parser

A parser is a program that takes a stream of lexical tokens and transforms them into another data structure, usually in the form of a parse tree that satisfies the language's grammar rules.

What’s a Recursive Descent Parser?

A Recursive Descent Parser is a top-down parser in which every non-terminal in the BNF grammar is a subroutine. The parser works by recursively calling these subroutines to build the parsed output. It's not the only algorithm for implementing a parser, but it's one of the simplest, and it is very easy to understand and implement.

For example, let's say we have a grammar to parse a money amount in USD, GBP, or EUR. The money amount must be written in the form of <currency_symbol><amount>, like $100:

money           = currency_symbol amount ;
currency_symbol = '$' | '£' | '€' ;
amount          = INTEGER ;

The grammar has three non-terminals: money, currency_symbol, and amount. When implementing it, we also need three parsing methods: parse_money(), parse_currency_symbol(), and parse_amount(). The parsing methods call one another just like the rules are connected in the grammar:

type ParseResult<T> = Result<T, ParseError>;

impl<'a> Parser<'a> {
    // amount = INTEGER ;
    fn parse_amount(&mut self) -> ParseResult<i32> {
        ...
    }

    // currency_symbol = '$' | '£' | '€' ;
    fn parse_currency_symbol(&mut self) -> ParseResult<Currency> {
        ...
    }

    // money = currency_symbol amount ;
    fn parse_money(&mut self) -> ParseResult<MoneyNode> {
        let currency = self.parse_currency_symbol()?;
        let amount = self.parse_amount()?;
        Ok(MoneyNode { currency, amount })
    }
}

Implementing a Money Parser

Let's dig deeper into the above example. We will focus on the parser, so let's assume that we already have a lexer that converts an input string like "$100" into a list of tokens.

Data Structures

In this program, we have two types of tokens: the CurrencySymbol token and the Number token:

#[derive(Debug, PartialEq, Clone, Copy)]
enum TokenType {
    CurrencySymbol,
    Number
}
 
#[derive(Debug)]
struct Token<'a> {
    token_type: TokenType,
    // The slice of the input text that this token covers.
    content: &'a str
}

impl<'a> Token<'a> {
    pub fn new(token_type: TokenType, content: &'a str) -> Self {
        Self {
            token_type,
            content
        }
    }
}

The input for our parser is a Vec<Token> that looks like this:

let tokens = vec![
    Token::new(TokenType::CurrencySymbol, "£"),
    Token::new(TokenType::Number, "128")
];
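
The lexer itself is out of scope for this post, but to make the token list less abstract, here is a minimal sketch of one that could produce it. The tokenize function is hypothetical and not part of the original code. It assumes that whitespace is skipped and that every other non-digit character is meant as a currency symbol. The token types are repeated so the snippet compiles on its own:

```rust
#[derive(Debug, PartialEq, Clone, Copy)]
enum TokenType {
    CurrencySymbol,
    Number,
}

#[derive(Debug)]
struct Token<'a> {
    token_type: TokenType,
    content: &'a str,
}

// Hypothetical helper: splits the input into currency-symbol and number tokens.
// Each token borrows its text from the input, so no strings are copied.
fn tokenize(input: &str) -> Vec<Token<'_>> {
    let mut tokens = Vec::new();
    let mut chars = input.char_indices().peekable();
    while let Some((start, c)) = chars.next() {
        if c.is_whitespace() {
            continue;
        }
        let mut end = start + c.len_utf8();
        if c.is_ascii_digit() {
            // Extend the slice to cover the whole run of digits.
            while let Some(&(i, d)) = chars.peek() {
                if !d.is_ascii_digit() {
                    break;
                }
                end = i + d.len_utf8();
                chars.next();
            }
            tokens.push(Token { token_type: TokenType::Number, content: &input[start..end] });
        } else {
            // Assumption: any other character is treated as a currency symbol.
            tokens.push(Token { token_type: TokenType::CurrencySymbol, content: &input[start..end] });
        }
    }
    tokens
}

fn main() {
    for token in tokenize("£128") {
        println!("{:?} {:?}", token.token_type, token.content);
    }
}
```

Byte offsets from char_indices() are used for slicing because symbols like £ and € take more than one byte in UTF-8.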

The output of our parser is a data structure called MoneyNode:

#[derive(Debug, PartialEq)]
enum Currency {
    USD,
    GBP,
    EUR
}

#[derive(Debug, PartialEq)]
struct MoneyNode {
    currency: Currency,
    amount: i32
}

Error Handling

For this grammar, there are two types of errors that could happen during parsing:

  1. Unexpected token error: when the parser finds a token that is out of place.
  2. Invalid amount error: when the parser cannot parse the amount as a number.

We will create a custom error type and a Result type to handle these two errors:

#[derive(Debug, PartialEq)]
enum ParseError {
    UnexpectedToken(TokenType, TokenType),
    InvalidAmount
}
 
impl std::fmt::Display for ParseError {
    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
        match self {
            Self::UnexpectedToken(expected, found) =>
                write!(f, "Unexpected Token: Expected {:?}. Found {:?}.", expected, found),
            Self::InvalidAmount =>
                write!(f, "Invalid Amount!"),
        }
    }
}

type ParseResult<T> = Result<T, ParseError>;
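
To see how these messages read in practice, here is a small sketch. It repeats the error definitions so it compiles on its own, and the main function exists only for demonstration. Implementing Display is what makes both `{}` formatting and to_string() available on the error:

```rust
use std::fmt;

#[derive(Debug, PartialEq, Clone, Copy)]
enum TokenType {
    CurrencySymbol,
    Number,
}

#[derive(Debug, PartialEq)]
enum ParseError {
    // First field: the expected token type. Second field: the one found.
    UnexpectedToken(TokenType, TokenType),
    InvalidAmount,
}

impl fmt::Display for ParseError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Self::UnexpectedToken(expected, found) =>
                write!(f, "Unexpected Token: Expected {:?}. Found {:?}.", expected, found),
            Self::InvalidAmount => write!(f, "Invalid Amount!"),
        }
    }
}

fn main() {
    let err = ParseError::UnexpectedToken(TokenType::CurrencySymbol, TokenType::Number);
    // Prints: Unexpected Token: Expected CurrencySymbol. Found Number.
    println!("{}", err);
    // Prints: Invalid Amount!
    println!("{}", ParseError::InvalidAmount);
}
```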

The Parser and some utility methods

Now, let's create the parser. It takes a list of input tokens and uses a pos variable to keep track of the current token.

struct Parser<'a> {
    tokens: Vec<Token<'a>>,
    pos: usize
}

We will implement some utility methods to control the input token stream:

  1. is_eof(): to check whether we're at the end of the token stream
  2. peek(): to get the current token
  3. is_match(): to check whether the current token matches the expected type
  4. advance(): to consume the current token and move on to the next one

These methods are not unique to a recursive descent parser, but they're very helpful, as they keep the actual parsing code clean:

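impl<'a> Parser<'a> {
    pub fn new(tokens: Vec<Token<'a>>) -> Self {
        Self {
            tokens,
            pos: 0
        }
    }

    // True once every token has been consumed.
    fn is_eof(&self) -> bool {
        self.pos >= self.tokens.len()
    }

    // Returns the current token without consuming it.
    // Panics if called at the end of the token stream.
    fn peek(&self) -> &Token<'a> {
        &self.tokens[self.pos]
    }

    // Checks the type of the current token; false at the end of the stream.
    fn is_match(&self, token_type: TokenType) -> bool {
        !self.is_eof() && self.peek().token_type == token_type
    }

    // Consumes the current token.
    fn advance(&mut self) {
        self.pos += 1;
    }
}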
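
As a rough illustration of how these helpers cooperate, the sketch below walks a two-token stream from start to end. The types are repeated so the snippet compiles on its own, and the loop in main is only a demonstration, not part of the parser:

```rust
#[derive(Debug, PartialEq, Clone, Copy)]
enum TokenType {
    CurrencySymbol,
    Number,
}

#[derive(Debug)]
struct Token<'a> {
    token_type: TokenType,
    content: &'a str,
}

struct Parser<'a> {
    tokens: Vec<Token<'a>>,
    pos: usize,
}

impl<'a> Parser<'a> {
    fn new(tokens: Vec<Token<'a>>) -> Self {
        Self { tokens, pos: 0 }
    }

    fn is_eof(&self) -> bool {
        self.pos >= self.tokens.len()
    }

    fn peek(&self) -> &Token<'a> {
        &self.tokens[self.pos]
    }

    fn is_match(&self, token_type: TokenType) -> bool {
        !self.is_eof() && self.peek().token_type == token_type
    }

    fn advance(&mut self) {
        self.pos += 1;
    }
}

fn main() {
    let mut parser = Parser::new(vec![
        Token { token_type: TokenType::CurrencySymbol, content: "$" },
        Token { token_type: TokenType::Number, content: "100" },
    ]);

    // Walk the stream one token at a time until it is exhausted.
    while !parser.is_eof() {
        let is_number = parser.is_match(TokenType::Number);
        let token = parser.peek();
        println!("pos {}: {:?} (number: {})", parser.pos, token.content, is_number);
        parser.advance();
    }

    // Past the end, is_match() short-circuits on is_eof() and never calls peek().
    assert!(!parser.is_match(TokenType::Number));
}
```

Note that is_match() is safe to call at the end of the stream, while peek() on its own is not, since it indexes into the token list directly.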

The Recursive Descent Parser

Now, the main part of the parser that we have been waiting for. First, let's write a parser for the amount rule of the grammar:

// amount = INTEGER ;
fn parse_amount(&mut self) -> ParseResult<i32> {
    let token = self.peek();
    if self.is_match(TokenType::Number) {
        // Convert the token text to an i32, mapping any failure to InvalidAmount.
        let result = token.content.parse::<i32>()
                          .map_err(|_| ParseError::InvalidAmount);
        self.advance();
        return result;
    }
    Err(ParseError::UnexpectedToken(
            TokenType::Number,
            token.token_type
        )
    )
}

We simply check whether the current token is a Number token and, if so, parse the content of this token into an i32 number. In this method, we can see the usage of both the InvalidAmount and the UnexpectedToken errors.
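
The conversion step can be looked at in isolation. In the sketch below, to_amount is a hypothetical helper (not part of the parser) that shows how the standard library's parse::<i32>() result is mapped onto the parser's own error type; the ParseError enum is cut down to the one variant needed here:

```rust
#[derive(Debug, PartialEq)]
enum ParseError {
    InvalidAmount,
}

// Hypothetical helper isolating the conversion used inside parse_amount().
fn to_amount(content: &str) -> Result<i32, ParseError> {
    // str::parse::<i32>() yields Result<i32, std::num::ParseIntError>;
    // map_err swaps the standard error for the parser's own error type.
    content.parse::<i32>().map_err(|_| ParseError::InvalidAmount)
}

fn main() {
    println!("{:?}", to_amount("128"));   // Ok(128)
    println!("{:?}", to_amount("3rr0r")); // Err(InvalidAmount)
}
```

A number that does not fit in an i32 is also rejected by parse::<i32>(), so it ends up as InvalidAmount as well.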

Next, we will write the parser for the currency_symbol rule:

// currency_symbol = '$' | '£' | '€' ;
fn parse_currency_symbol(&mut self) -> ParseResult<Currency> {
    let token = self.peek();
    if self.is_match(TokenType::CurrencySymbol) {
        // Any symbol other than "$" or "£" is treated as EUR.
        let currency_symbol = match token.content {
            "$" => Currency::USD,
            "£" => Currency::GBP,
            _ => Currency::EUR
        };
        self.advance();
        return Ok(currency_symbol);
    }
    Err(ParseError::UnexpectedToken(
            TokenType::CurrencySymbol,
            token.token_type
        )
    )
}

Now that we have the parsers for both the currency_symbol and amount rules, the final step is to write the parser for the money rule. It is implemented the same way the money rule is written: we call the currency_symbol parser, then the amount parser.

Neither of the above parsers returns an error value for valid input, so their return values can be combined to form the output MoneyNode object:

// money = currency_symbol amount ;
fn parse_money(&mut self) -> ParseResult<MoneyNode> {
    let currency = self.parse_currency_symbol()?;
    let amount = self.parse_amount()?;
    Ok(MoneyNode {
        currency,
        amount
    })
}
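
The ? operator is what keeps parse_money() this short: on an Err it returns early from the function, and on an Ok it unwraps the value. The sketch below shows roughly what that expands to. It uses simplified stand-ins that work on plain strings rather than tokens; the symbol, number, money_short, and money_explicit names are all hypothetical and exist only for this illustration:

```rust
// Stand-in for parse_currency_symbol(): accepts one of the three symbols.
fn symbol(input: &str) -> Result<char, String> {
    input
        .chars()
        .next()
        .filter(|c| "$£€".contains(*c))
        .ok_or_else(|| "expected currency symbol".to_string())
}

// Stand-in for parse_amount(): the rest of the input must be an i32.
fn number(input: &str) -> Result<i32, String> {
    input.parse::<i32>().map_err(|_| "invalid amount".to_string())
}

// With `?`: each failure returns early from the function.
fn money_short(input: &str) -> Result<(char, i32), String> {
    let currency = symbol(input)?;
    let amount = number(&input[currency.len_utf8()..])?;
    Ok((currency, amount))
}

// Roughly what the `?` version expands to (the real expansion also
// converts the error type through From, which is a no-op here).
fn money_explicit(input: &str) -> Result<(char, i32), String> {
    let currency = match symbol(input) {
        Ok(c) => c,
        Err(e) => return Err(e),
    };
    let amount = match number(&input[currency.len_utf8()..]) {
        Ok(n) => n,
        Err(e) => return Err(e),
    };
    Ok((currency, amount))
}

fn main() {
    println!("{:?}", money_short("$100"));
    println!("{:?}", money_explicit("$3rr0r"));
}
```

Because the first failing step returns immediately, the second step never runs on bad input, which is why the order of the calls mirrors the order of the symbols in the grammar rule.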

And that's it! We have finished our parser!

Test the parser

Now, let's write some tests to see whether the parser works. First, for the happy path, we pass a valid token list and expect to get a valid output:

#[test]
fn test_parse_usd() {
    let tokens = vec![
        Token::new(TokenType::CurrencySymbol, "$"),
        Token::new(TokenType::Number, "512")
    ];
    let mut parser = Parser::new(tokens);
    assert_eq!(parser.parse_money(), Ok(MoneyNode {
        currency: Currency::USD,
        amount: 512
    }))
}

Similarly, if the input currency is EUR instead of USD, the parser must return the correct value:

#[test]
fn test_parse_eur() {
    let tokens = vec![
        Token::new(TokenType::CurrencySymbol, "€"),
        Token::new(TokenType::Number, "9372")
    ];
    let mut parser = Parser::new(tokens);
    assert_eq!(parser.parse_money(), Ok(MoneyNode {
        currency: Currency::EUR,
        amount: 9372
    }))
}

Don't forget the unhappy paths: the parser must return an Err value if any of the parsing steps fails:

#[test]
fn test_parse_unexpected_token() {
    let tokens = vec![
        Token::new(TokenType::Number, "512"),
        Token::new(TokenType::CurrencySymbol, "$"),
    ];
    let mut parser = Parser::new(tokens);
    assert_eq!(parser.parse_money(), Err(
        ParseError::UnexpectedToken(
            TokenType::CurrencySymbol,
            TokenType::Number
        )
    ))
}
 
#[test]
fn test_parse_invalid_amount() {
    let tokens = vec![
        Token::new(TokenType::CurrencySymbol, "$"),
        Token::new(TokenType::Number, "3rr0r"),
    ];
    let mut parser = Parser::new(tokens);
    assert_eq!(parser.parse_money(), Err(ParseError::InvalidAmount))
}

Run the tests with the cargo test command, and you should see that all tests pass:

running 5 tests
test test_parse_eur ... ok
test test_parse_gbp ... ok
test test_parse_invalid_amount ... ok
test test_parse_usd ... ok
test test_parse_unexpected_token ... ok

test result: ok. 5 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s

You can see the full source code of the parser in this gist.

In this article, we learned what a Recursive Descent Parser is and implemented a parser for each grammar rule, with each one serving as a building block for the others. In the next article, we will look at a more complex parser for solving arithmetic expressions, which will give us a better look at the recursive nature of the Recursive Descent Parsing technique.
