How to implement a parser using perplex/lemon.
See the templates directory for boilerplate versions of the sources described.

Parser Sources
---------------
Here's an overview of the source files that will implement the parser, and where they come from:

// program you write to drive the parsing
main.c  // includes main.h, calls scanner and parser
main.h  // includes parser.h, scanner.h, and parser prototypes

// generated by lemon from a parser.lemon input that you write
// $ lemon -q parser.lemon
parser.h  // token definitions
parser.c  // parser implementation

// generated by perplex and re2c from a scanner.perplex that you write
// $ perplex -t /path/to/perplex_template.c -i scanner.h -o scanner.re
// $ re2c -o scanner.c scanner.re
scanner.h // scanner prototypes
scanner.c // scanner implementation, includes scanner.h

Stub the Files You'll Need
---------------------------
main.c          // main for program to read input
main.h          // header for main
scanner.perplex // input to perplex
parser.lemon    // input to lemon

Note the dependencies between main, the scanner, and the parser:

* The parser and scanner need access to the token definitions from the
  lemon-generated parser.h.
* The parser and main need access to the parser function prototypes and
  app_data_t definition.
* The scanner and main need access to the perplex function prototypes and
  the perplex_t and token_t definitions.

The easiest way to satisfy these dependencies is to put the token_t and
app_data_t definitions, the parser function prototypes, and the headers
generated by perplex and lemon in one header (main.h), and have main.c,
scanner.perplex, and parser.lemon all include that header.

Define app_data_t
------------------
Filling out the app_data_t makes it clear what data you are trying to extract,
and helps guide the design of the parser.

Usually, your app_data_t struct also contains the token_t struct, which gives
the scanner and parser access to the same information. The lemon parser
function takes the token_t and app_data_t as separate arguments, but the
perplex scanner routine takes only a perplex_t scanner as an argument. To give
the scanner the app data, set the scanner's "extra" member via the setter
routine:

perplexSetExtra(scanner, (void *)appData);

The data is then available inside the rules of the perplex input file as
(void *)yyextra. To simplify access to the data, define the PERPLEX_ON_ENTER
macro as follows before including the perplex-generated header in your driver
program:

#define PERPLEX_ON_ENTER app_data_t *appData = (app_data_t *)yyextra;

Then you can just reference "appData" inside the rules of the perplex input
file, just like you can in the lemon input file.
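
The pattern described above (a typed struct handed over as an opaque "extra"
pointer, then cast back on entry) can be sketched in isolation. This is only
an illustration: mock_scanner_t, mockSetExtra, MOCK_ON_ENTER, and mockRule are
hypothetical stand-ins for perplex_t, perplexSetExtra, PERPLEX_ON_ENTER, and a
scanner rule body, and the struct layouts are assumptions whose member names
(token_data, n, idx) simply mirror the examples in this document.

```c
#include <stddef.h>

/* Hypothetical token payload; 'n' and 'idx' mirror the members used in
 * the examples in this document. */
typedef struct {
    int n;
    int idx;
} token_t;

/* Hypothetical application data. It embeds the token so the scanner and
 * the parser see the same information. */
typedef struct {
    token_t token_data;
} app_data_t;

/* Stand-in for perplex_t (NOT the real definition). It only models the
 * opaque "extra" pointer. */
typedef struct {
    void *extra;
} mock_scanner_t;

/* Stand-in for perplexSetExtra(). */
static void mockSetExtra(mock_scanner_t *scanner, void *extra)
{
    scanner->extra = extra;
}

/* Same shape as the PERPLEX_ON_ENTER definition above: recover the typed
 * pointer from the untyped one on entry. */
#define MOCK_ON_ENTER app_data_t *appData = (app_data_t *)yyextra;

/* Stand-in for the body of a scanner rule that stores a value. */
static void mockRule(mock_scanner_t *scanner, int value)
{
    void *yyextra = scanner->extra;
    MOCK_ON_ENTER
    appData->token_data.n = value;
}
```

After mockSetExtra(&scanner, &data), a call such as mockRule(&scanner, 42)
writes 42 into data.token_data.n, which is the same effect a real rule gets by
referencing appData.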

Write the Lemon Input
----------------------
A typical input file can be seen as a list of statements, which may be
delimited by newlines or other termination sequences. The number of
statements usually isn't fixed, so we start with this definition, which says
"an input file is zero or more statements":

    start_symbol ::= statement_list.

    statement_list ::= /* empty */.
    statement_list ::= statement_list statement.

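Define the statement types you expect to see, specifying the statement
delimiter you plan on using, and fill out the grammar from there. Note that
Lemon distinguishes terminals from non-terminals by case: symbols that begin
with an uppercase letter are terminals (tokens), and symbols that begin with a
lowercase letter are non-terminals.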

    statement ::= statement_a EOL.
    statement ::= statement_b EOL.

    statement_a ::= ASTART index.

    ...

Once you've outlined the rules, add actions which assign the values of
non-terminals to the in-memory representation. The following example
assumes that the token type (set with %token_type) has a member 'n' of type
int, and that appData is declared with %extra_argument:

    %type index {int}

    statement_a ::= ASTART index(IDX). {
	appData->token_data.idx = IDX;
    }

    index(IDX) ::= NONNEGATIVE_INT(N). {
	IDX = N.n;
    }

Write the Perplex Input
------------------------
The main job of the perplex input is to define rules that capture any
relevant token data and return the tokens that appear in the lemon grammar
rules. Following the above example, we need rules like the ones below. The
TOKEN_ prefix on the returned names assumes the lemon input sets
"%token_prefix TOKEN_":

    "a" {
	return TOKEN_ASTART;
    }

    [0-9]|[1-9][0-9]+ {
	sscanf(yytext, "%d", &appData->token_data.n);
	return TOKEN_NONNEGATIVE_INT;
    }

    [\n]+ {
	return TOKEN_EOL;
    }
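
To show how the pieces fit together in main.c, here is a self-contained sketch
of the driver loop. It does not use the generated code: mockLex is a
hand-written stand-in for the perplex-generated scanner routine (it mirrors
the three rules above), and mockParse is a stand-in for the lemon-generated
Parse() that only recognizes "ASTART index EOL" statements. The token codes,
the 'statements' counter, and the mock's end-of-input and error return values
are assumptions made for the example. What carries over to a real driver is
the shape of the loop: feed each token to the parser, then signal end of input
with token 0, as lemon expects.

```c
#include <stddef.h>

/* Hypothetical token codes; a real build takes them from parser.h. */
enum { TOKEN_EOI = 0, TOKEN_ASTART, TOKEN_NONNEGATIVE_INT, TOKEN_EOL };

typedef struct { int n; int idx; } token_t;
typedef struct {
    token_t token_data;
    int statements;              /* hypothetical: statements accepted */
} app_data_t;

typedef struct { const char *cursor; app_data_t *extra; } mock_scanner_t;
typedef struct { int state; int pending; int error; } mock_parser_t;

/* Stand-in scanner: returns the next token code, 0 at end of input,
 * -1 on an unrecognized character. */
static int mockLex(mock_scanner_t *s)
{
    app_data_t *appData = s->extra;
    char c = *s->cursor;

    if (c == '\0') return TOKEN_EOI;
    if (c == 'a') { s->cursor++; return TOKEN_ASTART; }
    if (c >= '0' && c <= '9') {
        int value = 0;
        /* [0-9]|[1-9][0-9]+ : a leading 0 is a one-digit token. */
        do {
            value = value * 10 + (*s->cursor - '0');
            s->cursor++;
        } while (c != '0' && *s->cursor >= '0' && *s->cursor <= '9');
        appData->token_data.n = value;
        return TOKEN_NONNEGATIVE_INT;
    }
    if (c == '\n') {
        while (*s->cursor == '\n') s->cursor++;   /* [\n]+ */
        return TOKEN_EOL;
    }
    return -1;
}

/* Stand-in parser: accepts zero or more "ASTART index EOL" statements. */
static void mockParse(mock_parser_t *p, int tokenID, token_t tokenData,
                      app_data_t *appData)
{
    if (p->error) return;
    switch (p->state) {
    case 0:                      /* expecting ASTART or end of input */
        if (tokenID == TOKEN_ASTART) p->state = 1;
        else if (tokenID != TOKEN_EOI) p->error = 1;
        break;
    case 1:                      /* expecting the index */
        if (tokenID == TOKEN_NONNEGATIVE_INT) {
            p->pending = tokenData.n;
            p->state = 2;
        } else p->error = 1;
        break;
    default:                     /* expecting EOL to finish the statement */
        if (tokenID == TOKEN_EOL) {
            appData->token_data.idx = p->pending;
            appData->statements++;
            p->state = 0;
        } else p->error = 1;
        break;
    }
}

/* Driver loop: returns 0 on success, -1 on a scan or parse error. */
static int parseString(const char *input, app_data_t *appData)
{
    mock_scanner_t scanner = { input, appData };
    mock_parser_t parser = { 0, 0, 0 };
    int tokenID;

    while ((tokenID = mockLex(&scanner)) > 0) {
        mockParse(&parser, tokenID, appData->token_data, appData);
    }
    if (tokenID < 0) return -1;
    /* Signal end of input with token 0. */
    mockParse(&parser, TOKEN_EOI, appData->token_data, appData);
    return parser.error ? -1 : 0;
}
```

In a real driver the two stand-ins would be replaced by the routines from
scanner.c and parser.c. The value the perplex scanner returns at end of input
is not covered here, so check the perplex template before relying on the
loop condition used in this sketch.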
