Chapter 7. Info Files

Table of Contents

7.1. States
7.2. Interpreting conflicts

Happy info files, generated using the -i flag, are your most important tool for debugging errors in your grammar. Although they can be quite verbose, the general concept behind them is quite simple.

An info file contains the following information:

  1. A summary of all shift/reduce and reduce/reduce conflicts in the grammar.

  2. Under section Grammar, a summary of all the rules in the grammar. These rules correspond directly to your input file, absent the actual Haskell code that is to be run for each rules. A rule is written in the form <non-terminal> -> <id> ...

  3. Under section Terminals, a summary of all the terminal tokens you may run against, as well as a the Haskell pattern which matches against them. This corresponds directly to the contents of your %token directive (Section 6.3.2, “Tokens”).

  4. Under section Non-terminals, a summary of which rules apply to which productions. This is generally redundant with the Grammar section.

  5. The primary section States, which describes the state-machine Happy built for your grammar, and all of the transitions for each state.

  6. Finally, some statistics Grammar Totals at the end of the file.

In general, you will be most interested in the States section, as it will give you information, in particular, about any conflicts your grammar may have.

7.1. States

Although Happy does its best to insulate you from the vagaries of parser generation, it's important to know a little about how shift-reduce parsers work in order to be able to interpret the entries in the States section.

In general, a shift-reduce parser operates by maintaining parse stack, which tokens and productions are shifted onto or reduced off of. The parser maintains a state machine, which accepts a token, performs some shift or reduce, and transitions to a new state for the next token. Importantly, these states represent multiple possible productions, because in general the parser does not know what the actual production for the tokens it's parsing is going to be. There's no direct correspondence between the state-machine and the input grammar; this is something you have to reverse engineer.

With this knowledge in mind, we can look at two example states from the example grammar from Chapter 2, Using Happy:

State 5

        Exp1 -> Term .                                      (rule 5)
        Term -> Term . '*' Factor                           (rule 6)
        Term -> Term . '/' Factor                           (rule 7)

        in             reduce using rule 5
        '+'            reduce using rule 5
        '-'            reduce using rule 5
        '*'            shift, and enter state 11
        '/'            shift, and enter state 12
        ')'            reduce using rule 5
        %eof           reduce using rule 5

State 9

        Factor -> '(' . Exp ')'                             (rule 11)

        let            shift, and enter state 2
        int            shift, and enter state 7
        var            shift, and enter state 8
        '('            shift, and enter state 9

        Exp            goto state 10
        Exp1           goto state 4
        Term           goto state 5
        Factor         goto state 6

For each state, the first set of lines describes the rules which correspond to this state. A period . is inserted in the production to indicate where, if this is indeed the correct production, we would have parsed up to. In state 5, there are multiple rules, so we don't know if we are parsing an Exp1, a multiplication or a division (however, we do know there is a Term on the parse stack); in state 9, there is only one rule, so we know we are definitely parsing a Factor.

The next set of lines specifies the action and state transition that should occur given a token. For example, if in state 5 we process the '*' token, this token is shifted onto the parse stack and we transition to the state corresponding to the rule Term -> Term '*' . Factor (matching the token disambiguated which state we are in.)

Finally, for states which shift on non-terminals, there will be a last set of lines saying what should be done after the non-terminal has been fully parsed; this information is effectively the stack for the parser. When a reduce occurs, these goto entries are used to determine what the next state should be.