===============================================================================
                        MONADIC PARSER GENERATOR
===============================================================================
                           Haskell 1.3 Version 
                              April 1997
                   
                            Alberto Pardo
                        Instituto de Computacion
                       Universidad de la Republica
                          Montevideo - Uruguay

                           pardo@fing.edu.uy
                      http://www.fing.edu.uy/~pardo
                    
===============================================================================

                              USERS GUIDE 

===============================================================================

The Monadic Parser Generator (MPG) is an experimental tool for the automatic
generation of functional (recursive descent) parsers in Haskell. The parsers 
produced by MPG are monadic in the sense that they are expressed in terms of 
(a version of) the so-called monad for parsing.

From an annotated grammar specification in a style similar to Yacc, MPG 
produces a monadic parser which is close to the parser one would write 
``by hand''.

This document shortly describes the main features and use of the Haskell 1.3 
version of MPG.

INDEX
~~~~~

 * Grammar Files

 * Lexical Rules

 * The Produced Parser

 * Invoking MPG

 * The MPG distribution

 * Future Work

 * Monadic Parsers

 * Bibliography


Grammar Files
~~~~~~~~~~~~~

Grammar specifications are given to MPG in a file with extension '.gen'. 
The information and structure of a grammar file closely resembles the input 
to Yacc or Happy (a Haskell version of Yacc developed at Glasgow, UK). 
A grammar file has the following format (and order):

       <optional code>
       %name <parser name>
       <token declaration>
       %%
       <grammar specification>
       <optional code>

Spaces, \t or \n are allowed in between, but (by now) no comment lines can
be included. In future versions we plan to accept grammar files in `literate 
format'.

The header and footer <optional code> corresponds to Haskell code enclosed
in braces, i.e.

      { ...code... }

The Haskell code is treated as a comment by MPG, but copied into the generated 
output file. The optional code is meant for including auxiliary definitions 
that may be necessary to be placed in the same file as the parser produced 
by MPG. A typical example is to include a lexical analyser as part of the 
optional code.
 
The <parser name> in the directive

       %name <parser name>

must be a valid Haskell identifier. MPG uses that identifier to call the 
top-level parsing function, which simply invokes the parsing function 
corresponding to the first nonterminal in the grammar specification. For 
example, the directive

      %name mpgParser

gives rise to the definition:

      mpgParser = parser_<first_nonterminal>

A <token declaration> 

       %token
             <token_name>  { <token_pattern> }
                ...               ...
                ...               ...  
             <token_name>  { <token_pattern> }

specifies the names to be used for referring to the tokens in the grammar 
specification. Between braces it is specified how tokens are pattern matched.
These patterns are Haskell patterns (eventually) with a special mark `$$' 
which may occur at most once. That mark is used to inform that the token 
carries certain information that must be used within the semantic actions 
as the value of the token (instead of using the token itself as value). 
An example of a token declaration is the following:

     %token
           '('  { TokenLeft }
           ')'  { TokenRight }
           '+'  { TokenPlus }
           '-'  { TokenMinus }
           num  { TokenNum $$ }
 
The <grammar specification> is of this form:

       <nonterm> : <production>  { <semantic action> }
               [ | <production>  { <semantic action> }
                 |     ...         ...
                 | <production>  { <semantic action> } ]
       ... 
       ...
       <nonterm> : <production>  { <semantic action> }
               [ | <production>  { <semantic action> }
                 |     ...                ...
                 | <production>  { <semantic action> } ]

Each <production> is a sequence of token names and nonterminals. MPG accepts
grammars with occurrences of the null string, which is reflected by the
possibility of writing empty productions.
 
A <semantic action> is a Haskell expression with occurrences of meta-variables
$1,...,$n which refer to the semantic values of the symbols in the production
($i refers to the semantic value of the i-th symbol in the production). For 
instance,

      exp : term op exp  { $2 $1 $3 }
          | term         { $1 }
    
      term : '(' exp ')' { $2 }
           | num         { ASnumber $1 }
     
      op   : '+'         { ASplus }
           | '-'         { ASminus }

The use of the null string can be observed in the following example:

      exp  : term exp1    { foldl (\y (f,t) -> f y t) $1 $2 }

      exp1 : op term exp1 { ($1,$2) : $3 }
           |              { [] }
    
      term : '(' exp ')'  { $2 }
           | num          { ASnumber $1 }
     
      op   : '+'          { ASplus }
           | '-'          { ASminus }

Lexical Rules (taken from Happy's info)
~~~~~~~~~~~~~

Identifiers (i.e. token names and nonterminals) in MPG grammar files can take 
any of the following forms (using the BNF syntax from the Haskell Report):

             id      ::= alpha { idchar }
                       | ' { any{^'} | \' } '
                       | " { any{^"} | \" } "
     
             alpha   ::= A | B | ... | Z
                       | a | b | ... | z
     
             idchar  ::= alpha
                       | 0 | 1 | ... | 9
                       | _

 
The Produced Parser
~~~~~~~~~~~~~~~~~~~

The parser that MPG produces is expressed in terms of monad operations. 
In our examples we have used the parser monad defined below, but feel free
to change it for your favourite one. The only restriction is that the parser
monad that you select must be an instance of the MonadPlus class (that appears
in the Standard Library). The parser monad below is part of Hugs 1.3 Library.

    newtype Parser token value = P ([token] -> [(value,[token])])

where 

 - 'token' denotes the type of the tokens that the parser accepts; and

 - 'value' denotes the type of the result that the parser returns; this type 
   corresponds to the type of the semantic actions. 

The name of these two types is not required by MPG. The parser monad above
represents the list of successes method. Failure is represented by the empty 
list. As usual, you can order the productions in your grammar in such a way 
that the desired parsing appears first in the list of successes.

The parser monad operations are given by:

    instance Functor (Parser t) where
       -- map         :: (a -> b) -> (Parser t a -> Parser t b)
       map f (P p)     = P (\inp -> [(f v, out) | (v,out) <- p inp])

    instance Monad (Parser t) where
       -- return      :: a -> Parser t a
       return v        = P (\inp -> [(v,inp)])

       -- >>=         :: Parser t a -> (a -> Parser t b) -> Parser t b
      (P p) >>= f     = P (\inp -> concat [parse (f v) out | (v,out) <- p inp])

    instance MonadZero (Parser t) where
       -- zero        :: Parser t a
       zero            = P (\inp -> [])

    instance MonadPlus (Parser t) where
       -- (++)        :: Parser t a -> Parser t a -> Parser t a
       (P p) ++ (P q)  = P (\inp -> (p inp ++ q inp))


These are some necessary parser combinators: 

    item :: Parser t t
    item  = P (\inp -> case inp of
                         []     -> []
                         (x:xs) -> [(x,xs)])

    sat  :: (t -> Bool) -> Parser t t
    sat p = do {x <- item; if p x; return x}

    tok  :: Eq t => t -> Parser t t
    tok t = sat (t==)

    parse         :: Parser t a -> [t] -> [(a,[t])]
    parse (P p) ts = p ts 


[Note: With the above definition of the parser monad, the application of a 
parser to a list of tokens needs to be done through function parse, since 
the Parser type is a newtype.]

For example, for the first grammar for arithmetic expressions shown above 
the following monadic parser is produced by MPG:

    parser_exp =
	(parser_term >>= \ var1 ->
	 parser_op >>= \ var2 ->
	 parser_exp >>= \ var3 ->
	 return ( var2 var1 var3 ))
	++
	(parser_term >>= \ var1 ->
	 return ( var1 ))

    parser_term =
	(parser_TokenLeft >>= \ var1 ->
	 parser_exp >>= \ var2 ->
	 parser_TokenRight >>= \ var3 ->
	 return ( var2 ))
	++
	(parser_TokenNum >>= \ var1 ->
	 return ( ASnumber var1 ))

    parser_op =
	(parser_TokenPlus >>= \ var1 ->
	 return ( ASplus ))
	++
	(parser_TokenMinus >>= \ var1 ->
	 return ( ASminus ))


    parser_TokenLeft = tok TokenLeft

    parser_TokenRight = tok TokenRight

    parser_TokenPlus = tok TokenPlus

    parser_TokenMinus = tok TokenMinus

    parser_TokenNum = map (\(TokenNum var) -> var) (sat isToken) 
    	where 
	  isToken (TokenNum var) = True 
	  isToken _ = False

Suppose that the %name directive specifies that

    %name mpgParser   

Then the function definition:

    mpgParser = parser_exp

is included as part of the generated parser. This function has type:
 
    mpgParser :: Parser Token ASexp

since it accepts tokens of type Token,

   data Token = TokenLeft
              | TokenRight
              | TokenPlus
              | TokenMinus 
              | TokenNum String
              deriving (Eq,Show)

and yields abstract syntax trees of type ASexp,

   data ASexp = ASplus ASexp ASexp
              | ASminus ASexp ASexp
              | ASnumber String
              deriving (Show)

We recommend to look at the examples that come with the MPG distribution. 
(Remember that grammar files have extension `.gen'.)

Invoking MPG
~~~~~~~~~~~~

MPG can be run either in interpreted form using Hugs 1.3 or as an executable
by having compiled it first with Haskell 1.3. 

[The version implemented so far compiles at least with Chalmers HBC 1.3, 
tests with other Haskell implementations have not been done.]

MPG imports the "System" Library that permits access to system facilities, 
since it makes use of the function getArgs defined in it. This library is 
part of HBC 1.3 and Hugs 1.3 libraries. 

From Hugs 1.3
=============

To run MPG in interpreted form type:

    hugsmpg <grammar filename>

The script `hugsmpg' calls Hugs (which should have the Prelude.hs file 
as prelude), loads MPG, and applies it to the grammar file specified.
Make sure that the version of Hugs that `hugsmpg' calls is 1.3.

For instance, if we call

    hugsmpg exp.gen

and exp.gen is a consistent grammar file, then MPG generates the parser
in the file exp.hs and concludes printing the message:

    Parser Generated!  [File: exp.hs]

Note: MPG builds the name of the output file by taking the prefix of the 
grammar file's name previous the last occurrence of a dot `.', and 
adding to it the extension `.hs'. (The present version of MPG does not
control that the grammar file's name has actually `.gen' as suffix.)

As an executable
================

To use the compiled version of MPG type

    mpg <grammar filename> 

For instance,

    mpg exp.gen

The MPG distribution includes an executable version complied with HBC 1.3
in a SunOS machine. In case you need to recompile MPG so that you can run it 
in another platform, first activate these importations

    import Char 
    import MonadicIO 

that appear as a comment at the beginning of the literate script mpg.lhs. 
(These importations have been left as a comment in the source code due to 
incompatibilties between Hugs 1.3 and HBC 1.3 Library). 

The MPG distribution
~~~~~~~~~~~~~~~~~~~~

The MPG distribution can be obtained from the address

	http://www.fing.edu.uy/~pardo/MPG/


Future Work
~~~~~~~~~~~

Among other shortcomings that we will try to alleviate in future versions, 
we plan to:

  * allow extended BNF notation by the inclusion of e.g. repetition, 
    generating thus parsers that use the `many' parser combinator;
 
  * accept more flexible grammar input files;

  * improve MPG's source code by doing it more functional in the sense 
    to avoid as much as possible the use of the imperative "error" function 
    for interrupting program execution and error reporting; 

  * produce more sophisticated and efficient parsers, or at least give the 
    possibility of generating alternative versions of monadic parsers in order 
    to be able to perform comparative tests; 

  * include more sophisticated examples.


Monadic Parsers
~~~~~~~~~~~~~~~

The parsers produced by MPG possess all the deficiencies and advantages of 
monadic parsers. Here are some features to remark: 

  * non-deterministic parsers are achieved using the monad parser described
    above

  * deterministic parsers are achieved using state transformers with maybe:

       newtype Parser token value = P ([token] -> Maybe (value,[token]))

  * left recursive grammars should not be used!! (otherwise, the produced 
    parser will have an infinite loop). This is a limitation of recursive
    descent parsers.

  * lexical analysers can also be written as monadic parsers, and therefore 
    can be produced by MPG.

Bibliography
~~~~~~~~~~~~

The following papers contain details about combinator and monadic parsers:

  * Graham Hutton & Erik Meijer, "Monadic Parser Cominators", Tech.Report
    NOTTCS-TR-96-4, Dept. of Comp.Science, University of Nottinham, 1996.

  * Graham Hutton & Erik Meijer, "Monadic Parsing in Haskell", Functional
    Pearl, JFP.

  * Philip Wadler, "Monads for Functional Programming", Advanced Functional
    Programming, LNCS 925, 1995.

  * Jeoren Fokker, "Functional Parsers", Advanced Functional Programming,
    LNCS 925, 1995.

  * Graham Hutton, "Higher-Order Functions for Parsing", JFP 2(3): 323-343, 
    July 1992.


=== End of Users' Guide =======================================================
