parse()
, that recognizes correct instances of the grammar.
In this chapter the organization and specification of such a grammar file is discussed in detail.
Having read this chapter you should be able to define a grammar for which Bisonc++ can generate a class, including a member that will recognize correctly formulated (in terms of your grammar) input, using all the features and facilities offered by Bisonc++ to specify a grammar. In principle this grammar will be in the class of LALR(1) grammars (see, e.g., Aho, Sethi & Ullman, 19xx).
B() directives %% Grammar rulesReaders familiar with Bison will note that there is no C declaration section and no section to define Additional C code. With Bisonc++ these sections are superfluous since, due to the fact that a Bisonc++ parser is a class, all additional code required for the parser's implementation can be incorporated into the parser class itself. Also, C++ classes normally only require declarations that can be defined in the classes' header files, so also the `additional C code' section could be omitted from the Bisonc++ grammar file.
The `%%' is a punctuation that appears in every Bisonc++ grammar file to separate the two sections.
The Bisonc++ directives section is used to declare the names of the terminal and nonterminal symbols, and may also describe operator precedence and the data types of semantic values of various symbols. Furthermore, this section is also used to specify Bisonc++ directives. These Bisonc++ directives are used to define, e.g., the name of the generated parser class and a namespace in which the parser class will be defined. All Bisonc++ directives are covered in section 5.6.
The grammar rules define how to construct nonterminal symbols from their parts. The grammar rules section contains one or more Bisonc++ grammar rules, and nothing else. See section 5.3, covering the syntax of grammar rules.
There must always be at least one grammar rule, and the first `%%' (which precedes the grammar rules) may never be omitted even if it is the first thing in the file.
A terminal symbol (also known as a token type) represents a class of
syntactically equivalent tokens. You use the symbol in grammar rules to mean
that a token in that class is allowed. The symbol is represented in the
Bisonc++ parser by a numeric code, and the parser's lex()
member function
returns a token type code to indicate what kind of token has been read. You
don't need to know what the code value is; you can use the symbol to stand for
it.
A nonterminal symbol stands for a class of syntactically equivalent groupings. The symbol name is used in writing grammar rules. By convention, it should be all lower case.
Symbol names can contain letters, digits (not at the beginning), and underscores. Bisonc++ currently does not support periods in symbol names (Users familiar with Bison may observe that Bison does support periods in symbol names, but the Bison user guide remarks that `Periods make sense only in nonterminals'. Even so, it appears that periods in symbols are hardly ever used).
There are two ways to write terminal symbols in the grammar:
%token
. See
section 5.6.1.
character token type
(or literal character token
) is
written in the grammar using the same syntax used in C++ for character
constants; for example, '+
' is a character token type. A character token
type doesn't need to be declared unless you need to specify its semantic value
data type (see section 5.5.1), associativity, or precedence (see
section 5.6.2).
By convention, a character token type is used only to represent a token
that consists of that particular character. Thus, the token type '+
' is
used to represent the character `+
' as a token. Nothing enforces this
convention, but if you depart from it, your program will confuse other
readers.
All the usual escape sequences used in character literals in C++ can
be used in Bisonc++ as well, but you must not use the 0
character as a
character literal because its ASCII code, zero, is the code lex()
must
return for end-of-input (see section 6.3.1).
EQ
may be defined in the grammar file, with the
lexical scanner returning EQ
when it matches ==
.
How you choose to write a terminal symbol has no effect on its grammatical meaning. That depends only on where it appears in rules and on when the parser function returns that symbol.
The value returned by the lex()
member is always one of the terminal
symbols (or 0 for end-of-input). Whichever way you write the token type in the
grammar rules, you write it the same way in the definition of yylex. The
numeric code for a character token type is simply the ASCII code for the
character, so lex()
can use the identical character constant to generate
the requisite code. Each named token type becomes a C++ enumeration value
in the parser base-class header file, so lex()
can use the corresponding
enumeration identifiers. When using an externally (to the parser) defined
lexical scanner, the lexical scanner should include the parser's base class
header file, returning the required enumeration identifiers as defined in the
parser class. So, if (%token NUM) is defined in the parser class Parser
,
then the externally defined lexical scanner may return Parser::NUM
.
The symbol error
is a terminal symbol reserved for error recovery (see
chapter 8). The error
symbol should not be used for any other
purpose. In particular, the parser's member function lex()
should never
return this value.
result: components ... ;where result is the nonterminal symbol that this rule describes and components are various terminal and nonterminal symbols that are put together by this rule (see section 5.2). For example,
exp: exp '+' exp ;means that two groupings of type
exp
, with a `+' token in between, can
be combined into a larger grouping of type exp
.
Whitespace in rules is significant only to separate symbols. You can add extra whitespace as you wish.
Scattered among the components can be actions that determine the semantics of the rule. An action looks like this:
{ C++ statements }Usually there is only one action and it follows the components. See section 5.5.3.
Multiple rules for the same result can be written separately or can be joined with the vertical-bar character `|' as follows:
result: rule1-components ... | rule2-components... ... ;They are still considered distinct rules even when joined in this way.
If components in a rule is empty, it means that result can
match the empty string. For example, here is how to define a comma-separated
sequence of zero or more exp
groupings:
expseq: // empty | expseq1 ; expseq1: exp | expseq1 ',' exp ;It is customary to write a comment `
// empty
' in each rule with no
components.
expseq1: expseq1 ',' exp | exp ;Since the recursive use of expseq1 is the leftmost symbol in the right hand side, we call this left recursion. By contrast, here the same construct is defined using right recursion:
expseq1: exp ',' expseq1 | exp ;Any kind of sequence can be defined using either left recursion or right recursion, but you should always use left recursion, because it can parse a sequence of any number of elements with bounded stack space. Right recursion uses up space on the Bisonc++ stack in proportion to the number of elements in the sequence, because all the elements must be shifted onto the stack before the rule can be applied even once. See chapter 7 for further explanation of this.
Indirect or mutual recursion occurs when the result of the rule does not appear directly on its right hand side, but does appear in rules for other nonterminals which do appear on its right hand side. For example:
expr: primary '+' primary | primary ; primary: constant | '(' expr ')' ;defines two mutually-recursive nonterminals, since each refers to the other.
For example, the calculator calculates properly because the value associated with each expression is the proper number; it adds properly because the action for the grouping `x + y' is to add the numbers associated with x and y.
In this section defining the semantics of a language will be addressed. The following topics will be covered:
Bisonc++'s default is to use type int
for all semantic values. To specify
some other type, the directive %stype
must be used, like this:
%stype doubleThis directive must go in the directives section of the grammar file (see section 5.1). As a result of this, the parser class will define a private type
STYPE
as double
: Semantic values of language
constructs always have the type STYPE
, and an internally used data member
d_val
that could be used by the lexical scanner to associate a semantic
value with a returned token is defined as
STYPE d_val;
int
or
double
, while a string needs type std::string
, and an identifier might
need a pointer to an entry in a symbol table.
To use more than one data type for semantic values in one parser, Bisonc++ offers the following feature:
%union
directive (see section 5.6.11).
%token
directive (see section 5.6.1)
and for groupings with the %type
directive (see section
5.6.4).
union
fields cannot contain
class
objects as fields, and thus that such objects must be accessed
through pointers. So, if the semantic value of a textual token is returned in
std::string
object, then the union
should have a std::string *
field. Consequently, the programmer must take care to delete the memory
pointed to by the string *
fields whenever these fields become
obsolete. However, this requirement is not too hard to meet, and so union
constructions containing pointers to class type objects are frequently used in
practice.
An action consists of C++ statements surrounded by braces, much like a compound statement in C++. It can be placed at any position in the rule; it is executed at that position. Most rules have just one action at the end of the rule, following all the components. Actions in the middle of a rule are tricky and should be used only for special purposes (see section 5.5.5).
The C++ code in an action can refer to the semantic values of the
components matched by the rule with the construct $n
, which stands for the
value of the nth component. The semantic value for the grouping being
constructed is $$
. (Bisonc++ translates both of these constructs into
array element references when it copies the actions into the parser file.)
Here is a typical example:
exp: ... | exp '+' exp { $$ = $1 + $3; } | ...This rule constructs an
exp
from two smaller exp groupings connected
by a plus-sign token. In the action, $1
and $3
refer to the semantic
values of the two component exp groupings, which are the first and third
symbols on the right hand side of the rule. The sum is stored into $$
so
that it becomes the semantic value of the addition-expression just recognized
by the rule. If there were a useful semantic value associated with the `+'
token, it could be referred to as $2
.
If you don't specify an action for a rule, Bisonc++ supplies a default:
$$ = $1
. Thus, the value of the first symbol in the rule becomes the value
of the whole rule. Of course, the default rule is valid only if the two data
types match. There is no meaningful default action for an empty rule; every
empty rule must have an explicit action unless the rule's value does not
matter.
Using $n
with n equal to zero or a negative value is allowed for
reference to tokens and groupings on the stack before those that match the
current rule. This is a very risky practice, and to use it reliably you
must be certain of the context in which the rule is applied. Here is a case in
which you can use this reliably:
foo: expr bar '+' expr { ... } | expr bar '-' expr { ... } ; bar: // empty | { previous_expr = $0; } ;As long as
bar
is used only in the fashion shown here, $0
always refers to the expr
which precedes bar in the definition of foo
.
But as mentioned: it's a risky practice, which should be avoided if at all
possible. See also section 6.5.1.
$$
and
$n
constructs always have that data type.
If you have used a %union
directive to specify a variety of data types,
then you must declare a choice among these types for each terminal or
nonterminal symbol that can have a semantic value. Then each time you use
$$
or $n
, its data type is determined by which symbol it refers to in
the rule. In this example,
exp: ... | exp '+' exp { $$ = $1 + $3; }
$1
and $3
refer to instances of exp, so they all have the data
type declared for the nonterminal symbol exp. If $2
were used, it would
have the data type declared for the terminal symbol '+
', whatever that
might be.
Alternatively, you can specify the data type when you refer to the value,
by inserting `<type>
' after the `$
' at the beginning of the
reference. For example, if you have defined types as shown here:
%union { int u_int; double u_double; };then you can write
$<u_int>1
to refer to the first subunit of the rule
as an integer, or $<u_double>1
to refer to it as a double.
A mid-rule action may refer to the components preceding it using $n
, but
it may not (cannot) refer to subsequent components because it is executed
before they are parsed.
The mid-rule action itself counts as one of the components of the rule. This
makes a difference when there is another action later in the same rule (and
usually there is another at the end): you have to count the actions along with
the symbols when working out which number n
to use in $n
.
The mid-rule action can also have a semantic value. The action can set its
value with an assignment to $$
, and actions later in the rule can refer to
the value using $n
. Since there is no symbol to name the action, there is
no way to declare a data type for the value in advance, so you must use the
`$<...>
' construct to specify a data type each time you refer to this
value.
There is no way to set the value of the entire rule with a mid-rule action,
because assignments to $$
do not have that effect. The only way to set the
value for the entire rule is with an ordinary action at the end of the rule.
Here is an example from a hypothetical compiler, handling a let
statement
that looks like ``let (variable) statement
' and serves to create a
variable named variable
temporarily for the duration of statement. To
parse this construct, we must put variable
into the symbol table while
statement is parsed, then remove it afterward. Here is how it is done:
stmt: LET '(' var ')' { $<u_context>$ = pushSymtab(); temporaryVariable($3); } stmt { $$ = $6; popSymtab($<u_context>5); }As soon as `
let (variable)
' has been recognized, the first action is
executed. It saves a copy of the current symbol table as its semantic value,
using alternative u_context
in the data-type union. Then it uses
temporaryVariable()
to add the new variable (using, e.g., a name that
cannot normally be used in the parsed language) to the current symbol
table. Once the first action is finished, the embedded statement (stmt
)
can be parsed. Note that the mid-rule action is component number 5, so
`stmt
' is component number 6.
Once the embedded statement is parsed, its semantic value becomes the value of
the entire let
-statement. Then the semantic value from the earlier action
is used to restore the former symbol table. This removes the temporary
let
-variable from the list so that it won't appear to exist while the rest
of the program is parsed.
Taking action before a rule is completely recognized often leads to conflicts since the parser must commit to a parse in order to execute the action. For example, the following two rules, without mid-rule actions, can coexist in a working parser because the parser can shift the open-brace token and look at what follows before deciding whether there is a declaration or not:
compound: '{' declarations statements '}' | '{' statements '}' ;But when we add a mid-rule action as follows, the rules become nonfunctional:
compound: { prepareForLocalVariables(); } '{' declarations statements '}' | '{' statements '}' ;Now the parser is forced to decide whether to execute the mid-rule action when it has read no farther than the open-brace. In other words, it must commit to using one rule or the other, without sufficient information to do it correctly. (The open-brace token is what is called the look-ahead token at this time, since the parser is still deciding what to do about it. See section 7.1.
You might think that the problem can be solved by putting identical actions into the two rules, like this:
{ prepareForLocalVariables(); } '{' declarations statements '}' | { prepareForLocalVariables(); } '{' statements '}' ;But this does not help, because Bisonc++ does not parse the contents of actions, and so it will not realize that the two actions are identical (Note that Bisonc++ never tries to understand the C++ code in an action).
If the grammar is such that a declaration can be distinguished from a statement by the first token (which is true in C, but not in C++, which allows statements and declarations to be mixed)), then one solution is to put the action after the open-brace, like this:
compound: '{' { prepareForLocalVariables(); } declarations statements '}' | '{' statements '}' ;Now the next token following a recognized
'{'
token would be either
the first declarations
token or the first statements
token, which
would in any case tell Bisonc++ which rule to use, thus solving the problem.
Another (much used) solution is to bury the action inside a support non-terminal symbol which recognizes the first block-open brace and performs the required preparations:
openblock: '{' { prepareForLocalVariables(); } ; compound: openblock declarations statements '}' | openblock statements '}' ;Now Bisonc++ can execute the action in the rule for subroutine without deciding which rule for compound it will eventually use. Note that the action is now at the end of its rule. Any mid-rule action can be converted to an end-of-rule action in this way, and this is what Bisonc++ actually does to implement mid-rule actions.
By the way, note that in a language like C++ the above construction is obsolete anyway, since C++ allows mid-block variable- and object declarations. In C++ a compound statement could be defined, e.g., as follows:
stmnt_or_decl: declarations | pure_stmnt // among which: compound_stmnt ; statements: // empty | statements stmnt_or_decl ; compound_stmnt: open_block statements '}' ;Here, the
compound_stmnt
would begin with the necessary preparations
for local declarations, which would then have been completed by the time they
would really be needed by declarations
.
All token type names (but not single-character literal tokens such as '+' and '*') must be declared. Nonterminal symbols must be declared if you need to specify which data type to use for the semantic value (see section 5.5.2).
The first rule in the file also specifies the start symbol, by default. If
you want some other symbol to be the start symbol, you must use an explicit
%start
directive (see section 3.1).
In this section all Bisonc++ declarations will be discussed. Some of the
declarations have already been mentioned, but several more are available. Some
declarations define how the grammar will parse its input (like %left,
%right
); other declarations are available, defining, e.g., the name of the
parsing function (by default parse()
), or the name(s) of the
files generated by Bisonc++.
In particular readers familiar with Bison (or Bison++) should read this section thoroughly, since Bisonc++'s directives are more extensive and different from the `declarations' offered by Bison, and the macros offered by Bison++.
%token nameBisonc++ will convert this into a
Parser::Tokens
enumeration value
(where `Parser
' is the name of the generated parser class, see section
5.6.7). This allows the lexical scanner member function lex()
to return these token values by name directly, and it allows externally
defined lexical scanners (called by lex()
) to return token values as
Parser::name
. Note that when an externally defined lexical scanner is
used, it should include Parserbase.h
, the parser's base class header file,
in order to be able to use the Parser::Tokens
enumeration type.
Alternatively, you can use %left
, %right
, or %nonassoc
instead of
%token
, if you wish to specify precedence. See section 5.6.2.
It is possible to specify explicitly the numeric code for a token type by appending an integer value in the field immediately following the token name:
%token NUM 300However, this practice is strongly discouraged. It is generally best to let Bisonc++ choose the numeric codes for all token types. Bisonc++ will automatically select codes that don't conflict with each other or with ASCII characters.
In the event that the semantic stack type contain union
elements, the
%token
or other token declaration may be augmented with the data type
alternative delimited by angle-brackets (see section 5.5.2). For
example:
%union // define semantic stack elements { double u_val; symrec *u_tptr; }; %token <val> NUM // define token NUM and its type
%left
, %right
or %nonassoc
directive to declare a token
and specify its precedence and associativity, all at once. These are called
precedence directives. See section 5.6.2, for general
information on operator precedence. The syntaxis of a precedence directive is
comparable to that of %token
: either
%left symbols...or
%left <type> symbols...And indeed any of these directives serves the purposes of
%token
. But
in addition, they specify the associativity and relative precedence for all
the symbols:
op
determines how repeated
uses of the operator nest: whether `x op y op z
' is parsed by grouping
x
with y
first or by grouping y
with z
first. %left
specifies left-associativity (grouping x
with y
first) and
%right
specifies right-associativity (grouping y
with z
first). %nonassoc
specifies no associativity, which means that `x
op y op z
' is not a defined operation, and could be considered an error.
token
may be used in production rules to
overrule the defined precendence of an operator for a particular grammatical
rule. Well known is the construction
expression: '-' expression %prec UMINUS { ... }Here, the default priority and precedence of the
`-'
token as the
subtraction operator is overruled by the precedence and priority of the
UMINUS
token, which is commonly defined as
%right UMINUSfollowing, e.g., the
'*'
and '/'
operators.
In this case acommon list of operators consists of:
%left '+' '-' %left '*' '/' '%' %right UMINUSgiving
'*' '/'
and '%'
a higher priority than '+'
and '-'
,
ensuring at the same time that UMINUS
is given both the highest priority
and a right-associativity.
In the abovementioned production rule the operator-order would cause the construction
'-' expressionto be evaluated from right to left, which a higher precedence than either the multiplication or the addition operators.
%union
is used to specify multiple value types, the value type of
each nonterminal symbol for which values are used must be specified. This is
done with a %type
declaration, like this:
%type <type> nonterminal...Here
nonterminal
is the name of a nonterminal symbol, and type is the
name given in the %union
to the alternative that you want (see section
5.6.11). Any number of nonterminal symbols can be specified in
the same %type
declaration, if they have the same value type. Use spaces
to separate the symbol names.
It is also possible to declare the value type of a terminal symbol. To do this, use the same <type> construction in a declaration for the terminal symbol. All kinds of token declarations allow <type>.
%expect
declaration. The declaration looks like this:
%expect nHere
n
is a decimal integer. The declaration says there should be no
warning if there are n
shift/reduce conflicts and no reduce/reduce
conflicts. The usual warning is given if there are either more or
fewer conflicts, or if there are any reduce/reduce conflicts.
In general, using %expect
involves these steps:
%expect
. Use the `-V
' option to
get a verbose list of where the conflicts occur. Bisonc++ will also print the
number of conflicts.
%expect
declaration, copying the number n
from the
number which Bisonc++ printed.
%start
directive as follows:
%start symbol
Parser
. If this is
not appropriate, the directive %class-name
may be used as follows:
%class-name parser-class-nameThis directive defines the name of the C++ class that will be generated. It may be defined only once.
If you're familiar with the Bison++ program, please note:
namespace
namespace
. By default
no namespace is defined. If this options is used the
implementation header will contain a commented out using
namespace
directive for the requested namespace. This
directive is overridden by the --namespace command-line
option.
$-1
or $<type>-1
, where type
is a
%union
field-name. See also the sections 5.5.3 and
6.5.1.
typename
should be the name of an unstructured type (e.g.,
size_t/*unsigned*/). By default it is int. See YYSTYPE in
bison. It should not be used if a %union specification is
used. Within the parser class, this type may be used as
STYPE.
%union
directive specifies the entire collection of possible data
types for semantic values. The keyword %union
is followed by a pair of
braces containing the same thing that goes inside a union in C++.
For example:
%union { double u_val; symrec *u_tptr; };This says that the two alternative types are
double
and symrec
*
. They are given names u_val
and u_tptr
; these names are used in the
%token
and %type
directives to pick one of the types for a terminal or
nonterminal symbol (see section 5.6.4).
Notes:
union
definitions. Consequently, they cannot be used in %union
directives either. When a class type variant is required, a
pointer to a class object should be used. The responsibility for
the correct destruction of the pointed-to class objects lies with the
programmer. See also section 5.5.2.
struct LTYPE { int timestamp; int first_line; int first_column; int last_line; int last_column; char *text; };
typename
should be the name of an
alternate (predefined) type (e.g., size_t/*unsigned*/). It should not be
used if a %locationstruct specification is used (see
below). Within the parser class, this type may be used as
LTYPE.
struct-definition
header
header
as the pathname to the file pre-included in the
parser's class header. See the description of the --scanner
option for details about this option. Similar to the convention
adopted for this argument, header
will (by default) be
surrounded by double quotes. However, when the argument is
surrounded by pointed brackets #include <header>
will be
included. Note that using this directive implies the definition
of a composed Scanner d_scanner
data member into the generated
parser, as well as a predefined int lex() member, returning
d_scanner.yylex()
. If this is inappropriate, a user defined
implementation of int lex() must be provided.
<CLASS>
should be
interpreted as the name of the parser's class, Parser
by default, but
configurable using %class-name
(see section 5.6.7
<Class>Base.h
, configurable using
%baseclass-header
(see section 5.6.16.1) or %filenames
(see section 5.6.16.5);
<Class>.h
, configurable using
%class-header
(see section 5.6.16.2) or %filenames
(see section 5.6.16.5);
5.6.16.1: %baseclass-header: the Parser's Base Class header
%baseclass-header header
Defines the pathname of the file containing the parser's base
class. This directive is overridden by the
--baseclass-header or -b command-line options.
5.6.16.2: %class-header: the Parser's Class header
%class-header header
Defines the pathname of the file containing the parser
class. This directive is overridden by the
--class-header or -c command-line options.
5.6.16.3: %implementation-header: the Implementation Header
%implementation-header header
Defines the pathname of the file containing the implementation
header. This directive is overridden by the
--implementation-header or -i command-line options.
5.6.16.4: %parsefun-source: the parse() function's sourcefile
%parsefun-source source
Defines the pathname of the file containing the parser member
parse(). This directive is overridden by the
--parse-source or -p command-line options.
5.6.16.5: %filenames: the generic filenames
%filenames header
Defines the generic name of all generated files, unless overridden
by specific names. This directive is overridden by the
--filenames or -f command-line options.
header
header
as the pathname to the file pre-included in the
parser's base-class header. See the description of the
--baseclass-preinclude option for details about this option.
Like the convention adopted for this argument, header
will (by
default) be surrounded by double quotes. However, when the
argument is surrounded by pointed brackets #include <header>
will be included.
#ifdef DEBUG
macros are used
anymore. By rerunning bisonc++ without the --debug option an
equivalent parser is generated not containing the debugging
code.
|
). Each
alternative should begin with a unique identifying terminal token. The
terminal token may actually be hidden in a non-terminal rule, in which case
that nonterminal can be used as an alias for the non-terminal. In fact
identical terminal tokens may be used if at some point the terminal tokens
differ over different alternatives. Here are some examples:
// Example 1: plain terminal distinguishing tokens expr: ID | NUMBER ; // Example 2: nested terminal distinguishing tokens expr: id | number ; id: ID ; number: NUMBER ; // Example 3: eventually diverting routes expr: ID id | ID number ; id: ID ; number: NUMBER ;
ASCII
characters define a string, and multiple white-space delimited strings are
handled as one single string:
"hello" // multiple ws-delimited strings " " "world" "hello world" // same thingUsually a parser is responsible for concatenating the individual string-parts, receiving one or more
STRING
tokens from the lexical
scanner. A string
rule handles one or more incoming STRING
tokens:
string: STRING | string STRINGThe above rule can be used as a prototype for recognizing a series of elements. The token
STRING
may of course be embedded in another rule. The
generic form of this rule could be formulated as follows:
series: unit | series unitNote that the single element is first recognized, whereafter the left-recursive alternative may be recognized repeatedly.
opt_statements: // empty | opt_statements statementsThe above rule can be used as a prototype for recognizing a series of elements: the generic form of this rule could be formulated as follows:
opt_series: // empty | opt_series unitNote that the empty element is recognized first, even though it is empty, whereafter the left-recursive alternative may be recognized repeatedly. In practice this means that an action block may be defined at the empty alternative, which is then executed prior to the left-recursive alternative. Such an action block could be used to perform initializations necessary for the proper handling of the left-recursive alternative.
variables: IDENTIFIER | variables ',' IDENTIFIERThe above rule can be used as a prototype for recognizing a series of delimited elements. The generic form of this rule could be formulated as follows:
series: unit | series delimiter unitNote that the single element is first recognized, whereafter the left-recursive alternative may be recognized repeatedly. In fact, this rule is not really different from the standard rule for a series, which does not hold true for the rule to recognize an optional series of delimited elements, covered in the next section.
opt_parlist: // empty | opt_parlist ',' parameterTo define an optional series of delimited elements two rules are required: one rule handling the optional part, the other the delimited series of elements. So, the correct definition is as follows:
opt_parlist: // empty | parlist ; parlist: parameter | parlist ',' parameter ;Again, the above rule pair can be used as a prototype for recognizing an optional series of delimited elements. The generic form of these rules could be formulated as follows:
opt_series: // empty | series ; series: element | series delimiter elementNote that the
opt_series
rules neatly distinguishes the no-element
case from the case were elements are present. Usually these two cases need to
be handled quite differently, and the opt_series
rules empty alternative
easily allows us to recognize the no-elements case.
expr: NUMBER | ID | expr '+' expr | ... | '(' expr ')' ;This definition is simply characterized that the non-terminal
expr
appears within a set of parentheses, which is not too complex.
The definition of opt_statements
, however, is a bit more complex. But
acknowledging the fact that a statement
contains among other elements a
compound statement, and that a compound statement, in turn, contains
opt_statements
an opt_statements
construction can be formulated
accordingly:
opt_statements: // define an optional series // empty | opt_statements statement ; statement: // define alternatives for `statement' expr_statement | if_statement | ... | compound_statement ; compound_statement: // define the compound statement itself '{' opt_statements '}' ;
%class-name
directive, and construct parser objects
of each of the defined classes.