The Parse Tree
The Syntax Language automatically generates a representation for the parse tree based on the
grammar. This representation is visible in the name tree, and therefore
accesible from other languages. Since the grammar representation is implemented as subclasses to
core:lang:Type
, they are usable from languages who are not aware of the specific entities produced
by the Syntax Language. In particular, it is possible to inspect, manipulate, and even extend the
representation from Basic Storm, even if Basic Storm is not aware of the grammar representation.
This section focuses on the types generated by the Syntax Language to represent the parse tree. To illustrate the contents of the types, we will use Basic Storm.
Note: The types presented here are not entirely complete. This section focuses on the data
members of the types. The types also contain a function called transform
that is presented in the
next section.
Types
Rules and productions are represented by entities that inherit from core:lang:Type
in the name
tree. As such, they appear as types for most languages in the system. As mentioned above, these
types are used to represent the parse tree.
Rules are represented by an actor that inherits from the actor lang:bnf:Node
(declared to run on
the Compiler
thread). Productions are represented by classes that inherit from the rule they
extend. For example, consider the following grammar:
void A(); A : "a" = A1; A : "b" = A2;
Since the productions are named, they will have the names A1
and A2
in the name tree. Anonymous
productions are also visible in the name tree, but they will have names that are difficult to access
in most languages. This grammar generates the following types:
class A extends lang:bnf:Node {} class A1 extends A {} class A2 extends A {}
Parsing the string a
would result in a parse tree that consists of an instance of the class A1
,
and parsing the string b
would result in a parse tree that consists of an instance of the class
A2
.
Captured Subtrees
So far, the parse tree is not very interesting since none of the classes contain any data. The
exception is the data in lang:bnf:Node
, that contains the member pos
. The pos
member contains
the position of the match that corresponds to the node in the source code.
To include data in the parse tree, it is necessary to capture parts of the match. This is done by specifying a name (an identifier) after a token in a production. Capturing a regex token stores the string matched by the regex, and capturing a rule stores the parse tree that corresponds to the rule. There is also another way to capture tokens, which we will see in the next section.
To illustrate the idea, consider the following grammar that matches numbers:
void SNumber(); SNumber : SDigits digits = SInteger; SNumber : SDigits whole - "\." - SDigits decimal = SReal; void SDigits(); SDigits : "[0-9]+" digits = SDigitsMatch;
It generates the following types in the name tree:
class SNumber extends lang:bnf:Node {} class SInteger extends SNumber { B digits; } class SReal extends SNumber { B integer; B fraction; } class SDigits extends lang:bnf:Node {} class SDigitsMatch extends SDigits { lang:bnf:SStr digits; }
Note: The class SStr
is a class that is used to store strings captured from parsing. It can be
thought of as a small parse tree that stores a single string, and the position in the source code
where it appeared. It has a member, transform
, that can be used to extract the string. The
position is available as the member pos
, and the string is available as v
(for value).
Repeated Tokens
If tokens inside a repetition syntax are captured, the type of the tokens will be modified according
to the repetition type. This is done both to be able to store all matches, and to ensure
type-safety. If the token is repeated zero or one times (i.e. if ?
was used), then the type of the
captured member will have the type Maybe<T>
. In other cases (i.e. for *
or +
), then the
captured member will have the type Array<T>
. For example, consider the following grammar that
matches a number using the repetition syntax:
void SNumber(); SNumber : SDigits whole - ("\." - SDigits decimal)? = SNumberMatch; void SDigits(); SDigits : ("[0-9]" digit)+ = SDigitsMatch;
It will produce the following types:
class SNumber extends lang:bnf:Node {} class SNumberMatch extends SNumber { SDigits whole; // Not inside the repetition, captured as normal. SDigits? decimal; // Captured as a Maybe<SDigits> since it may not be present. } class SDigits extends lang:bnf:Node {} class SDigitsMatch extends SDigits { lang:bnf:SStr[] digit; // Captured as Array<SStr> since there may be multiple matches. }
Capturing Substrings
It is also possible to capture a substring of a part of a rule using a syntax similar to the repetition syntax. In fact, the ability to capture a substring in this manner is mutually exclusive with the repetition syntax, as they are stored similarly internally.
A substring can be captured by adding an identifier after a set of parentheses. For example,
consider the grammar below that matches the content of a pair of curly braces ({}
) that may
contain nested braces.
void BraceLiteral(); BraceLiteral : "{" - (BraceContent) content - "}"; void BraceContent(); BraceContent : "{" - BraceContent - "}"; BraceContent : "[^{}]+" - BraceContent; BraceContent : ;
In this case, the text matched by BraceContent
in the production to BraceLiteral
will be
captured as a string (using the SStr
class), even though the parse tree for the BraceContent
is
not captured.