The Parse Tree

The Syntax Language automatically generates a representation for the parse tree based on the grammar. This representation is visible in the name tree, and therefore accesible from other languages. Since the grammar representation is implemented as subclasses to core:lang:Type, they are usable from languages who are not aware of the specific entities produced by the Syntax Language. In particular, it is possible to inspect, manipulate, and even extend the representation from Basic Storm, even if Basic Storm is not aware of the grammar representation.

This section focuses on the types generated by the Syntax Language to represent the parse tree. To illustrate the contents of the types, we will use Basic Storm.

Note: The types presented here are not entirely complete. This section focuses on the data members of the types. The types also contain a function called transform that is presented in the next section.

Types

Rules and productions are represented by entities that inherit from core:lang:Type in the name tree. As such, they appear as types for most languages in the system. As mentioned above, these types are used to represent the parse tree.

Rules are represented by an actor that inherits from the actor lang:bnf:Node (declared to run on the Compiler thread). Productions are represented by classes that inherit from the rule they extend. For example, consider the following grammar:

void A();
A : "a" = A1;
A : "b" = A2;

Since the productions are named, they will have the names A1 and A2 in the name tree. Anonymous productions are also visible in the name tree, but they will have names that are difficult to access in most languages. This grammar generates the following types:

class A extends lang:bnf:Node {}

class A1 extends A {}

class A2 extends A {}

Parsing the string a would result in a parse tree that consists of an instance of the class A1, and parsing the string b would result in a parse tree that consists of an instance of the class A2.

Captured Subtrees

So far, the parse tree is not very interesting since none of the classes contain any data. The exception is the data in lang:bnf:Node, that contains the member pos. The pos member contains the position of the match that corresponds to the node in the source code.

To include data in the parse tree, it is necessary to capture parts of the match. This is done by specifying a name (an identifier) after a token in a production. Capturing a regex token stores the string matched by the regex, and capturing a rule stores the parse tree that corresponds to the rule. There is also another way to capture tokens, which we will see in the next section.

To illustrate the idea, consider the following grammar that matches numbers:

void SNumber();
SNumber : SDigits digits = SInteger;
SNumber : SDigits whole - "\." - SDigits decimal = SReal;

void SDigits();
SDigits : "[0-9]+" digits = SDigitsMatch;

It generates the following types in the name tree:

class SNumber extends lang:bnf:Node {}

class SInteger extends SNumber {
    B digits;
}

class SReal extends SNumber {
    B integer;
    B fraction;
}


class SDigits extends lang:bnf:Node {}

class SDigitsMatch extends SDigits {
    lang:bnf:SStr digits;
}

Note: The class SStr is a class that is used to store strings captured from parsing. It can be thought of as a small parse tree that stores a single string, and the position in the source code where it appeared. It has a member, transform, that can be used to extract the string. The position is available as the member pos, and the string is available as v (for value).

Repeated Tokens

If tokens inside a repetition syntax are captured, the type of the tokens will be modified according to the repetition type. This is done both to be able to store all matches, and to ensure type-safety. If the token is repeated zero or one times (i.e. if ? was used), then the type of the captured member will have the type Maybe<T>. In other cases (i.e. for * or +), then the captured member will have the type Array<T>. For example, consider the following grammar that matches a number using the repetition syntax:

void SNumber();
SNumber : SDigits whole - ("\." - SDigits decimal)? = SNumberMatch;

void SDigits();
SDigits : ("[0-9]" digit)+ = SDigitsMatch;

It will produce the following types:

class SNumber extends lang:bnf:Node {}

class SNumberMatch extends SNumber {
    SDigits whole;    // Not inside the repetition, captured as normal.
    SDigits? decimal; // Captured as a Maybe<SDigits> since it may not be present.
}


class SDigits extends lang:bnf:Node {}

class SDigitsMatch extends SDigits {
    lang:bnf:SStr[] digit; // Captured as Array<SStr> since there may be multiple matches.
}

Capturing Substrings

It is also possible to capture a substring of a part of a rule using a syntax similar to the repetition syntax. In fact, the ability to capture a substring in this manner is mutually exclusive with the repetition syntax, as they are stored similarly internally.

A substring can be captured by adding an identifier after a set of parentheses. For example, consider the grammar below that matches the content of a pair of curly braces ({}) that may contain nested braces.

void BraceLiteral();
BraceLiteral : "{" - (BraceContent) content - "}";

void BraceContent();
BraceContent : "{" - BraceContent - "}";
BraceContent : "[^{}]+" - BraceContent;
BraceContent : ;

In this case, the text matched by BraceContent in the production to BraceLiteral will be captured as a string (using the SStr class), even though the parse tree for the BraceContent is not captured.