Terminology
This section of the manual relies on many concepts from formal language theory, an area that not all readers may be familiar with. Furthermore, some terms are used loosely when applied to parsers or compilers in general. For these reasons, we provide an overview of the most important concepts and how they are used in Storm:
-
Context-Free Grammars
A context-free grammar is defined by a set of terminal symbols, a set of nonterminal symbols, and a set of productions. Terminal symbols are symbols that may appear in the text that is being parsed. Nonterminal symbols are identifiers that are used to refer to productions in the grammar. Productions then specify how nonterminals can be expanded into a sequence of terminals and nonterminals.
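The three components above can be written down as plain data. The following is a minimal Python sketch, not Storm code: the toy grammar for sums of numbers and the names `Production`, `terminals`, `nonterminals`, and `expansions` are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Production:
    rule: str         # the nonterminal that is being expanded
    expansion: tuple  # sequence of terminals and nonterminals


# Hypothetical toy grammar: Expr -> Expr "+" NUM | NUM
terminals = {"NUM", "+"}   # symbols that may appear in the parsed text
nonterminals = {"Expr"}    # names that refer to sets of productions
productions = [
    Production("Expr", ("Expr", "+", "NUM")),
    Production("Expr", ("NUM",)),
]


def expansions(rule):
    """Return every possible expansion of the given nonterminal."""
    return [p.expansion for p in productions if p.rule == rule]
```

Calling `expansions("Expr")` lists both ways in which `Expr` can be expanded, which is exactly the information that the productions of a grammar carry.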
-
BNF (Backus Naur Form)
A context-free grammar can be expressed in BNF. BNF is a notation for writing context-free grammars, and it is similar to the syntax used in the Syntax Language.
-
Terminal Symbols
A terminal symbol is a symbol that may appear in the parsed text (i.e. typically the source code of a program). In the Syntax Language, terminal symbols are expressed as regular expressions enclosed in double quotes (").
-
Nonterminal Symbols
A nonterminal symbol is a symbol that does not appear in the parsed text. Rather, it is used to refer to a set of productions by name.
-
Rules
Storm uses the term rule to refer to nonterminal symbols. First, it is shorter and more concise than "nonterminal". Second, it better reflects the idea that it refers to a set of productions that match some portion of text.
-
Productions
A production is one of the possible expansions of a rule. It states that a particular identifier can be expanded to a sequence of terminal symbols and rules.
-
Parse Tree
In this manual, we use the term parse tree to refer to the tree created by the parser when parsing an input string. The parse tree therefore closely resembles the parsed string, and does not contain much language-specific information.
The Syntax Language automatically creates a typesafe representation of the parse tree that can be inspected, manipulated, and extended by other languages. Most importantly, the parse tree can be transformed into an abstract syntax tree using the annotations in the Syntax Language.
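To illustrate how closely a parse tree mirrors the parsed string, the following Python sketch parses sums such as `1 + 20` with a small hand-written parser. It is not how Storm builds parse trees: the grammar (`Sum : Number ("+" Number)*`), the `Node` class, and the function names are assumptions chosen for this example.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Node:
    rule: str                                     # name of the rule that matched
    children: list = field(default_factory=list)  # sub-nodes or matched text


# Terminals of the toy grammar: numbers and "+", with optional leading whitespace.
TOKEN = re.compile(r"\s*([0-9]+|\+)")


def tokenize(text):
    text = text.rstrip()
    pos, tokens = 0, []
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected input at offset {pos}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens


def parse_number(tokens, i):
    if i >= len(tokens) or not tokens[i].isdigit():
        raise SyntaxError("expected a number")
    return Node("Number", [tokens[i]])


def parse_sum(tokens):
    """Parse according to: Sum : Number ("+" Number)*"""
    node = Node("Sum", [parse_number(tokens, 0)])
    i = 1
    while i < len(tokens):
        if tokens[i] != "+":
            raise SyntaxError("expected '+'")
        node.children.append("+")
        node.children.append(parse_number(tokens, i + 1))
        i += 2
    return node
```

The tree returned by `parse_sum(tokenize("1 + 20"))` contains one `Number` node per number and keeps the `+` tokens in place. It records what was matched and by which rule, but nothing about what a sum means.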
-
Abstract Syntax Tree (AST)
An abstract syntax tree is the next step in the compilation pipeline after the parse tree. The abstract syntax tree can be generated by transforming the parse tree using the annotations specified in the Syntax Language. In comparison to a parse tree, an AST is language-specific, and the Syntax Language therefore does not attempt to generate an AST directly.
Even though it is typically the case that languages utilize an AST to represent the code before generating machine code, it is not strictly necessary to do so. The syntax transforms in the Syntax Language are powerful enough to allow languages to generate code directly from the parse tree.
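The difference between the two trees can be sketched in Python as follows. The example is only an illustration of the idea and does not use Storm's syntax transforms: the tuple encoding of the parse tree, the AST classes `Literal` and `Add`, and the function `transform` are all hypothetical.

```python
from dataclasses import dataclass

# A parse tree for "1 + 20 + 3": each node is (rule name, children),
# and leaves are the pieces of text that were matched.
parse_tree = (
    "Sum",
    [("Number", ["1"]), "+", ("Number", ["20"]), "+", ("Number", ["3"])],
)


@dataclass
class Literal:
    value: int


@dataclass
class Add:
    lhs: object
    rhs: object


def transform(node):
    """Turn a parse tree node into a language-specific AST node."""
    rule, children = node
    if rule == "Number":
        # The matched text becomes a typed value.
        return Literal(int(children[0]))
    if rule == "Sum":
        # Drop the "+" tokens and build left-associative additions.
        operands = [transform(c) for c in children if c != "+"]
        result = operands[0]
        for operand in operands[1:]:
            result = Add(result, operand)
        return result
    raise ValueError(f"unknown rule: {rule}")
```

The parse tree still contains the `+` tokens and the numbers as text, whereas the result of `transform(parse_tree)` encodes language-specific decisions such as integer values and left associativity.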
-
Info Tree
An info tree is a special form of a parse tree that is complete (i.e. it does not exclude branches that are not explicitly captured), but that does not contain any type information. This type of parse tree is typically used to debug grammars, or to perform syntax highlighting (possibly with error correction).
-
Syntax Transforms
This is the name given to the annotations in the grammar that the Syntax Language uses to transform a parse tree into an AST.