Parser Library
The parser library (in the parser
package) implements various parsers that are faster (but weaker)
than the compiler's parser. In general, they do not support all functionality of the compiler's
parser, and are not as flexible. On the other hand, they are pre-compiled (and have no startup time)
and are much faster. They are also able to parse text on any thread, and not locked to the Compiler
thread as the compiler's parser.
The limitations and characteristics of the parsers are outlined below. If a parser is fed grammar that is incompatible with the parser, an error is generated at compile-time unless otherwise noted. Another general difference between the compiler's parser and the parsers in this library is that these parsers typically start evaluating transform functions before parsing is complete. As such, some transform functions might be executed even if the string is not in the language described by the grammar.
The following parsers are supported:
-
Recursive Descent (names:
recursive
,recursive descent
)The recursive descent parser accepts LL(1) grammars, which essentially means:
- No left-recursion is allowed.
- Multiple productions for a non-terminal needs to be distinguished by the first token present (either directly or indirectly).
This parser transforms the parse tree directly, and as such it is not possible to capture parts of the parse tree in the syntax transformations. Due to how the parser works, it is also not possible to pass parameters to productions from tokens appearing later in the text.
-
Backtracking Recursive Descent (names:
backtracking
,backtracking recursive
,backtracking recursive descent
)This is an extension of the recursive descent parser. It allows parsing grammars that are not LL(1) by utilizing backtracking. This means that the backtracking parser is potentially slower than the non-backtracking variant. It also means that side-effects from backtracking attempts might be visible to the actions specified in the grammar. For example, if the backtracking parser parses a production successfully but later determines that it does not match, the side-effects of the production will still be visible (e.g. object creation, member function calls).
Syntax
The Basic Storm syntax is described below.
To generate a parser, use the following syntax:
<name> : parser(<type>, [binary]) { start = <start rule>; <syntax>; }
-
<name>
is the name of the generated parser. A function with the name<name>
will be generated. It can be called as either<name>(<input>)
or<name>(<input>, <start>)
, where<input>
is a string and<start>
is an iterator. -
<type>
is the type of the parser to use. It can be any of the ones mentioned above. For examplerecursive
orrecursive descent
uses the recursive descent parser, orbacktracking
uses the backtracking recursive descent parser. -
[binary]
generates a parser in binary mode if it is included. This means that the parser accepts its input as aBuffer
instead of aStr
, and iterators areNat
values, notStr:Iter
. As a consequence of this, captured parts of the input evaluate toBuffer
instead ofStr
. Therefore, you typically want to declare all syntax inside the parser block if you use the binary mode. Otherwise, you have to ensure that the syntax transforms type-check both forStr
andBuffer
(it is of course possible to do, but might be tedious). -
<start rule>
indicates the rule that shall be used as the start rule for the parser. The rule used here does not need to be defined inside the syntax inside the parser block. -
<syntax>
is syntax as specified in the syntax language. The syntax will be placed in an anonymous sub-package of the current package. Both the anonymous sub-package and the current package will be visible by default.use
statements can be used to import more grammar from other packages.
Special Rules
The package parser.special
contains a set of special rules. These are special in the sense that
some parsers are able to implement them even if they are not possible to express in context-free
grammars. See the file syntax.bnf
in the parser.special
package for more details.
One example of rules that fall into this category are the rules Bytes(Nat)
and Chars(Nat)
. These
match a pre-determined number of bytes and codepoints. These are only available in the recursive
descent parsers at the moment.
Another example are the rules Indent
, MinIndent
, and ExactIndent
. They can be used to parse
indent-sensitive languages (such as markdown). They match the indent specified by the supplied
parameters.