Syntax

Syntax is a central part of Storm. The Storm compiler provides a parser that reads syntax from the type system. This system is then used to implement other languages, like Basic Storm. The syntax definitions in the compiled are designed so that it is easy to choose a set of packages and combine their syntax, and in that way extend the language you are using.

This section will not discuss how syntax rules are written, please refer to BNF Syntax to learn how to use the built in syntax definition language.

Definition

The compiler implements a parser for context free grammars (with a few extensions, as we will see), and therefore rules look a lot like regular BNF grammar. Grammars in Storm consist of a set of rules. Each rule consists of a number of alternative matches, which are called options in Storm. Each of these options contains a number of tokens. A token is either a reference to another rule that should be matched, or a regular expression.

This is slightly different compared to regular BNF syntax. The BNF syntax does not have a separate entry for options, but instead includes the alternative operator | to accomplish the same goal. Storm provides the separate concept of options to allow rules to be easily extended from other parts of the system, and to make the semantics of constructing a syntax tree clearer.

Storm syntax also diverges slightly from BNF by allowing the use of repetitions. Each option may contain zero or one repetition. Currently, the repetitions present in regular expressions are supported, i.e. zero or one, zero or more and one or more times.

Semantics

When parsing a string, using core.lang.Parser<X>, the caller provides the rule that the parser should start from. The parser starts from there and finds a valid parse tree that matched the given string, or returns an error. The parser tries to match as far as possible in the given source string, and returns how much of the source string successfully matches the input. This allows the user of the parser to decide if a complete match is an error or not.

Some times, grammars are ambiguous. The Storm parser helps you to specify which of the many valid parses should be chosen by looking at the priority that is assigned to each option. This priority is an integer, and higher numbers have priority. Using these priorities, the storm parser tries to emulate the behavior of Parsing Expression Grammar, where higher priorities are executed "first". Since the Storm parser does not work the same way, there may be differences in some cases, but in return Storm allows left-recursive rules.

Regular expressions are always greedy, which means that they sometimes interact in unexpected ways with the rest of the tokens. This is most clearly illustrated by the following example:

A ::= "a*" "a"
B ::= "a"* "a"

In this case, the rule A uses two different regular expressions to match a sequence of at least one a. However, this rule will never succeed, since the regular expression "a*" always matches greedily, leaving nothing left for the final "a". The rule B, on the other hand, works as expected since we are instead instructing the parser to match the regular expression zero or more times. This is not greedy.

Output

Each rule and option declared in Storm live as a type in the type system. Therefore, the root rule when parsing is specified as a parameter to the Parser type like: core.lang.Parser<Root>. Rules generate types with the same name as the rule, and options generate either anonymous types or named types depending on how they are declared. These types are used to represent the syntax tree from a parse. The types representing each option inherit from the type representing the rule, and in that way the syntax tree can be conveniently represented in the type system.

By calling the tree member on the Parser object, you create a syntax tree that can be explored and manipulated from any Storm language. Each node contains variables corresponding to the declaration of the rule. Exact names of these variables depend on the language that were used to declare the syntax. The type of each variable corresponds to the rule that was matched, and if the match is inside a repetition block, it could be Array<T> or Maybe<T> as well. Regular expressions are represented using the core.lang.SStr type in the syntax tree.

Transform

To conveniently transform the syntax tree into another representation, each type generated by rules and options provide the transform function. The transform function is automatically generated according to how the rules were declared. Each option overloads the rule's version with the specific code required for that option.

Each option has a function call declared, that tells the parser how to construct that part of the syntax tree when that option is matched. Each of the tokens in the option may also be bound to a variable and passed on to the function. These variables are evaluated by calling the matched sub-rule whenever the value of the variable is needed. Regular expressions may also be bound to variables, in which case a Str is created. It is also possible to mark a part of the tokens in the option, and bind these to a variable as a Str. The start position of the rule is bound to the variable pos.

Aside from the function call, it is also possible to execute member functions on the return value before it is returned. This is useful when tokens are within a repetition syntax. In this case, the function call may create an array or other similar container, and instruct the parser to call the push member of the array for each item in the repetition syntax. When the result is created it is bound to the variable me in case it is needed.

To ease the development of transforms, syntax rules may also be declared to take a number of parameters. These parameters are treated as regular variables.

Everything in the syntax is resolved lazily. This means, for example, that any rules you refer to does not need to be declared until the rule is actually considered by the parser. This can be convenient if you know that the user will always include another package when using your syntax, or something similar. The same holds true for function calls. The actual function and overload to be called is not decided until the call is about to be made. This means that function calls does not necessarily have to be valid if they are never used, and that it is possible to express some kind of overloads that are not possible in other, more static, languages. For example, a rule could call the function a(x), where x can be either a A or a B. In this case, we can declare two versions of a(): a(A x) and a(B x), even though A and B are not related. In other languages it is necessary to declare it like a(Object x), and then manually check which type was actually sent to a. Another convenient usage of this is that it is possible to indicate that options should not return anything by specifying them returning the function call void (or anything, really). Since the function call will never be used unless it actually has to be used, the parser will work without problems as long as no one tries to get the result from the rule.

Parsing and threading

At the moment, the parser is always executed on the Compiler thread. When creating the syntax tree, only classes and actors are supported. Furthermore, actors have to be declared to run on the compiler thread, since the parser does not yet implement support for message passing. This restriction will probably be lifted in the future.