A New Language in Storm
This part of the tutorial describes how to create a new language in Storm. In a way, this tutorial can be seen as connecting the pieces used in all the previous tutorials, in order to create something larger. It is, however, worth to note that this tutorial does not aim to describe all concepts behind creating a language in detail. The goal is rather to illustrate how to implement common patterns in Storm. That being said, if you have followed the previous tutorials, you are in a good position to understand what will be done in this tutorial. After all, in my opinion, the hard part of writing compilers is to manage the complexity of the problem through good abstractions. Given good enough abstractions, the problems that a compiler need to solve are not extremely difficult (but still not always trivial). Since this tutorial will suggest abstractions and use the ones from Storm, it should not be extremely difficult to follow, even if you have not previously studied compiler construction.
Goal
The end goal of this tutorial is to create a new and simple language that we call demo. It will be
saved in file with the extension .demo that Storm will understand. The language itself allows
defining functions that are callable from other languages in Storm. Each function simply evaluates
an arithmetic expression and returns the result of said expression to the caller. For simplicity,
the language only works with integers. It is, however, not extremely difficult to extend this into
other types as well. It just requires a bit more code that would make the underlying ideas less
clear.
To illustrate the language, consider the following example that defines three functions f, g,
and h. f and g take two parameters, while h only takes one:
f(a, b) = a + b + 1 g(a, b) = a / b h(a) = bsfn(f(a, a), 5)
The first function computes a + b + 1 and returns the result, the second evaluates a / b and
returns the result. The third function first calls the function f with the parameters a and a.
Then uses the result as a parameter to the function bsfn that we will implement in Basic Storm,
and returns the result from that function. Note that we do not need any type definitions, since the
language only supports integers.
As we can see from the example, the language is fairly simple. Even so, it is able to utilize the strengths of the Storm platform: it is able to seamlessly interact with other languages in the system. As we shall see, this is something we get almost for free from the system, and not something we need to spend much effort to achieve.
Similarly, with the help of the language server, we get syntax highlighting almost for free as well.
Setup
This tutorial requires a bit more setup compared to the previous tutorials. We first need to create
a directory, for example language that we use to store our project in. Inside the directory, we
need two subdirectories. We will call one of them demo. We will load this directory as the package
lang.demo, and it will contain the implementation of our language. If you have installed Storm in
your home directory, you can simply create this directory as a subdirectory of root/lang directly
instead. Apart from the demo directory, we also need a test directory. This is where we will
store the test program in the demo language, alongside some code in Basic Storm to call the
functions we define in our new language.
During this tutorial, we will create the files demo.bs, ast.bs, and syntax.bnf in the demo
directory. Similarly, we will create the files test.bs and demo.demo in the test directory.
You can create these files now if you wish, or you can create them as needed throughout the
tutorial. In summary, the directory structure should look something like this:
language │ ├─ demo │ ├─ ast.bs │ ├─ demo.bs │ └─ syntax.bnf └─ test ├─ demo.demo └─ test.bs
When you have created the directories (or some of the files), you can then run the code using the
following command. This assumes that your terminal is in the language directory in the hierarchy
above.
storm -I lang.demo ./demo ./test
This command asks Storm to import the directory ./demo as lang.demo, and then to import the
directory test as test and then run the main function inside test. Since we have not added
any code yet, this will currently just start the top loop for Basic Storm.
Note: As mentioned earlier, if you have installed Storm in your home directory, you can place the
entire demo directory inside root/lang instead. This makes Storm load the package demo by
default, which means that you can omit the -I flag, and run Storm by simply typing storm ./test
instead.
As in the other tutorials, the code produced by this tutorial is available in the directory
root/tutorials/language. Note, however, that the code there contains some additional logic to
compensate for the fact that the demo directory is not loaded as core.lang. This makes it
possible to run the code by typing tutorials:language:main in the Basic Storm top-loop.
Teaching Storm to Handle .demo-files
The first step of creating a language is to teach Storm how to handle the file type that we wish to
use for our language. If we create a file with the extension .demo in the test directory, Storm
will actually give us a hint of how to do this. We start by creating the file demo.demo in the
test directory. We add the following content to the file to let us test our implementation later
on:
f() = 1
We also create a simple main function in the test.bs file in the test directory so that we
don't have to exit the top-loop every time we run Storm. For now, we simply try to call the function
f in our file demo.demo and print the result:
void main() { Int result = f(); print("Result: ${result}"); }
If we run the program at this point (see the command-line above), we get the following messages:
WARNING storm::Array<storm::PkgReader*>* storm::Package::createReaders( storm::Map<storm::SimpleName*, storm::PkgFiles*>*): No reader for [./../language/test/demo.demo] (should be lang.demo.reader(core.Array(core.io.Url), core.lang.Package)) @/home/storm/language/test/test.bs(28-29): Syntax error: Can not find "f" with parameters ().
The first message is a warning that tells us that Storm does not know what to do with files of the
type .demo, since there is no reader for the type. It then proceeds to tell us that it attempted
to find a function named lang.demo.reader(Array<core.io.Url>, core.lang.Package) to handle the
file type. The second error simply tells us that Basic Storm was unable to find the function f,
which is not surprising since we have not yet implemented the demo language to define the
function.
The first message is, however, interesting. It tells us that if we define a function called
lang.demo.reader(...), we get the opportunity to handle the new file type! Let's try implementing
it by adding the following code to the demo.bs file in the demo directory (that we load into the package lang.demo as indicated by the warning):
use core:io; use core:lang; lang:PkgReader reader(Url[] files, Package pkg) on Compiler { print("We should load ${files}"); lang:PkgReader(files, pkg); }
If we run the program again, we will now see that the warning has disappeared, and is now replaced
by the message we printed in the reader function:
We should load [/home/storm/language/test/demo.demo]
What happened is that Storm found a file with the extension .demo when it attempted to load the
code in test directory. Since Storm does not know how to deal with specific file types, it looked
for the function lang.demo.reader(...) and found the function we just defined. Storm then called
our function to create a reader for the files that it found, which is when the print function
was called.
In fact, Storm will call our reader function exactly once for every package that contains at least
one file with the extension .demo. If a package contains more than one file with the .demo
extension, they will all be passed in the array to the reader function. This makes it possible to
the language to assign special meaning to certain files if the language choses to do so. For
example, we might wish to make the file package.demo specify packages that are imported to all
other files, or contain special constructs that require it to be processed before the other files.
Since Storm does not make assumptions about how the language works, it simply gives the language a
list of all files of the specified file type that was found, and lets the language implement the
semantics.
Implementing a Reader
As we saw before, the reader function is expected to create a PkgReader class and return it to
Storm (we will simply refer to it as a Reader). Storm will then call the member functions in the
returned reader class to coordinate the loading process between different languages in order to
resolve dependencies between languages. As such, we are not supposed to read and parse the files
passed in the files array already in the reader function.
Since different files in our demo language can be parsed independently of each other, we can use
the FilePkgReader class provided by Storm. The FilePkgReader is an implementation of a
PkgReader that simply contains a list of FileReader classes, one for each file, and forwards
calls from the FilePkgReader to each individual FileReader. It also assumes that each file
contains text, and helps us with loading the contents of each file as well.
It is, of course, possible to skip this indirection, if desired. This does, however, mean that we need to load the contents of the files ourselves, and iterate through them when loading the contents of each file.
The first step is to create a class that inherits from lang:FileReader:
class DemoFileReader extends lang:FileReader { init(lang:FileInfo info) { init(info) {} } }
As we can see, the class receives a FileInfo object. This object contains, among other things, the
contents of the file as a string, the path of the file, and the package we should store the contents
of the file into. We simply pass it along to the FileReader constructor, which stores it as the
member variable info, that is accessible to the derived class.
We can then instruct the system to create our class by replacing the contents of the reader
function with the following:
lang:PkgReader reader(Url[] files, Package pkg) on Compiler { lang:FilePkgReader(files, pkg, (x) => DemoFileReader(x)); }
This line creates a FilePkgReader, and instructs it to create a DemoFileReader for each file in
the files array by passing it a lambda function that creates the appropriate type. The parameter
x is the FileInfo that the FilePkgReader creates for each of the files.
As mentioned before, Storm uses the created readers to coordinate the loading process between different language. It does this by calling the functions in the reader in the following order:
-
readSyntaxRules- Asks the reader to load any syntax rules in the file/package. -
readSyntaxProductions- Asks the reader to load any syntax productions in the file/package. -
readTypes- Asks the reader to load any types in the file/package. -
resolveTypes- Asks the reader to resolve any dependencies between types in the file/package. A typical example is to lookup the specified super-type of all types. Doing this earlier might fail, since the file that contains the referred type might not yet be loaded. -
readFunctions- Asks the reader to load any functions in the file/package. This is done after loading types, since it is necessary to look up parameter types to create function entities. -
resolveFunctions- Asks the reader to resolve any references between functions.
Since the demo language only defines functions, we only have to worry about readFunctions. As
such, we can override the function and add a call to the print function to see that our reader is
working as intended:
void readFunctions() : override { print("Should read functions from ${info.url}"); }
If done correctly, we should get the following output:
Should read functions from /home/filip/Projects/storm/language/test/demo.demo @/home/storm/language/test/test.bs(28-29): Syntax error: Can not find "f" with parameters ().
Parsing the Language
To actually define some functions with our language, we need to parse the text in the source files.
Storm itself lets the language choose how to do this, but it is usually a good idea to use the
Syntax Language that is already available. As such, we will write grammar in the syntax.bnf file
in the demo directory that we created earlier.
Delimiters
When writing grammar, it is usually a good idea to start by considering what constitutes whitespace in the language. In the Syntax Language, we express this by defining the delimiters at the top of the file to match our expectations. Note that it is convenient to make the comment syntax a part of the delimiters. That means that comments may appear at any place where whitespace may appar by default, and we do not have to think about comments in the rest of the grammar.
For our simple language we will only need the optional delimiter. Since we don't want the language
to be sensitive to whitespace, we consider any sequence of whitespace characters (space, newline,
linefeed, tab) to be a delimiter. For comments, we use the single-line C++-style comments (//). We
implement this as the rule SDelimiter with a single production as follows:
void SDelimiter(); SDelimiter : "[ \n\r\t]*" - (SComment - SDelimiter)?;
The production first matches zero or more whitespace character. After that, we optionally match a
SComment followed by another delimiter. The recursive nature of this rule means that we will
automatically match zero or more lines of comments, optionally separated by whitespace. Note that we
use the dash (-) as a separator in this rule. Since we will use SDelimiter as the comma
operator, it would be a bad idea to use the comma as a delimiter inside itself.
The SComment rule can be defined as follows:
void SComment(); SComment : "//[^\n]*\n";
Since the rule contains a single regular expression token it could be inlined directly into
SDelimiter. This structure is, however, more readable, and makes it easier to add other comment
styles in the future if desired.
Now that we have defined our delimiter syntax, we can tell the Syntax Language that we wish to use
it whenever an optional delimiter appears (i.e. ,):
optional delimiter = SDelimiter;
Since we have only defined the optional delimiter, we will not be able to use the required delimiter
(~) in the syntax file. This is fine in our case, since the language is simple enough to not need
it.
For languages that need a required delimiter we would define that in a similar way. The difference
is that we create a rule that always matches at least one whitespace. A first step towards creating
such a rule is to change the * to a + in the production for SDelimiter above. This would,
however, require at least one whitespace after and between a comments as well. This can be
solved by making the recursive step use SDelimiter. This is still not exactly what we might want,
since that rule would require a space before any comment. This can, however, be considered as close
enough, since comments rarely appear in places where whitespace is required, and when it is needed
it is often natural to add a space before the comment anyway.
Parsing the Language
After defining the delimiters, we are ready to define the syntax of our language. We start
prototyping the syntax, and worry about the syntax transforms later. In this example, we start at
the top-level and define the rule for the program as a whole. We call this rule SRoot since it
will be the root of the parse tree. Since files in the demo language simply contains zero or more
function definitions, we can write it as follows:
void SRoot(); SRoot : , (SFunction, )*;
The production might look a bit strange since the commas seems to be mismatched. However, recall
that we defined the optional delimiter (,) to match the SDelimiter rule. This means that we can
also write the production as:
SRoot : SDelimiter - (SFunction - SDelimiter)*;
That is, the rule matches zero or more SFunctions, surrounded by SDelimiters. The SDelimiter
at the start and end of the root rule are important since we want to allow arbitrary whitespace to
appear at the start and end of the source file. Not doing that would most likely confuse the users
of our language.
Next up is the rule for function definitions, SFunction. A function consists of an identifier that
specifies the name of the function, a list of zero or more parameters, and an expression that
defines the body of the function. We can write this as follows in the Syntax Language:
void SFunction(); SFunction : SIdentifier, "(", SParamList, ")", "=", SExpr;
As you see, we refer to other rules for the function's name and the parameter list. Using the rule
SIdentifier for the name is convenient since it allows us to define the allowed names in one
place, so that it is easier to modify it in the future. It also helps readability. The SParamList
for matching the list of parameters is for readability, but also to make it easier to implement the
repetition as we shall see.
The SIdentifier rule can simply be implemented like this:
void SIdentifier(); SIdentifier : "[A-Za-z]+";
The SParamList is perhaps a bit less obvious at first. The goal is to match a list of
comma-separated identifiers. This requires using multiple productions. The easiest way is perhaps to
have a special case for when the list is empty, and use the repetition syntax for the general case.
It is explained in further detail in the grammar tutorial.
void SParamList(); SParamList : ; // Zero parameters SParamList : SIdentifier - (, ",", SIdentifier)*; // One or more parameters
Now, the only thing that remains is the grammar for expressions. For this, we can more or less reuse the rules from the grammar tutorial:
void SExpr(); SExpr : SExpr, "\+", SProduct; SExpr : SExpr, "-", SProduct; SExpr : SProduct; void SProduct(); SProduct : SProduct, "\*", SAtom; SProduct : SProduct, "/", SAtom; SProduct : SAtom; void SAtom(); SAtom : "-?[0-9]+"; SAtom : "(", SExpr, ")";
To make our language a bit more interesting, we add the following productions to SAtom to support
variables and function calls as well:
// Variable access: SAtom : SIdentifier; // Function call: SAtom : SIdentifier, "(", SActuals, ")"; void SActuals(); SActuals : ; SActuals : SExpr - (, ",", SExpr)*;
As you have probably noticed, the SActuals follows the same structure as SParamList in order to
match comma-separated parameters.
Now that we have the grammar, we just need to use it to parse the text in our reader. We can do this
by replacing the print call with the following code in the readFunctions function:
void readFunctions() : override { Parser<SRoot> parser; parser.parse(info.contents, info.url, info.start); if (parser.hasError) throw parser.error(); }
The three parameters to parse are as follows: the first one is the string to parse. The
FileReader has read the file for us and placed it in info.contents. The second parameter is the
path of the file. Again, we just read it from info.url. The third parameter is an iterator into
the string that specifies where to start. For our purposes this is not important, since the iterator
will always be to the start of the string. However, if we wish to parse the file in multiple steps
(to support extensible syntax), we need to chain file readers. For this to work properly, it must be
possible to specify that a secondary file reader should start partway through the file. Therefore it
is a good idea to pass the iterator in info.start to avoid future confusion.
If you have done everything correctly, you will receive the same error as before. However, we can
verify that we are actually parsing the source file by introducing an error in the demo.demo file.
For example, replacing the first line with f() = a 1 produces a parse error as we would expect.
When you have verified the behavior, remember fix any errors you have introduced before continuing the tutorial in order to avoid unnecessary troubleshooting down the line!
Extracting Information from the Parse Tree
The next step is to actually use parse tree to create functions. To do this, we need to add
transform functions to the syntax we just created. We can do this one step at a time and test it
along the way. The first step is to extract the functions in the file. To do this, we first create a
class that will represent our functions. We add it to the file demo.bs in the demo directory:
class DemoFunction extends Function { Str[] paramNames; init(SrcPos pos, Str name, Str[] paramNames) { Value[] valParams; for (x in paramNames) valParams << named{Int}; init(pos, named{Int}, name, valParams) { paramNames = paramNames; } } }
Since we used the named{} syntax, we also need to add use lang:bs:macro; to the top of the file.
The class DemoFunction inherits from the Function class in Storm. This class is Storm's generic
representation of functions in the name tree. This means that we will be able to add the created
instances of DemoFunction to the name tree, and make it possible for other languages to find them
later on. This is how Basic Storm is able to find our created functions later on.
The constructor accepts three parameters. A SrcPos that contains the position of the function
definition, a Str that contains the name of the function, and an array of strings that contain the
parameter names.
In the constructor we then forward these parameters to the constructor of the parent class with some
minor modifications. As we can see, the constructor of the Function class accepts four parameters:
the position of the definition, the return type of the function, the name of the function, and the
types of the parameters to the function. This means that we can simply forward the pos and name
parameters. The paramNames need some care, however. The Function class only requires an array of
the type of all parameters, but we receive only the names of the parameters. Since all variables
in the demo language are integers, we can simply create an array with the correct number of
references to the Int type. We do, however, save the names of the parameters for later. Finally,
the return type is always Int since we always work with integers.
Now, we can turn to our grammar. The goal is to create a DemoFunction class from the grammar. To
determine what we need, we can start by modifying the SFunction rule:
DemoFunction SFunction(); SFunction => DemoFunction(pos, name, params) : SIdentifier name, "(", SParamList params, ")", "=", SExpr;
Since we assume that the SIdentifier and SParamList rules return something useful, we also need
to modify them. The SIdentifier is quite easy to fix, we just need to capture the regular
expression and return that:
Str SIdentifier(); SIdentifier => x : "[A-Za-z]+" x;
The SParamList requires a bit more thought, as we want it to return an array of the parameters. If
we would have managed to match all identifiers in the same repeat syntax, we could simply capture
that token and return it since the Syntax Language would turn it into an array for us. This is,
however, not the case. Fortunately, we can use the arrow (->) syntax to have the Syntax Language
call push on an array for us to create the array we are after:
Array<Str> SParamList(); SParamList => Array<Str>() : ; SParamList => Array<Str>() : SIdentifier -> push - (, ",", SIdentifier -> push)*;
Finally, we also need to modify the SRoot production to create an array of all function
definitions in the file. We can do this using a similar approach to above, or by simply capturing
and returning the SFunction token:
Array<DemoFunction> SRoot(); SRoot => Array<DemoFunction>() : , (SFunction -> push, )*;
or
Array<DemoFunction> SRoot(); SRoot => functions : , (SFunction functions, )*;
Now that we are done with the transform functions in the grammar, we just need to call the transform
functions and add the created DemoFunction instances to the name tree. We can do this by updating
the readFunctions function as follows:
void readFunctions() : override { Parser<SRoot> parser; parser.parse(info.contents, info.url, info.start); if (parser.hasError) throw parser.error(); DemoFunction[] functions = parser.tree().transform(); for (f in functions) info.pkg.add(f); }
If done correctly, this will cause the previous error to disappear. However, it is replaced by a crash. The reason for the crash is that Basic Storm found the function and attempted to call it. However, since we did not add any code to our function, the system crashes when we try to call the function (the function points to the address 0, which causes the crash).
Setting Up Code Generation
To avoid the crash, we need to add some code to our function. We do this by creating an instance to
one of the subclasses to the core.lang.Code and adding it to the function. In
particular, we embrace Storm's lazy-loading strategy by using the subclass
core.lang.LazyCode that implements this behavior for us by calling a callback function
whenever the function is called the first time. We can do this by adding the following line to the
end of the constructor in DemoFunction:
setCode(LazyCode(&this.createCode));
The expression &this.createCode creates a pointer to the member createCode in the current
instance of the class. As such, we need to define the function createCode as well. For now, we
just throw an exception to avoid the previous crash:
private CodeGen createCode() { throw InternalError("TODO: Generate code for ${name}."); }
At this point, if we run the program, we will see that the message "TODO: Generate code for f."
appears along with a stack trace. This means that our implementation works as expected! To verify
the lazy loading, we can surround the entire body of the main function in an if (false)
statement:
if (false) { Int result = f(); print("Result: ${result}"); }
Since we never call f above, the system will never try to generate code for it. Therefore we
don't see the exception in this case. Basic Storm will, however, still look up the function f in
order to type-check the body of the if statement, so the absence of the exception does not mean
that f was never created.
As before, remember to remove the if (false) you added to avoid spending time looking for problems
later on.
Generating Code
The next step is to actually generate some code for the function. This might seem like a difficult task at first. However, given a good structure, it is possible to break the problem into small enough pieces so that each piece is trivial to implement, and the complexity of the problem is handled by the structure.
We will implement a simplified version of the structure used in Basic Storm for our demo language.
This structure is what is typically called an Abstract Syntax Tree (AST). As in Basic Storm, we base
the AST around a class we call Expr. The Expr class will be an abstract class that represents
some expression in our language. We implement the class as follows in the file ast.bs in the
demo directory:
use core:asm; use core:lang; use lang:bs:macro; class Expr on Compiler { SrcPos pos; init(SrcPos pos) { init { pos = pos; } } void code(CodeGen to) : abstract; }
As you can see, the Expr class is very small. It only contains a SrcPos that stores the location
in the source code where the expression appears. This is not necessary for code generation. However,
having a source location for all nodes in the AST makes it much easier to produce good error
messages.
The important function in the Expr class is the function code. When called, this function should
generate code to evaluate the expression that the class represents, and store it in the listing in
the CodeGen object passed as the parameter. One important question is, however, yet unanswered.
Where is the result of evaluating the node stored? As we saw in previous
tutorials, Basic Storm uses the CodeResult object to specify the location of
the result. The code function here does not, so where is the result stored? To make the
implementation simpler, we once again utilize the fact that we only work with integers. This means
that we can simply decide that the code function always stores the result of the evaluation in the
eax register.
It is possible to generalize this scheme a bit to support all numeric types as well as class- and
actor types. We can observe that all of these types can be stored in a single register (numeric
types fit in a register, and classes and actors are handled by pointer). This means that such a
language can simply place results in the a register, resized to the type being created (i.e.
registers al, eax, rax, or ptrA depending on the size). It is, however, not possible to use
this scheme to support value types, since they do not always fit in registers. This is why Basic
Storm uses CodeResult objects instead.
To illustrate this idea, let's start by implementing an AST node for literals. It can be implemented as follows:
class Literal extends Expr { Int value; init(SrcPos pos, Str value) { init(pos) { value = value.toInt(); } } void code(CodeGen to) : override { to.l << mov(eax, intConst(value)); } }
The interesting part of the Literal class above is the implementation of the code function.
Remember that we specified that the code function should evaluate the node and place the result in
the eax register. Evaluating a literal is trivial: literals evaluate to themselves. This means
that the only thing that remains is to store the result in the eax register. We do this by adding
a mov instruction to the listing in the CodeGen object.
Since Storm is lazily compiled, we can test the implementation of our literal without implementing the semantics for the rest of the language. The lazy compilation means that Storm only attempts to compile and type-check the code that is actually used. As such, it does not matter that most of the semantics are missing, and would cause type errors as long as we don't use that part of the semantics!
So, let's do the minimal amount of work needed so that we can test our implementation, and
understand how it works before attempting to implement the rest of the semantics of the language. We
can do this by starting from the DemoFunction class and let the error messages by Storm direct us
to what needs to be done next. Let's start by modifying the constructor to accept and store a
SExpr parameter for the body:
class DemoFunction extends Function { Str[] paramNames; SExpr body; init(SrcPos pos, Str name, Str[] paramNames, SExpr body) { Value[] valParams; for (x in paramNames) valParams << named{Int}; init(pos, named{Int}, name, valParams) { paramNames = paramNames; body = body; } setCode(LazyCode(&this.createCode)); } // ... }
Note that we store the body as SExpr rather than Expr. The type SExpr refers to the type
created by the Syntax Language to create a parse tree starting in the rule SExpr. This means that
we wish to store the representation of the parse tree before it has been transformed into nodes of
our choice. The reason for this is that we wish to do as little work as possible until we are
actually asked to generate the machine code for the function. Currently, it would not matter much.
However, when we add function calls we will try to look up names during the syntax transforms. If we
do that too early, the lookups might fail because the function we try to find has not been added
yet.
The next step is to implement the createCode function to generate code. Since functions in the
demo language are supposed to evaluate and return the associated expression, we can simply call
the code function of the root node in the AST to do the heavy lifting for us. We get the root node
by calling transform on the body variable we just saved. What remains is to create a CodeGen
object, emit a function prolog, and to make sure that the function returns the right result. The
last step requires us to remember that we decided that code functions store the result of
evaluating the expression in the eax register:
private CodeGen createCode() { // Create a CodeGen instance. All parameters are members // in the Function class. CodeGen gen(runOn, isMember, result); // Emit the function prolog. gen.l << prolog(); // Transform the body into AST nodes, and generate code // for the expression. Expr e = body.transform(); e.code(gen); // Since the result of the evaluation is stored in 'eax' // by our convention, we can just generate code to return // the value in 'eax' here. gen.l << fnRet(eax); // Return the generated code. The LazyCode will then // compile the generated code into machine code and // execute it. return gen; }
Note that we also need to add use core:asm; to the top of the file. Otherwise names such as eax
will not be visible.
If we run the test program at this point, we will get the following error:
@/home/storm/language/demo/syntax.bnf(249-355): Syntax error: Can not find "DemoFunction" with parameters (core.lang.SrcPos&, core.Str&, core.Array(core.Str)&).
This error is because we now need to pass the body to the constructor of DemoFunction. Since we
wish to capture the original parse tree here, and not transform it first, we need to add the @
character when capturing the body. This looks like the code below:
DemoFunction SFunction(); SFunction => DemoFunction(pos, name, params, body) : SIdentifier name, "(", SParamList params, ")", "=", SExpr @body;
If we run the program again, we get a different error:
@/home/storm/language/demo/demo.bs(1039-1040): Syntax error: No appropriate constructor for lang.demo.Expr found. Can not initialize 'e'. Expected signature: __init(lang.demo.Expr, void)
This error is from the line where we assign the result of body.transform() to the variable e.
The reason is that the rule SExpr returns void, and not Expr as we expected. We fix this by
changing the return type of the SExpr rule:
Expr SExpr();
This causes yet another error to appear:
@/home/storm/language/demo/syntax.bnf(638-654): Syntax error: No return value specified for a production that does not return 'void'.
In this case, it is caused by the production SExpr : SProduct not returning anything. We can fix
the error by adding suitable transforms to forward the result:
SExpr => x : SProduct x;
If we continue to run the program and fix errors like above, we will see that we also need to change
the result of SProduct and SAtom in the same way, and that we need to modify the productions
SProduct : SAtom and SAtom : "-?[0-9]+"; as follows:
SExpr SProduct(); // ... SProduct => x : SAtom x; // ... Expr SAtom(); SAtom => Literal(pos, num) : "-?[0-9]+" num;
After these modifications, the program finally runs and produces the output:
Result: 1
This is exactly what we would expect, and it means that our new language can successfully compile and run very simple functions! A great first step!
Implementing Operators
Now that our implementation has progressed far enough to compile simple functions, let's implement
operators as well. This will show the strength of our assumptions regarding the code function,
since this will make it fairly straight-forward to implement the operators correctly, regardless of
the complexity of the expressions that are being evaluated.
To start, let's update our test program in the demo.demo file in the test directory as follows:
f() = 10 + 4
If we try to run the example now, we will get an error similar to before that says that the
production for + does not return anything, even though it should return a non-void value. We can
fix that by updating the production as follows:
SExpr => OpAdd(pos, l, r) : SExpr l, "\+", SExpr r;
Of course, we must also define the class OpAdd as well. The first part of the class is
straight-forward, we simply store the expressions that correspond to the left- and right-hand sides
of the operator:
class OpAdd extends Expr { private Expr lhs; private Expr rhs; init(SrcPos pos, Expr lhs, Expr rhs) { init(pos) { lhs = lhs; rhs = rhs; } } }
The code function is, however, at first a bit trickier. In order to evaluate the + operator, we
must first evaluate the left- and right hand sides in some order. Then we can add the results of
these expressions together.
At this time, we need to remember the convention we established: that the code function always
places the result in the eax register. This allows us to implement + using these steps:
-
Evaluate
rhs. The result is stored ineax. -
Store
eaxin a temporary variable, so that its value is not overwritten in the next step. -
Evaluate
lhs. The result is stored ineax. -
Add the value in the temporary variable to
eax. The result is now ineax.
We can implement this strategy as follows:
void code(CodeGen to) : override { // Create a temporary variable for step 2. Var tmp = to.l.createIntVar(to.block); // Step 1: rhs.code(to); // Step 2: to.l << mov(tmp, eax); // Step 3: lhs.code(to); // Step 4: to.l << add(eax, tmp); }
If we run the program now, it produces the correct result: 14. It seems like our + operator is
working!
One question that might arise at this point is: Why do we need to create a temporary variable? Is it
not enough just store the temporary variable in another register, like ebx or ecx? It is indeed
true that this would be enough in the case where only a single + appears in the expression.
However, consider what would happen for the expression 1 + 2 + 3 + 4 for example. Since all +
operators attempt to store their temporary inside the ebx register, they will overwrite each
other's stored values, and produce the wrong result. You can see this effect by replacing the line
Var tmp = ... with var tmp = ebx; and printing the gen variable in the createCode function
in the DemoFunction class.
We can now implement the remaining operators in the same way. This will allow us to evaluate
arbitrary expressions, for example f() = (10 + 4) / 2. This example also highlights that we need
to add a transform function to the production for parentheses as well. All in all, the expression
grammar should look like below after these modifications:
Expr SExpr(); SExpr => OpAdd(pos, l, r) : SExpr l, "\+", SProduct r; SExpr => OpSub(pos, l, r) : SExpr l, "-", SProduct r; SExpr => x : SProduct x; Expr SProduct(); SProduct => OpMul(pos, l, r) : SProduct l, "\*", SAtom r; SProduct => OpDiv(pos, l, r) : SProduct l, "/", SAtom r; SProduct => x : SAtom x; Expr SAtom(); SAtom => Literal(pos, num) : "-?[0-9]+" num; SAtom => x : "(", SExpr x, ")";
Also, the classes that implement the operators look like below. Note that only the name and the last
instruction in the code function differ:
class OpAdd extends Expr { private Expr lhs; private Expr rhs; init(SrcPos pos, Expr lhs, Expr rhs) { init(pos) { lhs = lhs; rhs = rhs; } } void code(CodeGen to) : override { Var tmp = to.l.createIntVar(to.block); rhs.code(to); to.l << mov(tmp, eax); lhs.code(to); to.l << add(eax, tmp); } } class OpSub extends Expr { private Expr lhs; private Expr rhs; init(SrcPos pos, Expr lhs, Expr rhs) { init(pos) { lhs = lhs; rhs = rhs; } } void code(CodeGen to) : override { Var tmp = to.l.createIntVar(to.block); rhs.code(to); to.l << mov(tmp, eax); lhs.code(to); to.l << sub(eax, tmp); } } class OpMul extends Expr { private Expr lhs; private Expr rhs; init(SrcPos pos, Expr lhs, Expr rhs) { init(pos) { lhs = lhs; rhs = rhs; } } void code(CodeGen to) : override { Var tmp = to.l.createIntVar(to.block); rhs.code(to); to.l << mov(tmp, eax); lhs.code(to); to.l << mul(eax, tmp); } } class OpDiv extends Expr { private Expr lhs; private Expr rhs; init(SrcPos pos, Expr lhs, Expr rhs) { init(pos) { lhs = lhs; rhs = rhs; } } void code(CodeGen to) : override { Var tmp = to.l.createIntVar(to.block); rhs.code(to); to.l << mov(tmp, eax); lhs.code(to); to.l << idiv(eax, tmp); } }
Variables
Now that we can evaluate expressions properly, let's make the language more useful by adding variables. Since there are no assignments in the language, the only way to define a variable is to specify it as a parameter to a function.
Under these constraints, it would be possible to implement variables trivially using a simple hash table. This tutorial will, however, illustrate a slightly more complex implementation that is able to accommodate more advanced languages as well. In particular, the implementation here will utilize the name lookup mechanisms in Storm, so that it is possible to look up variables and functions using an uniform interface. The structure also allows implementing things like blocks that introduce separate namespaces for example. All in all, the structure is a simplified version of the structure used in Basic Storm.
As in the previous parts of the tutorial, we start by updating our test file. To test parameters and
variables, let's add a new test to the demo.demo file, so that it looks like below:
f() = (10 + 4) / 2 f(a, b) = a + b + 1
If we run the program now, it will actually still work. This is due to the lazy compilation in
Storm. The file will parse successfully since we added a production that matches variables. This
means that the code parses correctly, even if we have not yet implemented support for variables.
Since we never call f that accepts two parameters, we will never try to compile it, and we thus
never discover the error. As mentioned here, we can
force Storm to compile all of our code eagerly to see the error. This does, however, mean that we
have to implement function calls as well before being able to test our implementation of variables.
The easiest way to test our implementation is to modify the main function in the test.bs file to
call the new function:
void main() { Int result = f(1, 2); print("Result: ${result}"); }
If we run the program at this stage, we get an error as we would expect:
@/home/storm/language/demo/syntax.bnf(962-981): Syntax error: No return value specified for a production that does not return 'void'.
As before, we solve this error by adding a transform function to the production. This time we use it
to create a node called VarAccess:
SAtom => VarAccess(pos, name) : SIdentifier name;
We also need to define the VarAccess class in the ast.bs file as before:
class VarAccess extends Expr { init(SrcPos pos, Str name) { init(pos) {} } void code(CodeGen to) : override { // ? } }
This is enough to allow us to run the program, but not enough to produce the correct result (I get
the number 3 when running it, but it might differ on your machine or even crash). The big question
that remains is: how do we know which variable to read in the code function?
To answer that question, let's start at the root of the AST, when we start code generation in the
createCode function inside the DemoFunction class. At the moment we completely ignore that the
function receives parameters. This means that we actually generate code for a function that accepts
0 parameters, even though we pass two integers when we call it. The first step is therefore to tell
the Listing object that the function accepts parameters, so that it can generate code that
receives the parameters properly. We do this by calling the createParam function for each
parameter, and giving it a TypeDesc (type description) that contains information about the type
that should be received. We can easily get a TypeDesc from a Type by calling its desc()
member:
private CodeGen createCode() { CodeGen gen(runOn, isMember, result); gen.l << prolog(); for (name in paramNames) { Var param = gen.l.createParam(named{Int}.desc()); } Expr e = body.transform(); e.code(gen); gen.l << fnRet(eax); return gen; }
As we can see from the code, the createParam function returns a Var object. This means that we
can treat parameters in the Intermediate Language just as we treat normal variables. This means that
as long as we manage to get the Var values from the createCode function to the VarAccess
class, we can easily generate code to load the variable into the eax register. The question that
remains is therefore: how do we allow the VarAccess class to access the parameters?
As mentioned above, we will solve name lookups using the facilities provided by Storm. As such, the
first step is to create a named entity for storing our variables. We can inherit from the
core.lang.Variable class in Storm to simplify the implementation a bit:
class DemoVar extends Variable { Var codeVar; init(SrcPos pos, Str name, Var codeVar) { init(pos, name, named{Int}) { codeVar = codeVar; } } }
As we can see, this class simply stores a Var instance that corresponds to the variable we
received from the createParam function. The base class Variable also gives it a result member
that stores the type of the variable (this is why we pass named{Int} to the parent constructor).
Furthermore, since Variable inherits from Named, it also has a name and a source position. The
inheritance from Named also means that it can be a part of the name tree in Storm, and that we can
search for it using the standard lookup mechanisms.
Before we can use the DemoVar class we just created, we need somewhere to store them. Since the
variables are local to the function, we don't want them to be accessible from the root of the name
tree. We do, however, wish to have a node in the name tree with a parent pointer to the right point
in the name tree, so that we can lookup functions and other variables later on. For this reason, we
create a class that inherits from core.lang.NameLookup. The NameLookup class is the
super class of core.lang.Named, and represents a node in the name tree where it is
possible to look up names, and that has a parent pointer to some location in the name tree. It does,
however, not have a name and it is therefore not possible to store a NameLookup in the name tree,
or to find them by name. This is exactly what we want. We start the implementation as follows:
class DemoLookup extends NameLookup { init(NameLookup parent) { init(parent) {} } }
Since we wish to be able to look up names in the name tree from the NameLookup later on, we let
the constructor accepts a parameter that specifies the parent node in the name tree. This also lets
us link multiple DemoLookup together to represent nested namespaces inside functions in our
language if we wish.
The next step is to add somewhere to store variables inside the DemoLookup. Since variables do not
have any parameters, we can simply use a Map<Str, DemoVar> (also spelled Str->DemoVar). We also
add a function that makes it convenient to add variables:
class DemoLookup extends NameLookup { Str->DemoVar variables; init(NameLookup parent) { init(parent) {} } void addVar(SrcPos pos, Str name, Var v) { DemoVar var(pos, name, v); var.parentLookup = this; variables.put(name, var); } }
One thing is worth noting in the code above. In the addVar function we explicitly set
parentLookup of the created variable to this. This connects the variable to the current block,
so that it is possible to start in the variable and traverse the name tree towards the root. We do
not need this ability in our language, but for some types of named entities (e.g. functions), it is
necessary to attach them to the name tree before using them. The function add in the NameSet
class sets the parentLookup internally, so it is only necessary to set parentLookup manually when
working with NameLookups like in this example.
We can now go back to the createCode function and store the parameters inside a DemoLookup:
private CodeGen createCode() { CodeGen gen(runOn, isMember, result); gen.l << prolog(); DemoLookup lookup(this); for (name in paramNames) { Var param = gen.l.createParam(named{Int}.desc()); lookup.addVar(pos, name, param); } Expr e = body.transform(); e.code(gen); gen.l << fnRet(eax); return gen; }
Now that we have stored the parameters somewhere, we just need to pass them to the VarLookup class
through the grammar. We do this by encapsulating the DemoLookup inside a
core.lang.Scope class. The Scope class can be thought of as a pointer to a location in
the name tree, together with a strategy for looking up names based on that location. For example,
when asked to look up the name Str in the context of the package lang.demo, the Scope used in
Basic Storm will first look in the package lang.demo, then in the root package, and after that in
the core package (the strategy is customizable by providing a ScopeLookup class, but we will not
need to do that in this tutorial).
To give the name tree access to the current scope, we modify the call to transform into the
following:
Expr e = body.transform(Scope(lookup));
We must also change the syntax to accept the new parameter and pass it along to the VarAccess
class. We first add use core.lang; to the syntax.bnf file so we do not have to write
core.lang.Scope every time we wish to refer to the Scope class. After that, we add a parameter
scope with the type Scope to the SExpr, SProduct, and SAtom rules, and modify the usages
of these rules to pass the scope parameter along. After the changes, the productions should look
like this:
use core.lang; // ... Expr SExpr(Scope scope); SExpr => OpAdd(pos, l, r) : SExpr(scope) l, "\+", SProduct(scope) r; SExpr => OpSub(pos, l, r) : SExpr(scope) l, "-", SProduct(scope) r; SExpr => x : SProduct(scope) x; Expr SProduct(Scope scope); SProduct => OpMul(pos, l, r) : SProduct(scope) l, "\*", SAtom(scope) r; SProduct => OpDiv(pos, l, r) : SProduct(scope) l, "/", SAtom(scope) r; SProduct => x : SAtom(scope) x; Expr SAtom(Scope scope); SAtom => Literal(pos, num) : "-?[0-9]+" num; SAtom => x : "(", SExpr(scope) x, ")"; SAtom => VarAccess(pos, name, scope) : SIdentifier name;
Now, we can finally return to our VarAccess implementation. We start by modifying the constructor
to accept the new Scope parameter. It can then use the new parameter to try to look up the
variable as follows:
init(SrcPos pos, Str name, Scope scope) { init(pos) {} if (x = scope.find(SimpleName(name)) as DemoVar) { print("Found ${x.name}!"); } else { throw SyntaxError(pos, "Unknown variable: ${name}"); } }
As we can see, the Scope class has a find function that accepts a single parameter that contains
the name that it should attempt to find. Since we have no unresolved parameters, we can simply
create a core.lang.SimpleName that consists of a single part that contains the name we
are looking for. The function returns a Maybe<Named> value that is null if no name was found.
Since we are looking for DemoVar instances, we cast it to a DemoVar using the as keyword. This
check is also able to check for null at the same time. For now, we simply print a message if we
found a matching variable, and throw a syntax error on failure.
If we run the code like this, we will (perhaps surprisingly) reach the error case:
@/home/storm/language/test/demo.demo(29-30): Syntax error: Unknown variable: a
To understand why the error occurs, we need to know a bit more about how the find function works
internally. On a high level, the find function iterates through the individual parts in the
supplied name. It attempts to find the first part in the NameLookup stored in the Scope. The
next part is resolved in the context of the entity found by the first, and so on. To find a single
part in a NameLookup class, the Scope calls the find function in the NameLookup. If it fails
by returning null, the name lookup is considered a failure, and the Scope may try again from a
different starting point (e.g. the parent to the current NameLookup).
Knowing this strategy allows us to find the problem in our implementation. While we store DemoVar
entities in our DemoLookup, we have not yet implemented the find function that lets the Scope
find the variables. As such, we need to take a slight detour and override the find function in our
DemoLookup class:
Named? find(SimplePart part, Scope scope) : override { if (part.params.empty) { if (found = variables.at(part.name)) { return found; } } return null; }
The function receives two parameters. The first one is the part of the name that the Scope wishes
to find. The second parameter is the scope itself. It can be used to inspect the context in which
the name lookup occurred, so that we can determine which variables are visible from that context
(e.g. for private variables). Since we are working with local variables that are not reachable from
anywhere outside the function, we can safely ignore the scope parameter in our implementation.
The find function starts by checking if the part contains any parameters. If it does, we know
that none of our variables will match, since variables do not take parameters by definition. If the
part had no parameters, we can proceed to look up the variable in the hash table, and return it if
we found a match.
This is enough for our needs, and if we try to run our program again, we can see that the error
message is replaced by the messages Found a! and Found b! This means that our name lookup works
as expected. The only thing that remains is to use the found entity to retrive the value of the
variable!
We add a variable in the VarAccess class that stores the DemoVar we found in the constructor.
Note, however, that due to how the unless statement is implemented in Basic Storm, it is not
possible to use it before init or super blocks. This means that we need to rely on the fact that
if statements are expressions when we initialize the DemoVar variable (it is possible to write a
helper function as well, but where is the fun in that?). Once we have stored the DemoVar in the
class, we can use it in the code function to load the codeVar into the register eax:
class VarAccess extends Expr { DemoVar toRead; init(SrcPos pos, Str name, Scope scope) { init(pos) { toRead = if (x = scope.find(SimpleName(name)) as DemoVar) { x; } else { throw SyntaxError(pos, "Unknown variable: ${name}"); }; } } void code(CodeGen to) : override { to.l << mov(eax, toRead.codeVar); } }
And with those changes, the implementation of variables is complete. If we run the program now, it
will print Result: 4 as expected.
Function Calls
After implementing variables, the last thing that remains is to implement function calls. Luckily,
it turns out that we can re-use much of the work we did to implement variables when implementing
functions. In particular, we can use the find function in the Scope class to look for functions
as well.
Before we get to the implementation, let's update our demo.demo file with some new tests:
f() = (10 + 4) / 2 f(a, b) = a + b + 1 g(a, b) = f(a, a) * b
We also change the main function in test.bs as follows:
void main() { print("f(1, 2) = ${f(1, 2)}"); print("g(4, 2) = ${g(4, 2)}"); }
If we run the program at this point, we get the familiar error message below that indicates that we are trying to transform a production that we have not yet specified a transform for:
@/home/storm/language/demo/syntax.bnf(1145-1184): Syntax error: No return value specified for a production that does not return 'void'.
To fix the issue, we need to add transforms to the function call production. Since it uses the
SActuals rule, we also need to update that production to include transforms and the Scope
parameter:
SAtom => FnCall(pos, name, params, scope) : SIdentifier name, "(", SActuals(scope) params, ")"; Array<Expr> SActuals(Scope scope); SActuals => Array<Expr>() : ; SActuals => Array<Expr>() : SExpr(scope) -> push - (, ",", SExpr(scope) -> push)*;
Next, we need to implement the FnCall node in the file ast.bs. We start with the basic
structure:
class FnCall extends Expr { private Expr[] params; init(SrcPos pos, Str name, Expr[] params, Scope scope) { init(pos) { params = params; } } void code(CodeGen to) : override { // ? } }
As was the case for variables, it is now possible to run the test. It will run and produce some answer (4 on my system, but it may differ on your system). This is nice as it lets us experiment a bit easier.
The implementation in the FnCall class will be similar to the VarAccess class. In the
constructor we will use the scope to find the function to call. We store it as a variable in the
class so that we can use it in the code function. The difference is that we need to create a name
that has the right number of integer parameters. Otherwise we will not find the function we are
looking for. Since we will be able to find functions in other languages in Storm, it is also a good
idea to verify the return type of the function we found.
We start by creating a name to look for in the constructor:
init(SrcPos pos, Str name, Expr[] params, Scope scope) { SimplePart part(name); for (x in params) part.params.push(named{Int}); SimpleName lookFor(part); // For debugging! print(lookFor.toS); init(pos) { params = params; } }
We start by creating a SimplePart that only contains the name of the function we are looking for.
We then add one integer parameter for each parameter that was passed to the function. The reason we
do not have to inspect the expressions in the params array is again since the demo language only
supports integers. This means that we know that all expressions evaluate to Int. If we add support
for other types, we would have to ask each parameter what type it evaluated to and add that type to
the part instead.
After we have created the part, we create a SimpleName out of the part. In our case, the name
only contains a single part since the demo language does not have any syntax to specify functions
that are located in other packages (e.g. tutorials.language.main()). Finally, we print the name to
see that we have created something that is sensible. If we run the program it will print
f(core.Int, core.Int) which looks exactly like what we would expect.
Since the created name looks right, we can try to look up the name using the scope. We add a new
variable toCall to store the function in, and like in the VarAccess class, we will initialize it
using an if statement in the init block:
class FnCall extends Expr { private Expr[] params; private Function toCall; init(SrcPos pos, Str name, Expr[] params, Scope scope) { SimplePart part(name); for (x in params) part.params.push(named{Int}); SimpleName lookFor(part); init(pos) { params = params; toCall = if (x = scope.find(lookFor) as Function) { if (!Value(named{Int}).mayStore(x.result)) throw SyntaxError(pos, "Functions used in the demo language must return Int!"); x; } else { throw SyntaxError(pos, "Unknown function: ${lookFor}"); }; } } // ... }
As can be seen above, the only difference from the VarAccess class is that we try to cast the
found element to a Function rather than a DemoVar, and that we also verify the result using the
mayStore function in the Value class. This function checks that the left hand side (the Int)
is a binary compatible with the right hand side (x.result). For value types, it would technically
be enough to check if the type is exactly Int. The mayStore function takes inheritance for
class- and actor-types as well, which is usually useful but not really necessary here.
Finally, the only thing that remains is to implement the code function. Here, we can take a
similar approach to the implementation of the code functions for the operators. We start by
evaluating all parameters to the function, and store them in temporary variables, and then call the
function. We can evaluate the parameters and store them in variables like this:
void code(CodeGen to) : override { Operand[] ops; for (x in params) { Var tmp = to.l.createIntVar(to.block); s.code(to); to.l << mov(tmp, eax); ops << tmp; } }
We then have two options to call the function. The most robust one is to utilize the autoCall
function inside the Function class. The autoCall function inspects the CodeGen object and
determines if a thread switch is necessary or not. Using it requires creating a CodeResult object
to let it know where it should place the result. This can be done as follows:
void code(CodeGen to) : override { Operand[] ops; for (x in params) { Var tmp = to.l.createIntVar(to.block); x.code(to); to.l << mov(tmp, eax); ops << tmp; } CodeResult result(named{Int}, to.block); toCall.autoCall(to, ops, result); to.l << mov(eax, result.location(to)); }
In the first new line, we create a CodeResult and specify that we want the result to be located in
a variable visible in the current block (to.block), and that the variable should store an integer.
Then, we call autoCall to ask it to generate code for the function call. Finally, we copy the
resultfrom the variable the CodeResult object created for it to the eax register to follow the
convention of the code function.
The other option is to emit fnParam and fnCall instructions directly. This works in cases where
we are sure that a thread switch is not necessary. We can not really assume this in our
implementation. Therefore, the following implementation is not recommended in this case. It is
only provided to illustrate how function calls in the intermediate language work:
for (x in ops) { to.l << fnParam(named{Int}.desc, x); } to.l << fnCall(toCall.ref(), toCall.isMember, named{Int}.desc, x);
With that, the implementation of function calls is complete. Running the tests now should produce the expected output:
f(1, 2) = 4 f(4, 2) = 18
Since we are using the generic Function interface for function calls, it is actually possible to
call functions implemented in Basic Storm from our demo language. To illustrate this, we can
create a new function in the file demo.demo:
h(a) = bsfn(f(a, a), 5)
We then update our main function in the file test.bs to call the new function. We also implement
bsfn there:
void main() { print("f(1, 2) = ${f(1, 2)}"); print("g(4, 2) = ${g(4, 2)}"); print("h(10) = ${h(10)}"); } Int bsfn(Int x, Int y) { print("In Basic Storm: ${x}, ${y}"); x + y; }
If we run the program now, we can see that it works as expected. The output from the print
statement in bsfn is visible as we would expect, and the correct result is produced:
f(1, 2) = 4 g(4, 2) = 18 In Basic Storm: 21, 5 h(10) = 26
Syntax Highlighting
Using the language server in Storm, we can easily
add syntax higlighting to our language. For this, we need two things: a function in our FileReader
that tells the language server which rule is the root in our grammar, and annotations in the
grammar.
We start by adding the required function to the FileReader. In the class DemoFileReader, we
simply need to add the following function:
lang:bnf:Rule rootRule() : override { named{SRoot}; }
This instructs the language server to start parsing demo files using the rule SRoot. Apart from
that, we just need to annotate the grammar to tell the language server which parts of the text to
highlight, and which color to use.
To allow the end-user to customize the appearance of the highlighted code, the colors for tokens are expressed in terms of what function the characters have in the source code. The following classes are available:
-
#comment- for comments -
#delimiter- for delimiters (e.g. punctuation) -
#string- for string literals -
#constant- for numeric literals (i.e. constant numbers) -
#keyword- for keywords -
#fnName- for function names -
#varName- for variable names -
#typeName- for names of types
Of course, Storm places no requirement on a language to highlight all parts of the text. Furthermore, some languages might make it difficult to distinguish between for example type- and variable names from the information available to the parser. In such cases, it might be possible to modify the grammar slightly to match conventions in the language. For example, it is possible to split a production that matches a generic identifier into two. One matches identifiers that start with an uppercase letter, and is highlighted as a type name. Another matches identifiers that start with a lowercase letter, and is highlighted as a function name. Since the rules can still contain the same syntax transforms, this does not impact the rest of the language.
With this in mind, we can then simply annotate our syntax by adding #-annotations after the
relevant tokens. This ends up looking like below:
Array<DemoFunction> SRoot(); SRoot => functions : , (SFunction functions, )*; DemoFunction SFunction(); SFunction => DemoFunction(pos, name, params, body) : SIdentifier name #fnName, "(", SParamList params, ")", "=", SExpr @body; Str SIdentifier(); SIdentifier => x : "[A-Za-z]+" x; Array<Str> SParamList(); SParamList => Array<Str>() : ; SParamList => Array<Str>() : SIdentifier -> push #varName - (, ",", SIdentifier -> push #varName)*; Expr SExpr(Scope scope); SExpr => OpAdd(pos, l, r) : SExpr(scope) l, "\+", SProduct(scope) r; SExpr => OpSub(pos, l, r) : SExpr(scope) l, "-", SProduct(scope) r; SExpr => p : SProduct(scope) p; Expr SProduct(Scope scope); SProduct => OpMul(pos, l, r) : SProduct(scope) l, "\*", SAtom(scope) r; SProduct => OpDiv(pos, l, r) : SProduct(scope) l, "/", SAtom(scope) r; SProduct => a : SAtom(scope) a; Expr SAtom(Scope scope); SAtom => Literal(pos, num) : "-?[0-9]+" num #constant; SAtom => VarAccess(pos, name, scope) : SIdentifier name #varName; SAtom => FnCall(pos, name, params, scope) : SIdentifier name #fnName, "(", SActuals(scope) params, ")"; SAtom => x : "(", SExpr(scope) x, ")"; Array<Expr> SActuals(Scope scope); SActuals => Array<Expr>() : ; SActuals => Array<Expr>() : SExpr(scope) -> push - (, ",", SExpr(scope) -> push)*;
In order to test the syntax highlighting, it is necessary to start the language
server. If you have developed the new language
directly in Storm's root/ directory, then you do not need to do anything else. If you developed it
separately, you need to make sure that the library is properly included in the Storm process used as
the language server. This is done by modifying the variable storm-mode-include in Emacs before
starting the language server. A good way to do this short-term is to simly set the variable to the
value we want using setq. This makes it easier to re-set the variable if we make a mistake:
(setq storm-mode-include '(("/path/to/demo" . "lang.demo")))
One way to evaluate the expression is to press M-x, type eval-expression, press Enter, and then
typing the expression above and finish by hitting Enter. Another way is to open the buffer
*scratch*, type the expression there, and then press C-M-x at the end of the expression.
Ideas for Extending the Language
The language created here provides a good starting point for further development. Below are some ideas for how to expand the language further, towards a more "real" language:
-
Assignments
Based on what is currently in the language, it is fairly straight-forward to add assignments (perhaps in the form of
letstatements) to the language. It is already possible to add new variables to theDemoLookupobject. The tricky part is that the variables must be added in the constructor of the AST node that creates them, but thecodeVarvariable can not be initialized until code is generated. Fortunately, theVartype has an empty constructor that creates an invalid variable value that can be used as a placeholder.It is also interesting to further extend the language by introducing blocks that delimit the scope of local variables. This can be done by representing the new scope as a new instance of the
DemoLookupclass. By setting theparentto the previousDemoLookupinstance, theScopewill actually traverse the scopes as we would expect by default. -
Extensible Syntax
Since the parser in Storm is built to support extensible languages, it is quite easy to use this functionality in our demo language. The goal is to call the function
addSyntaxon the parser to add new packages with syntax before we callparse. The question is how we manage to read which packages to include before we start parsing anything.The key to this problem is that it is possible to divide files into different sections by chaining
FileReaders. This allows us to treat a file in our language like this:use my.package --------------- f(a) = 10
Where the dashed line represents how we "split" the file into different parts. The two parts are essentially treated as different languages. The first one is only able to parse
usestatements, and the second only knows the function definition syntax. We also implement the first reader so that it does not produce an error if it is not able to reach the end of the file. This can be done by simply replacingparser.hasErrorwith!parser.hasTreein the code.In summary, the structure would look approximately like this:
lang:PkgReader reader(Url[] files, Package pkg) on Compiler { lang:filePkgReader(files, pkg, (x) => DemoIncludeReader(x)); } class DemoIncludeReader extends lang:FileReader { init(lang:FileInfo info) { init(info) {} } // We can provide 'readX' functions as well if we wish to. If we do, it is // a good idea to make sure that we only parse the text once, as parsing is // a fairly expensive operation. // Called by the system to request the "next" piece. lang:FileReader? next(lang:ReaderQuery query) : override { Parser<SIncludes> parser; parser.parse(info.contents, info.url, info.start); if (!parser.hasTree()) throw parser.error(); // Extract use-statements here. Name usedPkgs = []; // Create the "normal" reader. Ask it to start reading where we stopped. // We can pass any new parameters as well. return DemoFileReader(info.next(parser.matchEnd), usedPkgs); } } class DemoFileReader extends lang:FileReader { Name[] used; init(lang:FileInfo info, Name[] used) { init(info) { used = used; } } void readFunctions() : override { Parser<SRoot> parser; for (x in used) { // Resolve 'x' parser.addSyntax(resolved); } parser.parse(info.contents, info.url, info.start); // ... } }
-
Types
It requires some work to extend the language to other types, but it is doable. This requires adding a member to the
Exprclass that returns the type of each expression. This type information can then be used in the function call node to find a function that accepts the proper parameters. Since we can no longer guarantee that the result of evaluating a node fits in a register anymore, we need to transition to another scheme. One option is to let thecodefunction return aVarthat contains the value. This results in quite a number of copies of value types, but works. Copies can be eliminated by letting nodes work with references to value types, for example.
Full Source Code
For completeness, the full source produced in this tutorial is provided below.
syntax.bnf:
use core.lang; optional delimiter = SDelimiter; void SDelimiter(); SDelimiter : "[ \n\r\t]*" - (SComment - SDelimiter)?; void SComment(); SComment : "//[^\n]*\n" #comment; Array<DemoFunction> SRoot(); SRoot => functions : , (SFunction functions, )*; DemoFunction SFunction(); SFunction => DemoFunction(pos, name, params, body) : SIdentifier name #fnName, "(", SParamList params, ")", "=", SExpr @body; Str SIdentifier(); SIdentifier => x : "[A-Za-z]+" x; Array<Str> SParamList(); SParamList => Array<Str>() : ; SParamList => Array<Str>() : SIdentifier -> push #varName - (, ",", SIdentifier -> push #varName)*; Expr SExpr(Scope scope); SExpr => OpAdd(pos, l, r) : SExpr(scope) l, "\+", SProduct(scope) r; SExpr => OpSub(pos, l, r) : SExpr(scope) l, "-", SProduct(scope) r; SExpr => p : SProduct(scope) p; Expr SProduct(Scope scope); SProduct => OpMul(pos, l, r) : SProduct(scope) l, "\*", SAtom(scope) r; SProduct => OpDiv(pos, l, r) : SProduct(scope) l, "/", SAtom(scope) r; SProduct => a : SAtom(scope) a; Expr SAtom(Scope scope); SAtom => Literal(pos, num) : "-?[0-9]+" num #constant; SAtom => VarAccess(pos, name, scope) : SIdentifier name #varName; SAtom => FnCall(pos, name, params, scope) : SIdentifier name #fnName, "(", SActuals(scope) params, ")"; SAtom => x : "(", SExpr(scope) x, ")"; Array<Expr> SActuals(Scope scope); SActuals => Array<Expr>() : ; SActuals => Array<Expr>() : SExpr(scope) -> push - (, ",", SExpr(scope) -> push)*;
demo.bs:
use core:io; use core:lang; use lang:bs:macro; use core:asm; lang:PkgReader reader(Url[] files, Package pkg) on Compiler { lang:FilePkgReader(files, pkg, (x) => DemoFileReader(x)); } class DemoFileReader extends lang:FileReader { init(lang:FileInfo info) { init(info) {} } void readFunctions() : override { Parser<SRoot> parser; parser.parse(info.contents, info.url, info.start); if (parser.hasError) throw parser.error(); DemoFunction[] functions = parser.tree().transform(); for (f in functions) info.pkg.add(f); } } class DemoFunction extends Function { Str[] paramNames; SExpr body; init(SrcPos pos, Str name, Str[] paramNames, SExpr body) { Value[] valParams; for (x in paramNames) valParams << named{Int}; init(pos, named{Int}, name, valParams) { paramNames = paramNames; body = body; } setCode(LazyCode(&this.createCode)); } private CodeGen createCode() { CodeGen gen(runOn, isMember, result); gen.l << prolog(); DemoLookup lookup(this); for (name in paramNames) { Var param = gen.l.createParam(named{Int}.desc()); lookup.addVar(pos, name, param); } Expr e = body.transform(Scope(lookup)); e.code(gen); gen.l << fnRet(eax); return gen; } } class DemoVar extends Variable { Var codeVar; init(SrcPos pos, Str name, Var codeVar) { init(pos, name, named{Int}) { codeVar = codeVar; } } } class DemoLookup extends NameLookup { Str->DemoVar variables; init(NameLookup parent) { init(parent) {} } void addVar(SrcPos pos, Str name, Var v) { DemoVar var(pos, name, v); var.parentLookup = this; variables.put(name, var); } Named? find(SimplePart part, Scope scope) : override { if (part.params.empty) { if (found = variables.at(part.name)) { return found; } } return null; } }
ast.bs:
use core:asm; use core:lang; use lang:bs:macro; class Expr on Compiler { SrcPos pos; init(SrcPos pos) { init { pos = pos; } } // Convention: Output result into the eax register. void code(CodeGen to) : abstract; } class Literal extends Expr { Int value; init(SrcPos pos, Str value) { init(pos) { value = value.toInt(); } } void code(CodeGen to) : override { to.l << mov(eax, intConst(value)); } } class VarAccess extends Expr { DemoVar toRead; init(SrcPos pos, Str name, Scope scope) { init(pos) { toRead = if (x = scope.find(SimpleName(name)) as DemoVar) { x; } else { throw SyntaxError(pos, "Unknown variable: ${name}"); }; } } void code(CodeGen to) : override { to.l << mov(eax, toRead.codeVar); } } class FnCall extends Expr { private Expr[] params; private Function toCall; init(SrcPos pos, Str name, Expr[] params, Scope scope) { SimplePart part(name); for (x in params) part.params.push(named{Int}); SimpleName lookFor(part); init(pos) { params = params; toCall = if (x = scope.find(lookFor) as Function) { if (!Value(named{Int}).mayStore(x.result)) throw SyntaxError(pos, "Functions used in the demo language must return Int!"); x; } else { throw SyntaxError(pos, "Unknown function: ${lookFor}"); }; } } void code(CodeGen to) : override { Operand[] ops; for (x in params) { Var tmp = to.l.createIntVar(to.block); x.code(to); to.l << mov(tmp, eax); ops << tmp; } CodeResult result(named{Int}, to.block); toCall.autoCall(to, ops, result); to.l << mov(eax, result.location(to)); } } class OpAdd extends Expr { private Expr lhs; private Expr rhs; init(SrcPos pos, Expr lhs, Expr rhs) { init(pos) { lhs = lhs; rhs = rhs; } } void code(CodeGen to) : override { Var tmp = to.l.createIntVar(to.block); rhs.code(to); to.l << mov(tmp, eax); lhs.code(to); to.l << add(eax, tmp); } } class OpSub extends Expr { private Expr lhs; private Expr rhs; init(SrcPos pos, Expr lhs, Expr rhs) { init(pos) { lhs = lhs; rhs = rhs; } } void code(CodeGen to) : override { Var tmp = to.l.createIntVar(to.block); rhs.code(to); to.l << mov(tmp, eax); lhs.code(to); to.l << sub(eax, tmp); } } class OpMul extends Expr { private Expr lhs; private Expr rhs; init(SrcPos pos, Expr lhs, Expr rhs) { init(pos) { lhs = lhs; rhs = rhs; } } void code(CodeGen to) : override { Var tmp = to.l.createIntVar(to.block); rhs.code(to); to.l << mov(tmp, eax); lhs.code(to); to.l << mul(eax, tmp); } } class OpDiv extends Expr { private Expr lhs; private Expr rhs; init(SrcPos pos, Expr lhs, Expr rhs) { init(pos) { lhs = lhs; rhs = rhs; } } void code(CodeGen to) : override { Var tmp = to.l.createIntVar(to.block); rhs.code(to); to.l << mov(tmp, eax); lhs.code(to); to.l << idiv(eax, tmp); } }
test.bs:
void main() { print("f(1, 2) = ${f(1, 2)}"); print("g(4, 2) = ${g(4, 2)}"); print("h(10) = ${h(10)}"); } Int bsfn(Int x, Int y) { print("In Basic Storm: ${x}, ${y}"); x + y; }
demo.demo:
f() = (10 + 4) / 2 f(a, b) = a + b + 1 g(a, b) = f(a, a) * b h(a) = bsfn(f(a, a), 5)
