Map Syntax Using the Basic Storm AST
This page assumes that you have followed the setup for the syntax extension guide already,
so that you have somewhere to work. In this stage we will implement the map
syntax in terms of
Basic Storm. Compared to the repeat
syntax, the map
syntax has the added complexity that it
introduces new variables that should only be visible inside the body of the construct. As such, we
need to also consider the scoping of our new construct.
Syntax
The first step is, as with the repeat
statement, to create the grammar for
the new syntax. We do this by creating a file syntax.bnf
inside the map
directory we created in
the setup. As previously, we start by defining the the delimiters:
use lang.bs; optional delimiter = SDelimiter; required delimiter = SRequiredDelimiter;
After this, we add a production that defines the grammar for our extension. Since the map
syntax
acts as an expression, we extend SExpr
instead of SStmt
this time. As previously, we start by
adding the production itself and worry about the transforms later. We can write the map syntax as
follows:
SExpr : "map", "(", SExpr ~ "for" ~ SName ~ "in" ~ SExpr, ")";
Note that we use the ~
separator instead of ,
around the keywords for
and in
. Since SExpr
and SName
may be plain identifiers, it is useful to require a whitespace around them to avoid
surprises. For example, if the ~
after for
and in
were replaced with a ,
it would be
possible to write: map(x forx inarray)
. This does not look sensible, and would likely surprise
developers, so we disallow it.
As a sidenote: this ambiguity does not introduce problems in this particular case. It is, however,
useful to be careful when working with grammars, as they may inadvertently interfere with other
parts of the grammar. For example, if the spawn
keyword in Basic Storm would not require a space
after it, then the expression: spawnThread()
would parse as spawn Thread()
, which is most likely
not wat the developer intended.
We also add annotations for syntax highlighting of the keywords:
SExpr : "map" #keyword, "(", SExpr ~ "for" #keyword ~ SName #varName ~ "in" #keyword ~ SExpr, ")";
We are now ready to test that our grammar behaves as we expect. In the test.bs
file we used
before, add the following code to use our new syntax:
Int[] values = [8, 10, 3]; Int[] result = map(x + 2 for x in values); print(result.toS);
As in the previous example, we also need to include the package where the syntax is located. To do
this, we need to add the line use expressions:map
to the top of the file. After doing this, we can
run the test program (by typing storm .
in the terminal) to see if the grammar works as expected.
Since we have not yet implemented the semantics, we expect to get some form of error at this point,
as long as it is not a parse error. And indeed, running the example produces the following error
message:
@/home/storm/expressions/map/syntax.bnf(89-177): Syntax error: No return value specified for a production that does not return 'void'.
This makes sense since we have not specified a return value for our production yet.
Semantics
Since the syntax seems to be working as expected, we can now start working on the semantics of the
map
expression. To simplify the implementation a bit, we assume that we are working with arrays of
integers. It is of course possible to lift this limitation if we wished to, but it requires some
care to get type inference right with regards to array literals. Working only with integers also
simplifies the implementation of the IR version of the implementation.
Since the for
syntax needs to provide a loop variable that is only visible to the expression
"inside" the construct, we need to create a new scope for this expression. In Basic Storm, we do
this by creating a Block
instance. Each Block
instance roughly corresponds to a pair of curly
braces ({}
) in that it introduces a new scope where we can store local variables.
The Block
class actually corresponds to something that is slightly less than a pair of curly
braces ({}
). Namely, a Block
only provides a place to store variables. It does not contain any
expressions. Luckily, Basic Storm provides a subclass that both provides a new scope and is able to
contain expressions. This class is called ExprBlock
since it contains a list of expressions. As
such, we will use the ExprBlock
here to simplify our implementation. The IR implementation of the
map
construct will use a plain Block
for this purpose since we do not need the ability to store
expressions there.
As such, we can start our implementation by creating a new file called semantics.bs
and adding the
following code to it:
use lang:bs; use lang:bs:macro; use core:lang; use core:asm; class MapExpr extends ExprBlock { init(SrcPos pos, Block parent) { init(pos, parent) {} } }
Note that the constructor of ExprBlock
(and also Block
) requires two parameters: both a SrcPos
(the same as Expr
) and a Block
. The Block
parameter specifies the block inside which the new
block is created. This makes the scope in Basic Storm hierarchichal, meaning that the system does
not only look inside the currently active block for variables, but also in enclosing blocks. In the
case of our map
syntax, this makes it possible to use not only the loop variable in the
expression, but also variables from outside the loop.
We can then start to add annotations to the grammar to create our new class. As a start, we just
create the MapBlock
as follows:
SExpr => MapBlock(pos, block) : "map" #keyword, "(", SExpr ~ "for" #keyword ~ SName #varName ~ "in" #keyword ~ SExpr, ")";
If we run the program now, we get a different error. This time the error points to the code in our test program:
@/home/storm/expressions/test.bs(130-136): Syntax error: No appropriate constructor for core.Array(core.Int) found. Can not initialize 'result'. Expected signature: __init(core.Array(core.Int), void)
This error occurs because we have essentially transformed the map
expression into an empty block.
As such, our test code is now equivalent to:
Int[] values = [8, 10, 3]; Int[] result = {}; print(result.toS);
Since an empty block evaluates to void
, it is clear why the error arises when the compiler
attempts to initialize result
. This just means that we need to fill our block with some useful
code!
As a starting point, we can create some simple code using the pattern
macro again and add that to
our block by calling the add
function:
init(SrcPos pos, Block parent) { init(pos, parent) {} add(pattern(this) { Array<Int> out; out << 9; out; }); }
For now, we create a block that creates an array (out
), adds a single number to the array, and
then returns it back to the caller. We then add this block to our MapExpr
by calling the add
function. If we run the code at this point, it will produce an array that only contains the number 9
without any errors.
To inspect the generated AST, we can simply print the MapExpr
block at the end of the constructor
by adding a print
statement: print(this.toS)
. Currently, it produces the following output:
{ { core.Array(core.Int) out(core.Array(core.Int)()); out << 9; out; }; }
As we can see, we do indeed have two nested blocks. The outermost one is the MapExpr
block, and
the innermost one is the block inside the pattern
that we added. We do not technically need two
blocks like this, but it simplifies the implementation a bit and helps us hide internal variables.
Using the Input
The next step is to actually read data from the input array. We can do this similarly to the
repeat
statement. The first step is to capture the expression in the grammar and pass it as a
variable to the MapExpr
class. In the grammar, we simply bind the expression that corresponds to
the source array to the variable src
and pass it to MapExpr
:
SExpr => MapExpr(pos, block, src) : "map" #keyword, "(", SExpr ~ "for" #keyword ~ SName #varName ~ "in" #keyword ~ SExpr(block) src, ")";
Then, we modify the MapExpr
constructor to accept the new parameter and use it in our pattern:
init(SrcPos pos, Block parent, Expr src) { init(pos, parent) {} add(pattern(this) { Array<Int> original = ${src}; Array<Int> out; for (Nat i = 0; i < original.count(); i++) { out << original[i]; } out; }); }
Note that we start by saving the value of the expression ${src}
in the variable original
in the
beginning of the pattern, rather than using ${src}
throughout. Since ${src}
inserts the
expression src
into all locations where it appear, using ${src}
more than once would cause the
expression in src
to be evaluated more than once. Since we simply supply a variable in our
example, this would be fine. However, if src
were a more complex expression that potentially had
side-effects (like function calls), this would cause our map
expression to not behave according to
the programmer's expectations. Assigning ${src}
to a variable with the proper type has the
additional benefit that Basic Storm will automatically perform any type conversions for us during
the assignment, so we do not need the call to expectCastTo
call that was required in the repeat
syntax. We can still insert it if we wish to, as it does not do any damage.
The remainder of the pattern simply copies the contents of original
to out
using a loop. There
are, of course, other ways to do this more concisely, but we will use this structure to actually
perform transform each element later on.
If we run the program now, we will find that the program (unsurprisingly) produces a copy of our
original array. Again, it is possible to print the MapExpr
at the end of the constructor to see
what the pattern actually expanded to:
{ { core.Array(core.Int) original = values; core.Array(core.Int) out(core.Array(core.Int)()); { core.Nat i = 0; for (i < core.ArrayBase.count(original); core.Nat.*++(i)) { out << core.Array(core.Int).[](original, i); }; }; out; }; }
In particular, we can see that ${src}
was replaced by the variable values
, since we used that
variable in our test program. We can also see that the for
loop actually expands to two blocks.
One that declares the loop variable, and one that implements the remainder of the loop logic.
Transforming Values
The final step of our implementation is to use the second expression to actually transform the
values. To do this, we need to update the grammar to bind the relevant parts of the match to
variables and pass them to our MapExpr
class:
SExpr => MapExpr(pos, block, src, var, tfm) : "map" #keyword, "(", SExpr(block) tfm ~ "for" #keyword ~ SName var #varName ~ "in" #keyword ~ SExpr(block) src, ")";
Then we modify the constructor of MapExpr
so that it creates a variable with the specified name
and adds it to itself:
init(SrcPos pos, Block parent, Expr src, SStr varName, Expr tfm) { init(pos, parent) {} Var local(this, named{Int}, varName, Actuals()); add(local); // The remainder of the constructor is as before. }
The class Var
is a variable declaration statement. The constructor call corresponds to a
declaration like Int x();
. The first parameter is the block in which the variable shall be
created, the second parameter is the type of the variable, the third is the name (and position, as a
SStr
), and the final parameter is the parameters to the constructor call. We wish to call the
default constructor, so we pass a default-constructed Actuals
object.
Creating the variable declaration in this manner will automatically create a LocalVar
object that
describes the actual variable and add it to the block passed as the first parameter. To properly
initialize the variable we do, however, also need to add the variable declaration statement to the
block. We do this using the add
function as before. At this point, we can run our program and
verify that it works as intended. If we print the MapExpr
class, we can see that the
implementation currently produces the following code:
{ core.Int x; { core.Array(core.Int) original = values; core.Array(core.Int) out(core.Array(core.Int)()); { core.Nat i = 0; for (i < core.ArrayBase.count(original); core.Nat.*++(i)) { out << core.Array(core.Int).[](original, i); }; }; out; }; }
That is, we declare the variable x
in the outermost scope, just as we hoped!
So far, all seems well. Let's proceed to actually use the variable with the expression. We wish to produce a loop that looks like this:
for (Nat i = 0; i < original.count(); i++) { <varName> = original[i]; out << <tfm>; }
That is, we first set the variable we created to the element that we found at the current index of
the array. After that, we can evaluate the tfm
expression (which might use the variable) to
produce the transformed value for our array.
The first part, the assignment, requires us to be able to access the variable. Simply adding
${varName} = original[i]
will sadly not work, since the ${}
syntax expects the expression that
appears inside of it to evaluate to an Expr
. Luckily, Basic Storm provides an expression that lets
us access a variable. It is called LocalVarAccess
, and we can simply give it a position and the
variable we wish to access. For the second part, we can simply insert tfm
using ${tfm}
since
tfm
is already an Expr
. This means that we change our pattern
to the following:
add(pattern(this) { Array<Int> original = ${src}; Array<Int> out; for (Nat i = 0; i < original.count(); i++) { ${LocalVarAccess(pos, local.var)} = original[i]; out << ${tfm}; } out; });
If we run the program at this point, we will get a surprising error:
@/home/storm/expressions/test.bs(143-144): Syntax error: Can not find "x" with parameters ().
In an attempt to debug the issue, we can try to print the MapExpr
at the end of the constructor
once more. If we do this, we get the following output:
{ core.Int x; { core.Array(core.Int) original = values; core.Array(core.Int) out(core.Array(core.Int)()); { core.Nat i = 0; for (i < core.ArrayBase.count(original); core.Nat.*++(i)) { x = core.Array(core.Int).[](original, i); out << <x[invalid] + 2>; }; }; out; }; }
The output looks as we would expect, except for the last line inside the loop (out << <x[invalid] +
2>
). The angle brackets (<>
) are normal, they are simply used to clarify the precedence of
operators in the textual representation. However, we would expect it to read out << <x + w>
, not
out << <x[invalid] + 2>
. Why is x
invalid in this location, when we could assign to x
on the
line before?
Fixing the Scope
To understand why the error occurs, we must understand how Basic Storm resolves names.
In general, name resolution in Basic Storm happens as soon as nodes in the AST are created. This is
why it is necessary to pass a Block
to the SExpr
and SStmt
rules in the Basic Storm syntax.
Otherwise the system would not know how to resolve names!
As a sidenote, this might make you wonder why the error was not reported as soon as it was
discovered. The reason for this is that some parts of Basic Storm need to be able to detect and
re-try failed name lookups. One example of this is the =
operator. For example, consider an
assignment like a.b = c
. However, a.b
does not exist, so the =
operator should attempt to
lookup a.b(c)
instead to see if b
is an assignment function in this context. To achieve this,
the =
operator must be able to detect and recover from name resolution errors. With the current
implementation, it can simply examine the AST node that corresponds to the left-hand side and see if
it corresponds to an invalid name, and act accordingly. This would be more difficult if Basic Storm
reported name resolution errors directly.
The upside of this behavior is that it makes it clear why the local variables we introduced inside
the pattern are not visible to the tfm
expression. Since we never use the block produced by the
pattern
as a parent block for anything, the variables in the pattern
block will never be
reachable by name lookups in the tfm
expression, even after we make the variable in the MapExpr
itself visible.
Regardless, we need to ensure that the names that appear inside the tfm
expression are resolved in
the proper scope. We can solve this in the grammar by passing the newly created MapExpr
rather
than the parent block to the tfm
expression. Luckily, the MapExpr
is bound to the me
variable in the grammar, so we can just replace block
with me
as follows:
SExpr => MapExpr(pos, block, src, var, tfm) : "map" #keyword, "(", SExpr(me) tfm ~ "for" #keyword ~ SName var #varName ~ "in" #keyword ~ SExpr(block) src, ")";
Nowever, this introduces another issue. If we try to run the program now, we get the following error:
@/home/storm/expressions/map/syntax.bnf(89-247): Syntax error: Can not use 'me' this early!
If we examine the grammar a bit more closely, we can see the problem. We have specified that in
order to create tfm
, the Syntax Language needs to create the contents of the variable me
.
However, to create the variable me
requires calling MapExpr(pos, block, src, var, tfm)
, which
involves having created tfm
. As such, tfm
must be created before me
, and me
must be created
before tfm
. That is not possible.
To solve the issue, we must allow the MapExpr
to be created without the tfm
expression. We can
do this by creating another ExprBlock
in the MapExpr
constructor, and then later insert the
tfm
expression into that block. To do this, we modify the constructor to not require the tfm
parameter:
use lang:bs; use lang:bs:macro; use core:lang; use core:asm; class MapExpr extends ExprBlock { private ExprBlock loopBody; init(SrcPos pos, Block parent, Expr src, SStr varName) { init(pos, parent) { loopBody(pos, parent); } Var local(this, named{Int}, varName, Actuals()); add(local); add(pattern(this) { Array<Int> original = ${src}; Array<Int> out; for (Nat i = 0; i < original.count(); i++) { ${LocalVarAccess(pos, local.var)} = original[i]; out << ${loopBody}; } out; }); } }
This change essentially replaces the usage of tfm
with another ExprBlock
called loopBody
. We
use the loop body as a placeholder to make it easy to insert the actual loop body later on.
Since the constructor of MapExpr
no longer requres the tfm
expression as a parameter, we can
resolve the issue in the grammar by simply removing the last parameter to MapExpr
(to make it read
MapExpr(pos, block, src, var)
). If we run the program now, it will produce another error:
@/home/storm/expressions/map/semantics.bs(491-493): Syntax error: Can not find an implementation of the operator << for core.Array(core.Int)&, void.
To understand why the error occurred, we can once again print the generated syntax tree:
{ core.Int x; { core.Array(core.Int) original = values; core.Array(core.Int) out(core.Array(core.Int)()); { core.Nat i = 0; for (i < core.ArrayBase.count(original); core.Nat.*++(i)) { x = core.Array(core.Int).[](original, i); out << { }; }; }; out; }; }
From this output, it is fairly easy to see the issue. The last line in the loop currently reads:
out << {};
. Again, since empty blocks evaluate to void
, we are currently trying to insert a
value of the type void
into an array that stores integers. This is what the compiler complains
about.
Luckily, it is fairly easy to fix the error. We simply need to insert the tfm
expression into our
empty block. We can do this conveniently by creating a member function in the MapExpr
class that
we call from the grammar. We call the function setTransform
and implement it as follows:
void setTransform(Expr expr) { loopBody.add(expr); }
We can then call the function from the grammar by replacing tfm
with -> setTransform
. This means
that the syntax transform will call me.setTransform(<node>)
when the variable me
is created,
which is exactly what we need. As such, we can change our production into the following:
SExpr => MapExpr(pos, block, src, var) : "map" #keyword, "(", SExpr(me) -> setTransform ~ "for" #keyword ~ SName var #varName ~ "in" #keyword ~ SExpr(block) src, ")";
And with that, the circular dependency is resolved while making sure that the transform expression
is passed to the MapExpr
in the end. To inspect the final tree, we can add a print
statement to
the end of the setTransform
function. It produces the following output, which is what the compiler
eventually turns into machine code:
{ core.Int x; { core.Array(core.Int) original = values; core.Array(core.Int) out(core.Array(core.Int)()); { core.Nat i = 0; for (i < core.ArrayBase.count(original); core.Nat.*++(i)) { x = core.Array(core.Int).[](original, i); out << { x + 2; }; }; }; out; }; }
As we can see, the expression x + 2
was inserted into the previously empty block, which caused the
block evaluate to the correct type. Note that since we embedded the transform expression inside a
block, it is necessary to use a call to expectCastTo
to make sure that automatic type inference
works properly in some cases. For simplicity, this was omitted from the code above.
For completeness, the final implementation of the MapExpr
class is as follows:
use lang:bs; use lang:bs:macro; use core:lang; use core:asm; class MapExpr extends ExprBlock { private ExprBlock loopBody; init(SrcPos pos, Block parent, Expr src, SStr varName) { init(pos, parent) { loopBody(pos, parent); } Var local(this, named{Int}, varName, Actuals()); add(local); add(pattern(this) { Array<Int> original = ${src}; Array<Int> out; for (Nat i = 0; i < original.count(); i++) { ${LocalVarAccess(pos, local.var)} = original[i]; out << ${loopBody}; } out; }); } void setTransform(Expr expr) { loopBody.add(expectCastTo(expr, named{Int}, scope)); } }