Map Syntax Using the Intermediate Language
This page assumes that you have followed the setup for the syntax extension guide already,
so that you have somewhere to work. This page describes how to re-implement the
map syntax using the Intermediate Language. Since the grammar is identical, we
will not modify it here. So if you have not already implemented the map
grammar, copy it from the
end of the previous part. Also make sure you have added the example code to the
test.bs
file as well, as we will use it for testing purposes throughout this page.
Creating an AST Node
Similarly to the repeat
syntax, we need to create a node that can be
inserted into the Basic Storm AST in order to be able to generate intermediate code at the right
location. Since the map
syntax introduces additional variables, we need to inherit the Block
class rather than from the Expr
class. Since we will be generating code directly, there is no
benefit from inheriting from the ExprBlock
class as we did when we used the Basic Storm AST.
As such, to get started we can replace the old MapExpr
class with a minimal definition that
defines the operations that are used by the syntax:
use lang:bs; use lang:bs:macro; use core:lang; use core:asm; class MapExpr extends Block { init(SrcPos pos, Block parent, Expr src, SStr varName) { init(pos, parent) {} } void setTransform(Expr expr) {} }
Since the map expression does not do anything at the moment, we get the following error if we try to
run the example (using storm .
):
@/home/storm/expressions/test.bs(130-136): Syntax error: No appropriate constructor for core.Array(core.Int) found. Can not initialize 'result'. Expected signature: __init(core.Array(core.Int), void)
As we have seen previously, this message arises because our AST node currently reports that it
evaluates to void
. Since we use attempt to assign the result of our AST node to a variable, the
system will report this error.
To fix this error, we need to tell the system that the MapExpr
node evaluates to an expression of
the type Array<Int>
(or Int[]
). Like before, we do this by overriding the result
member like
so:
ExprResult result() : override { ExprResult(Value(named{Array<Int>})); }
If we run the program now, we will not get a type error. However, we will instead get the message
that the abstract function blockCode
was called. This function is similar to the code
function
in the Expr
class in that it is called to generate code for the contents of the block. Actually,
since the Block
class inherits directly from the Expr
class, the Block
class overrides the
code
function in Expr
and generates code that initializes the block and then calls blockCode
.
There is, however, no default implementation of the blockCode
function, which is why we see the
error now. This is also the reason why we used ExprBlock
in the previous page. The ExprBlock
does implement blockCode
by simply generating code for all expressions that have been added to it.
Since Block
does only contains variables, it is not possible for the Block
class to provide a
default implementation.
Anyway, to fix the problem, we can simply add a definition of the blockCode
function to our class:
void blockCode(CodeGen gen, CodeResult res) : override {}
This time, the program will compile and start running. However, it will crash just after the
MapExpr
with a Memory access error
. This is not surprising since the MapExpr
node promised to
produce a Array<Int>
, but did not actually do so.
Creating an Array
In order to fix the crash, we need to generate code that creates and returns an array in the
blockCode
function. It is a bit cumbersome to do this directly in the Intermediate Language, since
we need to take care of allocating memory and executing the constructor. Luckily, the code
generation library in Storm provides the
function allocObject
that helps us with this. As such, we can create a variable to store the
result as follows:
core:asm:Var resultVar = res.location(gen); allocObject(gen, named{Array<Int>:__init<Array<Int>>}, [], resultVar); res.created(gen);
The first line simply retrieves the variable where the result of the MapExpr
expression should be
stored. The next line calls allocObject
to generate the code for creating the array. The call
takes four parameters. The first one is the CodeGen
object that should receive the generated code,
and that contains metadata about the current code generation status. The second parameter is a
Function
that is expected to be the constructor to call. Here, we use the named
syntax to find
the default constructor for the Array<Int>
class (remember: the this
parameter is explicit). The
next parameter is an array of parameters to the constructor. Since we call the default constructor,
we do not need to supply any parameters. The final parameter is the variable where the created
object should be stored.
Finally, the last row informs the CodeResult
class that we have initialized the result, so that it
can set up exception handling if required. In this particular case, however, created
is a no-op
since heap allocations are cleaned up anyway. It is, however, still good practice to call created
to future-proof the code.
If we run the program now, we see that it no longer crashes and prints an empty array. A good first step!
As before, it is possible to view the generated code by printing the gen
object at the end of the
blockCode
function. By doing this we can see that the allocObject
function generated the
following code in this case:
fnParam p@core.Array(core.Int) - pointer:04(04)/08(08)@+0x00 fnCall p[Var0], p@C++:allocType - pointer:04(04)/08(08)@+0x00 fnParam p[Var0] - pointer:04(04)/08(08)@+0x00 fnCall p@core.Array(core.Int).__init(core.Array(core.Int)) - none:00(01)@+0x00[member]
That is, it first calls the built-in function allocType
to allocate memory for the object. It then
calls the constructor to initialize the memory.
Storing the Captured AST Nodes
In order to generate the correct code for the map
expression, we must first make sure to actually
store the expressions we captured from the grammar. Otherwise, it will be hard to do something
useful in the blockCode
function.
We need three things in order to generate relevant code: the expression that evaluates to the source
array (src
), the name of the variable used in the expression varName
, and the expression that
should be evaluated to transform individual elements (received in setTransform
). The expression
src
is easy, we can simply store it in the class and use it later. We also wish to store the
transform expression in a similar way. However, since we do not receive it in the constructor we
create a Maybe<Expr>
(i.e. Expr?
) variable so that we can leave it uninitialized in the
constructor and set it later:
use lang:bs; use lang:bs:macro; use core:lang; use core:asm; class MapExpr extends Block { private Expr src; private Expr? transform; init(SrcPos pos, Block parent, Expr src, SStr varName) { init(pos, parent) { src = expectCastTo(src, named{Array<Int>}, parent.scope); } } void setTransform(Expr expr) { transform = expectCastTo(expr, named{Int}, scope); } // ... }
As before, we use the function expectCastTo
to help us type-check the source expression, and to
perform any implicit conversions if necessary.
The variable varName
needs a bit more thought. We need to actually create the variable in the
constructor. Otherwise the transform expression will not be able to find it, which will eventually
result in a name error (as we saw in the previous page). We also save the created variable in a
member of the MapExpr
class so that we can easily find the variable in the blockCode
function
later. This way we do not have to look up the variable by name again.
We can achieve this by defining a member variable boundVar
that we initialize in the constructor
and then add to the block:
use lang:bs; use lang:bs:macro; use core:lang; use core:asm; class MapExpr extends Block { private Expr src; private Expr? transform; private LocalVar boundVar; init(SrcPos pos, Block parent, Expr src, SStr varName) { init(pos, parent) { src = expectCastTo(src, named{Array<Int>}, parent.scope); boundVar = LocalVar(varName.v, named{Int}, varName.pos, false); } add(boundVar); } // ... }
The constructor to LocalVar
takes four parameters as follows: the first is the name of the
variable as a string. Since we have a SStr
, we extract the string by reading the member v
. The
second parameter is the type of the variable. We wish to store integers, so we simply give the type
Int
. The third parameter is the location of the variable definition, which we retrieve from the
varName
variable through the member pos
. The final parameter indicates if the variable is a
parameter or not. Since this is not a function parameter, we simply pass false
. After creating the
variable, we add it to the MapExpr
, so that the transform expression can find it.
Generating Code
Now that we have all information that we need, we can finally start generating code for the map
construct. As before, we start by sketching the code we wish to generate with placeholders for the
different expressions. In this case, we wish to generate the following code:
# ... # create an array, <resultVar>, that will be our result # evaluate <src>, store it in <originalArray> # evaluate <originalArray>.count(), store it in <originalCount> mov <loopCounter>, 0 loopStart: cmp <loopCounter>, <originalCount> jmp loopEnd, ifAboveEqual # extract element <loopCounter> from <originalArray>, store in <boundVar> # evaluate <transform>, store result in <transformed> # call <resultVar>.push(<transformed>) add <loopCounter>, 1 jmp loopStart loopEnd: # ...
The general structure is similar to that of the repeat
statement. The contents of the loop is,
however, a bit more involved since we need to interact with more parts of the system.
Based on the outline above, we can now start generating code, one step at a time. Note that we have
already implemented the first step (create an array and store it in <resultVar>
) to avoid the
previous crash.
The next step is therefore to evaluate src
and store it in a temporary variable. We can actually
do this quite simply by utilizing the fact that the CodeResult
can create the variable for us:
CodeResult originalArrayRes(named{Array<Int>}, gen.block); src.code(gen, originalArrayRes); core:asm:Var originalArray = originalArrayRes.location(gen);
The first line creates a CodeResult
. We instruct it that we wish to receive the result in a
variable that is visible inside the currently active block, and that we expect to receive the type
Array<Int>
. This means that we will either reuse a variable that already exists elsewhere in the
system, or that the CodeResult
will create a suitable variable for us. Potentially reusing a
variable is perfectly fine, since we will only ever read from the variable, and never modify it.
The second line then asks the src
expression to generate code, and to place its results in the
location we described using the CodeResult
object. Finally, we extract the variable from the
originalArrayRes
so that it is more convenient to use later on.
The next step is to call <originalArray>.count
to find the length of the array. We can call
functions conveniently by using the member autoCall
in the Function
class to generate the code
for us. The autoCall
uses the CodeResult
class just like the code
function for expressions, so
the code looks very similar to what we have written so far:
Function countFn = named{Array<Int>:count<Array<Int>>}; CodeResult originalCountRes(named{Nat}, gen.block); countFn.autoCall(gen, [originalArray], originalCountRes); core:asm:Var originalCount = originalCountRes.location(gen);
The first line simply stores the function we wish to call in a variable to make the code more
readable. The second line creates another CodeResult
instance. Again, we instruct it that we wish
to have the result stored inside a variable that is visible in the active block. This time the type
should be Nat
to match the return type of the count
function.
In the third line, we call autoCall
to generate the code necessary for actually calling the
function. The first parameter is the CodeGen
instance where the generated code should be placed,
the second parameter is an array of parameters to the function. Here, we pass a single parameter
that is used as the this
parameter. The third and final parameter is our CodeResult
that
indicates where we want the result to be located.
The fourth and last line simply extracts the created variable and stores it in a variable for easy access later.
After extracting the count, we are almost ready to start the loop. First, we need the loop counter.
Since we initialize the loop counter ourselves, rather than evaluating an expression to get it, we
need to create the variable manually. We can do this using the createVar
function inside
CodeGen
. We then initialize the variable to zero by moving the value zero into it. Note that the
last step is not strictly necessary since the Intermediate Language guarantees that variables are
initialized to zero by default. The mov
instruction is shown here to illustrate the syntax.
core:asm:Var loopCounter = gen.createVar(named{Nat}).v; gen.l << mov(loopCounter, natConst(0));
After creating the variable, we can emit the header of the loop. We start by creating a label that
we call loopStart
and output it, so that we can jump back to the header of the loop later:
Label loopStart = gen.l.label(); gen.l << loopStart;
After emitting the label, we emit code for checking if the loop should be terminated. In our case,
we should compare the variable loopCounter
to the variable originalCount
. We can do this with a
cmp
instruction:
gen.l << cmp(loopCounter, originalCount);
This sets an internal register in the machine that we can use for conditional jumps. We wish to exit
the loop if loopCounter
is greater than or equal to originalCount
. Since both values are
unsigned integers (Nat
s), we use the condition ifAboveEqual
for this (ifGreaterEqual
can be
used for signed numbers). To tell the jmp
instruction where it should jump to, we need to create
the label loopEnd
already here. We will then emit the label at the end of the loop.
Label loopEnd = gen.l.label(); gen.l << jmp(loopEnd, CondFlag:ifAboveEqual);
Next, we emit the body of the loop. The first thing we need to do is to get the current element from
the array. We achieve this by calling the []
operator like before. One detail to note with the
[]
operator of the Array
class is that it returns a reference to the element rather than the
element itself. This is to allow modifying the element in the array by assigning to it. This does,
however, mean that we need to take this into consideration when generating machine code, since we
need to dereference the returned value to retrieve the actual value. Basic Storm handles this
transparently for us. We can check for this in the code by examining the member result.ref
inside
the function. If it is true, the function returns a reference (i.e. a pointer) to the value rather
than the value itself.
In spite of this the different return type, the call looks similar to the ones we have written previously:
Function accessFunction = named{Array<Int>:"[]"<Array<Int>, Nat>}; CodeResult elemPtr(accessFunction.result, gen.block); accessFunction.autoCall(gen, [originalArray, loopCounter], elemPtr);
There are two differences that might be interesting to observe. First, since the function []
is
not an identifier according to Basic Storm, we need to enclose it in quotes to be able to write it
in the named
syntax. This is a special case in the named
syntax to make it possible to retrieve
special members like the []
function. It is not possible to enclose parts in a name in quotes
anywhere else in Basic Storm.
The second difference is that we pass accessFunction.result
to the constructor for CodeResult
,
rather than simply specifying named{Int}
. This is to instruct the CodeResult
that we want a
reference to an integer rather than an integer. The constructor accepts a Value
instance as its
first parameter, but since the default constructor of Value
creates a non-reference by default,
simply calling CodeResult(named{Int}, ...)
would not have the desired effect. We could, however,
write CodeResult(Value(named{Int}, true), ...)
or CodeResult(Value(named{Int}).asRef(), ...)
instead. However, just passing the result type of the function makes the intent clearer in this
case.
After we have acquired a pointer to the current element, we can read the actual value and copy it to
the boundVar
variable. The Intermediate Language allows us to follow pointers by using the
intRel
operand. However, this operand requires that the pointer is in a register beforehand. Since
we do not care about the values in any of the three general purpose registers (a
, b
, and c
) at
the moment, we can simply use any one of them. As such, we simply pick register a
for the task.
Since we are goint to store a pointer in the register, we use the name ptrA
to indicate that we
are using the pointer-sized version of the register a
:
gen.l << mov(ptrA, elemPtr.location(gen)); gen.l << mov(boundVar.var.v, intRel(ptrA));
In the first line, we load the value returned from the function (stored in elemPtr
) into the
register ptrA
. We then load an integer from the address that ptrA
contains by using the operand
intRel
(there are versions for different sizes as well), and store it in the variable we created
in the constructor. The member var
contains a VarInfo
type, which in turn stores the actual
variable inside its v
member.
Note that the code above utilizes the fact that we are only working with integer types in this
implementation. If the array could contain other types, we would have to replace the second mov
with a copy operation that is appropriate for the contained type. For numeric and pointer types, it
is enough to replace intRel
with xRel
and specify the size. For value types, however, we need to
call the copy constructor, and ensure that any destructors are called at the end of the loop (or in
case of an exception). Another option would be to make boundVar
into a reference rather than a
copy of the value. This means that it is possible to assign to the variable in the map
syntax, but
it would avoid copies of value types.
After setting boundVar
to the current element, we can simply evaluate the transform
expression.
Since the transform
variable might contain null
, we must explicitly ensure that it does not
contain null
before using it. Perhaps the cleanest way to do this is to add an unless
block at
the start of the function:
unless (transform) throw InternalError("No transform set for this MapBlock!");
After that, at the end of the function, we can evaluate the expression since Basic Storm now knows
that transform
does not contain null
in the remainder of the function:
CodeResult transformed(named{Int}, gen.block); transform.code(gen, transformed);
Now that we have computed the new value, we can insert it into the new array by calling the push
member of the array. Similarly to the []
operator, the push
member expects a reference to the
element to be inserted rather than a value. As such, we need to get the address of the variable by
using the lea
(Load Effective Address) instruction in the Intermediate Language before passing it
to the autoCall
function. We could create a separate variable to store the address, but since we
only have one temporary value like this, we can simply use the register a
(using the name ptrA
to indicate that it is a pointer):
gen.l << lea(ptrA, transformed.location(gen)); Function pushFn = named{Array<Int>:push<Array<Int>, Int>}; pushFn.autoCall(gen, [resultVar, ptrA], CodeResult());
Note that since we are not interested in the result from the push
function, we can simply pass a
default-constructed CodeResult
to autoCall
to let it know that we do not need the result. This
is a useful thing to do in most situations, as different parts of the system can utilize the
knowledge that the result will be discarded to avoid performing some computations. For example,
passing a default-constructed CodeResult
to the AST node for an integer literal generates no code
at all, since the node knows that the result will not be used.
Now that we are done with the loop body, we can generate the end of the loop. We need to increment the loop counter to advance to the next element, and then we can jump back to the top of the loop:
gen.l << add(loopCounter, natConst(1)); gen.l << jmp(loopStart);
Finally, we need to define the location of the end of the loop by appending the loopEnd
label to
the listing. Otherwise, the jmp
instruction at the start of the loop will not know where to jump:
gen.l << loopEnd;
And with this, we have implemented our plan, and we are ready to test our implementation. If you have done everything correctly, the code should now work as expected.
For reference, the final implementation of the semantics are provided below:
use lang:bs; use lang:bs:macro; use core:lang; use core:asm; class MapExpr extends Block { private Expr src; private Expr? transform; private LocalVar boundVar; init(SrcPos pos, Block parent, Expr src, SStr varName) { init(pos, parent) { src = expectCastTo(src, named{Array<Int>}, parent.scope); boundVar = LocalVar(varName.v, named{Int}, varName.pos, false); } add(boundVar); } void setTransform(Expr expr) { transform = expr; } ExprResult result() : override { ExprResult(Value(named{Array<Int>})); } void blockCode(CodeGen gen, CodeResult res) : override { unless (transform) throw InternalError("No transform set for a MapBlock!"); core:asm:Var resultVar = res.location(gen); allocObject(gen, named{Array<Int>:__init<Array<Int>>}, [], resultVar); res.created(gen); CodeResult originalArrayRes(named{Array<Int>}, gen.block); src.code(gen, originalArrayRes); core:asm:Var originalArray = originalArrayRes.location(gen); Function countFn = named{Array<Int>:count<Array<Int>>}; CodeResult originalCountRes(named{Nat}, gen.block); countFn.autoCall(gen, [originalArray], originalCountRes); core:asm:Var originalCount = originalCountRes.location(gen); core:asm:Var loopCounter = gen.createVar(named{Nat}).v; gen.l << mov(loopCounter, natConst(0)); Label loopStart = gen.l.label(); gen.l << loopStart; Label loopEnd = gen.l.label(); gen.l << cmp(loopCounter, originalCount); gen.l << jmp(loopEnd, CondFlag:ifAboveEqual); Function accessFunction = named{Array<Int>:"[]"<Array<Int>, Nat>}; CodeResult elemPtr(accessFunction.result, gen.block); accessFunction.autoCall(gen, [originalArray, loopCounter], elemPtr); gen.l << mov(ptrA, elemPtr.location(gen)); gen.l << mov(boundVar.var.v, intRel(ptrA)); CodeResult transformed(named{Int}, gen.block); transform.code(gen, transformed); gen.l << lea(ptrA, transformed.location(gen)); Function pushFn = named{Array<Int>:push<Array<Int>, Int>}; pushFn.autoCall(gen, [resultVar, ptrA], CodeResult()); gen.l << add(loopCounter, natConst(1)); gen.l << jmp(loopStart); gen.l << loopEnd; } }