Map Syntax Using the Basic Storm AST

This page assumes that you have followed the setup for the syntax extension guide already, so that you have somewhere to work. In this stage we will implement the map syntax in terms of Basic Storm. Compared to the repeat syntax, the map syntax has the added complexity that it introduces new variables that should only be visible inside the body of the construct. As such, we need to also consider the scoping of our new construct.

Syntax

The first step is, as with the repeat statement, to create the grammar for the new syntax. We do this by creating a file syntax.bnf inside the map directory we created in the setup. As previously, we start by defining the the delimiters:

use lang.bs;

optional delimiter = SDelimiter;
required delimiter = SRequiredDelimiter;

After this, we add a production that defines the grammar for our extension. Since the map syntax acts as an expression, we extend SExpr instead of SStmt this time. As previously, we start by adding the production itself and worry about the transforms later. We can write the map syntax as follows:

SExpr : "map", "(", SExpr ~ "for" ~ SName ~ "in" ~ SExpr, ")";

Note that we use the ~ separator instead of , around the keywords for and in. Since SExpr and SName may be plain identifiers, it is useful to require a whitespace around them to avoid surprises. For example, if the ~ after for and in were replaced with a , it would be possible to write: map(x forx inarray). This does not look sensible, and would likely surprise developers, so we disallow it.

As a sidenote: this ambiguity does not introduce problems in this particular case. It is, however, useful to be careful when working with grammars, as they may inadvertently interfere with other parts of the grammar. For example, if the spawn keyword in Basic Storm would not require a space after it, then the expression: spawnThread() would parse as spawn Thread(), which is most likely not wat the developer intended.

We also add annotations for syntax highlighting of the keywords:

SExpr : "map" #keyword, "(", SExpr ~ "for" #keyword ~ SName #varName ~ "in" #keyword ~ SExpr, ")";

We are now ready to test that our grammar behaves as we expect. In the test.bs file we used before, add the following code to use our new syntax:

Int[] values = [8, 10, 3];
Int[] result = map(x + 2 for x in values);
print(result.toS);

As in the previous example, we also need to include the package where the syntax is located. To do this, we need to add the line use expressions:map to the top of the file. After doing this, we can run the test program (by typing storm . in the terminal) to see if the grammar works as expected. Since we have not yet implemented the semantics, we expect to get some form of error at this point, as long as it is not a parse error. And indeed, running the example produces the following error message:

@/home/storm/expressions/map/syntax.bnf(89-177): Syntax error:
No return value specified for a production that does not return 'void'.

This makes sense since we have not specified a return value for our production yet.

Semantics

Since the syntax seems to be working as expected, we can now start working on the semantics of the map expression. To simplify the implementation a bit, we assume that we are working with arrays of integers. It is of course possible to lift this limitation if we wished to, but it requires some care to get type inference right with regards to array literals. Working only with integers also simplifies the implementation of the IR version of the implementation.

Since the for syntax needs to provide a loop variable that is only visible to the expression "inside" the construct, we need to create a new scope for this expression. In Basic Storm, we do this by creating a Block instance. Each Block instance roughly corresponds to a pair of curly braces ({}) in that it introduces a new scope where we can store local variables.

The Block class actually corresponds to something that is slightly less than a pair of curly braces ({}). Namely, a Block only provides a place to store variables. It does not contain any expressions. Luckily, Basic Storm provides a subclass that both provides a new scope and is able to contain expressions. This class is called ExprBlock since it contains a list of expressions. As such, we will use the ExprBlock here to simplify our implementation. The IR implementation of the map construct will use a plain Block for this purpose since we do not need the ability to store expressions there.

As such, we can start our implementation by creating a new file called semantics.bs and adding the following code to it:

use lang:bs;
use lang:bs:macro;
use core:lang;
use core:asm;

class MapExpr extends ExprBlock {
    init(SrcPos pos, Block parent) {
        init(pos, parent) {}
    }
}

Note that the constructor of ExprBlock (and also Block) requires two parameters: both a SrcPos (the same as Expr) and a Block. The Block parameter specifies the block inside which the new block is created. This makes the scope in Basic Storm hierarchichal, meaning that the system does not only look inside the currently active block for variables, but also in enclosing blocks. In the case of our map syntax, this makes it possible to use not only the loop variable in the expression, but also variables from outside the loop.

We can then start to add annotations to the grammar to create our new class. As a start, we just create the MapBlock as follows:

SExpr => MapBlock(pos, block)
  : "map" #keyword, "(", SExpr ~ "for" #keyword ~ SName #varName
  ~ "in" #keyword ~ SExpr, ")";

If we run the program now, we get a different error. This time the error points to the code in our test program:

@/home/storm/expressions/test.bs(130-136): Syntax error:
No appropriate constructor for core.Array(core.Int) found. Can not initialize
'result'. Expected signature: __init(core.Array(core.Int), void)

This error occurs because we have essentially transformed the map expression into an empty block. As such, our test code is now equivalent to:

Int[] values = [8, 10, 3];
Int[] result = {};
print(result.toS);

Since an empty block evaluates to void, it is clear why the error arises when the compiler attempts to initialize result. This just means that we need to fill our block with some useful code!

As a starting point, we can create some simple code using the pattern macro again and add that to our block by calling the add function:

init(SrcPos pos, Block parent) {
    init(pos, parent) {}

    add(pattern(this) {
            Array<Int> out;
            out << 9;
            out;
        });
}

For now, we create a block that creates an array (out), adds a single number to the array, and then returns it back to the caller. We then add this block to our MapExpr by calling the add function. If we run the code at this point, it will produce an array that only contains the number 9 without any errors.

To inspect the generated AST, we can simply print the MapExpr block at the end of the constructor by adding a print statement: print(this.toS). Currently, it produces the following output:

{
    {
        core.Array(core.Int) out(core.Array(core.Int)());
        out << 9;
        out;
    };
}

As we can see, we do indeed have two nested blocks. The outermost one is the MapExpr block, and the innermost one is the block inside the pattern that we added. We do not technically need two blocks like this, but it simplifies the implementation a bit and helps us hide internal variables.

Using the Input

The next step is to actually read data from the input array. We can do this similarly to the repeat statement. The first step is to capture the expression in the grammar and pass it as a variable to the MapExpr class. In the grammar, we simply bind the expression that corresponds to the source array to the variable src and pass it to MapExpr:

SExpr => MapExpr(pos, block, src)
  : "map" #keyword, "(", SExpr ~ "for" #keyword ~ SName #varName
  ~ "in" #keyword ~ SExpr(block) src, ")";

Then, we modify the MapExpr constructor to accept the new parameter and use it in our pattern:

init(SrcPos pos, Block parent, Expr src) {
    init(pos, parent) {}

    add(pattern(this) {
            Array<Int> original = ${src};
            Array<Int> out;
            for (Nat i = 0; i < original.count(); i++) {
                out << original[i];
            }
            out;
        });
}

Note that we start by saving the value of the expression ${src} in the variable original in the beginning of the pattern, rather than using ${src} throughout. Since ${src} inserts the expression src into all locations where it appear, using ${src} more than once would cause the expression in src to be evaluated more than once. Since we simply supply a variable in our example, this would be fine. However, if src were a more complex expression that potentially had side-effects (like function calls), this would cause our map expression to not behave according to the programmer's expectations. Assigning ${src} to a variable with the proper type has the additional benefit that Basic Storm will automatically perform any type conversions for us during the assignment, so we do not need the call to expectCastTo call that was required in the repeat syntax. We can still insert it if we wish to, as it does not do any damage.

The remainder of the pattern simply copies the contents of original to out using a loop. There are, of course, other ways to do this more concisely, but we will use this structure to actually perform transform each element later on.

If we run the program now, we will find that the program (unsurprisingly) produces a copy of our original array. Again, it is possible to print the MapExpr at the end of the constructor to see what the pattern actually expanded to:

{
    {
        core.Array(core.Int) original = values;
        core.Array(core.Int) out(core.Array(core.Int)());
        {
            core.Nat i = 0;
            for (i < core.ArrayBase.count(original); core.Nat.*++(i)) {
                out << core.Array(core.Int).[](original, i);
            };
        };
        out;
    };
}

In particular, we can see that ${src} was replaced by the variable values, since we used that variable in our test program. We can also see that the for loop actually expands to two blocks. One that declares the loop variable, and one that implements the remainder of the loop logic.

Transforming Values

The final step of our implementation is to use the second expression to actually transform the values. To do this, we need to update the grammar to bind the relevant parts of the match to variables and pass them to our MapExpr class:

SExpr => MapExpr(pos, block, src, var, tfm)
  : "map" #keyword, "(", SExpr(block) tfm ~ "for" #keyword ~ SName var #varName
  ~ "in" #keyword ~ SExpr(block) src, ")";

Then we modify the constructor of MapExpr so that it creates a variable with the specified name and adds it to itself:

init(SrcPos pos, Block parent, Expr src, SStr varName, Expr tfm) {
    init(pos, parent) {}

    Var local(this, named{Int}, varName, Actuals());
    add(local);

    // The remainder of the constructor is as before.
}

The class Var is a variable declaration statement. The constructor call corresponds to a declaration like Int x();. The first parameter is the block in which the variable shall be created, the second parameter is the type of the variable, the third is the name (and position, as a SStr), and the final parameter is the parameters to the constructor call. We wish to call the default constructor, so we pass a default-constructed Actuals object.

Creating the variable declaration in this manner will automatically create a LocalVar object that describes the actual variable and add it to the block passed as the first parameter. To properly initialize the variable we do, however, also need to add the variable declaration statement to the block. We do this using the add function as before. At this point, we can run our program and verify that it works as intended. If we print the MapExpr class, we can see that the implementation currently produces the following code:

{
    core.Int x;
    {
        core.Array(core.Int) original = values;
        core.Array(core.Int) out(core.Array(core.Int)());
        {
            core.Nat i = 0;
            for (i < core.ArrayBase.count(original); core.Nat.*++(i)) {
                out << core.Array(core.Int).[](original, i);
            };
        };
        out;
    };
}

That is, we declare the variable x in the outermost scope, just as we hoped!

So far, all seems well. Let's proceed to actually use the variable with the expression. We wish to produce a loop that looks like this:

for (Nat i = 0; i < original.count(); i++) {
    <varName> = original[i];
    out << <tfm>;
}

That is, we first set the variable we created to the element that we found at the current index of the array. After that, we can evaluate the tfm expression (which might use the variable) to produce the transformed value for our array.

The first part, the assignment, requires us to be able to access the variable. Simply adding ${varName} = original[i] will sadly not work, since the ${} syntax expects the expression that appears inside of it to evaluate to an Expr. Luckily, Basic Storm provides an expression that lets us access a variable. It is called LocalVarAccess, and we can simply give it a position and the variable we wish to access. For the second part, we can simply insert tfm using ${tfm} since tfm is already an Expr. This means that we change our pattern to the following:

add(pattern(this) {
        Array<Int> original = ${src};
        Array<Int> out;
        for (Nat i = 0; i < original.count(); i++) {
            ${LocalVarAccess(pos, local.var)} = original[i];
            out << ${tfm};
        }
        out;
    });

If we run the program at this point, we will get a surprising error:

@/home/storm/expressions/test.bs(143-144): Syntax error:
Can not find "x" with parameters ().

In an attempt to debug the issue, we can try to print the MapExpr at the end of the constructor once more. If we do this, we get the following output:

{
    core.Int x;
    {
        core.Array(core.Int) original = values;
        core.Array(core.Int) out(core.Array(core.Int)());
        {
            core.Nat i = 0;
            for (i < core.ArrayBase.count(original); core.Nat.*++(i)) {
                x = core.Array(core.Int).[](original, i);
                out << <x[invalid] + 2>;
            };
        };
        out;
    };
}

The output looks as we would expect, except for the last line inside the loop (out << <x[invalid] + 2>). The angle brackets (<>) are normal, they are simply used to clarify the precedence of operators in the textual representation. However, we would expect it to read out << <x + w>, not out << <x[invalid] + 2>. Why is x invalid in this location, when we could assign to x on the line before?

Fixing the Scope

To understand why the error occurs, we must understand how Basic Storm resolves names.

In general, name resolution in Basic Storm happens as soon as nodes in the AST are created. This is why it is necessary to pass a Block to the SExpr and SStmt rules in the Basic Storm syntax. Otherwise the system would not know how to resolve names!

As a sidenote, this might make you wonder why the error was not reported as soon as it was discovered. The reason for this is that some parts of Basic Storm need to be able to detect and re-try failed name lookups. One example of this is the = operator. For example, consider an assignment like a.b = c. However, a.b does not exist, so the = operator should attempt to lookup a.b(c) instead to see if b is an assignment function in this context. To achieve this, the = operator must be able to detect and recover from name resolution errors. With the current implementation, it can simply examine the AST node that corresponds to the left-hand side and see if it corresponds to an invalid name, and act accordingly. This would be more difficult if Basic Storm reported name resolution errors directly.

The upside of this behavior is that it makes it clear why the local variables we introduced inside the pattern are not visible to the tfm expression. Since we never use the block produced by the pattern as a parent block for anything, the variables in the pattern block will never be reachable by name lookups in the tfm expression, even after we make the variable in the MapExpr itself visible.

Regardless, we need to ensure that the names that appear inside the tfm expression are resolved in the proper scope. We can solve this in the grammar by passing the newly created MapExpr rather than the parent block to the tfm expression. Luckily, the MapExpr is bound to the me variable in the grammar, so we can just replace block with me as follows:

SExpr => MapExpr(pos, block, src, var, tfm)
  : "map" #keyword, "(", SExpr(me) tfm ~ "for" #keyword ~ SName var #varName
  ~ "in" #keyword ~ SExpr(block) src, ")";

Nowever, this introduces another issue. If we try to run the program now, we get the following error:

@/home/storm/expressions/map/syntax.bnf(89-247): Syntax error:
Can not use 'me' this early!

If we examine the grammar a bit more closely, we can see the problem. We have specified that in order to create tfm, the Syntax Language needs to create the contents of the variable me. However, to create the variable me requires calling MapExpr(pos, block, src, var, tfm), which involves having created tfm. As such, tfm must be created before me, and me must be created before tfm. That is not possible.

To solve the issue, we must allow the MapExpr to be created without the tfm expression. We can do this by creating another ExprBlock in the MapExpr constructor, and then later insert the tfm expression into that block. To do this, we modify the constructor to not require the tfm parameter:

use lang:bs;
use lang:bs:macro;
use core:lang;
use core:asm;

class MapExpr extends ExprBlock {
    private ExprBlock loopBody;

    init(SrcPos pos, Block parent, Expr src, SStr varName) {
        init(pos, parent) {
            loopBody(pos, parent);
        }

        Var local(this, named{Int}, varName, Actuals());
        add(local);

        add(pattern(this) {
                Array<Int> original = ${src};
                Array<Int> out;
                for (Nat i = 0; i < original.count(); i++) {
                    ${LocalVarAccess(pos, local.var)} = original[i];
                    out << ${loopBody};
                }
                out;
            });
    }
}

This change essentially replaces the usage of tfm with another ExprBlock called loopBody. We use the loop body as a placeholder to make it easy to insert the actual loop body later on.

Since the constructor of MapExpr no longer requres the tfm expression as a parameter, we can resolve the issue in the grammar by simply removing the last parameter to MapExpr (to make it read MapExpr(pos, block, src, var)). If we run the program now, it will produce another error:

@/home/storm/expressions/map/semantics.bs(491-493): Syntax error:
Can not find an implementation of the operator << for core.Array(core.Int)&, void.

To understand why the error occurred, we can once again print the generated syntax tree:

{
    core.Int x;
    {
        core.Array(core.Int) original = values;
        core.Array(core.Int) out(core.Array(core.Int)());
        {
            core.Nat i = 0;
            for (i < core.ArrayBase.count(original); core.Nat.*++(i)) {
                x = core.Array(core.Int).[](original, i);
                out << {
                };
            };
        };
        out;
    };
}

From this output, it is fairly easy to see the issue. The last line in the loop currently reads: out << {};. Again, since empty blocks evaluate to void, we are currently trying to insert a value of the type void into an array that stores integers. This is what the compiler complains about.

Luckily, it is fairly easy to fix the error. We simply need to insert the tfm expression into our empty block. We can do this conveniently by creating a member function in the MapExpr class that we call from the grammar. We call the function setTransform and implement it as follows:

void setTransform(Expr expr) {
    loopBody.add(expr);
}

We can then call the function from the grammar by replacing tfm with -> setTransform. This means that the syntax transform will call me.setTransform(<node>) when the variable me is created, which is exactly what we need. As such, we can change our production into the following:

SExpr => MapExpr(pos, block, src, var)
  : "map" #keyword, "(", SExpr(me) -> setTransform ~ "for" #keyword ~ SName var #varName
  ~ "in" #keyword ~ SExpr(block) src, ")";

And with that, the circular dependency is resolved while making sure that the transform expression is passed to the MapExpr in the end. To inspect the final tree, we can add a print statement to the end of the setTransform function. It produces the following output, which is what the compiler eventually turns into machine code:

{
    core.Int x;
    {
        core.Array(core.Int) original = values;
        core.Array(core.Int) out(core.Array(core.Int)());
        {
            core.Nat i = 0;
            for (i < core.ArrayBase.count(original); core.Nat.*++(i)) {
                x = core.Array(core.Int).[](original, i);
                out << {
                    x + 2;
                };
            };
        };
        out;
    };
}

As we can see, the expression x + 2 was inserted into the previously empty block, which caused the block evaluate to the correct type. Note that since we embedded the transform expression inside a block, it is necessary to use a call to expectCastTo to make sure that automatic type inference works properly in some cases. For simplicity, this was omitted from the code above.

For completeness, the final implementation of the MapExpr class is as follows:

use lang:bs;
use lang:bs:macro;
use core:lang;
use core:asm;

class MapExpr extends ExprBlock {
    private ExprBlock loopBody;

    init(SrcPos pos, Block parent, Expr src, SStr varName) {
        init(pos, parent) {
            loopBody(pos, parent);
        }

        Var local(this, named{Int}, varName, Actuals());
        add(local);

        add(pattern(this) {
                Array<Int> original = ${src};
                Array<Int> out;
                for (Nat i = 0; i < original.count(); i++) {
                    ${LocalVarAccess(pos, local.var)} = original[i];
                    out << ${loopBody};
                }
                out;
            });
    }

    void setTransform(Expr expr) {
        loopBody.add(expectCastTo(expr, named{Int}, scope));
    }
}