Files and Streams

This tutorial explores how to handle files and other streams of data using Storm's standard library. The concepts covered here should be familiar to most programmers that have worked with languages like C++ and Java, although there are of course some differences.

The code presented in this tutorial is available in the directory root/tutorials/streams in the Storm release. You can run it by typing tutorials:streams:main in the Basic Storm interactive top-loop.

Setup

Before starting to write code, we need somewhere to work. For this tutorial we will create a directory where we can put our code and some text files that we will work with. Create a directory somewhere on your system. This tutorial assumes that you name it streams. If you wish to use another name you need to modify the names in the example accordingly.

Create a file main.bs inside the directory with the following contents as a starting point:

use core:io;

void main() {
}

After doing this, open a terminal and change to the directory where you created the directory streams. Then run the code by typing:

storm streams

If done correctly, Storm will exit without output since the main function was empty. Note that based on how you have installed Storm, you might need to modify the command-line slightly.

The Url Class

Storm uses the class core.io.Url to represent paths to files in the filesystem (and generic URL:s). The class represents the path as a protocol and a list of parts. This makes it easy to inspect and manipulate URLs programmatically.

The Url class has a default constructor that creates a representation of a relative path that refers to the current directory. We can do this as follows:

use core:io;

void main() {
    Url path;
    print(x.toS);
}

Running the program above (using storm streams) produces the output ./, which indicates that the Url is relative (it uses the "relative path" protocol). We can add parts to the Url using the / operator as follows:

use core:io;

void main() {
    Url path;
    path = path / "streams" / "input.txt";
    print(path.toS);
}

This program will print ./streams/input.txt. Again, the ./ at the start of the path simply indicates that the Url represents a relative path.

The Url class provides a number of operations for inspecting the path. For example, we can retrieve the name of the file referred to by the Url, with or without the extension:

use core:io;

void main() {
    Url path;
    path = path / "streams" / "input.txt";
    print("Path: ${path}");
    print("Name: ${path.name}");
    print("Title: ${path.title}");
    print("Extension: ${path.ext}");
    print("Parent: ${path.parent}");

    for (i, x in path) {
        print("Part ${i}: ${x}");
    }
}

The code above will print the following:

Path: ./streams/input.txt
Name: input.txt
Title: input
Extension: txt
Parent: ./streams/
Part 0: streams
Part 1: input.txt

It is worth noting that the Url class (or rather the relative protocol) does not support accessing the file system through relative paths. To actually interact with the file system, we first need to make the Url absolute. This has the additional benefit that the output format and comparisons will follow the appropriate conventions for the current operating system (e.g. case-insensitive comparisons on Windows).

To illustrate this, let's try to list the contents of the streams directory. We can do this using the children function in the Url class:

use core:io;

void main() {
    Url path = Url() / "streams";
    for (child in path.children()) {
        print("Child: ${child}");
    }
}

If we run the code above, we will get the following error since path is relative (followed by a stack-trace):

The operation 'children' is not supported by the protocol ./

To make path absolute, we can either start building path from an absolute Url, or making it absolute afterwards. In both of these cases we can use the function cwdUrl to retrieve an absolute Url for the current working directory:

use core:io;

void main() {
    // Either:
    Url path = cwdUrl() / "streams";
    // Or:
    Url path = Url() / "streams";
    path = path.makeAbsolute(cwdUrl());

    for (child in path.children()) {
        print(child.toS);
    }
}

With this modification, the program works and prints the name of the main.bs file, something like this:

/home/storm/streams/main.bs

Let's create some more contents in the streams directory to make the output more interesting. First, we create a directory inside streams that we call res (you can do this by running (cwdUrl() / "streams" / "res").createDir() in the code). Then we also create a file example.txt inside the streams directory with the following content:

Lorem ipsum dolor sit amet, consectetur adipisicing elit,
sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.

If we run the program now, it produces a warning, followed by the following output (the order may be different on your system):

/home/storm/streams/example.txt
/home/storm/streams/main.bs
/home/storm/streams/res/

As expected, we can see the file example.txt and the directory res that we created. It is worth noting that the directory res ends with a / to indicate that it is a directory. This information is stored in the Url class, and we can retrieve it using the dir member:

for (child in path.children()) {
    if (child.dir) {
        print("Dir:  ${child}");
    } else {
        print("File: ${child}");
    }
}

This change makes the program output the following:

File: /home/storm/streams/example.txt
File: /home/storm/streams/main.bs
Dir:  /home/storm/streams/res/

Now, let's turn our attention to the warning:

WARNING storm::Package::createReaders: No reader for [./streams/example.txt]
(should be lang.txt.reader(core.Array(core.io.Url), core.lang.Package))

It is produced since we instructed Storm to load the directory streams into the name tree. However, Storm does not know how to treat files with the extension .txt. As such, it informs us about this, and suggests how we could implement a language that handles .txt files.

Because of this, it is usually a good idea to store non-code resources that a program needs in a separate directory. This places them in a sub-package that is not compiled by default, which avoids the warning in most cases.

As such, to remove the warning, we simply move example.txt into res.

Accessing Files in the Name Tree

When a program requires some data files to function properly (e.g. images), it is convenient to place these files alongside the code. As mentioned above, it is a good idea to place such files in their own subdirectory (often called res), to avoid the warning from Storm.

The question that remains is, how do we reliably get the path of these files? One approach would be to create a function like below to get the Url to a file in the res directory:

Url resUrl(Str name) {
    return cwdUrl() / "streams" / "res" / name;
}

We can then use it as follows in the main function:

Url example = resUrl("example.txt");
print("Does 'example.txt' exist? ${example.exists}");

When we run the program, it should produce the following output:

Does 'example.txt' exist? true

However, since our starting point is the current working directory, it requires that the user starts the program from the right directory. For example, if we were to do the following, Storm loads the program successfully, but the program would fail to find the file:

cd streams
storm .

This still loads the package streams properly. However, the program prints:

Does 'example.txt' exist? false

A more robust way is to ask Storm where the package res is located and use that location instead. We can retrieve the representation of the package used by the compiler using the named{} macro in the package lang:bs:macro, and then get the package by calling url. However, since not all packages originate from the file system, this function may return null, which we need to handle. We can implement all of this as follows:

use core:io;
use lang:bs:macro;

Url resUrl(Str name) {
    if (url = named{res}.url) {
        return url / name;
    } else {
        throw InternalError("Expected the package 'res' to be non-virtual.");
    }
}

This version of the resUrl function will thus work correctly regardless of what the current working directory was when the user started Storm.

Reading Files

Now that we know how to find files in the file system, let's try reading the example.txt file we created before. We can get an input stream core.io.IStream for a file by calling the read() member of the Url class:

void main() {
    Url example = resUrl("example.txt");
    IStream input = example.read();
}

After opening the file and getting a stream to the file, we can read from the stream using either read or fill. The read functions has the same semantics as the read function in UNIX. That is, it is free to read fewer bytes than was requested, even if more data is available. The fill function, on the other hand, guarantees that it fills the buffer with data as long as the end of the stream is not reached. For this reason, the fill function is recommended since it is easier to use (the read function may, however, be necessary in some cases when working with sockets or pipes).

Both functions work with the core.io.Buffer type. It simply represents a sequence of bytes encapsulated in a convenient container. We can create an empty buffer and ask the stream to fill it as follows:

Buffer b = buffer(32);
b = input.fill(b);
print(b.toS);

input.close();

Note that even though Buffer is a value type, the underlying storage for the buffer is shared between instances. This means that it is not problematic to create large buffer and passing them around. The exception to this is, of course, that the buffer is copied whenever it crosses a thread boundary. This is why read and fill returns a buffer that is often the same as the one that was passed to the function: in case a thread switch was necessary, the original buffer will not be updated, and the one returned from the function has to be used. This is why the re-assignment of b is sometimes important.

The Buffer class contains a variable filled that indicates how much of the buffer is filled with data. Note that filled is just a marker that other parts of the system use to communicate what part of the buffer is valid. It is still possible to store data in all parts of the allocated space, regardless of the value of filled.

When we created the buffer with the call to buffer(32), filled is zero since the buffer is initially empty. The fill function then fills the buffer with data (starting at filled, in case it is non-zero), and fills as much as possible of the buffer with data as possible. We can see the result by observing that filled is updated to reflect the new state of the buffer. With this information, we can read the entire contents of the file as follows:

Buffer b = buffer(32);
do {
    b.filled = 0; // Reset from previous iteration.
    input.fill(b);
    print(b.toS + "\n");
} while (b.filled > 0); // Zero bytes read means end of stream.

input.close();

Note that we need to set filled to zero before calling fill the second time. Otherwise fill would conclude that the buffer is already full and not read any more data. It is worth noting that input has a member more that indicates if more data is available. It usually just keeps track of whether a fill or read operation has returned zero bytes previously, so it is just a convenience on top of the strategy used above.

When using UNIX line endings in the file, the program produces the following output:

00000000   4C  6F  72  65  6D  20  69  70  73  75  6D  20  64  6F  6C  6F
00000010   72  20  73  69  74  20  61  6D  65  74  2C  20  63  6F  6E  73
00000020 | 

00000000   65  63  74  65  74  75  72  20  61  64  69  70  69  73  69  63
00000010   69  6E  67  20  65  6C  69  74  2C  0A  73  65  64  20  64  6F
00000020 | 

00000000   20  65  69  75  73  6D  6F  64  20  74  65  6D  70  6F  72  20
00000010   69  6E  63  69  64  69  64  75  6E  74  20  75  74  20  6C  61
00000020 | 

00000000   62  6F  72  65  20  65  74  20  64  6F  6C  6F  72  65  20  6D
00000010   61  67  6E  61  20  61  6C  69  71  75  61  2E  0A| 20  6C  61
00000020   

00000000 | 62  6F  72  65  20  65  74  20  64  6F  6C  6F  72  65  20  6D
00000010   61  67  6E  61  20  61  6C  69  71  75  61  2E  0A  20  6C  61
00000020

The output shows that we needed to run the loop four times to read the entire file. Each output is a hex-dump of the contents of the Binary object. The numbers on the right is the hexadecimal offset from the start. The remainder of each line are individual bytes in the buffer, in hexadecimal.

The first time, we read 32 bytes successfully. We can see this by observing the location of the | character, that corresponds to the value of filled. Since the | is at the end of the output, we managed to fill the buffer entirely. The same is true for the next two buffer outputs. We can also see that the bytes are different the first three times, even though we re-used the buffer.

The fourth time we can see that something else happened. Here, the | is not at the end, but towards the end of the second line. We can see that the bytes before the | were updated, but the three last bytes are the same as in the previous iteration since they were not overwritten by the fill operation. In fact, since we used fill, this observation is enough to conclude that we have reached the end of the stream. This would not be the case if we had used read, since read is allowed to not fill the buffer fully, even if there is more data in the stream.

The last output shows a similar situation, but here fill read zero bytes, and therefore the | is before the first byte, and the contents of the buffer is unchanged.

At this point it is worth mentioning that both read and fill have overloads that creates and returns a buffer with the specified size (e.g. input.fill(32)). They are convenient when reading data once, but since they allocate new buffers all the time, they may be inefficient when working with large data.

Finally, it is worth noting that it is usually a good idea to work with larger buffer sizes than 32 bytes. Otherwise, the overhead from accessing the file system tends to be fairly large.

Reading Text

Since the file example.txt contains text, we would like to be able to interpret the contents of the file as text. Luckily, Storm provides storminfo:core.io.TextInput streams for this purpose. Once we have acquired an input stream from a file, we can call readText to create a suitable TextInput stream for us. The readText function will inspect the first few bytes of the stream, determine the encoding of the text, and then create a suitable subclass of TextInput to handle the encoding. The TextInput subclass also handles conversions of line endings as suitable.

We can do this as follows:

void main() {
    Url example = resUrl("example.txt");
    IStream input = example.read();
    TextInput text = readText(input); // or input.readText()
    while (text.more()) {
        print("Line: " + text.readLine());
    }
    text.close(); // Also closes 'input'.
}

This program prints the contents of the text file we created earlier. The TextInput class also contains the functions readAll that reads the entire file into a string if desired.

It is worth noting that the text input stream is buffered. That is, it attempts to read ahead from its input stream when possible. This is usually not a problem, as the stream will only read more data if it is readily available, and never wait for additional data to become available. However, it might cause issues if the IStream is used for other purposes in addition to being passed to the text stream. If it is necessary to extract a part of a stream as text, it is better to store that portion of the stream in a Buffer and use a MemIStream as a source for the TextInput in order to ensure that the TextInput does not read too many bytes in the input.

Finally, the system contains a convenience function readAllText that accepts a Url and reads the entire file as text into a string. As such, we could simplify the entire program above into:

Str text = resUrl("example.txt").readAllText();

Writing Files

The Url class also contains a member write that creates a core.io.OStream for a file. The file is created if it does not already exists, and truncates it if it exists. After creating the stream, we can use the write function to write a buffer to the stream. For example, the code below will write the character A to the file out.txt:

void main() {
    Url out = resUrl("out.txt");
    OStream output = out.write();

    Buffer b(2);
    b[0] = 0x41;
    b[1] = 0x0A;
    b.filled = 2; // Indicate that the buffer is filled with data.

    output.write(b);
    output.close();
}

Note that we need to set the buffer's filled to 2 in order to tell write to write the two bytes in the buffer. This makes write work well alongside read and/or fill of the IStream.

Writing Text

Similarly to reading text, we can use a storminfo:core.io.TextOutput class to encode text for us. However, since it is not possible to automatically detect character encoding in this case, we need to create the appropriate subclass ourselves.

To encode text into utf-8, we use the storminfo:core.io.Utf8Output class. To specify how line endings and byte-order-marks should be handled, we can pass it a TextInfo object that describes the configuration. It is a good idea to create the TextInfo object by calling sysTextInfo to get the default behavior for the current system.

In summary, we can write text to a file as follows:

void main() {
    Url out = resUrl("out.txt");
    OStream output = out.write();

    Utf8Output textOut(output, sysTextInfo());
    textOut.writeLine("Text from Storm!");
    textOut.writeLine("Another line");

    textOut.close();
}

When working with text output, it is worth noting that the text streams are buffered. That is, the output stream waits until it has gathered a bit of data before writing to the stream. This is usually not a problem since flushing happens automatically. However, if text streams are used over a socket or a pipe (e.g. for HTTP), it might be necessary to manually flush an output stream.

Standard Streams

It is possible to access standard input, standard output, and standard error of the Storm process as streams in Storm. They are available as:

core.io.IStream in()
core.io.OStream out()
core.io.OStream error()

The system also provides text streams for standard input, standard output, and standard error as:

core.io.TextInput stdIn()
core.io.TextOutput stdOut()
core.io.TextOutput stdError()

Actually, the print function is simply implemented as: stdOut.writeLine(...).

Furthermore, it is possible to replace the text streams if desired. This has the effect of redirecting the output from Storm code to the desired class. This makes it possible to redirect output to other places programmatically if desired. For example, the language server uses standard in and standard out to communicate with the editor, and therefore replaces the text streams to be able to forward them to the editor.