Text IO

Storm generally reads input from text files in the file system. Storm provides tools for reading text files and converting them to the internal character representation (currently UTF-16). Most languages will use this standard mechanism to read their input, and will therefore accept text files as described here.

When reading a file (or a stream), the encoding is determined based on the first few bytes of the file. If a Byte Order Mark (BOM) is found as the first character of the file, the encoding of the BOM is assumed to represent the encoding of the entire file. If the BOM is not found, the file is assumed to be encoded in UTF-8 (even if your system codepage is something different). Currently, UTF-8 and UTF-16 (both little and big endian) are supported.

A BOM is the Unicode code point U+FEFF. In UTF-8 it is encoded as the bytes EF BB BF. In UTF-16 it is encoded as either FE FF (big endian) or FF FE (little endian). In the standard decoding process, the BOM is silently consumed before the decoded text is passed on, so users of Storm's standard text input will not notice the presence or absence of a BOM.
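The detection rule described above can be sketched in a few lines. The following Python snippet is only an illustration of the rule, not Storm's implementation. The function names detect_encoding and decode_text are hypothetical and do not exist in Storm.

```python
import codecs
from typing import Tuple


def detect_encoding(data: bytes) -> Tuple[str, int]:
    """Return (encoding name, BOM length in bytes) for the start of a file."""
    if data.startswith(codecs.BOM_UTF8):       # EF BB BF
        return "utf-8", 3
    if data.startswith(codecs.BOM_UTF16_BE):   # FE FF
        return "utf-16-be", 2
    if data.startswith(codecs.BOM_UTF16_LE):   # FF FE
        return "utf-16-le", 2
    # No BOM found: assume UTF-8, regardless of the system codepage.
    return "utf-8", 0


def decode_text(data: bytes) -> str:
    """Decode the data and silently drop the BOM, if there is one."""
    encoding, bom_length = detect_encoding(data)
    return data[bom_length:].decode(encoding)
```

With this sketch, a file with a BOM and the same file without one decode to the same string, which mirrors the statement that users of the standard text input do not notice the BOM.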

The IO library

The IO library is located in the core.io package, and is based on streams. A stream is a raw byte-based data stream in either direction. Streams are implemented by IStream and OStream for input and output, respectively.

To read and write text, use the TextReader and TextWriter classes. A TextReader reads binary data from a stream and converts it from the detected character set into UTF-32 characters or Str. TextWriter is the corresponding class for output. To auto-detect the input encoding of a stream and create a TextReader based on the result, use the function readText. Currently there is nothing equivalent for output streams, but it will be possible to copy the format from a TextReader to get the same output format.
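The layering of a text reader on top of a byte stream can be illustrated with Python's standard io module. This is an analogy under stated assumptions, not the core.io API: open_text_reader is a hypothetical name, the stream is assumed to be seekable, and io.TextIOWrapper only plays the role that TextReader plays in Storm.

```python
import codecs
import io
from typing import BinaryIO

# Known BOMs and the decoder each one selects.
_BOMS = [
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
]


def open_text_reader(stream: BinaryIO) -> io.TextIOWrapper:
    """Wrap a seekable byte stream in a text reader chosen from its BOM."""
    start = stream.tell()
    head = stream.read(3)          # the longest supported BOM is 3 bytes
    encoding, skip = "utf-8", 0    # default when no BOM is present
    for bom, name in _BOMS:
        if head.startswith(bom):
            encoding, skip = name, len(bom)
            break
    stream.seek(start + skip)      # position the stream just after the BOM
    return io.TextIOWrapper(stream, encoding=encoding)


# Usage: the caller only sees decoded text, never the raw bytes.
raw = io.BytesIO(codecs.BOM_UTF16_LE + "first\nsecond\n".encode("utf-16-le"))
reader = open_text_reader(raw)
first_line = reader.readline()
rest = reader.read()
```

The point of the design is that the byte stream knows nothing about encodings. All decoding decisions live in the reader that wraps it.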

Another central part of the IO library is the Url class. A Url represents a file or resource somewhere. In Storm, a Url is a protocol followed by a list of strings that make up the path itself. The Url class keeps track of whether it refers to a directory or a file, and indicates this by outputting directory names with a trailing /. The protocol is a class that tells the Url how to access files for paths relative to that protocol. Currently, Storm only implements the file:// protocol. If a Url does not have a protocol, it is assumed to represent a relative path. Relative paths cannot be accessed directly, but must first be appended to some base Url. To create a Url from a string representing a path on your local machine, use the parsePath function.
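The structure of a Url described above can be modelled roughly as follows. This Python sketch is a simplified model for illustration only. The field names, the append method, and the exact string format are assumptions, not the interface of Storm's Url class.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Url:
    protocol: Optional[str]   # e.g. "file"; None marks a relative path
    parts: Tuple[str, ...]    # the strings that make up the path
    is_dir: bool = False      # directories are printed with a trailing /

    def is_relative(self) -> bool:
        return self.protocol is None

    def append(self, other: "Url") -> "Url":
        """Resolve a relative Url against this one."""
        if not other.is_relative():
            raise ValueError("only relative paths can be appended")
        return Url(self.protocol, self.parts + other.parts, other.is_dir)

    def __str__(self) -> str:
        # Illustrative format only; Storm's actual output may differ.
        prefix = f"{self.protocol}://" if self.protocol else ""
        return prefix + "/".join(self.parts) + ("/" if self.is_dir else "")


# A relative path has no protocol and must be appended to a base first.
base = Url("file", ("home", "user"), is_dir=True)
relative = Url(None, ("src", "main.bs"))
resolved = base.append(relative)
```

In this model, resolved carries the protocol of base, so it is the only one of the two paths that could be opened. relative on its own has no protocol that says how to access it.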