Text IO

Storm needs to read text files from the file system to parse source code, among other tasks. The standard library therefore contains fairly competent facilities for reading and writing text files in various encodings.

As mentioned in the documentation for the string class, Storm works with Unicode codepoints in UTF-32 (although the internal representation is more compact). The task of the text IO system is therefore to convert between UTF-32 and the encoding used by a stream when reading and writing text.

When reading a stream (e.g. a file), the encoding is determined by the first few bytes of the stream. Storm assumes that text files are encoded in some form of Unicode, unless instructed otherwise. If a Byte Order Mark (BOM) is found as the first character in the file, the encoding of the BOM determines the encoding of the entire file (this is commonplace on Windows). If no BOM is present, the system assumes that the file is encoded in UTF-8.

A BOM is the codepoint U+FEFF. In UTF-8, it is encoded as EF BB BF. In UTF-16, it is encoded as FE FF for big-endian and FF FE for little-endian. In the standard decoding process the BOM is silently consumed before the decoded text is passed on. Users of the library will therefore typically never see any BOMs in the file.
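The detection rule above can be sketched in a few lines. This is a Python model rather than Storm code, and the function names `detect_encoding` and `decode_text` are made up for the illustration. It only covers the encodings named in this section.

```python
import codecs


def detect_encoding(head: bytes) -> tuple[str, int]:
    """Return (encoding name, BOM length in bytes) for the start of a stream."""
    if head.startswith(codecs.BOM_UTF8):        # EF BB BF
        return "utf-8", 3
    if head.startswith(codecs.BOM_UTF16_BE):    # FE FF
        return "utf-16-be", 2
    if head.startswith(codecs.BOM_UTF16_LE):    # FF FE
        return "utf-16-le", 2
    return "utf-8", 0                           # no BOM: assume UTF-8


def decode_text(data: bytes) -> str:
    """Decode data, silently consuming a leading BOM if there is one."""
    encoding, bom_length = detect_encoding(data[:4])
    return data[bom_length:].decode(encoding)
```

Because the BOM bytes are skipped before decoding, the caller never sees U+FEFF in the returned string.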

The text library also handles differences in line endings between different systems. The text decoding process converts line endings into \n, and the output process converts them as appropriate for the system.
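As a rough Python model of this two-way conversion (not Storm code; both function names are hypothetical), input normalization and output conversion might look as follows. The sketch only handles the `\r\n` pair, since the text above does not say how a lone `\r` is treated.

```python
def normalize_newlines(text: str) -> str:
    """Input side: turn Windows-style line endings into a plain newline."""
    return text.replace("\r\n", "\n")


def apply_newlines(text: str, use_cr_lf: bool) -> str:
    """Output side: expand each newline as appropriate for the target system."""
    return text.replace("\n", "\r\n") if use_cr_lf else text
```

Code between the two steps only ever sees `\n`, so it does not need to know which convention the file used.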

Text Information

The class core.io.TextInfo stores information about encoded text. It is used to specify how the text output system should behave when encoding text:

init(core.io.TextInfo& other)
Copy constructor.
init()
Create the default configuration (all members set to false).
core.Bool useCrLf
Use windows-style line endings.
core.Bool useBom
Output a byte order mark first.

There are also functions that create TextInfo instances that describe the default behavior on various systems:

core.io.TextInfo sysTextInfo()
Create the default text information for the current system.
core.io.TextInfo unixTextInfo()
Create a text information that produces Unix-style line endings.
core.io.TextInfo windowsTextInfo()
Create a text information that produces Windows-style line endings.
core.io.TextInfo windowsTextInfo(core.Bool bom)
Create a text information that produces Windows-style line endings, and specify whether a BOM should be emitted.
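To make the relation between the members and the helper functions concrete, here is a small Python model of the configuration object (not Storm code). The field names mirror `useCrLf` and `useBom`. Two points are assumptions of the sketch: that the Unix variant leaves the BOM off, and that the `bom` argument is always passed explicitly, since the default used by the zero-argument `windowsTextInfo()` is not stated above.

```python
from dataclasses import dataclass


@dataclass
class TextInfo:
    """Python stand-in for core.io.TextInfo; all members default to False."""
    use_cr_lf: bool = False   # Windows-style line endings
    use_bom: bool = False     # output a byte order mark first


def unix_text_info() -> TextInfo:
    # Assumption: the Unix configuration is the default one, without a BOM.
    return TextInfo()


def windows_text_info(bom: bool) -> TextInfo:
    # The caller decides about the BOM; no default is assumed here.
    return TextInfo(use_cr_lf=True, use_bom=bom)
```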

Text Input

The core interface for reading text from a stream is core.io.TextInput. Derived classes override the abstract function readChar that the rest of the class uses to read single characters and compose them into higher-level read operations. As mentioned above, this includes hiding the byte order mark if required, converting line endings, etc.

Subclasses generally take a stream as an input source, and they implement their own buffering to avoid excessive calls to IStream.read().

The core.io.TextInput class has the following members:

core.Bool more()
Does the file contain any more data?
core.Char read()
Read a single character from the stream. Returns Char(0) on failure.
core.Char peek()
Peek a single character. Returns Char(0) on failure.
core.Str readLine()
Read an entire line from the file. Removes any line endings.
core.Str readAll()
Read the entire file into a string.
core.Char readRaw()
Read a single character from the stream without line-ending conversion. Returns Char(0) on failure.
core.Str readAllRaw()
Read the entire file without any conversions of line endings (still ignores any BOM).
void close()
Close the underlying stream.
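The layering described above, where a subclass supplies one character at a time and the base class builds the higher-level operations on top, can be modeled in Python as follows. This is an illustrative sketch, not the Storm implementation: the class names `TextInputModel` and `StringSource` are hypothetical, the method names are snake_case versions of the members listed above, and the NUL character stands in for `Char(0)`. A real implementation would presumably track the end of the stream separately, so that a NUL inside the text is not mistaken for it.

```python
class TextInputModel:
    """Subclasses supply read_char(); this class layers the rest on top."""

    def __init__(self):
        self._next = None     # one-character lookahead, None = not fetched yet
        self._first = True    # True until the first character has been examined

    def read_char(self) -> str:
        """Return the next decoded character, or NUL at the end of input."""
        raise NotImplementedError

    def _fill(self) -> str:
        if self._next is None:
            ch = self.read_char()
            if self._first:
                self._first = False
                if ch == "\ufeff":        # hide a leading byte order mark
                    ch = self.read_char()
            self._next = ch
        return self._next

    def more(self) -> bool:
        return self._fill() != "\0"

    def peek(self) -> str:
        return self._fill()

    def read_raw(self) -> str:
        ch = self._fill()
        self._next = None                 # consume the lookahead
        return ch

    def read(self) -> str:
        ch = self.read_raw()
        if ch == "\r" and self.peek() == "\n":
            ch = self.read_raw()          # collapse "\r\n" into "\n"
        return ch

    def read_line(self) -> str:
        chars = []
        while self.more():
            ch = self.read()
            if ch == "\n":
                break                     # the line ending is not returned
            chars.append(ch)
        return "".join(chars)


class StringSource(TextInputModel):
    """Reads characters from an in-memory string."""

    def __init__(self, text: str):
        super().__init__()
        self._text = text
        self._pos = 0

    def read_char(self) -> str:
        if self._pos >= len(self._text):
            return "\0"
        ch = self._text[self._pos]
        self._pos += 1
        return ch
```

With this structure, supporting a new encoding only requires a new `read_char`; BOM hiding, line-ending conversion and line reading come from the base class.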

There are also a number of convenience functions that create TextInput instances and read text:

core.io.TextInput readText(core.io.IStream stream)
Create a text reader. Identifies the encoding automatically and creates an appropriate reader.
core.io.TextInput readText(core.io.Url file)
Create a text reader from a Url. Equivalent to calling readText(file.read()).
core.io.TextInput readStr(core.Str from)
Create a text reader that reads data from a string. Utilizes StrInput.
core.Str readAllText(core.io.Url file)
Read the text from a file into a string. Equivalent to calling readText(file).readAll().

Text Output

Text output is handled by the interface core.io.TextOutput. Output objects must generally be created manually in order to select the output encoding. The next section provides an overview of supported encodings. Creating the output object typically requires specifying a core.io.TextInfo object that determines how line endings should be handled, and whether a BOM should be emitted. Output streams generally buffer output to avoid many small writes to the underlying stream. As such, applications that need control over when data is written to the stream (e.g. networking) may need to call flush explicitly.

The text output interface has the following members:

init()
Create. Outputs plain Unix line endings.
init(core.io.TextInfo info)
Create. Specify line endings.
core.Bool autoFlush
Automatic flush on newline? (on by default)
void write(core.Char c)
Write a character.
void write(core.Str s)
Write a string.
void writeLine(core.Str s)
Write a string, followed by a line ending.
void writeLine()
Write a new-line character.
void flush()
Flush all buffered output to the underlying stream.
void close()
Close the underlying stream.
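The interplay between buffering, automatic flushing and the line-ending settings can be illustrated with a small Python model (not Storm code). The class `BufferedUtf8Writer` is hypothetical, and buffering the BOM at construction time is an assumption of the sketch rather than documented behavior.

```python
import io


class BufferedUtf8Writer:
    """Buffers encoded text and flushes on newline when auto_flush is set."""

    def __init__(self, stream, use_cr_lf=False, use_bom=False, auto_flush=True):
        self._stream = stream
        self._newline = "\r\n" if use_cr_lf else "\n"
        self.auto_flush = auto_flush
        self._buffer = bytearray()
        if use_bom:
            # Assumption: the BOM is queued before any other output.
            self._buffer += "\ufeff".encode("utf-8")

    def write(self, text: str) -> None:
        saw_newline = "\n" in text
        self._buffer += text.replace("\n", self._newline).encode("utf-8")
        if saw_newline and self.auto_flush:
            self.flush()

    def write_line(self, text: str = "") -> None:
        self.write(text + "\n")

    def flush(self) -> None:
        self._stream.write(bytes(self._buffer))
        self._buffer.clear()


# Usage: nothing reaches the sink until a newline is written or flush is called.
sink = io.BytesIO()
out = BufferedUtf8Writer(sink, use_cr_lf=True, use_bom=True)
out.write("hi")            # stays in the buffer
out.write_line(" there")   # newline triggers the automatic flush
```

A protocol that sends data without a trailing newline, or a writer with `auto_flush` turned off, has to call `flush` itself to be sure the data has been passed to the underlying stream.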

Text Encodings

The following text encodings are supported by the system:

core.io.Utf8Input
Read and decode UTF-8 text.
core.io.Utf16Input
Read and decode UTF-16 text. Both little- and big-endian are supported.
core.io.Utf8Output
Encode and write UTF-8 text.
core.io.Utf16Output
Encode and write UTF-16 text. Both little- and big-endian are supported.