Text IO
Storm needs to read text files from the file system in order to parse source code, among other tasks. As such, the standard library contains fairly competent facilities for reading and writing text files in various encodings.
As mentioned in the documentation for the string class, Storm internally works with Unicode codepoints in UTF-32 (the internal representation is, of course, more compact). For this reason, the task of the text IO system is to convert text to and from other encodings when reading from and writing to streams.
When reading a stream (e.g. a file), the encoding is determined by the first few bytes. Storm assumes that text files are encoded in some form of Unicode unless instructed otherwise. If a Byte Order Mark (BOM) is found as the first character in the file, the encoding of the BOM determines the encoding of the entire file (this is commonplace on Windows). If no BOM is present, the system assumes that the file is encoded in UTF-8.
A BOM is the codepoint U+FEFF. In UTF-8 it is encoded as the bytes EF BB BF, and in UTF-16 it is encoded as either FE FF or FF FE, depending on whether big- or little-endian encoding was used. During the standard decoding process the BOM is silently consumed before the decoded text is passed on further. Users of the library will therefore typically never see any BOMs in the text they read.
The text library also handles differences in line endings between different systems. The text decoding process converts line endings into \n, and the output process converts them as appropriate for the system.
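For example, reading text that contains Windows-style line endings produces a string that only contains \n. The following is a minimal sketch in Basic Storm, using the readStr function described later in this section; it assumes standard escape sequences in string literals:

```
use core:io;

void lineEndingExample() {
    // The "\r\n" in the input is normalized to "\n" by the text layer.
    Str normalized = readStr("first\r\nsecond").readAll();
    print(normalized); // prints "first" and "second" on separate lines
}
```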
Text Information
The class core.io.TextInfo stores information about encoded text. It is used to specify how the text output system should behave when encoding text:
- init(core.io.TextInfo& other): Copy constructor.
- init(): Create the default configuration (all members set to false).
- core.Bool useCrLf: Use Windows-style line endings.
- core.Bool useBom: Output a byte order mark first.
There are also functions that create TextInfo instances describing the default behavior on various systems:
- core.io.TextInfo sysTextInfo(): Create the default text information for the current system.
- core.io.TextInfo unixTextInfo(): Create a text information that produces Unix-style line endings.
- core.io.TextInfo windowsTextInfo(): Create a text information that produces Windows-style line endings.
- core.io.TextInfo windowsTextInfo(core.Bool bom): Create a text information that produces Windows-style line endings, and specify whether a BOM should be output.
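As an illustration, a configuration for Windows-style output with a BOM can either be built member by member or through the convenience function above. This is a minimal sketch in Basic Storm:

```
use core:io;

void textInfoExample() {
    // Build the configuration by hand...
    TextInfo manual;        // default constructor: all members false
    manual.useCrLf = true;  // Windows-style line endings
    manual.useBom = true;   // emit a byte order mark first

    // ...or use the equivalent convenience function.
    TextInfo convenient = windowsTextInfo(true);
}
```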
Text Input
The core interface for reading text from a stream is core.io.TextInput. Derived classes override the abstract function readChar, which the rest of the class uses to read single characters and compose them into higher-level read operations. As mentioned above, this includes hiding the byte order mark if required, converting line endings, etc.

Subclasses generally take a stream as an input source, and they implement their own buffering to avoid excessive calls to IStream.read().
The core.io.TextInput class has the following members:
- core.Bool more(): Does the file contain any more data?
- core.Char read(): Read a single character from the stream. Returns Char(0) on failure.
- core.Char peek(): Peek at a single character without consuming it. Returns Char(0) on failure.
- core.Str readLine(): Read an entire line from the file. Removes any line endings.
- core.Str readAll(): Read the entire file into a string.
- core.Char readRaw(): Read a single character from the stream without line-ending conversion. Returns Char(0) on failure.
- core.Str readAllRaw(): Read the entire file without converting line endings (any BOM is still ignored).
- void close(): Close the underlying stream.
There are also a number of convenience functions that create TextInput instances and read text:
- core.io.TextInput readText(core.io.IStream stream): Create a text reader. Identifies the encoding automatically and creates an appropriate reader.
- core.io.TextInput readText(core.io.Url file): Create a text reader from an Url. Equivalent to calling readText(file.read()).
- core.io.TextInput readStr(core.Str from): Create a text reader that reads data from a string. Utilizes StrInput.
- core.Str readAllText(core.io.Url file): Read the text from a file into a string. Equivalent to calling readText(file).readAll().
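Putting these pieces together, reading a file line by line could look roughly like the sketch below. It is written in Basic Storm, omits error handling, and assumes that a suitable Url is passed in:

```
use core:io;

void printLines(Url file) {
    // The encoding is detected automatically (from the BOM, if present).
    TextInput input = readText(file);

    while (input.more()) {
        // readLine strips the line ending for us.
        print(input.readLine());
    }

    input.close();
}
```

If the entire contents are needed as a single string, readAllText(file) is the shorter alternative.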
Text Output
Text output is handled by the interface core.io.TextOutput. Output objects must generally be created manually in order to select the output encoding; the next section provides an overview of the supported encodings. Creating an output object typically requires specifying a core.io.TextInfo object that determines how line endings should be handled, and whether a BOM should be emitted. Output streams generally buffer their output to avoid many small writes to the underlying stream. As such, applications that need control over when data is written to the stream (e.g. networking) may need to call flush explicitly.
The text output interface has the following members:
- init(): Create. Outputs plain Unix line endings.
- init(core.io.TextInfo info): Create. Specify line endings.
- core.Bool autoFlush: Automatically flush on newline? (on by default)
- void write(core.Char c): Write a character.
- void write(core.Str s): Write a string.
- void writeLine(core.Str s): Write a string and add a line ending.
- void writeLine(): Write a new-line character.
- void flush(): Flush all buffered output to the underlying stream.
- void close(): Close the underlying stream.
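The sketch below illustrates the interface in Basic Storm, assuming that a TextOutput instance has already been created using one of the encoding classes described in the next section:

```
use core:io;

void writeGreeting(TextOutput to) {
    to.writeLine("Hello, world!"); // string followed by a line ending
    to.write("no line ending here");
    to.flush();                    // make sure the data reaches the stream
    to.close();
}
```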
Text Encodings
The following text encodings are supported by the system:
- core.io.Utf8Input: Reading and decoding UTF-8 text.
- core.io.Utf16Input: Reading and decoding UTF-16 text. Both little- and big-endian data are supported.
- core.io.Utf8Output: Encoding and writing UTF-8 text.
- core.io.Utf16Output: Encoding and writing UTF-16 text. Both little- and big-endian data are supported.
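For example, writing a UTF-8 encoded file with Windows-style line endings and a BOM might look like the sketch below. Note that the constructor signature of Utf8Output (an output stream and a TextInfo) and the use of Url.write() to open the file for writing are assumptions here; consult the reference documentation of core.io for the exact interfaces:

```
use core:io;

void writeExample(Url file) {
    // Assumed constructor: an output stream plus a TextInfo configuration.
    Utf8Output out(file.write(), windowsTextInfo(true));
    out.writeLine("First line");
    out.writeLine("Second line");
    out.close();
}
```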