Strings
The standard library contains a type for strings, named core.Str. The types
core.Object and core.TObject both contain a member toS that creates a
string representation of the object. There is also a template function toS that converts values to
strings using the output operator (<<) of a string buffer. As a result, most types can be converted to a
string by calling <obj>.toS().
The Str type is immutable, meaning that a string cannot be modified once it has been
created. It provides an interface for accessing individual codepoints without
exposing the internal representation of the string. Since the internal representation differs from the
sequence of codepoints that is exposed, the Str class does not provide indexed access to
codepoints: locating the n:th codepoint would in general require scanning the string from the beginning.
Instead, iterators are used to refer to codepoints in the string. Since the iterators provide a +
operator, it is possible to conveniently step an iterator forward a specific number of codepoints.
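To make the trade-off concrete, here is a small Python model of an immutable string that hides its representation and only exposes codepoints through iterators. It is not Storm code: using UTF-8 as the internal representation, and the names Str, StrIter, begin, end, count and value, are assumptions made for illustration. Stepping an iterator with + has to skip over a variable number of bytes per codepoint, which is why indexed access could not be constant-time.

```python
class StrIter:
    """Iterator over the codepoints of a UTF-8 byte string (model only)."""

    def __init__(self, data: bytes, pos: int) -> None:
        self._data = data
        self._pos = pos  # byte offset: an internal detail, never exposed

    def __eq__(self, other: object) -> bool:
        return (isinstance(other, StrIter)
                and self._data is other._data
                and self._pos == other._pos)

    def __add__(self, steps: int) -> "StrIter":
        # Advancing n codepoints costs O(n): each step skips one lead byte
        # plus any continuation bytes (bit pattern 10xxxxxx).
        pos = self._pos
        for _ in range(steps):
            if pos >= len(self._data):
                break  # never step past end()
            pos += 1
            while pos < len(self._data) and (self._data[pos] & 0xC0) == 0x80:
                pos += 1
        return StrIter(self._data, pos)

    def value(self) -> str:
        # Decode the single codepoint that starts at this position.
        return self._data[self._pos:(self + 1)._pos].decode("utf-8")


class Str:
    """Immutable string that hides its UTF-8 representation (model only)."""

    def __init__(self, text: str) -> None:
        self._data = text.encode("utf-8")

    def begin(self) -> StrIter:
        return StrIter(self._data, 0)

    def end(self) -> StrIter:
        return StrIter(self._data, len(self._data))

    def count(self) -> int:
        # No O(1) shortcut: walk the representation one codepoint at a time.
        n, it, end = 0, self.begin(), self.end()
        while it != end:
            it = it + 1
            n += 1
        return n


s = Str("añ€😀")                # 1 + 2 + 3 + 4 = 10 bytes, 4 codepoints
print(s.count())                # 4
print((s.begin() + 3).value())  # 😀
```

In this model, s.begin() + n reaches the n:th codepoint in O(n) time, and count() has to walk the whole string.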
String Operations
The Str class contains many functions for inspecting strings and for creating modified copies of them
(since Str is immutable, no function modifies a string in place). Below is a selection of its
functionality, categorized by theme:
Inspecting Content
- core.Bool empty(): Is the string empty?
- core.Bool any(): Does the string contain any characters?
- core.Nat count(): Count the number of characters in the string. This is the number of steps an iterator takes when iterating through the string, that is, the number of codepoints in the string.
- core.Str.Iter begin(): Get an iterator to the beginning of the string.
- core.Str.Iter end(): Get an iterator to the end of the string (one past the last codepoint).
- core.Bool ==(core.Str o): Equal to another string?
- core.Bool <(core.Str o): Lexicographically less than another string?
Note that languages like Basic Storm automatically derive the remaining comparison operators (e.g. != and >) from the ones that are provided.
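The effect of deriving comparison operators can be sketched in Python with functools.total_ordering, which fills in <=, > and >= for a class that defines == and < (Python itself derives != from ==). This is an analogy, not how Basic Storm implements it, and the sketch assumes that < compares strings codepoint by codepoint.

```python
from functools import total_ordering


@total_ordering
class Str:
    """Minimal string wrapper that defines only == and < (model only)."""

    def __init__(self, text: str) -> None:
        self.text = text

    def __eq__(self, other: object) -> bool:
        if not isinstance(other, Str):
            return NotImplemented
        return self.text == other.text

    def __lt__(self, other: object) -> bool:
        if not isinstance(other, Str):
            return NotImplemented
        # Python's str comparison is lexicographic by codepoint.
        return self.text < other.text


# >, <=, >= and != were never written, yet all of them work:
print(Str("banana") > Str("apple"))  # True
print(Str("a") <= Str("a"))          # True
```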
Manipulation
- core.Str +(core.Str o): Concatenate strings.
- core.Str escape(): Escape special characters in the string. Overloads that take a parameter extra also escape that additional character if present.
- core.Str unescape(): Unescape escape sequences in the string. Any unknown escape sequences are kept as they are. Overloads that take a parameter extra also unescape that additional character if present.
- core.Str unescapeKeepBackslash(core.Char extra): A version of unescape that keeps sequences of \\ intact. This is useful when unescape is used as a first pass for other languages (e.g. regular expressions, where . and [ also need to be escaped at a later stage).
- core.Str toCrLf(): Convert line endings to CR-LF. Returns the same string if no conversion is needed.
- core.Str fromCrLf(): Convert CR-LF line endings to LF. Returns the same string if no conversion is needed.
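A Python sketch of how toCrLf and fromCrLf could behave, assuming that only LF and CR-LF sequences are considered (a lone CR is left untouched). Returning the argument object itself when nothing changes models "returns the same string"; the function names are hypothetical.

```python
def to_crlf(s: str) -> str:
    """Turn every LF that is not already preceded by CR into CR-LF."""
    out = []
    changed = False
    prev = ""
    for ch in s:
        if ch == "\n" and prev != "\r":
            out.append("\r\n")
            changed = True
        else:
            out.append(ch)
        prev = ch
    # Hand back the very same object when no conversion was needed.
    return "".join(out) if changed else s


def from_crlf(s: str) -> str:
    """Turn every CR-LF into LF, or return s itself if there is none."""
    if "\r\n" not in s:
        return s
    return s.replace("\r\n", "\n")


text = "one\ntwo\r\nthree"
print(repr(to_crlf(text)))  # 'one\r\ntwo\r\nthree'
```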
Substrings
The following operations extract and inspect substrings:
- core.Str cut(core.Str.Iter from): Extract a substring, starting at from until the end of the string.
- core.Str cut(core.Str.Iter from, core.Str.Iter to): Extract a substring, starting at from until, but not including, to.
- core.Str remove(core.Str.Iter from, core.Str.Iter to): Remove the characters between from and to, returning a new string.
- core.Bool startsWith(core.Str s): Does the string start with the string s?
- core.Bool endsWith(core.Str s): Does the string end with the string s?
- core.Bool contains(core.Str s): Does the string contain the substring s? Note that the implementation is not necessarily efficient for long search strings.
- core.Str.Iter find(core.Char ch, core.Str.Iter start): Find a character in the string, starting the search at start. Returns an iterator to the first occurrence of the character.
- core.Str.Iter find(core.Str str, core.Str.Iter start): Find a substring in the string, starting the search at start. Returns the first match. Note that the implementation is not necessarily efficient for long search strings.
- core.Str.Iter findLast(core.Char ch, core.Str.Iter last): Find the last occurrence of ch in the string. Note that the character at last is not examined.
- core.Str.Iter findLast(core.Str str, core.Str.Iter last): Find the last occurrence of str in the string. If last is specified, the match has to end before last.
The second parameter to find and findLast is optional. If it is omitted, the search starts at
the beginning or the end of the string, respectively.
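The iterator conventions above can be modeled in Python, with integer offsets standing in for Str.Iter values and len(s) playing the role of end(). The function names are hypothetical, and returning end() when nothing is found is an assumption; the half-open range of cut and the exclusive last bound of findLast follow the descriptions above.

```python
from typing import Optional

# Integer offsets stand in for Str.Iter values; len(s) plays the role of end().


def find(s: str, needle: str, start: int = 0) -> int:
    """First match at or after start; assumed to return end() if absent."""
    i = s.find(needle, start)
    return len(s) if i < 0 else i


def find_last(s: str, needle: str, last: Optional[int] = None) -> int:
    """Last match lying entirely before last, so the character at last is
    never examined; assumed to return end() if absent."""
    if last is None:
        last = len(s)
    i = s.rfind(needle, 0, last)
    return len(s) if i < 0 else i


def cut(s: str, start: int, to: Optional[int] = None) -> str:
    """Half-open range [start, to); without to, cut runs to the end."""
    return s[start:] if to is None else s[start:to]


path = "dir/sub/file.txt"
slash = find_last(path, "/")        # 7
print(cut(path, 0, slash))          # dir/sub
print(cut(path, slash + 1))         # file.txt
print(find_last(path, "/", slash))  # 3: position 7 itself is not examined
```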
Conversion
The Str class contains functions for converting strings to numbers. Following the naming convention used
for conversions between other types, Str provides members named int, nat, long, word, float, and double
that convert a string into the corresponding numeric type. Since these conversions may fail, they all
return Maybe<T> to indicate whether the conversion succeeded. In cases where the conversion is expected
to succeed (e.g. when the string originated from a match in a grammar), the Str class also provides
functions toX, where X is one of the types Int, Nat, Long, Word, Float, or Double. These functions
throw a core.StrError if the format is invalid.
For conversion from hexadecimal, Str provides the functions hexNat and hexWord. They work like
nat and word. Similarly, there are versions toHexNat and toHexWord that throw an exception
instead of returning Maybe<T>.
As usual, numbers can be converted to strings by calling toS, as with almost any other type.
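The difference between the Maybe<T>-returning members and the throwing toX variants can be sketched in Python, with None standing in for an empty Maybe<T> and a local StrError class standing in for core.StrError. The accepted formats (plain ASCII digits, no sign, no 0x prefix) and the omission of range checks are simplifying assumptions, not Storm's exact parsing rules.

```python
from typing import Optional


class StrError(Exception):
    """Stand-in for core.StrError."""


def _parse(s: str, digits: str, base: int) -> Optional[int]:
    # None models an empty Maybe<T>: the text is not a valid number.
    if s == "" or any(c not in digits for c in s):
        return None
    return int(s, base)


def nat(s: str) -> Optional[int]:
    """Model of Str.nat (range checks omitted)."""
    return _parse(s, "0123456789", 10)


def hex_nat(s: str) -> Optional[int]:
    """Model of Str.hexNat (range checks omitted)."""
    return _parse(s, "0123456789abcdefABCDEF", 16)


def to_nat(s: str) -> int:
    """Model of Str.toNat: raise instead of returning an empty Maybe."""
    value = nat(s)
    if value is None:
        raise StrError(f"invalid number: {s!r}")
    return value


print(nat("42"))      # 42
print(nat("4x"))      # None
print(hex_nat("ff"))  # 255
```

The Maybe-returning form suits untrusted input that the caller must check; the throwing form keeps code short when invalid input indicates a bug.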
Other Utilities
There are also a few other utility functions provided:
- core.Str removeIndentation(core.Str str): Remove the indentation from a string.
- core.Str trimBlankLines(core.Str src): Remove leading and trailing empty lines from a string.
- core.Str trimWhitespace(core.Str src): Strip whitespace from the beginning and end of a string.
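A Python sketch of what these utilities could do, assuming that removeIndentation removes the indentation shared by all lines (modeled with textwrap.dedent), that trimBlankLines treats whitespace-only lines as blank, and that trimWhitespace strips both ends. These semantics and the function names are assumptions; a typical use is tidying up a multi-line string literal.

```python
import textwrap


def remove_indentation(s: str) -> str:
    # Assumption: drop the leading whitespace common to all non-blank lines.
    return textwrap.dedent(s)


def trim_blank_lines(s: str) -> str:
    # Drop blank lines at the start and end, keep interior ones.
    lines = s.split("\n")
    while lines and lines[0].strip() == "":
        lines.pop(0)
    while lines and lines[-1].strip() == "":
        lines.pop()
    return "\n".join(lines)


def trim_whitespace(s: str) -> str:
    return s.strip()


src = "\n    if x:\n        y\n\n"
print(trim_blank_lines(remove_indentation(src)))
```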
Characters
The type Char represents a single Unicode codepoint. This is the type returned by the
iterators of the Str type, so conceptually a Str is a sequence of Chars (although Str uses a more
compact internal representation).
The String Buffer
The string buffer, StrBuf, is a mutable string that is able to build strings efficiently. The
toS member function that exists for all types usually calls an overloaded version of toS that
accepts a core.StrBuf as a parameter. This lets objects write their string representation directly
into a buffer, rather than relying on string concatenation. For example, a toS
implementation for a simple class could look like this:
class MyClass {
    Int value;

    protected void toS(StrBuf to) : override {
        to << "My class: " << value;
    }
}
As can be seen above, the StrBuf class uses the << operator to append strings to the end of the
string buffer. There is also a member add that can be used in languages where the << operator is
not available (e.g. the Syntax Language). The string buffer contains overloads for the primitive
types in the standard library.
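The pattern of implementing toS on top of a StrBuf overload can be mimicked in Python by overloading << through __lshift__. Since Python has no overloading by parameter type, the StrBuf variant gets the hypothetical name to_s_into; Object, MyClass and all method names here are illustrative stand-ins, not Storm's API.

```python
class StrBuf:
    """Mutable buffer where << appends (model of core.StrBuf)."""

    def __init__(self) -> None:
        self._parts = []

    def __lshift__(self, value: object) -> "StrBuf":
        if isinstance(value, Object):
            value.to_s_into(self)  # the object writes itself into the buffer
        else:
            self._parts.append(str(value))
        return self  # returning self enables chaining: buf << a << b

    def add(self, value: object) -> "StrBuf":
        # Same as <<, for callers that cannot use the operator.
        return self << value

    def to_s(self) -> str:
        return "".join(self._parts)


class Object:
    def to_s_into(self, to: StrBuf) -> None:
        to << "<object>"

    def to_s(self) -> str:
        # Mirrors toS() delegating to the StrBuf overload: one buffer, no
        # intermediate strings from repeated concatenation.
        buf = StrBuf()
        self.to_s_into(buf)
        return buf.to_s()


class MyClass(Object):
    def __init__(self, value: int) -> None:
        self.value = value

    def to_s_into(self, to: StrBuf) -> None:
        to << "My class: " << self.value


print(MyClass(5).to_s())  # My class: 5
```

Because nested objects write into the same buffer, printing a structure of many objects builds one string instead of many temporary ones.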
The string buffer also has the ability to format output. The following formatting options are available:
- core.StrFmt width(core.Nat width): Set the width of the next item that is output. If the item is not already wide enough, it is padded to the width using the character set by fill.
- core.StrFmt left(): Left align the next output.
- core.StrFmt left(core.Nat width): Left align the next output and specify a width.
- core.StrFmt right(): Right align the next output.
- core.StrFmt right(core.Nat width): Right align the next output and specify a width.
- core.StrFmt fill(core.Char fill): Set the fill character used to pad output.
- core.StrFmt precision(core.Nat digits): Set the precision of floating point output without modifying the mode.
- core.StrFmt significant(core.Nat digits): Output floating point numbers in decimal form with at most the specified number of significant digits. This is the default.
- core.StrFmt fixed(core.Nat decimals): Output floating point numbers in decimal form with the specified number of decimal places.
- core.StrFmt scientific(core.Nat digits): Output floating point numbers in scientific notation with the specified number of decimal digits.
- core.HexFmt hex(core.Byte v): Create a hex format for v.
- core.HexFmt hex(core.Byte v, core.Nat digits): Create a hex format for v with the given number of digits.
- core.HexFmt hex(core.Nat v): Create a hex format for v.
- core.HexFmt hex(core.Nat v, core.Nat digits): Create a hex format for v with the given number of digits.
- core.HexFmt hex(core.Word v): Create a hex format for v.
- core.HexFmt hex(core.Word v, core.Nat digits): Create a hex format for v with the given number of digits.
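A Python model of how width, alignment and fill could interact when format objects are written to a buffer with <<. It assumes that width and alignment apply only to the next item (as the descriptions above say), that fill stays in effect, and that output is right-aligned by default; floating point and hex formats are not modeled, and all names are stand-ins.

```python
from typing import Optional


class StrFmt:
    """A formatting request; unset fields leave the buffer's state alone."""

    def __init__(self, width: Optional[int] = None, align: Optional[str] = None,
                 fill: Optional[str] = None) -> None:
        self.width, self.align, self.fill = width, align, fill


def width(n: int) -> StrFmt:
    return StrFmt(width=n)


def left(n: Optional[int] = None) -> StrFmt:
    return StrFmt(width=n, align="left")


def right(n: Optional[int] = None) -> StrFmt:
    return StrFmt(width=n, align="right")


def fill(ch: str) -> StrFmt:
    return StrFmt(fill=ch)


class StrBuf:
    def __init__(self) -> None:
        self._parts = []
        self._width, self._align, self._fill = 0, "right", " "  # assumed defaults

    def __lshift__(self, item: object) -> "StrBuf":
        if isinstance(item, StrFmt):
            # Format objects only update state; nothing is output.
            if item.width is not None:
                self._width = item.width
            if item.align is not None:
                self._align = item.align
            if item.fill is not None:
                self._fill = item.fill
            return self
        text = str(item)
        pad = self._fill * max(0, self._width - len(text))
        self._parts.append(text + pad if self._align == "left" else pad + text)
        # Assumption: width and alignment apply to the next item only; fill persists.
        self._width, self._align = 0, "right"
        return self

    def to_s(self) -> str:
        return "".join(self._parts)


buf = StrBuf()
buf << "[" << width(5) << 42 << "]" << left(4) << "ab" << "|" << fill("0") << width(3) << 7
print(buf.to_s())  # [   42]ab  |007
```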
StrBuf also contains the members indent and dedent, which automatically indent subsequent output
by one more (or one less) copy of the indentation string. The indentation string can be set by calling
indentBy. There is also a class Indent that can be used to indent a StrBuf for as long as the
Indent object is in scope.
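Python has no scope-bound destructors, so the following sketch models the Indent class with a context manager instead. The buffer inserts the current indentation at the start of every non-empty line, which is the behavior indent and dedent are assumed to have here; indentBy becomes indent_by, and the default indentation string and the helper name indented are hypothetical.

```python
from contextlib import contextmanager


class StrBuf:
    def __init__(self) -> None:
        self._parts = []
        self._level = 0
        self._indent_by = "    "  # assumed default indentation string
        self._at_line_start = True

    def indent_by(self, s: str) -> None:
        self._indent_by = s

    def indent(self) -> None:
        self._level += 1

    def dedent(self) -> None:
        self._level -= 1

    def __lshift__(self, value: object) -> "StrBuf":
        for ch in str(value):
            # Indent lazily at the first character of a line, so empty
            # lines stay empty.
            if self._at_line_start and ch != "\n":
                self._parts.append(self._indent_by * self._level)
                self._at_line_start = False
            self._parts.append(ch)
            if ch == "\n":
                self._at_line_start = True
        return self

    def to_s(self) -> str:
        return "".join(self._parts)


@contextmanager
def indented(buf: StrBuf):
    """Stand-in for Indent: one extra level while the block is active."""
    buf.indent()
    try:
        yield buf
    finally:
        buf.dedent()


buf = StrBuf()
buf.indent_by("  ")
buf << "a {\n"
with indented(buf):
    buf << "b;\n" << "c;\n"
buf << "}\n"
print(buf.to_s(), end="")
```

Leaving the with block restores the previous level even if an exception is raised, much like an Indent object going out of scope.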
