Instructions
This page lists all instructions in the Intermediate Language. In general, instructions operate on up to two operands. The exceptions are the function call instructions, which require some additional metadata. Instructions generally have the form <instr> <dst>, <src>, where <dst> is an operand that the instruction may write to, while <src> is only read from. Instructions that accept only one operand may both read from and write to that operand.
The instruction set is based on a subset of the one used in Intel processors, but it has gradually been generalized to target ARM processors as well. There are also numerous pseudo-instructions for handling local variables, function calls, etc. in a platform-independent way.
All instructions have a function accessible in Storm with the same name as the instruction mnemonic. These instructions can then be appended to a Listing using the << operator. This makes it possible to write fairly readable code in the Intermediate Language inside Basic Storm. A simple example is the following function, which adds five to its parameter and returns the result.
use core:asm;

void main() {
    Listing l(false, intDesc);

    Var param = l.createParam(intDesc);

    // Start of the function:
    l << prolog();

    // Load the parameter into eax:
    l << mov(eax, param);

    // Add 5 to eax:
    l << add(eax, intConst(5));

    // Return and end the function.
    l << fnRet(eax);

    print(l.toS());
}
Instruction List
The remainder of this page lists all instructions and describes their semantics. They are listed based on the type of operation they perform.
Unless otherwise noted, all operands must have the size of one of the supported types (that is, bytes, integers, longs, or pointers). Furthermore, all operands to the instruction must be of the same size. Finally, only writable operands may appear in a position that is written to. Writable operands are registers, variables, and memory references. These constraints are checked when an instruction is created. The Intermediate Language throws an appropriate exception if they are violated.
Apart from these restrictions, the Intermediate Language places few restrictions on what operands are allowed. For example, there is no restriction that certain operands may not be in memory. Such constraints are handled by the code generation backends. Because the code is not analyzed immediately, some errors are not reported until the code is compiled to machine code.
Variables and Blocks
The following instructions are pseudo-instructions that denote the beginning and end of blocks in the generated code. As such, they do not correspond to actual machine instructions. However, they typically mean that the backend needs to generate some code to initialize and/or destroy variables. As such, it is not safe to assume that the values of any registers are preserved across any of these instructions. They must therefore be stored in variables or somewhere else in memory before executing one of these instructions.
Since these instructions are pseudo-instructions, the system also assumes that they are always executed. For example, for all instructions that occur after a begin instruction, the system assumes that the begin instruction has been executed. It is thus not advisable to use a conditional jump to jump into the middle of a block, or to jump out of a block. For jumping out of a block, the jmpBlock instruction performs the necessary bookkeeping.
- prolog
  Emits the function prolog. Also marks the beginning of the root block in the listing. As such, no variables or parameters are visible before the prolog instruction. This instruction is typically the first instruction in all functions.
- epilog
  Emits the function epilog. Also marks the end of the root block in the listing. As such, no variables are visible after the epilog.
  It is typically not necessary to emit the epilog manually. This is done automatically by the fnRet instruction, which also handles any return values properly.
- begin <block>
  Marks the start of <block>. This involves initializing any variables in the block to zero. The specified block must be a direct child of the block that was the topmost block before the begin instruction.
- end <block>
  Marks the end of <block>. This involves destroying any variables in the block that need destruction. The specified <block> must be the currently topmost block. After this instruction, the direct parent of <block> will be the topmost active block instead.
- activate <variable>
  Activates the exception handling of <variable>. Used to mark <variable> as being properly initialized from that point onwards. This operation does not affect the contents of any registers. It typically does not emit any actual instructions, but only affects metadata.
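The nesting rules for begin and end can be illustrated with a small model. The following is a Python sketch, not Storm code: the class BlockTracker and its methods are hypothetical names made up for this illustration, and it only mirrors the two rules stated above (begin requires a direct child of the topmost block, end requires the topmost block itself).

```python
class BlockTracker:
    """Toy model of the begin/end nesting rules (not Storm's implementation)."""

    def __init__(self, parents):
        # parents maps each block name to its parent; the root block has parent None.
        self.parents = parents
        self.active = []  # stack of active blocks, topmost block last

    def prolog(self):
        # The prolog marks the beginning of the root block.
        roots = [block for block, parent in self.parents.items() if parent is None]
        self.active = [roots[0]]

    def begin(self, block):
        # The block must be a direct child of the currently topmost block.
        if not self.active or self.parents.get(block) != self.active[-1]:
            raise ValueError(f"{block} is not a direct child of the topmost block")
        self.active.append(block)

    def end(self, block):
        # Only the currently topmost block may be ended.
        if not self.active or self.active[-1] != block:
            raise ValueError(f"{block} is not the topmost block")
        self.active.pop()


tracker = BlockTracker({"root": None, "a": "root", "b": "a"})
tracker.prolog()
tracker.begin("a")   # fine: a is a direct child of root
tracker.begin("b")   # fine: b is a direct child of a
tracker.end("b")     # fine: b is the topmost block
```

In this model, calling tracker.end("root") at this point raises an error, since a is still the topmost block.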
Data Manipulation
These instructions manipulate data without being concerned about the type of data.
- mov <dest>, <src>
  Move (copy) data from <src> to <dest>.
- lea <dest>, <src>
  Compute the address (load effective address) of <src> and store it in <dest>. As such, <dest> must always be pointer-sized, and <src> may be of any size.
  If <src> is a reference, then the instruction extracts the associated RefSource.
- swap <a>, <b>
  Swap the contents of operands <a> and <b>. This is the only instruction that writes to the second operand.
Integer Arithmetics
These instructions perform various arithmetic operations on integers. The Intermediate Language assumes that signed integers are represented as two's complement. Therefore, addition, subtraction and multiplication are the same regardless of whether or not they operate on signed numbers. The exception is division and modulo, which have separate instructions for signed and unsigned numbers.
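The claim about two's complement can be checked with a small model. The following Python sketch is not Storm code: it models 32-bit operands as bit patterns, and the function names merely mirror the mnemonics. The rounding of idiv (truncation toward zero) is an assumption based on how x86 and ARM hardware dividers behave; the text above does not specify it.

```python
BITS = 32
MASK = (1 << BITS) - 1


def to_signed(x):
    """Interpret a 32-bit pattern as a two's complement signed integer."""
    x &= MASK
    return x - (1 << BITS) if x & (1 << (BITS - 1)) else x


def add(dest, src):
    # The same bit pattern results for signed and unsigned operands.
    return (dest + src) & MASK


def mul(dest, src):
    # The low 32 bits of the product are also independent of signedness.
    return (dest * src) & MASK


def udiv(dest, src):
    # Unsigned division: operands are interpreted as non-negative.
    return (dest & MASK) // (src & MASK)


def idiv(dest, src):
    # Signed division. Assumption: the quotient is truncated toward zero.
    a, b = to_signed(dest), to_signed(src)
    q = abs(a) // abs(b)
    return (-q if (a < 0) != (b < 0) else q) & MASK


pattern = 0xFFFFFFFE                  # -2 if signed, 4294967294 if unsigned
print(hex(add(pattern, 3)))           # 0x1 under both interpretations
print(hex(udiv(pattern, 2)))          # 0x7fffffff
print(hex(idiv(pattern, 2)))          # 0xffffffff, that is, -1
```

The same pattern gives one result for add, but two different results for division, which is why division and modulo need separate signed and unsigned instructions.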
- or <dest>, <src>
  Computes <dest> | <src> and stores the result in <dest>.
- and <dest>, <src>
  Computes <dest> & <src> and stores the result in <dest>.
- xor <dest>, <src>
  Computes <dest> ^ <src> and stores the result in <dest>.
- not <dest>
  Computes ~<dest> (the bitwise inverse) and stores the result in <dest>.
- shl <dest>, <shift>
  Computes <dest> << <shift> and stores the result in <dest>. The behavior is not well-defined if <shift> is larger than the number of bits in <dest>. A CPU may then compute either <dest> << (<shift> % bits_in_dest) or <dest> << min(<shift>, bits_in_dest).
- shr <dest>, <shift>
  Computes a logical right shift, <dest> >> <shift>, and stores the result in <dest>. As with shl, the behavior is not well-defined if <shift> is larger than the number of bits in <dest>. The shr instruction is suitable when right-shifting unsigned numbers.
- sar <dest>, <shift>
  Computes an arithmetic right shift, <dest> >> <shift>, and stores the result in <dest>. An arithmetic shift assumes that <dest> is signed, and duplicates the sign bit for each step the number is shifted to the right. As with shl, the behavior is not well-defined if <shift> is larger than the number of bits in <dest>. The sar instruction is suitable when right-shifting signed numbers.
- add <dest>, <src>
  Computes <dest> + <src> and stores the result in <dest>. Due to the use of two's complement, this operation works regardless of whether signed or unsigned numbers are added.
- sub <dest>, <src>
  Computes <dest> - <src> and stores the result in <dest>. Due to the use of two's complement, this operation works regardless of whether signed or unsigned numbers are subtracted.
- mul <dest>, <src>
  Computes <dest> * <src> and stores the result in <dest>. Due to the use of two's complement, this operation works regardless of whether signed or unsigned numbers are multiplied.
- idiv <dest>, <src>
  Computes <dest> / <src> and stores the result in <dest>. Assumes that both operands are signed integers.
- imod <dest>, <src>
  Computes <dest> % <src> and stores the result in <dest>. Assumes that both operands are signed integers.
- udiv <dest>, <src>
  Computes <dest> / <src> and stores the result in <dest>. Assumes that both operands are unsigned integers.
- umod <dest>, <src>
  Computes <dest> % <src> and stores the result in <dest>. Assumes that both operands are unsigned integers.
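The difference between the two right shifts, and the two possible outcomes of an out-of-range left shift, can be sketched as follows. This is a Python model of 32-bit operands, not Storm code. The names shl_wrapping and shl_saturating are hypothetical labels for the two CPU behaviors mentioned in the description of shl.

```python
BITS = 32
MASK = (1 << BITS) - 1


def shr(dest, shift):
    """Logical right shift: vacated bits are filled with zeros."""
    return (dest & MASK) >> shift


def sar(dest, shift):
    """Arithmetic right shift: vacated bits are copies of the sign bit."""
    dest &= MASK
    if dest & (1 << (BITS - 1)):
        dest -= 1 << BITS          # reinterpret the pattern as a negative number
    return (dest >> shift) & MASK  # Python's >> on negative ints keeps the sign


def shl_wrapping(dest, shift):
    """One possible CPU behavior: the shift amount wraps around."""
    return (dest << (shift % BITS)) & MASK


def shl_saturating(dest, shift):
    """The other possible behavior: the shift amount is clamped."""
    return (dest << min(shift, BITS)) & MASK


print(hex(shr(0x80000000, 4)))        # 0x8000000
print(hex(sar(0x80000000, 4)))        # 0xf8000000
print(shl_wrapping(1, 33))            # 2
print(shl_saturating(1, 33))          # 0
```

For shift amounts below the operand width, both variants of shl agree. Portable code should therefore keep shift amounts within that range.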
Floating Point Arithmetics
The following instructions operate on floating point numbers. As such, their operand sizes must be either 4 (float) or 8 (double). In contrast to many architectures, floating point numbers are stored in general purpose registers or in memory. The code generation backends transform the code as needed for the particular platform in use.
- fadd <dest>, <src>
  Computes <dest> + <src> and stores the result in <dest>.
- fsub <dest>, <src>
  Computes <dest> - <src> and stores the result in <dest>.
- fneg <dest>
  Computes -<dest> and stores the result in <dest>.
- fmul <dest>, <src>
  Computes <dest> * <src> and stores the result in <dest>.
- fdiv <dest>, <src>
  Computes <dest> / <src> and stores the result in <dest>.
Type Conversions
To convert between registers and memory locations of different sizes, one of the cast operations can be used. As such, it is not necessary for <dest> and <src> to have the same size for these instructions (they would be rather pointless if they were the same size).
There are no operations to convert between signed and unsigned numbers. This conversion is easily achievable using regular bitwise operators. To cast from an unsigned integer to a signed integer, it is sufficient to resize the unsigned integer to a suitable size using the ucast instruction. Similarly, to convert from a signed to an unsigned integer, it is enough to resize the signed integer using icast, and then possibly clear the sign bit using an and instruction if desired.
- icast <dest>, <src>
  Convert a signed integer from the size of <src> to the size of <dest>. The sign bit is extended or moved as required.
- ucast <dest>, <src>
  Convert an unsigned integer from the size of <src> to the size of <dest>. Any unused bits from <src> are cleared as needed before the number is copied to <dest>.
- fcast <dest>, <src>
  Convert floating point numbers from the size of <src> to the size of <dest>. That is, convert from float to double or vice versa.
- fcasti <dest>, <src>
  Convert from a floating point number in <src> into a signed integer with the size of <dest>.
- fcastu <dest>, <src>
  Convert from a floating point number in <src> into an unsigned integer with the size of <dest>.
- icastf <dest>, <src>
  Convert from a signed integer in <src> to a floating point number with the size of <dest>.
- ucastf <dest>, <src>
  Convert from an unsigned integer in <src> to a floating point number with the size of <dest>.
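The difference between icast and ucast when widening an integer can be sketched as follows. This is a Python model of the bit patterns, not Storm code. Sizes are given in bits for readability, and narrowing is modelled as plain truncation, which is an assumption rather than something stated above.

```python
def ucast(value, src_bits, dest_bits):
    """Zero-extend (or truncate) an unsigned value to dest_bits."""
    value &= (1 << src_bits) - 1            # only the source bits are used
    return value & ((1 << dest_bits) - 1)   # upper bits of a wider dest are zero


def icast(value, src_bits, dest_bits):
    """Sign-extend (or truncate) a two's complement value to dest_bits."""
    value &= (1 << src_bits) - 1
    if value & (1 << (src_bits - 1)):       # sign bit set: the value is negative
        value -= 1 << src_bits
    return value & ((1 << dest_bits) - 1)   # re-encode in the destination size


byte = 0xFF                                 # -1 if signed, 255 if unsigned
print(hex(icast(byte, 8, 32)))              # 0xffffffff, still -1
print(hex(ucast(byte, 8, 32)))              # 0xff, still 255
```

Both instructions copy the same low bits. They differ only in how the additional upper bits of a wider destination are filled.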
Control Flow
These instructions modify the control flow, either conditionally or unconditionally. As mentioned previously, it is not possible to simply use the jmp instruction to skip certain pseudo-instructions, like begin, end, or activate.
Note: the comparison instructions below affect the flags in the CPU. No other instructions in the Intermediate Language are defined to update the flags. However, on some platforms, some of the instructions do modify the flags anyway. As such, to use comparison instructions safely, they need to appear immediately before the instruction that utilizes the flags (i.e. jmp or setCond).
- test <lhs>, <rhs>
  Compute the bitwise AND of <lhs> and <rhs>, and set flags accordingly. The only tests that are defined to work after a test instruction are ifAlways, ifNever, ifEqual, and ifNotEqual. ifEqual is true if <lhs> & <rhs> produced the number zero.
- cmp <lhs>, <rhs>
  Compute <lhs> - <rhs> and set flags accordingly. Any of the flags, except for those dedicated to floating point numbers, can be used to interpret the result.
- fcmp <lhs>, <rhs>
  Compute <lhs> - <rhs> and set flags accordingly. Only the flags for floating point numbers are guaranteed to produce relevant results.
- setCond <dest>, <cond>
  Sets the byte <dest> to 1 if the condition in <cond> is true. Otherwise, stores 0 in <dest>. Must appear immediately after a test, cmp, or fcmp instruction.
- jmp <label>, <cond>
  Conditional jump. Jumps to <label> if the condition in <cond> is true. Must appear immediately after a test, cmp, or fcmp instruction.
- jmp <target>
  Unconditional jump. Jumps to <target>, which may be either a label, a reference, or a pointer-sized register.
- jmpBlock <label>, <block>
  Unconditional jump to <label>. Assumes that <block> is active at the point jumped to, and performs the necessary bookkeeping before performing the jump. Since <block> is specified, it is possible to jump out of blocks using this instruction. This is not possible using a regular jmp instruction.
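How test, cmp, and setCond interact for the equality conditions can be sketched with a small model. This is Python, not Storm code: the helper names flags_after_test, flags_after_cmp, and set_cond are hypothetical, and only the four conditions named in the description of test are modelled.

```python
def flags_after_test(lhs, rhs, bits=32):
    """Model of test: the zero flag reflects the bitwise AND of the operands."""
    mask = (1 << bits) - 1
    return {"zero": (lhs & rhs & mask) == 0}


def flags_after_cmp(lhs, rhs, bits=32):
    """Model of cmp: the zero flag reflects the difference of the operands."""
    mask = (1 << bits) - 1
    return {"zero": ((lhs - rhs) & mask) == 0}


def set_cond(flags, cond):
    """Model of setCond: returns the byte that would be stored in <dest>."""
    outcome = {
        "ifAlways": True,
        "ifNever": False,
        "ifEqual": flags["zero"],
        "ifNotEqual": not flags["zero"],
    }[cond]
    return 1 if outcome else 0


# After test, ifEqual means "no common bits set", not "operands are equal".
print(set_cond(flags_after_test(0b1010, 0b0101), "ifEqual"))   # 1
print(set_cond(flags_after_test(0b1010, 0b1010), "ifEqual"))   # 0
# After cmp, ifEqual has its usual meaning.
print(set_cond(flags_after_cmp(7, 7), "ifEqual"))              # 1
```

Note that ifEqual after a test instruction does not test whether the two operands are equal, which is easy to get wrong.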
Function Calls
These pseudo-instructions are used to emit function calls in a platform-agnostic manner. These instructions take a TypeDesc that specifies parameter and return types, so that they are able to follow the calling convention on the target platform. As we shall see, some of them seemingly take more than two operands, due to the required metadata.
Since these are pseudo-instructions, they cannot appear everywhere. They must appear inside a block. Furthermore, the fnParam or fnParamRef instructions that specify parameters must appear immediately before the fnCall or fnCallRef instruction. The fnParam instructions do not emit any code on their own. They only store data that is later used by the fnCall or fnCallRef instruction. It is therefore not possible to conditionally include a parameter by using a conditional jump to skip one of the fnParam instructions.
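Why a jump cannot skip a parameter can be illustrated with a toy lowering pass. The following Python sketch is not how any Storm backend is implemented. It only assumes, as described above, that parameters are recorded when the listing is processed and consumed by the next call. The tuple-based listing format and the function lower_calls are made up for this illustration.

```python
def lower_calls(listing):
    """Toy lowering pass: attach every preceding fnParam to the next fnCall."""
    pending = []   # parameters recorded so far; no code is emitted for them
    lowered = []
    for op, *args in listing:
        if op == "fnParam":
            pending.append(args[0])
        elif op == "fnCall":
            lowered.append(("call", args[0], tuple(pending)))
            pending = []
        else:
            lowered.append((op, *args))
    return lowered


# Ill-formed listing: it tries to skip the first fnParam with a conditional jump.
listing = [
    ("jmp", "skip", "ifEqual"),
    ("fnParam", "x"),
    ("label", "skip"),
    ("fnParam", "y"),
    ("fnCall", "f"),
]
for instruction in lower_calls(listing):
    print(instruction)
```

In this model, the call to f receives both x and y regardless of whether the jump is taken at run time, since the parameters are collected when the listing is processed rather than when the code executes.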
Function calls may overwrite all registers. As such, all data in registers need to be saved to memory (e.g. in a variable or through a pointer) before the function call, and restored afterwards.
- fnParam <type>, <src>
  Specify a function parameter to an upcoming fnCall or fnCallRef instruction. Parameters are specified left-to-right. <type> is a TypeDesc that describes the type of <src>.
- fnParamRef <type>, <src>
  Specify a function parameter to an upcoming fnCall or fnCallRef instruction. Parameters are specified left-to-right. <src> is a reference (a pointer) to the data, and must therefore be pointer-sized. <type> is a TypeDesc that describes the type of the data pointed to by <src>.
- fnCall <target>, <is member>
  Perform a function call to <target>, which is typically a reference, with the parameters specified earlier. <is member> is a boolean that specifies if the function is a member function or not. The function is assumed to return void.
- fnCall <target>, <is member>, <type>, <dest>
  Perform a function call to <target>, which is typically a reference, with the parameters specified earlier. <is member> is a boolean that specifies if the function is a member function or not. The function is assumed to return <type> (a TypeDesc). The return value is stored in <dest>, which may be either a variable, a memory location, or a register.
- fnCallRef <target>, <is member>, <type>, <dest>
  Like fnCall, fnCallRef calls a function and handles the return value. The difference is that <dest> is assumed to be a pointer to a location where the result should be stored. As such, <dest> is not written to, but is instead used to read an address from.
- fnRet
  Perform a return from the current function. Assumes that the function returns void (as specified to the constructor of the Listing object).
  As a part of the return, all variables in the current block(s) are destroyed as necessary. This pseudo-operation does not affect the current block after the instruction (since control flow will never proceed past the fnRet instruction). This means that it is possible to return from arbitrary positions in a function, and to jump across a fnRet instruction with a conditional jump.
- fnRet <src>
  Perform a return from the current function, and return the value in <src>. The type of <src> is assumed to match the type passed to the constructor of the Listing object.
  As with fnRet, it is possible to use this version to return from arbitrary points in the code.
- fnRetRef <src>
  Perform a return from the current function, and return the value pointed to by <src>. The data pointed to by <src> is assumed to match the type passed to the constructor of the Listing object.
  As with fnRet, it is possible to use this version to return from arbitrary points in the code.
Metadata
These pseudo-instructions provide metadata about the program. They may be used by debuggers, or in visualizations such as Progvis.
- location <pos>
  Indicates that the instructions from here until the next location instruction were generated to implement the source code indicated by the SrcPos object in <pos>.
Low-level Instructions
These instructions are mainly used by the code generation backends to implement function prologs, function calls, etc. Using them directly may result in breaking variable access and/or exception handling.
- nop
  Inserts a no-op instruction. This is typically not necessary to do. Some of the backends need a no-op to correctly determine what variables to clean up around function calls.
- dat <op>
  Insert data at the current location in the code. This is used by the backends to create tables of metadata, or to spill large constants to memory. If execution ever reaches a dat literal, then the system will likely crash.
- align <offset>
  Align the next instruction to the value in <offset> (of type Offset).
- alignAs <size>
  Align the next instruction according to the alignment in the Size object specified as <size>.
- call <operand>
  A low-level function call. This corresponds to the call instruction in the CPU, and as such it does not handle function parameters or return values. It is mainly used by the backends to implement the fnCall instruction.
- ret <size>
  A low-level function return. This corresponds to the ret instruction in the CPU, and it does not handle the calling convention or cleanup of local variables correctly. It is mainly used by the backends to implement the fnRet instruction. The <size> parameter is used to determine what size of the a register to use when analyzing register usage.
- push <src>
  Push an operand to the stack. Note that many backends require that the stack pointer is not modified inside functions. As such, using the push instruction likely breaks exception handling. Not all platforms support the push instruction.
- pop <dest>
  Pop an operand from the stack. Note that many backends require that the stack pointer is not modified inside functions. As such, using the pop instruction likely breaks exception handling. Not all platforms support the pop instruction.
- pushFlags, popFlags
  Push and pop the flags register to/from the stack. Note that many backends require that the stack pointer is not modified inside functions. As such, using these instructions likely breaks exception handling. Not all platforms support these instructions.
- preserve <dest>, <register>
  Emit metadata to record that register <register> has been preserved on the stack at the location indicated by <dest>. If <dest> is omitted, the register has been pushed on the stack. This pseudo-instruction only generates metadata used for exception handling.