Instructions

This page lists all instructions in the Intermediate Language. In general, instructions operate on up to two operands. The exceptions are the function call instructions, which require some additional metadata. Instructions generally have the form <instr> <dst>, <src>, where <dst> is an operand that the instruction may write to, while <src> is only read from. Instructions that accept only one operand may read from or write to that operand.

The instruction set used is based on a subset of that used in Intel processors, but it has gradually been generalized to target ARM processors as well. There are also numerous pseudo-instructions for handling local variables, function calls, etc. in a platform independent way.

All instructions have a function accessible in Storm with the same name as the instruction mnemonic. The instructions created by these functions can then be appended to a Listing using the << operator. This means that it is possible to write fairly readable code in the Intermediate Language inside of Basic Storm. A simple example is the following function that adds five to the parameter and returns the result.

use core:asm;

void main() {
    Listing l(false, intDesc);

    Var param = l.createParam(intDesc);

    // Start of the function:
    l << prolog();

    // Load the parameter into eax:
    l << mov(eax, param);

    // Add 5 to eax:
    l << add(eax, intConst(5));

    // Return and end the function.
    l << fnRet(eax);

    print(l.toS());
}

Instruction List

The remainder of this page lists all instructions and describes their semantics. They are listed based on the type of operation they perform.

Unless otherwise noted, all operands must have the size of one of the supported types (that is, bytes, integers, longs, or pointers). Furthermore, all operands to the instruction must be of the same size. Finally, only writable operands may appear in a position that is written to. Writable operands are registers, variables, and memory references. These constraints are checked when an instruction is created. The Intermediate Language throws an appropriate exception if they are violated.

Apart from these restrictions, the Intermediate Language places few restrictions on what operands are allowed. For example, there is no restriction that certain operands may not be in memory. Such constraints are handled by the code generation backends. Because the code is not analyzed immediately, some errors are not reported until the code is compiled to machine code.
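The operand rules above can be illustrated with a minimal sketch that reuses the functions from the introductory example. The register name rax (assumed here to be the 8-byte counterpart of eax) is an assumption that does not appear elsewhere on this page, and the rejected instructions are left commented out since the exact exception type is not specified here.

```bs
use core:asm;

void sizeExample() {
    Listing l(false, intDesc);
    Var param = l.createParam(intDesc);

    l << prolog();

    // Both operands are 4 bytes wide, and eax is writable: accepted.
    l << mov(eax, param);

    // Assumption: rax is the 8-byte register. Combining it with the
    // 4-byte parameter would violate the same-size rule, so creating
    // the instruction is expected to throw:
    // l << mov(rax, param);

    // A constant is not a writable operand, so it may not be used as
    // the destination either:
    // l << mov(intConst(1), eax);

    l << fnRet(eax);
    print(l.toS());
}
```

Since the checks happen when the instruction is created, a violation would surface at the call to mov itself rather than when the listing is later compiled.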

Variables and Blocks

The following instructions are pseudo-instructions that denote the beginning and end of blocks in the generated code. As such, they do not correspond to actual machine instructions. However, the backend typically needs to generate some code for them in order to initialize and/or destroy variables. It is therefore not safe to assume that the values of any registers are preserved across any of these instructions. Any register values that are needed afterwards must be stored in variables, or somewhere else in memory, before executing one of these instructions.

Since these instructions are pseudo-instructions, the system also assumes that they are always executed. For example, for all instructions that occur after a begin instruction, the system assumes that the begin instruction has been executed. It is thus not advisable to use a conditional jump to jump into the middle of a block, or to jump out from a block. For jumping out of a block, the instruction jmpBlock performs the necessary bookkeeping to achieve this.
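As a rough sketch of how a nested block might be used, the following function computes a value inside a block and passes it out through a variable in the enclosing block, since registers are not preserved across begin and end. The member functions createBlock, createVar and root, the size constant sInt, and the exact operands of begin and end are assumptions that are not confirmed by this page.

```bs
use core:asm;

void blockExample() {
    Listing l(false, intDesc);
    Var param = l.createParam(intDesc);

    // Assumed API: 'result' lives in the root block, 'tmp' in a
    // nested block below the root block.
    Var result = l.createVar(l.root(), sInt);
    Block inner = l.createBlock(l.root());
    Var tmp = l.createVar(inner, sInt);

    l << prolog();

    // Registers may be clobbered by begin, so nothing is kept in
    // registers across it.
    l << begin(inner);
    l << mov(tmp, param);
    l << add(tmp, intConst(1));

    // Store the value in a variable that outlives the block before
    // the block ends, rather than leaving it in eax.
    l << mov(result, tmp);
    l << end(inner);

    l << fnRet(result);
    print(l.toS());
}
```

Note that the code never jumps past begin or end. Both are executed unconditionally, as the system assumes.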

Data Manipulation

These instructions manipulate data without being concerned about the type of data.

Integer Arithmetics

These instructions perform various arithmetic operations on integers. The Intermediate Language assumes that signed integers are represented as two's complement. Therefore, addition, subtraction and multiplication are the same regardless of whether or not they operate on signed numbers. The exception is division and modulo, which have separate instructions for signed and unsigned numbers.
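A short sketch of integer arithmetic in the style of the introductory example is shown below. The mnemonics sub, mul, idiv and udiv are assumed names for subtraction, multiplication, and signed and unsigned division. Only add is confirmed by the example at the top of this page.

```bs
use core:asm;

void arithmeticExample() {
    Listing l(false, intDesc);
    Var param = l.createParam(intDesc);

    l << prolog();
    l << mov(eax, param);

    // These behave the same for signed and unsigned operands
    // (assumed mnemonics: sub, mul).
    l << sub(eax, intConst(1));
    l << mul(eax, intConst(3));

    // Division depends on signedness. Assumed mnemonics: idiv for
    // signed division, udiv for unsigned division.
    l << idiv(eax, intConst(2));

    l << fnRet(eax);
    print(l.toS());
}
```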

Floating Point Arithmetics

The following instructions operate on floating point numbers. As such, their operand sizes must be either 4 (float) or 8 (double). In contrast to many architectures, floating point numbers are stored in general purpose registers or in memory. The code generation backends transform the code as needed for the particular platform in use.
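The following sketch adds two floats using a general purpose register, as described above. The names floatDesc and fadd are assumptions modeled on intDesc and add from the introductory example, and are not confirmed by this page.

```bs
use core:asm;

void floatExample() {
    // Assumed: floatDesc describes a 4-byte float, like intDesc
    // describes an integer.
    Listing l(false, floatDesc);
    Var a = l.createParam(floatDesc);
    Var b = l.createParam(floatDesc);

    l << prolog();

    // The float is held in an ordinary 4-byte register. The backend
    // is responsible for moving it to a floating point register if
    // the platform requires that.
    l << mov(eax, a);

    // Assumed mnemonic: fadd for floating point addition.
    l << fadd(eax, b);

    l << fnRet(eax);
    print(l.toS());
}
```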

Type Conversions

To convert between registers and memory locations of different sizes, one of the cast operations can be used. As such, it is not necessary for <dst> and <src> to have the same size for these instructions (they would be rather pointless if the sizes were the same).

There are no operations to convert between signed and unsigned numbers. This conversion is easily achievable using the resize operations and regular bitwise operators. To cast from an unsigned integer to a signed integer, it is sufficient to resize the unsigned integer to a suitable size using the ucast instruction. Similarly, to convert from a signed to an unsigned integer, it is enough to resize the signed integer using icast, and then possibly clear the sign bit using an and instruction if desired.
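As a minimal sketch of a resize, the function below widens a 4-byte parameter to an 8-byte result. The names longDesc and rax (an 8-byte return type and register) are assumptions, and the operand order of icast and ucast is assumed to follow the usual <dst>, <src> form.

```bs
use core:asm;

void castExample() {
    // Assumed: longDesc describes an 8-byte integer result.
    Listing l(false, longDesc);
    Var param = l.createParam(intDesc);

    l << prolog();

    // Resize the 4-byte signed parameter to 8 bytes. The operand
    // sizes differ here, which is allowed for the cast instructions.
    l << icast(rax, param);

    // For an unsigned source, ucast would be used instead:
    // l << ucast(rax, param);

    l << fnRet(rax);
    print(l.toS());
}
```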

Control Flow

These instructions modify the control flow, either conditionally or unconditionally. As mentioned previously, it is not possible to simply use the jmp instruction to skip certain pseudo-instructions, like begin, end, or activate.

Note: the comparison instructions below affect the flags in the CPU. No other instructions in the Intermediate Language are defined to update the flags. However, on some platforms, some of the instructions do modify the flags anyway. As such, to use comparison instructions safely, they need to appear immediately before the instruction that utilizes the flags (i.e. jmp or setCond).
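The rule about keeping the comparison next to its consumer can be sketched as follows. The function returns the parameter unchanged if it is zero, and adds one otherwise. The names cmp, label, and CondFlag:ifEqual, as well as appending a label with <<, are assumptions that are not confirmed by this page.

```bs
use core:asm;

void compareExample() {
    Listing l(false, intDesc);
    Var param = l.createParam(intDesc);

    // Assumed API: creates a label that can be placed in the listing.
    Label done = l.label();

    l << prolog();
    l << mov(eax, param);

    // The comparison appears immediately before the jump that uses
    // the flags. Placing another instruction, such as add, between
    // the two could clobber the flags on some platforms.
    l << cmp(eax, intConst(0));
    l << jmp(done, CondFlag:ifEqual);

    l << add(eax, intConst(1));

    // Place the label. Both the jump and its target are in the same
    // block, so no begin or end is skipped.
    l << done;
    l << fnRet(eax);

    print(l.toS());
}
```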

Function Calls

These pseudo-instructions are used to emit function calls in a platform-agnostic manner. They take a TypeDesc that specifies the parameter and return types, so that they are able to follow the calling convention on the target platform. Due to this required metadata, some of them seemingly take more than two operands.

Since these are pseudo-instructions, they cannot appear everywhere. They must appear inside a block. Furthermore, the fnParam or fnParamRef instructions that specify parameters must appear immediately before the fnCall or fnCallRef instruction. The fnParam instructions do not emit any code on their own. They only store data that is later used by the fnCall or fnCallRef instruction. It is therefore not possible to conditionally include a parameter by using a conditional jump to skip one of the fnParam instructions.

Function calls may overwrite all registers. As such, any data in registers that is needed after the call must be saved to memory (e.g. in a variable or through a pointer) before the function call, and restored afterwards.
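The following sketch shows how a call might be emitted while preserving a value held in eax. The helper emitCall and its parameters are hypothetical, and the signatures used for fnParam (a TypeDesc and a source operand) and fnCall (a target, a flag that indicates a member function, a result TypeDesc, and a destination) are assumptions that this page does not confirm. The function referred to by target is assumed to take one integer and return an integer.

```bs
use core:asm;

// Hypothetical helper. 'saved', 'arg' and 'result' are assumed to be
// integer-sized variables in a block that is active at this point.
void emitCall(Listing l, Ref target, Var saved, Var arg, Var result) {
    // The call may overwrite all registers, so save eax first.
    l << mov(saved, eax);

    // The parameter is specified immediately before the call. fnParam
    // emits no code on its own, so it may not be skipped by a jump.
    l << fnParam(intDesc, arg);

    // Assumed signature: target, member flag, result type, destination.
    l << fnCall(target, false, intDesc, result);

    // Restore the saved value after the call.
    l << mov(eax, saved);
}
```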

Metadata

These pseudo-instructions provide metadata about the program. They may be used by debuggers, or in visualizations such as Progvis.

Low-level Instructions

These instructions are mainly used by the code generation backends to implement function prologs, function calls, etc. Using them directly may result in breaking variable access and/or exception handling.