Protocol

The language server expects to communicate with a client using standard input and standard output. Because of this, it is the client's responsibility to start the language server in whatever way it sees fit. To allow regular use of standard input and standard output for debugging purposes, every protocol message starts with a null-byte. All other data is treated as UTF-8-encoded text that is to be shown to the user.

The protocol used in the language server is based on s-expressions in LISP. Therefore, it can represent four basic data types: integers, strings, symbols and cons-cells. A cons-cell is an object containing two members, usually referred to as car and cdr, or first and rest. These members may refer to any instance of the data types in the protocol, including other cons-cells. They can therefore be used to form more complex data structures, such as singly linked lists. A singly linked list is implemented by creating one cons-cell for each element in the list. The content of each element is stored in the car (first) member of its cons-cell, and the cdr (rest) member points to the next cons-cell in the list. The cdr of the last cons-cell contains the symbol nil, which is the LISP equivalent of null in C++ or Java.
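The list structure described above can be sketched in a few lines of Python. This is only an illustration of the data model; the names Symbol, Cons, NIL, make_list and to_python_list are hypothetical and not part of the language server.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class Symbol:
    """A LISP symbol such as nil (illustrative type, not from the server)."""
    name: str


@dataclass(frozen=True)
class Cons:
    """A cons-cell: car holds one element, cdr holds the rest."""
    car: Any
    cdr: Any


NIL = Symbol("nil")


def make_list(*items):
    """Build a singly linked list: one cons-cell per element, ending in nil."""
    result = NIL
    for item in reversed(items):
        result = Cons(item, result)
    return result


def to_python_list(cell):
    """Follow the cdr chain and collect each car until nil is reached."""
    out = []
    while cell != NIL:
        out.append(cell.car)
        cell = cell.cdr
    return out


lst = make_list(10, 11)  # the same structure as Cons(10, Cons(11, NIL))
```

Building the list from the back means each new cons-cell simply points at the list built so far, which mirrors how the cdr member links the cells together.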

S-expressions use the same textual representation as in many LISP dialects, such as Emacs LISP. Strings and numbers are represented in the same way as string and numeric literals in most programming languages. Symbols are similar to identifiers in programming languages: they are sequences of non-whitespace characters that do not start with a digit or a dash (since the dash is used for unary negation). Cons-cells are represented by two s-expressions enclosed in parentheses and separated by a dot. (10 . 11) is a cons-cell containing the numbers 10 and 11. In this notation, a linked list containing two elements is represented as follows: (10 . (11 . nil)). This notation quickly becomes impractical for large lists. To remedy this, linked lists can also be written as a sequence of s-expressions separated by whitespace and enclosed in parentheses. The list above can then be written as (10 11), which is equivalent to (10 . (11 . nil)).
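To make the relation between the two notations concrete, the following Python sketch prints the same structure both in full dot notation and in the abbreviated list form. The types and function names are hypothetical, and the string escaping (backslash before quotes and backslashes) is an assumption, since the text does not specify escape rules.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class Symbol:
    name: str


@dataclass(frozen=True)
class Cons:
    car: Any
    cdr: Any


NIL = Symbol("nil")


def atom(value):
    """Render a symbol, string or number."""
    if isinstance(value, Symbol):
        return value.name
    if isinstance(value, str):
        # Escaping rules are an assumption, not taken from the protocol text.
        return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'
    return str(value)


def dotted(value):
    """Render every cons-cell explicitly as (car . cdr)."""
    if isinstance(value, Cons):
        return f"({dotted(value.car)} . {dotted(value.cdr)})"
    return atom(value)


def short(value):
    """Render chains of cons-cells in the abbreviated list notation."""
    if not isinstance(value, Cons):
        return atom(value)
    parts = []
    while isinstance(value, Cons):
        parts.append(short(value.car))
        value = value.cdr
    if value != NIL:
        # The chain does not end in nil, so the final dot must be kept.
        parts += [".", short(value)]
    return "(" + " ".join(parts) + ")"


two = Cons(10, Cons(11, NIL))
print(dotted(two))  # (10 . (11 . nil))
print(short(two))   # (10 11)
```

A cons-cell whose cdr is neither nil nor another cons-cell, such as (10 . 11), cannot be abbreviated, so short falls back to the dot for that last member.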

The protocol is encoded in binary and based around messages. A message is a complete s-expression, usually a list of some sort. As mentioned earlier, messages always start with a null-byte. The null-byte is followed by the length of the message, in bytes, encoded as a 32-bit (4-byte) integer in big-endian format. The bytes that follow the length field, as many as the length indicates, constitute the message body, which contains a single s-expression. Each s-expression starts with a single byte indicating the type of the stored element, followed by the actual data. The supported types are as follows:

For example, the message (a 10 a "b"), or equivalently (a . (10 . (a . ("b" . nil)))), would be encoded as follows:

0x00                     //       // Start of message.
0x00 0x00 0x00 0x1F      //       // Length of the message, 31 bytes.
0x01                     // (     // Start of a cons-cell.
 0x04                    // a     // A new symbol...
  0x00 0x00 0x00 0x01    //       //  with index 1...
  0x00 0x00 0x00 0x01    //       //  identifier length 1...
  0x61                   //       //  identifier content "a"
 0x01                    // . (   // Start of a cons-cell.
  0x02                   // 10    // A number...
   0x00 0x00 0x00 0x0A   //       // value 10
  0x01                   // . (   // Start of a cons-cell.
   0x05                  // a     // A previously seen symbol...
    0x00 0x00 0x00 0x01  //       //  with index 1
   0x01                  // . (   // Start of a cons-cell.
    0x03                 // "b"   // A string...
     0x00 0x00 0x00 0x01 //       //  with length 1...
     0x62                //       //  and content "b".
   0x00                  // . nil // The symbol nil.
                         // ))))  //
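The byte sequence above can be reproduced with a small encoder. The following Python sketch is not the language server's implementation: the type bytes are read off the example, and several details are assumptions, namely that numbers are signed 32-bit values, that string and symbol lengths count UTF-8 bytes, that symbol indices start at 1, and that the symbol table is kept for the lifetime of the encoder. All names are hypothetical.

```python
import struct
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class Symbol:
    name: str


@dataclass(frozen=True)
class Cons:
    car: Any
    cdr: Any


NIL = Symbol("nil")

# Type bytes as they appear in the worked example above.
TAG_NIL, TAG_CONS, TAG_NUMBER, TAG_STRING, TAG_NEW_SYMBOL, TAG_OLD_SYMBOL = range(6)


class Encoder:
    def __init__(self):
        self.symbols = {}    # symbol name -> index already sent
        self.next_index = 1  # assumption: the first symbol gets index 1

    def encode_value(self, value):
        if isinstance(value, Cons):
            # car is encoded before cdr, so symbols are numbered in reading order.
            return bytes([TAG_CONS]) + self.encode_value(value.car) + self.encode_value(value.cdr)
        if isinstance(value, Symbol):
            if value.name == "nil":
                return bytes([TAG_NIL])
            if value.name in self.symbols:
                return bytes([TAG_OLD_SYMBOL]) + struct.pack(">I", self.symbols[value.name])
            index = self.next_index
            self.next_index += 1
            self.symbols[value.name] = index
            data = value.name.encode("utf-8")
            return bytes([TAG_NEW_SYMBOL]) + struct.pack(">II", index, len(data)) + data
        if isinstance(value, int):
            return bytes([TAG_NUMBER]) + struct.pack(">i", value)
        if isinstance(value, str):
            data = value.encode("utf-8")
            return bytes([TAG_STRING]) + struct.pack(">I", len(data)) + data
        raise TypeError(f"cannot encode {value!r}")

    def encode_message(self, value):
        """Prefix the body with the null-byte and its 32-bit big-endian length."""
        body = self.encode_value(value)
        return b"\x00" + struct.pack(">I", len(body)) + body


message = Cons(Symbol("a"), Cons(10, Cons(Symbol("a"), Cons("b", NIL))))
encoded = Encoder().encode_message(message)
print(encoded.hex(" "))
```

The second occurrence of the symbol a is sent as a 5-byte reference instead of repeating its name, which is the purpose of the two separate symbol type bytes in the example.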

Messages

Using the protocol described above, it is possible to send s-expressions as messages from the language server to a text editor and vice versa. Both the language server and the editor may send messages at any time, but communication is usually initiated by the client. In the language server, messages are always lists where the first element is a symbol describing the message type.
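Since messages may arrive at any time and share the stream with plain text, a client has to separate the two before it can interpret anything. The Python sketch below shows one possible way to do this for a buffer of received bytes, using only the framing rules from the previous section (null-byte, 32-bit big-endian length, body). The function name split_stream and its return format are hypothetical.

```python
import struct


def split_stream(data: bytes):
    """Split received bytes into ("text", str) and ("message", bytes) chunks.

    Returns (chunks, remaining), where remaining is an incomplete message
    that should be prepended to the next batch of received bytes.
    """
    chunks = []
    pos = 0
    while pos < len(data):
        if data[pos] == 0:
            # A message: null-byte, 4-byte length, then the body.
            if len(data) - pos < 5:
                break  # header not complete yet
            (length,) = struct.unpack_from(">I", data, pos + 1)
            end = pos + 5 + length
            if end > len(data):
                break  # body not complete yet
            chunks.append(("message", data[pos + 5:end]))
            pos = end
        else:
            # Plain text runs until the next null-byte or the end of the buffer.
            nxt = data.find(b"\x00", pos)
            if nxt == -1:
                nxt = len(data)
            # Simplification: a UTF-8 sequence cut off at the end of the
            # buffer is replaced rather than carried over to the next call.
            chunks.append(("text", data[pos:nxt].decode("utf-8", errors="replace")))
            pos = nxt
    return chunks, data[pos:]


received = b"hello\n" + b"\x00\x00\x00\x00\x01\x00" + b"bye" + b"\x00\x00\x00"
chunks, remaining = split_stream(received)
```

In this example the buffer ends in the middle of a message header, so the last three bytes are returned as remaining instead of being interpreted.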

The language server associates each open file with an integer identifier. The identifier is chosen by the client when a new file is opened, and is used to refer to that file in all further communication. If an identifier is reused, the previous file is closed. Each open file contains the following state: the complete contents of the file, the index of the last edit operation, and an approximate location of the user's cursor. The index of the last edit operation is passed along in color messages, so that the client can determine which version of the file the language server is referring to. The edit number is initially set to zero, and is changed every time the client sends an edit message. The cursor location is maintained so that the language server is able to prioritize sending syntax coloring information for text visible to the user (i.e. close to the cursor).
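The per-file state described above can be pictured with the following Python sketch. It is a simplified model, not the server's code: the class and method names are hypothetical, the cursor is modelled as a character offset, an edit is modelled as replacing a range of characters, and the edit number is assumed to increase by one per edit (the text only states that it changes).

```python
from dataclasses import dataclass


@dataclass
class OpenFile:
    content: str = ""
    last_edit: int = 0  # the edit number starts at zero
    cursor: int = 0     # approximate cursor position (assumed: character offset)


class FileTable:
    def __init__(self):
        self.files = {}  # integer identifier -> OpenFile

    def open(self, file_id, content):
        # Reusing an identifier replaces, and thereby closes, the previous file.
        self.files[file_id] = OpenFile(content=content)

    def edit(self, file_id, start, end, text):
        f = self.files[file_id]
        f.content = f.content[:start] + text + f.content[end:]
        f.last_edit += 1  # assumption: incremented once per edit message

    def move_cursor(self, file_id, position):
        self.files[file_id].cursor = position

    def is_current(self, file_id, edit_number):
        """Check whether coloring tagged with edit_number matches the current version."""
        f = self.files.get(file_id)
        return f is not None and f.last_edit == edit_number


table = FileTable()
table.open(1, "abc")
table.edit(1, 1, 2, "X")  # content is now "aXc", edit number 1
```

A client could use a check like is_current to discard color messages that refer to a version of the file that has since been edited.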

The following messages can be sent from the text editor to the language server:

The following messages can be sent from the language server to the text editor:

The following colors are available to the language server:

The data inside the documentation is a list containing the following data: