Sigma16 User Guide
Table of Contents
- Introduction
- Core architecture tutorials
- Standard architecture tutorials
- The Sigma16 architecture
- Instruction set
- Summary of instruction set
- Assembly language
- Object code and linker
- Programming
- Installation
- About Sigma16
- License
- A
- C
- R
Introduction
Sigma16 is a computer architecture designed for research and teaching in computer systems. This application provides a complete environment for experimenting with the architecture. It includes an editor, assembler, linker, emulator, a collection of examples, a user guide with tutorials, and an integrated development environment (IDE) with a graphical user interface.
Running Sigma16
Sigma16 runs in a web browser: you don't need to download or install anything.
To run Sigma16, visit the Sigma16 Home Page at https://jtod.github.io/home/Sigma16 and click the prominent Run link.
There are additional command line tools that can run in a shell; see the Installation section. There is also a digital circuit (available separately) that implements the architecture. Using these tools, machine language programs can run on both the emulator and the circuit.
Browser compatibility
Sigma16 Version 3.9 runs on up-to-date versions of browsers based on Chromium, including Opera, Chrome, Edge, DuckDuckGo. It does not work on Safari or Firefox.
Organization
The architecture is organized into subsets to make it easier to learn and to use:
- The Core subset has a small instruction set which is a good starting point for learning about computer architecture. Although simple, Core is powerful enough to support realistic programming.
- The Standard subset offers flexible programming techniques, including manipulation of bits, Boolean expressions, shifting, extracting fields, arithmetic on natural numbers and arbitrary precision integers, and concise support for stack frames. It also supports the study of systems programming, providing interrupts, protection, concurrent processes, mutual exclusion, and memory management.
For a quick start, begin with the Core tutorials, which introduce the architecture step by step. The tutorials explain the machine, show how to program it, and demonstrate how to enter and run a program and how to use the programming environment. A reference section follows the tutorials.
Our focus is on fundamental concepts, ideas and principles. Sigma16 illustrates the basic ideas of computer systems but it avoids unnecessary complexity. For example, Sigma16 has just one word size (16 bits) while most commercial machines provide a variety. That variety is useful for practical applications but it complicates many of the details while not adding new fundamental ideas. Most commercial computers that achieve success in the marketplace evolve through updates and extensions, and eventually become encrusted with complications that help support backward compatibility, leading to complexity.
Core architecture tutorials
The following short tutorials introduce the system; full details appear in later sections. You can keep the tutorials visible in the right panel while following along with the examples in the main panel.
Hello, world!
Let's begin by running a simple example program. For now, we focus only on how to use the software tools. You don't need to understand the example code yet. An explanation of the program and the Sigma16 architecture will come later.
To launch the app, visit the Sigma16 Home Page and click on the link to run it. Sigma16 runs in the browser; you don't need to download or install anything.
- Click Editor, then Hello, world!. This will enter a small assembly language program into the editor window. Later, we'll load some of the more complex example programs into the editor, and you can also modify a program or type in a new one from scratch. For now, don't worry about the content of the program.
- Click Assembler. The assembler translates programs in assembly language (the source program) to machine language (the object program). Assembly language is a human-readable notation, while machine language is what the computer can execute. When you first enter the Assembler tab, the assembly language text in the Editor window is copied over; this is the source program.
- Click the Assemble button. This will do the translation. The assembly listing is displayed: this shows the original source code along with the machine language, and any error messages. The Show object button displays the object code, which is the machine language program produced by the translation. The Show source button displays the original source code, and the Show Listing button displays the assembly listing again.
- For this simple example, we don't need the Linker, so you can skip it. The linker is needed for larger and more complex programs with multiple modules, or with external references, or that need relocation.
- Click Processor, which shows the main components of the computer architecture, including registers and memory. These components are explained later. For now, just note that this page is where you can run programs using the emulator.
- Still on the Processor page, click Boot. This reads the machine language program into the memory, and you can see it in the Memory display. The source code (the assembly language) appears in the bottom section. If the assembler produced any error messages, the program will not boot until you fix the errors.
Click Step. The processor executes a single instruction and displays the effects on the registers and memory: blue for using a value, and red for modifying it. The assembly listing shows the instruction that just executed by highlighting it in red. It also shows the instruction that will execute next by highlighting it in blue. This is just to make it easier to follow what is happening; the machine executes the machine language program, which is in memory, and it ignores the assembly language listing which is just a convenience to help you follow the program. The machine relies only on the registers and memory, and doesn't even "know" that the assembly listing exists.
There is a keyboard shortcut: after you have clicked Step once, you can press the space bar to execute the next instruction. Thus you can step through the program by clicking Step, and then pressing the space bar repeatedly.
- Click Step repeatedly to watch the program execute, instruction by instruction. When the program terminates, the small window labelled Emulator will display Halted.
- You can also run the program to completion, without having to click Step so many times. Click Boot again to get the machine back into the initial state. Now click Run, and the program will continue executing instructions until it halts.
To run the program slowly, click Step repeatedly. To run the program faster but without updating the display after each instruction, click Run. At any time you can click Pause to stop the processor, and you can resume execution with either Step or Run. Sometimes it's useful to let the processor run at full speed until it reaches a particular instruction, and then stop. This can be done by setting a breakpoint (described in the Breakpoint tutorial below).
To exit the app, just close the browser window or tab. This may put up a dialogue box warning that any unsaved data may be lost and asking you to confirm.
A quick tour
This tutorial introduces the main components of the architecture as well as the graphical user interface.
The main window contains two main sections. The largest area, on the left side, is the main working area. When the program launches, this will show the Welcome page. The user guide is on the right side. At the top is a row of buttons (Welcome, Examples, etc.). These select which page is displayed in the main working area.
It's convenient to see the main working area and the user guide side by side. Begin by resizing the entire window (bigger is better). Then you can change the amount of space given to the user guide by clicking the arrow symbols on the right side of the top button bar. These arrows will expand or shrink the user guide: the small arrows adjust by one pixel, the larger arrows by ten pixels. If you resize the entire browser window, Sigma16 will maintain the same relative sizes of the main working area and the user guide sections.
If your screen is small, and the main working area isn't big enough, click Hide User Guide and all the space will be made available to it. The button will change to Show User Guide.
You can also open the User Guide in a separate browser tab or window. The Welcome page contains a link to do this.
The main working area has several pages, with tab buttons at the top to switch between them:
- Welcome contains some introductory information and links.
- Examples contains a collection of assembly language programs organized by the architecture subset. Start with the Core examples.
- Modules shows a summary of all the files and modules you currently have open. It also provides buttons allowing you to open files on your computer, close them, and select one to work on.
- Editor shows the selected module, where it can be edited. You can assemble and execute the selected module. To run a program, you'll load it into the Editor (there are several ways to do this), then assemble it (Assembler tab) and then run it (Processor tab).
- Assembler translates a program from assembly language to machine language, and shows the assembly listing as well as the object code.
- Linker is used to combine a collection of object code modules into a single executable program. It also performs external name resolution and relocation. Programs that consist of just one module do not need the linker, and you can skip it.
- Processor shows the components of the architecture and executes machine language programs.
- Options allows you to configure how the system operates.
- About gives general information about the software, including the version.
- Hide User Guide makes the entire window available to the main working area. It is a toggle that changes to Show User Guide.
Registers, constants, and arithmetic
Programs do most of their work using the register file, which is an array of 16 registers named R0, R1, R2, …, R15. The Register File is displayed in a box on the Processor page.
A register is a circuit that can hold a number, and the elements of the register file can be used to hold variable values. They are analogous to the registers in a calculator: think of each register as a box that can hold a number, and think of the register name as a variable name. Two of the registers, R0 and R15, are special and should not be used to hold variables.
A computer program is a sequence of instructions. Instructions are similar to statements in a programming language, but they are simpler.
Sigma16 performs arithmetic on data in registers. To do any computation on some numbers, we first need to get those numbers into registers. The lea instruction. can be used to place a constant into a register. For example, to load 42 into register 3, write
lea R3,42[R0] ; R3 := 42
This is a statement in assembly language, and it describes one instruction. This statement contains three parts:
- The operation is lea. This tells the computer what action to perform, and "lea" says to put a value into a register. Later we will see why this instruction is called "lea".
- The operands are R2,42[R0]. R2 is the destination, where the value will be placed. The value to be put into the destination is 42. For now, ignore the [R0]; its purpose will be described later.
- Everything after the semicolon is a comment. This comment is a programming language statement that describes what the instruction does: it sets the variable R3 to 42.
The name lea is the operation, i.e. the name of the instruction. The operand field consists of two operands separated by a comma. The first operand, R2, is called the destination; this is the register where the result will be placed. The second operand is a constant 42 followed by [R0]. When the computer executes this instruction, it simply places the constant into the destination. In a higher level language, we could write R2 := 42.
Most instructions follow a similar pattern: the first operand is the destination where the result is placed, and the subsequent operands are the arguments to the computation. This is the same convention used in assignment statements in many programming languages: the registers in add R1,R2,R3 appear in the same order as the variables in R1 := R2 + R3.
All arithmetic operations take place in the registers, and there is a separate instruction for each operation. For example, the following instruction will add the values in R8 and R1 and then put the result into R4:
add R4,R8,R1 ; R4 := R8 + R1
Notice that the operand field doesn't use operators like := or +; those are specified by the operation field (lea, add). Each instruction performs only one operation. The operand field just separates the operands with commas. The first operand (R4 in this example) is the destination, which is where the result will be placed. The last two operands (R8 and R1) are the values that will be added.
To perform a calculation, we need to get the data into registers (using lea) and then perform the calculation (using arithmetic instructions). The following program calculates 3 + 4 and puts the result into R2:
lea R5,3[R0] ; R5 := 3 lea R8,4[R0] ; R8 := 4 add R2,R5,R8 ; R2 := R5 + R8 = 3+4 = 7
It's a good idea to use comments to explain the meaning of an instruction. For now, comments like "R4 := R8 + R1" will be used to show what the instruction does. That's useful while learning what the instructions do, but later on we will use comments to give more meaningful information (for example, what do the values in the registers mean, and why are we adding them?).
There are three more arithmetic instructions. These follow the same pattern as add: in each case, the arithmetic is performed on the last two registers and the result is placed in the destination (the first register):
add R4,R11,R6 ; R4 := R11 + R6 sub R5,R2,R13 ; R5 := R2 - R13 mul R2,R10,R7 ; R2 := R10 * R7 div R5,R6,R12 ; R5 := R6 / R12, R15 := R6 rem R12
The divide instruction is slightly different: it produces two results, the quotient and the remainder. The quotient is placed in the destination, and the remainder is automatically placed into R15, even though the instruction doesn't mention R15. If you write div R15,R1,R2, the quotient is placed in R15 and the remainder is discarded.
Normally an arithmetic instruction will put a new value into the destination register, but the operand registers are left unchanged. However, what happens if one of the operands is the same as the destination, for example add R7,R7,R8?
An arithmetic instruction proceeds in three phases: (1) obtain the values in the operand registers; (2) perform the arithmetic on those values; and (3) put the result into the destination, discarding whatever value was previously there. So consider this example:
lea R7,20[R0] ; R7 := 20 lea R8,30[R0] ; R8 := 30 add R7,R7,R8 ; R7 := R7 + R8
After the two lea instructions have executed, R7 contains 20 and R8 contains 30. The add instruction does the following:
- It fetches the values in R7 and R8, obtaining 20 and 30
- It adds the values, obtaining the result 50
- It puts the result 50 into the destination R7, discarding the previous value.
The final result is that R7 contains 50.
Constant data can be specified using either decimal or hexadecimal notation.
- Decimal numbers are written as strings of digits, optionally preceded by a minus sign: 3,-19, 42. Leading zeros are optional, so 23 and 0023 both mean the same (twenty three).
- Hexadecimal numbers are written as four hex digits, and in assembly language programs they are indicated by putting $ before the number. Thus $00a5 and 0165 both represent the integer 165.
lea R1,13[R0] ; R1 = 13 (hex 000d) lea R2,$002f[R0] ; R2 := 47 (hex 002f) lea R3,$0012[R0] ; R3 := 18 (hex 0012) lea R4,0012[R0] ; R4 := 12 (hex 000c)
The processor page shows numbers as hex without the leading $, but in an assembly language program the $ is needed to avoid ambiguity.
Sigma uses := as the assignment operator; thus we write R7 := R7 + R8 (and we don't write R7 = R7 + R8). This is because an assignment statement is profoundly different from an equation, and mathematicians have long used the = operator to indicate equations. It isn't just an academic or theoretical point; there have been plenty of occasions where computer programmers get confused between assignment and equality, and using the wrong operator leads to serious bugs. (A major security breach occurred because a C programmer forgot that in C, = does not mean equals.)
Why does assembly language use a notation like add R5,R2,R3 instead of R5 := R2 + R3? In short, every instruction will use a similar notation: a keyword for the operation, followed by the operands separated by commas. This notation is also related closely to the way instructions are represented in memory, which we'll see later.
An arithmetic instruction performs just one operation. Several instructions are needed to evaluate a larger expression. In general, you'll need a separate instruction for every operator that appears in an expression.
Example: calculate 6 + 2 * 4 and put the result into R10. We have to put the numbers into registers, using lea, and then perform the arithmetic. It doesn't matter which registers are used (as long as we avoid R0 and R15).
lea R1,6[R0] ; R1 := 6 lea R2,2[R0] ; R2 := 2 lea R3,4[R0] ; R3 := 4 mul R2,R2,R3 ; R2 := R2*R3 = 2*8 = 8 add R10,R1,R2 ; R10 := R1+R2 = 6+8 = 14 (hex 000e) trap R0,R0,R0 ; halt
This is nearly enough to constitute a complete program. Only one more thing is needed: a way to terminate the program when it finishes. There is a special instruction to do this: a trap instruction, where the first operand is R0, will stop the program.
trap R0,R0,R0 ; halt
Here is a complete program named ConstArith. It consists of
- Comments that identify the program
- Comments that say what it does
- The instructions
- A trap instruction to terminate execution
; ConstArith: Illustrate lea and arithmetic instructions. ; Use lea to load constants into registers, then do arithmetic. ; Sigma16: https://jtod.github.io/home/Sigma16/ John O'Donnell, 2025 ; Calculate 6 + 2 * 4 and put the result into R10 ; Use lea to put a constant into a register ; Use mul and add to do arithmetic lea R1,6[R0] ; R1 := 6 lea R2,2[R0] ; R2 := 2 lea R3,4[R0] ; R3 := 4 mul R2,R2,R3 ; R2 := R2*R3 = 2*8 = 8 add R10,R1,R2 ; R10 := R1+R2 = 6+8 = 14 (hex 000e) trap R0,R0,R0 ; halt
Running an example program
You can go to the Editor, and type it in (or copy and paste it in), but this program is part of the collection of examples built in to Sigma16. Here's how to run it:
- Go to the Examples page. Click Examples, then Core, then Simple. This brings up a list of small and simple example programs. Click ConstArith.asm.txt. You should see the listing of the program. The file name has an extension .asm.txt that identifies the file as an assembly language source program.
- Click Editor, and you should see the text of the program in the window.
- Go to the Assembler page and click Assemble.
- Go to the Processor page. Click Boot, then Step repeatedly and watch the effect of each instruction by observing how the registers are changed.
The Processor page shows numbers in hexadecimal. The add instruction puts 14 into R10, and this is displayed as hex 000e.
It's a good idea to step through the program slowly, rather than running it to completion at full speed. The emulator will show the next instruction to be executed, highlighted in blue. Think about what the instruction should do; in particular what changes to the registers will occur? Then click Step and check to see if the right thing happened.
Generally you can use any register you like (apart from R0 and R15), and the choices of registers in the previous examples are arbitrary. Registers R1 through R14 behave the same. However, two of the registers are different:
- R0 contains the constant 0 and it will never change. Any time an instruction uses R0, the value it gets will be 0. It is legal for an instruction to attempt to modify R0 (for example, add R0,R3,R4 is legal) but after executing this instruction R0 still contains 0. The reason for this is that we frequently need to have access to a register containing 0.
- R15 is used for two specific purposes. We have already seen the first: the divide instruction places the remainder into R15. The second purpose is that R15 contains the condition code, which is a word that contains a number of bits that provide some information about an instruction. For example, if an addition produces a result that is too large to fit in a register, a special flag indicating this is set in R15. Several of the instructions, particularly the arithmetic instructions, change the value of R15 as well as placing the result in the destination register. For this reason, R15 cannot be used to hold a variable since its value would be destroyed almost immediately.
To summarise, Registers R1 through R14 are all identical and can be used for variables. R0 contains 0 and will never change. R15 changes frequently and can be used to detect various error conditions and other information about an instruction.
Here is another example:
- Suppose we have variables a, b, c, d
- And suppose we already have these variables in registers
- Choose a register for each variable: R1=a, R2=b, R3=c, R4=d
- We wish to compute R5 = (a+b) * (c-d)
add R6,R1,R2 ; R6 := a + b sub R7,R3,R4 ; R7 := c - d mul R5,R6,R7 ; R5 := (a+b) * (c-d)
Summary.
- A lea instruction of the form lea Rd,const[R0] will put the constant into Rd. It can also be written as lea Rd,const.
- The general form of an arithmetic instruction is op d,a,b.
The meaning is Rd := Ra op Rb, and the fields are:
- op is one of add, sub, mul, div
- Rd is the destination register (where the result goes)
- Ra is the register containing the first operand
- Rb is the register containing the second operand
Keeping variables in memory
So far we have used registers in the register file to hold variables. However, there are only 16 of these, and two have special purposes (R0 and R15). That leaves only 14 registers for general use, and most programs need more than 14 variables.
To solve this problem, the computer contains another subsystem called the memory. The memory contains a sequence of memory locations, each of which can hold a word (16 bits). Each location is identifed by an address, which is a natural number \(0, 1, 2, 3, \ldots, 65535\). The notation M[a] denotes the value stored in the memory location with address a.
The register file and the memory serve different purposes:
- The register file is used to perform calculations. In computing something like x := (2*a + 3*b) / (x-1), all the arithmetic must be done using the register file. For example, the instruction add R1,R5,R3 has both operands and destination in registers, and this is true for all arithmetic instructions. The purpose of the registers is to provide fast access to a small number of variables while they are needed for calculations.
- There are only a few registers available, so we can't keep all the variables permanently in registers. The memory is much larger: it contains 65,536 locations so it can hold all the variables in a program. But the memory is slow, and it has a limitation: there is no instruction that can do arithmetic on data in the memory. The purpose of the memory is to hold large amounts of data for future use.
Normally, a program keeps its variables in memory, so a variable name refers to a memory location. The variable name just stands for the address of the location that contains the variable. This allows you to refer to a variable by a name (x, sum, count) rather than an address (003c, 0104, 00d7).
load and store
Since we need a lot of variables, they need to be kept in memory. But since we need to do arithmetic, and arithmetic can be performed only on data in registers, we adopt the following strategy:
- Keep variables permanently in memory.
- When you need to do arithmetic on some variables, copy them from memory to registers.
- When finished, copy the results from registers back to memory.
Two instructions are needed to do this:
- load copies a word from a memory location into a register. Suppose xyz is a variable in memory; then to copy its value into R2 we write load R2,xyz[R0].
- store copies a word from a register into a memory location. If R3 contains the result of some calculations, and we want to put it back into memory in a variable named result, we would write store R3,result[R0].
At this point we have enough instructions to write an assignment statement in assembly language. Typically we will first write an algorithm using higher level language notation, and then translate it into instructions. Translating from a high level language to assembly language is called "compiling".
Example: translate x := a+b+c into assembly language.
Solution:
load R1,a[R0] ; R1 := a load R2,b[R0] ; R2 := b add R3,R1,R2 ; R3 := a+b load R4,c[R0] ; R4 := c add R5,R3,R4 ; R5 := (a+b) + c store R5,x[R0] ; x := a+b+c
The data statement
The variables used in a program need to be defined and given an initial value. This is done with the data statement. The variable name comes first, and it must start at the beginning of the line (no space before it). Then comes the keyword data, followed by the initial value, which may be written in either decimal or hexadecimal. These pieces of the statement must be separated by white space.
For example, to define variables x, y, z and give them initial values:
x data 34 ; x is a variable with initial value 34 y data 9 ; y is initially 9 z data 0 ; z is initially 0 abc data $02c6 ; specify initial value as hex
The data statements should come after all the instructions in the program. This may look surprising: in some programming languages you have to declare your variables at the beginning, before using them. There is a good reason why we will put the instructions first, and the data statements after; but the reason will come later.
Here is a simple example of a complete program that uses load, store, and data statements, to implement z := x + y.
; Program Add. See README in top Sigma16 folder ; A minimal program that adds two integer variables ; Execution starts at location 0, where the first instruction will be ; placed when the program is executed. load R1,x[R0] ; R1 := x load R2,y[R0] ; R2 := y add R3,R1,R2 ; R3 := x + y store R3,z[R0] ; z := x + y trap R0,R0,R0 ; terminate ; Expected result: z = 37 (0025) ; Static variables are placed in memory after the program x data 23 y data 14 z data 0
Now you can run the program:
- Go to the Examples page. Click Core, then Simple, then Add.
- Click Editor, and you should see the text of the program in the window.
- Go to the Assembler page. Click Assemble.
- Go to the Processor page. Click Boot, then Step repeatedly and watch the effect of each instruction by observing how the registers and memory are changed.
The processor page shows the contents of the memory. For each memory location, the address is shown followed by the contents of that location. For example, if a line in the memory display shows 00b7 4c3f, this means that the memory location with address 00b7 contains 4c3f. This is written as M[00b7] = 4c3f.
Memory addresses and contents, as well as register contents, are shown in hexadecimal with no leading $. Since all these numbers are hexadecimal, putting $ in front (or 0x as in C) just adds distraction without telling you anything useful.
The processor displays two independent views into the memory. This is convenient because it's frequently necessary to look at two different portions of the memory at the same time. For example, it's helpful to look at the machine language code in one view and the data in the other view. With programs that use advanced techniques (e.g. pointers and activation records) it's essential to be able to view two different areas of memory. Despite the two views, there is just one memory.
Relationship between arrays, registers, memory
The register file and the memory are similar in many ways, and they are also similar to ordinary arrays in some programming languages. The essential difference is that registers are fast but there are only a few of them, whereas memory is large but slow.
array | reg file | memory | |
---|---|---|---|
Type of element: | declared | word | word |
Size of element: | declared | 16 bits | 16 bits |
Element notation: | a[i] | R3 | M[a] |
Element is called: | element | register | location |
Index is called: | index | reg number | address |
Range of index: | declared | 0..15 | 0..65535 |
Number of elements: | declared | small | large |
Access time: | slow | fast | slow |
Why does the computer have both registers and memory? After all, this makes programming a little more complicated. You have to keep track of which variables are currently in registers, and you have to use load and store instructions to copy data between the registers and memory. Wouldn't it be easier just to get rid of the distinction between registers and memory, and do all the arithmetic on memory?
Yes, this would be simpler, and there have been real computers like that. However, this approach makes the computer slower. With modern circuits, a computer without load and store instructions (where you do arithmetic on memory locations) would run approximately 100 times slower. So nearly all modern computers do arithmetic in registers, and use instructions like load and store to copy data back and forth between registers and memory.
Assembly language
The programs we have seen so far are written in assembly language. The machine itself executes programs in machine language, which is covered later. Assembly language is translated to machine language by a program called an assembler.
The purpose of assembly language is to give the programmer absolute control over the machine language program without having to remember lots of numeric addresses and codes. Assembly language is readable for humans, while machine language is executable by machines. For example, it is easier to remember the name "mul" for multiply than to remember the machine language code (which happens to be 3). Similarly, it's easier to remember the names of variables (x, y, sum, total) than the numeric addresses of the memory locations that hold these variables.
The syntax of assembly language is simple and rigid. Every statement must fit on one line of source code; you cannot have a statement that spans several lines, and you cannot have several statements on one line.
Sigma16 assembly language uses a small set of characters. Any character not on this list will generate an error message. A Sigma16 program can manipulate any 16-bit character, but the source assembly language code is restricted to this source character set. There are many characters that look similar but are actually distinct. For example, the minus sign, the hyphen, the en-dash, and the em-dash all look similar – you have to look closely to see the difference – but Sigma16 assembly language uses the minus sign, and the hyphens and dashes won't work.
These are the legal characters in an assembly language program:
- letters
- _abcdefghijk
- lmnopqrstuvwxyz
- ABCDEFGHIJKLMN
- OPQRSTUVWXYZ
- digits: 0123456789
- separators: space tab ,;
- quotes: " '
- punctuation: ".$[]()+-*
- other: ?<=>!%^&{}#~@:|/\'
Word processors often substitute characters. For example, when you type a minus sign in a paragraph of English text, word processors may replace the minus sign with a hyphen or dash, which is correct for typeset English but incorrect for assembly language. The Sigma16 editor will insert the correct characters, as will plain text editors.
Each statement has a rigid format that consists of up to four fields. The fields must be separated by one or more spaces, and a field cannot contain a space. Every field is optional, but if a field is missing then the following fields must also be missing, except for an optional comment. The fields are:
- label (optional) – If present, the label must begin in the first character of the line. If a line starts with a space, then there is no label field. A label has the same syntax as names or identifiers in many languages: it may contain letters, digits, underscores, and must begin with a letter. Both upper and lower case letters are allowed, and the syntax is case sensitive (Loop and LOOP and loop are three different labels).
- mnemonic – This is the name of the operation: load, lea, add, sub, etc. The mnemonic must be preceded by white space, and it must be the name of a valid instruction or assembler directive.
- operands field – the operands required by the type of statement. There are several formats possible for the operands field, depending on the instruction; these are detailed later. For example, for the add instruction the operand field must consist of three registers, separated by commas (e.g. R1,R2,R3). Spaces are not allowed in the operands field: R1,R2,R3 is fine but R1, R2, R3 is an error.
- comments – anything that follows the operands field, or anything that appears after a semicolon, is a comment. The semicolon is not required if the mnemonic and operands fields are present, but it is good practice to include it.
Here are some syntactically valid statements:
loop load R1,count[R0] ; R1 := count add R1,R1,R2 ; R1 := R1 + 1
Each of the following statements is wrong!
add R2, R8, R9 ; spaces in the operand field loop1 store x[R0],R5 ; wrong order: should be R5,x[R0] addemup ; invalid mnemonic loop2 load R1,x[R0] ; Space before the label
If you forget some detail, look at one of the example programs.
When the assembler is translating a program, it begins by looking at the spaces in order to split each statement into the four fields. This happens before it looks at the operation and operands. The assembly listing uses colors to indicate the different fields. If you get a syntax error message, the first thing to check is that the fields are what you intended. For example if you meant to say
add R1,R2,R3 ; x := a + b
but you have a spurious space, like this
add R1, R2,R3 ; x := a + b
the assembler will decide that the mnemonic is add, the operands field is "R1," and all the rest - "R2,R3 ; x := a + b" – is a comment, and the colors of the text in the assembly listing will show this clearly.
The rules above mean that a label must begin in the first character of a line, and also that if any non-space character appears at the beginning, it is taken to be a label. The following statement has a label "add", an operation "R1,R2,R3" (and of course there is no such instruction), and the operand field is ";". Again, the syntax highlighting will show the error clearly.
add R1,R2,R3 ; x := a + b
In assembly language, you can write constants in either decimal or hexadecimal.
- decimal: 50 -39
- hexadecimal: $003b
Examples:
lea R1,40[R0] ; R1 = 40 lea R2,$ffff[R0] ; R2 = -1 x data 25 y data $2c9e
There are two instruction formats in the Core architecture, which differ in the form of the operands:
- RRR instructions have an operand field containing three registers separated by commas. Example: add R8,R13,R6.
- RX instructions have an operand field that specifies a register and an address. The address is a name or constant, followed by a register. Examples: load R12,array[R6] and lea R5,23[R0].
It isn't enough just to get the assembler to accept your program without error messages. Your program should be clear and easy to read. This requires good style. Good style saves time writing the program and getting it to work, while aA sloppy program looks unprofessional and makes testing and debugging harder. Here are a few tips.
Write good comments. You should include good comments in all programs, regardless of language. Comments are especially important in assembly language, because the code tends to need more explanation. At the beginning of the program, use comments to identify the program and author and to say what it does. Use full line comments to describe in general what's going on, and put a comment on every instruction to explain what it's doing.
Indent your code consistently. Each field should be lined up vertically, like this:
load R1,three[R0] ; R1 = 3 load R2,x[R0] ; R2 = x mul R3,R1,R2 ; R3 = 3*x store R3,y[R0] ; y = 3*x trap R0,R0,R0 ; stop the program
Not haphazard like this:
load R1,three[R0] ; R1 = 3 load R2,x[R0] ; R2 = x mul R3,R1,R2 ; R3 = 3*x store R3,y[R0] ; y = 3*x trap R0,R0,R0 ; stop the program
The exact number of spaces each field is indented isn't important; what's important is to make the program neat and readable.
Spaces, not tabs. To indent your code, always use spaces – avoid tabs. In general, never use tabs except in the (rare) cases they are required (for example in makefiles). The tab character was introduced long ago into computer character sets to try to mimic the tab key on old mechanical typewriters. Unfortunately, software does not handle tab characters consistently. If you use tabs, your program can look good in one application and like a mess in another. It's easy to indent with spaces, and it works everywhere.
Some programming languages use indentation to indicate program structure. If some of that indentation is done with tabs, it could work on one computer but give syntax errors on another system. Even worse, it could compile but give different results as the program structure has changed. (Wrong results are worse than error messages.)
The Editor
To run an assembly language program, you first need to put it into the editor window. Then you can go to the Assembler page, assemble it and run it on the Processor page. There are several ways to enter a program into the editor window.
- Starter program. Editor tab. Click Hello World. This will load a small fixed example program into the editor buffer.
- Collection of examples. Examples tab. The Sigma15 app provides a collection of example programs. Click Core to see the examples that use only the core instruction set. Click Simple, then ConstArith. Go to the Editor tab and the example you selected will be in the window. All of these example programs are in the Sigma16 system, not on your local computer.
- Type it in. Editor tab. Type in a program. If there was already some text there, click New.
- Read from a file. Editor tab. Click Open File. This brings up a file chooser dialog where you can traverse the file system on your computer and select a file.
- Copy from file and paste. If your program is visible in another window on your computer, you can copy its text, then go to the Sigma16 editor, and paste. Normally it's better to use Open File, but if your browser has disabled file access the copy-paste method is ok.
You can edit the text in the Editor window:
- Click the arrow keys or the left mouse button to move the cursor.
- Click left mouse button and drag to select text.
- Click right mouse button and select Cut, Copy, Paste, Undo.
Save your work. After typing in a program, or editing it, you should save it to a file. Click Save As. This brings up a dialog where you can traverse your file system and provide a file name (which should end in .asm.txt).
If you leave the browser without saving the text in the editor window to a file, that text will be lost, so it's important to save your work regularly. If you don't do that, sooner or later the system will crash and you'll lose your data. When you close a browser tab that's running Sigma16, the browser may put up a message warning you that "Changes you made may not be saved". The browser can't tell whether the editor text needs to be saved.
The Sigma16 app will know what file is associated with your program if you have read it from a file (Editor: Open File) or if you have saved it to a specific location in your computer (Editor: Save As).
Once the app knows which file contains your program, you can edit it and save it again just by clicking Editor: Save. This is easier than Save As because you don't need to traverse the file system again.
You can also edit your assembly language program using an external text editor, rather than the simple editor in the Sigma16 app. If you have modified the file using an external editor, click Editor: Refresh to reload the file from your disk.
If you do use an external text editor, use a real text editor (emacs, vim, notepad++, or whatever you prefer), not a word processor. Word processors embed extra formatting data into the file, and the assembler can't handle that. Furthermore, word processors often change the characters you type. For example, they may change the minus character (-) into an en-dash. There are four different characters that look similar to a minus sign (minus, hyphen, en-dash, em-dash) and the assembly language only accepts the minus sign. If you get bad characters, the assembler will give an error message.
Jumps and conditionals
Conditionals allow a program to decide which statements to execute based on Boolean expressions. One example is the if-then statement, for example:
if x<y then statement 1 statement 2
A related form is the if-then-else statement:
if x<y then statement 1 else statement 2 statement 3
Most high level control constructs can be translated into code that contains just one form of conditional, the conditional goto. This uses a Boolean expression bexp to decide whether to jump to someLabel, or not to jump:
if bexp then goto someLabel
The commonest case is where bexp is a comparision between two integers:
if x < y then goto someLabel
This statement is implemented in assembly language in two steps:
- First, a comparison instruction is used to produce a Boolean result, which is placed in the condition code (which is in R15).
- Second, a conditional jump instruction will either jump or not jump, depending on the condition code. This allows a choice of what instruction to execute next.
The cmp instruction compares the integers in two registers, and it sets R15 to the result of the comparison. R15 is a special register because several instructions, including cmp, use it automatically without specifying R15 in the instruction. Reflecting its special status, R15 also has a name: it's called the condition code.
Any relational operation can be used; it isn't limited to less-than. You can think of the condition code as holding a result like "less than", or "equal", etc. The way these comparison results are represented will be covered later.
After setting the condition code with cmp, the program executes a conditional jump. The instruction jumplt someLabel[R0] will go to the instruction with the specified label if and only if the condition code indicates "less than" (lt).
Here is a high level language statement:
if x < y then goto someLabel
To implement this in assembly language, we use a cmp instruction followed by jumplt:
load R2,x[R0] ; R2 := x load R3,y[R0] ; R3 := y cmp R2,R3 ; compare x and y jumplt someLabel[R0] ; if x < y then goto someLabel xxx ; this executes if x >= y xxx xxx someLabel xxx ; this executes if x < y
The conditional jump instructions have the form jumpXX, where XX is a relation, such as lt, eq, and so on:
jumplt someLabel[R0] ; if < then goto someLabel jumple someLabel[R0] ; if <= then goto someLabel jumpeq someLabel[R0] ; if = then goto someLabel jumpne someLabel[R0] ; if != then goto someLabel jumpge someLabel[R0] ; if >= then goto someLabel jumpgt someLabel[R0] ; if > then goto someLabel
These conditional jumps treat the contents of the registers as integers represented in two's complement notation. This means, for example, that $ffff is less than 0, because $ffff represents -1. There are several more conditional jumps that you can use for comparing natural numbers (binary), and a few other things as well.
A compare instruction is often followed immediately by a conditional jump, as in the example above. You can also save the result of a comparison in a Boolean variable and perform logic on Boolean variables. The Boolean variables can later be used to control conditional jumps. This topic will be discussed later, as it involves instructions from the full standard instruction set.
The address in a jump instruction – the place to jump to – is normally specified as a label which is defined in the label field of some instruction. You can place a label in the same line as the instruction, or it can be on a line with nothing else, in which case the label refers to the next instruction. In the following code, label1 is the address of the add instruction and label 2 is the address of the sub instruction.
label1 add R2,R4,R13 label2 sub R15,R0,R1
If-then constructs are translated into assembly language following two similar fixed patterns. Suppose Bexp is a Boolean in any register Rd
if bexp then statement 1 statement 2
This is translated according to the following pattern:
if !bexp then goto L2 statement 1 L2: statement 2
Here is an example:
a := 93 x := 35 y := 71 if y > x then a := 59 b := 104
The corresponding assembly language is:
; a := 93 lea R1,93[R0] ; R1 := 93 store R1,a[R0] ; a := 93 ; x := 35 lea R1,35[R0] ; R1 := 35 store R1,x[R0] ; x := 35 ; y := 71 lea R1,71[R0] ; R1 := 71 store R1,x[R0] ; x := 71 ; if y > x load R1,y[R0] ; R1 := y load R2,x[R0] ; R2 := x cmp R1,R2 ; compare y with x jumple R3,skip[R0] ; if not y > x then goto skip ; then a := 59 lea R1,59[R0] ; R1 := 59 store R1,a[R0] ; a := 59 ; b := 104 skip lea R1,104[R0] ; R1 := 104 store R1,b[R0] ; b := 104
Notice the use of jumple: if the Boolean expression (y>x) is False we want to skip over the "then" part, so we want to jump if y <= x (hence jumple).
An if-then-else statement has a similar compilation pattern, but this time there are two separate parts: the "then-part" and the "else-part". Depending on the value of the Boolean expression, one of those parts should be executed and the other should be skipped over.
For if-then-else, and many other control constructs, we need an unconditional jump which will always go to the specified address, and which doesn't use a Boolean.
jump somewhere[R0] ; go to somewhere
The general form of an if-then-else is
if x < y then S1 else S2 S3
The general if-then-else construct can be translated to use just goto and conditional goto:
if x >= y then goto L2 S1 goto L3 L2: S2 L3: S3
Loops
Loops are implemented using compilation patterns based on comparisons and jumps. The fundamental form is the while loop.
while Bexp do S1 S2
The compilation pattern is:
L1 if not Bexp then goto L2 S2 goto L1 L2
Occasionally you may encounter an infinite loop, which is sometimes expressed as a while loop:
while true do S1
This doesn't need a Boolean expression; it is simply compiled into:
loop instructions for S1 jump loop[R0]
Infinite loops are rather rare, or at least they should be. On occasion they are exactly what is wanted. For example, operating systems contain a loop that looks for something useful to do, and then does it, and this should be an infinite loop.
However, there is a common but poor programming style that uses infinite loops with random break or goto statements to get out of the loop. This may be appropriate on occasion but generally it is bad style.
So far we have seen several compilation patterns:
- if-then
- if-then-else
- while
Every high level programming construct has a compilation pattern, and they are mostly built using comparisons and jumps. In principle, these patterns are straightforward to use. However, there are two issues that require a little care: uniqueness of labels and nested statements.
Labels must be unique: the same one cannot be used twice in the same program, and if it is, the assembler will give an error message. This means that you cannot follow the compilation patterns blindly. If you use "loop" as the label for a while loop, as in the pattern above, you need a different label for your next while loop.
The best approach here is not to use labels like loop, loop1, loop2. It's far better to think about the purpose of the construct in your program and to use a label that reflects this purpose.
Another complication is that most programs contain nested statements. These are statements that contain smaller statements, and the containment may go several levels deep.
if b1 then S1 if b2 then S2 else S3 S4 else S5; while b3 do S6 S7
There is an important principle to follow here: every time a statement appears in a compilation pattern (we have been calling them S1, S2, S3, etc.), it should be translated as a block.
A block is a sequence of instructions which always begins execution at the first instruction, and always finishes at the end. You never jump into the middle of it, and it never jumps out of the middle to some other place.
Every statement should be compiled into a block of code. This block may contain internal structure — it may contain several smaller blocks — but to execute it you should always begin at the beginning and it should always finish at the end.
In programming language theory, programming with blocks is often considered to be good practice or good style. But it is more than just an issue of style. If you always treat the statements inside compilation patterns as blocks, the patterns will "just work", no matter how deeply nested they are. If you violate the block structure, you will find it difficult to get the program to work.
Machine language
The bits representing an instruction (written in hex) (e.g 0d69) are machine language. The hardware runs the machine language — it's just looking at the numbers. The text notation with names – e.g. add R13,R6,R9 – is called assembly language. Assembly language is for humans to read and write; machine language is for machines to execute. Both languages specify the program in complete detail, down to the last bit.
As a program is running, the memory contains all your program's data: the variables, data structures, arrays, lists, etc. The memory also contains the machine language program itself. The program is stored inside the computer's main memory, along with the data. This concept is called the stored program computer.
There is an alternative approach: a computer can be designed to have one memory to hold the data, and a completely separate memory to hold the program. This approach is often used for special-purpose computers (primarily micro-controllers), but experience has shown this to be inferior for general purpose computers.
Sigma16 has several different kinds of instruction. These are called instruction formats. All the instructions with the same format have similar representations in machine language. The Sigma16 Core has two instruction formats:
- RRR instructions use the registers
- RX instructions use the memory
The machine language program is in the memory. Therefore we need to represent each instruction as a word that can be stored in memory. An instruction format is a systematic way to represent an instruction using one or more words (a word is a string of bits).
- An RRR instruction is represented in one word
- An RX instruction is represented in two words.
RRR instructions
Every RRR instruction is represented in one word (16 bits), which is organised as four 4-bit fields. When describing a machine word, we normally write each field value as a hexadecimal digit: one of 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, a, b, c, d, e, f. These correspond to 0, 1, …, 15 in decimal.
The machine language code needs to specify which RRR instruction this is. Is it add? sub? mul? another? This is done with an operation code — a number that says what the operation is. There are about a dozen RRR instructions, so a 4-bit operation code suffices. For example, the add instruction has an opcode of 0.
We also need to specify three registers: destination and two source operands. There are 16 registers, so a specific one can be specified by 4 bits. Thus the requirements for an RRR instruction are 4 fields, each containing 4 bits, for a total of 16 bits, so an RRR instruction exactly fills one word.
The four fields in an RRR instruction are named:
- op – Operation code (e.g. add, sub, etc.) This is a number that specifies which instruction it is.
- d – Destination register
- a – First source operand register
- b – Second source operand register
Here are the operation codes for the RRR instructions we have seen so far. There are a few more RRR instructions, but they all have the same form as these do.
mnemonic | opcode |
---|---|
add | 0 |
sub | 1 |
mul | 2 |
div | 3 |
trap | b |
Don't memorise this table: you just need to understand how it's used. For example, let's translate this assembly language statement into machine language:
mul R14,R6,R10
- mul is an RRR instruction, so we need to find the values of the four fields: op, d, a, b.
- The table above says that the opcode for mul is $2 (that's 0010 as a bit string).
- The destination is R14, so the d field is $e.
- The first operand is R6, so the a field is $6.
- The last operand is R10, so the b field is $a.
- Putting the fields together ("assembling them into an instruction"), we get the machine language instruction: $2e6a.
Every RRR instruction is assembled the same way. The numbers in the fields will be different, but the procedure is always exactly the same.
RX instructions
We have seen several RX instructions so far:
lea R5,19[R0] ; R5 = 19 load R1,x[R0] ; R1 = x store R3,z[R0] ; z = R3
An RX instruction contains two operands, separated by a comma.
- The first operand is a register, and it's called the "register operand" (e.g. R5)
- The second operand is a constant or label followed by a register
(e.g. 19[R0] or x[R0]). This is called the "address operand", and
it contains two parts:
- The first part is either a constant like 19, or a label like x. This part is called the "displacement", and its value is one word. If this part is a constant (e.g. 19) then the value of the displacement is simply the value of the constant. If it's a label like x, the value is the address in memory of x.
- The seocnd part is a register in square brackets, e.g. [R0]. This is called the "index register". In the examples we've seen so far, the index register has always been R0, but in general it can be any register, and this is useful for several purposes we'll see later.
For RRR instructions, the op field contains a number saying which RRR instruction it is. This is the primary opcode. For RX instructions, the op field always contains f. So how does the machine know which RX instruction it is? This is specified by a secondary code in the b field. Here are the full codes (op and b) for a few of the RX instructions:
mnemonic | op | b |
---|---|---|
lea | $f | $0 |
load | $f | $1 |
store | $f | $2 |
Consider this assembly language statement:
lea R2,38[R7]
There are two words in the machine language code. The first word has the same 4 fields as for RRR instructions: op, d, a, b, where
- op contains $f for every RX instruction
- d contains the register operand (in the example, $2)
- a contains the index register (in the example, $7)
- b contains a code indicating which RX instruction this is (1 means load). This "secondary opcode" is required because the op field just indicates that the instruction is RX format, but doesn't say which RX instruction it is.
The second word contains the displacement. In the example, this is 38 (or $0026).
If the address field of an instruction has a label (like x) rather than a constant (lik 204), we need to know the value of the label in order to assemble the instruction. If x is a variable, the value of the label x is the address of the variable in memory. If x is a label on an instruction, its value is the address of the instruction.
Example: translate load R7,x[R12] from assembly language to machine language, given that the address of x is $0c3a.
- First word:
- op = $f (indicates that this is an RX instruction, but not which one; that needs the b field)
- d = $7 (the register operand)
- a = $c (part of the address operand: the register [R12])
- b = $1 (this is the secondary opcode, which indicates that this RX instruction is load)
- The first word is $f701
- Second word:
- The displacement, part of the address operand, is $0c3a
- The complete machine language instruction: $f7c1 $0c3a
The instruction jump loop[R0] has only one operand in assembly language, which is loop[R0]. This looks like an exception: the other RX instructions have two operands. However, the machine language representation of jump also contains a register operand, just like all RX instructions, but the machine ignores that register when the program runs, and the assembler just sets it to R0 in the machine code. This is called a "don't care" operand.
Pseudoinstructions
We have seen conditional jump instructions like jumplt loop. Technically, jumplt, jumpeq and the rest are called "pseudoinstructions". They are just a convenient assembly language notations to describe the actual underlying machine language instructions. All conditional jumps are expressed in machine language using just two real instructions: jumpc0 and jumpc1:
jumpc0 Rd,disp[Ra] jumpc1 Rd,disp[Ra]
The details of how jumpc0 and jumpc1 work will be discussed in the section on the Standard architecture. Here is a list of the pseudoinstructions for conditional jumps after an integer comparison:
jumplt someLabel[R0] ; if < then goto someLabel jumple someLabel[R0] ; if <= then goto someLabel jumpeq someLabel[R0] ; if = then goto someLabel jumpne someLabel[R0] ; if != then goto someLabel jumpge someLabel[R0] ; if >= then goto someLabel jumpgt someLabel[R0] ; if > then goto someLabel
A strange program
Consider "Program Strange" below. You can find the program on the Examples page (go to the Core/Instructions section), or you can copy it from below and paste it into the Editor page.
This program doesn't compute anything particularly useful. It's rather strange and not a model for good programming style, but it illustrates an extremely important concept, which is discussed in the solution below.
The program has a variable y which is defined in the last line: y data 0. But you can edit that line to give y any initial value; it doesn't need to be 0.
The aim of this tutorial is to figure out what the program does, and to do so for several specific values of y. A solution is given below, but please try first to work it out for yourself.
The problem
Run the program with several different initial values of the variable y. These values are given below. For each run, assume that all the registers contain 0 after the program is booted, before it begins execution. For each value of y, do the following:
- Edit the last statement of the program to give y the specified value.
- Execute the program manually, with paper and pencil but without the computer. Give the final values of the registers. Think about what is going on as the program runs.
- Assemble the program and boot it in the processor. Run the program on the emulator and check whether your execution was correct. Be sure to single-step through the program. Before executing an instruction, think about what it will do. Then click Step and look at the registers to see if you were right.
Here are the different variations:
- Run the program in its original form, with y data 0
- Change the last line to y data 1 and run it again
- Now use y data 256
- y data 8192
- y data -5424
And here is the program:
; Strange: A Sigma16 program that is a bit strange load R1,y[R0] load R2,x[R0] add R2,R2,R1 store R2,x[R0] lea R3,3[R0] lea R4,4[R0] x add R5,R3,R3 add R0,R0,R7 trap R0,R0,R0 y data 0
Solution
It's best to try answering the questions on your own first, and then to check by running the program on the emulator, before reading the solution!
The program loads an instruction into a register, does arithmetic on it by adding y to it, and stores the result back into memory. This phenomenon is called self-modifying code, and it exploits the fact that instructions and data are held in the same memory (this is the stored program computer concept). The original instruction is add R5,R3,R3, and its machine language code is 0533.
- When y=0, the final values are: R1=0, R2=0533, R3=3, R4=4, R5=6. The only notable points are that the store instruction doesn't actually change the value of the word in memory (it was 0533 and 0533 is being stored there), and the last add instruction doesn't change the value in R0 because R0 can never change; it is always 0. (Of course if R7=0 then the result of the addition is 0 anyway.)
- When y=1, the final values are: R1=1, R2=0534, R3=3, R4=4, R5=7. Note that R5 is not 3+3=6. When y=1 is added to the instruction, the result is 0534 which means add R5,R3,R4, so instead of adding R3+R3 it adds R3+R4.
- When y=256, the final values are: R1=256=0100, R2=0633, R3=3, R4=4, R5=0, R6=6. The decimal number 256 is 0100 in hexadecimal. When this is added to the instruction, the result is 0633, which means add R6,R3,R3 so R3+R3 is loaded into R6, not into R5.
- When y=8192, the final values are: R1=4096=2000, R2=2533, R3=3, R4=4, R5=9. The decimal number 8192 is 2000 in hexadecimal, and when this is added to the instruction the result is 2533, which means mul R5,R3,R3. It's no longer an add instruction, it's a multiply instruction that calculates R5 := R3*R3 = 9.
- When y=-5424 the program goes into an infinite loop. R1=ead0 (the hexadecimal representation of -5424), R2=f003, R3=3, and R4=4. What started out as the add instruction at x has been transformed into jump 7[R0], comprising the word at x (f003) and the following word (which is 0007). This jump instruction goes back to the first lea instruction, and the program runs for ever (lea, lea, jump).
There is a lot to say about the phenomenon of self-modifying code.
This program shows clearly that a computer does not execute assembly language; it executes machine language. Try running it on the Sigma16 application (single step each instruction). You'll see that the assembly language statement add R5,R3,R3 is highlighted in red, but that is just the GUI trying to be helpful. What's important is that the machine language instruction is fetched from memory and loaded into ir (the instruction register), and that is not 0533. The machine decodes the contents of ir and does whatever that says to do; it isn't aware of the assembly language statement. Indeed, a machine doesn't even understand the concept of assembly language — everything is just bits!
To follow exactly what is happening in the emulator, it's important to look at the pc and ir registers. These reflect what the machine is doing. The assembly language does not.
What is self-modifying code good for? The answer lies in the early history of electronic computers. Early computers (late 1940s and early 1950s) did not use an effective address (i.e. displacement + index) like Sigma16; the instructions simply specified the absolute memory address of an operand. This is ok for simple variables, but how could they process arrays?
The solution was to use self modifying code. In a loop that traverses an array, there would be a load instruction using address 0. In the body of the loop, there would be instructions to calculate the address of x[i] by loading the address of x and adding i; this is then stored into the address field of the load instruction. That instruction is then executed, obtaining the value of x[i]. This technique became obsolete in the early 1950s with the invention of index registers and effective addresses.
The pioneers of computers considered the concept of the stored program computer (i.e. the program and data are in the same memory) to be fundamental and essential. One of the most important reasons was that it made arrays possible. Now we consider the stored program concept to be fundamental for different reasons.
Self modifying code is tricky, and difficult to debug. It makes programs hard to read: you can't rely on what the program says, but on what its instructions will become in the future. For these reasons, self modifying code is now considered to be bad programming practice.
If a program modifies itself, you can't have one copy of the program in memory and allow it to be shared by several users. For example, it's common now to have a web browser open with several tabs. Each tab is served by an independent process (a separate running instance of a program that updates the window showing the web page). If you have 5 tabs open, there are 5 processes, each running the same machine language code, and there's only one copy of that in memory. This wouldn't work if the program modified itself!
Self modifying code leads to security holes: if a hacker has the ability to change your machine language code in memory, they could make your own program act against you.
Modern computers use a technique called segmentation that prevents a program from modifying itself. This leads to increased reliability and security.
Some computers have a facility that allows you to gain the power of self modifying code without actually modifying the code in memory. The idea is to have an instruction execute R1,x[R0] which calculates the logical or of the two operands and then executes the result; x is the address of an instruction and R1 contains the modification to it. The modified instruction is executed, but there is no change to the machine code in memory. This idea was used in the IBM 360 and its successors. However, as the design of effective addresses has become more sophisticated, the execute instruction is rarely needed, and most modern computers don't provide it.
Breakpoints
When you are testing or debugging a program, you may need to execute many instructions before reaching the point you're interested in. Some programs execute thousands of instructions just to initialize. It's infeasible to step through all those instructions, yet if you just run at full speed you won't be able to see what's happening in the section you are working on.
The solution is to run the program at full speed but to force it to stop when it reaches a specific instruction. This is called a breakpoint. When the program stops at the breakpoint, you can examine the registers and memory, and step through instructions from that point. At any time you can click Run and full speed execution resumes, until either another breakpoint is encountered or the program terminates.
Both a breakpoint and a halt instruction (trap R0,R0,R0) will stop execution of the program. The difference is that after a breakpoint you can click Step or Run to continue, but after halt the program cannot execute any more instructions until you boot the processor again.
There are two ways to set a breakpoint:
- Trap break: Insert an instruction into the program that breaks execution at that point.
- External break: Define a breakpoint in the user interface, without modifying the program.
Both forms are useful. Most of the time, while debugging a program, a trap break is easier and more convenient. However, if you don't want to modify the program or reassemble it, or if you realise that you need a breakpoint after execution has already started, then an external break is better.
Trap break
A trap break is a trap instruction whose first operand register contains the value 4. The other operand registers are ignored. When this instruction is executed, the emulator will stop execution, and you can resume execution later by clicking Step or Run.
Suppose you want to check what the load instruction is doing in this code:
… add R1,R2,R3 load R4,x[R1] …
Insert a breakpoint just before the instruction you want to examine. The breakpoint requires two instructions. The first instruction loads the break code into some register (say R9 but it doesn't matter which), and the second instruction is a trap which performs the break. The first operand is the register that contains the break code, and the other two operands are ignored, so we can just use R0.
… add R1,R2,R3 lea R9,4 ; R9 := trap break code trap R9,R0,R0 ; breakpoint load R4,x[R1] …
Now you can run the program at full speed, but when it executes the trap instruction, the emulator will stop. Since the trap instruction has just executed, it will be highlighted in red, and the instruction you're interested in – the load – will be highlighted in blue. You can single step for a while, and click Run again at any time to resume full speed execution.
A common technique is to put a trap break at the beginning of a loop. By clicking Run repeatedly, you can step through the loop iterations.
For an example of a long running program with a trap break, see Examples / Core / Testing / Looper.
External break
An external break tells the emulator to perform a breakpoint without modifying the program. Use these steps to set an external break:
- Find the address of the instruction to stop at: look at the assembly listing, find the instruction, and the listing gives its address.
- Go to the processor page, click Boot and then click Breakpoint.
- A small window will appear; type in the breakpoint address. It must be a hexadecimal address in assembly language format: it must begin with a $ and then contain four hex digits. No other characters may be present, not even white space.
- Click Refresh. This parses the address you entered and remembers it. (If you change the address in the window, click Refresh again.)
- Click Enable. This turns on the breakpoint.
- Click CLose. The breakpoint popup window will disappear so you can see the Processor again.
Now click Run and the program will execute at full speed. When the pc register is equal to the breakpoint address, the emulator will stop. Then you can Step or Run to continue execution.
As long as the breakpoint is enabled, execution will stop every time that location is encountered. To prevent this, open the breakpoint popup again and click Disable.
Standard architecture tutorials
The standard architecture provides additional registers and instructions that simplify many programming tasks, such as logic, functions, and procedures. It supports systems programming, including emulators, interpreters, and operating system kernels. There are assembler directives that simplify large scale programming, modules, linking, and relocation.
Some of the instructions in the standard architecture are under development and may be changed in future versions.
Defining symbols with equ
Sometimes a program needs to use a constant. For example, you may need to refer to the bit in the condition code that indicates the "less than" condition, or you may need to use an unusual logic function that doesn't have a built-in pseudoinstruction. Both of these require special constants.
There are pseudoinstructions for the most common logic operations:
andb R2,R7,9,13 ; R2.9 := R2.9 and R7.13
But suppose you want to use a less common logic function (for example, nand) that doesn't have a pseudoinstruction. You can use the general logic instruction and specify the encoding of the logic function you need. The encoding for the nand function is 14, so you need this instruction:
logicb R2,R7,9,13,14 ; R2.9 := R2.9 nand R7.13
However, it's poor programming style to use magic constants like 14 in your program. It's easy to get them wrong, it makes the code harder to understand, someone else reading the program will wonder what the 14 means, and you might even forget it yourself.
The assembler provides the "equ" statement for defining symbols. The symbol you're defining goes in the label field. The operation is equ, and the operand is the value of the symbol. Thus you can write "nand equ 14" to define the symbol:
nand equ 14 ; logic code for nand
Now you can write nand instead of 14:
logicb R2,R7,9,13,nand ; R2.9 := R2.9 nand R7.13
An equ statement must appear before any use of the symbol that it defines.
This kind of statement, which doesn't generate an instruction but merely gives the assembler some additional information, is called a directive. It is not a machine instruction.
In a program that does lots of logic, you could define the logic functions near the beginning of the program:
; Symbolic names for common logic function codes invx equ 12 and equ 1 or equ 7 xor equ 6 nand equ 14 nor equ 8 xnor equ 9
There are many applications for the equ statement, besides defining logic function constants. You can give a symbolic name to condition code flags you need, and for flags in system control registers, and for the indices of fields in your application program. If you have size constants, such as the number of elements in an array or the size of a stack or heap, you can define them with equ.
If you want to load the value of an equ symbol into a register, use lea:
lea R3,nand[R0] ; R3 := code for nand logic function
Some notations for convenience
There are several shortcuts that can be useful for experienced programmers, especially when working on large programs. But when you're starting out, it's better not to use these techniques.
; before comment is optional if operand field exists
It isn't required to have a semicolon ; before every comment. If an assembly language statement has an operation field and an operand field, then everything after the operand field is taken to be a comment.
lea R3,42[R0] This is a comment load R5,xyz[R0] ; and so is this
Although most assembly language statements have an operand field, some do not. In these cases, the semicolon is required to indicate a comment.
It's good style to use a semicolon consistently to indicate a comment, even where it wouldn't be required.
[R0] is assumed if it's omitted
The instruction lea R3,42 is an abbreviation for the complete form, lea R3,42[R0]. In an RX instruction, if you omit the [Reg] after the displacement, the assembler will automatically use [R0]. The following two assembly language statements describe the same instruction:
lea R3,42[R0] ; R3 := 42 lea R3,42 ; R3 := 42
You can write the instruction either way; both are translated to exactly the same machine language, and they execute exactly the same way. These are also equivalent:
jumplt loop[R0] ; if < then goto loop jumplt loop ; if < then goto loop
Common logic operations
The Sigma16 architecture provides two general logic instructions: logicf operates on fields within a word, and logicb operates on individual bits. Those two general instructions implement all possible logic functions on two operands, not just inv, and, or, xor. See the Architecture section below for an explanation of logicf and logicb.
To simplify programming, Sigma16 also provides pseudoinstructions for commonly used logic operations on words and bits. These are described below, and they are sufficient for many applications. For each pseudoinstruction, the assembler generates an equivalent logicf or logicb instruction.
Word logic
Word logic operations are "bitwise": bit \(i\) of the result depends on bit \(i\) of each operand, and does not depend on bits at any other index. There are pseudoinstructions for the most common word logic operations: invw, andw, orw, xorw.
The invw pseudoinstruction inverts all the bits in a register:
lea R3,$f0f0[R0] ; R3 := f0f0 invw R3 ; R3 := 0f0f
The other word logic pseudoinstructions take two register operands. They calculate the result and place the result in the first operand register, overwriting its previous value. There is not a separate destination register, as in the add instruction.
For example, andw R1,R2 calculates the logical and of R1 and R2, and places the result in the destination register R1.
lea R1,$f0f0[R0] ; R1 := f0f0 lea R2,$00ff[R0] ; R2 := 00ff andw R1,R2 ; R1 := 00f0
The orw (logical inclusive or) and xorw (logical exclusive or) instructions are similar.
lea R1,$f0f0[R0] ; R1 := f0f0 lea R2,$00ff[R0] ; R2 := 00ff orw R1,R2 ; R1 := f0ff xorw R1,R2 ; R1 := f000
Bit logic
There are also pseudoinstructions that perform logic on two individual bits, which may be at any index in any register. The result overwrites the first operand. For example, andb R1,R2,3,9 calculates R1.3 and R2.9 and stores the result in R1.3. Here are some examples:
lea R1,$00ff[R0] ; R1 := 00ff lea R2,$00f0[R0] ; R2 := 00f0 andb R1,R2,3,9 ; R1 := 00f7 ; (R1.3 := R1.3 and R2.9 = 1 and 0 = 0) orb R1,R2,11,5 ; R1 := 08f7 ; (R1.11 := R1.11 or R2.5 = 0 or 1 = 1) xorb R1,R2,4,6 ; R1 := 08e7 ; (R1.4 := R1.4 xor R2.6 = 1 xor 1 = 0)
Bit manipulation
There are several instructions that treat a word as just a string of bits, not as a number. Some of these operate on individual bits in a word, some operate on fields within a word, and some operate on the entire word.
Accessing individual bits
You can set an individual bit in a register at a specified index to 0 using clearb. The first operand is a register, and the second operand is the index of the bit. All other bits in the register remain unchanged. So clearb R3,2 makes R3.2 zero (that is, the bit at index 2 in R3). All the other bits in R3 remain unchanged.
lea R3,$ffff[R0] ; R3 := ffff clearb R3,2 ; R3 := fffb (R3.2 := 0)
setb is similar, but it makes the specified bit 1.
lea R3,$ffff[R0] ; R3 := ffff clearb R3,2 ; R3 := fffb (R3.2 := 0) add R4,R0,R0 ; R4 := 0000 setb R4,6 ; R4 := 0040 (R4.6 := 1)
Shifting
There are two shift instructions, which shift the word in Re by f bits (to left or right) and place the result into Rd. These are logical shifts: the bits that are shifted out are discarded, and bits shifted into the word are always 0.
shiftl Rd,Re,f ; Rd := Re shifted left by f bits shiftr Rd,Re,f ; Rd := Re shifted right by f bits
Here are some examples; the comments show the expected result in both hexadecimal and as bits.
lea R1,2[R0] ; R2 = 0002 0000 0000 0000 0010 shiftl R5,R1,4 ; R5 = 0020 0000 0000 0010 0000 shiftl R6,R1,13 ; R6 = 4000 0100 0000 0000 0000 shiftr R7,R6,3 ; R7 = 0800 0000 1000 0000 0000 shiftr R7,R7,11 ; R7 = 0001 0000 0000 0000 0001
Extracting fields
A common situation, especially in systems programming, is that you have a word that contains several fields. A machine language instruction is like that; also a network packet contains a number of fields. The sizes of the fields, and their positions within a word, can be anything. Software, especially systems programming, often needs to be able to extract a field from a word and put it somwhere else so it can be used.
It's possible to obtain any field from a register, and put it somewhere in another register, using a sequence of shift and logic instructions (particularly shiftl, shiftr, andw, orw). However, these sequences of instructions are error prone, hard to read, and they execute frequently in the inner loop of many systems programs.
Sigma16 provides an instruction extract which solves this problem with just one instruction. It allows you to extract an arbitrary field from a register and put it into an arbitrary location in another register.
The extract instruction takes operands that specify the source register (which contains the field you're interested in), the destination register (where it will put the field), the location of the field in the source register (specifed as the index of the rightmost and leftmost bits in the field), and the location where you want to put the field in the destination register.
This instruction copies 4 bits from R1.4:7 to R2:8:11:
extract R2,R1,8,4,7 ; R2.8:11 := R1.4:7
In this example, the operands are:
- R2 is the destination register
- R1 is the source register
- 8 is the index of the rightmost bit in the destination where the field will be placed at ascending bit indices
- 4 is the index of the rightmost bit of the field in the source register
- 7 is the index of the leftmost bit of the field in the source register
This instruction is equivalent to the following (although it is just one instruction, not four instructions):
R2.8 := R1.4 R2.9 := R1.5 R2.10 := R1.6 R2.11 := R1.7
The extract instruction modifies only the specified bits in the destination register; the other bits remain unchanged.
Here is an executable example:
lea R1,$00f0[R0] ; R1 = 00f0 add R2,R0,R0 ; R2 = 0000 extract R2,R1,8,4,7 ; R2 = 0f00 R2.8:11 := R1.4:7
Saving and restoring contiguous registers
There are several situations where a program needs to store a sequence of registers into consecutive memory locations. Later the registers must be reloaded from the block of memory. One example is a procedure call; another example is gaining access to all the fields of a record. These operations are supported by the save and restore instructions.
The following save instruction stores R3, R4, …, R9 into consecutive memory locations, starting at 5[R14]:
save R3,R9,5[R14] ; save R3 to R9
The save is equivalent to a sequence of individual store instructions:
store R3,5[R14] store R4,6[R14] store R5,7[R14] store R6,8[R14] store R7,9[R14] store R8,10[R14] store R9,11[R14]
To get the values from memory back into the registers, use this restore instruction:
restore R3,R9,5[R14] ; restore R3 to R9
The restore is equivalent to a sequence of individual load instructions:
load R3,5[R14] load R4,6[R14] load R5,7[R14] load R6,8[R14] load R7,9[R14] load R8,10[R14] load R9,11[R14]
Using save and restore makes the code shorter and more readable, and it's easier to check that the save an restore match each other.
The Sigma16 architecture
Sigma16 contains a set of registers, an arithmetic logic unit (ALU) and functional units for arithmetic calculations, a memory, a memory management unit that provides virtual memory, an interrupt system, and an Input/Output controller using direct memory access (DMA).
Implementations
The Sigma16 software application contains a complete programming environment, using emulation to implement the processor. The programming environment includes a file manager, editor, assembler, linker, and emulator, and it provides Input/Output and secondary storage. The software runs in a web browser, and does not require any installation. There are also command line tools which can be installed on a local computer, but they are not necessary.
There is also a digital circuit, specified in a functional hardware description language, which implements the Core architecture. The circuit specification is executable, and Core programs can be executed by simulating the circuit. The circuit is suitable for implementation using either an FPGA or a custom VLSI design. In simulation, the simulation driver provides Input/Output, and in a hardware realisation these would be provided by I/O hardware.
Subsystems
A register is a digital circuit that can retain one word of data. A new value can be loaded into a register, and the current contents may be read out. Registers are fast, and most computation is performed using the registers. Sigma16 contains several groups of registers; each group is displayed in a box on the Processor tab.
- The Register File is an array of 16 registers named R0, R1, \(\ldots\), R15. These registers are accessible to the machine language program. Programs use the register file to hold variables that are currently in use.
- The Control registers (pc, ir, adr) keep track of the instruction that is currently executing.
- The System registers control the system status and interrupts.
- The Virtual Memory registers are used for memory management.
The memory is an array of \(2^{16}\) words. Each word in the memory is identified by an address, which is a 16-bit natural number. The memory is similar to the register file, but significantly slower and much larger.
Computational units. The ALU (arithmetic and logic unit) is a circuit that can do arithmetic, such as addition, subtraction, comparison, and some other operations More complex operations, such as multiplication and division, are provided by functional units.
The Input/Output system can transfer data between the computer and the outside world.
Words
In Sigma16, a word is a sequence of 16 bits. Occasionally we will also refer to a double word (a sequence of 32 bits). A a generic word is a sequence of bits of arbitrary length. The system does not use bytes (a byte is 8 bits) or extended words (64 bits).
The hardware components in Sigma16 are mostly 16 bits wide. Each addressable memory location is a word, and a memory address is a word. Each register is a word.
By itself, a word has no inherent meaning: it is just a sequence of bits. Some instructions operate on a word without regard to what it means: for example, several instructions copy a word from one place to another and it doesn't matter what the word means. Other instructions act on a word assuming that it represents some particular primitive data type. For example, integer arithmetic assumes that the word represents an integer, while address arithmetic assumes that the word represents a natural number.
Sigma16 supports natural numbers, and integers, which are represented as words. It also supports Booleans, which are represented as a bit within a word. Addresses and characters are represented as natural numbers.
Indexing bits in a word
The bits of a word are indexed from right to left, starting with index 0. The least significant (rightmost) bit has index 0, and the most significant bit (leftmost) has index 15.
The notations \(x_i\) and x.i
both mean the bit with index \(i\) in the
word \(x\), for \(0 \leq i \leq 15\). For example, \(x_0\) is the rightmost
bit and \(x_{15}\) is the leftmost bit. When used in an instruction, a
bit index is specified as a 4-bit binary number \(i\) such that \(0 \leq
i \leq 15\).
The following table shows the indices of all the bits in a word. The vertical bars break the word into groups of 4 bits. This grouping corresponds to the representation of the word in hex notation.
\[\fbox{$\ x_{15}\> x_{14}\> x_{13}\> x_{12} \ \vert\ x_{11}\> x_{10}\> x_{9}\> x_{8}$}\]
\[\fbox{$ \ \vert\ x_{7}\> x_{6}\> x_{5}\> x_{4} \ \vert\ x_{3}\> x_{2}\> x_{1}\> x_{0} \ $}\]
Fields
A bit field is a contiguous sequence of bits in a word. It is specified by two numbers: the indices of the rightmost and leftmost bits in the field.
Natural numbers
The natural numbers are \(0, 1, 2, \ldots\), so all natural numbers are nonnegative. Sigma16 represents natural numbers as binary, and uses them to represent memory addresses. The binary value of an \(n\) bit generic word \(x\) is
\[binval (x) = \sum_{0 \leq i < n} x_i * 2^i\]
For a word of 16 bits, natural numbers are restricted to the range from 0 through \(2^{16}-1\); that is, from 0 through 65,535. For a double word (32 bits), natural numbers are restricted to the range from 0 through \(2^{32}-1\); that is, from 0 through 4,294,967,295.
A natural number (and hence a binary number) cannot be negative. If you need numbers that can be negative or positive, you must use an integer. The arithmetic instructions in Sigma16 operate on integers, but address arithmetic is performed in binary.
Integers
Integers are represented using two's complement notation. If the leftmost (most significant) bit of a word is 0, its two's complement value is the same as its binary value. If the leftmost bit is 1, the two's complement value is negative.
Any two's complement number (with one exception) can be negated by inverting all the bits (replace 0 by 1 and vice versa) and then adding 1, discarding any overflow. The sole exception is that if the smallest negative number (\(-2^{15}\)) is negated, the result is overflow because the largest positive integer representable in 16 bit two's complement is \(2^{15}-1\).
For example, consider \(x\) = 1111 1010. Since the leftmost bit is 1, we know that \(x < 0\). We can negate \(x\) by inverting the bits, obtaining 0000 0101. Adding 1 gives 0000 0110 which is 6. Since \(-x = 6\), we conclude that \(x = -6\).
Notations for a word
Assembly language provides several notations for expressing the value of a word.
- A natural number between 0 and 65,535 (216 - 1)
- An integer between -32,768 and 32,767 (-215 and 215 - 1)
- A 4-digit hexadecimal constant, where the digits are 0-9 a-f. The constant is preceded by $, for example $3e8c.
In assembly language programs, hexadecimal constants are written with a preceding $ sign (e.g. $3b2f). This is necessary to avoid ambiguity: 1234 is a decimal number and $1234 is a hexadecimal number, and they denote different words.
Assembly language always requires a $ before a hexadecimal constant. For displays of the machine state, in contexts where there is no ambiguity, the $ may be omitted. For example, the user interface displays register and memory contents as hexadecimal without the leading $. However, the $ is always required for hex constants in assembly language programs.
Memory
The memory is a hardware array of words that are accessed by address. A memory address is 16 bits wide, and there is one memory location corresponding to each address, so there are 216 = 64k memory locations (1k = 1,024). Each memory location is a 16-bit word.
Instructions specify memory addresses in two parts: the displacement, which is a word representing a binary number, and the index, which is one of the registers in the register file. For example, a memory address could be specified as $003c[R5]; the displacement is 003c and the index is R5.
When the instruction is executed, the computer calculates the effective address by adding the value of the displacement and the value in the index register. If R5 contains 2, then the effective address of $003c[R5] is 003e.
This scheme may seem more complicated than simply specifying the address directly, but it is flexible. If the machine language just gave the address as a single binary number, it would be limited to accessing simple static variables. The effective address mechanism is simple to implement in hardware, as you can see in the digital circuit processor, yet it allows the implementation of static variables, local variables, records, arrays, pointers and linked data structures, jump tables, and more. These techniques are described later.
Registers
Register file
The register file is a set of 16 general registers, each of which holds a 16 bit word. A register is referenced by a 4-bit binary number. In assembly language, we use the notations R0, R1, R2, …, R9, R10, R11, R12, R13, R14, R15 to refer to the registers. The state of the register file can be written as a table showing the value of each register:
Register | Contents |
---|---|
R0 | 0000 |
R1 | fffe |
R2 | 13c4 |
… | … |
R14 | 03c8 |
R15 | 0020 |
Sigma16 is a load/store style architecture. All calculations are carried out in the register file, and explicit load and store instructions must be used to copy data between the memory and the register file.
R0 and R15 are special. The other registers, R1 through R14, may be used for any purpose.
- R0 always contains the constant 0.
It is legal to perform an instruction that attempts to load some other value into R0, but the register will still contain 0 after executing such an instruction. Such an instruction will simply have no lasting effect.
- R15 is the condition code register.
Condition code
Several instructions produce status information: the result of a comparison, whether there was an overflow, etc. This information is automatically loaded into R15, which is also called the condition code. The description of each instruction states whether R15 is modified, and what goes into it.
Each bit in the condition code (R15) is a Boolean that specifies whether a specific condition is false (0) or true (1). The bits are indexed from bit 0 (rightmost, or least significant) to bit 15 (the leftmost, or most significant). Condition code bits are sometimes called flags. Each flag gives the status of a relation or event. If the flag is True (1) the relation holds or the event has occurred. If the flag is False (0) the relation does not hold, or the event has not occurred.
One way to use flags in the condition code is to control conditional jumps. The jumpc0 and jumpc1 instructions are conditional jumps that use a flag in R15 to decide whether to jump:
- Use jumpc0 to jump if the Boolean is False
- Use jumpc1 to jump if the Boolean is True
Both jumpc0 and jumpc1 take two operands. The first is a natural number from 0 to 15, which gives the index of a flag in the condition code. The second is an address to jump to, specified as displacement[index]. The indices of the flags are given below.
The two conditional jumps (jumpc0 and jumpc1) let you jump depending on either the truth or falsity of a condition. For example, the flag at index 4 denotes integer less than. The fist pair of instructions below jumps if the less than condition holds, and the second pair jumps if the less than condition is false. This is equivalent to jumping if the "greater than or equal" condition holds.
cmp R8,R9 jumpc1 4,xyz[R0] ; if R8 < R9 then goto xyz … cmp R8,R9 jumpc0 4,xyz[R0] ; if R8 >= R9 then goto xyz
Programs usually use pseudoinstructions rather than the actual machine instructions jumpc0 and jumpc1. For example, the following two instructions are equivalent: the assembler generates exactly the same machine language code for them.
jumpc1 4,xyz[R0] ; if R8 < R9 then goto xyz jumplt xyz[R0] ; if R8 < R9 then goto xyz
In effect, the pseudoinstruction lets you omit the flag index and instead to indicate the condition you want in the mnemonic. It's easier to remember that less than is indicated by lt than by 4.
Another way to use condition code flags is to save them as Boolean variables and perform logic operations on them. In this case it's best to copy a flag to another register, using copyb. From there you can evaluate logic expressions using andb, orb, xorb, and logicb. It isn't a good idea to keep Boolean variables in R15 for a long time, because many instructions will modify it.
Before doing logic operations on condition code bits, consider whether "short circuit" evaluation, with a separate comparison followed by conditional jump for each relation, is more appropriate.
There is not just one flag for "less than" and another for "greater than". There are separate flags for integers (represented as two's complement) and natural numbers (represented as binary). This is necessary because the relation between two words sometimes depends on the type of the data. For example, consider the word ffff (all 1 bits). On its own, ffff is just a word of bits and has no inherent meaning.
- If ffff is interpreted as a natural number (i.e. binary), it is positive and has the value 65,535, and ffff > 0000.
- If ffff is interpreted as an integer (i.e. two's complement), then it is negative and has the value -1, and ffff < 0000.
The cmp instruction compares two words, and evaluates five relations between the words: equality, less than for both natural and integer, and greater than for both natural and integer. These evaluations are performed simultaneously, using parallelism in the ALU circuit, and the result of each is recorded in the corresponding flag in the condition code.
cmp R5,R6 jumpc1 3,lessNat[R0] ; if R5 <Nat R6 then goto lessNat … cmp R7,R8 jumpc1 4,LessInt[R0] ; if R7 <Int R8 then goto lessInt
Thus you don't specify the types of the operands in the cmp instruction. Instead, you select the condition code flag that corresponds to the types.
Each flag has a short 1-character name to enable it to be displayed compactly in the user interface. A naming convention is that flags for integers (two's complement) have lower case letters, while flags for natural numbers (binary) have upper case letters. For example:
- l means < for integers
- L means < for natural numbers
- g means > for integers
- G means > for natural numbers
Equality is the same regardless of type. If two words consist of exactly the same bits, then they have the same value as integers, natural numbers, characters, addresses, and for any other possible type as well. Therefore there is only one flag for equality, and its symbol is =.
Register R15 is also called the condition code. Some instructions modify a bit in the condition code to indicate the results of comparisons, overflow, carry, other special data, and various errors. Instructions that operate on registers can access R15 as usual. There are also some instructions that implicitly access R15 without specifying it (for example, jumpc0 and jumpc1). Since many instructions modify the condition code as a side effect, programs should not use R15 to hold variables.
The following table lists all the condition code flags.
- Index is the bit position of the flag in the condition code. Bits are numbered from right to left, starting with 0.
- Symbol: the graphical user interface shows the symbol for each bit that is 1. For example, if G is displayed after a comparison of two words, this means that > holds if the words are interpreted as natural numbers but not if they are interpreted as integers. However, if G> is displayed, the comparison result is greater than for both integer and two's complement representation.
- Meaning: Description using English or mathematical notation
- Hex: The value of the condition code word assuming the given flag is true and all others are false. If several flags are true, the condition code register will be the logic or of the hex values of the true flags. For example, suppose the words 00a3 is compared with 00c5. The comparison result is L (less than for naturals) and also l (less than for integers). So the condition code will contain 0008 + 0010 = 0018.
Table: Condition code flags
Index | Symbol | Meaning | Hex |
0 | > g | > Int | 0001 |
1 | G | > Nat | 0002 |
2 | = | = | 0004 |
3 | L | < Nat | 0008 |
4 | < l | < Int | 0010 |
5 | v | Int overflow | 0020 |
6 | V | Nat overflow | 0040 |
7 | C | Carry | 0080 |
8 | S | Stack overflow | 0100 |
9 | s | Stack underflow | 0200 |
10 | reserved | 0400 | |
11 | reserved | 0800 | |
12 | reserved | 1000 | |
13 | reserved | 2000 | |
14 | reserved | 4000 | |
15 | reserved | 8000 |
There is an exception for division by zero, but no corresponding flag in the condition code. The reason is that the div instruction places the remainder in R15, so the condition code isn't available to represent division by 0. You can use an interrupt to detect division by 0, and you can test explicity for division by 0 by using jumpz specifying the register containing the divisor before executing the div instruction.
- The ordinary general registers: R1 through R14
Registers R1 through R14 have no special properties defined by the machine. A program can use them for any purpose.
However, there are some programming conventions that use certain registers for special purposes. The hardware does not enforce, or even know about, these conventions, and you do not have to follow the conventions in programming. However, it is necessary to obey the conventions in order to use the standard software libraries in your program. See the section on Programming for a discussion of these standard usage conventions.
Instruction control registers
The instruction control registers enable the processor to keep track of the state of the running program. These registers are rarely used directly by the machine language program, but they are essential for keeping track of the execution of the program, and some instructions use them directly.
- pc – program counter
The PC (program counter) register contains the address of the next instruction to be executed (not the address of the instruction currently being executed). The name is illogical, but "program counter" is the traditional name and Sigma16 sticks with standard terminology.
- ir – instruction register
The IR contains the instruction that is currently executing. If the instruction is RRR format, the entire instruction is in the IR. The other instruction formats require two words; the first word is in the IR and the second word is in the ADR register.
- adr – address register
If the instruction currently executing is RRR format, the ADR register is not used, and its content is just whatever was left over from previous instructions.
If the instruction currently executing is RX or EXP format, the ADR register contains the second word of the instruction. Furthermore, for RX instructions the effective address is calculated and replaces the second word of the instruction.
- status – system status register
The status register contains flags that specify the current execution mode of the processor.
Index Symbol Meaning Hex 0 U/S 0: user state. 1: system state 0001 1 E 1: interrupts enabled 0002 If the processor is in system state, any instruction can be executed. If it is in user state, executing a privileged instruction causes an interrupt.
When the system is started by booting a program, the status is set to system state. If the program is just a standalone program (such as the simple examples), you can ignore the system state. If an operating system has been booted, the OS should put the processor into user state when it gives a time slice to a user process.
If the status register indicates that interrupts are enabled, and an interrupt request is set in the request register, than an interrupt occurs when the current instruction is completed. If interrupts are not enabled, then interrupt requests are ignored.
Interrupts
An interrupt is a jump performed automatically by the machine in response to an event, not by executing a jump instruction. Interrupts are used for error detection and recovery, input/output, concurrency, memory protection, and virtual memory. Most of these features are implemented by an operating system kernel.
There are two kinds of event that can trigger an interrupt:
- An error that occurs while executing an instruction. For example, an arithmetic instruction could produce a result that is too large to represent in a word (this is called overflow). Sometimes this is called an exception, and sometimes it is called an interrupt.
- An external event that has nothing to do with the running program, but which needs to be handled. For example, an I/O device may request an interrupt when it has completed an I/O operation. This is always called an interrupt.
There are several types of interrupt. Each type has a unique index. The interrupt indices serve two purposes: they determine the interrupt priority, and they give the bit index for the interrupt type in the mask and request registers. The following table shows each type of interrupt and the bit index of its flag.
Table: Interrupt Indices
Index | Symbol | Meaning | Hex |
---|---|---|---|
0 | time | timer interrupt | 0001 |
1 | sf | segmentation fault | 0002 |
2 | sto | stack overflow | 0004 |
3 | stu | stack underflow | 0008 |
4 | trap | user trap | 0010 |
5 | tco | integer overflow | 0020 |
6 | bo | natural overflow | 0040 |
7 | zdiv | divide by 0 | 0080 |
Each type of interrupt has a priority which is determined by its index: a lower index indicates higher priority. Thus the timer interrupt has the highest priority, and a stack overflow has higher priority than integer overflow.
Interrupts take place only between execution of instructions. When one instruction has finished, the processor checks for interrupts before going on to the next instruction. This consists of the following steps:
- If interrupts are not enabled, the check terminates and the next instruction is executed normally.
- If interrupts are enabled, and there is an interrupt request, and the corresponding mask bit is enabled, an interrupt will take place. If several types of interrupt meet these conditions, the one with highest priority (lowest index) is selected. Let i be the index of the type of interrupt that was selected.
- The processor clears the request bit for the interrupt in the request register, it copies the status register into rstat, and it copies the pc register into the rpc register. Also, it loads mem[vect+2*i] into the pc register, and it and sets flags in the status register to indicate system state and interrupts disabled.
- If the timer is running, it is stopped.
- Now the interrupt has completed, and the next instruction is executed normally. Since the pc has been modified, the machine has jumped to a different location.
This algorithm is performed entirely by the processor; none of it involves software. Although it is described as a sequence of steps, many of these steps can be performed in parallel, because digital circuits are inherently parallel. If an interrupt does not take place, the time required for the check is minimal (less than one clock cycle), so checking for interrupts does not signicantly slow down the processor speed.
mask - interrupt mask register
The mask register specifies which types of interrupt are allowed to occur. It contains a flag for each type of interrupt; if the flag is 1 then the corresponding type of interrupt can take place. If the mask flag is 0, then that type of interrupt will not occur even if it is requested.
request - interrupt request register
When an interrupt is requested, the corresponding bit is set in the request register. When the next instruction is executed, the interrupt occurs if interrupts are enabled and the corresponding mask bit is 1.
rstat register
When an interrupt occurs, the value of the status register is copied into rstat.
rpc register
When an interrupt occurs, the value of pc is copied into rpc. This is necessary to enable the operating system to resume the interrupted program.
vect register
The interrupt vector register contains the address of a jump table. This is an array of jumps to interrupt handlers.
Memory management registers
(Will be implemented in future version)
Instruction set
Instruction representation
Instructions are represented in the memory of the computer using words, just like all other kinds of data. From the programmer's perspective, an instruction is like a simple statement in a programming language. From the circuit designer's perspective, instructions must be executed using logic gates, and the specific way it is represented as a word of bits is important.
An instruction specifies several pieces of information. For example, add R1,R2,R3 is an instruction that says four things: it's an addition, the operands come from R2 and R3, and the result goes into R1. Therefore to represent an instruction we need to organize a word as a collection of several fields. A field is a portion of a word that contains a small number (for example, a register number). Each field gives one specific piece of information about the instruction.
The particular scheme for describing an instruction as a collection of fields is called an instruction format. Like most computers, Sigma16 has a small number of instruction formats and a larger number of instructions. The key to understanding the interface between machine language and digital circuit design is to master the instruction formats.
The core architecture (the simplest part of the system) uses just two instruction formats: the RRR format for instructions that perform calculations in the registers, and the RX format for instructions that refer to a memory location.
The advanced parts of the architecture provide additional instructions, most of which are represented with the EXP format. The name EXP stands simultaneously for expansion (because it provides for many additional instructions) and experimental (because it allows for experimentation with the design and implementation of new instructions).
The first word of every instruction (for all instruction formats) contains the following fields:
- op (bits 15-12) 4-bit opcode, determines instruction format
- d (bits 11-8) 4-bit destination
- a (bits 7-4) 4-bit operand a
- b (bits 3-0) 4-bit operand b, or expanded opcode for RX
op | d | a | b |
15-12 | 11-8 | 7-4 | 3-0 |
If an instruction is RX or EXP format, it has a second word that contains four 4-bit fields. The meanings of the fields depend on the particular instruction.
- e (bits 15-12) 4-bit operand
- f (bits 11-8) 4-bit operand
- g (bits 7-4) 4-bit operand
- h (bits 3-0) 4-bit operand
disp |
15-0 |
e | f | g | h |
15-12 | 11-8 | 7-4 | 3-0 |
The first word of every instruction has a 4-bit field called the operation code. This is usually abbreviated to opcode (or just op). Since the opcode contains 4 bits, it can specify 16 values:
- If the opcode is between 0 and 12 (hex 0 to hex c), the instruction is RRR format, and there are 14 RRR instructions.
- An opcode value of 13 is reserved for future expansion of the architecture.
- If the op field is 14 (hex e) the instruction is EXP format and has a secondary opcode in the a and b fields.
- If the op field contains 15 (hex f) the instruction is RX format with a secondary opcode in the b field.
The details of each format and its fields are given below, and the following table gives an overview.
- Each instruction format has a fixed size, which is the number of words used to represent any instruction of that format.
- These words are subdivided into 4-bit fields, each with a unique name: op, d, a, b, etc.
- The type of instruction is specified by the opcode, which is the op field for RRR instructions, and which consists of several fields for the other formats.
- The instruction operands are specified by further fields.
- Some instructions combine two 4-bit fields into a single 8-bit
field.
- The a and b fields may be combined to form an 8-bit field called ab (only for the EXP format)
- The g and h fields may be combined to form an 8-bit field called gh
Format | Size | Opcode | Operands | Example |
---|---|---|---|---|
RRR | 1 | op | d,a,b | add Rd,Ra,Rb |
RX | 2 | op,b | d,a,disp | load Rd,disp[Ra] |
EXP | 2 | op,ab | d,e,f,g,h | save Rd,Re,gh[Rf] |
There are three machine instruction formats: RRR, RX, EXP. However, there is a larger set of assembly language statement formats, because there are special syntaxes for some instructions, and there are assembler directives that aren't instructions at all. Thus there are two kinds of format: the machine instruction formats, and the assembly language instruction statement formats. The assembly language formats are described later.
RRR format
The RRR format is used for instructions that perform calculations in the registers, without using memory.
An instruction in RRR format is one word containing four 4-bit fields called op, d, a, b. Each field contains 4 bits representing a binary number between 0 and 15.
op | d | a | b |
- op (bits with index 15 to 12) is the operation code, usually called opcode. This determines the operation to be performed. If the opcode is between 0 and 12 it specifies an RRR instruction. An opcode greater than 12 indicates an expanding opcode: the instruction is not RRR but one of the other formats, and it has a secondary opcode that specifies precisely which instruction it is. This is explained in the sections on RX and EXP formats.
- d (bits 11 to 8) is the destination register; the register where (in most cases) the result will be loaded.
- a (bits 7 to 4) is the register containing the first operand.
- b (bits 3 to 0) is the register containing the second operand.
In most cases, an RRR instruction takes two operands in registers specified by the a and b fields and produces a result which is loaded into the register specified by the d field.
A typical example of an RRR instruction is add R4,R13,R2, which adds the contents of registers R13 and R2, and loads the result into R4. It's equivalent to R4 := R13 + R2. The opcode for add is 0, so the machine language code for this instruction is 04d2.
RX format
The RX format is used for instructions that use a memory location. There are generally two operands: one is a register and the other is a word in memory.
RX instructions specify a memory location as well as a register operand. The machine language representation is two words:
Here is RX
The RX instruction format is used for instructions that use a memory address, which is specified by an index register and a displacement. The name of the format describes briefly the two operands: a register (R) and an indexed memory address (X).
An RX instruction contains two operands: one is a memory address, and the other is a register. Typical RX instructions are loads, stores, and jumps. The instruction consists of two consecutive words. The first has the same format as an RRR instruction, with four fields: op, d, sa, sb. The second word is a single 16-bit binary number, and is called the displacement.
An RX instruction is represented by two words, with the following fields: op=15, b contains the secondary opcode which specifies which RX instruction it is, d is the destination, a is the index register, and the second word is a 16 bit constant called the displacement (often written disp for short).
- op field (bits 0-3 of ir) is f for all RX instructions
- d field (bits 4-7 of ir) has several uses
- a field (bits 8-11 of ir) is index register for effective address
- b field (bits 12-15 of ir) is secondary opcode
- disp (displacement) is the second word of the instruction
- ea (effective address) = displacement + r[a]
The memory address is specified in two parts: an index register and the displacement. The index register is specified in the sa field. In assembly language, the notation used is number[reg], where the number is the value of the displacement, and the reg is the index register. Thus $20b3[R2] means the address has displacement $20b3 and the index register is R2.
When the machine executes an RX instruction, it begins by calculating the effective address. This is abbreviated "ea", and its value is the sum of the displacement and the contents of the index register.
RX instructions are represented in two words, and they use an "expanding opcode". That is, the op field of the first word of the instruction contains the constant f (the bits 1111) for every RX instruction, and the sb field is used to hold a secondary opcode indicating which RX instruction it is.
The register operand is specified in the d field. For several RX instructions, this is indeed the destination of the instruction: for example, load places data into Rd. However, a few RX instructions use the d field differently (see, for example, the conditional jump instructions).
The memory address is specified using the sa field and the displacement, which is the entire second word of the instruction.
The RX format is used for instructions that access the memory. There are two operands: a memory address and a register. The memory address is specified using two fields: a constant (called the displacement) and a register (called the index register).
An RX instruction consists of two words. The first has the same format as RRR. The op field is 15, which means "this instruction has RX format". A secondary operation code is needed to specify which RX instruction this is, this is given in the b field. The d field is the destination register, and the a field is the index register. There is only one operand register for RX instructions, since the b field is needed for the secondary operation code.
op | d | a | b |
The second word consists of one 16-bit field called the displacement (abbreviated as disp).
displacement |
The index register Rb and the displacement together specify the
effective address. The assembly language syntax for the effective
address is disp[Rb]
, and its value is \(\mathrm{disp} + \mathrm{Rb}\).
For example, suppose R4 contains 7 when the following instruction is
executed:
load R2,5[R4]
The address operand is 5[R4]
, where the displacement is 5 and the
index is R4. When the instruction executes, the effective address is
\(5 + \mathrm{R4} = 5 + 7 = 12\), or $000c
.
The displacement is a constant given in the instruction, but the index
register is variable. Since R0 always contains 0, the effective
address for disp[R0]
is the value of disp
.
The displacement is represented in binary, and the effective address
is calculated in binary, not in two's complement. Thus the effective
address of ffff[R0]
is the (positive) address of the last word in
memory – it isn't a negative number.
EXP format
The EXP instructions provide more complex operations, and they belong to the Standard architecture. (The Core architecture uses only RRR and RX). An EXP instruction consists of two words.
The first word has the same op and d fields as RRR and RX. The op field contains 14 (hex e), which indicates that the instruction is EXP format. The a and b fields are treated as one 8-bit natural number, which is the secondary operation code. This provides for the possibility of up to 256 EXP format instructions, which enables new experimental instructions to be defined.
op | d | ab |
The second word contains for 4-bit fields. Each of these may contain either a register number or a short 4-bit number, depending on the instruction.
e | f | g | h |
Notation for machine language
We usually write instructions in assembly language (sub R3,R12,R9) but the computer executes machine language (13c9). There is a precise relationship between these two notations for an instruction. By writing assembly language, you determine the exact value of every bit in every word of a machine language program. This section explains how these two notations for an instruction are related.
When each instruction is described below, an example is given showing a typical use of the instruction, along with the general form of the assembly language instruction.
The general form uses the machine language field names to show where each piece of the instruction goes in the machine code. The instruction format is also given, along with any constant fields.
As an example, the logicu instruction is EXP format, so its machine language representation consists of two words with the following fields:
op | d | a | b |
---|---|---|---|
e | f | g | h |
The primary opcode is e, which goes in the op field. The secondary opcode is 14, which goes in the ab field. Thus op=e, a=1, and b=4 (here, a dollar sign is used before the hex constants to distinguish them from the field names).
$e | d | $1 | $4 |
---|---|---|---|
e | f | g | h |
and the For example, here is the general form of the logicu instruction:
logicu Rd,e,Rf,g,h ; Rd.e := h (Rd.e, Rf.g) EXP op=#e ab=$14
logicu Rd,e,Rf,g,h ; Rd.e := h (Rd.e, Rf.g) EXP op=#e ab=$14
The format is EXP, so the instruction has all the fields of the EXP
format: op, d, ab, e, f, g, h. There are two constant fields: the
primary opcode (op) is $e
, and the secondary opcode (ab) is $14
.
The values of the other fields are given in the general form of the
assembly language. The operands Rd and Rf refer to registers, and
those register numbers go into the d and f fields of the instruction.
The e, g, and h fields are given as numbers in the assembly language.
(The assembly language convention is to give constants in decimal
notation, not hexadecimal.)
Example:
logicu R7,5,R2,13,xor ; R7.5 := R7.5 xor R2.13 e714 52d6
Arithmetic
add
The instruction add Rd,Ra,Rb
has operands Ra
and Rb
and
destination Rd
. It fetches the operands Ra
and Rb
, calculates
the sum Ra
+ Rb
, and loads the result into the destination Rd
.
The effect is Rd := Ra + Rb
. For example, add R5,R12,R2
performs R5 := R12 + R3
.
As well as setting the destination register, the add instruction sets or clears the overflow and carry bits in the condition code (which is R15). The other bits in R15 are not changed.
If the destination register is R15, the sum is placed in R15, and the overflow and carry flags are discarded.
The add instruction is RRR format with opcode=0. Given destination Rd and operands Ra and Rb (where d, a, b are hex digits), add Rd,Ra,Rb is reprseented by 0dab.
assembly language | add Rd,Ra,Rb |
instruction format | RRR |
opcode | 0 |
machine language | $0 d a b |
effect | R[d] := R[a] + R[b] |
condition code | vVC |
Examples:
Code | Assembly | Effect |
---|---|---|
062c | add R6,R2,R12 | ; R6 := R2 + R12 |
0d13 | add R13,R1,R3 | ; R13 := R1 + R3 |
A field is a consecutiave sequence of bits within a word. A field is specified with the index of the leftmost bit in the field, along with the size of the field. For example, the field in x with index 9 and size 3 consists of the bits x.9 x.8 x.7.
add R1,R2,R3 ; R1 := R2 + R3
The instruction add Rd,Ra,Rb has operands Ra and Rb and destination Rd. It fetches the operands Ra and Rb, calculates the sum Ra + Rb, and loads the result into the destination Rd. The effect is Rd := Ra + Rb. For example, add R5,R12,R2 performs R5 := R12 + R3.
The add instruction is RRR format with opcode=0. Given destination Rd and operands Ra and Rb (where d, a, b are hex digits), add Rd,Ra,Rb is reprseented by 0dab.
The add instruction can be used for both binary addition (on natural numbers) and for two's complement addition (on signed integers).
- 16-bit natural numbers are unsigned integers 0, 1, 2, …, 65535. If two natural numbers are added, the result is a natural number (the result cannot be negative). If the result is 65536 or larger, it cannot be represented as a 16 bit binary number. If this happens, the destination register is set to the lower 16 bits of the true result, and the binary overflow flag is set in the Condition Code.
- 16-bit two's complement numbers are signed integers -32999?, …, -1, 0, 1, …, 32???. If two signed integers are added, the result is a signed integer. If the result is less than -32000 or greater than 32000, then the result cannot be represented as a 16 bit two's complement number. If this happens, the destination register is set to the lower 16 bits of the true result, and the two's complement overflow flag is set in the Condition Code. Furthermore, the overflow flag is set in the req register. If interrupts are enabled and the overflow flag is 1 in the mask register, then an interrupt will occur immediatelhy after the add instruction executes.
The add instruction can be used for both binary addition (on natural numbers) and for two's complement addition (on signed integers).
- 16-bit natural numbers are unsigned integers 0, 1, 2, …, 65535. If two natural numbers are added, the result is a natural number (the result cannot be negative). If the result is 65536 or larger, it cannot be represented as a 16 bit binary number. If this happens, the destination register is set to the lower 16 bits of the true result, and the binary overflow flag is set in the Condition Code.
- 16-bit two's complement numbers are signed integers -32999?, …, -1, 0, 1, …, 32???. If two signed integers are added, the result is a signed integer. If the result is less than -32000 or greater than 32000, then the result cannot be represented as a 16 bit two's complement number. If this happens, the destination register is set to the lower 16 bits of the true result, and the two's complement overflow flag is set in the Condition Code. Furthermore, the overflow flag is set in the req register. If interrupts are enabled and the overflow flag is 1 in the mask register, then an interrupt will occur immediatelhy after the add instruction executes.
sub
Example: sub R1,R2,R3 ; R1 := R2 - R3
This instruction is similar to add; the only difference is that it calculates R2-R3 and places the result in R1. The effect on the condition code is the same as for add.
assembly language | sub Rd,Ra,Rb |
instruction format | RRR |
opcode | 1 |
machine language | $1 d a b |
effect | R[d] := R[a] - R[b] |
The instruction sub Rd,Ra,Rb has operands Ra and Rb and destination Rd. It fetches the operands Ra and Rb, calculates the difference Ra - Rb, and loads the result into the destination Rd. The effect is Rd := Ra - Rb. For example, sub R5,R12,R2 performs R5 := R12 - R3.
The sub instruction is RRR format with opcode=1.
Code Assembly Effect –— -----------–— -------------–— 162c sub R6,R2,R12 ; R6 := R2 - R12 1d13 sub R13,R1,R3 ; R13 := R1 - R3
In addition to setting the destination register, the sub instruction sets several bits in the condition code R15 and may set a bit in the req register.
----–— ----------------–— R15.ccG result > 0 (binary) R15.ccg result > 0 (two's complement) R15.ccE result = 0 R15.ccl result < 0 (two's complement) R15.ccV overflow (binary) R15.CCv overflow (two's complement) R15.CCc carry output R15.ccf logicc instruction function result ----–— ----------------–—
mul
Example: mul R1,R2,R3 ; R1 := R2 * R3
The multiply instruction mul Rd,Ra,Rb calculates the integer (two's complement) product of the operands Ra and Rb, and places the result in the destination register Rd. The mul instruction does not produce the natural (binary) product.
assembly language | mul Rd,Ra,Rb |
instruction format | RRR |
machine language | [$2 d a b] |
effect | reg[d] := reg[a] * reg[b] |
If the magnitude of the product is too large to be representable as a 16 bit two's complement integer, this is an overflow. If overflow occurs, the integer overflow bit is set in the condition code (F15) and the integer overflow bit is also set in the interrupt request register (req), and the lower order 16 bits of the product are loaded into Rd.
----–— ----------------–— R15.ccg result > 0 (two's complement) R15.ccE result = 0 R15.ccl result < 0 (two's complement) R15.CCv overflow (two's complement) R15.CCc carry output ----–— ----------------–—
div
Example: div R1,R2,R3 ; R1 := R2 / R3, R15 := R2 rem R3
Unlike the other arithmetic operations, the divide instruction div Rd,Ra,Rb produces two results: the quotient Ra / Rb and the remainder Ra rem Rb. It loads the quotient into the destination register Rd, and the remainder is loaded into R15.
assembly language | div Rd,Ra,Rb |
instruction format | RRR |
opcode | 3 |
machine language | $3 d a b |
effect | reg[d] := reg[a] / reg[b], |
reg[15] := reg[a] rem reg[b] |
If the destination register Rd is R15, then the quotient is placed in R15, and the remainder is discarded.
The divide instruction doesn't set the condition code, since R15 is used for the remainder. Therefore there is no condition code bit to indicate division by 0. However, it is easy for a program to detect a division by 0.
- (Explicit test for error) The program can compare the divisor with 0 before or after executing the divide instruction, and jump to an error handler if the divisor is 0. This is similar to testing the condition code after an add, sub, or mul instruction, but it does require two instructions: a compare followed by a conditional jump. For example:
div R1,R2,R3 ; R1 := R2/R3, R15 := R2 rem R3 cmp R3,R0 ; Did we divide by 0? jumpeq zeroDivide[R0] ; If yes, handle error
- (Exception) The program can detect division by 0 using an interrupt. To do this, enable interrupts and enable the interrupt mask for division by 0. See the section on Interrupts. This approach does not require a compare or jump instruction for each division.
cmp
The compare instruction cmp Ra,Rb
compares the values in the operand
registers Ra
and Rb
, and then sets flags in the condition code
(R15) to indicate the result. The notation R15.i means bit i in R15;
thus R15.0 is the leftmost bit of R15. The instruction performs both
natural number comparison (binary) and integer comparison (two's
complement). The resulting flags are
- binary less than (L) in R15.0
- two's complement less than (<) in R15.1
- equal in R15.2
- binary greater than (G) in R15.3
- two's complement greater than (>) in R15.4
assembly language | cmp Rd,Ra,Rb |
instruction format | RRR |
opcode | 4 |
machine language | $4 d a b |
effect | reg[d] := reg[a] / reg[b] |
The result of a cmp instruction can be used to control a conditional jump. The jumpc0 instruction jumps is a specified bit of R15 is 0, and the jumpc1 instruction jumps if a specified bit is 1.
Pseudoinstructions provide the most common cases; for example jumple jumps if the condition code indicates that a comparison produced either integer less-than or equal. A common pattern is a cmp followed by a jump pseudoinstruction, for example:
cmp R4,R9 ; compare R4 with R9 jumpgt abc]R0] ; if R4 > R9 then goto abc
addc
The addc instruction performs a binary addition with carry propagation. It adds the two operand registers and the carry bit in the condition code register, R15. The sum is loaded into the destination register Rd and the carry output is written back into the carry bit, overwriting its previous value. Overflow is not possible with this instruction.
muln
muln Rd,Ra,Rb
The muln instruction calculates the product of two natural numbers in Ra and Rb. The result is 32 bits; the leftmost 16 bits (the most significant part) is loaded into R15, and the rightmost 16 bits (the least significant part) is loaded into Rd. If Rd is R15, the most significant part is discarded.
instruction format | RRR |
machine language | [$4 d a b] |
effect | reg[d] := reg[a] / reg[b] |
assembly language | cmp Rd,Ra,Rb |
divn
divn Rd,Ra,Rb
The divn instruction divides two natural numbers: dividend / divisor. All the numbers – numerator, denominator, quotient, remainder – are natural numbers represented in binary.
instruction format | RRR |
machine language | [$4 d a b] |
effect | reg[d] := reg[a] / reg[b] |
assembly language | divn Rd,Ra,Rb |
- The dividend is a 32 bit natural number; its leftmost 16 bits are in R15 and the rightmost 16 bits are in Ra. Thedenominator is in Rb.
- Two results are produced: a 32-bit quotient anda 16-bit remainder.
- The leftmost 16 bits of the quotient are placedin R15 (replacing the leftmost part of the dividend). The rightmost16 bits of the quotient are placed in Rd.
- The remainder is placed in Ra, overwriting the least significant half of the dividend operand
Accessing memory
A memory address is a 16-bit binary number. Instructions don't specify addresses directly; they specify an address with two components: a displacement and an index, written as "displacement[index]". The displacement is a 16 bit constant, and in assembly language it may be given as a decimal integer, a hexadecimal word, or a label. The index is a register. For example,$0c45[R5] has a displacement of 0c45 and an index of R5.
When an instruction executes, the machine takes the displacement and index and calculates the effective address. This is defined to be the binary sum of the displacement and the curent value in the index register. In the example above, if R5 contains 3, then the effective address of $0c45[R5] is $0c48.
If you just want to specify an address a in an instruction, this can be written as "a[R0]". Since R0 contains the constant 0, the effective address is just a.
lea
The load effective address instruction lea Rd,disp[Ra] calculates the effective address of the operand disp[Ra] and places the result in the destination register Rd. The effective address is the binary sum disp+Ra.
assembly language | lea Rd,disp[Ra] |
opcode | f, 0 |
instruction format | RX |
machine language | $f d a $0, disp |
effect | reg[d] := disp+reg[a] |
load
The load instruction load Rd,disp[Rx] calculates the effective address of the operand disp[Rx] and copies the word in memory at the effective address into the destination register Rd. The effective address is the binary sum disp+Rx.
assembly language | load Rd,disp[Ra] |
instruction format | RX |
opcode | f, 1 |
machine language | $f d a $1, disp |
effect | reg[d] := mem[disp+reg[a]] |
Examples
load R12,count[R0] ; R12 := count load R6,arrayX[R2] ; R6 := arrayX[R2] load R3,$2b8e[R5] ; R3 := mem[2b8e+R5]
store
The store instruction store Rd,disp[Rx] calculates the effective address of the operand disp[Rx]. The value of the destination register Rd is stored into memory at the effective address.
assembly language | store Rd,disp[Ra] |
instruction format | RX |
opcode | f, 2 |
machine language | $f d a $2, disp |
effect | mem[disp+reg[a]] := reg[d] |
Store copies the word in the destination register into memory at the effective address. This instruction is unusual in that it treats the "destination register" as the source of data, and the actual destination which is modified is the memory location.
Most instructions take data from the rightmost operands and modify the leftmost destination, just like an assignment statement (x := y+z). However, the store instruction operates in the opposite direction. The reason for this has to do with the circuit design of the processor. Although the "left to right" nature of the store instruction may look inconsistent from the programmer's point of view, it actually is more consistent from the deeper perspective of circuit design.
Examples
store R3,$2b8e[R5] store R12,count[R0] store R6,arrayX[R2]
Stacks
Three instructions (push, pop, top) support operations on a stack represented as an array of contiguous elements, where the stack grows from lower to higher addresses. These instructions provide safe operations: they never overwrite memory outside the stack, and they indicate stack underflow and overflow by setting the condition code and optionaly performing an exception.
A stack is represented by three addresses, which are provided to the push, pop, and top instructions in registers:
- The stack base is the address of the first word allocated for the stack.
- The stack limit is the address of the last word allocated for the stack.
- The stack top is the address of the stack element that was pushed most recently.
Although three addresses are required to characterise the state of a stack, each individual stack instruction (push, pop, top) requires only two of those addresses. These are supplied as the Ra and Rb operands, while Rd is used to supply or receive the data value.
The maximum number of elements the stack may contain is stack limit - stack base + 1. Normally, stack limit is greater than stack base. If they are equal, there is only one word allocated for the stack (which is generally not useful), and if stack base > stack limit then no memory at all is allocated and every stack operation will signal an underflow or overflow error.
If the stack is not empty, then stack top is the address of the top element in the stack. If the stack is empty, then stack top must be stack base - 1.
A stack can be created and initialized by allocating a region of memory, setting stack base to the first word and stack limit to the last word, and setting stack top to stack base - 1.
push
The push instruction pushes an element onto a stack. It is RRR format, and its general form is:
push Rd,Ra,Rb
- Rd = stack data: value to be pushed, unchanged
- Ra = stack top: incremented unless stack was full
- Rb = stack limit: unchanged
- R15 condition code indicates stack overflow
- System interrupt request register indicates stack overflow
This instruction pushes the word in Rd onto a stack with stack top in Rd and stack limit in Rb, provided that the stack is not full. The push stores the data word in Ra into memory and increments stack top Rd. If the stack is full, nothing is stored into memory and a stack overflow error is indicated in the condition code and interrupt request registers; an interrupt will occur if interrupts are enabled and the stack mask bit is set. The operational semantics is:
if Ra < Rb then Ra := Ra + 1; mem[Ra] := Rd else R15.sovfl := 1, req.sovfl := 1
If Rd = Rb this means the stack completely fills the region of memory allocated for the stack, and there is no space to store a new element. In this case, the push instruction does not store Ra, it doesn't modify Rd, it doesn't modify memory outside the block, and it doesn't overwrite data in the stack. Instead, the instruction indicates a stack overflow by setting the sovfl (stack overflow) bit in the condition code (R15), and it also sets the stack fault bit in the interrupt request register. If interrupts are enabled and the stack fault bit is set in the interrupt mask register, then an interrupt will occur after the push instruction completes. There will be no interrupt if interrupts are disabled, or the stack fault bit is not set in the mask register.
pop
The push instruction removes an element onto a stack and returns it. The instruction is RRR format, and its general form is:
pop Rd,Ra,Rb
- Rd = stack data: destination for the popped stack element
- Ra = stack top: decremented unless stack was empty
- Rb = stack base: unchanged
- R15 condition code indicates stack underflow
- System interrupt request register indicates stack underflow
This instruction pops the word from a stack with stack top in Ra and stack base in Rb, provided that the stack is not empty. The pop loads the top element of the stack into Rd and decrements stack top. If the stack is empty, stack top is not decremented, Rd is not modified, and a stack underflow error is indicated in the condition code and interrupt request registers; an interrupt will occur if interrupts are enabled and the stack mask bit is set. The operational semantics is:
if Ra >= Rb then Rd := mem[Ra]; Ra := Ra - 1 else R15.suvfl := 1, req.suvfl := 1
top
The top instruction returns the top element on a stack but does not remove it. The instruction is RRR format, and its general form is:
top Rd,Ra,Rb
- Rd = stack data: destination for the top element of the stack; unchanged if stack is empty
- Ra = stack top: unchanged
- Rb = stack base: unchanged
- R15 condition code indicates stack underflow
- System interrupt request register indicates stack underflow
This instruction loads the element at stack top into Rd, provided that the stack is not empty. If the stack is empty, Rd is not modified and a stack underflow error is indicated in the condition code and interrupt request registers; an interrupt will occur if interrupts are enabled and the stack mask bit is set. The operational semantics is:
if Ra >= Rb then Rd := mem[Ra] else R15.suvfl := 1, req.suvfl := 1
Stack frames
When a program calls a procedure it is usually necessary to save the state of the caller in a data structure called a stack frame. (This is not necessary during a "tail call".) The stack frame is pushed onto the execution stack. When the procedure returns, the contents of the stack frame need to be loaded back into the registers. These operations can be performed by ordinary store instructions (for procedure call) and load instructions (for procedure return). However, this often requires a significant number of instructions.
The save
and restore
instructions transfer a block of data between
registers and memory, making it easier to use stack frames. These
instructions are analogous to store
and load
, but they store or
load multiple words, not just an individual word.
- save stores a sequence of adjacent registers into a block of contiguous memory locations.
- restore is the opposite: it loads the block of memory into the registers.
For both instructions, the sequence of registers is specified by
giving the first and last register. The starting address of the
memory block is specified by an effective address of the form
offset[Reg]
, where Reg is any register (e.g. R4, R13, etc) and
offset
is a number between 0 and 255.
Normally \(d \leq e\), in which case the number of registers to be saved is \(e-d + 1\). If \(d > e\), the instruction will do nothing. The base register \(R_f\) should not lie within the group of registers to be saved or restored.
The first register to be saved is \(R_d\), and the last register to be
saved is \(R_e\). The instruction always stores at least one register.
If \(d = e\), for example save R5,R5,0[R14]
then only R5 is stored.
If \(d < e\) then the register numbers wrap around For example,
There is an important restriction with save
and restore
: the
displacement is limited to a small natural number between 0 and 255.
(Recall that for load and store, the displacement can be as large as
65,535.) The reason for this is that the save and restore
instructions are EXP format and the offset is represented by an 8-bit
field (whereas load and store are RX format and the displacement is a
16-bit word).
The instruction is EXP format, and the offset is limited to 8 bits, because it is specified in the gh field, which is the rightmost 8 bits of the second word of the instruction. The secondary opcode is 9, which is in the ab field of the first word of the instruction.
save
The save instruction performs a sequence of stores, which copy the contents of a sequence of registers into consecutive memory locations.
- \(\mathsf{mem}[R_f+gh] := R_d\)
- \(\mathsf{mem}[R_f+gh+1] := R_{d+1}\)
- \(\mathsf{mem}[R_f+gh+2] := R_{d+2}\)
- \(\ldots\)
- \(\mathsf{mem}[R_f+gh+e-d] := R_e\)
assembly language | save Rd,Re,gh[Rf] |
instruction format | EXP |
opcode | E, 0A |
machine language | E d 0A, e f gh |
effect | \(M[R_f+gh] := R_d\) |
\(M[R_f+gh+1] := R_{d+1}\) | |
\(M[R_f+gh+2] := R_{d+2}\) | |
\(\ldots\) | |
\(M[R_f+gh+e-d] := R_e\) |
For example, the following instruction will save registers R3 through R9 into memory starting at address 6 + R13. Its machine language code is e30b 9d06.
save R3,R9,6[R13]
This instruction is equivalent to a sequence of store instructions:
store R3,6[R13] store R4,7[R13] store R5,8[R13] store R6,9[R13] store R7,10[R13] store R8,11[R13] store R9,12[R13]
As this example shows, using save
and restore
can make procedure
calls more concise and readable as well as less error-prone.
restore
This restore
instruction has the same operands as save
, and it
performs a sequence of loads corresponding to the stores performed by
save.
assembly language | restore Rd,Re,gh[Rf] |
instruction format | EXP |
opcode | E, 0B |
machine language | E d 0B, e f gh |
effect | \(R_d := M[R_f+gh]\) |
$Rd+1 := M[Rf+gh+1] | |
\(R_{d+2} := M[R_f+gh+2]\) | |
\(\ldots\) | |
\(R_e := M[R_f+gh+e-d]\) |
For example, consider this instruction:
restore R3,R10,4[R14]
The effect is equivalent to
load R3,4[R14] load R4,5[R14] load R5,6[R14] load R6,7[R14] load R7,8[R14] load R8,9[R14] load R9,10[R14] load R10,11[R14]
Suppose a stack frame was created by the save
example above.
To restore the registers from the stack frame, use:
restore R3,R9,6[R13] ; e30b 9d06. store R3-R9 starting at 6[R13]
It is equivalent to a sequence of store instructions:
load R3,6[R13] load R4,7[R13] load R5,8[R13] load R6,9[R13] load R7,10[R13] load R8,11[R13] load R9,12[R13]
restore R3,R5,6[R13] ; e30c 5d06. load R3-R5 starting at 6[R13]
This is equivalent to a sequence of load instructions:
load R3,6[R13] load R4,7[R13] load R5,8[R13] load R6,9[R13]
The restore instruction copies a sequence of consecutive memory locations starting from the effecive address into a sequence of adjacent registers. The index register (R14 in this example) is not changed. Restore is equivalent to a fixed sequence of load instructions; its purpose of restore is to restore the state of registers from memory after a procedure call or a context switch.
The instruction restore Re,Rf,gh[Rd] copies the contents of memory at consecutive locations beginning with mem[gh+Rf] into registers Re, Re+1, …, Rf.
Jumps
A jump is a transfer control to another address, rather than to the following instruction. The destination address is specified as the effective address in the jump instruction. A jump can be used to implement a goto statement.
jump
instruction format | RX |
machine language | {$f d a $3} {disp} |
effect | pc := disp+R[a] |
assembly language | jump disp[Ra] |
jumpc0, jumpc1
These instructions jump to a destination if a specified bit in the condition code is 0 (for jumpc0) or 1 (for jumpc1).
The first operand of jumpc0 is the index of a bit in R15. Don't specify R15; the condition bit is always in R15. The index of the condition bit must be specified in the d field of the instruction. Since this is an index, not a register, it's written as a number.
- jumpc0 3,loop[R0] ; Right
- jumpc0 R3,loop[R0] ; Wrong - not using R3!
lea R1,5[R0] lea R2,6[R0] cmp R1,R2 jumpc0 0,yes[R0] ; jump if GT so don't jump jumpc0 4,no[R0] ; jump if LT so do jump
jumpc0 goes to the destination if the specified condition bit is 0.
assembly language | jumpc0 d,disp[Ra] |
instruction format | RX |
opcode | f,4 |
machine language | $f d a $4 |
disp | |
effect | if R[15].d=0 |
then pc := disp+R[a] |
jumpc1 is similar: it jumps if the specified bit is 1.
assembly language | jumpc1 d,disp[Ra] |
instruction format | RX |
opcode | f,5 |
machine language | $f d a $5 |
disp | |
effect | if R15.d=1 |
effect | then R15.d=1 |