ARM assembler in Raspberry Pi – Chapter 7
ARM architecture has been for long targeted at embedded systems. Embedded systems usually end being used in massively manufactured products (dishwashers, mobile phones, TV sets, etc). In this context margins are very tight so a designer will always try to spare as much components as possible (a cent saved in hundreds of thousands or even millions of appliances may pay off). One relatively expensive component is memory although every day memory is less and less expensive. Anyway, in constrained memory environments being able to save memory is good and ARM instruction set was designed with this goal in mind. It will take us several chapters to learn all of these techniques, today we will start with one feature usually named shifted operand.
Indexing modes
We have seen that, except for load (ldr
), store (str
) and branches (b
and bXX
), ARM instructions take as operands either registers or immediate values. We have also seen that the first operand is usually the destination register (being str
a notable exception as there it plays the role of source because the destination is now the memory). Instruction mov
has another operand, a register or an immediate value. Arithmetic instructions like add
and and
(and many others) have two more source registers, the first of which is always a register and the second can be a register or an immediate value.
These sets of allowed operands in instructions are collectively called indexing modes. Today this concept will look a bit off since we will not index anything. The name indexing makes sense in memory operands but ARM instructions, except load and store, do not have memory operands. This is the nomenclature you will find in ARM documentation so it seems sensible to use theirs.
We can summarize the syntax of most of the ARM instructions in the following pattern
instruction Rdest, Rsource1, source2
There are some exceptions, mainly move (mov
), branches, load and stores. In fact move is not so different actually.
mov Rdest, source2
Both Rdest
and Rsource1
must be registers. In the next section we will talk about source2
.
We will discuss the indexing modes of load and store instructions in a future chapter. Branches, on the other hand, are surprisingly simple and their single operand is just a label of our program, so there is little to discuss on indexing modes for branches.
Shifted operand
What is this mysterious source2
in the instruction patterns above? If you recall the previous chapters we have used registers or immediate values. So at least that source2
is this: register or immediate value. You can use an immediate or a register where a source2
is expected. Some examples follow, but we have already used them in the examples of previous chapters.
mov r0, #1 mov r1, r0 add r2, r1, r0 add r2, r3, #4
But source2
can be much more than just a simple register or an immediate. In fact, when it is a register we can combine it with a shift operation. We already saw one of these shift operations in chapter 6. Not it is time to unveil all of them.
LSL #n
Logical Shift Left. Shifts bitsn
times left. Then
leftmost bits are lost and then
rightmost are set to zero.LSL Rsource3
Like the previous one but instead of an immediate the lower byte of a register specifies the amount of shifting.LSR #n
Logical Shift Right. Shifts bitsn
times right. Then
rightmost bits are lost and then
leftmost bits are set to zero,LSR Rsource3
Like the previous one but instead of an immediate the lower byte of a register specifies the amount of shifting.ASR #n
Arithmetic Shift Right. Like LSR but the leftmost bit before shifting is used instead of zero in then
leftmost ones.ASR Rsource3
Like the previous one but using a the lower byte of a register instead of an immediate.ROR #n
Rotate Right. Like LSR but then
rightmost bits are not lost but pushed onto then
leftmost bitsROR Rsource3
Like the previous one but using a the lower byte of a register instead of an immediate.
In the listing above, n
is an immediate from 1 to 31. These extra operations may be applied to the value in the second source register (to the value, not to the register itself) so we can perform some more operations in a single instruction. For instance, ARM does not have any shift right or left instruction. You just use the mov
instruction.
mov r1, r2, LSL #1 |
You may be wondering why one would want to shift left or right the value of a register. If you recall chapter 6 we saw that shifting left (LSL
) a value gives a value that the same as multiplying it by 2. Conversely, shifting it right (ASR
if we use two’s complement, LSR
otherwise) is the same as dividing by 2. Since a shift of n
is the same as doing n
shifts of 1, shifts actually multiply or divide a value by 2n.
mov r1, r2, LSL #1 /* r1 ← (r2*2) */ mov r1, r2, LSL #2 /* r1 ← (r2*4) */ mov r1, r3, ASR #3 /* r1 ← (r3/8) */ mov r3, #4 mov r1, r2, LSL r3 /* r1 ← (r2*16) */ |
We can combine it with add
to get some useful cases.
add r1, r2, r2, LSL #1 /* r1 ← r2 + (r2*2) equivalent to r1 ← r2*3 */ add r1, r2, r2, LSL #2 /* r1 ← r2 + (r2*4) equivalent to r1 ← r2*5 */ |
You can do something similar with sub
.
sub r1, r2, r2, LSL #3 /* r1 ← r2 - (r2*8) equivalent to r1 ← r2*(-7) |
ARM comes with a handy rsb
(Reverse Substract) instruction which computes Rdest ← source2 - Rsource1
(compare it to sub
which computes Rdest ← Rsource1 - source2
).
rsb r1, r2, r2, LSL #3 /* r1 ← (r2*8) - r2 equivalent to r1 ← r2*7 */ |
Another example, a bit more contrived.
/* Complicated way to multiply the initial value of r1 by 42 = 7*3*2 */ rsb r1, r1, r1, LSL #3 /* r1 ← (r1*8) - r1 equivalent to r1 ← 7*r1 */ add r1, r1, r1, LSL #1 /* r1 ← r1 + (2*r1) equivalent to r1 ← 3*r1 */ add r1, r1, r1 /* r1 ← r1 + r1 equivalent to r1 ← 2*r1 */ |
You are probably wondering why would we want to use shifts to perform multiplications. Well, the generic multiplication instruction always work but it is usually much harder to compute by our ARM processor so it may take more time. There are times where there is no other option but for many small constant values a single instruction may be more efficient.
Rotations are less useful than shifts in everyday use. They are usually used in cryptography, to reorder bits and “scramble” them. ARM does not provide a way to rotate left but we can do a n
rotate left doing a 32-n
rotate right.
/* Assume r1 is 0x12345678 */ mov r1, r1, ROR #1 /* r1 ← r1 ror 1. This is r1 ← 0x91a2b3c */ mov r1, r1, ROR #31 /* r1 ← r1 ror 31. This is r1 ← 0x12345678 */ |
That’s all for today.
ARM assembler in Raspberry Pi – Chapter 6 ARM assembler in Raspberry Pi – Chapter 8
When you add two numbers by hand, you start by vertically aligning their digits. Then you add vertically each column starting from the right and moving to the left, taking care of the carry values you may need during the addition. Well, the circuitry of a processor does something similar. But a processor is able to advance the different carries found along the computation in a way that adding two numbers of N bits does not take a “k*N” amount of time but “k*log(N)”. Compare how you calculate a sum: when you add two numbers of 10 digits you need twice more time than when you add two numbers of 5 digits. The processor would just need one extra step to add 10 digits compared to adding 5 digits. Cool, isn’t it?
Now consider multiplication. When you multiply two numbers, you start as well by vertically aligning them. But now, you pick each digit of one of the rows (I use the lower one) and then you multiply it with every number of the other row (in my case it would be the upper one). This gives you an intermediate row. Then you repeat this process with the digit at the left of the previous one, giving you a new intermediate row that you put below the previous intermediate one but shifted left a position. Once you have done this with all the digits of the first row you use (the lower one in my case), then you add all the intermediate rows. As you can see, there are lots of steps involved here. When you multiply binary numbers a similar thing happens, though multiplying binary numbers is easier because it is just multiplying by 0 or 1, but you still have to add the intermediate rows. So the whole process is much more complicated. So the processor needs more time to calculate it.
When multiplying a number by a power of 2 (2, 4, 8, 16, …) shifting left is as simple as moving 1 bit to the left. All this is done by just routing the bits inside the processor: no complex computation is needed at all. So avoiding multiplications is usually a Good Thing™
Of course, you added some extra details I find really interesting. I hope this is useful for more people as well.
Looks like shift instructions are available.
Reference : http://www.cl.cam.ac.uk/projects/raspberrypi/tutorials/os/armv6.html
no,
lsl
is not an ARMv6 instruction but a mnemonic implemented by the assembler. It looks like an instruction but it is just another instruction in disguise.You can see that by doing the following:
test.s
file that contains.text
main:
lsl r0, #3
mov r0, r0, LSL #3
gcc -c test.s
, this will create a filetest.o
objdump -d test.o
, you will see the following two instructions00000000 <main>:
0: e1a00180 lsl r0, r0, #3
4: e1a00180 lsl r0, r0, #3
note that they are the same instruction.
objdump
recognizes the pattern and shows the mnemonic instead.Kind regards,
And I almost forgot, thanks for writing up this great series. Thoroughly enjoying this series and began programming on my old samsumg galaxy 3 armv6 android phone running debian gnu/linux
Some Typos in chapter 7:
Both RDest and Rsource must be registers.
Both RDest and Rsource1 must be registers.
What is this misterious
What is this mysterious
not lost bot pushed
not lost but pushed
mov r3, 4
mov r3, #4
equivalent to r1 <- r1*3
equivalent to r1 <- r2*3
equivalent to r1 <- r1*5
equivalent to r1 <- r2*5
Greetings,
Gerry
thanks a lot! I updated the text with the fixes.
Kind regards,