ARM assembler in Raspberry Pi – Chapter 5
Branching
Until now our small assembler programs execute one instruction after the other. If our ARM processor were only able to run this way it would be of limited use. It could not react to existing conditions which may require different sequences of instructions. This is the purpose of the branch instructions.
A special register
In chapter 2 we learnt that our Raspberry Pi ARM processor has 16 integer general purpose registers and we also said that some of them play special roles in our program. I deliberately ignored which registers were special as it was not relevant at that time.
But now it is relevant, at least for register r15. This register is very special, so special that it also has another name: pc. You are unlikely to see it written as r15, since that is confusing (although correct from the point of view of the ARM architecture). From now on we will only use pc to name it.
What does pc stand for? pc means program counter. This name, the origins of which are in the dawn of computing, means little to nothing nowadays. In general the pc register (also called ip, instruction pointer, in other architectures like 386 or x86_64) contains the address of the next instruction to be executed.
When the ARM processor executes an instruction, two things may happen at the end of its execution. If the instruction does not modify pc (and most instructions do not), pc is just incremented by 4 (conceptually, pc ← pc + 4). Why 4? Because in ARM, instructions are 32 bits wide, so there are 4 bytes between consecutive instructions. If the instruction modifies pc then the new value for pc is used.
Once the processor has fully executed an instruction, it uses the value in the pc as the address of the next instruction to execute. This way, an instruction that does not modify the pc will be followed by the next contiguous instruction in memory (since the pc has been automatically increased by 4). This is called implicit sequencing of instructions: after one has run, usually the next one in memory runs. But if an instruction does modify the pc, for instance to a value other than pc + 4, then a different instruction of the program will run next. This process of changing the value of pc is called branching. In ARM this is done using branch instructions.
Unconditional branches
You can tell the processor to branch unconditionally by using the instruction b (for branch) and a label. Consider the following program.
/* -- branch01.s */
.text
.global main
main:
    mov r0, #2 /* r0 ← 2 */
    b end      /* branch to 'end' */
    mov r0, #3 /* r0 ← 3 */
end:
    bx lr
If you execute this program you will see that it returns an error code of 2.
$ ./branch01 ; echo $?
2
What happened is that the instruction b end branched (modifying the pc) to the instruction at the label end, which is bx lr, the instruction we run at the end of our program. This way the instruction mov r0, #3 has not actually been run at all (the processor never reached that instruction).
At this point the unconditional branch instruction b may look a bit useless. It is not the case. In fact this instruction is essential in some contexts, in particular when linked with conditional branching. But before we can talk about conditional branching we need to talk about conditions.
Conditional branches
If our processor were only able to branch just because, it would not be very useful. It is much more useful to branch when some condition is met. So a processor should be able to evaluate some sort of conditions.
Before continuing, we need to unveil another register called cpsr (for Current Program Status Register). This register is a bit special and directly modifying it is out of the scope of this chapter. That said, it keeps some values that can be read and updated when executing an instruction. The values of that register include four condition code flags called N (negative), Z (zero), C (carry) and V (overflow). These four condition code flags are usually read by branch instructions. Arithmetic instructions and special testing and comparison instructions can update these condition codes too if requested.
The semantics of these four condition codes in instructions updating the cpsr are roughly the following:
- N will be enabled if the result of the instruction is a negative number. Disabled otherwise.
- Z will be enabled if the result of the instruction is zero. Disabled if nonzero.
- C will be enabled if the result of the instruction requires a 33rd bit to be fully represented, for instance an addition that overflows the 32-bit range of unsigned integers. There is a special case for C and subtractions: a subtraction that does not borrow enables it, and one that borrows disables it. Subtracting a smaller number from a larger one (as unsigned values) enables C, but it will be disabled if the subtraction is done the other way round.
- V will be enabled if the result of the instruction cannot be represented in 32-bit two’s complement.
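As a small illustration of an arithmetic instruction updating the flags "if requested", here is a minimal sketch. It assumes the usual ARM convention, not introduced in the text above, that appending s to the mnemonic (adds instead of add) requests the update. The file name is made up for this example.

```asm
/* -- flags01.s (hypothetical file name) */
.text
.global main
main:
    mvn r1, #0        /* r1 ← 0xFFFFFFFF (all bits set, -1 in two's complement) */
    adds r0, r1, #1   /* r0 ← r1 + 1 = 0, and the 's' suffix updates the flags:
                         Z = 1 (result is zero)
                         C = 1 (the true sum 0x100000000 needs a 33rd bit)
                         N = 0 (result is not negative)
                         V = 0 (as signed numbers, -1 + 1 = 0 fits in 32 bits) */
    bx lr             /* r0 is 0, so the error code is 0 */
```

Note how C and V differ here: seen as unsigned numbers the addition does not fit in 32 bits, but seen as two’s complement numbers it does.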
So we have all the needed pieces to perform branches conditionally. But first, let’s start comparing two values. We use the instruction cmp for this purpose.
cmp r1, r2 /* updates cpsr doing "r1 - r2", but r1 and r2 are not modified */
This instruction subtracts the value in the second register from the value in the first register. What could happen in the snippet above? Some examples:
- If r2 had a value (strictly) greater than r1 then N would be enabled because r1-r2 would yield a negative result (as long as the subtraction does not overflow).
- If r1 and r2 had the same value, then Z would be enabled because r1-r2 would be zero.
- If r1 was 1 and r2 was 0 then r1-r2 would not borrow, so in this case C would be enabled. If the values were swapped (r1 was 0 and r2 was 1) then C would be disabled because the subtraction does borrow.
- If r1 was 2147483647 (the largest positive integer in 32-bit two’s complement) and r2 was -1 then r1-r2 would be 2147483648, but such a number cannot be represented in 32-bit two’s complement, so V would be enabled to signal this.
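The four cases above can be written down as a short program. This is only a sketch: the file name is made up, and the flag values in the comments are what the rules described above predict. Nothing in the program inspects the flags yet, so it simply returns 0.

```asm
/* -- cmpflags01.s (hypothetical file name) */
.text
.global main
main:
    mov r1, #1
    mov r2, #2
    cmp r1, r2            /* 1 - 2 = -1, borrows:    N=1 Z=0 C=0 V=0 */

    mov r1, #2
    cmp r1, r2            /* 2 - 2 = 0, no borrow:   N=0 Z=1 C=1 V=0 */

    mov r1, #1
    mov r2, #0
    cmp r1, r2            /* 1 - 0 = 1, no borrow:   N=0 Z=0 C=1 V=0 */

    mvn r1, #0x80000000   /* r1 ← 0x7FFFFFFF = 2147483647 */
    mvn r2, #0            /* r2 ← 0xFFFFFFFF = -1 */
    cmp r1, r2            /* 2147483647 - (-1) does not fit in 32-bit
                             two's complement:       N=1 Z=0 C=0 V=1 */

    mov r0, #0            /* r0 ← 0 */
    bx lr
```

In the last case N is enabled even though the mathematical result is positive. This is why the signed conditions check N together with V rather than N alone.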
How can we use these flags to represent useful conditions for our programs?
- EQ (equal). When Z is enabled (Z is 1)
- NE (not equal). When Z is disabled (Z is 0)
- GE (greater or equal than, in two’s complement). When V and N are both enabled or both disabled (V is N)
- LT (lower than, in two’s complement). This is the opposite of GE, so when V and N are not both enabled or both disabled (V is not N)
- GT (greater than, in two’s complement). When Z is disabled and N and V are both enabled or both disabled (Z is 0 and N is V)
- LE (lower or equal than, in two’s complement). This is the opposite of GT, so when Z is enabled or N and V differ (Z is 1 or N is not V)
- MI (minus/negative). When N is enabled (N is 1)
- PL (plus/positive or zero). When N is disabled (N is 0)
- VS (overflow set). When V is enabled (V is 1)
- VC (overflow clear). When V is disabled (V is 0)
- HI (higher). When C is enabled and Z is disabled (C is 1 and Z is 0)
- LS (lower or same). When C is disabled or Z is enabled (C is 0 or Z is 1)
- CS/HS (carry set/higher or same). When C is enabled (C is 1)
- CC/LO (carry clear/lower). When C is disabled (C is 0)
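The two’s complement conditions (GE, LT, GT, LE) and the unsigned ones (HI, LS, HS, LO) can give different answers for the same pair of registers. The following sketch, with a made-up file name and labels, compares 0xFFFFFFFF with 1 and attaches two of the conditions to the branch instruction b. It uses bge and bls as the negations of LT and HI respectively.

```asm
/* -- signed_unsigned01.s (hypothetical file name) */
.text
.global main
main:
    mvn r1, #0          /* r1 ← 0xFFFFFFFF: -1 if signed, 4294967295 if unsigned */
    mov r2, #1          /* r2 ← 1 */
    mov r0, #0          /* r0 ← 0 */
    cmp r1, r2          /* r1 - r2 = 0xFFFFFFFE: N=1 Z=0 C=1 V=0 */
    bge not_lower       /* GE needs N = V; here N is not V, so no branch */
    add r0, r0, #1      /* reached because r1 < r2 as signed numbers */
not_lower:
    /* add without an 's' suffix does not touch the flags,
       so they still describe the cmp above */
    bls not_higher      /* LS needs C = 0 or Z = 1; neither holds, so no branch */
    add r0, r0, #2      /* reached because r1 > r2 as unsigned numbers */
not_higher:
    bx lr               /* r0 = 1 + 2 = 3 */
```

Running it and then echo $? should print 3: the same cmp says "lower" for the signed condition and "higher" for the unsigned one.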
These conditions can be combined with our b instruction to generate new instructions. This way, beq will branch only if Z is 1. If the condition of a conditional branch is not met, then the branch is ignored and the next instruction will be run. It is the programmer’s task to make sure that the condition codes are properly set prior to a conditional branch.
/* -- compare01.s */
.text
.global main
main:
    mov r1, #2       /* r1 ← 2 */
    mov r2, #2       /* r2 ← 2 */
    cmp r1, r2       /* update cpsr condition codes with the value of r1-r2 */
    beq case_equal   /* branch to case_equal only if Z = 1 */
case_different:
    mov r0, #2       /* r0 ← 2 */
    b end            /* branch to end */
case_equal:
    mov r0, #1       /* r0 ← 1 */
end:
    bx lr
If you run this program it will return an error code of 1 because both r1 and r2 have the same value. Now change mov r1, #2 in line 5 to be mov r1, #3 and the returned error code should be 2. Note that at the end of case_different we do not want to run the case_equal instructions, thus we have to branch to end (otherwise the error code would always be 1).
That’s all for today.
I thought that you can manipulate the cpsr directly; aren’t the MSR and MRS instructions meant for that? (Source: Arm v6 reference manual: http://www.scss.tcd.ie/~waldroj/3d1/arm_arm.pdf)
The text says that direct manipulation of cpsr is not possible.
Thanks for the great series, I have got so much out of it. I started by studying the arm instructions at http://www.davespace.co.uk/arm/introduction-to-arm/index.html and after that your article series adds to that information nicely by showing how to use the commands in real life
Keep up the good work
thanks for the comment!
You are right. I’ll reword the text, just to make clear that while it can be modified it is out of the scope of this chapter.
Kind regards,
main:
mov r0, #1
add pc, pc, #4
mov r0, #2
mov r0, #3
end:
bx lr
This gives me 1 (it jumped to end directly), while I was expecting that r0 ← 3 would get executed (I shifted the normal behavior of pc by the length of one instruction only, not two).
Another hypothesis I had was that since it’s modifying pc, it would not increment it, but I guess only the branching instructions have that behavior…
Any insights?
you are experiencing what, in my opinion, is a «quirk» in the ARM architecture (that is, the contract between the CPU designer and the software developer on how the CPU behaves).
Ideally one would expect, when reading the pc register in an instruction, to have the address of the current instruction.
Imagine that the instruction add pc, pc, #4 is at address 0x1000. You would expect that, at the end of the instruction, pc would be 0x1004. As usual in ARM, since pc got modified in the instruction, you would not add 4 bytes to it (as in implicit sequencing) but directly jump to 0x1004. So the next instruction run would be the one at the address 0x1004.
Well, this is where the ARM quirk comes into play. When you read the pc register in an instruction its value is the address of the current instruction plus 8 bytes.
For instance, the following code,
mov r1, #0
current: mov pc, pc
plus4: add r1, r1, #1
plus8: add r1, r1, #2
plus12: add r1, r1, #3
end:
Here r1 will at end have the value 5 (2+3) instead of 6 (1+2+3). Why? Because in the instruction mov pc, pc, pc did not have the address current, it was current + 8, which in the example is plus8. Since the instruction does modify pc, the ARM processor does not do pc ← pc + 4 before starting the next instruction but just keeps the pc as is. So by simply updating the pc to itself we were able to skip 1 instruction.
This is what is happening to you: in the add pc, pc, #4 instruction you are reading a pc that is the address of the instruction mov r0, #3. If you add 4 more bytes to it, it is the address of the bx lr.
This quirk may be a bit annoying, just remember that when you directly read pc it will always be the address of the current instruction plus 8.
Is this a problem most of the time? No, if you use labels in your branches, the assembler internally fixes everything for you.
I cannot explain the reason for this behaviour in ARM. I think this issue has historical roots in the earlier ARM designs where it probably happened that the pc was read at a point in the processor state where it had already been implicitly advanced by 8 bytes. This seems to be a very ARM specific thing (a similar sequence of code like the one above in other architectures would set r1 to 6).
I hope this answers your question.
My guess is that in earlier (and simpler) iterations of the ARM architecture, when you read the pc in the ALU stage you were reading the pc of the instruction-after-the-next-one (the value that the physical register had at that stage).
Probably ARM had to preserve this architectural behaviour in later versions of the architecture, so the quirk remained. Note, though, that there is no technical reason that prevents the ALU stage from having the address of the current instruction when reading the operands.
For BGE, the branch condition is when N=V, not N=Z.
I looked it up in the ARM documentation when I considered that N=Z should be impossible. Zero has a sign bit of zero, and any calculation that results in zero should not have the N bit set.
you’re right. I made a typo in GE and then I propagated it to LT.
I fixed the post. Thanks a lot.
Kind regards,
you are right. I’ve already fixed the post.
Thank you very much.
I noticed this because “bneq end_of_loop” threw an error.
I am glad to make a small contribution to your great tutorial, which I am going through with excitement and pleasure while learning asm.
I also hope the subject will stay understandable for me to the last chapter. May I contact you if some questions remain open after that?
Regards.
yes of course. Feel free to ask in the comments section of each chapter where you have questions.
I also hope next chapters are easy to understand as well
Kind regards,
Oops! I scattered this typo all over the blog.
Thank you for the heads-up.
Kind regards,
$ ./compare01 ; echo $?
Should be ./branch01 ; echo $?
thank you. I already fixed the post.
Kind regards,