ARM assembler in Raspberry Pi – Chapter 22
Several times in previous chapters we have talked about ARM as an architecture with several features aimed at embedded systems. In embedded systems memory is scarce and expensive, so designs that help reduce the memory footprint are very welcome. Today we will see another of these features: the Thumb instruction set.
The Thumb instruction set
In previous installments we have been working with the ARMv6 instruction set (the one implemented in the Raspberry Pi). In this instruction set, all instructions are 32 bits wide, so every instruction takes 4 bytes. This has been a common design since the arrival of RISC processors. That said, in some scenarios such an encoding is overkill in terms of memory consumption: many platforms are very simple and rarely need all the features provided by the instruction set. If only they could use a subset of the original instruction set that can be encoded in a smaller number of bits!
So, this is what the Thumb instruction set is all about. It is a re-encoded subset of the ARM instructions that takes only 16 bits per instruction. This means that we will have to give up some instructions. As a benefit our code density is higher: most of the time we will be able to encode our programs in about half the space.
Support of Thumb in Raspbian
While the processor of the Raspberry Pi properly supports Thumb, there is still some software support that unfortunately is not provided by Raspbian. This means that we will be able to write some snippets in Thumb but in general this is not supported (if you try to use Thumb for a full C program you will end up with a "sorry, unimplemented" message from the compiler).
Instructions
Thumb provides about 45 instructions (of about 115 in ARMv6). The narrower 16-bit encoding means that we will be more limited in what we can do in our code. Registers are split into two sets: low registers, r0 to r7, and high registers, r8 to r15. Most instructions can only fully work with low registers and some others have limited behaviour when working with high registers.
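To make the low/high split a bit more concrete, here is a minimal sketch that is not part of the original programs (the file name and the function name bump_r12 are made up for illustration). It relies on the Thumb rule that mov can copy between low and high registers, while an add with an immediate needs a low register, so the value takes a detour through r0. We use r12 because AAPCS allows a function to clobber it, and the .code 16 directive, explained later in this chapter, tells the assembler that the code is Thumb.

```asm
/* high-regs-sketch.s (hypothetical file name) */
.text
.code 16     /* Thumb code follows */
.align 2     /* Align to 2^2 = 4 bytes */

bump_r12:                /* hypothetical helper: r12 ← r12 + 1, clobbers r0 */
    mov r0, r12          /* r0 ← r12. Copying high → low is allowed */
    add r0, r0, #1       /* r0 ← r0 + 1. The arithmetic happens in a low register */
    mov r12, r0          /* r12 ← r0. Copying low → high is allowed */
    bx lr                /* return */
```

The extra two mov instructions are the price of keeping a value in a high register, which is why Thumb code tends to stay within r0 to r7.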
Also, Thumb instructions cannot be predicated. Recall that almost every ARM instruction can be made conditional depending on the flags in the cpsr register. This is not the case in Thumb, where only the branch instruction is conditional.
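As a small illustration of the difference (a sketch that is not part of the original programs; the file name and the names arm_max and thumb_max are made up), here is the same computation, r0 ← max(r0, r1), written for both instruction sets. The ARM version uses a predicated movlt, while the Thumb version has to branch around a plain mov. The .code 32 and .code 16 directives that select each instruction set are explained below.

```asm
/* predication-sketch.s (hypothetical file name) */
.text

.code 32     /* ARM code follows */
.align 2     /* Align to 2^2 = 4 bytes */
arm_max:                     /* r0 ← max(r0, r1), ARM version */
    cmp r0, r1               /* compute r0 - r1 and update the cpsr */
    movlt r0, r1             /* if r0 < r1 then r0 ← r1 (predicated mov) */
    bx lr                    /* return */

.code 16     /* Thumb code follows */
.align 2
thumb_max:                   /* r0 ← max(r0, r1), Thumb version */
    cmp r0, r1               /* compute r0 - r1 and update the cpsr */
    bge thumb_max_done       /* if r0 >= r1 skip the mov: only branches are conditional */
    mov r0, r1               /* r0 ← r1 */
thumb_max_done:
    bx lr                    /* return */
```

The Thumb version needs one more instruction, but each of its instructions takes half the space of an ARM one.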
Mixing ARM and Thumb is only possible at function level: a function must be wholly ARM or wholly Thumb, it cannot be a mix of the two instruction sets. Recall that our Raspbian system does not support Thumb so at some point we will have to jump from ARM code to Thumb code. This is done using the instruction blx (available in both instruction sets). This instruction behaves like the bl instruction we use for function calls but it also changes the state of the processor from ARM to Thumb (or Thumb to ARM).
We also have to tell the assembler that some portion of the assembly code is actually Thumb while the rest is ARM. Since by default the assembler expects ARM, we will have to change to Thumb at some point.
From ARM to Thumb
Let’s start with a very simple program returning an error code of 2 set in Thumb.
    /* thumb-first.s */
    .text

    .code 16     /* Here we say we will use Thumb */
    .align 2     /* Align to 2^2 = 4 bytes, which also satisfies the 2-byte alignment Thumb needs */

    thumb_function:
        mov r0, #2   /* r0 ← 2 */
        bx lr        /* return */

    .code 32     /* Here we say we will use ARM */
    .align 4     /* Align to 2^4 = 16 bytes, which also satisfies the 4-byte alignment ARM needs */

    .globl main
    main:
        push {r4, lr}
        blx thumb_function   /* From ARM to Thumb we use blx */
        pop {r4, lr}
        bx lr
Thumb instructions in our thumb_function actually resemble ARM instructions. In fact most of the time there will not be much difference. As stated above, Thumb instructions are more limited in features than their ARM counterparts.
If we run the program, it does what we expect.
    $ ./thumb-first; echo $?
    2

How can we tell our program actually mixes ARM and Thumb? We can use objdump -d to dump the instructions of our thumb-first.o file.
    $ objdump -d thumb-first.o

    thumb-first.o:     file format elf32-littlearm

    Disassembly of section .text:

    00000000 <thumb_function>:
       0:   2002        movs    r0, #2
       2:   4770        bx      lr
       4:   e1a00000    nop                     ; (mov r0, r0)
       8:   e1a00000    nop                     ; (mov r0, r0)
       c:   e1a00000    nop                     ; (mov r0, r0)

    00000010 <main>:
      10:   e92d4010    push    {r4, lr}
      14:   fafffff9    blx     0 <thumb_function>
      18:   e8bd4010    pop     {r4, lr}
      1c:   e12fff1e    bx      lr
Check thumb_function: each of its two instructions is encoded in just two bytes (the instruction bx lr is at offset 2 from mov r0, #2). Compare this to the instructions in main: each one is 4 bytes after its predecessor. Note that the assembler added some padding at the end of thumb_function in the form of nops, so that main starts at address 0x10 (these nops should never be executed anyway).
Calling functions in Thumb
In Thumb we want to follow the AAPCS convention like we do in ARM mode, but then some oddities arise. Consider the following snippet where thumb_function_1 calls thumb_function_2.
    .code 16     /* Here we say we will use Thumb */
    .align 2     /* Align to 2^2 = 4 bytes, which also satisfies the 2-byte alignment Thumb needs */

    thumb_function_2:
        /* Do something here */
        bx lr

    thumb_function_1:
        push {r4, lr}
        bl thumb_function_2
        pop {r4, lr}   /* ERROR: cannot use lr in pop in Thumb mode */
        bx lr
Unfortunately, this will be rejected by the assembler. If you recall from chapter 10, in ARM push and pop are mnemonics for stmdb sp! and ldmia sp!, respectively. But in Thumb mode push and pop are instructions in their own right and so they are more limited: push can only use low registers and lr, pop can only use low registers and pc. The behaviour of these two instructions is almost the same as that of the ARM mnemonics. So, you are now probably wondering why there are these two special cases for lr and pc. This is the trick: in Thumb mode pop {pc} is equivalent to popping a value val from the stack and then doing bx val. So the two-instruction sequence pop {r4, lr} followed by bx lr becomes simply pop {r4, pc}.
So, our code will look like this.
    /* thumb-call.s */
    .text

    .code 16     /* Here we say we will use Thumb */
    .align 2     /* Align to 2^2 = 4 bytes, which also satisfies the 2-byte alignment Thumb needs */

    thumb_function_2:
        mov r0, #2
        bx lr   /* A leaf Thumb function (i.e. a function that does not call
                   any other function, so it did not have to keep lr in the
                   stack) returns using "bx lr" */

    thumb_function_1:
        push {r4, lr}
        bl thumb_function_2   /* From Thumb to Thumb we use bl */
        pop {r4, pc}          /* This is how we return from a non-leaf Thumb function */

    .code 32     /* Here we say we will use ARM */
    .align 4     /* Align to 2^4 = 16 bytes, which also satisfies the 4-byte alignment ARM needs */

    .globl main
    main:
        push {r4, lr}
        blx thumb_function_1   /* From ARM to Thumb we use blx */
        pop {r4, lr}
        bx lr
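For comparison with the ARM side, the following sketch (not part of the original programs; the file name and both function names are made up) spells out the reminder from chapter 10: in ARM mode push and pop are just other names for stmdb sp! and ldmia sp!. Both functions should assemble to the same words, e92d4010 and e8bd4010, which are the encodings of push {r4, lr} and pop {r4, lr} that objdump showed for main earlier.

```asm
/* arm-push-pop-sketch.s (hypothetical file name) */
.text
.code 32     /* ARM code follows */
.align 2     /* Align to 2^2 = 4 bytes */

arm_with_mnemonics:          /* hypothetical name */
    push {r4, lr}            /* keep r4 and lr in the stack */
    pop {r4, lr}             /* restore r4 and lr */
    bx lr                    /* return */

arm_with_stm_ldm:            /* hypothetical name */
    stmdb sp!, {r4, lr}      /* same instruction as push {r4, lr} */
    ldmia sp!, {r4, lr}      /* same instruction as pop {r4, lr} */
    bx lr                    /* return */
```

In Thumb there is no such freedom: push and pop have their own 16-bit encodings, which is where the restrictions on lr and pc come from.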
From Thumb to ARM
Finally we may want to call an ARM function from Thumb. As long as we stick to AAPCS everything should work correctly. The Thumb instruction to call an ARM function is again blx. Following is an example of a small program that says "Hello world" four times by calling printf, a function in the C library that in Raspbian is of course implemented using ARM instructions.
    /* thumb-first.s */
    .data
    message: .asciz "Hello world %d\n"

    .text        /* The code goes in the text section */

    .code 16     /* Here we say we will use Thumb */
    .align 2     /* Align to 2^2 = 4 bytes, which also satisfies the 2-byte alignment Thumb needs */
    thumb_function:
        push {r4, lr}              /* keep r4 and lr in the stack */
        mov r4, #0                 /* r4 ← 0 */
        b check_loop               /* unconditional branch to check_loop */
    loop:
        /* prepare the call to printf */
        ldr r0, addr_of_message    /* r0 ← &message */
        mov r1, r4                 /* r1 ← r4 */
        blx printf                 /* From Thumb to ARM we use blx.
                                      printf is a function in the C library
                                      that is implemented using ARM instructions */
        add r4, r4, #1             /* r4 ← r4 + 1 */
    check_loop:
        cmp r4, #4                 /* compute r4 - 4 and update the cpsr */
        blt loop                   /* if the cpsr means that r4 is lower than 4
                                      then branch to loop */

        pop {r4, pc}               /* restore registers and return from Thumb function */

    .align 4
    addr_of_message: .word message

    .code 32     /* Here we say we will use ARM */
    .align 4     /* Align to 2^4 = 16 bytes, which also satisfies the 4-byte alignment ARM needs */
    .globl main
    main:
        push {r4, lr}              /* keep r4 and lr in the stack */
        blx thumb_function         /* from ARM to Thumb we use blx */
        pop {r4, lr}               /* restore registers */
        bx lr                      /* return */
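The same mechanism should also work when the ARM function lives in our own file rather than in the C library. The following is only a sketch under that assumption (the file name and the names arm_double and thumb_caller are made up, and it has not been checked on every toolchain version): a Thumb function calls a small ARM function with blx, and the ARM function returns with bx lr, which switches the processor back to Thumb.

```asm
/* thumb-to-arm-sketch.s (hypothetical file name) */
.text

.code 32     /* ARM code follows */
.align 2     /* Align to 2^2 = 4 bytes */
arm_double:                  /* hypothetical ARM leaf function: r0 ← r0 + r0 */
    add r0, r0, r0           /* r0 ← r0 + r0 */
    bx lr                    /* return, going back to the state of the caller */

.code 16     /* Thumb code follows */
.align 2
thumb_caller:                /* hypothetical non-leaf Thumb function */
    push {r4, lr}            /* keep r4 and lr in the stack */
    mov r0, #21              /* r0 ← 21 */
    blx arm_double           /* From Thumb to ARM we use blx. Now r0 is 21 + 21 */
    pop {r4, pc}             /* restore r4 and return from the Thumb function */

.code 32     /* ARM code follows */
.align 2
.globl main
main:
    push {r4, lr}            /* keep r4 and lr in the stack */
    blx thumb_caller         /* From ARM to Thumb we use blx */
    pop {r4, lr}             /* restore registers */
    bx lr                    /* return */
```

Note that each side returns in its own way: the ARM leaf uses bx lr, while the non-leaf Thumb function uses pop {r4, pc} as described earlier.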
To know more
In the next installments we will go back to ARM, so if you are interested in Thumb, you may want to check the Thumb 16-bit Instruction Set Quick Reference Card provided by ARM. When checking that card, be aware that the processor of the Raspberry Pi only implements ARMv6T, not ARMv6T2.
That’s all for today.
I note that the mul command sometimes requires the operands to be in different registers (with slight changes for Thumb mode). Where can I find detailed descriptions of such simple operators?
Still really enjoying your notes.
Well, you can get the documentation of the ARMv6 architecture in the ARM Information Center. The document is a PDF only available upon (free, AFAIK) registration.
That said, if you don’t feel like registering for only one document there are some copies online.
Kind regards,