ARM assembler in Raspberry Pi – Chapter 9
In previous chapters we learnt the foundations of ARM assembler: registers, some arithmetic operations, loads and stores and branches. Now it is time to put everything together and add another level of abstraction to our assembler skills: functions.
Why functions?
Functions are a way to reuse code. If we have some code that will be needed more than once, being able to reuse it is a Good Thing™. This way, we only have to ensure that the code being reused is correct. If we repeated the code we would have to verify it is correct at every point. This clearly does not scale. Functions can also take parameters. This way we not only reuse code but can also use it in several ways, by passing different parameters. All this magic, though, comes at some price. A function must be a well-behaved citizen.
Do’s and don’ts of a function
Assembler gives us a lot of power. But with a lot of power also comes a lot of responsibility. We can break lots of things in assembler, because we are at a very low level. One error and nasty things may happen. In order to make all functions behave in the same way, there are conventions in every environment that dictate how a function must behave. Since we are on a Raspberry Pi running Linux we will use the AAPCS, the Procedure Call Standard for the ARM Architecture (chances are that other ARM operating systems like RISC OS or Windows RT follow it). You may find this document on the ARM documentation website but I will try to summarize it in this chapter.
New special named registers
When discussing branches we learnt that r15 was also called pc, and we never called it r15 again. Well, from now on let’s call r14 lr and r13 sp. lr stands for link register and it holds the address of the instruction following the instruction that called us (we will see later what this means). sp stands for stack pointer. The stack is an area of memory owned only by the current function, and the sp register stores the top address of that stack. For now, let’s put the stack aside. We will get back to it in the next chapter.
Passing parameters
Functions can receive parameters. The first 4 parameters must be stored, sequentially, in the registers r0, r1, r2 and r3. You may be wondering how to pass more than 4 parameters. We can, of course, but we need to use the stack, which we will discuss in the next chapter. Until then, we will only pass up to 4 parameters.
Well behaved functions
A function must adhere, at least, to the following rules if we want it to be AAPCS compliant.
- A function should not make any assumption on the contents of the cpsr. So, at the entry of a function, condition codes N, Z, C and V are unknown.
- A function can freely modify registers r0, r1, r2 and r3.
- A function cannot assume anything on the contents of r0, r1, r2 and r3 unless they are playing the role of a parameter.
- A function can freely modify lr but the value upon entering the function will be needed when leaving the function (so such value must be kept somewhere).
- A function can modify all the remaining registers as long as their values are restored upon leaving the function. This includes sp and registers r4 to r11.
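This means that, after calling a function, we have to assume that registers r0, r1, r2, r3 and lr have been overwritten. (The AAPCS also lets a callee use r12 as a scratch register, so it is safest not to keep anything important there across a call either.)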
Calling a function
There are two ways to call a function. If the function is statically known (meaning we know exactly which function must be called) we will use bl label. That label must be a label defined in the .text section. This is called a direct (or immediate) call. We may do indirect calls by first storing the address of the function into a register and then using blx Rsource1.
In both cases the behaviour is as follows: the address of the function (immediately encoded in the bl or using the value of the register in blx) is stored in pc. The address of the instruction following the bl or blx instruction is kept in lr.
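As a minimal sketch of both kinds of call, the following program calls a tiny function first with bl and then with blx. The file name call01.s, the function add_one and the label address_of_add_one are made up for this illustration. Keeping lr in a variable mirrors what the examples later in this chapter do.

```asm
/* -- call01.s (hypothetical example, not part of the original chapter) */
.data

.balign 4
return: .word 0

.text

/* add_one: hypothetical function, receives a number in r0 and returns r0 + 1 */
add_one:
    add r0, r0, #1                 /* r0 ← r0 + 1 */
    bx lr                          /* go back to the instruction after the call */

.global main
main:
    ldr r1, address_of_return      /* r1 ← &return */
    str lr, [r1]                   /* *r1 ← lr, because bl and blx overwrite lr */

    mov r0, #40                    /* r0 ← 40, the first parameter */
    bl add_one                     /* direct call, r0 ← 41 */

    ldr r2, address_of_add_one     /* r2 ← &add_one */
    blx r2                         /* indirect call, r0 ← 42 */

    ldr r1, address_of_return      /* r1 ← &return */
    ldr lr, [r1]                   /* lr ← *r1 */
    bx lr                          /* return from main, r0 is the exit code */
address_of_return: .word return
address_of_add_one: .word add_one
```

Built the same way as the other programs in this series, running it and then typing echo $? should show 42.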
Leaving a function
A well behaved function, as stated above, will have to keep the initial value of lr somewhere. When leaving the function, we will retrieve that value and put it in some register (it can be lr again but this is not mandatory). Then we will bx Rsource1 (we could use blx as well but the latter would update lr which is useless here).
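To illustrate that the return address does not have to come back through lr, here is a small sketch in which main keeps it in r2 and returns with bx r2. The file name leave01.s is hypothetical, and overwriting lr with 0 is only there to show that lr is free to be used once its value is safe elsewhere.

```asm
/* -- leave01.s (hypothetical example, not part of the original chapter) */
.text

.global main
main:
    mov r2, lr      /* r2 ← lr, keep the return address in a scratch register */
    mov lr, #0      /* lr ← 0, from here on lr is just r14 */
    mov r0, #7      /* r0 ← 7, the exit code */
    bx r2           /* return to the caller through r2 */
```

This only works because main calls no other function: a callee would be free to overwrite r2.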
Returning data from functions
Functions must use r0 for data that fits in 32 bits (or less). That is, C types char, short, int, long (and float, though we have not seen floating point yet) will be returned in r0. Basic types of 64 bits, like C types long long and double, will be returned in r1 and r0. Any other data is returned through the stack unless it is 32 bits or less, in which case it will be returned in r0.
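A rough sketch of a 64-bit return value follows. It assumes the usual little-endian configuration of Linux on the Raspberry Pi, where r0 carries the low 32 bits and r1 the high 32 bits. The file name ret64.s and the function answer64 are invented for this example.

```asm
/* -- ret64.s (hypothetical example, not part of the original chapter) */
.data

.balign 4
return: .word 0

.text

/* answer64: hypothetical function returning the 64-bit value 0x0000000200000028 */
answer64:
    mov r0, #40                    /* r0 ← 40 (0x28), the low 32 bits */
    mov r1, #2                     /* r1 ← 2, the high 32 bits */
    bx lr                          /* return, the result is the pair r1:r0 */

.global main
main:
    ldr r1, address_of_return      /* r1 ← &return */
    str lr, [r1]                   /* *r1 ← lr */

    bl answer64                    /* r0 ← low half, r1 ← high half */
    add r0, r0, r1                 /* r0 ← 40 + 2, just to make both halves visible */

    ldr r1, address_of_return      /* r1 ← &return */
    ldr lr, [r1]                   /* lr ← *r1 */
    bx lr                          /* return from main, exit code 42 */
address_of_return: .word return
```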
In the examples in previous chapters we returned the error code of the program in r0. This now makes sense. C’s main returns an int, which is used as the value of the error code of our program.
Hello world
Usually this is the first program you write in any high level programming language. In our case we had to learn lots of things first. Anyway, here it is. A “Hello world” in ARM assembler.
(Note to experts: since we will not discuss the stack until the next chapter, this code may look very dumb to you)
/* -- hello01.s */
.data

greeting:
 .asciz "Hello world"

.balign 4
return: .word 0

.text

.global main
main:
    ldr r1, address_of_return     /* r1 ← &return */
    str lr, [r1]                  /* *r1 ← lr */

    ldr r0, address_of_greeting   /* r0 ← &greeting */
                                  /* First parameter of puts */

    bl puts                       /* Call to puts */
                                  /* lr ← address of next instruction */

    ldr r1, address_of_return     /* r1 ← &return */
    ldr lr, [r1]                  /* lr ← *r1 */
    bx lr                         /* return from main */
address_of_greeting: .word greeting
address_of_return: .word return

/* External */
.global puts
We are going to call the puts function. This function is defined in the C library and has the following prototype: int puts(const char*). It receives, as a first parameter, the address of a C-string (that is, a sequence of bytes where no byte but the last is zero). When executed it outputs that string to stdout (so it should appear by default on our terminal). Finally it returns a non-negative number on success; with the C library used here, that number is the count of bytes written.
We start by defining in the .data section the label greeting in lines 4 and 5. This label will contain the address of our greeting message. GNU as provides a convenient .asciz directive for that purpose. This directive emits as many bytes as needed to represent the string plus the final zero byte. We could have used another directive, .ascii, as long as we explicitly added the final zero byte.
After the bytes of the greeting message, we make sure the next label will be 4-byte aligned and we define a return label in line 8. In that label we will keep the value of lr that we have in main. As stated above, this is a requirement for a well behaved function: being able to recover the value lr had upon entering. So we make some room for it.
The first two instructions of our main function, lines 14 and 15, keep the value of lr in that return variable defined above. Then in line 17 we prepare the arguments for the call to puts. We load the address of the greeting message into the r0 register. This register will hold the first (the only one actually) parameter of puts. Then in line 20 we call the function. Recall that bl will set in lr the address of the instruction following it (this is the instruction in line 23). This is the reason why we copied the value of lr in a variable at the beginning of the main function, because it was going to be overwritten by bl.
Ok, puts runs and the message is printed on stdout. Time to get the initial value of lr so we can return successfully from main. Then we return.
Is our main function well behaved? Yes, it keeps and gets back lr to leave. It only modifies r0 and r1. We can assume that puts is well behaved as well, so everything should work fine. Plus the bonus of seeing how many bytes have been written to the output.
$ ./hello01
Hello world
$ echo $?
12
Note that “Hello world” is just 11 bytes (the final zero is not counted as it just plays the role of a finishing byte) but the program returns 12. This is because puts always adds a newline byte, which accounts for that extra byte.
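As mentioned when describing the greeting label, .ascii can replace .asciz if the final zero byte is added by hand. A possible variation of the program, under the made-up name hello02.s, could look like this. Only the definition of greeting changes.

```asm
/* -- hello02.s (hypothetical variation of hello01.s) */
.data

greeting:
 .ascii "Hello world"             /* the bytes of the message, no final zero */
 .byte 0                          /* the final zero byte, added explicitly */

.balign 4
return: .word 0

.text

.global main
main:
    ldr r1, address_of_return     /* r1 ← &return */
    str lr, [r1]                  /* *r1 ← lr */

    ldr r0, address_of_greeting   /* r0 ← &greeting, first parameter of puts */
    bl puts                       /* Call to puts */

    ldr r1, address_of_return     /* r1 ← &return */
    ldr lr, [r1]                  /* lr ← *r1 */
    bx lr                         /* return from main */
address_of_greeting: .word greeting
address_of_return: .word return

/* External */
.global puts
```

Forgetting the .byte 0 line would leave the string unterminated, and puts would keep printing whatever bytes follow it in memory until it finds a zero.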
Real interaction!
Now that we have the power of calling functions, we can glue them together. Let’s call printf and scanf to read a number and then print it back to the standard output.
/* -- printf01.s */
.data

/* First message */
.balign 4
message1: .asciz "Hey, type a number: "

/* Second message */
.balign 4
message2: .asciz "I read the number %d\n"

/* Format pattern for scanf */
.balign 4
scan_pattern : .asciz "%d"

/* Where scanf will store the number read */
.balign 4
number_read: .word 0

.balign 4
return: .word 0

.text

.global main
main:
    ldr r1, address_of_return        /* r1 ← &return */
    str lr, [r1]                     /* *r1 ← lr */

    ldr r0, address_of_message1      /* r0 ← &message1 */
    bl printf                        /* call to printf */

    ldr r0, address_of_scan_pattern  /* r0 ← &scan_pattern */
    ldr r1, address_of_number_read   /* r1 ← &number_read */
    bl scanf                         /* call to scanf */

    ldr r0, address_of_message2      /* r0 ← &message2 */
    ldr r1, address_of_number_read   /* r1 ← &number_read */
    ldr r1, [r1]                     /* r1 ← *r1 */
    bl printf                        /* call to printf */

    ldr r0, address_of_number_read   /* r0 ← &number_read */
    ldr r0, [r0]                     /* r0 ← *r0 */

    ldr lr, address_of_return        /* lr ← &return */
    ldr lr, [lr]                     /* lr ← *lr */
    bx lr                            /* return from main using lr */
address_of_message1 : .word message1
address_of_message2 : .word message2
address_of_scan_pattern : .word scan_pattern
address_of_number_read : .word number_read
address_of_return : .word return

/* External */
.global printf
.global scanf
In this example we ask the user to type a number and then we print it back. We also return the number as the error code, so we can double-check that everything goes as expected. For the error code check, make sure your number is between 0 and 255 (otherwise the error code will show only its lower 8 bits).
$ ./printf01
Hey, type a number: 123↴
I read the number 123
$ ./printf01 ; echo $?
Hey, type a number: 124↴
I read the number 124
124
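The remark about the lower 8 bits can be checked with a tiny sketch that returns a value too large for the error code. The file name exit300.s and the value 300 are arbitrary choices for this illustration.

```asm
/* -- exit300.s (hypothetical example, not part of the original chapter) */
.text

.global main
main:
    mov r0, #300    /* r0 ← 300, which does not fit in 8 bits */
    bx lr           /* return from main, lr was never modified */
```

Since 300 is 256 + 44, only the lower 8 bits survive and echo $? should show 44 after running it.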
Our first function
Let’s define our first function. Let’s extend the previous example but multiply the number by 5.
.balign 4
return2: .word 0

.text

/*
mult_by_5 function
*/
mult_by_5:
    ldr r1, address_of_return2       /* r1 ← &return2 */
    str lr, [r1]                     /* *r1 ← lr */

    add r0, r0, r0, LSL #2           /* r0 ← r0 + 4*r0 */

    ldr lr, address_of_return2       /* lr ← &return2 */
    ldr lr, [lr]                     /* lr ← *lr */
    bx lr                            /* return from mult_by_5 using lr */
address_of_return2 : .word return2
This function will need another “return” variable like the one main uses. But this is for the sake of the example. Actually this function does not call another function. When this happens it does not need to keep lr as no bl or blx instruction is going to modify it. If the function wanted to use lr as the r14 general purpose register, the process of keeping the value would still be mandatory.
As you can see, once the function has computed the value, it is enough to keep it in r0. In this case it was pretty easy and a single instruction was enough.
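Since mult_by_5 calls no other function, a leaner version that never saves lr is possible. The following is only a sketch of that idea: the file name leaf01.s and the constant 8 passed by main are assumptions made for the illustration.

```asm
/* -- leaf01.s (hypothetical example, not part of the original chapter) */
.data

.balign 4
return: .word 0

.text

/* mult_by_5 as a leaf function: no bl or blx inside, so lr is left untouched */
mult_by_5:
    add r0, r0, r0, LSL #2      /* r0 ← r0 + 4*r0 */
    bx lr                       /* lr still holds the address to return to */

.global main
main:
    ldr r1, address_of_return   /* r1 ← &return */
    str lr, [r1]                /* *r1 ← lr, main does call a function */

    mov r0, #8                  /* r0 ← 8, the parameter of mult_by_5 */
    bl mult_by_5                /* r0 ← 40 */

    ldr r1, address_of_return   /* r1 ← &return */
    ldr lr, [r1]                /* lr ← *r1 */
    bx lr                       /* return from main, exit code 40 */
address_of_return: .word return
```

Note that main still has to keep lr, because its own bl overwrites it.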
The whole example follows.
/* -- printf02.s */
.data

/* First message */
.balign 4
message1: .asciz "Hey, type a number: "

/* Second message */
.balign 4
message2: .asciz "%d times 5 is %d\n"

/* Format pattern for scanf */
.balign 4
scan_pattern : .asciz "%d"

/* Where scanf will store the number read */
.balign 4
number_read: .word 0

.balign 4
return: .word 0

.balign 4
return2: .word 0

.text

/*
mult_by_5 function
*/
mult_by_5:
    ldr r1, address_of_return2       /* r1 ← &return2 */
    str lr, [r1]                     /* *r1 ← lr */

    add r0, r0, r0, LSL #2           /* r0 ← r0 + 4*r0 */

    ldr lr, address_of_return2       /* lr ← &return2 */
    ldr lr, [lr]                     /* lr ← *lr */
    bx lr                            /* return from mult_by_5 using lr */
address_of_return2 : .word return2

.global main
main:
    ldr r1, address_of_return        /* r1 ← &return */
    str lr, [r1]                     /* *r1 ← lr */

    ldr r0, address_of_message1      /* r0 ← &message1 */
    bl printf                        /* call to printf */

    ldr r0, address_of_scan_pattern  /* r0 ← &scan_pattern */
    ldr r1, address_of_number_read   /* r1 ← &number_read */
    bl scanf                         /* call to scanf */

    ldr r0, address_of_number_read   /* r0 ← &number_read */
    ldr r0, [r0]                     /* r0 ← *r0 */
    bl mult_by_5

    mov r2, r0                       /* r2 ← r0 */
    ldr r1, address_of_number_read   /* r1 ← &number_read */
    ldr r1, [r1]                     /* r1 ← *r1 */
    ldr r0, address_of_message2      /* r0 ← &message2 */
    bl printf                        /* call to printf */

    ldr lr, address_of_return        /* lr ← &return */
    ldr lr, [lr]                     /* lr ← *lr */
    bx lr                            /* return from main using lr */
address_of_message1 : .word message1
address_of_message2 : .word message2
address_of_scan_pattern : .word scan_pattern
address_of_number_read : .word number_read
address_of_return : .word return

/* External */
.global printf
.global scanf
I want you to notice lines 58 to 62. There we prepare the call to printf, which receives three parameters: the format and the two integers referenced in the format. We want the first integer to be the number entered by the user. The second one will be that same number multiplied by 5. After the call to mult_by_5, r0 contains the number entered by the user multiplied by 5. We want it to be the third parameter so we move it to r2. Then we load the value of the number entered by the user into r1. Finally we load in r0 the address of the format message of printf. Note that here the order of preparing the arguments of a call is irrelevant as long as the values are correct at the point of the call. We use the fact that we will have to overwrite r0, so for convenience we first copy r0 to r2.
$ ./printf02
Hey, type a number: 1234↴
1234 times 5 is 6170
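The same pattern extends to the fourth parameter register. The sketch below, under the invented name printf03.s, passes a format and three integers to printf using r0 to r3, and again sets r0 last to show that the order of preparation does not matter.

```asm
/* -- printf03.s (hypothetical example, not part of the original chapter) */
.data

.balign 4
message: .asciz "%d plus %d is %d\n"

.balign 4
return: .word 0

.text

.global main
main:
    ldr r1, address_of_return    /* r1 ← &return */
    str lr, [r1]                 /* *r1 ← lr */

    mov r1, #2                   /* r1 ← 2, second parameter */
    mov r2, #3                   /* r2 ← 3, third parameter */
    add r3, r1, r2               /* r3 ← r1 + r2, fourth parameter */
    ldr r0, address_of_message   /* r0 ← &message, first parameter, set last */
    bl printf                    /* call to printf */

    mov r0, #0                   /* r0 ← 0, the exit code */
    ldr lr, address_of_return    /* lr ← &return */
    ldr lr, [lr]                 /* lr ← *lr */
    bx lr                        /* return from main using lr */
address_of_message: .word message
address_of_return: .word return

/* External */
.global printf
```

When run it should print "2 plus 3 is 5".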
That’s all for today.
Thanks for all the info.
Is it possible to make the same thing on assembly?
Thanks for the response
If the size of your array is statically defined (that is, it is known when you assemble the code), you can use GNU assembler features which may be useful. For instance, the following is a simple way to compute the number of elements of an array of 4-byte integers.
.data
array: .word 0x1, 0x2, 0x3, 0x4
end_of_array :
.globl main
.text
main:
/* This is r1 ← 4 */
mov r1, #(end_of_array - array) / 4
...
This works because we subtract array (the address of the first element of the array) from end_of_array (the address past the last element of the array). This gives us a value in bytes, so we divide it by 4 (each integer is 4 bytes). Note that this happens at assemble-time (or compilation time). So there is no real code emitted here: the assembler just computes a constant value and uses it in place of the whole expression. If it is not able to compute a constant value, this is an error.
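A complete program built around that same expression might look like the sketch below. The file name length01.s is made up, and returning the length as the error code is just a convenient way to observe it.

```asm
/* -- length01.s (hypothetical example built on the snippet above) */
.data
array: .word 0x1, 0x2, 0x3, 0x4
end_of_array :

.text

.global main
main:
    mov r0, #(end_of_array - array) / 4   /* r0 ← 4, computed by the assembler */
    bx lr                                 /* exit code is the number of elements */
```

With the four elements shown, echo $? after running it should show 4.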
I suggest you read the GNU as manual, in particular section 5 about expressions.
If your array size is dynamic, then everything is more complicated. Your array will be, in a straightforward approach, a pair of numbers: the address of the array itself and the number of elements. The “length” operation is just reading the latter. How the former would be used is beyond the scope of this tutorial as it may involve either upper bounded memory (your array may be up to N items) or dynamic memory (malloc, free, etc).
Kind regards,
Hi Stellan,
yes, there is a technique but it requires two things: a) putting a function inside its own “text.nnn” section, and b) telling the linker to remove unused sections. A very small example follows.
Compile like shown below.
-Wl is used to tell gcc to pass through a comma-separated list of flags (without further processing) to the linker. The linker flag --gc-sections tells the linker to garbage collect unused sections. The linker flag --print-gc-sections just reports the list of sections removed. Note that the C library has some sections that the linker considers unused, but note the last line, .text.foo, which corresponds to our unused foo function.
Kind regards
.data
.balign 4
print_Statement:
.asciz "Variable Print: %d\n"
.balign 4
myvar:
.word 2
.balign 4
myarr:
.skip 4
.balign 4
addr_return:
.word 0
.text
.balign 4
.global main
main:
ldr r1, add
ldr r2,arr
ldr r1,[r1]
mov r3,#10
ldr r0,add_ret
str lr,[r0]
ldr r0,print_patt
mov r1,r3
bl printf
loop:
cmp r3,#8
beq endLoop
ldr r0,print_patt
mov r1,r3
bl printf
sub r3,r3,#1
b loop
endLoop:
ldr r0,add_ret
ldr lr,[r0]
bx lr
print_patt: .word print_Statement
add: .word myvar
arr: .word myarr
add_ret: .word addr_return
.global printf
I haven’t checked in much detail but I think your problem is that printf is modifying r3. Recall that registers r0 to r3 can be freely modified by the callee, so if you’re keeping something important there, either back it up elsewhere or use another register. Recall that registers r4 to r11 and sp must be preserved by the callee, so you know that their contents after the printf call are the same they had before the call.

Concerning whether b (and all its conditional versions beq, bne, …) modify lr. No, they don’t. Only bl and blx modify lr.

Kind regards,
So how do you explain this infinite loop here? If I am just removing the loop statement then I am getting the correct output of r3 so printf has nothing to do with this.