RISC-V Assembly Part 3: Arrays and Functions

Overview

This lecture connects two ideas that dominate real assembly programming: arrays and functions. We first show that C array indexing (arr[i]) and pointer arithmetic (*(arr + i)) are exactly the same operation, then translate that operation into RISC-V using the offset + base address pattern with lw and sw. We then formalize simple (leaf) functions: how arguments arrive in the a registers, how a result leaves in a0, the eight-argument limit, and the rule that a leaf function may freely use the a and t registers without touching the stack. These are the foundations for Lab 3 (sum_array, find_max) and Project 2.

Learning Objectives

  • Explain why arr[i] in C is exactly equivalent to *(arr + i) (pointer arithmetic).
  • Describe how C scales pointer arithmetic by the element size automatically.
  • Compute a byte offset (index * element_size) and add it to a base address to reach an array element in assembly.
  • Use lw to read an array element and sw to write an array element, and state the direction of data flow for each.
  • Translate a C array-access function into a correct RISC-V leaf function.
  • Identify a leaf function and state why it needs no stack frame.
  • Apply the RISC-V argument convention: arguments in a0-a7 (eight or fewer), return value in a0.
  • Use a temporary register to hold an intermediate address and avoid clobbering arguments prematurely.

Prerequisites

  • RISC-V Assembly Parts 1 and 2: registers, instructions, immediates, control flow with branches and labels.
  • C basics: arrays, pointers, the & (address-of) and * (dereference) operators.
  • Memory model: byte-addressable memory, a 32-bit int (word) occupies 4 bytes, 64-bit registers.
  • Familiarity with lw/sw and li/mul/add from the RISC-V reference guide.

1. Roadmap and Logistics

Today's session opened with three items written at the top of the notes:

  • Arrays and Functions — the conceptual focus.
  • Lab 3 Q & A — Lab 3 (quadratic, min, sum_array, find_max) is due tonight.
  • Local VM setup — getting a RISC-V development environment running locally with QEMU and UTM so you are not dependent on the lab machines.

The technical heart of the lecture is two topics that build directly on the branches and loops from Part 2:

  1. Arrays — how to reach an element of an array in memory.
  2. Functions — the convention for passing arguments and returning a value.

Both topics feed directly into Lab 3's array problems and into Project 2, which adds real multi-function calling conventions on top of what we cover here.


2. Arrays and Pointers in C

The Two C Operators You Need

The notes call out two operators as hints for everything that follows:

arr      &              *
 |       |              |
base   address-of    dereference
  • &x — the address-of operator. It produces a pointer: the memory address where x lives.
  • *p — the dereference operator. Given a pointer p, it accesses the value stored at that address.

An array name like arr already is an address — specifically the address of element 0. That single fact is what makes array indexing and pointer arithmetic interchangeable.

Declaring an Array

int arr[5];

This reserves storage for five int values laid out contiguously in memory. Because an int is 4 bytes, the array occupies 20 bytes total, and the elements sit at consecutive addresses:

address:   arr+0   arr+4   arr+8   arr+12  arr+16
          +-------+-------+-------+-------+-------+
element:  |arr[0] |arr[1] |arr[2] |arr[3] |arr[4] |
          +-------+-------+-------+-------+-------+
index:        0       1       2       3       4

Notice the addresses jump by 4 each time, not by 1. The index counts elements; the address counts bytes. Converting between the two is the entire trick of array access in assembly.
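The index-versus-bytes distinction can be checked directly in C. This is a minimal sketch, assuming the 4-byte int stated above; the helper name byte_distance is hypothetical and not part of the lecture code.

```c
#include <stddef.h>

/* Number of bytes between element 0 and element i of an int array.
   byte_distance is an illustrative name, not part of the lecture code. */
size_t byte_distance(const int *arr, size_t i) {
    const char *first = (const char *)&arr[0];  /* byte address of element 0 */
    const char *ith   = (const char *)&arr[i];  /* byte address of element i */
    return (size_t)(ith - first);               /* equals i * sizeof(int)    */
}
```

For int arr[5], byte_distance(arr, 4) is 4 * sizeof(int), which is 16 when an int is 4 bytes: the same arr+16 shown in the diagram.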

Index Notation vs. Pointer Notation

The lecture put the two notations side by side. Every line on the left is exactly equivalent to the line on the right:

| Array (index) notation | Pointer notation | Meaning |
|---|---|---|
| arr[0] = 99; | *arr = 99; | store 99 into element 0 |
| arr[1] = 100; | *(arr + 1) = 100; | store 100 into element 1 |
| x = arr[0]; | x = *arr; | load element 0 into x |
| y = arr[1]; | y = *(arr + 1); | load element 1 into y |

The general identity, for any index i, is:

arr[i]  is exactly  *(arr + i)

This is not an approximation — the C standard defines arr[i] to mean *(arr + i). The two forms compile to the same machine code.
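As a small illustration of the identity, the two functions below read the same element, one in each notation. The function names are made up for this sketch.

```c
/* Read element i with index notation. */
int get_indexed(const int *arr, int i) {
    return arr[i];
}

/* Read element i with pointer notation; the same operation as above. */
int get_pointer(const int *arr, int i) {
    return *(arr + i);
}
```

For any valid index the two functions return the same value, so either form can be the starting point for an assembly translation.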

Pointer Arithmetic Scales by Element Size

Here is the part that trips everyone up the first time. When you write arr + 1 in C, you do not add 1 to the address. C scales the + 1 by the size of the element type. For int *, the element size is 4 bytes, so:

arr + 1   means   (address of arr) + 1 * sizeof(int)   =   address + 4
arr + 2   means   (address of arr) + 2 * sizeof(int)   =   address + 8
arr + i   means   (address of arr) + i * sizeof(int)   =   address + (i * 4)

C does this scaling silently because it knows the type of arr. Assembly does not know any types — to the processor, an address is just a number. That means we are responsible for the multiply. This is exactly the work the C compiler does for us, and it is what we now have to write by hand.

flowchart LR
    A["arr + i (C)"] --> B["compiler knows<br/>sizeof(int) = 4"]
    B --> C["address + (i * 4)"]
    C --> D["machine address"]
    style B fill:#f9f,stroke:#333,stroke-width:2px
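To see the hidden multiply, we can switch the scaling off by working with a char pointer, whose element size is 1, and doing the arithmetic ourselves. This is a sketch of what the compiler does implicitly; get_scaled is a hypothetical name.

```c
#include <stddef.h>

/* Read arr[i] by doing the scaling ourselves, the way assembly must.
   get_scaled is an illustrative name. */
int get_scaled(const int *arr, size_t i) {
    const char *base = (const char *)arr;       /* plain byte address        */
    const char *addr = base + i * sizeof(int);  /* base + (i * element size) */
    return *(const int *)addr;                  /* dereference as an int     */
}
```

Because base is a char pointer, the + here really does add bytes, so the explicit i * sizeof(int) is required. Dropping it would land in the middle of an element.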

Challenge from class: take the arr[i] version of a function, rewrite it using only pointer arithmetic (*(arr + i)), and then translate that pointer version to assembly. The pointer form maps almost one-to-one onto the instructions, so doing this rewrite first makes the assembly far easier.


3. Array Access in Assembly: Offset + Base

The Question

The lecture framed array access around one small function. Imagine main has a local array int arr[3] living on the stack, and we want to evaluate:

x = arr[i];

How does the processor find the right element when all it has is the base address of the array and the index i?

The Picture: Offset + arr

The handwritten diagram shows the three-element array on the stack and how an index maps to a memory location:

                STACK
              +---------+
   arr+8  ->  | arr[2]  |   index 2
              +---------+
   arr+4  ->  | arr[1]  |   index 1
              +---------+
   arr+0  ->  | arr[0]  |   index 0
              +---------+
                  ^
                  |
        target = offset + arr  (the base, held in a0)

The element we want lives at:

target address = base address (arr)  +  offset

where  offset = index * element_size = i * 4

That is the whole idea, highlighted in red in the notes: offset + arr. Compute the byte offset from the index, add it to the base address, and you have a pointer to the element. Then dereference it.
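The formula is plain integer arithmetic, which the following sketch makes explicit. The function name and the idea of passing addresses as 64-bit integers are assumptions made for illustration.

```c
#include <stdint.h>

/* target address = base + index * element_size, on raw integers.
   element_address is an illustrative name. */
uint64_t element_address(uint64_t base, uint64_t index, uint64_t element_size) {
    uint64_t offset = index * element_size;  /* byte offset from the base */
    return base + offset;                    /* offset + arr              */
}
```

With a base of 0x2000, index 2, and element size 4, the offset is 8 and the target address is 0x2008.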

The C Function (Pointer Form)

Following the challenge, we write the access in pointer form so it maps cleanly to assembly:

// Return arr[i] using pointer arithmetic.
// a0 will hold arr (the base address), a1 will hold i.
int arr_get_c(int arr[], int i) {
    return *(arr + i);   // same as: return arr[i];
}

The Assembly Function

We are told how the arguments arrive — this register-to-parameter mapping is exactly what the calling convention guarantees:

# a0 - int arr[]   (base address of the array)
# a1 - int i       (the index)

Here is the translation. The comments restate what each register holds at each step:

.global arr_get_s
arr_get_s:
    li   t0, 4              # t0 = 4 (bytes per int = element size)
    mul  t1, a1, t0         # t1 = i * 4   (the byte offset)
    add  t2, a0, t1         # t2 = arr + offset   (address of arr[i])
    lw   a0, (t2)           # a0 = *t2 = arr[i]   (load the element)
    ret                     # return value is in a0

Step by step, mirroring the red annotations from the notes:

  1. li t0, 4 — put the element size (4 bytes per int) into a temporary.
  2. mul t1, a1, t0 — multiply the index i (a1) by 4 to get the byte offset. The note labels this t1 = t0(4) * a1(i).
  3. add t2, a0, t1 — add the base address arr (a0) to the offset. The note labels this t2 = a0(arr) + t1(offset). Now t2 is a pointer to arr[i].
  4. lw a0, (t2) — load the word at that address into a0. This is the dereference *(arr + i).
  5. ret — the result is already in a0, the return register.
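One way to check your understanding of the five steps is to mirror them in C, with one local variable per register. This is only a model: it assumes a 4-byte int and a platform where a pointer converts to uintptr_t and back unchanged (true on mainstream systems), and arr_get_model is a made-up name.

```c
#include <stdint.h>

/* C model of arr_get_s: each local variable stands in for one register. */
int arr_get_model(const int *arr, int i) {
    uintptr_t a0 = (uintptr_t)arr;       /* a0 = base address               */
    intptr_t  a1 = i;                    /* a1 = index                      */
    intptr_t  t0 = 4;                    /* li  t0, 4                       */
    intptr_t  t1 = a1 * t0;              /* mul t1, a1, t0  (byte offset)   */
    uintptr_t t2 = a0 + (uintptr_t)t1;   /* add t2, a0, t1  (element addr)  */
    return *(const int *)t2;             /* lw  a0, (t2)    (dereference)   */
}
```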

Why lw a0, (t2) and not lw a0, t2

The parentheses matter. lw a0, (t2) means "treat the value in t2 as a memory address and load the word stored there." Without the parentheses you would be referring to the register's contents as data, which is not what a load does. The form is:

lw  rd, offset(rbase)     # rd = memory[rbase + offset]

When the offset is zero we write it as lw rd, (rbase), which is shorthand for lw rd, 0(rbase).
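The rd = memory[rbase + offset] rule can be modeled in a few lines of C. In this sketch a byte array stands in for memory and lw_model is a hypothetical name; the cast to int64_t reflects that lw sign-extends the 32-bit word into a 64-bit register.

```c
#include <stdint.h>
#include <string.h>

/* Model of "lw rd, offset(rbase)": read the 4 bytes at memory[rbase + offset]
   and sign-extend them to a 64-bit register value. memory is a plain byte
   array standing in for RAM; lw_model is an illustrative name. */
int64_t lw_model(const uint8_t *memory, uint64_t rbase, int64_t offset) {
    int32_t word;
    memcpy(&word, memory + rbase + offset, sizeof word);  /* fetch 4 bytes */
    return (int64_t)word;                                 /* sign-extend   */
}
```

Calling lw_model(memory, base, 0) corresponds to lw rd, (rbase), and a nonzero offset corresponds to forms such as lw rd, 4(rbase).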

A Note on mul vs. Shift

Multiplying by 4 is the same as shifting left by 2 bits, because 4 = 2². Many assembly programmers prefer the shift because it avoids the mul instruction:

slli t1, a1, 2          # t1 = i << 2 = i * 4   (offset)
add  t2, a0, t1         # t2 = arr + offset
lw   a0, (t2)           # a0 = arr[i]
ret

Both are correct. The lecture's emphasis was on writing clear, correct code over clever optimization — pick whichever you understand best and use it consistently.
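If you want to convince yourself that the two forms agree, this small C sketch computes the offset both ways. It uses unsigned 64-bit values to match the register width; the function names are illustrative.

```c
#include <stdint.h>

/* Byte offset via multiply, as in "mul t1, a1, t0" with t0 = 4. */
uint64_t offset_mul(uint64_t i) {
    return i * 4;
}

/* Byte offset via shift, as in "slli t1, a1, 2". */
uint64_t offset_shift(uint64_t i) {
    return i << 2;
}
```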

flowchart TD
    A["i (in a1)"] --> B["offset = i * 4"]
    C["arr base (in a0)"] --> D["addr = arr + offset"]
    B --> D
    D --> E["lw: value = memory at addr"]
    E --> F["result in a0"]

4. Loading vs. Storing: The Direction of Data Flow

Reading an element uses lw; writing an element uses sw. The lecture stressed getting the direction right, because the two instructions look almost identical but move data opposite ways.

| Instruction | Form | Meaning | Direction |
|---|---|---|---|
| lw | lw rd, (raddr) | rd = memory[raddr] | memory → register |
| sw | sw rsrc, (raddr) | memory[raddr] = rsrc | register → memory |

The trap is that in both instructions the register operand is written first, even though for sw that register is the source, not the destination. Read it as:

  • lw a0, (t2) — "load into a0 from the address in t2."
  • sw a0, (t2) — "store a0 into the address in t2."
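The two directions can be sketched in C as a pair of helpers over a byte array that stands in for memory. The names load_word and store_word are made up for this sketch, and the reg parameter or return value plays the part of the register operand.

```c
#include <stdint.h>
#include <string.h>

/* lw: memory -> register. Returns the word stored at byte address addr. */
int32_t load_word(const uint8_t *memory, uint64_t addr) {
    int32_t reg;
    memcpy(&reg, memory + addr, sizeof reg);
    return reg;
}

/* sw: register -> memory. Writes the register value at byte address addr. */
void store_word(uint8_t *memory, uint64_t addr, int32_t reg) {
    memcpy(memory + addr, &reg, sizeof reg);
}
```

Only store_word changes memory. Using load_word where a store was intended leaves memory untouched and overwrites the register instead.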

Writing an Array Element

To implement arr[i] = x, compute the same address, then store instead of load:

void arr_set_c(int arr[], int i, int x) {
    *(arr + i) = x;     // same as: arr[i] = x;
}

# a0 - int arr[]   (base address)
# a1 - int i       (index)
# a2 - int x       (value to store)
.global arr_set_s
arr_set_s:
    li   t0, 4             # element size
    mul  t1, a1, t0        # t1 = i * 4 (offset)
    add  t2, a0, t1        # t2 = arr + offset (address of arr[i])
    sw   a2, (t2)          # memory[t2] = x   (store the value)
    ret

Notice the only differences from arr_get_s: the value to store comes in a third argument a2, and the final instruction is sw a2, (t2) instead of lw a0, (t2). There is no return value, so we do not touch a0.


5. Walking an Array in a Loop

A single element access is the building block; most array work is a loop over all elements. Lab 3's sum_array is the canonical example. Here is the C:

int sum_array(int arr[], int n) {
    int sum = 0;
    for (int i = 0; i < n; i++) {
        sum = sum + arr[i];
    }
    return sum;
}

There are two common ways to translate the body. Both are correct; the second is slightly cleaner because it advances a pointer instead of recomputing the address each iteration.

Version A: Index-Based (recompute address each iteration)

# a0 - int arr[]   (base address)
# a1 - int n       (number of elements)
.global sum_array_s
sum_array_s:
    li   t0, 0             # t0 = sum = 0
    li   t1, 0             # t1 = i = 0
loop:
    bge  t1, a1, done      # if i >= n, exit loop
    slli t2, t1, 2         # t2 = i * 4 (byte offset)
    add  t2, a0, t2        # t2 = arr + offset (address of arr[i])
    lw   t3, (t2)          # t3 = arr[i]
    add  t0, t0, t3        # sum = sum + arr[i]
    addi t1, t1, 1         # i = i + 1
    j    loop
done:
    mv   a0, t0            # return value = sum
    ret

Version B: Pointer-Walking (advance the pointer by 4 each iteration)

Instead of multiplying every time, keep a running pointer and bump it by the element size:

# a0 - int arr[]   (base address, used as a moving pointer)
# a1 - int n       (number of elements)
.global sum_array_s
sum_array_s:
    li   t0, 0             # t0 = sum = 0
    li   t1, 0             # t1 = i = 0
loop:
    bge  t1, a1, done      # if i >= n, exit loop
    lw   t2, (a0)          # t2 = *arr  (current element)
    add  t0, t0, t2        # sum = sum + current element
    addi a0, a0, 4         # arr = arr + 1  (advance one int = 4 bytes)
    addi t1, t1, 1         # i = i + 1
    j    loop
done:
    mv   a0, t0            # return value = sum
    ret

Version B is the assembly form of *(arr++) and shows why pointer arithmetic is so natural here: advancing to the next int is just addi a0, a0, 4. The note from Part 2 captured this directly: "To get to next element in array of words (ints): addi a0, a0, 4."
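For comparison, here is a C sketch of the pointer-walking shape that Version B translates. The name sum_array_walk is an assumption; Lab 3 specifies only sum_array.

```c
/* Pointer-walking sum: the C shape of Version B. */
int sum_array_walk(const int *arr, int n) {
    int sum = 0;
    for (int i = 0; i < n; i++) {
        sum += *arr++;   /* read the current element, then advance one int */
    }
    return sum;
}
```

The single expression *arr++ corresponds to two instructions in the loop body: the lw from (a0) and the addi a0, a0, 4.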

flowchart TD
    Start["sum = 0, i = 0"] --> Check{"i < n?"}
    Check -- "no" --> Done["return sum in a0"]
    Check -- "yes" --> Load["load arr[i]"]
    Load --> Add["sum += arr[i]"]
    Add --> Bump["i++ and advance pointer"]
    Bump --> Check

6. Simple (Leaf) Functions

The second half of the lecture introduced simple functions, the kind you have been writing in Lab 3.

The Argument and Return Convention

The notes record the calling convention plainly:

arguments: a0, a1, a2, ...   (8 args or less)
return value: a0

  • The first argument is in a0, the second in a1, and so on up through a7.
  • A function may take eight arguments or fewer this way. (Beyond eight, additional arguments spill to the stack — out of scope for a simple function.)
  • The single return value comes back in a0.

Because the return value also lands in a0, a function will often overwrite its own first argument. In arr_get_s above, a0 started as the array base and ended as the loaded element. That is fine inside the function, but it means the caller cannot assume a0 survives a call — if the caller still needs the original a0, it must save it first. (That caller-side saving is the subject of Project 2's full calling convention.)

What Is a Leaf Function?

The notes define a simple function with a labeled, braced block and three rules pointing at it:

func_s:
    |
    |   use only "a" or "t" registers
    |
    |   no calls to other functions
    |
    ret                       <- "leaf function"

A leaf function is a function that does not call any other function. Picture the call graph as a tree: a function that calls others has children; a function that calls nobody is a leaf of that tree.

flowchart TD
    main["main"] --> f["foo (non-leaf)"]
    f --> g["arr_get_s (leaf)"]
    f --> h["min_s (leaf)"]
    style g fill:#9f9,stroke:#333
    style h fill:#9f9,stroke:#333

The green nodes are leaves: they sit at the bottom of the call tree and make no further calls.

The Two Rules for a Leaf Function

  1. Use only a and t registers. These are the caller-saved (also called temporary or "volatile") registers: a0-a7 and t0-t6. A function is allowed to clobber them freely. Because a leaf function uses only these, it never disturbs a register that the callee is obligated to preserve.

  2. No calls to other functions. This is what makes it a leaf. And it is why a leaf function is so simple: the only thing that would overwrite the return-address register ra is a call, and a leaf makes none — so ra is never disturbed, and ret works without any saving.

Why a Leaf Needs No Stack Frame

Putting the two rules together yields the payoff: a leaf function does not need to allocate any stack space.

  • It never calls another function, so ra (the return address) is never overwritten — no need to save and restore it.
  • It uses only caller-saved (a/t) registers, so there is nothing the callee is obligated to preserve.

The result is the clean shape you have seen in every Lab 3 program: a label, some computation in a/t registers, and ret. No addi sp, sp, -N prologue, no ld ra epilogue. Every Lab 3 function — quadratic, min, sum_array, find_max — is a leaf function.

# The leaf-function shape:
.global func_s
func_s:
    # ... compute using only a* and t* registers ...
    # ... no 'call' instructions ...
    ret                 # ra was never touched, so this just works

Looking ahead: Project 2 introduces non-leaf functions (e.g. findmaxfc calls max2_s, sort calls swap_s). A non-leaf function must allocate a stack frame and save ra before it calls, because call overwrites ra. The contrast with leaf functions is exactly why we draw the line so carefully here.
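The leaf versus non-leaf distinction is visible in C as well. In the sketch below, max2 is named after max2_s and find_max_calls plays the role of a function like findmaxfc; both bodies are assumptions made for illustration, not the Project 2 code.

```c
/* Leaf: makes no calls. Named after max2_s; the body is assumed. */
int max2(int a, int b) {
    return (a > b) ? a : b;
}

/* Non-leaf: calls max2 once per remaining element. Assumes n >= 1. */
int find_max_calls(const int *arr, int n) {
    int best = arr[0];
    for (int i = 1; i < n; i++) {
        best = max2(best, arr[i]);   /* in assembly, each call overwrites ra */
    }
    return best;
}
```

An assembly version of find_max_calls would also have to keep arr, n, i, and best alive across each call, since the callee is free to clobber the a and t registers.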


7. Debugging Array and Function Code with GDB

The lecture demonstrated GDB for stepping through array access, since a wrong offset or a load/store mix-up is invisible from the program's printed output alone. A minimal workflow:

# Build with debug info, then run under the RISC-V GDB.
gdb ./find_max

Inside GDB, the most useful commands for this material:

break find_max_s        # stop at the start of our function
run 1 2 99 3 4          # run with command-line args
info registers a0 a1    # inspect the argument registers
stepi                   # execute one machine instruction
x/4xw $a0               # examine 4 words, in hex, at the address in a0

The x (examine) command is the key tool for arrays. x/4xw $a0 reads "examine 4 values, format hex, size word, starting at the address held in a0" — letting you see the array contents directly in memory even when they never get printed. Use it after computing an offset to confirm t2 really points at the element you intended.


Key Concepts

| Concept | Definition | Example |
|---|---|---|
| Array | Contiguous block of same-type elements in memory | int arr[5]; → 20 bytes |
| Base address | Address of element 0; what an array name evaluates to | arr in a0 |
| Index vs. byte offset | Index counts elements; offset counts bytes | arr[3] → offset 3 * 4 = 12 |
| arr[i] ≡ *(arr + i) | Indexing is defined as scaled pointer arithmetic | arr[2] is *(arr + 2) |
| Pointer scaling | C multiplies + i by sizeof(element) automatically | int *: +1 adds 4 bytes |
| lw | Load word: memory → register | lw a0, (t2) reads arr[i] |
| sw | Store word: register → memory | sw a2, (t2) writes arr[i] |
| Offset + base | The address formula for element access | addr = arr + i*4 |
| Argument registers | First 8 args in a0-a7; return in a0 | a0 = arr, a1 = i |
| Leaf function | Function that calls no other function | arr_get_s, min_s |
| Caller-saved registers | a0-a7 and t0-t6, free to clobber in a leaf (ra is also caller-saved, but a leaf must leave it intact so ret works) | use t0-t3 freely |

Practice Problems

Problem 1: Index to Offset and Address

An int array's base address (in a0) is 0x1000. What byte offset and what absolute address correspond to arr[4]? Write the two RISC-V instructions that compute the address of arr[4] into t2.

Solution

Each int is 4 bytes, so the offset for index 4 is 4 * 4 = 16 bytes, and the address is 0x1000 + 16 = 0x1010.

slli t2, t1, 2      # t2 = i * 4 = 16   (assuming t1 = 4)
add  t2, a0, t2     # t2 = 0x1000 + 16 = 0x1010

Or with an explicit multiply:

li   t0, 4
mul  t2, t1, t0     # t2 = 4 * 4 = 16
add  t2, a0, t2     # t2 = 0x1010

Problem 2: Index Form ⇒ Pointer Form

Rewrite each statement using only pointer notation (* and pointer arithmetic), with no [].

arr[0] = 7;
x = arr[3];
arr[i] = arr[j];

Solution

*arr = 7;                 // arr[0]  == *(arr + 0) == *arr
x = *(arr + 3);           // arr[3]  == *(arr + 3)
*(arr + i) = *(arr + j);  // arr[i] = arr[j]

Each arr[k] is literally defined as *(arr + k). Doing this rewrite first makes the assembly translation almost mechanical.

Problem 3: Spot the Load/Store Bug

A student wants to store the value in a2 into arr[i], but the autograder reports the array is unchanged. Here is their code. Find and fix the bug.

li   t0, 4
mul  t1, a1, t0
add  t2, a0, t1
lw   a2, (t2)       # store x into arr[i]
ret

Solution

The instruction is lw (load), which copies memory into a2, the wrong direction. To store, use sw, which copies the register into memory:

li   t0, 4
mul  t1, a1, t0
add  t2, a0, t1
sw   a2, (t2)       # memory[t2] = a2   (store x into arr[i])
ret

Remember the direction: lw is memory → register, sw is register → memory. The register is always written first in the syntax, even when it is the source.

Problem 4: Translate get_second

Translate this leaf function to RISC-V assembly. It returns the element at index 1 of the array.

int get_second(int arr[]) {
    return arr[1];      // == *(arr + 1)
}

Solution

Index 1 has a byte offset of 1 * 4 = 4, so we can fold the offset directly into the lw instruction:

# a0 - int arr[]
.global get_second_s
get_second_s:
    lw   a0, 4(a0)      # a0 = memory[a0 + 4] = arr[1]
    ret

The form lw a0, 4(a0) uses the built-in immediate offset of the load instruction, so no separate add is needed. A longer but equally correct version:

get_second_s:
    addi t0, a0, 4      # t0 = arr + 4  (address of arr[1])
    lw   a0, (t0)       # a0 = arr[1]
    ret

Problem 5: Is It a Leaf?

For each function, state whether it is a leaf function and whether it needs to allocate a stack frame.

  • (a) min_s — compares a0 and a1 with blt and returns the smaller in a0.
  • (b) findmaxfc_s — loops over an array and calls max2_s to compare each pair.
  • (c) sum_array_s — loops over an array accumulating a sum, no calls.

Solution

  • (a) min_s: Leaf (no call). No stack frame needed; it uses only a/t registers, and ret works because ra is untouched.
  • (b) findmaxfc_s: Not a leaf, because it calls max2_s. It must allocate a stack frame and save ra before the call, because call overwrites ra. (This is Project 2 territory.)
  • (c) sum_array_s: Leaf (no call). No stack frame needed.

The deciding question is always: does this function execute a call instruction? If yes, it is non-leaf and must preserve ra.

Problem 6: Trace arr_get_s

Trace arr_get_s for arr = {10, 20, 30} at base address 0x2000 with i = 2. Show the value of each register after each instruction, and the final return value.

Solution

Start: a0 = 0x2000 (base), a1 = 2 (index).

| Instruction | Result |
|---|---|
| li t0, 4 | t0 = 4 |
| mul t1, a1, t0 | t1 = 2 * 4 = 8 (offset) |
| add t2, a0, t1 | t2 = 0x2000 + 8 = 0x2008 (address of arr[2]) |
| lw a0, (t2) | a0 = memory[0x2008] = 30 |
| ret | returns a0 = 30 |

The function returns 30, which is arr[2]. Note that a0 was overwritten: it started as the base address 0x2000 and ended as the loaded value 30. The caller must not rely on a0 still holding the base after the call.

Summary

  1. arr[i] is defined as *(arr + i) — array indexing and pointer arithmetic are the same operation, so rewriting in pointer form first makes assembly translation straightforward.

  2. C scales pointer arithmetic by element size automatically; assembly does not, so we compute the byte offset i * 4 ourselves for an int array.

  3. Array access is offset + base: address = arr + (i * 4). Compute the offset, add the base, then dereference.

  4. lw loads (memory → register) and sw stores (register → memory) — the register is written first in both, even when it is the source, so always check the direction.

  5. Arguments arrive in a0-a7 (eight or fewer) and the return value leaves in a0; because the return reuses a0, callers cannot assume a0 survives a call.

  6. A leaf function calls no other function, uses only caller-saved a/t registers, and therefore needs no stack frame — just compute and ret.

  7. Use a temporary register for the computed address (e.g. t2) so you can dereference it cleanly, and favor clear, correct code over premature optimization.

  8. GDB's stepi and x/4xw $a0 let you watch offsets and loads directly in memory, which is the fastest way to catch a wrong offset or a load/store mix-up.