← Back to Course
# RISC-V Assembly Part 3: Arrays and Functions ## CS 315 Computer Architecture --- ## Today's Topics - **Arrays in C** — index notation vs. pointer arithmetic - **Pointer scaling** — why assembly must do the multiply - **Array access in assembly** — the offset + base pattern - **`lw` vs. `sw`** — loading and storing array elements - **Loop over an array** — `sum_array` in two styles - **Leaf functions** — arguments, return values, no stack frame
Builds directly into
Lab 3
(
sum_array
,
find_max
) and
Project 2
.
--- ## C Operators: Address-of and Dereference Two operators you need to know: | Operator | Syntax | Meaning | |----------|--------|---------| | Address-of | `&x` | produces a pointer to `x` | | Dereference | `*p` | accesses the value at address `p` | - An array name `arr` **is already an address** — the address of element 0. - That single fact makes index notation and pointer arithmetic equivalent. --- ## Array Layout in Memory ```c int arr[5]; ``` Five `int` values, each 4 bytes, laid out **contiguously**: ```text address: arr+0 arr+4 arr+8 arr+12 arr+16 +-------+-------+-------+-------+-------+ element: |arr[0] |arr[1] |arr[2] |arr[3] |arr[4] | +-------+-------+-------+-------+-------+ index: 0 1 2 3 4 ```
Addresses jump by
4
each step (bytes), not by 1 (elements).
--- ## Index Notation = Pointer Notation The C standard **defines** `arr[i]` as `*(arr + i)`: | Index form | Pointer form | Meaning | |------------|--------------|---------| | `arr[0] = 99;` | `*arr = 99;` | store 99 at element 0 | | `arr[1] = 100;` | `*(arr + 1) = 100;` | store 100 at element 1 | | `x = arr[0];` | `x = *arr;` | load element 0 | | `y = arr[1];` | `y = *(arr + 1);` | load element 1 | **Tip:** rewrite in pointer form first — then assembly is almost mechanical. --- ## Pointer Arithmetic Scales by Element Size When you write `arr + i` in C, the compiler multiplies by `sizeof(int)`: ```text arr + 1 → address + 1 * 4 = address + 4 arr + 2 → address + 2 * 4 = address + 8 arr + i → address + i * 4 ```
C scales silently because it knows the type.
Assembly knows no types — we must do the multiply ourselves.
--- ## Scaling: C vs. Assembly
flowchart LR A["arr + i (C)"] --> B["compiler knows sizeof(int) = 4"] B --> C["address + i * 4"] C --> D["machine address"] style B fill:#fff3cd,stroke:#FDBB30,stroke-width:2px
In assembly: 1. Compute `offset = i * 4` 2. Compute `addr = arr + offset` 3. Dereference `addr` --- ## The Offset + Base Formula For any array element `arr[i]`: ```text target address = base address (arr) + offset offset = index * element_size = i * 4 (for int) ``` Stack picture for `int arr[3]`: ```text STACK +---------+ arr+8 → | arr[2] | index 2 +---------+ arr+4 → | arr[1] | index 1 +---------+ arr+0 → | arr[0] | index 0 +---------+ ``` --- ## `arr_get_s`: Load an Array Element C pointer form → assembly, step by step: ```text # a0 = arr (base address) a1 = i (index) ``` ```asm .global arr_get_s arr_get_s: li t0, 4 # t0 = 4 (bytes per int) mul t1, a1, t0 # t1 = i * 4 (byte offset) add t2, a0, t1 # t2 = arr + offset (address of arr[i]) lw a0, (t2) # a0 = *t2 = arr[i] ret ```
lw a0, (t2)
means: treat t2 as an address, load the word stored there.
--- ## Address Computation Flow
flowchart TD A["i (in a1)"] --> B["offset = i * 4"] C["arr base (in a0)"] --> D["addr = arr + offset"] B --> D D --> E["lw: value = memory at addr"] E --> F["result in a0"]
--- ## `lw` Syntax: Offset Form The full form of a load: ```text lw rd, offset(rbase) # rd = memory[rbase + offset] ``` - `lw a0, (t2)` is shorthand for `lw a0, 0(t2)` - When the offset is a known constant, fold it in directly: ```asm lw a0, 4(a0) # load arr[1] — offset 1*4=4 baked in ``` No separate `add` needed for fixed indices! --- ## `mul` vs. Shift Left Multiplying by 4 = shifting left by 2 bits (since 4 = 2²): ```asm # Using mul: li t0, 4 mul t1, a1, t0 # t1 = i * 4 # Using shift (slightly shorter): slli t1, a1, 2 # t1 = i << 2 = i * 4 ``` Both are correct. Prioritize **clear, correct code** over premature optimization. --- ## Loading vs. Storing: Direction Matters | Instruction | Form | Meaning | Direction | |-------------|------|---------|-----------| | `lw` | `lw rd, (raddr)` | `rd = memory[raddr]` | memory → register | | `sw` | `sw rsrc, (raddr)` | `memory[raddr] = rsrc` | register → memory |
Both write the
register operand first
in the syntax — even
sw
, where the register is the
source
. Always check the direction!
--- ## `arr_set_s`: Store an Array Element Write `arr[i] = x` (argument `x` arrives in `a2`): ```asm # a0 = arr (base address) # a1 = i (index) # a2 = x (value to store) .global arr_set_s arr_set_s: li t0, 4 # element size mul t1, a1, t0 # t1 = i * 4 (offset) add t2, a0, t1 # t2 = arr + offset sw a2, (t2) # memory[t2] = x ret ``` Only two differences from `arr_get_s`: `a2` carries the value, and `sw` replaces `lw`. --- ## Loop Over an Array: `sum_array` ```c int sum_array(int arr[], int n) { int sum = 0; for (int i = 0; i < n; i++) sum += arr[i]; return sum; } ``` Two translation styles: - **Version A** — index-based: recompute address each iteration - **Version B** — pointer-walking: advance pointer by 4 --- ## Version A: Index-Based Loop ```asm # a0 = arr, a1 = n sum_array_s: li t0, 0 # sum = 0 li t1, 0 # i = 0 loop: bge t1, a1, done # if i >= n, exit slli t2, t1, 2 # t2 = i * 4 (offset) add t2, a0, t2 # t2 = arr + offset lw t3, (t2) # t3 = arr[i] add t0, t0, t3 # sum += arr[i] addi t1, t1, 1 # i++ j loop done: mv a0, t0 # return sum ret ``` --- ## Version B: Pointer-Walking Loop Advance the base pointer by 4 each iteration instead of recomputing: ```asm # a0 = arr (used as moving pointer), a1 = n sum_array_s: li t0, 0 # sum = 0 li t1, 0 # i = 0 loop: bge t1, a1, done # if i >= n, exit lw t2, (a0) # t2 = *arr (current element) add t0, t0, t2 # sum += current element addi a0, a0, 4 # advance pointer (arr++) addi t1, t1, 1 # i++ j loop done: mv a0, t0 ret ``` `addi a0, a0, 4` is the assembly equivalent of `arr++`. --- ## Loop Control Flow
flowchart TD Start["sum = 0, i = 0"] --> Check{"i < n?"} Check -- "no" --> Done["return sum in a0"] Check -- "yes" --> Load["load arr[i]"] Load --> Add["sum += arr[i]"] Add --> Bump["i++, advance pointer"] Bump --> Check
--- ## Calling Convention: Arguments and Return RISC-V simple function convention: ```text Arguments: a0, a1, a2, ..., a7 (8 or fewer) Return value: a0 ``` - First argument → `a0`, second → `a1`, up through `a7` - **Return value overwrites `a0`** — callers cannot assume `a0` survives a call - Beyond 8 arguments: spill to stack (out of scope for leaf functions)
In
arr_get_s
:
a0
starts as the array base and ends as the loaded element.
--- ## What Is a Leaf Function? A **leaf function** is a function that **calls no other function**.
flowchart TD main["main"] --> foo["foo (non-leaf)"] foo --> arr_get["arr_get_s (leaf)"] foo --> min_s["min_s (leaf)"] style arr_get fill:#e8f5e9,stroke:#00543c style min_s fill:#e8f5e9,stroke:#00543c
Leaf nodes sit at the bottom of the call tree and make no further calls. --- ## Two Rules for a Leaf Function 1. **Use only `a` and `t` registers** - `a0`–`a7` (argument/return) and `t0`–`t6` (temporaries) - These are *caller-saved* — free to clobber 2. **No `call` instructions** - `call` overwrites `ra` (the return-address register) - A leaf never calls, so `ra` is never disturbed ```asm .global func_s func_s: # ... use only a* and t* registers ... # ... no 'call' instructions ... ret # ra was never touched, works perfectly ``` --- ## Why a Leaf Needs No Stack Frame Putting both rules together: | Reason | Consequence | |--------|-------------| | Never calls another function | `ra` is never overwritten | | Uses only caller-saved registers | Nothing for callee to preserve | **Result:** no `addi sp, sp, -N` prologue, no save/restore of `ra`.
Every Lab 3 function —
quadratic
,
min
,
sum_array
,
find_max
— is a leaf function.
--- ## Leaf vs. Non-Leaf: The Dividing Line | Property | Leaf | Non-Leaf | |----------|------|----------| | Calls other functions? | No | Yes | | Must save `ra`? | No | Yes | | Needs stack frame? | No | Yes | | Registers used | `a`, `t` only | also `s` registers |
Project 2
introduces non-leaf functions (e.g.,
findmaxfc
calls
max2_s
). Non-leaf functions
must
allocate a stack frame and save
ra
.
--- ## Trace: `arr_get_s` with `i = 2` Array `{10, 20, 30}` at base `0x2000`, index `i = 2`: | Instruction | Result | |-------------|--------| | `li t0, 4` | `t0 = 4` | | `mul t1, a1, t0` | `t1 = 2 * 4 = 8` (offset) | | `add t2, a0, t1` | `t2 = 0x2000 + 8 = 0x2008` | | `lw a0, (t2)` | `a0 = memory[0x2008] = 30` | | `ret` | returns `30` | Note: `a0` changed from base address `0x2000` to value `30`. --- ## Common Bug: Wrong `lw`/`sw` Direction A student wants to *store* `a2` into `arr[i]` but the array is unchanged: ```asm li t0, 4 mul t1, a1, t0 add t2, a0, t1 lw a2, (t2) # BUG: this LOADS into a2, not stores! ret ``` Fix: ```asm sw a2, (t2) # memory[t2] = a2 (store x into arr[i]) ``` `lw` = memory → register. `sw` = register → memory. --- ## Debugging Array Code with GDB ```bash gdb ./find_max ``` Key GDB commands for array work: ```text break find_max_s # stop at function entry run 1 2 99 3 4 # run with arguments info registers a0 a1 # inspect argument registers stepi # execute one instruction at a time x/4xw $a0 # examine 4 words in hex at address in a0 ``` `x/4xw $a0` reads: **4** values, format he**x**, size **w**ord — lets you see the array in memory directly. --- ## Key Concepts Table | Concept | Key Point | |---------|-----------| | `arr[i]` ≡ `*(arr+i)` | Indexing is scaled pointer arithmetic | | Byte offset | `i * sizeof(element)` = `i * 4` for `int` | | Offset + base | `addr = arr + i*4` | | `lw` | memory → register | | `sw` | register → memory | | Argument registers | `a0`–`a7`; return in `a0` | | Leaf function | No `call`, no stack frame needed | --- ## Summary 1. **`arr[i]` is defined as `*(arr + i)`** — rewrite in pointer form first to simplify assembly translation. 2. **C scales pointer arithmetic automatically**; we must compute `i * 4` ourselves in assembly. 3. **Array access = offset + base**: `addr = arr + (i * 4)`, then dereference. 4. **`lw` loads (memory → register); `sw` stores (register → memory)** — the register is written first in both. 5. **Arguments in `a0`–`a7`, return value in `a0`** — `a0` is overwritten by the return value. 6. **A leaf function calls nothing**, uses only `a`/`t` registers, and needs **no stack frame**.