RISC-V Assembly Part 2: Arguments, Arrays, Conditionals, and Loops¶
Overview¶
This lecture builds on our first pass at RISC-V assembly by organizing the instruction set into three categories — data processing, control, and memory — and then using them to express the programming constructs we already know from C. We learn how arrays live in memory and how to reach individual elements with load instructions, how to pass arrays and scalars as function arguments, and how to translate if/else statements and for/while loops into the conditional branches and unconditional jumps that the processor actually executes. These patterns are the core of Lab 3 and Project 2.
Learning Objectives¶
- Classify RISC-V instructions into data processing, control, and memory categories
- Explain how an array is laid out in memory and how a base address plus an offset reaches an element
- Use the load word (`lw`) instruction with the `offset(base)` addressing syntax to read array elements
- Pass scalar values and array base addresses as function arguments in the `a` registers and return a result in `a0`
- Translate a C `if/then/else` statement into assembly using a conditional branch, a label, and an unconditional jump
- Translate C `for` and `while` loops into assembly using a loop label, a guard branch, and a back-edge jump
- Choose the correct branch instruction (`beq`, `bne`, `blt`, `bge`, `ble`, `bgt`) for a given comparison
- Trace assembly execution by hand and in a debugger to verify a computed result
Prerequisites¶
- RISC-V Assembly Part 1: registers (`x0`–`x31` and ABI names), the fetch–decode–execute cycle, the program counter, and basic data-processing instructions
- C fundamentals: arrays, pointers, `if/else`, `for/while`, and function arguments and return values (Project 1)
- Comfort reading three-operand instructions of the form `op dst, src1, src2`
- Familiarity with the dev environment (RISC-V VM, `gcc`, `as`, `gdb`)
1. The Three Categories of Instructions¶
Every RISC-V instruction you will use in this course falls into one of three families. Keeping these categories in mind makes a long instruction list feel small, because each new mnemonic is just a variation on a category you already understand.
| Category | Purpose | Examples |
|---|---|---|
| Data processing | Compute a new value from register values | add, addi, sub, mul, div, slli, and, or |
| Control | Change which instruction runs next | j, beq, bne, blt, bge, ble, ret |
| Memory | Move data between registers and memory | lw, sw, ld, sd, lb, sb |
The processor can only compute on values that live in registers. Data, however, lives in memory. So a typical computation has a recurring rhythm:
flowchart LR
A[Memory] -->|load| B[Registers]
B -->|data processing| B
B -->|store| A
style B fill:#f9f,stroke:#333,stroke-width:2px
- Load values from memory into registers (memory instruction).
- Compute a new value from those registers (data processing instruction).
- Store the result back to memory if needed (memory instruction).
- Control instructions decide whether to repeat, branch, or move on.
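The rhythm above can be mirrored in C, where each statement lines up with one instruction category. This is a minimal sketch: the function name `increment_in_place` is hypothetical and not part of the lab code, and the assembly in the comments is only an approximate correspondence.

```c
#include <stdint.h>

/* Hypothetical helper: adds 1 to the 32-bit int stored at address p.
 * Each statement mirrors one step of the load -> compute -> store rhythm. */
void increment_in_place(int32_t *p) {
    int32_t t0 = *p;   /* load:    memory -> register   (like lw t0, (a0))     */
    t0 = t0 + 1;       /* compute: register -> register (like addi t0, t0, 1)  */
    *p = t0;           /* store:   register -> memory   (like sw t0, (a0))     */
}
```

Calling it on a variable that holds 41 leaves 42 in memory; the value only changes while it sits in the local `t0`, which plays the role of a register.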
Today we focus on the memory category (specifically array access) and the control category (conditionals and loops), tying everything together with the arguments and return value conventions that let one piece of code call another.
2. Arrays in Memory¶
What an Array Is¶
In C, an array is a contiguous block of memory holding elements of the same type. Consider the declaration `int arr[3] = {1, 2, 3};`.
This reserves room for three int values. On our 64-bit RISC-V machine an int is 4 bytes (a word), so arr occupies 12 bytes laid out one element after another. Memory is byte-addressable: every byte has its own address, and a 4-byte word spans four consecutive addresses.
The name arr evaluates to the address of the first element — its base address. That is the key idea that connects C and assembly: an array argument is really just a pointer (an address) to the start of the data.
Processor Memory
+--------------------+ +-----------+
| Registers | | ... |
| +--------------+ | arr + 8 -> | 3 | arr[2]
| | a0 (base) |--+-------+ +-----------+
| | a1 | | | arr + 4 -> | 2 | arr[1]
| | a2 | | | +-----------+
| +--------------+ | +-> arr -> | 1 | arr[0]
+--------------------+ +-----------+
The instructor's diagram showed exactly this: the base address arr is held in a register (a0), and following that address into memory reaches the first element, 1. Adding 4 reaches 2, adding 8 reaches 3.
Element Addresses¶
To find the address of element i in an int array, multiply the index by the element size and add it to the base:
| Element | Index i | Byte offset (i * 4) | Address |
|---|---|---|---|
| `arr[0]` | 0 | 0 | base + 0 |
| `arr[1]` | 1 | 4 | base + 4 |
| `arr[2]` | 2 | 8 | base + 8 |
| `arr[3]` | 3 | 12 | base + 12 |
Because int is 4 bytes, consecutive elements are 4 bytes apart. For an array of long or pointers (8 bytes each) the stride would be 8.
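The address arithmetic in the table can be checked in C by doing the pointer math by hand on byte addresses. This is a sketch under the assumption that elements are 32-bit `int32_t` values; the helper names `byte_offset` and `element_address` are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Byte offset of element i for a given element size (the "stride"). */
size_t byte_offset(size_t i, size_t elem_size) {
    return i * elem_size;
}

/* Address of arr[i] computed by hand: base + i * 4 for a 4-byte element.
 * Casting to unsigned char * makes the addition count in bytes. */
const int32_t *element_address(const int32_t *base, size_t i) {
    const unsigned char *bytes = (const unsigned char *)base;
    return (const int32_t *)(bytes + byte_offset(i, sizeof(int32_t)));
}
```

For any in-bounds index, `element_address(arr, i)` is the same pointer that C produces for `&arr[i]`; the compiler normally hides the multiply by the element size, while assembly makes it explicit.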
3. Memory Instructions: Load and Store¶
Memory instructions are the only way to move data between registers and memory. They come in matched pairs: load (memory → register) and store (register → memory).
Load Word¶
The instruction taught in class is lw, load word, which reads a 32-bit value from memory into a register:
lw t0, (a0) # t0 = *a0 -- load the word at address a0 into t0
^ ^
| +-- addr: register holding the memory address
+------- dest: register that receives the loaded value
Reading the operands aloud: "load the word (32-bit value) found at the address in a0 into the destination register t0." In C this is exactly a pointer dereference, t0 = *a0;.
The parentheses mean "use the value in this register as a memory address." This is register-indirect addressing — a0 is not the data, it is a pointer to the data.
The offset(base) Addressing Syntax¶
You can add a constant offset to the base address directly in the instruction:
lw t0, 0(a0) # t0 = word at address a0 + 0 -> arr[0]
lw t1, 4(a0) # t1 = word at address a0 + 4 -> arr[1]
lw t2, 8(a0) # t2 = word at address a0 + 8 -> arr[2]
The effective address is computed as base + offset. Writing (a0) is shorthand for 0(a0). This makes reading the first few fixed elements of an array very clean.
Load vs Store Operand Order¶
Stores write a register value back to memory. The store-word instruction is sw. Watch the operand order — it is a frequent source of bugs:
# Load: the DESTINATION register comes first
lw t0, (a0) # t0 = memory[a0] (t0 receives)
# Store: the SOURCE register comes first
sw t0, (a0) # memory[a0] = t0 (t0 provides)
Loads follow the usual "destination first" convention (like add rd, rs1, rs2). Stores put the value being written first and the address second.
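To make the effective-address rule and the operand order concrete, here is a small C model of `lw` and `sw` acting on a byte array. It is only a sketch: `load_word` and `store_word` are hypothetical names, memory is modeled as a plain `uint8_t` buffer, and the sign extension of the loaded word to 64 bits is standard RV64 behavior that goes slightly beyond what the lecture states.

```c
#include <stdint.h>
#include <string.h>

/* Models lw rd, offset(base): read 4 bytes at address base + offset.
 * On RV64 the 32-bit value is sign-extended to fill the 64-bit register. */
int64_t load_word(const uint8_t *base, int64_t offset) {
    int32_t word;
    memcpy(&word, base + offset, sizeof word);
    return (int64_t)word;
}

/* Models sw rs2, offset(base): write the low 32 bits of value to
 * address base + offset. The value comes first, as in the sw operand order. */
void store_word(int64_t value, uint8_t *base, int64_t offset) {
    uint32_t low = (uint32_t)value;
    memcpy(base + offset, &low, sizeof low);
}
```

Storing 1, 2, and -3 at offsets 0, 4, and 8 and loading them back returns the same three values, including the negative one, which shows why the load must sign-extend.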
Sizes¶
Because the machine is 64-bit, there are wider memory instructions too. lw/sw move 32-bit words (a C int); ld/sd move 64-bit doublewords (a C long or a pointer). For Lab 3 the arrays are int arrays, so lw is the instruction you need.
| Instruction | Bytes | C type | Operation |
|---|---|---|---|
| `lw rd, off(rs)` | 4 | `int` | load 32-bit word |
| `sw rs2, off(rs1)` | 4 | `int` | store 32-bit word |
| `ld rd, off(rs)` | 8 | `long`, pointer | load 64-bit doubleword |
| `sd rs2, off(rs1)` | 8 | `long`, pointer | store 64-bit doubleword |
4. Passing Arrays and Scalars as Arguments¶
The Argument and Return Convention¶
RISC-V uses registers to pass arguments and return results:
- Arguments go into `a0`, `a1`, `a2`, ..., `a7`, in order.
- The return value comes back in `a0`.
For Lab 3 we restrict ourselves to the a registers (arguments and return) and the t registers (temporaries) — no stack management is required because these functions do not call other functions. (Saving registers across calls is the topic of the next session.)
A scalar argument is just placed in a register. An array argument is passed as its base address — a single pointer — not by copying the whole array.
// C: arr decays to a pointer to its first element
int sum3(int *arr) {
return arr[0] + arr[1] + arr[2];
}
.global sum3
# int sum3(int *arr)
# a0 = arr (base address of the array)
sum3:
lw t0, 0(a0) # t0 = arr[0]
lw t1, 4(a0) # t1 = arr[1]
lw t2, 8(a0) # t2 = arr[2]
add t0, t0, t1 # t0 = arr[0] + arr[1]
add a0, t0, t2 # a0 = t0 + arr[2] (return value in a0)
ret
Two things to notice:
- We loaded each element with a fixed offset because the indices (0, 1, 2) are known at write time.
- The final result is placed in `a0` before `ret`, because `a0` is the return register.
Two Ways to Add Three Numbers¶
In class we contrasted two versions of the same "add three numbers" task to highlight the difference between scalar and memory arguments.
Version A — three scalar arguments. Each value arrives in its own register, so no memory access is needed:
.global add3
# int add3(int a, int b, int c)
# a0 = a, a1 = b, a2 = c
add3:
add a0, a0, a1 # a0 = a + b
add a0, a0, a2 # a0 = (a + b) + c
ret
Version B — one array argument. Only the base address arrives; we must lw each element out of memory (the sum3 example above). Same result, but now the memory category does the heavy lifting. The point: how data is passed determines which instructions you reach for.
Indexed Array Access (Variable Index)¶
When the index is not a constant — for example inside a loop — you compute the byte offset at run time. Assume a0 holds the base address and t0 holds the index i:
# t2 = arr[i] where a0 = base, t0 = i
li t1, 4 # element size = 4 bytes
mul t1, t0, t1 # t1 = i * 4
add t1, a0, t1 # t1 = base + i*4 = &arr[i]
lw t2, (t1) # t2 = arr[i]
A faster, idiomatic variant replaces the multiply with a shift left, since multiplying by 4 is the same as shifting left by 2 (4 == 2^2):
# t2 = arr[i] using a shift instead of a multiply
slli t1, t0, 2 # t1 = i * 4 (shift left logical by 2)
add t1, a0, t1 # t1 = &arr[i]
lw t2, (t1) # t2 = arr[i]
| `slli` shift | Multiplies by | Element size |
|---|---|---|
| `slli x, y, 1` | 2 | 2-byte (`short`) |
| `slli x, y, 2` | 4 | 4-byte (`int`) |
| `slli x, y, 3` | 8 | 8-byte (`long`, pointer) |
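The shift-based indexing sequence can be written out step by step in C. This is a sketch that assumes 4-byte `int32_t` elements; `load_indexed` is a hypothetical name, and the comments show the assembly instruction each line stands in for.

```c
#include <stddef.h>
#include <stdint.h>

/* Reads arr[i] by forming the byte offset with a shift instead of a multiply. */
int32_t load_indexed(const int32_t *base, size_t i) {
    size_t offset = i << 2;                                             /* slli t1, t0, 2 : i * 4   */
    const unsigned char *addr = (const unsigned char *)base + offset;   /* add  t1, a0, t1 : &arr[i] */
    return *(const int32_t *)addr;                                      /* lw   t2, (t1)   : arr[i]  */
}
```

For an 8-byte element type the only change would be the shift amount (3 instead of 2), matching the last row of the table.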
5. Control Statements: Branches and Jumps¶
Two Kinds of Control Transfer¶
C control structures are built from just two assembly primitives:
- Unconditional jump (`j label`): always transfers control to `label`.
- Conditional branch (`b__ rs1, rs2, label`): transfers control to `label` only if a comparison of `rs1` and `rs2` holds; otherwise execution falls through to the next instruction.
A label names a location in the code so that a jump or branch has somewhere to go.
main:
add t0, zero, zero # t0 = 0
j next # unconditional jump -- skip the two adds
add ... # (skipped)
add ... # (skipped)
next:
addi t0, zero, 1 # execution resumes here
The red arrow in the instructor's notes traced the jump from j next down to the next: label, skipping the instructions in between. That is the whole idea of a jump: it rewrites the program counter so the next instruction fetched is the one at the label, not the one physically below.
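One way to picture what jumps and branches do to the program counter is a tiny C model of "which address is fetched next". This is a sketch with hypothetical function names, and it assumes every instruction is 4 bytes long (true for the base instruction set, not for compressed instructions).

```c
#include <stdbool.h>
#include <stdint.h>

#define INSN_BYTES 4  /* assumption: uncompressed 32-bit instructions */

/* j label: the next pc is always the address of the label. */
uint64_t next_pc_jump(uint64_t target) {
    return target;
}

/* b__ rs1, rs2, label: go to the label if the comparison held,
 * otherwise fall through to the instruction physically below. */
uint64_t next_pc_branch(uint64_t pc, bool taken, uint64_t target) {
    return taken ? target : pc + INSN_BYTES;
}
```

With a branch at address `0x1000` targeting `0x2000`, the model yields `0x2000` when the branch is taken and `0x1004` when it is not.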
The Branch Instructions¶
A conditional branch compares two registers and branches if the condition is true. The mnemonic encodes the comparison:
| Instruction | Branch taken when | Comparison |
|---|---|---|
| `beq rs1, rs2, label` | `rs1 == rs2` | equal |
| `bne rs1, rs2, label` | `rs1 != rs2` | not equal |
| `blt rs1, rs2, label` | `rs1 < rs2` | less than (signed) |
| `bge rs1, rs2, label` | `rs1 >= rs2` | greater or equal (signed) |
| `ble rs1, rs2, label` | `rs1 <= rs2` | less than or equal (signed) |
| `bgt rs1, rs2, label` | `rs1 > rs2` | greater than (signed) |
In class we read ble aloud as "branch on less than or equal": ble a0, a1, else means "if a0 <= a1, go to else."
Signed comparisons and negative numbers
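blt, bge, ble, and bgt interpret their operands as signed two's-complement integers, so they handle negative values correctly. There are unsigned variants (bltu, bgeu) for when bit patterns should be compared as unsigned magnitudes. For Lab 3's signed int data, the signed branches are what you want. Note that ble and bgt are assembler pseudo-instructions: the assembler emits bge or blt with the two operands swapped, so a disassembly may show those instead.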
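The difference between the signed and unsigned branches shows up as soon as one operand is negative. The following C sketch models the comparison each branch performs on 64-bit register values; `blt_taken` and `bltu_taken` are hypothetical names used only for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* blt: compare the registers as signed two's-complement values. */
bool blt_taken(int64_t rs1, int64_t rs2) {
    return rs1 < rs2;
}

/* bltu: compare the same 64 bits as unsigned magnitudes. */
bool bltu_taken(int64_t rs1, int64_t rs2) {
    return (uint64_t)rs1 < (uint64_t)rs2;
}
```

With `rs1 = -1` and `rs2 = 1`, the signed comparison is taken, but the unsigned one is not, because the bit pattern of -1 is the largest possible unsigned value.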
6. Translating If/Then/Else¶
The Pattern¶
The instructor worked this example on the board. In C: `if (val > 0) { r = 1; } else { r = 0; }`
The assembly assumes a0 holds val and t1 holds r:
# a0 - int val
# t1 - int r
ble a0, zero, else # if val <= 0, go to the else branch
addi t1, zero, 1 # then-branch: r = 1
j done # skip over the else-branch
else:
addi t1, zero, 0 # else-branch: r = 0
done:
The Key Trick: Branch on the Opposite Condition¶
The C code says "if val > 0, do the then-block." The assembly branches on the opposite condition: ble a0, zero, else jumps away to the else-block when val <= 0. When the condition is true we simply fall through into the then-block.
This inversion is the heart of compiling if. We arrange the code so that:
- The branch skips the then-block when the condition is false.
- The then-block ends with an unconditional `j done` so it does not fall into the else-block.
- The else-block sits between the `else:` and `done:` labels.
| C condition (do then-block when true) | Branch to else when false |
|---|---|
| `val > 0` | `ble a0, zero, else` |
| `val >= 0` | `blt a0, zero, else` |
| `a == b` | `bne a0, a1, else` |
| `a != b` | `beq a0, a1, else` |
| `a < b` | `bge a0, a1, else` |
| `a <= b` | `bgt a0, a1, else` |
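The inversion is easiest to see by rewriting the C `if/else` with `goto`, so that the C has the same shape as the assembly. This is an illustrative sketch: `sign_flag` is a hypothetical function name, and `else_block` is used as the label because `else` is a reserved word in C.

```c
/* if (val > 0) r = 1; else r = 0;  written the way the assembly runs. */
int sign_flag(int val) {
    int r;
    if (val <= 0) goto else_block;  /* ble a0, zero, else  (inverted test) */
    r = 1;                          /* then-block: addi t1, zero, 1        */
    goto done;                      /* j done                              */
else_block:
    r = 0;                          /* else-block: addi t1, zero, 0        */
done:
    return r;
}
```

The function returns 1 for positive inputs and 0 for zero or negative inputs, the same result as the structured `if/else`, even though the test in the code is `val <= 0` rather than `val > 0`.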
Control Flow Diagram¶
flowchart TD
A["Evaluate condition (val > 0?)"] -->|"false: ble taken"| E["else block: r = 0"]
A -->|"true: fall through"| T["then block: r = 1"]
T --> J["j done"]
J --> D["done:"]
E --> D
style T fill:#9f9,stroke:#333
style E fill:#ff9,stroke:#333
A Second If Example¶
The notes also sketched a simple equality test:
With a0 holding x and t0 holding y:
There is no else here, so there is no j — when the body finishes it simply falls through to done:. The branch bne carries the inverted condition: we skip the body whenever x != 0.
7. Translating Loops¶
The Loop Pattern¶
A loop is just an if whose body ends by jumping back up to re-test the condition. The instructor built the canonical pattern from a guard branch plus a back-edge jump.
The C code, a function that sums 0 + 1 + ... + (n-1):
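A C version consistent with the register comments and the clause-by-clause mapping in this section would look like the sketch below. It is reconstructed from the assembly, so details such as where `i` is declared are assumptions.

```c
/* Sums 0 + 1 + ... + (n - 1); returns 0 when n <= 0. */
int loopsum(int n) {
    int sum = 0;
    for (int i = 0; i < n; i++) {
        sum = sum + i;
    }
    return sum;
}
```

For example, `loopsum(4)` adds 0, 1, 2, and 3 and returns 6.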
The assembly, with a0 holding n (the argument), t0 holding i, and t1 holding sum:
.global loopsum
# int loopsum(int n)
# a0 - int n
# t0 - int i
# t1 - int sum
loopsum:
li t0, 0 # i = 0
li t1, 0 # sum = 0
loop:
bge t0, a0, done # if i >= n, exit the loop
add t1, t1, t0 # sum = sum + i
addi t0, t0, 1 # i = i + 1
j loop # jump back to re-test the condition
done:
mv a0, t1 # move sum into a0 (return value)
ret
Anatomy of the Loop¶
initialize counters <- li t0, 0 / li t1, 0
loop:
GUARD: bge t0, a0, done <- exit when i >= n (inverted condition)
body: sum = sum + i <- the work
update: i = i + 1 <- advance the counter
back-edge: j loop <- repeat
done:
return sum
Mapping the C for clauses to assembly:
| `for` clause | C | Assembly |
|---|---|---|
| Initialization | `i = 0` | `li t0, 0` (before `loop:`) |
| Condition | `i < n` | `bge t0, a0, done` (inverted: exit when `i >= n`) |
| Body | `sum = sum + i` | `add t1, t1, t0` |
| Update | `i++` | `addi t0, t0, 1` |
| Repeat | (implicit) | `j loop` |
Just as with if, the guard branches on the opposite of the loop condition: the C loop continues while i < n, so the assembly exits when i >= n (bge).
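Writing the same loop in C with `goto` makes the guard and the back-edge visible as separate statements. This is a sketch for illustration (the name `loopsum_goto` is hypothetical); each line is annotated with the assembly instruction it corresponds to.

```c
/* Same computation as the for loop, arranged like the assembly. */
int loopsum_goto(int n) {
    int i = 0;                /* li   t0, 0         */
    int sum = 0;              /* li   t1, 0         */
loop:
    if (i >= n) goto done;    /* bge  t0, a0, done  (guard, inverted) */
    sum = sum + i;            /* add  t1, t1, t0    (body)            */
    i = i + 1;                /* addi t0, t0, 1     (update)          */
    goto loop;                /* j    loop          (back-edge)       */
done:
    return sum;               /* mv a0, t1 ; ret    */
}
```

Because the guard sits at the top, the body runs zero times when `n` is 0 or negative, which matches the behavior of a C `for` or `while` loop.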
Loop Control Flow Diagram¶
flowchart TD
Init["i = 0, sum = 0"] --> Guard{"i < n?"}
Guard -->|"yes"| Body["sum = sum + i"]
Body --> Update["i = i + 1"]
Update --> Guard
Guard -->|"no"| Done["return sum"]
style Body fill:#9f9,stroke:#333
style Done fill:#9ff,stroke:#333
mv and li Are Pseudo-Instructions¶
The epilogue used mv a0, t1 to copy sum into the return register, and li to load constants. These are convenience pseudo-instructions the assembler expands into real instructions:
- `li t0, 0` becomes `addi t0, zero, 0`
- `mv a0, t1` becomes `addi a0, t1, 0` (or `add a0, t1, zero`)
- `j loop` becomes `jal zero, loop`
They make code far more readable while still mapping to the base instruction set.
Verifying in the Debugger¶
In class we ran the loop under gdb, single-stepping to watch t0 (i) and t1 (sum) change each iteration. For loopsum(4) the trace is:
| Iteration | `t0` (`i`) before guard | Guard `i >= 4`? | `t1` (`sum`) after body |
|---|---|---|---|
| 1 | 0 | no | 0 + 0 = 0 |
| 2 | 1 | no | 0 + 1 = 1 |
| 3 | 2 | no | 1 + 2 = 3 |
| 4 | 3 | no | 3 + 3 = 6 |
| exit | 4 | yes → done | 6 (returned in a0) |
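A hand trace like the one in the table can be cross-checked with a few lines of C that record `sum` after each pass through the body. This is a sketch only: `loopsum_trace`, its output buffer, and the `cap` parameter are hypothetical and not part of the lab.

```c
#include <stddef.h>

/* Runs the loopsum loop and records sum after each pass through the body.
 * Writes at most cap entries into sums and returns how many were written. */
size_t loopsum_trace(int n, int *sums, size_t cap) {
    size_t count = 0;
    int sum = 0;
    for (int i = 0; i < n && count < cap; i++) {
        sum += i;               /* body: sum = sum + i        */
        sums[count++] = sum;    /* record the value of t1     */
    }
    return count;
}
```

For `n = 4` it records four entries, 0, 1, 3, and 6, which are the values in the last column of the table.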
A handy gdb workflow:
# Build with debug info, then step through the function
riscv64-linux-gnu-gcc -g -static -o loopsum main.c loopsum_s.s
gdb ./loopsum
(gdb) break loopsum
(gdb) run 4
(gdb) stepi # step one instruction at a time
(gdb) info registers t0 t1 a0 # inspect i, sum, return value
8. Putting It Together: Find the Maximum¶
The Lab 3 / Project 2 findmax problem combines everything in this lecture: an array argument, indexed memory access in a loop, and a conditional inside the body. Here is the array-traversal version (a single function, no calls).
int findmax(int *arr, int n) {
int max = arr[0];
int i;
for (i = 1; i < n; i++) {
if (arr[i] > max) {
max = arr[i];
}
}
return max;
}
.global findmax
# int findmax(int *arr, int n)
# a0 - int *arr (base address)
# a1 - int n
# t0 - int i
# t1 - int max
# t2 - &arr[i] / arr[i]
findmax:
lw t1, (a0) # max = arr[0]
li t0, 1 # i = 1
loop:
bge t0, a1, done # if i >= n, exit
slli t2, t0, 2 # t2 = i * 4
add t2, a0, t2 # t2 = &arr[i]
lw t2, (t2) # t2 = arr[i]
ble t2, t1, skip # if arr[i] <= max, skip update
mv t1, t2 # max = arr[i]
skip:
addi t0, t0, 1 # i++
j loop
done:
mv a0, t1 # return max
ret
Notice the two inverted conditions working together: the loop guard exits when i >= n (bge), and the inner if skips the update when arr[i] <= max (ble). The Project 2 findmaxfc variant performs the same comparison by calling a max2_s helper instead of branching inline — which requires the stack discipline covered next session.
Key Concepts¶
| Concept | Definition | Example |
|---|---|---|
| Data processing instruction | Computes a value from registers | `add a0, a1, a2` |
| Control instruction | Changes which instruction runs next | `j loop`, `ble a0, a1, else` |
| Memory instruction | Moves data between registers and memory | `lw t0, (a0)`, `sw t0, 4(a0)` |
| Base address | Address of an array's first element | `arr` in `int arr[3]` |
| `offset(base)` syntax | Effective address = base + offset | `lw t1, 4(a0)` reads base + 4 |
| Load word (`lw`) | Read a 32-bit value from memory into a register | `lw t0, (a0)` is `t0 = *a0` |
| Argument registers | `a0`–`a7`, hold function inputs | `a0 = arr`, `a1 = n` |
| Return register | `a0`, holds the function result | `mv a0, t1` before `ret` |
| Label | A name for a code location | `loop:`, `done:` |
| Unconditional jump | Always transfers control | `j loop` |
| Conditional branch | Transfers control if a comparison holds | `bge t0, a0, done` |
| Condition inversion | Branch on the opposite condition to skip a block | `if (x>0)` → `ble x, zero, else` |
| `slli` for indexing | Shift left to multiply index by element size | `slli t1, t0, 2` is `t0 * 4` |
| Pseudo-instruction | Assembler convenience expanded to real ops | `mv`, `li`, `j` |
Practice Problems¶
Problem 1: Read the Element at Index 3¶
The base address of an int array is in a0. Write the single instruction that loads arr[3] into t0 using a constant offset, and explain the offset you chose.
Solution: `lw t0, 12(a0)`. Each `int` is 4 bytes, so element 3 is at byte offset `3 * 4 = 12` from the base. The `offset(base)` syntax computes the effective address `a0 + 12` and loads the 32-bit word there.
Problem 2: Indexed Access with a Variable¶
The base address is in a0 and the index i is in a1. Write assembly that loads arr[i] into t0 using a shift (not a multiply).
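Solution: one possible sequence, using `t1` as a scratch register (an arbitrary choice), is `slli t1, a1, 2`, then `add t1, a0, t1`, then `lw t0, (t1)`. Shifting left by 2 multiplies by `2^2 = 4`, the size of an `int`. Adding that byte offset to the base produces the element address, and `lw` dereferences it.
Problem 3: Translate an If/Else¶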
Translate this C code to RISC-V assembly. Assume a0 holds x and t0 holds r.
Solution: The C `if` runs the then-block when `x < 10`, so the assembly branches to `else` on the opposite condition, `x >= 10` (`bge`). Because `bge` compares two registers, we first load the constant `10` into `t1`. The then-block ends with `j done` so it does not fall through into the else-block.
Problem 4: Trace a Loop¶
Hand-trace the following for the call loopsum(3). Show the value of i and sum at the end of each iteration and the final returned value.
loopsum:
li t0, 0 # i
li t1, 0 # sum
loop:
bge t0, a0, done # a0 = n
add t1, t1, t0
addi t0, t0, 1
j loop
done:
mv a0, t1
ret
Solution: With `a0 = n = 3`:
| Iteration | `i` at guard | `i >= 3`? | `sum` after body | `i` after update |
|---|---|---|---|---|
| 1 | 0 | no | 0 + 0 = 0 | 1 |
| 2 | 1 | no | 0 + 1 = 1 | 2 |
| 3 | 2 | no | 1 + 2 = 3 | 3 |
| exit | 3 | yes → done | 3 | n/a |
The returned value in `a0` is **3** (which is `0 + 1 + 2`). The body never runs for `i = 3` because the guard `bge t0, a0, done` exits first.
Problem 5: Sum an Array¶
Write a complete leaf function sumarr(int *arr, int n) that returns the sum of all n elements. Use a0 for the base, a1 for n, and the t registers for locals.
Solution:
.global sumarr
# int sumarr(int *arr, int n)
# a0 - int *arr, a1 - int n
# t0 - i, t1 - sum, t2 - &arr[i] / arr[i]
sumarr:
li t0, 0 # i = 0
li t1, 0 # sum = 0
loop:
bge t0, a1, done # if i >= n, exit
slli t2, t0, 2 # t2 = i * 4
add t2, a0, t2 # t2 = &arr[i]
lw t2, (t2) # t2 = arr[i]
add t1, t1, t2 # sum += arr[i]
addi t0, t0, 1 # i++
j loop
done:
mv a0, t1 # return sum
ret
Problem 6: Why Invert the Condition?¶
A student writes the following for if (a0 > 0) { t0 = 1; }. Explain what is wrong and fix it.
Solution: The branch jumps to `body` only when `a0 > 0`, but `body` is the very next instruction, so the branch does nothing useful. Worse, when `a0 <= 0` execution **falls through** into `body` anyway and still sets `t0 = 1`. The condition was not inverted. To compile `if`, branch on the **opposite** condition to *skip* the body, for example with `ble a0, zero, done` ahead of the body and a `done:` label after it (the label name is illustrative). Now the body runs only when `a0 > 0`, exactly matching the C `if`.
Further Reading¶
- RISC-V ISA Specification — the official standard
- RISC-V Assembly Programmer's Manual — practical reference for instructions and pseudo-instructions
- The RISC-V Reader — Patterson and Waterman, a concise textbook
- RISC-V references and cheat sheet: /guides/riscv/
- Course key concepts index: /guides/key-concepts/
- Project 2 (RISC-V Assembly Language): /assignments/project02/
- Source lecture notes (PDF): "/notes/CS315-01 2025-09-04 RISC-V Assembly 2.pdf"
Summary¶
- Every instruction fits one of three categories: data processing (compute), control (change the next instruction), and memory (move data between registers and memory). Computation follows a load → compute → store rhythm.
- Arrays are contiguous memory, and an array argument is passed as its base address. Element `i` of an `int` array lives at `base + i * 4`, because `int` is 4 bytes and memory is byte-addressable.
- `lw` loads a word from memory using the `offset(base)` syntax: `lw t0, 4(a0)` reads the 32-bit value at `a0 + 4`. Loads name the destination first; stores (`sw`) name the source first.
- Arguments arrive in `a0`–`a7` and results return in `a0`. For Lab 3, code uses only `a` and `t` registers because the functions do not call other functions, so no stack management is needed.
- Control flow is built from unconditional jumps (`j`) and conditional branches (`beq`, `bne`, `blt`, `bge`, `ble`, `bgt`), with labels naming jump targets.
- Compiling `if/else` relies on condition inversion: branch on the opposite of the C condition to skip the then-block, end the then-block with `j done`, and place the else-block before `done:`.
- Loops are an `if` with a back-edge: initialize, test the inverted condition with a guard branch, run the body, update the counter, and `j` back to the guard label.
- Verify by tracing, by hand with a register table or in `gdb` with `stepi`, to confirm the loop counter, accumulator, and return value evolve as expected.