RISC-V Assembly Part 8: Linked Lists¶
Overview¶
This lecture connects three closely related ideas: the full family of RISC-V load/store instructions (including the difference between signed and unsigned loads), how C struct values are laid out in memory with offsets and padding, and how those two ideas combine to implement and traverse linked lists in assembly. A linked list is just a chain of structs where each node holds a value and a pointer to the next node. Once you understand struct field offsets and the load instructions, traversing a list in assembly is straightforward: load the value, load the next pointer, repeat until the pointer is NULL. This material is the direct foundation for the Project 3 findmaxll and findmaxllp problems.
Learning Objectives¶
- Identify every RISC-V load and store variant (
ld/sd,lw/sw,lh/sh,lb/sb) and the size each one moves - Explain sign extension and choose between signed (
lb,lh,lw) and unsigned (lbu,lhu,lwu) loads based on a field's declared C type - Explain why stores have no unsigned variant
- Compute the byte offset of each field in a C struct, accounting for alignment and padding
- Translate struct field accesses in C into RISC-V load/store instructions using offsets
- Describe the structure of singly and doubly linked lists and their performance trade-offs
- Define a linked-list node in C, lay it out in memory, and chain nodes with
nextpointers - Traverse a linked list in RISC-V assembly using a pointer-advancing loop terminated by
NULL
Prerequisites¶
- RISC-V Assembly Parts 1–7: registers, instructions, control flow, the stack, and the function calling convention
- Memory addressing with base + offset (e.g.
lw t0, 4(a0)) - Two's complement and binary representation
- C pointers, the address-of operator
&, and struct syntax
1. RISC-V Memory Instructions: Loads and Stores¶
A RISC-V processor can only compute on values held in registers. To work with data in memory we must first load it into a register, compute, and then store the result back. RISC-V provides loads and stores in four sizes, matching the common C integer types.
The Size Variants¶
| Pair | Size | Bits | Typical C type |
|---|---|---|---|
ld / sd |
doubleword | 64 | int64_t, long, any pointer |
lw / sw |
word | 32 | int, int32_t |
lh / sh |
halfword | 16 | short, int16_t |
lb / sb |
byte | 8 | char, int8_t |
The first letter l/s is load/store; the second letter d/w/h/b is the size. The address is given as offset(base_register):
ld t0, (a0) # load 64 bits from address in a0
lw t0, 4(a0) # load 32 bits from address a0 + 4
lb t0, 0(a0) # load 8 bits from address in a0
sd t0, 8(a1) # store 64 bits of t0 to address a1 + 8
Signed vs Unsigned Loads¶
A register on RV64 is 64 bits wide. When a load brings in fewer than 64 bits, the processor must decide how to fill the remaining high bits. There are two choices, and they correspond to whether the C type is signed or unsigned:
lb— load byte, signed. The top bit of the loaded byte (its sign bit) is copied into all the higher bits. This is sign extension.lbu— load byte, unsigned. The high bits are filled with zeros (zero extension).
The same pattern applies to the other sizes: lh/lhu, lw/lwu. (ld already fills the whole register, so there is no ldu.)
Sign Extension in Pictures¶
Suppose the byte in memory at the address in a0 has its top bit set — for example a char holding a negative value, with low 8 bits 1111 1111.
lb t0, (a0) — sign extension. The sign bit (bit 7) is replicated through bits 8–63, so a negative byte stays negative as a 64-bit value:
bit 63 bit 7 bit 0
+------------------------------+------------------+
| 1 1 1 1 ... 1 1 1 1 1 1 1 1 | 1 1 1 1 1 1 1 1 | t0
+------------------------------+------------------+
^ all copies of sign bit (s) ^ the loaded byte
lbu t0, (a0) — zero extension. The high bits are zeros, so the result is the byte's unsigned value (0–255):
bit 63 bit 7 bit 0
+------------------------------+------------------+
| 0 0 0 0 ... 0 0 0 0 0 0 0 0 | 1 1 1 1 1 1 1 1 | t0
+------------------------------+------------------+
^ zeros ^ the loaded byte
If the in-memory byte were 0xFF:
lbproduces0xFFFFFFFFFFFFFFFF=-1(signed)lbuproduces0x00000000000000FF=255(unsigned)
For ASCII characters (0–127) the top bit is 0, so lb and lbu give the same result. The difference only shows up for values 128 and above, or for genuinely negative char/int fields.
Choosing the Right Load¶
The rule is simple and worth memorizing:
Match the load instruction to the field's declared C type and size. Use the signed variant for signed types (
char,short,int,int8_t, ...) and the unsigned variant for unsigned types (unsigned char,uint8_t, ...).
Using lb where you meant lbu is a classic bug: a field that should read as 200 comes back as -56 because of unintended sign extension. In Project 3 (unstruct) the expected output deliberately includes negative values like -99 printed as the full 64-bit hex 0xFFFFFFFFFFFFFF9D, which only happens if you sign-extend correctly with the signed loads.
All Loads Work the Same Way¶
The load family is completely regular:
lw lwu
lh lhu all work the same, just different sizes
lb lbu ( "u" = zero-extend instead of sign-extend )
ld (no ldu — a doubleword already fills the register)
Stores Have No Unsigned Variant¶
There is no sbu, shu, or swu. A store just copies the low N bits of the register straight to memory — it does not extend anything, because it is writing into a fixed-size slot, not filling a wider destination. Sign/zero extension is only a question when you load a small value into a wide register. So:
2. Struct Memory Layout¶
A C struct groups several values into one contiguous block of memory. To access a field in assembly we need its offset — the number of bytes from the start of the struct to that field.
A Simple Struct of Two ints¶
An int is 4 bytes, so the layout is:
If a pointer to the struct is in a0 (and the C compiler uses fp as a base pointer to a stack-allocated instance), the two accessor functions are trivial:
# a0 = &foo (pointer to the struct)
get_a_s:
lw a0, (a0) # a0 = foo.a (offset 0)
ret
get_b_s:
lw a0, 4(a0) # a0 = foo.b (offset 4)
ret
In C these correspond to:
Stack picture (struct on the stack, base in fp):
A Struct of Two int64_ts¶
Each field is 8 bytes:
Accessing these fields uses ld (doubleword) instead of lw:
3. Alignment and Padding¶
Offsets are not always just the running sum of the field sizes. The compiler inserts padding so that each field is naturally aligned: a 4-byte field starts at an address that is a multiple of 4, an 8-byte field at a multiple of 8, and so on. Aligned accesses are required (or strongly preferred) by the hardware and keep lw/ld working correctly.
Mixed Sizes Force Padding¶
Naively c is at 0, a would be at 1, b at 5. But a (a 4-byte field) must start on a multiple of 4, and b (an 8-byte field) on a multiple of 8. The compiler pads:
Byte-by-byte memory layout (low address at bottom, fp at the struct base):
offset
15 +-----+ \
... | b | | int64_t b (8 bytes, offsets 8..15)
8 | b | /
+-----+ \
7 | a | |
6 | a | | int a (4 bytes, offsets 4..7)
5 | a | |
4 | a | /
+-----+ \
3 | pad | |
2 | pad | | 3 bytes padding (offsets 1..3)
1 | pad | /
+-----+
0 | c | char c (1 byte, offset 0) <- fp
+-----+
So the offsets the compiler actually uses are c = 0, a = 4, b = 8, and the struct is 16 bytes total.
Padding Can Hide Between and After Fields¶
struct foo_st {
int a; // 4 bytes
char b; // 1 byte
char c; // 1 byte
int d; // 4 bytes
char e; // 1 byte
};
Walking through it:
# struct foo_st
# offset field
# 0 int a
# 4 char b
# 5 char c
# 8 int d (offsets 6,7 are padding so d lands on a multiple of 4)
# 12 char e
Layout:
offset 12 | e |
11 | pad? |
10 | pad? | (tail padding to round the struct size up to a
9 | pad? | multiple of its largest alignment, 4 -> size 16)
8 | d |
7 | d |
6 | d |
5 | d |
4 | pad | (offsets 6,7 padding before d)
5 | c |
4 | b |
3 | a |
2 | a |
1 | a |
fp -> 0 | a |
The key takeaways:
- A field's offset is determined by alignment, not just by adding up sizes.
- Padding can appear between fields (to align the next field) and after the last field (to make the whole struct a multiple of its largest member's alignment, so arrays of the struct stay aligned).
- When you hand-write assembly to touch a struct, you must use the same offsets the C compiler chose. The reliable way to discover them is the
offsetofmacro in C or examininggcc -Soutput — never guess.
Discovering Offsets in C¶
#include <stddef.h>
#include <stdio.h>
struct foo_st {
char c;
int a;
int64_t b;
};
int main(void) {
printf("c at %zu\n", offsetof(struct foo_st, c)); // 0
printf("a at %zu\n", offsetof(struct foo_st, a)); // 4
printf("b at %zu\n", offsetof(struct foo_st, b)); // 8
printf("size = %zu\n", sizeof(struct foo_st)); // 16
return 0;
}
4. Pointers Are All 8 Bytes¶
Before linked lists, one more fact: on RV64 every pointer is a 64-bit value, so every pointer is 8 bytes, regardless of what it points to.
This is why pointers are loaded and stored with ld/sd. A next pointer in a linked-list node, a pointer to a char, a pointer to a giant struct — all occupy the same 8 bytes and all use the doubleword instructions.
5. Linked Lists¶
An array stores elements in one contiguous block. A linked list instead stores each element in its own node, and each node holds a pointer to the next node. This makes insertion and deletion cheap (just repoint a pointer) at the cost of losing random access (you must walk the chain).
Singly Linked Lists¶
Each node is a struct containing a value x and a single next pointer. The list is reached through a head pointer; the last node's next is NULL (drawn as 0), which marks the end.
single linked list
head tail
| |
v v
+------+ +------+ +------+
| x |----->| x |----->| x |-----> NULL (0)
+------+ +------+ +------+
^ ^ ^
struct node node ( the arrows are the "next" pointers )
headpoints to the first node.- Each
nextpointer points to the following node. tailis the last node, whosenextisNULL.- You can only move forward. To get to a node you must start at
headand follownextpointers.
flowchart LR
head((head)) --> N0
N0["x | next"] --> N1["x | next"]
N1 --> N2["x | next"]
N2 --> NULL["NULL (0)"]
Doubly Linked Lists¶
Each node carries two pointers: next (forward) and prev (backward). The first node's prev and the last node's next are both NULL.
double linked list
+--------+ +--------+ +--------+
0 <-----| x , y |<====>| x , y |<====>| x , y |-----> 0
+--------+ +--------+ +--------+
head tail
(each node stores both next and prev)
flowchart LR
NULL0["NULL"] -.prev.-> N0
N0["x,y"] -- next --> N1["x,y"]
N1 -- prev --> N0
N1 -- next --> N2["x,y"]
N2 -- prev --> N1
N2 -- next --> NULL1["NULL"]
Singly vs Doubly — Trade-offs¶
| Operation | Singly linked | Doubly linked |
|---|---|---|
| Forward traversal | O(n) | O(n) |
| Backward traversal | Not possible directly | O(n) |
| Insert/remove given a node | O(n) (need predecessor) | O(1) (have prev) |
| Memory per node | 1 pointer (8 bytes) | 2 pointers (16 bytes) |
A doubly linked list supports efficient backward traversal and O(1) removal of a node from the middle because each node knows its predecessor. A singly linked list is leaner but removing an interior node requires first finding the previous node, which is O(n).
When we drop into assembly, all of the language's abstractions disappear — a "list" is nothing more than structs in memory connected by pointers we load and follow by hand.
6. A Linked-List Node in C¶
The node type for a singly linked list:
Field offsets:
# struct node_st
# offset 0 next_p (pointer to the next node, 8 bytes)
# offset 8 value (the integer payload, 4 bytes)
Putting the pointer first is convenient: next_p lands at offset 0, so following the chain is just ld next, (node) with no offset, and the value is at offset 8.
Building a List by Hand on the Stack¶
In main we can create individual node variables and wire them together manually. Here four nodes n0, n1, n2, n3 hold values 11, 22, 33, 44:
int main(void) {
struct node_st n0, n1, n2, n3;
n0.value = 11;
n0.next_p = &n1;
n1.value = 22;
n1.next_p = &n2;
n2.value = 33;
n2.next_p = &n3;
n3.value = 44;
n3.next_p = NULL; // end of list
struct node_st *head = &n0;
/* ... */
}
Each next_p is assigned the address of the next node (&n1, &n2, ...). The last node's next_p is NULL, terminating the list. The head pointer (&n0) is where traversal begins.
How It Looks on the Stack¶
The nodes live in main's stack frame. Each box below is a node; next_p (the field highlighted in the lecture) holds the address of the node above it, and the final next_p is NULL:
STACK
+-------------+
| 44 | n3.value
| NULL | n3.next_p <- end of list
+-------------+
+------>| 33 | n2.value
| | . (-> n3) | n2.next_p ----+
| +-------------+ |
| +--->| 22 | n1.value |
| | | . (-> n2) | n1.next_p ----+ (points up to n3)
| | +-------------+
| | | 11 | n0.value
head ->| +----| . (-> n1) | n0.next_p
(&n0) +-------| |
+-------------+
head = &n0
follow n0.next_p -> n1, n1.next_p -> n2, n2.next_p -> n3, n3.next_p = NULL
The pointer values are concrete addresses; what matters is that each node's next_p field contains the address of the next node, and NULL marks the tail.
Node Size in Memory¶
struct node_st has an 8-byte pointer and a 4-byte int. With alignment, the struct is padded out to 16 bytes (8 for next_p, 4 for value, 4 bytes of tail padding so the struct's size is a multiple of 8). This 16-byte stride is what you see if you watch the addresses in GDB while traversing the list.
7. Traversing a Linked List in Assembly¶
Counting the nodes in a list is the canonical traversal. In C:
int count(struct node_st *p) {
int n = 0;
while (p != NULL) {
n = n + 1;
p = p->next_p; // advance to next node
}
return n;
}
The loop has three parts: a termination test (p != NULL), a body, and an advance step (p = p->next_p). In assembly the advance step is simply a load from offset 0 (where next_p lives).
# int count(struct node_st *p)
# a0 = p (pointer to current node, also the return value at the end)
# struct node_st layout: next_p @ 0, value @ 8
count:
li t0, 0 # t0 = n = 0
count_loop:
beq a0, zero, count_done # if p == NULL, stop
addi t0, t0, 1 # n = n + 1
ld a0, 0(a0) # p = p->next_p (advance)
j count_loop
count_done:
mv a0, t0 # return value in a0
ret
Key points:
beq a0, zero, ...is thep != NULLtest.NULLis the integer0, andzerois the hardwired-zero register, so comparing the pointer againstzerochecks for the end of the list.ld a0, 0(a0)loads thenext_pfield (offset 0) and overwritesa0with the next node's address. Because the pointer field is at offset 0, no displacement is needed — but writing0(a0)makes the intent explicit.- The loop ends exactly when
next_pwasNULL, i.e. we just read past the tail.
Finding the Maximum Value (Project 3 findmaxll)¶
Project 3 asks for the maximum value across the list. Same traversal, but we read the value field (offset 8) and keep a running max:
int findmaxll(struct node_st *p) {
int max = p->value; // assume non-empty list
while (p != NULL) {
if (p->value > max) {
max = p->value;
}
p = p->next_p;
}
return max;
}
# int findmaxll(struct node_st *p)
# a0 = p
# next_p @ 0, value @ 8
findmaxll:
lw t1, 8(a0) # max = p->value (seed with first value)
findmax_loop:
beq a0, zero, findmax_done
lw t2, 8(a0) # t2 = p->value
ble t2, t1, findmax_skip # if value <= max, skip update
mv t1, t2 # max = value
findmax_skip:
ld a0, 0(a0) # p = p->next_p
j findmax_loop
findmax_done:
mv a0, t1 # return max
ret
Note the load choices: value is an int (4 bytes, signed) so we use lw; next_p is a pointer (8 bytes) so we use ld. Matching the load to the field type is exactly the rule from Section 1.
Printing While Traversing (Project 3 findmaxllp)¶
findmaxllp prints each value with printf during the walk. Because printf is a function call, this version is non-leaf: it must build a stack frame, save ra, and save any value it needs across the call in a callee-saved register (the current node pointer and the running max). The traversal logic is identical; the difference is calling-convention bookkeeping.
flowchart TD
A[p = head] --> B{p == NULL?}
B -- yes --> E[return max]
B -- no --> C["read p->value (lw, offset 8)"]
C --> D[update max / print value]
D --> F["p = p->next_p (ld, offset 0)"]
F --> B
8. Watching It Run in GDB¶
You can verify the layout and traversal with GDB. Set a breakpoint at the traversal function, then step and examine the node pointer in a0 and the fields around it.
# build with debug symbols, run under qemu/gdb as in lab
gdb ./findmaxll
(gdb) break findmaxll
(gdb) run 1 2 3 4 99 5 6 7 8 9
(gdb) info registers a0 # a0 = address of current node
(gdb) x/2gx $a0 # dump next_p (offset 0) and value+pad (offset 8)
(gdb) stepi # single-step instructions
What to look for:
- The address in
a0changes by 16 bytes each time the loop executesld a0, 0(a0)if the nodes are an array ofnode_st(16-byte stride from the padded struct size). For a hand-linked list the addresses can be anywhere, but eachldjumpsa0to the next node. x/2gx $a0prints two 8-byte (giant) words in hex: the first isnext_p, the second isvalue(in its low 4 bytes) plus padding.- When
ld a0, 0(a0)brings in0, the nextbeq a0, zero, ...terminates the loop.
Key Concepts¶
| Concept | Definition | Example |
|---|---|---|
| Load/store size variants | ld/sd, lw/sw, lh/sh, lb/sb move 8/4/2/1 bytes |
lw t0, 8(a0) loads a 32-bit field |
| Sign extension | lb/lh/lw copy the value's top bit into the high register bits |
0xFF via lb becomes -1 |
| Zero extension | lbu/lhu/lwu fill high bits with 0 |
0xFF via lbu becomes 255 |
| No unsigned stores | Stores copy low N bits; nothing to extend | sbu does not exist |
| Offset | Byte distance from a struct's start to a field | value at offset 8 in node_st |
| Padding | Filler bytes inserted for alignment | char then int → 3 pad bytes |
| Alignment | A field starts on a multiple of its size | int64_t on a multiple of 8 |
| Linked list | Chain of nodes connected by next pointers |
head -> n0 -> n1 -> NULL |
| NULL terminator | Pointer value 0 marking the list's end |
last node's next_p == 0 |
| Traversal | Following next pointers until NULL |
ld a0, 0(a0) in a loop |
Practice Problems¶
Problem 1: Choose the Load Instruction¶
A struct field is declared unsigned char status; at offset 6. You want to load it into t0 from a node whose address is in a0. Which instruction, and why?
Click to reveal solution
The field is one byte (`char`), so use a byte load. It is **unsigned**, so use the `u` (zero-extending) variant `lbu`. Using `lb` would sign-extend: a stored value of `200` (`0xC8`, top bit set) would read back as `-56` instead of `200`. Match the load to the declared type and size: 1 byte + unsigned = `lbu`, with the field offset `6`.Problem 2: Compute the Offsets¶
Give the offset of each field and the total size of:
Click to reveal solution
`tag` is at 0. `key` is 8-byte aligned, so 7 bytes of padding push it to offset 8. `count` follows at 16. The struct's largest member needs 8-byte alignment, so the total size rounds up from 20 to **24** (4 bytes of tail padding). Access them with `lb tag@0`, `ld key@8`, `lw count@16`.Problem 3: Why No swu?¶
Explain why RISC-V has lwu but no swu.
Click to reveal solution
Extension (sign vs zero) is only a question when you move a **small** value into a **wider** register — the high bits have to be filled with something. That happens on a load (a small memory value going into a 64-bit register), so loads come in signed and unsigned flavors. A store goes the other way: it copies the **low N bits** of the register straight into an N-byte memory slot. There is nothing wider to fill, so there is nothing to extend. `sw` already does exactly the right thing for both signed and unsigned 32-bit values — it just writes the low 32 bits. Hence `lwu` exists but `swu` does not (likewise no `sbu`, `shu`, `sdu`).Problem 4: Translate the Traversal¶
Translate this C loop to RISC-V assembly. Assume struct node_st has next_p at offset 0 and value at offset 8, and the list head pointer is already in a0. Sum all the values and return the total in a0.
int sum(struct node_st *p) {
int total = 0;
while (p != NULL) {
total = total + p->value;
p = p->next_p;
}
return total;
}
Click to reveal solution
# int sum(struct node_st *p)
# a0 = p ; next_p @ 0, value @ 8
sum:
li t0, 0 # total = 0
sum_loop:
beq a0, zero, sum_done # p == NULL -> stop
lw t1, 8(a0) # t1 = p->value (int, signed -> lw)
add t0, t0, t1 # total += value
ld a0, 0(a0) # p = p->next_p (pointer -> ld)
j sum_loop
sum_done:
mv a0, t0 # return total
ret
Problem 5: Trace the Pointer Chain¶
Given the hand-built list n0(11) -> n1(22) -> n2(33) -> n3(44) -> NULL, how many times does ld a0, 0(a0) execute in the count function, and what is the final value returned?
Click to reveal solution
The loop body runs once per node (4 times), and each iteration executes one `ld a0, 0(a0)`: | Iteration | `a0` before | `n` after `addi` | `a0` after `ld` | |-----------|-------------|------------------|-----------------| | 1 | `&n0` | 1 | `&n1` | | 2 | `&n1` | 2 | `&n2` | | 3 | `&n2` | 3 | `&n3` | | 4 | `&n3` | 4 | `NULL (0)` | After iteration 4, `a0` is `NULL`, so `beq a0, zero, count_done` exits. `ld a0, 0(a0)` executed **4 times**, and the function returns **4** (the node count). Note the loop reads each node's `next_p` exactly once.Problem 6: Spot the Bug¶
A student writes the following to advance through a list whose nodes have value at offset 0 and next_p at offset 8. Their traversal never terminates. What went wrong?
Click to reveal solution
The advance load uses the **wrong offset**. With `value` at offset 0 and `next_p` at offset 8, `ld a0, 0(a0)` reloads the *value* into `a0` (reinterpreting an `int` as an address) instead of the `next_p` pointer. The fix: This is exactly why getting struct offsets right matters: the hardware will happily load whatever bytes you point it at. Always confirm the layout (via `offsetof` or `gcc -S`) before hard-coding offsets in assembly.Further Reading¶
- RISC-V Specifications (riscv.org)
- RISC-V Assembly Programmer's Manual
- Data Structure Alignment (Wikipedia)
- Linked List (Wikipedia)
- RISC-V references and cheat sheets: /guides/riscv/
- Project 3 specification: /assignments/project03/
- Lecture source notes: /notes/CS315-01 2025-09-18 RISC-V Assembly 8.pdf
Summary¶
-
RISC-V loads and stores come in four sizes —
ld/sd(8 bytes),lw/sw(4),lh/sh(2),lb/sb(1) — matching the common C integer types. -
Signed loads (
lb,lh,lw) sign-extend the value into the wide register; unsigned loads (lbu,lhu,lwu) zero-extend. Choose the variant that matches the field's declared C type, or you will read negative values where you expected positive ones (and vice versa). -
Stores have no unsigned variant because a store just writes the low N bits to memory; there is nothing to extend.
-
Struct fields are accessed by byte offset, and the compiler inserts padding so each field is naturally aligned (4-byte fields on multiples of 4, 8-byte fields on multiples of 8). Always use the same offsets the C compiler chose — find them with
offsetoforgcc -S. -
All pointers are 8 bytes on RV64 and are loaded/stored with
ld/sd, no matter what they point to. -
A linked list is a chain of structs connected by
nextpointers, terminated byNULL(the integer0). Singly linked lists go forward only; doubly linked lists add aprevpointer for O(1) interior removal and backward traversal at the cost of extra memory. -
Traversing a list in assembly is a loop: test the pointer against
zerofor theNULLterminator, process the current node's fields (lw value, 8(a0)), then advance withld a0, 0(a0). This is the core pattern behind Project 3'sfindmaxllandfindmaxllp.