RISC-V Assembly Part 8: Linked Lists¶

Overview¶

This lecture connects three closely related ideas: the full family of RISC-V load/store instructions (including the difference between signed and unsigned loads), how C struct values are laid out in memory with offsets and padding, and how those two ideas combine to implement and traverse linked lists in assembly. A linked list is just a chain of structs where each node holds a value and a pointer to the next node. Once you understand struct field offsets and the load instructions, traversing a list in assembly is straightforward: load the value, load the next pointer, repeat until the pointer is NULL. This material is the direct foundation for the Project 3 findmaxll and findmaxllp problems.

Learning Objectives¶

Identify every RISC-V load and store variant (ld/sd, lw/sw, lh/sh, lb/sb) and the size each one moves
Explain sign extension and choose between signed (lb, lh, lw) and unsigned (lbu, lhu, lwu) loads based on a field's declared C type
Explain why stores have no unsigned variant
Compute the byte offset of each field in a C struct, accounting for alignment and padding
Translate struct field accesses in C into RISC-V load/store instructions using offsets
Describe the structure of singly and doubly linked lists and their performance trade-offs
Define a linked-list node in C, lay it out in memory, and chain nodes with next pointers
Traverse a linked list in RISC-V assembly using a pointer-advancing loop terminated by NULL

Prerequisites¶

RISC-V Assembly Parts 1–7: registers, instructions, control flow, the stack, and the function calling convention
Memory addressing with base + offset (e.g. lw t0, 4(a0))
Two's complement and binary representation
C pointers, the address-of operator &, and struct syntax

1. RISC-V Memory Instructions: Loads and Stores¶

A RISC-V processor can only compute on values held in registers. To work with data in memory we must first load it into a register, compute, and then store the result back. RISC-V provides loads and stores in four sizes, matching the common C integer types.

The Size Variants¶

Pair	Size	Bits	Typical C type
`ld` / `sd`	doubleword	64	`int64_t`, `long`, any pointer
`lw` / `sw`	word	32	`int`, `int32_t`
`lh` / `sh`	halfword	16	`short`, `int16_t`
`lb` / `sb`	byte	8	`char`, `int8_t`

The first letter l/s is load/store; the second letter d/w/h/b is the size. The address is given as offset(base_register):

ld t0, (a0)        # load 64 bits from address in a0
lw t0, 4(a0)       # load 32 bits from address a0 + 4
lb t0, 0(a0)       # load 8 bits from address in a0
sd t0, 8(a1)       # store 64 bits of t0 to address a1 + 8

Signed vs Unsigned Loads¶

A register on RV64 is 64 bits wide. When a load brings in fewer than 64 bits, the processor must decide how to fill the remaining high bits. There are two choices, and they correspond to whether the C type is signed or unsigned:

lb — load byte, signed. The top bit of the loaded byte (its sign bit) is copied into all the higher bits. This is sign extension.
lbu — load byte, unsigned. The high bits are filled with zeros (zero extension).

The same pattern applies to the other sizes: lh/lhu, lw/lwu. (ld already fills the whole register, so there is no ldu.)

Sign Extension in Pictures¶

Suppose the byte in memory at the address in a0 has its top bit set — for example a char holding a negative value, with low 8 bits 1111 1111.

lb t0, (a0) — sign extension. The sign bit (bit 7) is replicated through bits 8–63, so a negative byte stays negative as a 64-bit value:

 bit 63                       bit 7        bit 0
 +------------------------------+------------------+
 | 1 1 1 1 ... 1 1 1 1 1 1 1 1  | 1 1 1 1 1 1 1 1  |   t0
 +------------------------------+------------------+
   ^ all copies of sign bit (s)    ^ the loaded byte

lbu t0, (a0) — zero extension. The high bits are zeros, so the result is the byte's unsigned value (0–255):

 bit 63                       bit 7        bit 0
 +------------------------------+------------------+
 | 0 0 0 0 ... 0 0 0 0 0 0 0 0  | 1 1 1 1 1 1 1 1  |   t0
 +------------------------------+------------------+
   ^ zeros                          ^ the loaded byte

If the in-memory byte were 0xFF:

lb produces 0xFFFFFFFFFFFFFFFF = -1 (signed)
lbu produces 0x00000000000000FF = 255 (unsigned)

For ASCII characters (0–127) the top bit is 0, so lb and lbu give the same result. The difference only shows up for values 128 and above, or for genuinely negative char/int fields.

Choosing the Right Load¶

The rule is simple and worth memorizing:

Match the load instruction to the field's declared C type and size. Use the signed variant for signed types (char, short, int, int8_t, ...) and the unsigned variant for unsigned types (unsigned char, uint8_t, ...).

Using lb where you meant lbu is a classic bug: a field that should read as 200 comes back as -56 because of unintended sign extension. In Project 3 (unstruct) the expected output deliberately includes negative values like -99 printed as the full 64-bit hex 0xFFFFFFFFFFFFFF9D, which only happens if you sign-extend correctly with the signed loads.

All Loads Work the Same Way¶

The load family is completely regular:

lw    lwu
lh    lhu     all work the same, just different sizes
lb    lbu     ( "u" = zero-extend instead of sign-extend )
ld    (no ldu — a doubleword already fills the register)

Stores Have No Unsigned Variant¶

There is no sbu, shu, or swu. A store just copies the low N bits of the register straight to memory — it does not extend anything, because it is writing into a fixed-size slot, not filling a wider destination. Sign/zero extension is only a question when you load a small value into a wide register. So:

sd  sw  sh  sb       <- these exist
sdu swu shu sbu      <- these do NOT exist

2. Struct Memory Layout¶

A C struct groups several values into one contiguous block of memory. To access a field in assembly we need its offset — the number of bytes from the start of the struct to that field.

A Simple Struct of Two `int`s¶

struct foo_st {
    int a;
    int b;
};

struct foo_st foo;

An int is 4 bytes, so the layout is:

# struct foo_st
# offset  field
#   0      a
#   4      b

If a pointer to the struct is in a0 (and the C compiler uses fp as a base pointer to a stack-allocated instance), the two accessor functions are trivial:

# a0 = &foo  (pointer to the struct)
get_a_s:
    lw a0, (a0)        # a0 = foo.a   (offset 0)
    ret

get_b_s:
    lw a0, 4(a0)       # a0 = foo.b   (offset 4)
    ret

In C these correspond to:

int get_a_c(struct foo_st *fp) { return fp->a; }
int get_b_c(struct foo_st *fp) { return fp->b; }

Stack picture (struct on the stack, base in fp):

        +---------+
 fp+4 ->|    b    |   offset 4
        +---------+
 fp   ->|    a    |   offset 0   <- start of struct
        +---------+

A Struct of Two `int64_t`s¶

struct foo_st {
    int64_t a;
    int64_t b;
};

Each field is 8 bytes:

# struct foo_st
# offset  field
#   0      a
#   8      b

        +---------+
 fp+8 ->|    b    |   offset 8
        +---------+
 fp   ->|    a    |   offset 0
        +---------+

Accessing these fields uses ld (doubleword) instead of lw:

get_a_s:
    ld a0, (a0)        # a0 = foo.a   (offset 0)
    ret
get_b_s:
    ld a0, 8(a0)       # a0 = foo.b   (offset 8)
    ret

3. Alignment and Padding¶

Offsets are not always just the running sum of the field sizes. The compiler inserts padding so that each field is naturally aligned: a 4-byte field starts at an address that is a multiple of 4, an 8-byte field at a multiple of 8, and so on. Aligned accesses are required (or strongly preferred) by the hardware and keep lw/ld working correctly.

Mixed Sizes Force Padding¶

struct foo_st {
    char    c;     // 1 byte
    int     a;     // 4 bytes
    int64_t b;     // 8 bytes
};

Naively c is at 0, a would be at 1, b at 5. But a (a 4-byte field) must start on a multiple of 4, and b (an 8-byte field) on a multiple of 8. The compiler pads:

# struct foo_st
# offset  field
#   0      char    c
#   4      int     a      (3 bytes of padding at 1,2,3)
#   8      int64_t b

Byte-by-byte memory layout (low address at bottom, fp at the struct base):

 offset
   15  +-----+ \
   ... | b   |  | int64_t b  (8 bytes, offsets 8..15)
    8  | b   | /
       +-----+ \
    7  | a   |  |
    6  | a   |  | int a      (4 bytes, offsets 4..7)
    5  | a   |  |
    4  | a   | /
       +-----+ \
    3  | pad |  |
    2  | pad |  | 3 bytes padding (offsets 1..3)
    1  | pad | /
       +-----+
    0  | c   |    char c     (1 byte, offset 0)   <- fp
       +-----+

So the offsets the compiler actually uses are c = 0, a = 4, b = 8, and the struct is 16 bytes total.

Padding Can Hide Between and After Fields¶

struct foo_st {
    int  a;     // 4 bytes
    char b;     // 1 byte
    char c;     // 1 byte
    int  d;     // 4 bytes
    char e;     // 1 byte
};

Walking through it:

# struct foo_st
# offset  field
#   0      int  a
#   4      char b
#   5      char c
#   8      int  d      (offsets 6,7 are padding so d lands on a multiple of 4)
#   12     char e

Layout:

 offset 12 | e |
        11 | pad? |
        10 | pad? |   (tail padding to round the struct size up to a
         9 | pad? |    multiple of its largest alignment, 4 -> size 16)
         8 | d |
         7 | d |
         6 | d |
         5 | d |
         4 | pad |     (offsets 6,7 padding before d)
         5 | c |
         4 | b |
         3 | a |
         2 | a |
         1 | a |
fp ->    0 | a |

The key takeaways:

A field's offset is determined by alignment, not just by adding up sizes.
Padding can appear between fields (to align the next field) and after the last field (to make the whole struct a multiple of its largest member's alignment, so arrays of the struct stay aligned).
When you hand-write assembly to touch a struct, you must use the same offsets the C compiler chose. The reliable way to discover them is the offsetof macro in C or examining gcc -S output — never guess.

Discovering Offsets in C¶

#include <stddef.h>
#include <stdio.h>

struct foo_st {
    char    c;
    int     a;
    int64_t b;
};

int main(void) {
    printf("c at %zu\n", offsetof(struct foo_st, c));   // 0
    printf("a at %zu\n", offsetof(struct foo_st, a));   // 4
    printf("b at %zu\n", offsetof(struct foo_st, b));   // 8
    printf("size = %zu\n", sizeof(struct foo_st));      // 16
    return 0;
}

4. Pointers Are All 8 Bytes¶

Before linked lists, one more fact: on RV64 every pointer is a 64-bit value, so every pointer is 8 bytes, regardless of what it points to.

int          *ip;      // 8 bytes
char         *cp;      // 8 bytes
struct foo_st *fp;     // 8 bytes  (all 8 bytes / 64 bit)

This is why pointers are loaded and stored with ld/sd. A next pointer in a linked-list node, a pointer to a char, a pointer to a giant struct — all occupy the same 8 bytes and all use the doubleword instructions.

5. Linked Lists¶

An array stores elements in one contiguous block. A linked list instead stores each element in its own node, and each node holds a pointer to the next node. This makes insertion and deletion cheap (just repoint a pointer) at the cost of losing random access (you must walk the chain).

Singly Linked Lists¶

Each node is a struct containing a value x and a single next pointer. The list is reached through a head pointer; the last node's next is NULL (drawn as 0), which marks the end.

 single linked list

  head                                              tail
   |                                                 |
   v                                                 v
 +------+      +------+      +------+
 |  x   |----->|  x   |----->|  x   |-----> NULL (0)
 +------+      +------+      +------+
   ^             ^             ^
 struct        node          node     ( the arrows are the "next" pointers )

head points to the first node.
Each next pointer points to the following node.
tail is the last node, whose next is NULL.
You can only move forward. To get to a node you must start at head and follow next pointers.

flowchart LR
    head((head)) --> N0
    N0["x | next"] --> N1["x | next"]
    N1 --> N2["x | next"]
    N2 --> NULL["NULL (0)"]

Doubly Linked Lists¶

Each node carries two pointers: next (forward) and prev (backward). The first node's prev and the last node's next are both NULL.

 double linked list

         +--------+      +--------+      +--------+
 0 <-----| x , y  |<====>| x , y  |<====>| x , y  |-----> 0
         +--------+      +--------+      +--------+
            head                            tail
                          (each node stores both next and prev)

flowchart LR
    NULL0["NULL"] -.prev.-> N0
    N0["x,y"] -- next --> N1["x,y"]
    N1 -- prev --> N0
    N1 -- next --> N2["x,y"]
    N2 -- prev --> N1
    N2 -- next --> NULL1["NULL"]

Singly vs Doubly — Trade-offs¶

Operation	Singly linked	Doubly linked
Forward traversal	O(n)	O(n)
Backward traversal	Not possible directly	O(n)
Insert/remove given a node	O(n) (need predecessor)	O(1) (have `prev`)
Memory per node	1 pointer (8 bytes)	2 pointers (16 bytes)

A doubly linked list supports efficient backward traversal and O(1) removal of a node from the middle because each node knows its predecessor. A singly linked list is leaner but removing an interior node requires first finding the previous node, which is O(n).

When we drop into assembly, all of the language's abstractions disappear — a "list" is nothing more than structs in memory connected by pointers we load and follow by hand.

6. A Linked-List Node in C¶

The node type for a singly linked list:

struct node_st {
    struct node_st *next_p;   // offset 0   (pointer, 8 bytes)
    int             value;    // offset 8
};

Field offsets:

# struct node_st
# offset 0   next_p   (pointer to the next node, 8 bytes)
# offset 8   value    (the integer payload, 4 bytes)

Putting the pointer first is convenient: next_p lands at offset 0, so following the chain is just ld next, (node) with no offset, and the value is at offset 8.

Building a List by Hand on the Stack¶

In main we can create individual node variables and wire them together manually. Here four nodes n0, n1, n2, n3 hold values 11, 22, 33, 44:

int main(void) {
    struct node_st n0, n1, n2, n3;

    n0.value  = 11;
    n0.next_p = &n1;

    n1.value  = 22;
    n1.next_p = &n2;

    n2.value  = 33;
    n2.next_p = &n3;

    n3.value  = 44;
    n3.next_p = NULL;     // end of list

    struct node_st *head = &n0;
    /* ... */
}

Each next_p is assigned the address of the next node (&n1, &n2, ...). The last node's next_p is NULL, terminating the list. The head pointer (&n0) is where traversal begins.

How It Looks on the Stack¶

The nodes live in main's stack frame. Each box below is a node; next_p (the field highlighted in the lecture) holds the address of the node above it, and the final next_p is NULL:

                  STACK
                +-------------+
                |   44        |  n3.value
                |   NULL      |  n3.next_p   <- end of list
                +-------------+
        +------>|   33        |  n2.value
        |       |   . (-> n3) |  n2.next_p ----+
        |       +-------------+                |
        |  +--->|   22        |  n1.value      |
        |  |    |   . (-> n2) |  n1.next_p ----+  (points up to n3)
        |  |    +-------------+
        |  |    |   11        |  n0.value
 head ->|  +----|   . (-> n1) |  n0.next_p
 (&n0)  +-------|             |
                +-------------+

 head = &n0
 follow n0.next_p -> n1, n1.next_p -> n2, n2.next_p -> n3, n3.next_p = NULL

The pointer values are concrete addresses; what matters is that each node's next_p field contains the address of the next node, and NULL marks the tail.

Node Size in Memory¶

struct node_st has an 8-byte pointer and a 4-byte int. With alignment, the struct is padded out to 16 bytes (8 for next_p, 4 for value, 4 bytes of tail padding so the struct's size is a multiple of 8). This 16-byte stride is what you see if you watch the addresses in GDB while traversing the list.

7. Traversing a Linked List in Assembly¶

Counting the nodes in a list is the canonical traversal. In C:

int count(struct node_st *p) {
    int n = 0;
    while (p != NULL) {
        n = n + 1;
        p = p->next_p;     // advance to next node
    }
    return n;
}

The loop has three parts: a termination test (p != NULL), a body, and an advance step (p = p->next_p). In assembly the advance step is simply a load from offset 0 (where next_p lives).

# int count(struct node_st *p)
# a0 = p  (pointer to current node, also the return value at the end)
# struct node_st layout:  next_p @ 0,  value @ 8
count:
    li   t0, 0                # t0 = n = 0
count_loop:
    beq  a0, zero, count_done # if p == NULL, stop
    addi t0, t0, 1            # n = n + 1
    ld   a0, 0(a0)            # p = p->next_p   (advance)
    j    count_loop
count_done:
    mv   a0, t0              # return value in a0
    ret

Key points:

beq a0, zero, ... is the p != NULL test. NULL is the integer 0, and zero is the hardwired-zero register, so comparing the pointer against zero checks for the end of the list.
ld a0, 0(a0) loads the next_p field (offset 0) and overwrites a0 with the next node's address. Because the pointer field is at offset 0, no displacement is needed — but writing 0(a0) makes the intent explicit.
The loop ends exactly when next_p was NULL, i.e. we just read past the tail.

Finding the Maximum Value (Project 3 `findmaxll`)¶

Project 3 asks for the maximum value across the list. Same traversal, but we read the value field (offset 8) and keep a running max:

int findmaxll(struct node_st *p) {
    int max = p->value;          // assume non-empty list
    while (p != NULL) {
        if (p->value > max) {
            max = p->value;
        }
        p = p->next_p;
    }
    return max;
}

# int findmaxll(struct node_st *p)
# a0 = p
# next_p @ 0,  value @ 8
findmaxll:
    lw   t1, 8(a0)            # max = p->value   (seed with first value)
findmax_loop:
    beq  a0, zero, findmax_done
    lw   t2, 8(a0)            # t2 = p->value
    ble  t2, t1, findmax_skip # if value <= max, skip update
    mv   t1, t2              # max = value
findmax_skip:
    ld   a0, 0(a0)           # p = p->next_p
    j    findmax_loop
findmax_done:
    mv   a0, t1             # return max
    ret

Note the load choices: value is an int (4 bytes, signed) so we use lw; next_p is a pointer (8 bytes) so we use ld. Matching the load to the field type is exactly the rule from Section 1.

Printing While Traversing (Project 3 `findmaxllp`)¶

findmaxllp prints each value with printf during the walk. Because printf is a function call, this version is non-leaf: it must build a stack frame, save ra, and save any value it needs across the call in a callee-saved register (the current node pointer and the running max). The traversal logic is identical; the difference is calling-convention bookkeeping.

flowchart TD
    A[p = head] --> B{p == NULL?}
    B -- yes --> E[return max]
    B -- no --> C["read p->value (lw, offset 8)"]
    C --> D[update max / print value]
    D --> F["p = p->next_p (ld, offset 0)"]
    F --> B

8. Watching It Run in GDB¶

You can verify the layout and traversal with GDB. Set a breakpoint at the traversal function, then step and examine the node pointer in a0 and the fields around it.

# build with debug symbols, run under qemu/gdb as in lab
gdb ./findmaxll
(gdb) break findmaxll
(gdb) run 1 2 3 4 99 5 6 7 8 9
(gdb) info registers a0          # a0 = address of current node
(gdb) x/2gx $a0                  # dump next_p (offset 0) and value+pad (offset 8)
(gdb) stepi                      # single-step instructions

What to look for:

The address in a0 changes by 16 bytes each time the loop executes ld a0, 0(a0) if the nodes are an array of node_st (16-byte stride from the padded struct size). For a hand-linked list the addresses can be anywhere, but each ld jumps a0 to the next node.
x/2gx $a0 prints two 8-byte (giant) words in hex: the first is next_p, the second is value (in its low 4 bytes) plus padding.
When ld a0, 0(a0) brings in 0, the next beq a0, zero, ... terminates the loop.

Key Concepts¶

Concept	Definition	Example
Load/store size variants	`ld/sd`, `lw/sw`, `lh/sh`, `lb/sb` move 8/4/2/1 bytes	`lw t0, 8(a0)` loads a 32-bit field
Sign extension	`lb`/`lh`/`lw` copy the value's top bit into the high register bits	`0xFF` via `lb` becomes `-1`
Zero extension	`lbu`/`lhu`/`lwu` fill high bits with 0	`0xFF` via `lbu` becomes `255`
No unsigned stores	Stores copy low N bits; nothing to extend	`sbu` does not exist
Offset	Byte distance from a struct's start to a field	`value` at offset 8 in `node_st`
Padding	Filler bytes inserted for alignment	`char` then `int` → 3 pad bytes
Alignment	A field starts on a multiple of its size	`int64_t` on a multiple of 8
Linked list	Chain of nodes connected by `next` pointers	`head -> n0 -> n1 -> NULL`
NULL terminator	Pointer value `0` marking the list's end	last node's `next_p == 0`
Traversal	Following `next` pointers until `NULL`	`ld a0, 0(a0)` in a loop

Practice Problems¶

Problem 1: Choose the Load Instruction¶

A struct field is declared unsigned char status; at offset 6. You want to load it into t0 from a node whose address is in a0. Which instruction, and why?

Click to reveal solution

lbu t0, 6(a0)      # unsigned char -> unsigned byte load

The field is one byte (`char`), so use a byte load. It is **unsigned**, so use the `u` (zero-extending) variant `lbu`. Using `lb` would sign-extend: a stored value of `200` (`0xC8`, top bit set) would read back as `-56` instead of `200`. Match the load to the declared type and size: 1 byte + unsigned = `lbu`, with the field offset `6`.

Problem 2: Compute the Offsets¶

Give the offset of each field and the total size of:

struct rec_st {
    char    tag;
    int64_t key;
    int     count;
};

Click to reveal solution

# offset  field        reason
#   0      char tag     starts at 0
#   8      int64_t key  must align to 8; offsets 1..7 are padding
#   16     int count    follows the 8-byte key at offset 16
# size = 24             rounded up to a multiple of 8 (largest alignment)
#                       offsets 20..23 are tail padding

`tag` is at 0. `key` is 8-byte aligned, so 7 bytes of padding push it to offset 8. `count` follows at 16. The struct's largest member needs 8-byte alignment, so the total size rounds up from 20 to **24** (4 bytes of tail padding). Access them with `lb tag@0`, `ld key@8`, `lw count@16`.

Problem 3: Why No `swu`?¶

Explain why RISC-V has lwu but no swu.

Click to reveal solution

Extension (sign vs zero) is only a question when you move a **small** value into a **wider** register — the high bits have to be filled with something. That happens on a load (a small memory value going into a 64-bit register), so loads come in signed and unsigned flavors. A store goes the other way: it copies the **low N bits** of the register straight into an N-byte memory slot. There is nothing wider to fill, so there is nothing to extend. `sw` already does exactly the right thing for both signed and unsigned 32-bit values — it just writes the low 32 bits. Hence `lwu` exists but `swu` does not (likewise no `sbu`, `shu`, `sdu`).

Problem 4: Translate the Traversal¶

Translate this C loop to RISC-V assembly. Assume struct node_st has next_p at offset 0 and value at offset 8, and the list head pointer is already in a0. Sum all the values and return the total in a0.

int sum(struct node_st *p) {
    int total = 0;
    while (p != NULL) {
        total = total + p->value;
        p = p->next_p;
    }
    return total;
}

Click to reveal solution

# int sum(struct node_st *p)
# a0 = p ;  next_p @ 0, value @ 8
sum:
    li   t0, 0                # total = 0
sum_loop:
    beq  a0, zero, sum_done   # p == NULL -> stop
    lw   t1, 8(a0)            # t1 = p->value   (int, signed -> lw)
    add  t0, t0, t1           # total += value
    ld   a0, 0(a0)            # p = p->next_p   (pointer -> ld)
    j    sum_loop
sum_done:
    mv   a0, t0              # return total
    ret

The value field is a signed `int`, so `lw`; the `next_p` field is a pointer, so `ld`. The `beq a0, zero` test handles both an empty list and the end of the list.

Problem 5: Trace the Pointer Chain¶

Given the hand-built list n0(11) -> n1(22) -> n2(33) -> n3(44) -> NULL, how many times does ld a0, 0(a0) execute in the count function, and what is the final value returned?

Click to reveal solution

The loop body runs once per node (4 times), and each iteration executes one `ld a0, 0(a0)`: | Iteration | `a0` before | `n` after `addi` | `a0` after `ld` | |-----------|-------------|------------------|-----------------| | 1 | `&n0` | 1 | `&n1` | | 2 | `&n1` | 2 | `&n2` | | 3 | `&n2` | 3 | `&n3` | | 4 | `&n3` | 4 | `NULL (0)` | After iteration 4, `a0` is `NULL`, so `beq a0, zero, count_done` exits. `ld a0, 0(a0)` executed **4 times**, and the function returns **4** (the node count). Note the loop reads each node's `next_p` exactly once.

Problem 6: Spot the Bug¶

A student writes the following to advance through a list whose nodes have value at offset 0 and next_p at offset 8. Their traversal never terminates. What went wrong?

loop:
    beq  a0, zero, done
    lw   t0, 0(a0)        # read value
    ld   a0, 0(a0)        # advance to next node
    j    loop
done:

Click to reveal solution

The advance load uses the **wrong offset**. With `value` at offset 0 and `next_p` at offset 8, `ld a0, 0(a0)` reloads the *value* into `a0` (reinterpreting an `int` as an address) instead of the `next_p` pointer. The fix:

    ld   a0, 8(a0)        # p = p->next_p   (next_p is at offset 8 here)

This is exactly why getting struct offsets right matters: the hardware will happily load whatever bytes you point it at. Always confirm the layout (via `offsetof` or `gcc -S`) before hard-coding offsets in assembly.

Summary¶

RISC-V loads and stores come in four sizes — ld/sd (8 bytes), lw/sw (4), lh/sh (2), lb/sb (1) — matching the common C integer types.
Signed loads (lb, lh, lw) sign-extend the value into the wide register; unsigned loads (lbu, lhu, lwu) zero-extend. Choose the variant that matches the field's declared C type, or you will read negative values where you expected positive ones (and vice versa).
Stores have no unsigned variant because a store just writes the low N bits to memory; there is nothing to extend.
Struct fields are accessed by byte offset, and the compiler inserts padding so each field is naturally aligned (4-byte fields on multiples of 4, 8-byte fields on multiples of 8). Always use the same offsets the C compiler chose — find them with offsetof or gcc -S.
All pointers are 8 bytes on RV64 and are loaded/stored with ld/sd, no matter what they point to.
A linked list is a chain of structs connected by next pointers, terminated by NULL (the integer 0). Singly linked lists go forward only; doubly linked lists add a prev pointer for O(1) interior removal and backward traversal at the cost of extra memory.
Traversing a list in assembly is a loop: test the pointer against zero for the NULL terminator, process the current node's fields (lw value, 8(a0)), then advance with ld a0, 0(a0). This is the core pattern behind Project 3's findmaxll and findmaxllp.

RISC-V Assembly Part 8: Linked Lists¶

Overview¶

Learning Objectives¶

Prerequisites¶

1. RISC-V Memory Instructions: Loads and Stores¶

The Size Variants¶

Signed vs Unsigned Loads¶

Sign Extension in Pictures¶

Choosing the Right Load¶

All Loads Work the Same Way¶

Stores Have No Unsigned Variant¶

2. Struct Memory Layout¶

A Simple Struct of Two ints¶

A Struct of Two int64_ts¶

3. Alignment and Padding¶

Mixed Sizes Force Padding¶

Padding Can Hide Between and After Fields¶

Discovering Offsets in C¶

4. Pointers Are All 8 Bytes¶

5. Linked Lists¶

Singly Linked Lists¶

Doubly Linked Lists¶

Singly vs Doubly — Trade-offs¶

6. A Linked-List Node in C¶

Building a List by Hand on the Stack¶

How It Looks on the Stack¶

Node Size in Memory¶

7. Traversing a Linked List in Assembly¶

Finding the Maximum Value (Project 3 findmaxll)¶

Printing While Traversing (Project 3 findmaxllp)¶

8. Watching It Run in GDB¶

Key Concepts¶

Practice Problems¶

Problem 1: Choose the Load Instruction¶

Problem 2: Compute the Offsets¶

Problem 3: Why No swu?¶

Problem 4: Translate the Traversal¶

Problem 5: Trace the Pointer Chain¶

Problem 6: Spot the Bug¶

Further Reading¶

Summary¶

A Simple Struct of Two `int`s¶

A Struct of Two `int64_t`s¶

Finding the Maximum Value (Project 3 `findmaxll`)¶

Printing While Traversing (Project 3 `findmaxllp`)¶

Problem 3: Why No `swu`?¶