# RISC-V Assembly Part 8: Linked Lists

## CS 315 Computer Architecture

---

## Overview

Three closely connected ideas:

1. **Load/store instruction family** — sizes and sign extension
2. **Struct memory layout** — offsets, alignment, padding
3. **Linked lists in assembly** — traverse nodes by following pointers

<div class="info-box">
Foundation for Project 3: <code>findmaxll</code> and <code>findmaxllp</code>
</div>

---

## Load/Store Size Variants

| Pair | Size | Bits | Typical C type |
|------|------|------|----------------|
| `ld` / `sd` | doubleword | 64 | `int64_t`, pointer |
| `lw` / `sw` | word | 32 | `int`, `int32_t` |
| `lh` / `sh` | halfword | 16 | `short`, `int16_t` |
| `lb` / `sb` | byte | 8 | `char`, `int8_t` |

Address syntax: `offset(base_register)`. An omitted offset means 0.

```asm
ld t0,  (a0)    # load 64 bits from address in a0
lw t0, 4(a0)    # load 32 bits from a0 + 4
lb t0, 0(a0)    # load  8 bits from a0
sd t0, 8(a1)    # store 64 bits to a1 + 8
```

---

## Signed vs Unsigned Loads

When a load reads fewer than 64 bits, the upper bits of the 64-bit destination register must still be filled:

| Load | High bits | Extension type |
|------|-----------|----------------|
| `lb` | copy of bit 7 | **sign extension** |
| `lbu` | zeros | **zero extension** |
| `lh` / `lhu` | copy of bit 15 / zeros | sign / zero extension |
| `lw` / `lwu` | copy of bit 31 / zeros | sign / zero extension |
| `ld` | (fills full register — no variant) | — |

<div class="highlight-box">
Rule: match the load to the field's declared C type.<br>
Signed type → signed load  |  Unsigned type → unsigned load (<code>u</code> suffix)
</div>

---

## Sign vs Zero Extension: Example

Byte in memory = `0xFF` (bit 7 is set)

**`lb t0, (a0)`** — sign-extend:
```text
 bit 63 ......... bit 8 | bit 7 ........ bit 0
 1 1 1 1 ... 1 1 1 1 1  | 1 1 1 1 1 1 1 1
  (all copies of bit 7)      (loaded byte)
```
Result: `0xFFFFFFFFFFFFFFFF` = **-1**

**`lbu t0, (a0)`** — zero-extend:
```text
 bit 63 ......... bit 8 | bit 7 ........ bit 0
 0 0 0 0 ... 0 0 0 0 0  | 1 1 1 1 1 1 1 1
  (zeros)                    (loaded byte)
```
Result: `0x00000000000000FF` = **255**
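
The same two results can be reproduced in C, where casting through `int8_t` or `uint8_t` stands in for the choice between `lb` and `lbu`. This is only a sketch: the helper names `load_lb` and `load_lbu` are made up for illustration and are not part of the course code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Models `lb`: read one byte, then sign-extend it to 64 bits. */
int64_t load_lb(const void *addr) {
    assert(addr != NULL);
    int8_t byte = *(const int8_t *)addr;   /* interpret the byte as signed */
    return (int64_t)byte;                  /* widening copies bit 7 upward */
}

/* Models `lbu`: read one byte, then zero-extend it to 64 bits. */
int64_t load_lbu(const void *addr) {
    assert(addr != NULL);
    uint8_t byte = *(const uint8_t *)addr; /* interpret the byte as unsigned */
    return (int64_t)byte;                  /* widening fills with zeros */
}
```

With a byte holding `0xFF`, `load_lb` returns -1 and `load_lbu` returns 255. For a byte such as `0x7F`, where bit 7 is clear, both return 127.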

---

## Why No Unsigned Stores?

- `sw`, `sh`, `sb` — **these exist**
- `swu`, `shu`, `sbu` — **these do NOT exist**

A store copies the **low N bytes** of a register into an N-byte memory slot.

There is nothing wider to fill, so there is nothing to extend.

<div class="info-box">
Sign/zero extension is only a question when loading a small value into a wider (64-bit) register. Stores go the other direction — no extension needed.
</div>
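
A short C sketch of the same point, assuming a hypothetical helper `store_sb` that models `sb`: whether the register holds the sign-extended or the zero-extended form of a byte, the byte that lands in memory is identical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Models `sb`: write only the low 8 bits of a 64-bit register value. */
void store_sb(uint8_t *addr, uint64_t reg) {
    assert(addr != NULL);
    *addr = (uint8_t)(reg & 0xFF);   /* the upper 56 bits are discarded */
}
```

Storing `0xFFFFFFFFFFFFFFFF` (what `lb` produced) and `0x00000000000000FF` (what `lbu` produced) both write the byte `0xFF`.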

---

## Struct Memory Layout: Basics

Fields are accessed by **byte offset** from the struct's start.

```c
struct foo_st {
    int a;   // 4 bytes
    int b;   // 4 bytes
};
```

```text
# offset  field
#   0      a
#   4      b
```

```asm
# a0 = pointer to struct (kept intact so both loads use the same base)
lw t0,  (a0)    # t0 = foo.a  (offset 0)
lw t1, 4(a0)    # t1 = foo.b  (offset 4)
```

---

## Struct Layout: 64-bit Fields

```c
struct foo_st {
    int64_t a;   // 8 bytes
    int64_t b;   // 8 bytes
};
```

```text
# offset  field
#   0      a
#   8      b
```

```asm
# a0 = pointer to struct (kept intact so both loads use the same base)
ld t0,  (a0)    # t0 = foo.a  (offset 0)
ld t1, 8(a0)    # t1 = foo.b  (offset 8)
```

Use `ld` for 8-byte fields (and all pointers).

---

## Alignment and Padding

The compiler inserts **padding** so each field is naturally aligned:

- 4-byte field starts on a multiple of 4
- 8-byte field starts on a multiple of 8

```c
struct foo_st {
    char    c;     // 1 byte  → offset 0
    int     a;     // 4 bytes → offset 4  (3 bytes padding at 1,2,3)
    int64_t b;     // 8 bytes → offset 8
};                 // total size: 16 bytes
```

```text
 offset 15 | b |
        ...| b |
         8 | b |
         7 | a |
        ...| a |
         4 | a |
         3 |pad|
         2 |pad|
         1 |pad|
         0 | c |  <- start of struct
```
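
The padding rule can be expressed as a small calculation. The sketch below is an illustration, not how any particular compiler is implemented, and it assumes every field is a scalar whose alignment equals its size (a power of two). The names `align_up` and `layout_struct` are invented here.

```c
#include <assert.h>
#include <stddef.h>

/* Round `offset` up to the next multiple of `align` (a power of two). */
size_t align_up(size_t offset, size_t align) {
    assert(align != 0 && (align & (align - 1)) == 0);
    return (offset + align - 1) & ~(align - 1);
}

/* Given the field sizes in declaration order, fill offsets[] and
   return the total struct size including tail padding. */
size_t layout_struct(const size_t sizes[], size_t n, size_t offsets[]) {
    size_t pos = 0, max_align = 1;
    for (size_t i = 0; i < n; i++) {
        pos = align_up(pos, sizes[i]);     /* padding before this field */
        offsets[i] = pos;
        pos += sizes[i];
        if (sizes[i] > max_align)
            max_align = sizes[i];
    }
    return align_up(pos, max_align);       /* tail padding */
}
```

For field sizes `{1, 4, 8}` (the `char`, `int`, `int64_t` struct above) this yields offsets 0, 4, 8 and a total size of 16.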

---

## Padding: Multiple Fields Example

```c
struct foo_st {
    int  a;    // offset  0
    char b;    // offset  4
    char c;    // offset  5
    int  d;    // offset  8  (offsets 6,7 are padding)
    char e;    // offset 12
};             // size = 16  (tail padding at 13,14,15)
```

Key takeaways:
- Padding appears **between** fields (to align the next field)
- Padding appears **after** the last field (to keep array elements aligned)
- Never guess offsets — use `offsetof()` or `gcc -S`

---

## Discovering Offsets in C

```c
#include <stddef.h>   // offsetof
#include <stdint.h>   // int64_t
#include <stdio.h>

struct foo_st { char c; int a; int64_t b; };

int main(void) {
    printf("c at %zu\n", offsetof(struct foo_st, c));  // 0
    printf("a at %zu\n", offsetof(struct foo_st, a));  // 4
    printf("b at %zu\n", offsetof(struct foo_st, b));  // 8
    printf("size = %zu\n", sizeof(struct foo_st));     // 16
}
```

<div class="highlight-box">
Always verify struct offsets with <code>offsetof</code> before writing assembly. Wrong offsets = wrong data loaded silently.
</div>

---

## All Pointers Are 8 Bytes

On RV64, every pointer is a 64-bit value — **8 bytes** — regardless of what it points to.

```c
int           *ip;   // 8 bytes
char          *cp;   // 8 bytes
struct foo_st *fp;   // 8 bytes
```

- Always load/store pointers with `ld` / `sd`
- A `next` pointer in a linked list node is **always** 8 bytes

---

## What Is a Linked List?

An **array** stores elements contiguously. A **linked list** stores each element in a separate node connected by pointers.

<div class="mermaid">
flowchart LR
    head(["head"]) --> N0["x | next"]
    N0 --> N1["x | next"]
    N1 --> N2["x | next"]
    N2 --> NULL["NULL (0)"]
</div>

- `head` points to the first node
- Each `next` pointer points to the following node
- Last node's `next` = `NULL` (integer `0`) — marks the end
- Can only traverse **forward**

---

## Singly vs Doubly Linked Lists

<div class="mermaid">
flowchart LR
    NA["NULL"] -.-> A["x | prev/next"]
    A -- next --> B["x | prev/next"]
    B -- prev --> A
    B -- next --> C["x | prev/next"]
    C -- prev --> B
    C -- next --> NB["NULL"]
</div>

| Operation | Singly | Doubly |
|-----------|--------|--------|
| Forward traversal | O(n) | O(n) |
| Backward traversal | Not possible | O(n) |
| Remove given node | O(n) | O(1) |
| Memory per node | 1 pointer (8 B) | 2 pointers (16 B) |
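
To see why removal is O(1) in a doubly linked list, here is a minimal sketch. The struct `dnode_st` and the function `dlist_remove` are hypothetical names chosen for this example; the course projects use only the singly linked node. Because the node already knows its predecessor, no search from the head is needed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical doubly linked node, for illustration only. */
struct dnode_st {
    struct dnode_st *prev_p;
    struct dnode_st *next_p;
    int              value;
};

/* Unlink `node` in constant time. `head_pp` points at the list's head
   pointer so that removing the first node can update it. */
void dlist_remove(struct dnode_st **head_pp, struct dnode_st *node) {
    assert(head_pp != NULL && node != NULL);
    if (node->prev_p != NULL)
        node->prev_p->next_p = node->next_p;
    else
        *head_pp = node->next_p;              /* node was the head */
    if (node->next_p != NULL)
        node->next_p->prev_p = node->prev_p;
    node->prev_p = NULL;
    node->next_p = NULL;
}
```

In a singly linked list the same operation first has to walk from `head` to find the predecessor, which is where the O(n) cost comes from.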

---

## A Linked-List Node in C

```c
struct node_st {
    struct node_st *next_p;   // offset 0  (pointer, 8 bytes)
    int             value;    // offset 8  (int, 4 bytes)
};
// total size: 16 bytes (4 bytes tail padding)
```

Putting `next_p` first is convenient:

- `ld next, (node)` — no offset needed to advance
- `lw val, 8(node)` — value at offset 8
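
One way to keep the assembly and the C definition in sync is to let the compiler check the layout. The sketch below assumes a C11 compiler and a target with 8-byte pointers and 4-byte `int` (true for RV64 Linux). If the struct is ever reordered, the build fails instead of the assembly silently reading the wrong bytes.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>   /* offsetof */

struct node_st {
    struct node_st *next_p;
    int             value;
};

/* These are the offsets the hand-written assembly relies on. */
static_assert(offsetof(struct node_st, next_p) == 0, "next_p must be at offset 0");
static_assert(offsetof(struct node_st, value) == 8, "value must be at offset 8");
static_assert(sizeof(struct node_st) == 16, "node_st must be 16 bytes");
```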

---

## Building a List on the Stack

```c
struct node_st n0, n1, n2, n3;
n0.value = 11;  n0.next_p = &n1;
n1.value = 22;  n1.next_p = &n2;
n2.value = 33;  n2.next_p = &n3;
n3.value = 44;  n3.next_p = NULL;
struct node_st *head = &n0;
```

```text
         +-------------+
         |   44  NULL  |  n3 (value=44, next_p=NULL)
         +-------------+
    +--> |   33  .---> n3  |  n2
    |    +-------------+
    +--> |   22  .---> n2  |  n1
    |    +-------------+
head --> |   11  .---> n1  |  n0
         +-------------+
```
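
Linking nodes one by one gets tedious for longer lists. A possible alternative, sketched here with a made-up helper `build_list`, keeps the nodes in an array and links each element to the one after it. This is an assumption about how a test driver might be written, not the project's actual driver.

```c
#include <assert.h>
#include <stddef.h>

struct node_st {
    struct node_st *next_p;
    int             value;
};

/* Link nodes[0..n-1] into a list holding values[0..n-1].
   Returns the head, or NULL when n is 0. */
struct node_st *build_list(struct node_st nodes[], const int values[], size_t n) {
    assert(n == 0 || (nodes != NULL && values != NULL));
    for (size_t i = 0; i < n; i++) {
        nodes[i].value  = values[i];
        nodes[i].next_p = (i + 1 < n) ? &nodes[i + 1] : NULL;  /* last node ends the list */
    }
    return (n > 0) ? &nodes[0] : NULL;
}
```

With the nodes in an array, consecutive node addresses differ by `sizeof(struct node_st)`, which is 16 bytes on RV64.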

---

## Traversal Loop in C

```c
int count(struct node_st *p) {
    int n = 0;
    while (p != NULL) {
        n = n + 1;
        p = p->next_p;     // advance: load next pointer
    }
    return n;
}
```

Three parts every traversal loop needs:

1. **Termination test**: `p != NULL`
2. **Body**: process current node's fields
3. **Advance**: `p = p->next_p` — load the next pointer

---

## Count: RISC-V Assembly

```asm
# int count(struct node_st *p)
# a0 = p  (pointer to current node)
# node_st layout:  next_p @ 0,  value @ 8
count:
    li   t0, 0               # t0 = n = 0
count_loop:
    beq  a0, zero, count_done  # if p == NULL, stop
    addi t0, t0, 1             # n = n + 1
    ld   a0, 0(a0)             # p = p->next_p  (advance)
    j    count_loop
count_done:
    mv   a0, t0               # return value in a0
    ret
```

- `beq a0, zero` tests for `NULL` (NULL = integer 0 = `zero` register)
- `ld a0, 0(a0)` loads `next_p` at offset 0 and overwrites `a0`

---

## Load Sizes Must Match Field Types

In `count` and all list traversals:

| Field | C type | Size | Instruction |
|-------|--------|------|-------------|
| `next_p` | pointer | 8 bytes | `ld` |
| `value` | `int` | 4 bytes | `lw` |

<div class="highlight-box">
Using <code>lw</code> to load a pointer, or <code>ld</code> to load an <code>int</code>, is a classic bug. The hardware loads whatever you say — it won't warn you.
</div>
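
To make the pointer half of that bug concrete, the sketch below models what `lw` returns when it is pointed at an 8-byte slot: the low 32 bits, sign-extended. The function name `lw_of_doubleword` and the address used in the example are invented for illustration.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stdint.h>

static_assert(sizeof(void *) == 8, "sketch assumes 64-bit pointers");

/* Models `lw` applied to an 8-byte slot holding `stored`:
   keep the low 32 bits (RV64 is little-endian), then sign-extend. */
int64_t lw_of_doubleword(uint64_t stored) {
    uint32_t low = (uint32_t)stored;       /* upper 32 bits are lost */
    int64_t value = (int64_t)low;
    if (low & 0x80000000u)
        value -= (int64_t)1 << 32;         /* sign-extend from bit 31 */
    return value;
}
```

For a stored pointer of `0x0000003FFFFF1000`, the function returns -61440 (`0xFFFFFFFFFFFF1000`), which is no longer a valid node address. The next dereference would then read from the wrong place or fault.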

---

## findmaxll: Maximum Value (Project 3)

```asm
# int findmaxll(struct node_st *p)
# next_p @ 0,  value @ 8
findmaxll:
    lw   t1, 8(a0)             # max = p->value  (seed; assumes p != NULL)
findmax_loop:
    beq  a0, zero, findmax_done
    lw   t2, 8(a0)             # t2 = p->value
    ble  t2, t1, findmax_skip  # if value <= max, skip
    mv   t1, t2                # max = value
findmax_skip:
    ld   a0, 0(a0)             # p = p->next_p
    j    findmax_loop
findmax_done:
    mv   a0, t1                # return max
    ret
```

`lw` for `value` (signed `int`), `ld` for `next_p` (pointer).
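
For comparison, a C version with the same structure as the assembly might look like the sketch below. The name `findmaxll_c` is chosen here to avoid clashing with the assembly symbol. Like the assembly, it seeds `max` from the first node, so it assumes a non-empty list.

```c
#include <assert.h>
#include <stddef.h>

struct node_st {
    struct node_st *next_p;
    int             value;
};

/* C reference for the traversal; the list must contain at least one node. */
int findmaxll_c(struct node_st *p) {
    assert(p != NULL);
    int max = p->value;               /* seed with the first value */
    while (p != NULL) {
        if (p->value > max)           /* mirrors: ble t2, t1, findmax_skip */
            max = p->value;
        p = p->next_p;                /* mirrors: ld a0, 0(a0) */
    }
    return max;
}
```

Running the C and assembly versions on the same list and comparing results is a quick way to catch offset or load-size mistakes.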

---

## Traversal Flowchart

<div class="mermaid">
flowchart TD
    A["p = head"] --> B{"p == NULL?"}
    B -- yes --> E["return max"]
    B -- no --> C["lw t2, 8(a0)  — read value"]
    C --> D["update max if needed"]
    D --> F["ld a0, 0(a0)  — advance p"]
    F --> B
</div>

---

## findmaxllp: Printing During Traversal

`findmaxllp` calls `printf` inside the loop — making it a **non-leaf function**.

Requirements beyond `findmaxll`:

- Build a **stack frame** and save `ra`
- Save current node pointer and running max in **callee-saved registers** (`s0`, `s1`, ...) so they survive the `printf` call
- Restore saved registers and free the frame before `ret`

The traversal logic is identical — only calling-convention bookkeeping differs.

---

## Non-Leaf Function Frame

```asm
findmaxllp:
    addi sp, sp, -32
    sd   ra,  24(sp)    # save return address
    sd   s0,  16(sp)    # save s0 (will hold p)
    sd   s1,   8(sp)    # save s1 (will hold max)
    # ... traversal with printf calls using s0, s1 ...
    ld   ra,  24(sp)
    ld   s0,  16(sp)
    ld   s1,   8(sp)
    addi sp, sp, 32
    ret
```

<div class="info-box">
Caller-saved registers (<code>t0</code>–<code>t6</code>, <code>a0</code>–<code>a7</code>) may be overwritten by <code>printf</code>. Use callee-saved registers (<code>s0</code>–<code>s11</code>) to hold values that must survive the call.
</div>

---

## Debugging with GDB

```bash
gdb ./findmaxll
(gdb) break findmaxll
(gdb) run 1 2 3 4 99 5
(gdb) info registers a0          # address of current node
(gdb) x/2gx $a0                  # dump next_p (offset 0) and value (offset 8)
(gdb) stepi                      # single-step instructions
```

What to watch:

- `a0` jumps by **16 bytes** per iteration (padded `node_st` size) if nodes are in an array
- `x/2gx $a0` shows two 8-byte values (GDB "giant words"): `next_p`, then `value` in the low 4 bytes of the second (the upper 4 bytes are padding)
- When `ld a0, 0(a0)` brings in `0`, the next `beq` exits the loop

---

## Spot the Bug

Node layout: `value @ 0`, `next_p @ 8`

```asm
loop:
    beq  a0, zero, done
    lw   t0, 0(a0)      # read value — OK
    ld   a0, 0(a0)      # advance?  <- BUG
    j    loop
```

**Problem**: `ld a0, 0(a0)` loads `value` (offset 0) as a pointer instead of `next_p` (offset 8).

```asm
    ld   a0, 8(a0)      # correct: next_p is at offset 8
```

The hardware loads whatever bytes you specify — wrong offset = wrong data.

---

## Practice: Sum All Values

Translate to RISC-V (`next_p @ 0`, `value @ 8`):

```c
int sum(struct node_st *p) {
    int total = 0;
    while (p != NULL) {
        total = total + p->value;
        p = p->next_p;
    }
    return total;
}
```

Key choices: `lw` for `value` (signed `int`), `ld` for `next_p` (pointer), `beq a0, zero` for NULL test.

---

## Practice: Sum — Solution

```asm
# int sum(struct node_st *p)
# a0 = p ;  next_p @ 0,  value @ 8
sum:
    li   t0, 0                 # total = 0
sum_loop:
    beq  a0, zero, sum_done    # p == NULL -> stop
    lw   t1, 8(a0)             # t1 = p->value  (int -> lw)
    add  t0, t0, t1            # total += value
    ld   a0, 0(a0)             # p = p->next_p  (pointer -> ld)
    j    sum_loop
sum_done:
    mv   a0, t0                # return total
    ret
```

---

## Key Concepts

| Concept | Key Point |
|---------|-----------|
| Load/store sizes | `ld/sd`=8B, `lw/sw`=4B, `lh/sh`=2B, `lb/sb`=1B |
| Sign extension | `lb`/`lh`/`lw` fill high bits from sign bit |
| Zero extension | `lbu`/`lhu`/`lwu` fill high bits with zeros |
| No unsigned stores | Stores copy low N bits; nothing to extend |
| Struct offset | Byte distance from struct start to field |
| Padding | Filler bytes for natural alignment |
| All pointers = 8B | Always `ld`/`sd`, never `lw`/`sw` |
| NULL terminator | Pointer value `0`; test with `beq reg, zero` |
| Traversal pattern | Test NULL → process fields → `ld a0, 0(a0)` |

---

## Summary

1. **Four load/store sizes** match C types: `ld/sd` (8B), `lw/sw` (4B), `lh/sh` (2B), `lb/sb` (1B)

2. **Signed loads sign-extend; unsigned loads (`u` suffix) zero-extend**: match the declared C type, or the upper bits of the register will hold the wrong value

3. **Stores have no unsigned variant** — they just write the low N bits

4. **Struct fields use byte offsets**; the compiler inserts padding for alignment — always verify with `offsetof` or `gcc -S`

5. **All pointers are 8 bytes** on RV64 — always load/store with `ld`/`sd`

6. **Linked list traversal** = loop: test `beq a0, zero` for NULL, process node fields, advance with `ld a0, 0(a0)`

7. **Non-leaf traversal** (e.g., `findmaxllp`) requires saving `ra` and using callee-saved registers across `printf` calls