# RISC-V Machine Code and Emulation

## CS 315 Computer Architecture

---

## Learning Objectives

- Explain the C → assembly → machine code translation chain
- Read a 32-bit instruction word from memory using C pointers
- Decode the six R-type fields with shift-and-mask operations
- Describe the emulator's processor state: registers, PC, and stack
- Trace the fetch-decode-execute cycle
- Understand how `ret`/`jalr` stops the emulator loop

---

## From C to Machine Code

- The **compiler** turns C into assembly text
- The **assembler** turns mnemonics into 32-bit binary words
- The **processor** (or our emulator) fetches and runs those words

<div class="info-box">
For the emulator, <strong>machine code is the input</strong> — real compiled functions at real addresses.
</div>

---

## Bitwise Ops: Same in C and RISC-V

| C operator | RISC-V instruction | Use in decoding |
|------------|--------------------|-----------------|
| `&` | `and` / `andi` | Mask off upper bits |
| `\|` | `or` / `ori` | Combine split pieces |
| `>>` (unsigned) | `srl` / `srli` | Bring field to bit 0 |
| `>>` (signed) | `sra` / `srai` | Sign-extend immediates |
| `<<` | `sll` / `slli` | Position a field |

<div class="highlight-box">
The same operators are instructions the emulator <em>executes</em> and tools the emulator uses to <em>decode</em> instructions.
</div>
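
The two `>>` rows behave differently only when the top bit is set. A minimal sketch of that difference follows. The function names are illustrative, not part of the lab code, and the arithmetic version assumes the compiler shifts signed values arithmetically (implementation-defined in C, but the behavior of GCC and Clang).

```c
#include <stdint.h>

// Logical shift (srl/srli): unsigned operand, zeros enter from the left.
// amount must be in the range 0..63.
static uint64_t logical_shift_right(uint64_t value, unsigned amount) {
    return value >> amount;
}

// Arithmetic shift (sra/srai): signed operand, the sign bit is copied in.
// Assumption: >> on a signed type is an arithmetic shift on this compiler.
static int64_t arith_shift_right(int64_t value, unsigned amount) {
    return value >> amount;
}
```

With these helpers, shifting `-16` right by 2 arithmetically keeps the value negative, while the logical version of the same bit pattern produces a large positive number.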

---

## Processor vs. Emulator

<div class="mermaid">
flowchart LR
    subgraph HW["RISC-V Hardware"]
        R1["REGS (silicon)"]
        PC1["PC"]
        EX1["execute(iw)"]
    end
    subgraph SW["Our Emulator (C)"]
        R2["regs[32]"]
        PC2["pc"]
        ST2["stack[]"]
        EX2["execute(iw)"]
    end
    subgraph MEM["Memory (shared)"]
        C["CODE\n0x00B50533\n0x00008067"]
    end
    PC1 -.fetch.-> C
    PC2 -.fetch.-> C
</div>

Both read the **same machine code** from memory — hardware does it in silicon; the emulator does it in C.

---

## Processor State

RISC-V RV64 state the emulator must reproduce:

| Component | Type | Notes |
|-----------|------|-------|
| 32 registers (`x0`–`x31`) | `uint64_t[32]` | 64-bit each; `x0` always 0 |
| Program counter | `uint64_t` | Address of next instruction |
| Stack memory | `uint8_t[]` | Grows downward |

---

## The `rv_state` Struct

```c
#include <stdint.h>

#define NREGS       32
#define STACK_SIZE  8192    // 8 KB

struct rv_state {
    uint64_t regs[NREGS];        // x0..x31
    uint64_t pc;                 // program counter
    uint8_t  stack[STACK_SIZE];  // emulated stack
};
```

<div class="info-box">
<strong>Why 5-bit register fields?</strong> 2<sup>5</sup> = 32 — exactly enough to index all 32 registers.
</div>
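
Later slides index `regs` with bare numbers such as 1, 2, and 10. One way to keep those readable is a small enum of ABI names. The names below are hypothetical, chosen for this sketch only.

```c
#include <stdint.h>

// Hypothetical names for the register indices used in these slides.
enum rv_reg {
    RV_ZERO = 0,   // x0: always reads as zero
    RV_RA   = 1,   // x1: return address
    RV_SP   = 2,   // x2: stack pointer
    RV_A0   = 10,  // x10: first argument and return value
    RV_A1   = 11,  // x11: second argument
    RV_A2   = 12,  // x12: third argument
    RV_A3   = 13   // x13: fourth argument
};

// Keep only the low 5 bits, so the result is always a valid index 0..31.
static uint32_t reg_index_from_field(uint32_t field) {
    return field & 0x1F;
}
```

With this in place, `regs[RV_A0]` could stand in for `regs[10]` wherever the slides use the raw number.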

---

## The 32-bit Instruction Word

Every RISC-V base ISA instruction is exactly **32 bits**. Six formats slice those bits differently:

```text
R-type: |   funct7     | rs2 | rs1 | funct3 |     rd      | opcode |
I-type: |     imm[11:0]      | rs1 | funct3 |     rd      | opcode |
S-type: |  imm[11:5]   | rs2 | rs1 | funct3 |  imm[4:0]   | opcode |
B-type: | imm[12|10:5] | rs2 | rs1 | funct3 | imm[4:1|11] | opcode |
U-type: |             imm[31:12]            |     rd      | opcode |
J-type: |       imm[20|10:1|11|19:12]       |     rd      | opcode |
```

<div class="highlight-box">
The <strong>opcode [6:0]</strong> is always in the same position — it tells us the format.
</div>
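
Because the opcode never moves, one mask is enough to pick the format. The sketch below maps common RV64 opcodes to a format tag. The enum and function names are assumptions made for illustration, not lab code.

```c
#include <stdint.h>

// Illustrative format tags (hypothetical names).
enum rv_format { FMT_R, FMT_I, FMT_S, FMT_B, FMT_U, FMT_J, FMT_UNKNOWN };

// Map the 7-bit opcode in bits [6:0] to an instruction format.
static enum rv_format format_of(uint32_t iw) {
    switch (iw & 0x7F) {
        case 0x33: return FMT_R;   // add, sub, mul, ...
        case 0x13: return FMT_I;   // addi and other immediate arithmetic
        case 0x03: return FMT_I;   // loads
        case 0x67: return FMT_I;   // jalr
        case 0x23: return FMT_S;   // stores
        case 0x63: return FMT_B;   // conditional branches
        case 0x37: return FMT_U;   // lui
        case 0x17: return FMT_U;   // auipc
        case 0x6F: return FMT_J;   // jal
        default:   return FMT_UNKNOWN;
    }
}
```

Note that several opcodes share the I-type layout, so the opcode still has to be kept around after the format is known.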

---

## R-Type Field Layout

```text
bit: 31      25 24   20 19   15 14  12 11    7 6      0
     [ funct7 ] [ rs2 ] [ rs1 ] [fn3] [  rd ] [opcode]
        7 bits    5 bits  5 bits  3 b   5 bits   7 bits
```

Decoding steps:
1. Check **opcode** → format
2. Check **funct3** → operation family
3. Check **funct7** → exact operation (e.g., ADD vs SUB)

---

## Worked Example: `add a0, a0, a1`

Instruction word: **`0x00B50533`**

```text
Binary: 0000 0000 1011 0101 0000 0101 0011 0011

funct7   rs2    rs1  funct3  rd    opcode
0000000 01011  01010   000  01010  0110011
```

| Field | Value | Meaning |
|-------|-------|---------|
| opcode | `0110011` | R-type |
| rd | `01010` = 10 | `a0` |
| funct3 | `000` | ADD/SUB family |
| rs1 | `01010` = 10 | `a0` |
| rs2 | `01011` = 11 | `a1` |
| funct7 | `0000000` | ADD |
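
Running the decode in reverse is a quick way to check the table. The sketch below packs the six fields back into a word, which is roughly what the assembler does. `encode_r_type` is a hypothetical helper, not part of the lab code.

```c
#include <stdint.h>

// Pack the six R-type fields into one 32-bit instruction word.
// Each argument is assumed to already fit in its field width.
static uint32_t encode_r_type(uint32_t funct7, uint32_t rs2, uint32_t rs1,
                              uint32_t funct3, uint32_t rd, uint32_t opcode) {
    return (funct7 << 25) | (rs2 << 20) | (rs1 << 15)
         | (funct3 << 12) | (rd << 7) | opcode;
}
```

Calling `encode_r_type(0, 11, 10, 0, 10, 0x33)` rebuilds `0x00B50533`. Changing only `funct7` to `0x20` gives `0x40B50533`, the word for `sub a0, a0, a1`.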

---

## Decoding Diagram: `0x00B50533`

<div class="mermaid">
graph LR
    IW["0x00B50533"] --> OP["opcode=0110011\nR-type"]
    IW --> RD["rd=01010\na0 (x10)"]
    IW --> F3["funct3=000\nADD/SUB family"]
    IW --> R1["rs1=01010\na0 (x10)"]
    IW --> R2["rs2=01011\na1 (x11)"]
    IW --> F7["funct7=0000000\nADD"]
</div>

---

## R-Type Operation Table

All R-type: opcode = `0110011`

| Instruction | funct7 | funct3 | Operation |
|-------------|--------|--------|-----------|
| `add` | `0000000` | `000` | `rd = rs1 + rs2` |
| `sub` | `0100000` | `000` | `rd = rs1 - rs2` |
| `mul` | `0000001` | `000` | `rd = rs1 * rs2` |
| `sll` | `0000000` | `001` | `rd = rs1 << (rs2 & 0x3F)` |
| `srl` | `0000000` | `101` | `rd = rs1 >> (rs2 & 0x3F)` (logical) |
| `sra` | `0100000` | `101` | `rd = rs1 >> (rs2 & 0x3F)` (arith) |
| `and` | `0000000` | `111` | `rd = rs1 & rs2` |
| `or`  | `0000000` | `110` | `rd = rs1 \| rs2` |
| `xor` | `0000000` | `100` | `rd = rs1 ^ rs2` |

---

## Reading the Instruction Word in C

```c
// add2_s is a real compiled assembly function
extern uint64_t add2_s(uint64_t a0, uint64_t a1);

// Cast function address to a pointer to 32-bit words
uint32_t *pc = (uint32_t *) add2_s;

uint32_t iw = *pc;       // fetch first instruction
pc = pc + 1;             // advance 4 bytes (one instruction)
iw = *pc;                // fetch second instruction
```

Inside the emulator:

```c
uint32_t iw = *((uint32_t *) rsp->pc);   // fetch at pc
```

<div class="info-box">
A <code>uint32_t *</code> advances by <strong>4 bytes</strong> per <code>+1</code> — exactly one instruction.
</div>
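
The same pointer arithmetic can be tried on any machine by storing the instruction words in an array instead of pointing at a real compiled function. This is a sketch under that assumption. `demo_code` and `fetch_and_advance` are hypothetical names.

```c
#include <stdint.h>

// Two instruction words from the slides: add a0, a0, a1 followed by ret.
static const uint32_t demo_code[] = { 0x00B50533, 0x00008067 };

// Fetch the word at *pcp, then advance the pointer by one instruction.
static uint32_t fetch_and_advance(const uint32_t **pcp) {
    uint32_t iw = **pcp;
    *pcp += 1;              // pointer arithmetic: +1 word = +4 bytes
    return iw;
}
```

Starting with `const uint32_t *pc = demo_code;`, two calls return the `add` word and then the `ret` word, and `pc` ends up 8 bytes past the start of the array.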

---

## Shift-and-Mask: The Recipe

Extract `n` bits starting at bit position `start`:

```text
field = (iw >> start) & ((1 << n) - 1)
```

Example — extract **opcode** (bits `[6:0]`):

```text
iw   = ...xxxx xxxx  0110011   (low 7 bits)
mask =              01111111   (0x7F)
result=             0110011
```

`(1 << 7) - 1 = 0x7F` — seven 1-bits, masking exactly the opcode.

---

## `get_bits` Helper

```c
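// Extract `count` bits of iw, starting at bit `start` (bit 0 is the LSB).
// count must be less than 32: shifting 1u by 32 is undefined in C.
static inline uint32_t get_bits(uint32_t iw,
                                uint32_t start,
                                uint32_t count) {
    uint32_t mask = (1u << count) - 1u;
    return (iw >> start) & mask;
}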
```

One-liner extractors for every R-type field:

```c
static uint32_t get_opcode(uint32_t iw) { return get_bits(iw,  0, 7); }
static uint32_t get_rd    (uint32_t iw) { return get_bits(iw,  7, 5); }
static uint32_t get_funct3(uint32_t iw) { return get_bits(iw, 12, 3); }
static uint32_t get_rs1   (uint32_t iw) { return get_bits(iw, 15, 5); }
static uint32_t get_rs2   (uint32_t iw) { return get_bits(iw, 20, 5); }
static uint32_t get_funct7(uint32_t iw) { return get_bits(iw, 25, 7); }
```

---

## Inline Equivalents (for reference)

```c
uint32_t iw = 0x00B50533;

uint32_t opcode = iw         & 0x7F;  // bits [6:0]
uint32_t rd     = (iw >>  7) & 0x1F;  // bits [11:7]
uint32_t funct3 = (iw >> 12) & 0x7;   // bits [14:12]
uint32_t rs1    = (iw >> 15) & 0x1F;  // bits [19:15]
uint32_t rs2    = (iw >> 20) & 0x1F;  // bits [24:20]
uint32_t funct7 = (iw >> 25) & 0x7F;  // bits [31:25]
```

Result: `opcode=0x33, rd=10, funct3=0, rs1=10, rs2=11, funct7=0`

Matches the hand-decoded `add a0, a0, a1`.
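
The same recipe applies to the second word in the code diagram, `0x00008067` (`ret`, which is `jalr x0, 0(ra)`). The only change is that the upper 12 bits are an immediate instead of `rs2` and `funct7`. The struct and function names in this sketch are assumptions, not lab code.

```c
#include <stdint.h>

// Fields of an I-type instruction word (hypothetical struct for this sketch).
struct i_fields {
    uint32_t opcode, rd, funct3, rs1, imm;
};

static struct i_fields decode_i_fields(uint32_t iw) {
    struct i_fields f;
    f.opcode = iw         & 0x7F;   // bits [6:0]
    f.rd     = (iw >>  7) & 0x1F;   // bits [11:7]
    f.funct3 = (iw >> 12) & 0x7;    // bits [14:12]
    f.rs1    = (iw >> 15) & 0x1F;   // bits [19:15]
    f.imm    = (iw >> 20) & 0xFFF;  // bits [31:20], raw, not sign-extended
    return f;
}
```

For `0x00008067` this yields `opcode=0x67`, `rd=0` (`x0`), `funct3=0`, `rs1=1` (`ra`), and `imm=0`.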

---

## Interpreter vs. Emulator

| | Interpreter | Emulator |
|---|---|---|
| Input | Source / bytecode / AST | Target machine code |
| Models a specific CPU? | Not necessarily | Yes |
| Lab 6 project | reads & runs instructions | models RISC-V CPU |

<div class="highlight-box">
Our program <em>interprets</em> one instruction at a time, but it is best called an <strong>emulator</strong> because it faithfully models RISC-V hardware state.
</div>

---

## Fetch-Decode-Execute Cycle

<div class="mermaid">
flowchart LR
    A["Fetch\niw = *pc"] --> B["Decode\nopcode, rd, funct3, ..."]
    B --> C["Execute\nupdate regs / memory"]
    C --> D["Update PC\npc += 4 or branch"]
    D --> A
</div>

Both real hardware and our emulator run this loop — the only difference is C variables vs. silicon.

---

## The Emulate Loop

```c
uint64_t rv_emulate(struct rv_state *rsp) {
    while (rsp->pc != 0) {
        rv_one(rsp);           // one fetch-decode-execute
        rsp->regs[0] = 0;      // x0 is hardwired to zero
    }
    return rsp->regs[10];      // result is in a0 (x10)
}
```

```c
void rv_one(struct rv_state *rsp) {
    uint32_t iw     = *((uint32_t *) rsp->pc);  // FETCH
    uint32_t opcode = get_opcode(iw);           // DECODE

    switch (opcode) {
        case 0b0110011: emu_r_type(rsp, iw); break; // R-type
        case 0b0010011: emu_i_type(rsp, iw); break; // I-type arith
        case 0b1100111: emu_jalr(rsp, iw);   break; // jalr/ret
        default: printf("Unknown opcode 0x%X\n", opcode); exit(1);
    }
}
```
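
To see the whole loop run end to end without a RISC-V toolchain, here is a stripped-down sketch that handles only `add` and `ret` and reads its program from an array. Everything prefixed `mini_`, along with `add2_words`, is hypothetical. The real lab code fetches from compiled functions and supports many more instructions.

```c
#include <stdint.h>

// Minimal state for this sketch: registers and pc only (no stack).
struct mini_state {
    uint64_t regs[32];
    uint64_t pc;
};

// add a0, a0, a1 followed by ret.
static const uint32_t add2_words[] = { 0x00B50533, 0x00008067 };

static uint32_t mini_bits(uint32_t iw, uint32_t start, uint32_t count) {
    return (iw >> start) & ((1u << count) - 1u);
}

// Execute one instruction; returns 0 on success, -1 if unsupported.
static int mini_one(struct mini_state *s) {
    uint32_t iw  = *(const uint32_t *)(uintptr_t) s->pc;   // FETCH
    uint32_t rd  = mini_bits(iw, 7, 5);                     // DECODE
    uint32_t rs1 = mini_bits(iw, 15, 5);
    uint32_t rs2 = mini_bits(iw, 20, 5);

    switch (mini_bits(iw, 0, 7)) {                          // EXECUTE
        case 0x33:   // R-type: this sketch supports only add
            if (mini_bits(iw, 12, 3) != 0 || mini_bits(iw, 25, 7) != 0)
                return -1;
            s->regs[rd] = s->regs[rs1] + s->regs[rs2];
            s->pc += 4;
            break;
        case 0x67:   // jalr, treated as ret
            s->pc = s->regs[rs1];
            break;
        default:
            return -1;
    }
    s->regs[0] = 0;  // keep x0 at zero
    return 0;
}

// Run a two-argument function stored as an array of instruction words.
static uint64_t mini_run(const uint32_t *code, uint64_t a0, uint64_t a1) {
    struct mini_state s = { {0}, 0 };
    s.pc = (uint64_t)(uintptr_t) code;
    s.regs[10] = a0;
    s.regs[11] = a1;
    s.regs[1]  = 0;  // ra = 0, so the final ret ends the loop
    while (s.pc != 0) {
        if (mini_one(&s) != 0) break;
    }
    return s.regs[10];
}
```

Calling `mini_run(add2_words, 2, 3)` fetches the `add`, then the `ret`, and returns the sum left in `a0`.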

---

## Executing an R-Type Instruction

```c
void emu_r_type(struct rv_state *rsp, uint32_t iw) {
    uint32_t rd = get_rd(iw), rs1 = get_rs1(iw), rs2 = get_rs2(iw);
    uint32_t f3 = get_funct3(iw), f7 = get_funct7(iw);

    if      (f3 == 0 && f7 == 0b0000000)
        rsp->regs[rd] = rsp->regs[rs1] + rsp->regs[rs2]; // add
    else if (f3 == 0 && f7 == 0b0100000)
        rsp->regs[rd] = rsp->regs[rs1] - rsp->regs[rs2]; // sub
    else if (f3 == 0 && f7 == 0b0000001)
        rsp->regs[rd] = rsp->regs[rs1] * rsp->regs[rs2]; // mul
    else if (f3 == 7 && f7 == 0b0000000)
        rsp->regs[rd] = rsp->regs[rs1] & rsp->regs[rs2]; // and
    // ... more cases ...

    rsp->pc += 4;
}
```

Unsigned `uint64_t` arithmetic wraps modulo 2<sup>64</sup>, which matches the results RV64 hardware produces for `add`, `sub`, and `mul`.
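
A short sketch makes the wraparound concrete. The `exec_` helpers are hypothetical names that mirror the bodies of the `add`, `sub`, and `mul` cases above.

```c
#include <stdint.h>

// Unsigned 64-bit arithmetic in C is defined to wrap modulo 2^64,
// so these need no overflow checks.
static uint64_t exec_add(uint64_t a, uint64_t b) { return a + b; }
static uint64_t exec_sub(uint64_t a, uint64_t b) { return a - b; }
static uint64_t exec_mul(uint64_t a, uint64_t b) { return a * b; }
```

For example, `exec_add(UINT64_MAX, 1)` is 0, and `exec_sub(0, 1)` is `UINT64_MAX`, the same bit pattern as -1 in two's complement. That is why one `uint64_t` register file can serve both signed and unsigned programs.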

---

## Initializing the Emulator

```c
void rv_init(struct rv_state *rsp, uint64_t (*func)(),
             uint64_t a0, uint64_t a1, uint64_t a2, uint64_t a3) {
    for (int i = 0; i < NREGS; i++) rsp->regs[i] = 0;

    rsp->pc      = (uint64_t) func;           // first instruction
    rsp->regs[10] = a0;  rsp->regs[11] = a1; // args in a0..a3
    rsp->regs[12] = a2;  rsp->regs[13] = a3;

    // sp starts at TOP of stack array (stack grows down)
    rsp->regs[2] = (uint64_t) &rsp->stack[STACK_SIZE];

    rsp->regs[1] = 0;    // ra = 0  (halt sentinel)
}
```

Usage:

```c
struct rv_state state;
rv_init(&state, (uint64_t (*)()) quadratic_s, 2, 4, 6, 8);
uint64_t result = rv_emulate(&state);
```

---

## Stack Layout

- `addi sp, sp, -16` → sp moves **down** (allocate)
- `addi sp, sp, 16`  → sp moves **up** (free)
- Starting at `&stack[STACK_SIZE]` gives the full 8 KB to grow into
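
The sketch below walks through one allocate, store, load, free sequence on an emulated stack. It assumes 64-bit store and load instructions (`sd` and `ld`) with an offset of 8, which these slides do not cover in detail. All names are hypothetical, and `memcpy` is used so the access is safe regardless of alignment.

```c
#include <stdint.h>
#include <string.h>

#define DEMO_STACK_SIZE 8192   // same size as STACK_SIZE in the slides

struct stack_demo {
    uint64_t sp;                       // stands in for regs[2]
    uint8_t  stack[DEMO_STACK_SIZE];
};

// Point sp one past the last byte; the first allocation moves it
// down into the array.
static void stack_demo_init(struct stack_demo *d) {
    d->sp = (uint64_t)(uintptr_t) &d->stack[DEMO_STACK_SIZE];
}

// Emulate "addi sp, sp, -16" followed by "sd value, 8(sp)".
static void push_frame_and_store(struct stack_demo *d, uint64_t value) {
    d->sp -= 16;
    memcpy((void *)(uintptr_t)(d->sp + 8), &value, sizeof value);
}

// Emulate "ld rd, 8(sp)" followed by "addi sp, sp, 16".
static uint64_t load_and_pop_frame(struct stack_demo *d) {
    uint64_t value;
    memcpy(&value, (void *)(uintptr_t)(d->sp + 8), sizeof value);
    d->sp += 16;
    return value;
}
```

After a push and a matching pop, `sp` is back at its starting value and the loaded value equals the stored one.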

---

## How Emulation Stops: `ret` and `jalr`

<div class="mermaid">
flowchart TD
    A["rv_init: ra = 0"] --> B["emulate instructions"]
    B --> C{"pc != 0?"}
    C -->|yes| B
    C -->|no| D["return regs[10] (a0)"]
    E["ret = jalr x0, 0(ra)\nsets pc = ra = 0"] --> C
</div>

```c
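// Simplified jalr: enough for `ret` (jalr x0, 0(ra)), where the
// immediate is 0 and rd is x0, so there is no link value to write.
void emu_jalr(struct rv_state *rsp, uint32_t iw) {
    uint32_t rs1 = get_rs1(iw);
    rsp->pc = rsp->regs[rs1];   // pc = ra; top call: ra==0 -> stop
}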
```
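
The version above is all that `ret` needs. A general `jalr` also adds a sign-extended immediate, clears bit 0 of the target, and writes the return address to `rd`. The following is a sketch of those semantics, not the lab's required implementation. `struct jalr_state` and `emu_jalr_full` are hypothetical names.

```c
#include <stdint.h>

struct jalr_state {
    uint64_t regs[32];
    uint64_t pc;
};

// jalr rd, imm(rs1):  rd = pc + 4;  pc = (rs1 + imm) & ~1
static void emu_jalr_full(struct jalr_state *s, uint32_t iw) {
    uint32_t rd  = (iw >> 7)  & 0x1F;
    uint32_t rs1 = (iw >> 15) & 0x1F;

    // Sign-extend the 12-bit immediate held in bits [31:20].
    int64_t imm = (int64_t)((iw >> 20) & 0xFFF);
    if (imm & 0x800) imm -= 0x1000;

    // Compute both values before writing, because rd may equal rs1.
    uint64_t target = (s->regs[rs1] + (uint64_t) imm) & ~(uint64_t) 1;
    uint64_t link   = s->pc + 4;

    s->pc = target;
    if (rd != 0) s->regs[rd] = link;   // x0 is never written
}
```

For `ret` (`0x00008067`) this reduces to `pc = ra`, so the halt sentinel still works.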

---

## Beyond R-Type: Other Formats

| Format | Opcode | Used by | Key note |
|--------|--------|---------|----------|
| **I-type** | `0010011` (arith), `0000011` (loads), `1100111` (`jalr`) | `addi`, `li`, `lw`, `jalr` | 12-bit sign-extended imm in `[31:20]` |
| **B-type** | `1100011` | `beq`, `bne`, `blt`, `bge` | 13-bit scattered imm; if taken `pc += imm` |
| **J-type** | `1101111` | `jal`, `j`, `call` (when relaxed to `jal`) | 21-bit scattered imm; `pc += imm` |

Sign extension for immediates:

```c
static int64_t sign_extend(uint64_t value, int start) {
    int shift = 63 - start;
    return ((int64_t)(value << shift)) >> shift;
}
```
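
A few concrete values show what `sign_extend` does with a 12-bit field. The sketch repeats the function from the slide so that it compiles on its own and adds a hypothetical wrapper, `i_type_imm`. Like the original, it assumes `>>` on a signed value is an arithmetic shift.

```c
#include <stdint.h>

// Same function as on the slide: `start` is the index of the sign bit.
static int64_t sign_extend(uint64_t value, int start) {
    int shift = 63 - start;
    return ((int64_t)(value << shift)) >> shift;
}

// Sign-extended I-type immediate taken from bits [31:20].
static int64_t i_type_imm(uint32_t iw) {
    return sign_extend((iw >> 20) & 0xFFF, 11);
}
```

The 12-bit pattern `0xFFF` becomes -1, `0x800` becomes -2048, and `0x7FF` stays 2047. For the word `0xFFF50513` (`addi a0, a0, -1`), `i_type_imm` returns -1.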

---

## I-Type: `addi` (and `li`, `mv`)

```c
void emu_i_type(struct rv_state *rsp, uint32_t iw) {
    uint32_t rd     = get_rd(iw);
    uint32_t rs1    = get_rs1(iw);
    uint32_t funct3 = get_funct3(iw);
    int64_t  imm    = sign_extend(get_bits(iw, 20, 12), 11);

    if (funct3 == 0b000) {               // addi
        rsp->regs[rd] = rsp->regs[rs1] + imm;
    }
    rsp->pc += 4;
}
```

<div class="info-box">
<code>li a0, 5</code> is <code>addi a0, zero, 5</code>; <code>mv a0, a1</code> is <code>addi a0, a1, 0</code>.
Implementing <code>addi</code> gives you <code>li</code> and <code>mv</code> for free.
</div>
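
The pseudo-instruction claim can be checked against the actual encodings. In the sketch below, `exec_addi` is a hypothetical helper that applies one `addi` word to a plain register array. It assumes the caller has already checked the opcode and `funct3`.

```c
#include <stdint.h>

// Apply one addi word to a register file.
static void exec_addi(uint64_t regs[32], uint32_t iw) {
    uint32_t rd  = (iw >> 7)  & 0x1F;
    uint32_t rs1 = (iw >> 15) & 0x1F;

    // Sign-extend the 12-bit immediate held in bits [31:20].
    int64_t imm = (int64_t)((iw >> 20) & 0xFFF);
    if (imm & 0x800) imm -= 0x1000;

    regs[rd] = regs[rs1] + (uint64_t) imm;
    regs[0]  = 0;                      // x0 stays zero
}
```

Feeding it `0x00500513` (`li a0, 5`) sets `a0` to 5, and `0x00058513` (`mv a0, a1`) copies `a1` into `a0`. Both go through the same code path with no special cases.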

---

## Lab 6: Incremental Strategy

<div class="mermaid">
flowchart TD
    A["Pick target program\n(start: quadratic)"] --> B["Run; see unsupported\ninstruction"]
    B --> C["Identify format\nfrom opcode"]
    C --> D["Decode fields\nwith get_bits"]
    D --> E["Implement operation\nupdate pc"]
    E --> F{"Emu == C == Asm?"}
    F -->|no| B
    F -->|yes| G["Move to next program"]
</div>

---

## Expected Lab 6 Output

```bash
$ ./lab06 quadratic 2 4 6 8
C:   36
Asm: 36
Emu: 36

$ ./lab06 max3 3 8 5
C:   8
Asm: 8
Emu: 8
```

All three lines must agree. Instructions you will likely need:

- R-type: `add`, `sub`, `mul`, `and`, `or`, `xor`, `sll`, `srl`, `sra`
- I-type: `addi` (covers `li`, `mv`), loads
- Control: `jalr` (`ret`), `jal` (`j`), branches (`beq`, `bne`, `blt`, `bge`)
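
Branches are usually the trickiest item on this list because the immediate is scattered across the word. The sketch below reassembles the 13-bit B-type offset and uses it for `beq`. The function names are assumptions for illustration. The bit positions follow the standard B-type layout: `imm[12]` in bit 31, `imm[10:5]` in bits 30:25, `imm[4:1]` in bits 11:8, and `imm[11]` in bit 7.

```c
#include <stdint.h>

// Reassemble the 13-bit branch offset and sign-extend it.
static int64_t b_type_imm(uint32_t iw) {
    uint32_t imm = (((iw >> 31) & 0x1)  << 12)
                 | (((iw >> 7)  & 0x1)  << 11)
                 | (((iw >> 25) & 0x3F) << 5)
                 | (((iw >> 8)  & 0xF)  << 1);   // bit 0 is always 0

    int64_t offset = (int64_t) imm;
    if (imm & 0x1000) offset -= 0x2000;          // imm[12] is the sign bit
    return offset;
}

// beq: compute the next pc from the two source register values.
static uint64_t next_pc_beq(uint64_t pc, uint32_t iw,
                            uint64_t a, uint64_t b) {
    return (a == b) ? pc + (uint64_t) b_type_imm(iw) : pc + 4;
}
```

For `0x00B50463` (`beq a0, a1` with a forward offset of 8), a taken branch moves `pc` ahead by 8 and a not-taken branch moves it ahead by 4. The other branches differ only in the comparison.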

---

## Key Concepts Cheat Sheet

| Concept | Quick Reference |
|---------|-----------------|
| `iw = 0x00B50533` | `add a0, a0, a1` |
| opcode `[6:0]` = `0110011` | R-type |
| `get_bits(iw, 7, 5)` | `rd` field |
| `(iw >> 15) & 0x1F` | `rs1` field |
| `pc += 4` | advance one instruction |
| `ra = 0` at init | halt sentinel for `ret` |
| `sp = &stack[STACK_SIZE]` | stack grows down |

---

## Summary

1. **Machine code is 32-bit binary** — the assembler produces it; the emulator reads it

2. **`struct rv_state`** mirrors the hardware: 32 registers, `pc`, and a stack

3. **opcode `[6:0]` selects format; funct3/funct7 select the operation**

4. **Decode with shift-and-mask**: `(iw >> start) & mask` — use `get_bits` for all six fields

5. **Fetch with a pointer cast**: `iw = *((uint32_t *) rsp->pc)`

6. **Fetch-decode-execute loops until `pc == 0`** — `ra = 0` at init makes the top-level `ret` stop the loop

7. **Extend incrementally**: run a program, find the first unsupported instruction, implement it, repeat