flowchart LR
A["C source (.c)"] -->|"gcc compiles"| B["Assembly (.s)"]
B -->|"as assembles"| C["Machine code (.o / binary)"]
C -->|"processor fetches"| D["Execution"]
- The **compiler** turns C into assembly text
- The **assembler** turns mnemonics into 32-bit binary words
- The **processor** (or our emulator) fetches and runs those words
flowchart LR
subgraph HW["RISC-V Hardware"]
R1["REGS (silicon)"]
PC1["PC"]
EX1["execute(iw)"]
end
subgraph SW["Our Emulator (C)"]
R2["regs[32]"]
PC2["pc"]
ST2["stack[]"]
EX2["execute(iw)"]
end
subgraph MEM["Memory (shared)"]
C["CODE\n0x00B50533\n0x00008067"]
end
PC1 -.fetch.-> C
PC2 -.fetch.-> C
Both read the **same machine code** from memory — hardware does it in silicon; the emulator does it in C.
---
## Processor State
RISC-V RV64 state the emulator must reproduce:
| Component | Type | Notes |
|-----------|------|-------|
| 32 registers (`x0`–`x31`) | `uint64_t[32]` | 64-bit each; `x0` always 0 |
| Program counter | `uint64_t` | Address of next instruction |
| Stack memory | `uint8_t[]` | Grows downward |
```text
Memory layout (high address at top)
+------------------+
| STACK | <- sp points here, grows down
| DATA | <- globals
| CODE | <- pc points here
+------------------+
```
---
## The `rv_state` Struct
```c
#include <stdint.h>
#define NREGS 32
#define STACK_SIZE 8192 // 8 KB
struct rv_state {
uint64_t regs[NREGS]; // x0..x31
uint64_t pc; // program counter
uint8_t stack[STACK_SIZE]; // emulated stack
};
```
Why 5-bit register fields? 2^5 = 32, exactly enough to index all 32 registers.
---
## The 32-bit Instruction Word
Every RISC-V base ISA instruction is exactly **32 bits**. Six formats slice those bits differently:
```text
R-type: | funct7 | rs2 | rs1 |funct3| rd | opcode |
I-type: | imm[11:0] | rs1 |funct3| rd | opcode |
S-type: | imm[11:5]| rs2| rs1 |funct3|imm[4:0]| opcode |
B-type: | imm(scattered)| rs1|funct3| imm | opcode |
U-type: | imm[31:12] | rd | opcode |
J-type: | imm(scattered) | rd | opcode |
```
The opcode [6:0] is always in the same position — it tells us the format.
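As a minimal sketch of "the opcode tells us the format", the helper below maps the low 7 bits to a one-letter format tag. The name `rv_format` is hypothetical and not part of the lab code, and only opcodes that appear in these slides are listed.

```c
#include <stdint.h>

// Hypothetical helper (not in the lab code): classify an instruction
// word by its opcode, bits [6:0]. Only opcodes named in this deck are listed.
static char rv_format(uint32_t iw) {
    switch (iw & 0x7F) {
    case 0x33: return 'R';  // 0110011: add, sub, mul, ...
    case 0x13: return 'I';  // 0010011: addi and other immediate arithmetic
    case 0x67: return 'I';  // 1100111: jalr (ret)
    case 0x63: return 'B';  // 1100011: beq, bne, blt, bge
    case 0x6F: return 'J';  // 1101111: jal
    default:   return '?';  // not covered by this sketch
    }
}
```

Calling `rv_format(0x00B50533)` yields `'R'`, and the `ret` word `0x00008067` yields `'I'` because `jalr` uses the I-type layout.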
---
## R-Type Field Layout
```text
bit: 31 25 24 20 19 15 14 12 11 7 6 0
[ funct7 ] [ rs2 ] [ rs1 ] [fn3] [ rd ] [opcode]
7 bits 5 bits 5 bits 3 b 5 bits 7 bits
```
Decoding steps:
1. Check **opcode** → format
2. Check **funct3** → operation family
3. Check **funct7** → exact operation (e.g., ADD vs SUB)
---
## Worked Example: `add a0, a0, a1`
Instruction word: **`0x00B50533`**
```text
Binary: 0000 0000 1011 0101 0000 0101 0011 0011
funct7 rs2 rs1 funct3 rd opcode
0000000 01011 01010 000 01010 0110011
```
| Field | Value | Meaning |
|-------|-------|---------|
| opcode | `0110011` | R-type |
| rd | `01010` = 10 | `a0` |
| funct3 | `000` | ADD/SUB family |
| rs1 | `01010` = 10 | `a0` |
| rs2 | `01011` = 11 | `a1` |
| funct7 | `0000000` | ADD |
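The "Meaning" column translates register numbers into ABI names. A small lookup table can do the same in code. This is a sketch: `RV_ABI_NAMES` and `rv_reg_name` are illustrative names, not part of the lab starter code.

```c
#include <stdint.h>

// ABI names for x0..x31, indexed by register number.
static const char *const RV_ABI_NAMES[32] = {
    "zero", "ra", "sp", "gp", "tp", "t0", "t1", "t2",
    "s0", "s1", "a0", "a1", "a2", "a3", "a4", "a5",
    "a6", "a7", "s2", "s3", "s4", "s5", "s6", "s7",
    "s8", "s9", "s10", "s11", "t3", "t4", "t5", "t6",
};

// Hypothetical helper: ABI name for a 5-bit register field value.
static const char *rv_reg_name(uint32_t n) {
    return RV_ABI_NAMES[n & 0x1F];  // mask keeps the index in range
}
```

With this table, field value 10 maps to `a0` and 11 maps to `a1`, matching the rows above.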
---
## Decoding Diagram: `0x00B50533`
graph LR
IW["0x00B50533"] --> OP["opcode=0110011\nR-type"]
IW --> RD["rd=01010\na0 (x10)"]
IW --> F3["funct3=000\nADD/SUB family"]
IW --> R1["rs1=01010\na0 (x10)"]
IW --> R2["rs2=01011\na1 (x11)"]
IW --> F7["funct7=0000000\nADD"]
---
## R-Type Operation Table
All R-type: opcode = `0110011`
| Instruction | funct7 | funct3 | Operation |
|-------------|--------|--------|-----------|
| `add` | `0000000` | `000` | `rd = rs1 + rs2` |
| `sub` | `0100000` | `000` | `rd = rs1 - rs2` |
| `mul` | `0000001` | `000` | `rd = rs1 * rs2` |
| `sll` | `0000000` | `001` | `rd = rs1 << rs2` |
| `srl` | `0000000` | `101` | `rd = rs1 >> rs2` (logical) |
| `sra` | `0100000` | `101` | `rd = rs1 >> rs2` (arith) |
| `and` | `0000000` | `111` | `rd = rs1 & rs2` |
| `or` | `0000000` | `110` | `rd = rs1 \| rs2` |
| `xor` | `0000000` | `100` | `rd = rs1 ^ rs2` |
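One way to use this table in code is to store it as data and search it by `(funct7, funct3)`. The sketch below does that for mnemonic lookup only. `rtype_row`, `RTYPE_TABLE`, and `rtype_mnemonic` are hypothetical names, and the helper assumes the caller has already checked that the opcode is `0110011`.

```c
#include <stddef.h>
#include <stdint.h>

// One row of the R-type table above.
struct rtype_row {
    const char *name;
    uint32_t funct7;
    uint32_t funct3;
};

static const struct rtype_row RTYPE_TABLE[] = {
    {"add", 0x00, 0x0}, {"sub", 0x20, 0x0}, {"mul", 0x01, 0x0},
    {"sll", 0x00, 0x1}, {"srl", 0x00, 0x5}, {"sra", 0x20, 0x5},
    {"and", 0x00, 0x7}, {"or",  0x00, 0x6}, {"xor", 0x00, 0x4},
};

// Hypothetical helper: return the mnemonic for an R-type word,
// or NULL if the (funct7, funct3) pair is not in the table.
static const char *rtype_mnemonic(uint32_t iw) {
    uint32_t funct3 = (iw >> 12) & 0x7;
    uint32_t funct7 = (iw >> 25) & 0x7F;
    size_t rows = sizeof RTYPE_TABLE / sizeof RTYPE_TABLE[0];
    for (size_t i = 0; i < rows; i++) {
        if (RTYPE_TABLE[i].funct7 == funct7 && RTYPE_TABLE[i].funct3 == funct3)
            return RTYPE_TABLE[i].name;
    }
    return NULL;
}
```

A lookup like this is handy for debug printing; the emulator itself still needs a branch or switch to perform each operation.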
---
## Reading the Instruction Word in C
```c
// add2_s is a real compiled assembly function
extern uint64_t add2_s(uint64_t a0, uint64_t a1);
// Cast function address to a pointer to 32-bit words
uint32_t *pc = (uint32_t *) add2_s;
uint32_t iw = *pc; // fetch first instruction
pc = pc + 1; // advance 4 bytes (one instruction)
iw = *pc; // fetch second instruction
```
Inside the emulator:
```c
uint32_t iw = *((uint32_t *) rsp->pc); // fetch at pc
```
A `uint32_t *` advances by 4 bytes per `+1`, exactly one instruction.
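The fetch can also be written with `memcpy`, which copies the four bytes without relying on the pointer cast. This is a sketch of an alternative, not the lab's required approach: `fetch_word` and `DEMO_CODE` are hypothetical names, and the two words are the ones shown in the memory diagram earlier.

```c
#include <stdint.h>
#include <string.h>

// Sample program from the memory diagram: add a0, a0, a1 ; ret
static const uint32_t DEMO_CODE[2] = { 0x00B50533, 0x00008067 };

// Hypothetical variant of the fetch step: copy 4 bytes from the address
// held in pc instead of dereferencing a cast pointer.
static uint32_t fetch_word(uint64_t pc) {
    uint32_t iw;
    memcpy(&iw, (const void *)(uintptr_t)pc, sizeof iw);
    return iw;
}
```

Starting from `pc = (uint64_t)(uintptr_t)DEMO_CODE`, `fetch_word(pc)` returns the `add` word and `fetch_word(pc + 4)` returns the `ret` word.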
---
## Shift-and-Mask: The Recipe
Extract `n` bits starting at bit position `start`:
```text
field = (iw >> start) & ((1 << n) - 1)
```
Example — extract **opcode** (bits `[6:0]`):
```text
iw = ...xxxx xxxx 0110011 (low 7 bits)
mask = 01111111 (0x7F)
result= 0110011
```
`(1 << 7) - 1 = 0x7F` — seven 1-bits, masking exactly the opcode.
---
## `get_bits` Helper
```c
static inline uint32_t get_bits(uint32_t iw,
uint32_t start,
uint32_t count) {
uint32_t mask = (1u << count) - 1u; // assumes count < 32
return (iw >> start) & mask;
}
```
One-liner extractors for every R-type field:
```c
static uint32_t get_opcode(uint32_t iw) { return get_bits(iw, 0, 7); }
static uint32_t get_rd (uint32_t iw) { return get_bits(iw, 7, 5); }
static uint32_t get_funct3(uint32_t iw) { return get_bits(iw, 12, 3); }
static uint32_t get_rs1 (uint32_t iw) { return get_bits(iw, 15, 5); }
static uint32_t get_rs2 (uint32_t iw) { return get_bits(iw, 20, 5); }
static uint32_t get_funct7(uint32_t iw) { return get_bits(iw, 25, 7); }
```
---
## Inline Equivalents (for reference)
```c
uint32_t iw = 0x00B50533;
uint32_t opcode = iw & 0x7F; // bits [6:0]
uint32_t rd = (iw >> 7) & 0x1F; // bits [11:7]
uint32_t funct3 = (iw >> 12) & 0x7; // bits [14:12]
uint32_t rs1 = (iw >> 15) & 0x1F; // bits [19:15]
uint32_t rs2 = (iw >> 20) & 0x1F; // bits [24:20]
uint32_t funct7 = (iw >> 25) & 0x7F; // bits [31:25]
```
Result: `opcode=0x33, rd=10, funct3=0, rs1=10, rs2=11, funct7=0`
Matches the hand-decoded `add a0, a0, a1`.
---
## Interpreter vs. Emulator
| | Interpreter | Emulator |
|---|---|---|
| Input | Source / bytecode / AST | Target machine code |
| Models a specific CPU? | Not necessarily | Yes |
| Lab 6 project | reads & runs instructions | models RISC-V CPU |
Our program interprets one instruction at a time, but it is best called an emulator because it faithfully models RISC-V hardware state.
---
## Fetch-Decode-Execute Cycle
flowchart LR
A["Fetch\niw = *pc"] --> B["Decode\nopcode, rd, funct3, ..."]
B --> C["Execute\nupdate regs / memory"]
C --> D["Update PC\npc += 4 or branch"]
D --> A
Both real hardware and our emulator run this loop — the only difference is C variables vs. silicon.
---
## The Emulate Loop
```c
uint64_t rv_emulate(struct rv_state *rsp) {
while (rsp->pc != 0) {
rv_one(rsp); // one fetch-decode-execute
rsp->regs[0] = 0; // x0 is hardwired to zero
}
return rsp->regs[10]; // result is in a0 (x10)
}
```
```c
void rv_one(struct rv_state *rsp) {
uint32_t iw = *((uint32_t *) rsp->pc); // FETCH
uint32_t opcode = get_opcode(iw); // DECODE
switch (opcode) {
case 0b0110011: emu_r_type(rsp, iw); break; // R-type
case 0b0010011: emu_i_type(rsp, iw); break; // I-type arith
case 0b1100111: emu_jalr(rsp, iw); break; // jalr/ret
default: printf("Unknown opcode 0x%X\n", opcode); exit(1);
}
}
```
---
## Executing an R-Type Instruction
```c
void emu_r_type(struct rv_state *rsp, uint32_t iw) {
uint32_t rd = get_rd(iw), rs1 = get_rs1(iw), rs2 = get_rs2(iw);
uint32_t f3 = get_funct3(iw), f7 = get_funct7(iw);
if (f3 == 0 && f7 == 0b0000000)
rsp->regs[rd] = rsp->regs[rs1] + rsp->regs[rs2]; // add
else if (f3 == 0 && f7 == 0b0100000)
rsp->regs[rd] = rsp->regs[rs1] - rsp->regs[rs2]; // sub
else if (f3 == 0 && f7 == 0b0000001)
rsp->regs[rd] = rsp->regs[rs1] * rsp->regs[rs2]; // mul
else if (f3 == 7 && f7 == 0b0000000)
rsp->regs[rd] = rsp->regs[rs1] & rsp->regs[rs2]; // and
// ... more cases ...
rsp->pc += 4;
}
```
`uint64_t` arithmetic wraps on overflow exactly like RISC-V hardware.
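Shifts need more care than add or subtract. RV64 uses only the low 6 bits of `rs2` as the shift amount, while shifting a 64-bit value by 64 or more is undefined in C. The sketch below masks the amount and builds the arithmetic right shift from unsigned operations. The `rv_sll`, `rv_srl`, and `rv_sra` names are illustrative, not from the lab code.

```c
#include <stdint.h>

// Hypothetical helpers for the shift rows of the R-type table.
// RV64 uses only the low 6 bits of rs2 as the shift amount.
static uint64_t rv_sll(uint64_t a, uint64_t b) { return a << (b & 0x3F); }
static uint64_t rv_srl(uint64_t a, uint64_t b) { return a >> (b & 0x3F); }

static uint64_t rv_sra(uint64_t a, uint64_t b) {
    unsigned s = (unsigned)(b & 0x3F);
    uint64_t r = a >> s;                // logical shift first
    if (s != 0 && (a >> 63) != 0)
        r |= ~(~UINT64_C(0) >> s);      // fill vacated high bits with the sign bit
    return r;
}
```

Inside `emu_r_type`, these would be called with `rsp->regs[rs1]` and `rsp->regs[rs2]`, just like the `add` and `sub` cases.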
---
## Initializing the Emulator
```c
void rv_init(struct rv_state *rsp, uint64_t (*func)(),
uint64_t a0, uint64_t a1, uint64_t a2, uint64_t a3) {
for (int i = 0; i < NREGS; i++) rsp->regs[i] = 0;
rsp->pc = (uint64_t) func; // first instruction
rsp->regs[10] = a0; rsp->regs[11] = a1; // args in a0..a3
rsp->regs[12] = a2; rsp->regs[13] = a3;
// sp starts at TOP of stack array (stack grows down)
rsp->regs[2] = (uint64_t) &rsp->stack[STACK_SIZE];
rsp->regs[1] = 0; // ra = 0 (halt sentinel)
}
```
Usage:
```c
struct rv_state state;
rv_init(&state, (uint64_t (*)()) quadratic_s, 2, 4, 6, 8);
uint64_t result = rv_emulate(&state);
```
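To see init, fetch, decode, execute, and halt work together, here is a self-contained sketch that emulates the two-word program from the memory diagram (`add a0, a0, a1` followed by `ret`). It is deliberately cut down: `mini_state`, `mini_run`, and `mini_add2` are hypothetical names, there is no stack, and only `add` and a `ret`-style `jalr` are handled.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define NREGS 32

// Trimmed-down state for this sketch: no stack array.
struct mini_state {
    uint64_t regs[NREGS];
    uint64_t pc;
};

// add a0, a0, a1 ; ret  (the two words from the memory diagram)
static const uint32_t ADD2_CODE[2] = { 0x00B50533, 0x00008067 };

static uint32_t bits(uint32_t iw, uint32_t start, uint32_t count) {
    return (iw >> start) & ((1u << count) - 1u);
}

// Run until pc == 0. Returns 0 on success, -1 on an unsupported instruction.
static int mini_run(struct mini_state *st) {
    while (st->pc != 0) {
        uint32_t iw;
        memcpy(&iw, (const void *)(uintptr_t)st->pc, sizeof iw);  // fetch
        uint32_t opcode = bits(iw, 0, 7);                          // decode
        uint32_t rd = bits(iw, 7, 5);
        uint32_t rs1 = bits(iw, 15, 5);
        uint32_t rs2 = bits(iw, 20, 5);
        if (opcode == 0x33 && bits(iw, 12, 3) == 0 && bits(iw, 25, 7) == 0) {
            st->regs[rd] = st->regs[rs1] + st->regs[rs2];          // add
            st->pc += 4;
        } else if (opcode == 0x67) {
            st->pc = st->regs[rs1];                                // ret-style jalr
        } else {
            return -1;
        }
        st->regs[0] = 0;                                           // x0 stays zero
    }
    return 0;
}

// Hypothetical driver: emulate ADD2_CODE on (a, b) and return a0.
static uint64_t mini_add2(uint64_t a, uint64_t b) {
    struct mini_state st;
    memset(&st, 0, sizeof st);   // all registers 0, so ra = 0 is the halt sentinel
    st.pc = (uint64_t)(uintptr_t)ADD2_CODE;
    st.regs[10] = a;
    st.regs[11] = b;
    int rc = mini_run(&st);
    assert(rc == 0);             // both words are supported by this sketch
    (void)rc;
    return st.regs[10];
}
```

The loop runs exactly twice: the `add` advances `pc` by 4, then the `ret` loads `pc` from `ra`, which is 0, and the loop exits.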
---
## Stack Layout
```text
rv_state emulated stack[]
+---------+ +------------------+ high addr
| regs[32]| sp --> | stack[STACK_SIZE]| <- sp starts here
| pc | | (grows down) |
| stack[] | | ... |
+---------+ | stack[0] | low addr
+------------------+
```
- `addi sp, sp, -16` → sp moves **down** (allocate)
- `addi sp, sp, 16` → sp moves **up** (free)
- Starting at `&stack[STACK_SIZE]` gives the full 8 KB to grow into
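The allocate and free steps can be sketched in plain C. The example below keeps `sp` as a 64-bit address into a host array, moves it down by 16, stores and reloads a 64-bit value at `0(sp)`, and moves it back up. `stack_roundtrip` is a hypothetical demo function, and the `memcpy` calls stand in for 64-bit store and load instructions that the emulator would eventually implement.

```c
#include <stdint.h>
#include <string.h>

#define STACK_SIZE 8192

// Hypothetical demo: allocate a 16-byte frame on an emulated stack,
// store a 64-bit value at 0(sp), load it back, then free the frame.
static uint64_t stack_roundtrip(uint64_t value) {
    static uint8_t stack[STACK_SIZE];
    uint64_t sp = (uint64_t)(uintptr_t)&stack[STACK_SIZE];  // one past the end
    uint64_t out = 0;

    sp -= 16;                                               // addi sp, sp, -16
    memcpy((void *)(uintptr_t)sp, &value, sizeof value);    // store at 0(sp)
    memcpy(&out, (const void *)(uintptr_t)sp, sizeof out);  // load from 0(sp)
    sp += 16;                                               // addi sp, sp, 16

    return out;
}
```

Because `sp` starts one past the last element, the first 16-byte frame occupies `stack[STACK_SIZE - 16]` through `stack[STACK_SIZE - 1]`, which is still inside the array.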
---
## How Emulation Stops: `ret` and `jalr`
flowchart TD
A["rv_init: ra = 0"] --> B["emulate instructions"]
B --> C{"pc != 0?"}
C -->|yes| B
C -->|no| D["return regs[10] (a0)"]
E["ret = jalr x0, 0(ra)\nsets pc = ra = 0"] --> C
```c
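void emu_jalr(struct rv_state *rsp, uint32_t iw) {
    // Simplified: handles only `ret` (jalr x0, 0(ra)), where imm == 0 and rd == x0.
    // A general jalr also adds the sign-extended immediate and writes pc + 4 to rd.
    uint32_t rs1 = get_rs1(iw);
    rsp->pc = rsp->regs[rs1]; // pc = ra; top call: ra==0 -> stop
}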
```
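The version above is enough for `ret`. For other uses of `jalr`, a fuller sketch adds the sign-extended 12-bit immediate to `rs1`, clears bit 0 of the target, and saves the return address in `rd`. The names `jalr_state` and `emu_jalr_full` are hypothetical, and the struct is cut down so the block stands alone.

```c
#include <stdint.h>

#define NREGS 32

struct jalr_state {            // trimmed-down state for this sketch
    uint64_t regs[NREGS];
    uint64_t pc;
};

// Hypothetical fuller jalr: pc = (rs1 + imm) with bit 0 cleared, rd = old pc + 4.
static void emu_jalr_full(struct jalr_state *st, uint32_t iw) {
    uint32_t rd  = (iw >> 7) & 0x1F;
    uint32_t rs1 = (iw >> 15) & 0x1F;
    // 12-bit immediate in bits [31:20]; sign-extend from bit 11
    int64_t imm = (int64_t)((iw >> 20) & 0xFFF);
    imm = (imm ^ 0x800) - 0x800;

    // Compute the target before writing rd: rs1 and rd may be the same register.
    uint64_t target = (st->regs[rs1] + (uint64_t)imm) & ~UINT64_C(1);
    uint64_t link = st->pc + 4;   // return address
    if (rd != 0)                  // x0 stays zero
        st->regs[rd] = link;
    st->pc = target;
}
```

For `ret` (`0x00008067`) the immediate is 0 and `rd` is `x0`, so this reduces to `pc = ra`, the same behavior as the simplified version.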
---
## Beyond R-Type: Other Formats
| Format | Opcode | Used by | Key note |
|--------|--------|---------|----------|
| **I-type** | `0010011` (arith), `0000011` (loads), `1100111` (`jalr`) | `addi`, `li`, `lw`, `jalr` | 12-bit sign-extended imm in `[31:20]` |
| **B-type** | `1100011` | `beq`, `bne`, `blt`, `bge` | 13-bit scattered imm; if taken `pc += imm` |
| **J-type** | `1101111` | `jal`, `j`, `call` | 21-bit scattered imm; `pc += imm` |
Sign extension for immediates:
```c
static int64_t sign_extend(uint64_t value, int start) {
int shift = 63 - start;
return ((int64_t)(value << shift)) >> shift;
}
```
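The "scattered" immediates in the table have to be reassembled before sign extension. The sketch below gathers the B-type and J-type pieces into place and then reuses `sign_extend`, which is repeated here so the block stands alone. `field`, `b_imm`, and `j_imm` are hypothetical helper names, and the result relies on `>>` being an arithmetic shift for signed values, as it is with GCC and Clang.

```c
#include <stdint.h>

// Same sign_extend as above, repeated so this sketch is self-contained.
static int64_t sign_extend(uint64_t value, int start) {
    int shift = 63 - start;
    return ((int64_t)(value << shift)) >> shift;
}

static uint32_t field(uint32_t iw, uint32_t start, uint32_t count) {
    return (iw >> start) & ((1u << count) - 1u);
}

// B-type: imm[12|10:5] sit in bits [31:25], imm[4:1|11] in bits [11:7].
static int64_t b_imm(uint32_t iw) {
    uint64_t imm = ((uint64_t)field(iw, 31, 1) << 12)
                 | ((uint64_t)field(iw, 7, 1)  << 11)
                 | ((uint64_t)field(iw, 25, 6) << 5)
                 | ((uint64_t)field(iw, 8, 4)  << 1);
    return sign_extend(imm, 12);   // bit 12 is the sign bit
}

// J-type: imm[20|10:1|11|19:12] sit in bits [31:12].
static int64_t j_imm(uint32_t iw) {
    uint64_t imm = ((uint64_t)field(iw, 31, 1)  << 20)
                 | ((uint64_t)field(iw, 12, 8)  << 12)
                 | ((uint64_t)field(iw, 20, 1)  << 11)
                 | ((uint64_t)field(iw, 21, 10) << 1);
    return sign_extend(imm, 20);   // bit 20 is the sign bit
}
```

Bit 0 of both immediates is never stored and is always 0, which is why 12 or 20 encoded bits give a 13-bit or 21-bit offset.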
---
## I-Type: `addi` (and `li`, `mv`)
```c
void emu_i_type(struct rv_state *rsp, uint32_t iw) {
uint32_t rd = get_rd(iw);
uint32_t rs1 = get_rs1(iw);
uint32_t funct3 = get_funct3(iw);
int64_t imm = sign_extend(get_bits(iw, 20, 12), 11);
if (funct3 == 0b000) { // addi
rsp->regs[rd] = rsp->regs[rs1] + imm;
}
rsp->pc += 4;
}
```
`li a0, 5` is `addi a0, zero, 5` (for immediates that fit in 12 bits); `mv a0, a1` is `addi a0, a1, 0`.
Implementing `addi` gives you `li` and `mv` for free.
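To check the pseudo-instruction claim against real encodings, the sketch below applies three hand-assembled `addi` words to a bare register array. `apply_addi` is a hypothetical standalone helper. It skips the opcode and `funct3` checks that `emu_i_type` performs, and it does not touch `pc`.

```c
#include <stdint.h>

// Hypothetical standalone addi: apply one addi-encoded word to a register file.
// Assumes the caller already knows the word is an addi.
static void apply_addi(uint64_t regs[32], uint32_t iw) {
    uint32_t rd  = (iw >> 7) & 0x1F;
    uint32_t rs1 = (iw >> 15) & 0x1F;
    int64_t imm = (int64_t)((iw >> 20) & 0xFFF);
    imm = (imm ^ 0x800) - 0x800;            // sign-extend from bit 11
    regs[rd] = regs[rs1] + (uint64_t)imm;   // unsigned add wraps like hardware
    regs[0] = 0;                            // x0 stays zero
}

// Encodings assembled by hand from the I-type layout:
//   li   a0, 5    is addi a0, zero, 5   = 0x00500513
//   mv   a0, a1   is addi a0, a1, 0     = 0x00058513
//   addi sp, sp, -16                    = 0xFF010113
```

With `regs[11] = 42` and `regs[2] = 100`, applying the three words in order leaves `a0` equal to 5, then 42, and moves `sp` to 84.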
---
## Lab 6: Incremental Strategy
flowchart TD
A["Pick target program\n(start: quadratic)"] --> B["Run; see unsupported\ninstruction"]
B --> C["Identify format\nfrom opcode"]
C --> D["Decode fields\nwith get_bits"]
D --> E["Implement operation\nupdate pc"]
E --> F{"Emu == C == Asm?"}
F -->|no| B
F -->|yes| G["Move to next program"]
---
## Expected Lab 6 Output
```bash
$ ./lab06 quadratic 2 4 6 8
C: 36
Asm: 36
Emu: 36
$ ./lab06 max3 3 8 5
C: 8
Asm: 8
Emu: 8
```
All three lines must agree. Instructions you will likely need:
- R-type: `add`, `sub`, `mul`, `and`, `or`, `xor`, `sll`, `srl`, `sra`
- I-type: `addi` (covers `li`, `mv`), loads
- Control: `jalr` (`ret`), `jal` (`j`), branches (`beq`, `bne`, `blt`, `bge`)
---
## Key Concepts Cheat Sheet
| Concept | Quick Reference |
|---------|-----------------|
| `iw = 0x00B50533` | `add a0, a0, a1` |
| opcode `[6:0]` = `0110011` | R-type |
| `get_bits(iw, 7, 5)` | `rd` field |
| `(iw >> 15) & 0x1F` | `rs1` field |
| `pc += 4` | advance one instruction |
| `ra = 0` at init | halt sentinel for `ret` |
| `sp = &stack[STACK_SIZE]` | stack grows down |
---
## Summary
1. **Machine code is 32-bit binary** — the assembler produces it; the emulator reads it
2. **`struct rv_state`** mirrors the hardware: 32 registers, `pc`, and a stack
3. **opcode `[6:0]` selects format; funct3/funct7 select the operation**
4. **Decode with shift-and-mask**: `(iw >> start) & mask` — use `get_bits` for all six fields
5. **Fetch with a pointer cast**: `iw = *((uint32_t *) rsp->pc)`
6. **Fetch-decode-execute loops until `pc == 0`** — `ra = 0` at init makes the top-level `ret` stop the loop
7. **Extend incrementally**: run a program, find the first unsupported instruction, implement it, repeat