# RISC-V Emulation: Immediates, JAL, and Memory

## CS 315 Computer Architecture

---

## Overview

Building the `rv_emu` RISC-V emulator continues. Today we tackle the four hardest decode/execute jobs:

- **Immediates**: reconstructing signed constants from scattered bit fields
- **Branches**: PC-relative offsets, condition evaluation, PC update
- **JAL**: jump-and-link, the mechanism behind `call` and `j`
- **Memory**: loads and stores as typed C pointer dereferences

---

## Lab 6 and Project 4

| Item | Detail |
|------|--------|
| Lab 6 (RISC-V Emulation) | Due Wed Oct 1 |
| Project 4 (Emu + Analysis + Cache) | Due Tue Oct 7 |
| Project 4 Interactive Grading | Wed Oct 8 |
| Midterm | Thu Oct 9 |

```text
Lab 6 starter  rv_emu.c   ~131 lines
solution       rv_emu.c   ~225 lines
```

The ~90-line gap is exactly the code this lecture develops.

---

## Workflow: Adding an Instruction
flowchart TD A["Pick a failing test program"] --> B["Disassemble / inspect asm"] B --> C["Identify REAL instructions used"] C --> D["Look up opcode + funct3 on cheat sheet"] D --> E["Determine instruction TYPE (R/I/B/S/J)"] E --> F["Extract fields with get_bits"] F --> G["Implement execute logic"] G --> H["Run test, iterate"] H -->|more failures| A
---

## Pseudo-instructions vs. Real Instructions

The assembler emits *real* machine instructions. The disassembler shows only real instructions, and those are what the emulator must decode.

| Pseudo-instruction | Real instruction emitted |
|--------------------|--------------------------|
| `li t0, 99` | `addi t0, zero, 99` |
| `j label` | `jal zero, offset` |
| `call foo` | `jal ra, offset` |
| `bgt r1, r2, L` | `blt r2, r1, L` (operands swapped!) |

---

## `li` is Really `addi`

```text
li   t0, 99          # pseudo-instruction
addi t0, zero, 99    # real instruction (assembler emits this)
```

- `x0` (`zero`) is hardwired to 0, so `t0 = 0 + 99 = 99`
- The constant `99` is encoded **inside** the 32-bit instruction word; that is the *immediate*

flowchart LR A["li t0, 99"] -->|assembler| B["addi t0, zero, 99"] B -->|encode| C["0x06300293"] C -->|emulator decodes| D["t0 = regs[zero] + 99 = 99"]
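
The encode step in the flowchart can be reproduced in a few lines of C. This is a minimal sketch, not the assembler's actual code: `encode_i_type`, `encode_addi`, and `check_encode_addi` are hypothetical names, and the field positions assumed are the standard I-type ones (opcode in bits 6:0, `rd` in 11:7, `funct3` in 14:12, `rs1` in 19:15, immediate in 31:20).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: pack the five I-type fields into one 32-bit word. */
uint32_t encode_i_type(int32_t imm, uint32_t rs1, uint32_t funct3,
                       uint32_t rd, uint32_t opcode) {
    return (((uint32_t)imm & 0xFFFu) << 20)   /* imm[11:0] -> bits 31:20 */
         | ((rs1    & 0x1Fu) << 15)           /* rs1       -> bits 19:15 */
         | ((funct3 & 0x7u)  << 12)           /* funct3    -> bits 14:12 */
         | ((rd     & 0x1Fu) << 7)            /* rd        -> bits 11:7  */
         |  (opcode & 0x7Fu);                 /* opcode    -> bits 6:0   */
}

/* addi uses opcode 0x13 with funct3 = 0. */
uint32_t encode_addi(uint32_t rd, uint32_t rs1, int32_t imm) {
    return encode_i_type(imm, rs1, 0x0, rd, 0x13);
}

/* Self-check: t0 is x5 and zero is x0. */
void check_encode_addi(void) {
    assert(encode_addi(5, 0, 99) == 0x06300293u);   /* li t0, 99 */
    assert(encode_addi(5, 0, -2) == 0xFFE00293u);   /* li t0, -2 */
}
```

Calling `check_encode_addi()` from a test `main` confirms that `li t0, 99` packs to the word shown in the flowchart. Only the low 12 bits of the immediate survive the packing, which is why the emulator has to sign-extend on the way back out.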
---

## I-type Instruction Layout

```text
 31         20 19   15 14    12 11     7 6       0
|  imm[11:0]  |  rs1  | funct3 |   rd   | opcode  |
|   12 bits   | 5 bits| 3 bits | 5 bits | 7 bits  |
```

**Decode `0x06300293` by hand:**

| Field | Bits | Value | Meaning |
|-------|------|-------|---------|
| opcode | [6:0] | `0x13` | I-type ALU |
| rd | [11:7] | `5` | `t0` |
| funct3 | [14:12] | `0` | `addi` |
| rs1 | [19:15] | `0` | `zero` |
| imm[11:0] | [31:20] | `0b000001100011` | **99** |

---

## `get_bits` Helper

```c
// Extract `count` bits starting at bit `start` (LSB position).
// `count` must be less than 32 so the mask shift stays well defined.
uint32_t get_bits(uint32_t iw, uint32_t start, uint32_t count) {
    uint32_t mask = (1u << count) - 1;
    return (iw >> start) & mask;
}

uint32_t get_rd(uint32_t iw)     { return get_bits(iw,  7, 5); }
uint32_t get_rs1(uint32_t iw)    { return get_bits(iw, 15, 5); }
uint32_t get_rs2(uint32_t iw)    { return get_bits(iw, 20, 5); }
uint32_t get_funct3(uint32_t iw) { return get_bits(iw, 12, 3); }
uint32_t get_funct7(uint32_t iw) { return get_bits(iw, 25, 7); }
```

I-type immediate: `get_bits(iw, 20, 12)`, that is, 12 bits starting at bit 20.

---

## Sign Extension

Raw `get_bits` gives an **unsigned** value. Immediates are **signed** two's complement.

Example: `li t0, -2` → 12-bit field = `0b111111111110`

```text
Zero-extend: 0x0000_0000_0000_0FFE = 4094  (WRONG)
Sign-extend: 0xFFFF_FFFF_FFFF_FFFE = -2    (CORRECT)
```

We must **replicate the sign bit** (bit 11) into all upper bits.

---

## The `sign_extend` Helper

```c
// Replicate bit `sign_bit` of `value` into all higher bits.
int64_t sign_extend(uint64_t value, int sign_bit) {
    int shift = 63 - sign_bit;
    return ((int64_t)(value << shift)) >> shift;
}
```

For a 12-bit immediate (`sign_bit = 11`, `shift = 52`):
flowchart TD A["raw 12-bit field\nget_bits(iw, 20, 12)"] --> B["shift left by 52\nsign bit lands in bit 63"] B --> C["arithmetic right shift by 52\nfills with sign bit"] C --> D["int64_t imm (correct signed value)"]
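The cast to `int64_t` is essential: `>>` on a signed type is an *arithmetic* shift (fills with the sign bit), not a logical shift (fills with 0). Strictly speaking, C leaves the result of right-shifting a negative value implementation-defined, but GCC and Clang both shift arithmetically.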
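
To see the shift-up, shift-down trick at work, the two helpers can be exercised on the immediates discussed so far. This is a small sketch under one assumption: the compiler implements `>>` on negative signed values as an arithmetic shift. `decode_i_imm` and `check_sign_extend` are hypothetical names added for illustration; `get_bits` and `sign_extend` are repeated so the block compiles on its own.

```c
#include <assert.h>
#include <stdint.h>

uint32_t get_bits(uint32_t iw, uint32_t start, uint32_t count) {
    return (iw >> start) & ((1u << count) - 1);
}

int64_t sign_extend(uint64_t value, int sign_bit) {
    int shift = 63 - sign_bit;
    return ((int64_t)(value << shift)) >> shift;
}

/* Hypothetical wrapper: the signed 12-bit immediate of an I-type word. */
int64_t decode_i_imm(uint32_t iw) {
    return sign_extend(get_bits(iw, 20, 12), 11);
}

void check_sign_extend(void) {
    assert(sign_extend(0x063, 11) == 99);      /* sign bit clear: unchanged */
    assert(sign_extend(0xFFE, 11) == -2);      /* sign bit set: upper bits filled */
    assert(sign_extend(0x7FF, 11) == 2047);    /* largest 12-bit immediate */
    assert(sign_extend(0x800, 11) == -2048);   /* smallest 12-bit immediate */
    assert(decode_i_imm(0x06300293u) == 99);   /* addi t0, zero, 99 */
    assert(decode_i_imm(0xFFE00293u) == -2);   /* addi t0, zero, -2 */
}
```

The two boundary cases (`0x7FF` and `0x800`) differ only in bit 11, which is exactly the bit that `sign_extend` replicates.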
---

## B-type Branch Layout

Branch immediates are **scattered** across four fields:

```text
   31    30       25 24  20 19  15 14    12 11       8    7    6       0
|imm[12]| imm[10:5] | rs2  | rs1  | funct3 | imm[4:1] |imm[11]| opcode  |
```

There is no `imm[0]` field: branch targets are always 2-byte aligned, so the low bit of the offset is implicitly 0.

Assembled immediate: `imm[12] imm[11] imm[10:5] imm[4:1] 0` (13-bit signed, sign bit = 12)

---

## Reconstructing the Branch Immediate

**Step 1: Get the parts**

```c
uint32_t imm12   = get_bits(iw, 31, 1);  // imm[12]    1 bit
uint32_t imm10_5 = get_bits(iw, 25, 6);  // imm[10:5]  6 bits
uint32_t imm4_1  = get_bits(iw,  8, 4);  // imm[4:1]   4 bits
uint32_t imm11   = get_bits(iw,  7, 1);  // imm[11]    1 bit
```

**Step 2: Combine** (shift each piece to its final bit position)

```c
uint64_t uimm = (imm12 << 12) | (imm11 << 11) | (imm10_5 << 5) | (imm4_1 << 1);
```

**Step 3: Sign-extend** (sign bit is bit 12)

```c
int64_t imm = sign_extend(uimm, 12);
```

---

## Branch Logic: PC Update

Branches use **PC-relative** addressing:

```c
if (take) {
    rsp->pc += imm;   // taken: jump by signed offset
} else {
    rsp->pc += 4;     // not taken: next sequential instruction
}
```

A backward branch (loop top) has a *negative* offset, which is why `imm` must be signed.
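
The three reconstruction steps and the PC update can be checked against two hand-encoded branches. This is a sketch only: `decode_b_imm`, `next_pc`, and `check_branch` are hypothetical helpers, and the two instruction words are `beq zero, zero` with offsets +8 and -4, encoded by hand from the B-type layout above.

```c
#include <assert.h>
#include <stdint.h>

uint32_t get_bits(uint32_t iw, uint32_t start, uint32_t count) {
    return (iw >> start) & ((1u << count) - 1);
}

int64_t sign_extend(uint64_t value, int sign_bit) {
    int shift = 63 - sign_bit;
    return ((int64_t)(value << shift)) >> shift;
}

/* Hypothetical helper wrapping steps 1-3: gather, combine, sign-extend. */
int64_t decode_b_imm(uint32_t iw) {
    uint64_t uimm = ((uint64_t)get_bits(iw, 31, 1) << 12)   /* imm[12]   */
                  | ((uint64_t)get_bits(iw,  7, 1) << 11)   /* imm[11]   */
                  | ((uint64_t)get_bits(iw, 25, 6) << 5)    /* imm[10:5] */
                  | ((uint64_t)get_bits(iw,  8, 4) << 1);   /* imm[4:1]  */
    return sign_extend(uimm, 12);
}

/* Hypothetical helper: the PC that follows a branch located at `pc`. */
uint64_t next_pc(uint64_t pc, uint32_t iw, int take) {
    return pc + (uint64_t)(take ? decode_b_imm(iw) : 4);
}

void check_branch(void) {
    assert(decode_b_imm(0x00000463u) == 8);              /* forward branch  */
    assert(decode_b_imm(0xFE000EE3u) == -4);             /* backward branch */
    assert(next_pc(0x1000, 0xFE000EE3u, 1) == 0x0FFC);   /* taken: pc - 4   */
    assert(next_pc(0x1000, 0xFE000EE3u, 0) == 0x1004);   /* not taken       */
}
```

With a zero-extended immediate the backward branch would jump forward by 8188 bytes instead of back by 4, which is the failure mode the signed `imm` prevents.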
---

## Branch Conditions (`funct3`)

| Instruction | funct3 | Condition |
|-------------|--------|-----------|
| `beq` | `0x0` | `regs[rs1] == regs[rs2]` |
| `bne` | `0x1` | `regs[rs1] != regs[rs2]` |
| `blt` | `0x4` | `(int64_t)regs[rs1] < (int64_t)regs[rs2]` |
| `bge` | `0x5` | `(int64_t)regs[rs1] >= (int64_t)regs[rs2]` |
| `bltu` | `0x6` | `regs[rs1] < regs[rs2]` (unsigned) |
| `bgeu` | `0x7` | `regs[rs1] >= regs[rs2]` (unsigned) |

For signed comparisons (`blt`, `bge`), cast to `int64_t`; otherwise a negative value looks like a huge positive number.

---

## Complete `emu_b_type`

```c
void emu_b_type(struct rv_state *rsp, uint32_t iw) {
    uint32_t rs1 = get_rs1(iw), rs2 = get_rs2(iw);
    uint32_t funct3 = get_funct3(iw);

    // Reassemble the scattered immediate, then sign-extend from bit 12.
    uint64_t uimm = (get_bits(iw, 31, 1) << 12) | (get_bits(iw, 7, 1) << 11)
                  | (get_bits(iw, 25, 6) << 5)  | (get_bits(iw, 8, 4) << 1);
    int64_t imm = sign_extend(uimm, 12);

    bool take = false;
    switch (funct3) {
        case 0x0: take = (rsp->regs[rs1] == rsp->regs[rs2]); break;                    // beq
        case 0x1: take = (rsp->regs[rs1] != rsp->regs[rs2]); break;                    // bne
        case 0x4: take = ((int64_t)rsp->regs[rs1] <  (int64_t)rsp->regs[rs2]); break;  // blt (signed)
        case 0x5: take = ((int64_t)rsp->regs[rs1] >= (int64_t)rsp->regs[rs2]); break;  // bge (signed)
        case 0x6: take = (rsp->regs[rs1] <  rsp->regs[rs2]); break;                    // bltu (unsigned)
        case 0x7: take = (rsp->regs[rs1] >= rsp->regs[rs2]); break;                    // bgeu (unsigned)
    }
    rsp->pc += take ? imm : 4;
}
```

---

## JAL: Jump and Link

`jal rd, offset` does two things:

1. **Link**: `regs[rd] = pc + 4` (save return address)
2. **Jump**: `pc += offset` (PC-relative, signed)

| Pseudo | Real | Effect |
|--------|------|--------|
| `call foo` | `jal ra, offset` | Save return addr in `ra`, jump to `foo` |
| `j label` | `jal zero, offset` | Discard link (write to `x0`), just jump |

`ret` is `jalr zero, 0(ra)`: it jumps to whatever address is in `ra`.

---

## JAL in Action
flowchart LR A["call foo\n(jal ra, offset)"] --> B["ra = pc + 4"] B --> C["pc += offset\n(enter foo)"] C --> D["...foo body..."] D --> E["ret\n(jalr zero, 0(ra))"] E --> F["pc = ra\n(back after call)"]
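
The call/return sequence in the flowchart can be traced with a toy state. This is a minimal sketch, not the lab's code: `struct demo_state`, `demo_jal`, `demo_ret`, and `check_call_ret` are hypothetical stand-ins, and `demo_ret` models only the `jalr zero, 0(ra)` case. The two instruction words are hand-encoded `jal ra, +8` and `jal zero, -4`.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the emulator state (the lab's struct has more). */
struct demo_state {
    uint64_t regs[32];
    uint64_t pc;
};

uint32_t get_bits(uint32_t iw, uint32_t start, uint32_t count) {
    return (iw >> start) & ((1u << count) - 1);
}

int64_t sign_extend(uint64_t value, int sign_bit) {
    int shift = 63 - sign_bit;
    return ((int64_t)(value << shift)) >> shift;
}

/* J-type immediate: four scattered fields, implicit low zero, sign bit 20. */
int64_t decode_j_imm(uint32_t iw) {
    uint64_t uimm = ((uint64_t)get_bits(iw, 31, 1)  << 20)
                  | ((uint64_t)get_bits(iw, 12, 8)  << 12)
                  | ((uint64_t)get_bits(iw, 20, 1)  << 11)
                  | ((uint64_t)get_bits(iw, 21, 10) << 1);
    return sign_extend(uimm, 20);
}

void demo_jal(struct demo_state *s, uint32_t iw) {
    uint32_t rd = get_bits(iw, 7, 5);
    if (rd != 0) s->regs[rd] = s->pc + 4;   /* link: skipped for x0 */
    s->pc += (uint64_t)decode_j_imm(iw);    /* jump: PC-relative    */
}

/* ret = jalr zero, 0(ra): jump to the address held in ra (x1). */
void demo_ret(struct demo_state *s) { s->pc = s->regs[1]; }

void check_call_ret(void) {
    struct demo_state s = { .regs = {0}, .pc = 0x1000 };
    demo_jal(&s, 0x008000EFu);              /* call: jal ra, +8 */
    assert(s.regs[1] == 0x1004 && s.pc == 0x1008);
    demo_ret(&s);                           /* back to the instruction after the call */
    assert(s.pc == 0x1004);
    demo_jal(&s, 0xFFDFF06Fu);              /* j: jal zero, -4 */
    assert(s.pc == 0x1000 && s.regs[0] == 0);
}
```

The last assertion shows why the `rd != 0` guard matters: a plain `j` must leave `x0` at zero.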
---

## J-type Immediate and `emu_jal`

J-type immediate is also scattered (sign bit = 20, implicit low zero):

```c
uint32_t imm20    = get_bits(iw, 31, 1);
uint32_t imm10_1  = get_bits(iw, 21, 10);
uint32_t imm11    = get_bits(iw, 20, 1);
uint32_t imm19_12 = get_bits(iw, 12, 8);

uint64_t uimm = (imm20 << 20) | (imm19_12 << 12) | (imm11 << 11) | (imm10_1 << 1);
int64_t imm = sign_extend(uimm, 20);
```

```c
void emu_jal(struct rv_state *rsp, uint32_t iw) {
    uint32_t rd = get_rd(iw);

    // Reconstruct the J-type immediate exactly as above.
    uint64_t uimm = (get_bits(iw, 31, 1) << 20) | (get_bits(iw, 12, 8) << 12)
                  | (get_bits(iw, 20, 1) << 11) | (get_bits(iw, 21, 10) << 1);
    int64_t imm = sign_extend(uimm, 20);

    if (rd != 0) {
        rsp->regs[rd] = rsp->pc + 4;  // LINK
    }
    rsp->pc += imm;                   // JUMP
}
```

---

## Memory: The Mental Model

A RISC-V load is a **typed C pointer dereference**:

```text
lw t0, 8(a0)   →   t0 = *((int32_t *)(a0 + 8))
```

A store is the dereference on the **left**:

```text
sw t0, 8(a0)   →   *((uint32_t *)(a0 + 8)) = t0
```

The pointer **cast** controls:

- How many bytes are read/written
- Whether the value is sign- or zero-extended

---

## Load Instruction: I-type Encoding

Loads share the I-type encoding with `addi`:

```text
 31          20 19   15 14    12 11     7 6       0
| offset[11:0] |  rs1  | funct3 |   rd   | opcode  |
|  signed imm  |  base | width  |  dest  | (0x03)  |
```

| Field | Role |
|-------|------|
| offset[11:0] | signed 12-bit immediate (same as `addi`) |
| rs1 | base address register |
| funct3 | access width / signedness |
| rd | destination register |

---

## Load Widths and C Casts

**Target address**: `ta = regs[rs1] + sign_extend(offset, 11)`

| Instruction | funct3 | C cast | Bytes |
|-------------|--------|--------|-------|
| `lb` | `0x0` | `*(int8_t *)ta` | 1 |
| `lw` | `0x2` | `*(int32_t *)ta` | 4 |
| `ld` | `0x3` | `*(uint64_t *)ta` | 8 |

`lw` takes a signed cast because RV64 sign-extends the loaded word into the 64-bit register, just as `lb` sign-extends a byte. Getting the cast wrong reads the wrong number of bytes or misinterprets the sign.
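
The effect of the cast is easiest to see by reading the same bytes through a signed and an unsigned pointer type. The helpers below are hypothetical and named after the cast they apply, not after any emulator function; per the RV64 ISA, the signed casts correspond to `lb` and `lw`, and the unsigned ones to the zero-extending loads `lbu` and `lwu`, which this lecture's table does not cover.

```c
#include <assert.h>
#include <stdint.h>

/* The pointed-to type selects the byte count; its signedness selects
   sign- vs zero-extension when the value widens to a 64-bit register. */
uint64_t load_i8 (const void *ta) { return (uint64_t)*(const int8_t   *)ta; }
uint64_t load_u8 (const void *ta) { return (uint64_t)*(const uint8_t  *)ta; }
uint64_t load_i32(const void *ta) { return (uint64_t)*(const int32_t  *)ta; }
uint64_t load_u32(const void *ta) { return (uint64_t)*(const uint32_t *)ta; }

void check_load_casts(void) {
    uint8_t  byte = 0x80;          /* bit pattern of the 8-bit value -128 */
    uint32_t word = 0xFFFFFFFEu;   /* bit pattern of the 32-bit value -2  */

    assert(load_i8(&byte)  == 0xFFFFFFFFFFFFFF80ull);   /* sign-extended */
    assert(load_u8(&byte)  == 0x0000000000000080ull);   /* zero-extended */
    assert(load_i32(&word) == 0xFFFFFFFFFFFFFFFEull);   /* sign-extended */
    assert(load_u32(&word) == 0x00000000FFFFFFFEull);   /* zero-extended */
}
```

Same memory, same width, different register contents: only the signedness of the cast changed.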
---

## `emu_load` Sketch

```c
void emu_load(struct rv_state *rsp, uint32_t iw) {
    uint32_t rd = get_rd(iw);
    uint32_t rs1 = get_rs1(iw);
    uint32_t funct3 = get_funct3(iw);

    int64_t offset = sign_extend(get_bits(iw, 20, 12), 11);
    uint64_t ta = rsp->regs[rs1] + offset;   // target address

    switch (funct3) {
        case 0x0: rsp->regs[rd] = *((int8_t *)ta);   break;  // lb: sign-extends
        case 0x2: rsp->regs[rd] = *((int32_t *)ta);  break;  // lw: sign-extends on RV64
        case 0x3: rsp->regs[rd] = *((uint64_t *)ta); break;  // ld
    }
    rsp->pc += 4;
}
```

---

## Store Instruction: S-type Encoding

Stores are **S-type**: there is no `rd` field, and the immediate is split so the register fields stay in place:

```text
 31       25 24  20 19  15 14    12 11       7 6       0
| imm[11:5] | rs2  | rs1  | funct3 | imm[4:0] | opcode  |
```

| Field | Role |
|-------|------|
| rs1 | base address register |
| rs2 | value to store |
| funct3 | access width |
| imm[11:5] + imm[4:0] | signed offset (split across high/low ends) |

---

## Reconstructing the S-type Immediate

```c
uint32_t imm11_5 = get_bits(iw, 25, 7);  // high 7 bits
uint32_t imm4_0  = get_bits(iw,  7, 5);  // low 5 bits
int64_t offset = sign_extend((imm11_5 << 5) | imm4_0, 11);
```

---

## `emu_store` Sketch

```c
void emu_store(struct rv_state *rsp, uint32_t iw) {
    uint32_t rs1 = get_rs1(iw), rs2 = get_rs2(iw);
    uint32_t funct3 = get_funct3(iw);

    int64_t offset = sign_extend(
        (get_bits(iw, 25, 7) << 5) | get_bits(iw, 7, 5), 11);
    uint64_t ta = rsp->regs[rs1] + offset;   // target address

    switch (funct3) {
        case 0x0: *((uint8_t *)ta)  = (uint8_t) rsp->regs[rs2]; break;  // sb
        case 0x2: *((uint32_t *)ta) = (uint32_t)rsp->regs[rs2]; break;  // sw
        case 0x3: *((uint64_t *)ta) = rsp->regs[rs2];           break;  // sd
    }
    rsp->pc += 4;
}
```

---

## Loads vs. Stores at a Glance

| | Load (`lw`) | Store (`sw`) |
|--|-------------|--------------|
| Format | I-type | S-type |
| C model | `rd = *(int32_t *)(rs1+off)` | `*(uint32_t *)(rs1+off) = rs2` |
| Dereference side | right (read) | left (write) |
| Has `rd`? | yes | no (uses `rs2`) |
| Immediate | contiguous `[11:0]` | split `[11:5]` + `[4:0]` |
| Direction | memory → register | register → memory |

---

## The Full Decode/Dispatch Picture
flowchart TD A["Fetch: iw = *(uint32_t *)pc"] --> B["opcode = get_bits(iw, 0, 7)"] B --> C{"dispatch on opcode"} C -->|0x13| D["emu_i_type\naddi: imm[11:0]"] C -->|0x33| E["emu_r_type\nadd/sub"] C -->|0x63| F["emu_b_type\n4 imm fields, PC-relative"] C -->|0x6F| G["emu_jal\nlink rd=pc+4, pc+=imm"] C -->|0x03| H["emu_load\nrd = *(cast)(rs1+off)"] C -->|0x23| I["emu_store\n*(cast)(rs1+off) = rs2"]
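
The dispatch step in the flowchart is a single `switch` on the low seven bits. The sketch below returns the handler name instead of calling it, so it can be tested without the rest of the emulator; `dispatch_name` and `check_dispatch` are hypothetical, the `"unknown"` default is an assumption about how unsupported opcodes might be reported, and the sample words are hand-encoded instructions.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Returns the name of the handler the flowchart routes `iw` to. */
const char *dispatch_name(uint32_t iw) {
    uint32_t opcode = iw & 0x7Fu;   /* same value as get_bits(iw, 0, 7) */
    switch (opcode) {
        case 0x13: return "emu_i_type";
        case 0x33: return "emu_r_type";
        case 0x63: return "emu_b_type";
        case 0x6F: return "emu_jal";
        case 0x03: return "emu_load";
        case 0x23: return "emu_store";
        default:   return "unknown";   /* assumed fallback, not from the lab */
    }
}

void check_dispatch(void) {
    assert(strcmp(dispatch_name(0x06300293u), "emu_i_type") == 0);  /* addi t0, zero, 99 */
    assert(strcmp(dispatch_name(0x007302B3u), "emu_r_type") == 0);  /* add t0, t1, t2    */
    assert(strcmp(dispatch_name(0xFE000EE3u), "emu_b_type") == 0);  /* beq zero, zero, -4 */
    assert(strcmp(dispatch_name(0x008000EFu), "emu_jal")    == 0);  /* jal ra, +8        */
    assert(strcmp(dispatch_name(0x00852283u), "emu_load")   == 0);  /* lw t0, 8(a0)      */
    assert(strcmp(dispatch_name(0x00552423u), "emu_store")  == 0);  /* sw t0, 8(a0)      */
}
```

In the emulator itself each `case` would call the matching handler with `rsp` and `iw`; a visible fallback for unrecognized opcodes makes a missing instruction show up immediately when a test program fails.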
---

## Project 4: Dynamic Analysis

Project 4 layers **instruction counters** on each dispatch point:

```c
// Inside emu_b_type:
if (take) rsp->analysis.b_taken++;
else      rsp->analysis.b_not_taken++;

// Inside emu_load:
rsp->analysis.ld_count++;

// Inside emu_store:
rsp->analysis.st_count++;
```

One increment per case: the dispatch structure already touches every instruction exactly once.

---

## Key Concepts

| Concept | Key Point |
|---------|-----------|
| `get_bits(iw, start, count)` | Shift-and-mask field extraction |
| `sign_extend(val, sign_bit)` | Shift up then arithmetic shift right |
| I-type immediate | Contiguous `[31:20]`, 12 bits |
| B-type immediate | 4 scattered fields, implicit bit 0 = 0 |
| PC-relative branch | `pc += imm` if taken, `pc += 4` if not |
| JAL | Link `rd = pc+4`, jump `pc += imm` |
| Load (I-type) | `rd = *(cast *)(rs1 + offset)` |
| Store (S-type) | `*(cast *)(rs1 + offset) = rs2` |

---

## Summary

1. **`li` is `addi`**: constants are embedded in the instruction word as immediates
2. **I-type decode**: `get_bits(iw, 20, 12)` extracts the 12-bit immediate; sign-extend with `sign_extend(val, 11)`
3. **Sign extension**: shift left to move the sign bit to bit 63, then arithmetic shift right; the cast to `int64_t` is required
4. **Branch immediates**: 4 scattered fields, implicit low zero, 13-bit signed offset, sign bit = 12
5. **Branch logic**: `pc += imm` if taken, `pc += 4` if not; `funct3` selects `beq`/`bne`/`blt`/`bge`
6. **JAL**: `call` = `jal ra`; `j` = `jal zero`; guard `if (rd != 0)` to preserve `x0 = 0`
7. **Loads and stores**: typed C pointer dereferences; cast controls width; I-type for loads, S-type for stores