
RISC-V Emulation: Immediates, JAL, and Memory

Overview

This lecture continues building the rv_emu RISC-V emulator. Earlier sessions covered fetching, decoding the opcode, and dispatching on instruction type. This session focuses on the hardest decode/execute jobs: reconstructing immediates for I-type and B-type instructions (the B-type offset is scattered across four bit fields); implementing branches (compute a PC-relative offset, evaluate a condition, update the PC); implementing JAL (jump-and-link, the machine instruction behind call and j); and implementing memory instructions (loads and stores) by translating the assembly into C pointer dereferences. Lab 6 is the starter emulator; Project 4 expands it with more instructions, dynamic analysis, and a cache simulator.

Learning Objectives

  • Decode an I-type instruction word by hand and extract the 12-bit signed immediate stored in bits [31:20]
  • Use the get_bits(iw, start, count) helper to slice fields out of an instruction word
  • Sign-extend a narrow two's complement immediate to a 64-bit value using shifts
  • Reconstruct a B-type branch immediate from its four scattered bit fields and account for the implicit low-order zero
  • Implement branch logic as PC-relative addressing: pc += imm if taken, pc += 4 if not
  • Implement the JAL instruction and explain how call and j are pseudo-instructions built on it
  • Translate load and store instructions into correctly typed C pointer dereferences (lb/lw/ld, sw)
  • Distinguish I-type (loads) from S-type (stores) encodings and their immediate layouts

Prerequisites

  • The fetch-decode-execute cycle and the struct rv_state (Lecture 6)
  • Bit manipulation: shift, mask, OR-assembly, and sign extension with shifts (Lecture 6)
  • Two's complement representation and signed vs. unsigned C types
  • RISC-V instruction formats: R-type and I-type field layouts
  • C pointers, pointer arithmetic, and type casts
  • Pseudo-instructions vs. real instructions (li, j, call, bgt)

1. Where We Are: Lab 6 and Project 4

The course schedule framing the session:

| Item | Detail |
|------|--------|
| Lab 6 (RISC-V Emulation) | Due Wed Oct 1, 11:59pm |
| Extra office hours | Today 4:30–5:30pm (Zoom and office) |
| Project 4 (Emulation + Analysis + Cache) | Published; due Tue Oct 7, 11:59pm |
| Project 4 Interactive Grading | Wed Oct 8 |
| Lab 7 | Exam-like problems (with the Fall 2024 CS 315 midterm for practice) |
| Midterm | Thu Oct 9 |

A useful sense of scale from the instructor's reference solution:

Lab 6   starter  rv_emu.c   ~131 lines of code
        solution rv_emu.c   ~225 lines of code

The gap between starter and solution, roughly 90 lines, is mostly the immediate reconstruction, branch logic, JAL, and memory code this lecture develops. The emulator does not need to implement all of RISC-V, only enough real instructions to run the given assembly programs (quadratic_s, midpoint_s, max3_s, get_bitseq_s, and for Project 4 also to_upper, swap_s, sort_s, fib_rec_s, and more).

The workflow for adding an instruction

flowchart TD
    A["Pick a failing test program"] --> B["Disassemble / inspect the asm"]
    B --> C["Identify the REAL instructions it uses"]
    C --> D["Look up opcode + funct3/funct7 on the cheat sheet"]
    D --> E["Determine the instruction TYPE (R, I, B, S, J)"]
    E --> F["Extract fields with get_bits"]
    F --> G["Implement the execute logic for that type"]
    G --> H["Run the test, iterate"]
    H -->|more failures| A

A key reminder from the discussion: the assembler emits real machine instructions, but assembly source often uses pseudo-instructions. bgt r1, r2, L is assembled as the real instruction blt r2, r1, L (operands swapped). The instruction words in memory are always real instructions, and those are what the emulator must decode. (Disassemblers such as objdump may print common aliases like j or ret by default; objdump's -M no-aliases option shows the underlying real instructions.)


2. Immediates: li Is Really addi

The simplest immediate appears when loading a constant into a register:

li t0, 99        # pseudo-instruction
addi t0, zero, 99   # the real instruction the assembler emits

li ("load immediate") does not exist in hardware. The assembler rewrites it as addi t0, zero, 99, which computes t0 = x0 + 99. Because x0 is hardwired to 0, the result is simply 99. The constant 99 rides along inside the 32-bit instruction word itself — that embedded constant is the immediate.

flowchart LR
    A["li t0, 99"] -->|assembler| B["addi t0, zero, 99"]
    B -->|encode| C["0x06300293"]
    C -->|emulator decodes| D["t0 = regs[zero] + 99 = 99"]
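Going the other way makes the point concrete: the constant is just bits packed into the word. The sketch below builds an addi word from its fields. encode_addi is a hypothetical helper for illustration only (the emulator itself only ever decodes), and it assumes the immediate already fits in 12 signed bits.

```c
#include <stdint.h>

// Hypothetical helper: pack an addi into the I-type layout
// imm[11:0] | rs1 | funct3 | rd | opcode. Assumes -2048 <= imm <= 2047.
uint32_t encode_addi(uint32_t rd, uint32_t rs1, int32_t imm) {
    const uint32_t OPCODE_I_ALU = 0x13;          // I-type ALU opcode
    const uint32_t FUNCT3_ADDI  = 0x0;           // funct3 that selects addi
    uint32_t imm11_0 = (uint32_t)imm & 0xFFFu;   // low 12 bits of the two's complement pattern

    return (imm11_0 << 20)
         | ((rs1 & 0x1Fu) << 15)
         | (FUNCT3_ADDI << 12)
         | ((rd & 0x1Fu) << 7)
         | OPCODE_I_ALU;
}
```

With t0 = x5 and zero = x0, encode_addi(5, 0, 99) yields 0x06300293 and encode_addi(5, 0, -2) yields 0xFFE00293, the two encodings used in this lecture.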

Immediates are encoded in clever, hardware-friendly ways. The encoding optimizes the hardware (so that, for example, the sign bit is always in the same place, and register fields never move), at the cost of making decode in software a little fiddly. That fiddliness is what this lecture is about.


3. Decoding an I-type Instruction by Hand

The instructor worked a complete decode of an I-type addi. The instruction word is:

0x06300293

Expanding to binary and grouping into the I-type fields:

0000 0110 0011  0000 0  000   00101  001 0011
\___________/   \____/  \_/   \____/ \______/
 imm[11:0]       rs1   funct3   rd    opcode
   12 bits      (zero) (addi) (t0=x5) (I-type)

The I-type layout (bit 31 on the left, bit 0 on the right):

 31          20 19    15 14   12 11   7 6        0
| imm[11:0]    | rs1    | funct3 | rd   | opcode  |
|   12 bits    | 5 bits | 3 bits |5 bits| 7 bits  |

Field-by-field decode:

| Field | Bits | Value (binary) | Meaning |
|-------|------|----------------|---------|
| opcode | [6:0] | 0010011 (0x13) | I-type ALU op (addi, slli, ori, …) |
| rd | [11:7] | 00101 = 5 | destination = x5 = t0 |
| funct3 | [14:12] | 000 | selects addi within the I-type group |
| rs1 | [19:15] | 00000 = 0 | source = x0 = zero |
| imm[11:0] | [31:20] | 000001100011 | the immediate |

Converting the immediate to decimal:

imm = 0b000001100011
    = 64 + 32 + 2 + 1
    = 96 + 3
    = 99

So the instruction is addi t0, zero, 99 → t0 = 99, exactly as the li t0, 99 pseudo-instruction promised.

Field helpers

Rather than re-deriving shift/mask constants every time, the emulator uses small helper functions over the instruction word iw:

// get_bits(iw, start, count): extract `count` bits starting at bit `start`
uint32_t get_rd(uint32_t iw)     { return get_bits(iw, 7, 5); }
uint32_t get_rs1(uint32_t iw)    { return get_bits(iw, 15, 5); }
uint32_t get_rs2(uint32_t iw)    { return get_bits(iw, 20, 5); }
uint32_t get_funct3(uint32_t iw) { return get_bits(iw, 12, 3); }
uint32_t get_funct7(uint32_t iw) { return get_bits(iw, 25, 7); }

get_bits is itself just the shift-and-mask pattern from the bit-manipulation lecture:

uint32_t get_bits(uint32_t iw, uint32_t start, uint32_t count) {
    uint32_t mask = (1u << count) - 1;   // count low 1-bits
    return (iw >> start) & mask;         // bring field to bit 0, then mask
}

The I-type immediate is the 12 bits starting at bit 20:

uint64_t imm11_0 = get_bits(iw, 20, 12);   // raw 12-bit field, unsigned
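Putting the helpers together, the hand decode of 0x06300293 becomes a handful of get_bits calls. The itype_fields struct and decode_itype function are hypothetical names for this sketch; in the lab the fields are usually extracted directly inside each handler.

```c
#include <stdint.h>

// Extract `count` bits starting at bit `start` (same as the lecture's helper).
uint32_t get_bits(uint32_t iw, uint32_t start, uint32_t count) {
    uint32_t mask = (1u << count) - 1;
    return (iw >> start) & mask;
}

// Hypothetical container for the five I-type fields.
struct itype_fields {
    uint32_t opcode, rd, funct3, rs1, imm11_0;
};

struct itype_fields decode_itype(uint32_t iw) {
    struct itype_fields f;
    f.opcode  = get_bits(iw, 0, 7);    // bits [6:0]
    f.rd      = get_bits(iw, 7, 5);    // bits [11:7]
    f.funct3  = get_bits(iw, 12, 3);   // bits [14:12]
    f.rs1     = get_bits(iw, 15, 5);   // bits [19:15]
    f.imm11_0 = get_bits(iw, 20, 12);  // bits [31:20], raw (not yet sign-extended)
    return f;
}
```

decode_itype(0x06300293) gives opcode 0x13, rd 5, funct3 0, rs1 0, and imm11_0 99. For 0xFFE00293 the raw field comes back as 0xFFE (4094), which is why the next step is sign extension.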

4. Sign-Extending the Immediate

The raw field imm11_0 is only 12 bits wide and unsigned. But RISC-V immediates are signed two's complement values, and registers are 64 bits. Consider li t0, -2:

li t0, -2          # pseudo
addi t0, zero, -2  # real
0xFFE00293         # encoding

The 12-bit immediate field holds -2 in 12-bit two's complement:

imm[11:0] = 1111 1111 1110     (12 bits, value -2)

If we simply zero-extended this 12-bit field into a 64-bit register, we would get 0x0000_0000_0000_0FFE = 4094, which is wrong. We must sign-extend: replicate the sign bit (bit 11) across all the upper bits so the value stays -2:

... 1111 1111 1111 1111 1111 1110   = -2 in 64 bits  (correct)
... 0000 0000 0000 1111 1111 1110   = 4094            (wrong, zero-extended)

The instructor double-checked -2 by inverting the bits and adding 1 (two's complement negation):

   1111 1111 1110     (the 12-bit field)
~  0000 0000 0001     (invert)
+              1     (add 1)
=  0000 0000 0010     = 2, so the original is -(2) = -2   ✓

The sign-extend helper

We sign-extend with the shift-left-then-arithmetic-shift-right trick. The second argument is the index of the sign bit (bit 11 for a 12-bit immediate):

// Replicate bit `sign_bit` of `value` into all higher bits.
int64_t sign_extend(uint64_t value, int sign_bit) {
    int shift = 63 - sign_bit;                       // move sign bit up to bit 63
    return ((int64_t) (value << shift)) >> shift;    // arithmetic shift back
}

int64_t imm = sign_extend(imm11_0, 11);              // I-type: sign bit is bit 11

How it works for a 12-bit immediate (sign_bit = 11, shift = 52):

            sign bit (bit 11)
                 |
value          [ 1 1111 1111 110 ]                 (12-bit -2)
value << 52    [ 1 ... ] 0000...0    sign bit now sits in bit 63
>> 52 (SRA)    1111...1111 1110      sign bit replicated downward = -2
flowchart TD
    A["raw 12-bit field<br/>get_bits(iw, 20, 12)"] --> B["shift left by 63 - 11 = 52<br/>sign bit lands in bit 63"]
    B --> C["arithmetic shift right by 52<br/>SRA fills with the sign bit"]
    C --> D["int64_t imm = -2"]

The whole reason this works is that >> on a negative signed integer (int64_t) is an arithmetic shift with GCC and Clang: it fills vacated bits with the sign bit. (Strictly, the C standard leaves right-shifting a negative value implementation-defined, but mainstream compilers perform the arithmetic shift.) On an unsigned type >> is a logical shift (fill with 0), which destroys the sign, so the cast to int64_t before the right shift is essential. For constants too wide for a 12-bit immediate, the assembler synthesizes a sequence such as lui + addiw; our emulator targets only programs whose constants fit in 12 bits.
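If you would rather not lean on implementation-defined behavior, a common alternative is to mask the field and, when the sign bit is set, subtract 2^(sign_bit+1). The sketch below shows the lecture's shift version next to that mask-and-subtract technique; sign_extend_portable is a hypothetical name, and the two agree for sign_bit in [0, 62].

```c
#include <stdint.h>

// Lecture version: shift the sign bit up to bit 63, then shift back.
// Relies on an arithmetic >> for int64_t (what GCC and Clang do).
int64_t sign_extend(uint64_t value, int sign_bit) {
    int shift = 63 - sign_bit;
    return ((int64_t)(value << shift)) >> shift;
}

// Alternative: keep the low (sign_bit + 1) bits; if the sign bit is set,
// the two's complement value is value - 2^(sign_bit + 1). Computed as the
// negation of a small positive number so every step is well-defined.
// Valid for sign_bit in [0, 62].
int64_t sign_extend_portable(uint64_t value, int sign_bit) {
    uint64_t width = (uint64_t)1 << (sign_bit + 1);   // 2^(sign_bit + 1)
    value &= width - 1;                               // drop any stray high bits
    if ((value >> sign_bit) & 1) {
        return -(int64_t)(width - value);             // negative field
    }
    return (int64_t)value;                            // non-negative field
}
```

Both return -2 for the 12-bit field 0xFFE and 99 for 0x063; the mask-and-subtract form simply avoids relying on how the compiler shifts negative numbers.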


5. Branches: B-type Immediate Reconstruction

Branch instructions (beq, bne, blt, bge) are B-type. Their immediate is a PC-relative offset — how far to jump from the current instruction — and it is split into four scattered fields inside the instruction word. RISC-V scatters the bits so that, across formats, the most-significant immediate bits and register fields stay in fixed positions, which simplifies the hardware.

The B-type layout

 31      30    25 24  20 19  15 14  12 11    8 7      6      0
|imm[12]|imm[10:5]|rs2  |rs1  |funct3|imm[4:1]|imm[11]|opcode |
|  1    |   6     | 5   | 5   |  3   |   4    |  1    |  7    |

Notice the immediate bits are out of order: imm[12], then imm[10:5], then imm[4:1], then imm[11] — and there is no bit 0. The low bit is always 0 because instructions are 2-byte aligned, so RISC-V does not waste an encoding bit storing it. The reconstructed immediate is therefore:

imm = imm[12] imm[11] imm[10:5] imm[4:1] 0
       ^                                 ^
      bit 12                  implicit bit 0 = 0

This builds a 13-bit signed value whose sign bit is bit 12.

Step 1 — get the parts

Pull each scattered field out with get_bits (remember: get_bits(iw, start, count), where start is the LSB position of the field):

uint32_t imm12   = get_bits(iw, 31, 1);  // imm[12]   1 bit
uint32_t imm10_5 = get_bits(iw, 25, 6);  // imm[10:5] 6 bits
uint32_t imm4_1  = get_bits(iw, 8, 4);   // imm[4:1]  4 bits
uint32_t imm11   = get_bits(iw, 7, 1);   // imm[11]   1 bit

Step 2 — combine the parts

Shift each piece into its final position and OR them together. Each field's shift equals the bit position of its lowest bit in the assembled immediate:

uint64_t uimm = (imm12   << 12)
              | (imm11   << 11)
              | (imm10_5 << 5)
              | (imm4_1  << 1);
              // bit 0 stays 0 implicitly (instruction alignment)

In lecture the instructor first wrote a stray | 0 at the end and then crossed it out — OR-ing in a zero does nothing, so the implicit low-order zero needs no term at all.

Step 3 — sign-extend

The assembled value is a 13-bit signed offset; its sign bit is bit 12:

int64_t imm = sign_extend(uimm, 12);
flowchart TD
    A["instruction word iw"] --> B["Step 1: get parts<br/>imm12, imm11, imm10_5, imm4_1"]
    B --> C["Step 2: combine<br/>(imm12<<12) | (imm11<<11) | (imm10_5<<5) | (imm4_1<<1)"]
    C --> D["Step 3: sign_extend(uimm, 12)"]
    D --> E["int64_t imm = PC-relative offset"]
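The three steps can be checked with a round trip. The sketch below pairs the lecture's decode (wrapped here as b_type_imm) with a hypothetical inverse, b_type_scatter, whose only job is to build test words; it assumes an even offset in the B-type range [-4096, 4094]. The word 0x00B50663 used in the check is beq a0, a1, 12, hand-encoded from the layout above.

```c
#include <stdint.h>

uint32_t get_bits(uint32_t iw, uint32_t start, uint32_t count) {
    return (iw >> start) & ((1u << count) - 1);
}

int64_t sign_extend(uint64_t value, int sign_bit) {
    int shift = 63 - sign_bit;
    return ((int64_t)(value << shift)) >> shift;
}

// Step 1: get parts, step 2: combine, step 3: sign-extend (sign bit 12).
int64_t b_type_imm(uint32_t iw) {
    uint32_t imm12   = get_bits(iw, 31, 1);
    uint32_t imm10_5 = get_bits(iw, 25, 6);
    uint32_t imm4_1  = get_bits(iw, 8, 4);
    uint32_t imm11   = get_bits(iw, 7, 1);
    uint64_t uimm = ((uint64_t)imm12 << 12) | ((uint64_t)imm11 << 11)
                  | ((uint64_t)imm10_5 << 5) | ((uint64_t)imm4_1 << 1);
    return sign_extend(uimm, 12);   // bit 0 is implicitly 0
}

// Hypothetical inverse used only to build test words: scatter an even
// offset into the four B-type immediate fields (all other bits are 0).
uint32_t b_type_scatter(int32_t offset) {
    uint32_t u = (uint32_t)offset;            // two's complement bit pattern
    return (((u >> 12) & 0x1u)  << 31)        // imm[12]   -> bit 31
         | (((u >> 5)  & 0x3Fu) << 25)        // imm[10:5] -> bits 30:25
         | (((u >> 1)  & 0xFu)  << 8)         // imm[4:1]  -> bits 11:8
         | (((u >> 11) & 0x1u)  << 7);        // imm[11]   -> bit 7
}
```

b_type_imm(b_type_scatter(-8)) returns -8, the backward branch worked through in Practice Problem 3, and the extreme offsets -4096 and 4094 survive the round trip as well.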

Step 4 — branch logic (PC update)

Branches use PC-relative addressing — not an absolute target. If the branch is taken, add the offset to the PC; otherwise fall through to the next instruction:

if (take) {
    rsp->pc += imm;     // taken: jump by the signed offset
} else {
    rsp->pc += 4;       // not taken: next sequential instruction
}

This is why imm must be signed: a backward branch (the top of a loop) has a negative offset, and the PC moves backward. The += 4 in the not-taken case is the normal "advance to the next instruction" step.

Step 5 — branch conditions

funct3 selects which comparison the branch performs. For beq (branch if equal), funct3 == 0x0:

bool take = false;

if (get_funct3(iw) == 0x0) {            // beq
    if (rsp->regs[rs1] == rsp->regs[rs2]) {
        take = true;
    }
}
// ... other funct3 values for bne, blt, bge, bltu, bgeu

The instructor noted the body of that if is just a boolean assignment, so it can be written more compactly:

bool take = (rsp->regs[rs1] == rsp->regs[rs2]);   // for beq

The full set of B-type comparisons:

| Instruction | funct3 | Condition (take if true) |
|-------------|--------|--------------------------|
| beq | 0x0 | regs[rs1] == regs[rs2] |
| bne | 0x1 | regs[rs1] != regs[rs2] |
| blt | 0x4 | (int64_t)regs[rs1] < (int64_t)regs[rs2] (signed) |
| bge | 0x5 | (int64_t)regs[rs1] >= (int64_t)regs[rs2] (signed) |
| bltu | 0x6 | regs[rs1] < regs[rs2] (unsigned) |
| bgeu | 0x7 | regs[rs1] >= regs[rs2] (unsigned) |

For the signed comparisons (blt, bge), the register values must be cast to int64_t so the comparison treats them as signed — otherwise a negative value (with bit 63 set) would be treated as a huge positive number.
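A register holding -1 has all 64 bits set, which makes the signed/unsigned distinction easy to see. The sketch below factors the condition table into a standalone function; branch_taken is a hypothetical helper, and, like the lecture's code, it relies on the usual compiler behavior of converting uint64_t to int64_t by reinterpreting the bits.

```c
#include <stdbool.h>
#include <stdint.h>

// Evaluate the branch condition selected by funct3. Registers are stored as
// uint64_t, as in the emulator; blt/bge reinterpret them as signed.
bool branch_taken(uint32_t funct3, uint64_t a, uint64_t b) {
    switch (funct3) {
        case 0x0: return a == b;                      // beq
        case 0x1: return a != b;                      // bne
        case 0x4: return (int64_t)a <  (int64_t)b;    // blt  (signed)
        case 0x5: return (int64_t)a >= (int64_t)b;    // bge  (signed)
        case 0x6: return a <  b;                      // bltu (unsigned)
        case 0x7: return a >= b;                      // bgeu (unsigned)
        default:  return false;                       // not a branch funct3
    }
}
```

With a = (uint64_t)-1 and b = 1, blt (0x4) is taken because -1 < 1, while bltu (0x6) is not, because 0xFFFF_FFFF_FFFF_FFFF is the largest unsigned 64-bit value.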

Putting branches together

void emu_b_type(struct rv_state *rsp, uint32_t iw) {
    uint32_t rs1     = get_rs1(iw);
    uint32_t rs2     = get_rs2(iw);
    uint32_t funct3  = get_funct3(iw);

    // 1) get parts, 2) combine, 3) sign-extend
    uint32_t imm12   = get_bits(iw, 31, 1);
    uint32_t imm10_5 = get_bits(iw, 25, 6);
    uint32_t imm4_1  = get_bits(iw, 8, 4);
    uint32_t imm11   = get_bits(iw, 7, 1);
    uint64_t uimm    = (imm12 << 12) | (imm11 << 11)
                     | (imm10_5 << 5) | (imm4_1 << 1);
    int64_t  imm     = sign_extend(uimm, 12);

    // 5) condition
    bool take = false;
    switch (funct3) {
        case 0x0: take = (rsp->regs[rs1] == rsp->regs[rs2]); break;  // beq
        case 0x1: take = (rsp->regs[rs1] != rsp->regs[rs2]); break;  // bne
        case 0x4: take = ((int64_t)rsp->regs[rs1] <  (int64_t)rsp->regs[rs2]); break; // blt
        case 0x5: take = ((int64_t)rsp->regs[rs1] >= (int64_t)rsp->regs[rs2]); break; // bge
        case 0x6: take = (rsp->regs[rs1] <  rsp->regs[rs2]); break;  // bltu (unsigned)
        case 0x7: take = (rsp->regs[rs1] >= rsp->regs[rs2]); break;  // bgeu (unsigned)
    }

    // 4) PC update (PC-relative)
    if (take) {
        rsp->pc += imm;
    } else {
        rsp->pc += 4;
    }
}

6. JAL: Jump and Link

jal ("jump and link") is the J-type instruction the assembler uses for both function calls and unconditional jumps. It does two things at once: it links (saves the return address into a destination register rd) and then it jumps (adds a PC-relative offset to the PC).

Two common pseudo-instructions reduce to jal:

call offset   ->   jal ra, offset      # rd = ra (x1): save return address, then jump
j offset      ->   jal zero, offset    # rd = zero (x0): discard the link, just jump

  • For call, rd is ra (x1). The address of the instruction after the jal is written into ra so the callee can ret back. Then the PC jumps to the function. (Strictly, the call pseudo-instruction expands to an auipc + jalr pair so it can reach any address; when the target is close enough, the linker relaxes that pair into a single jal ra.)
  • For j, rd is zero (x0). Writing to x0 has no effect (it is hardwired to 0), so the link is effectively discarded: j is "just jump."

J-type immediate

Like branches, the J-type immediate is scattered and has an implicit low-order zero:

 31      30        21 20      19        12 11   7 6      0
|imm[20]|imm[10:1]   |imm[11]|imm[19:12]  | rd   |opcode |
|  1    |   10       |  1    |   8        | 5    |  7    |

Reconstructed (sign bit is bit 20, low bit implicit 0):

uint32_t imm20    = get_bits(iw, 31, 1);
uint32_t imm10_1  = get_bits(iw, 21, 10);
uint32_t imm11    = get_bits(iw, 20, 1);
uint32_t imm19_12 = get_bits(iw, 12, 8);

uint64_t uimm = (imm20 << 20) | (imm19_12 << 12)
              | (imm11 << 11) | (imm10_1 << 1);
int64_t  imm  = sign_extend(uimm, 20);
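As a check, the sketch below decodes two jal words with the reconstruction above: 0x040000EF (jal ra, 64) and 0xFF9FF06F (jal zero, -8, a j back two instructions). Both encodings were worked out by hand from the J-type layout, so it is worth confirming them against your toolchain's disassembler; j_type_imm is a hypothetical name.

```c
#include <stdint.h>

uint32_t get_bits(uint32_t iw, uint32_t start, uint32_t count) {
    return (iw >> start) & ((1u << count) - 1);
}

int64_t sign_extend(uint64_t value, int sign_bit) {
    int shift = 63 - sign_bit;
    return ((int64_t)(value << shift)) >> shift;
}

// Reassemble imm[20|10:1|11|19:12] into imm[20:1], append the implicit 0,
// and sign-extend from bit 20.
int64_t j_type_imm(uint32_t iw) {
    uint32_t imm20    = get_bits(iw, 31, 1);
    uint32_t imm10_1  = get_bits(iw, 21, 10);
    uint32_t imm11    = get_bits(iw, 20, 1);
    uint32_t imm19_12 = get_bits(iw, 12, 8);
    uint64_t uimm = ((uint64_t)imm20 << 20) | ((uint64_t)imm19_12 << 12)
                  | ((uint64_t)imm11 << 11) | ((uint64_t)imm10_1 << 1);
    return sign_extend(uimm, 20);
}
```

For 0xFF9FF06F every immediate field is nearly all ones, so the assembled value is 0x1FFFF8, which sign-extends from bit 20 to -8.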

Execute logic

void emu_jal(struct rv_state *rsp, uint32_t iw) {
    uint32_t rd = get_rd(iw);
    /* ... reconstruct imm as above ... */

    if (rd != 0) {                 // x0 must stay 0; don't write the link to zero
        rsp->regs[rd] = rsp->pc + 4;   // LINK: address after this instruction
    }
    rsp->pc += imm;                // JUMP: PC-relative, just like a branch
}

The link value is pc + 4 — the instruction immediately following the jal. When the callee later executes ret (which is jalr zero, 0(ra)), control returns there. The companion instruction jalr ("jump and link register") computes its target from a register rather than a PC-relative offset; that is how ret returns to whatever address sits in ra.

flowchart LR
    A["call foo<br/>(jal ra, offset)"] --> B["ra = pc + 4"]
    B --> C["pc += offset<br/>(enter foo)"]
    C --> D["...foo body..."]
    D --> E["ret<br/>(jalr zero, 0(ra))"]
    E --> F["pc = ra<br/>(back after call)"]
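Here is a self-contained version of emu_jal with the immediate reconstruction filled in, exercised with the numbers from Practice Problem 5. The two-member struct rv_state is a hypothetical minimal stand-in; the lab's struct has more fields.

```c
#include <stdint.h>

// Hypothetical minimal stand-in for the lab's struct rv_state.
struct rv_state {
    uint64_t regs[32];
    uint64_t pc;
};

uint32_t get_bits(uint32_t iw, uint32_t start, uint32_t count) {
    return (iw >> start) & ((1u << count) - 1);
}

int64_t sign_extend(uint64_t value, int sign_bit) {
    int shift = 63 - sign_bit;
    return ((int64_t)(value << shift)) >> shift;
}

void emu_jal(struct rv_state *rsp, uint32_t iw) {
    uint32_t rd = get_bits(iw, 7, 5);
    uint64_t uimm = ((uint64_t)get_bits(iw, 31, 1)  << 20)    // imm[20]
                  | ((uint64_t)get_bits(iw, 12, 8)  << 12)    // imm[19:12]
                  | ((uint64_t)get_bits(iw, 20, 1)  << 11)    // imm[11]
                  | ((uint64_t)get_bits(iw, 21, 10) << 1);    // imm[10:1]
    int64_t imm = sign_extend(uimm, 20);

    if (rd != 0) {
        rsp->regs[rd] = rsp->pc + 4;   // LINK: address after this instruction
    }
    rsp->pc += imm;                    // JUMP: PC-relative (wraps mod 2^64)
}
```

Starting at pc = 0x2000, jal ra, 64 (0x040000EF) leaves ra = 0x2004 and pc = 0x2040; jal zero, 64 (0x0400006F) jumps to the same place but leaves x0 at 0.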

7. Memory Instructions: Loads (I-type)

Memory instructions move data between registers and memory. The processor can only compute on values in registers, so any program that touches arrays, the stack, or the heap must load values in and store results back out.

A load is an I-type instruction (same encoding family as addi). The assembly syntax puts the offset and base register together:

lw t0, offset(a0)      # general form
lw t0, 8(a0)           # concrete: load word from address (a0 + 8) into t0

The mental model: a load is a pointer dereference

The instructor's central intuition is that a RISC-V load is exactly a C pointer dereference:

t0 = *(a0 + offset);

But there is a subtlety: how many bytes do we read, and is the value signed or unsigned? That depends on the specific load instruction, and in C it is controlled by the pointer type we cast to. To load a 32-bit word, cast the computed address to a uint32_t * before dereferencing:

t0 = *((uint32_t *)(a0 + offset));

The cast picks both the access width (4 bytes for uint32_t) and the signedness.

Target address

It helps to name the computed address. The target address (TA) is the base register plus the sign-extended offset; then the load reads from there:

uint64_t ta = rsp->regs[rs1] + offset;       // base address + signed offset
rsp->regs[rd] = *((uint32_t *)ta);           // read 4 bytes (lw)

The load encoding fields

 31          20 19    15 14   12 11    7 6        0
| offset[11:0] | rs1    | funct3 | rd    | opcode  |
|  signed imm  | base   |  width | dest  | (load)  |

| Field | Role |
|-------|------|
| offset[11:0] | signed 12-bit immediate (sign-extend it, exactly as in Section 4) |
| rs1 | base address register |
| funct3 | selects the access width / signedness (lb, lh, lw, ld, …) |
| rd | destination register (where the loaded value lands) |

Because the offset is a 12-bit signed I-type immediate, it is reconstructed and sign-extended the same way as addi's immediate — get_bits(iw, 20, 12) then sign_extend(..., 11).

Load widths and their C pointer types

The width of a load is just the pointer type you cast the address to:

| Instruction | Bytes | C cast | Meaning |
|-------------|-------|--------|---------|
| lb | 1 | *((int8_t *)ta) (uint8_t * gives lbu behavior) | load byte |
| lw | 4 | *((uint32_t *)ta) | load word |
| ld | 8 | *((uint64_t *)ta) | load doubleword |

The instructor's shorthand from the board:

lb  ->  uint8_t  *
lw  ->  uint32_t *
ld  ->  uint64_t *

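Pick the cast that matches the instruction's width, and mind its signedness. The signed loads lb, lh, and lw must sign-extend the loaded value into the 64-bit register, which in C means dereferencing an int8_t *, int16_t *, or int32_t *; the unsigned variants lbu, lhu, and lwu zero-extend, which is what uint8_t *, uint16_t *, and uint32_t * give you. The lecture notes that the matching-width cast is sufficient for this lab's programs, but uint32_t * zero-extends, so a negative int loaded that way becomes a large positive 64-bit value and later signed comparisons go wrong. Pointer arithmetic and casting are doing the heavy lifting here: getting the cast wrong reads the wrong number of bytes or misinterprets the sign, which corrupts the emulated result.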
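The difference between sign- and zero-extension shows up as soon as memory holds a negative value. This sketch assumes, as in the lab, that an emulated address is a host address stored in a uint64_t; the load_* helpers are hypothetical names, one per instruction.

```c
#include <stdint.h>

// Each helper reads from target address `ta` and returns the 64-bit value the
// instruction would place in rd. Signed pointer types sign-extend; unsigned
// pointer types zero-extend.
uint64_t load_lb(uint64_t ta) {
    return (uint64_t)(int64_t)*((int8_t *)(uintptr_t)ta);    // lb:  1 byte, sign-extend
}

uint64_t load_lbu(uint64_t ta) {
    return (uint64_t)*((uint8_t *)(uintptr_t)ta);            // lbu: 1 byte, zero-extend
}

uint64_t load_lw(uint64_t ta) {
    return (uint64_t)(int64_t)*((int32_t *)(uintptr_t)ta);   // lw:  4 bytes, sign-extend
}

uint64_t load_lwu(uint64_t ta) {
    return (uint64_t)*((uint32_t *)(uintptr_t)ta);           // lwu: 4 bytes, zero-extend
}
```

For an int32_t holding -5, load_lw returns 0xFFFF_FFFF_FFFF_FFFB (still -5 as a 64-bit signed value), while load_lwu returns 0x0000_0000_FFFF_FFFB (4294967291).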

Load handler sketch

void emu_load(struct rv_state *rsp, uint32_t iw) {
    uint32_t rd     = get_rd(iw);
    uint32_t rs1    = get_rs1(iw);
    uint32_t funct3 = get_funct3(iw);
    int64_t  offset = sign_extend(get_bits(iw, 20, 12), 11);

    uint64_t ta = rsp->regs[rs1] + offset;   // target address

    switch (funct3) {
        case 0x0: rsp->regs[rd] = *((int8_t  *)ta); break;  // lb (int8_t sign-extends)
        case 0x2: rsp->regs[rd] = *((int32_t *)ta); break;  // lw (int32_t sign-extends)
        case 0x3: rsp->regs[rd] = *((uint64_t *)ta); break; // ld
    }
    rsp->pc += 4;
}

8. Memory Instructions: Stores (S-type)

Stores write a register value out to memory. They are S-type, a different encoding from loads, because a store has no destination register — instead it needs a second source register (the value to write), and that frees up the rd bits to hold part of the immediate.

sw t0, offset(a0)        # store word: write t0 to memory at (a0 + offset)

The C model is the mirror image of a load — the dereference is on the left of the assignment:

*(a0 + offset) = t0;

with the proper width cast:

*((uint32_t *)(rsp->regs[rs1] + offset)) = rsp->regs[rs2];

The S-type encoding

The immediate is split into two fields so the register fields stay put:

 31        25 24  20 19  15 14  12 11      7 6       0
|imm[11:5]   |rs2  |rs1  |funct3|imm[4:0]   |opcode  |
|   7        | 5   | 5   |  3   |   5       |  7     |

| Field | Role |
|-------|------|
| imm[11:5] + imm[4:0] | the signed offset, split across the high and low ends |
| rs2 | the value to store (source) |
| rs1 | the base address register |
| funct3 | access width (sb, sh, sw, sd) |

Reconstruct the S-type immediate by OR-ing the two halves, then sign-extend (sign bit is bit 11):

uint32_t imm11_5 = get_bits(iw, 25, 7);   // high 7 bits
uint32_t imm4_0  = get_bits(iw, 7, 5);    // low 5 bits
uint64_t uimm    = (imm11_5 << 5) | imm4_0;
int64_t  offset  = sign_extend(uimm, 11);
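A quick way to check the split immediate is to decode two stores: 0x00552423 (sw t0, 8(a0)) and 0xFE552C23 (sw t0, -8(a0)). Both words were assembled by hand from the S-type layout, so treat them as assumptions to verify with a disassembler; s_type_imm is a hypothetical name.

```c
#include <stdint.h>

uint32_t get_bits(uint32_t iw, uint32_t start, uint32_t count) {
    return (iw >> start) & ((1u << count) - 1);
}

int64_t sign_extend(uint64_t value, int sign_bit) {
    int shift = 63 - sign_bit;
    return ((int64_t)(value << shift)) >> shift;
}

// Glue imm[11:5] (bits 31:25) onto imm[4:0] (bits 11:7), then sign-extend
// from bit 11.
int64_t s_type_imm(uint32_t iw) {
    uint32_t imm11_5 = get_bits(iw, 25, 7);   // high 7 bits
    uint32_t imm4_0  = get_bits(iw, 7, 5);    // low 5 bits
    return sign_extend(((uint64_t)imm11_5 << 5) | imm4_0, 11);
}
```

In the second word the high half is 0x7F and the low half is 0x18, which combine to 0xFF8, that is, -8 as a 12-bit value.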

Store handler sketch

void emu_store(struct rv_state *rsp, uint32_t iw) {
    uint32_t rs1    = get_rs1(iw);    // base address
    uint32_t rs2    = get_rs2(iw);    // value to store
    uint32_t funct3 = get_funct3(iw);

    uint32_t imm11_5 = get_bits(iw, 25, 7);
    uint32_t imm4_0  = get_bits(iw, 7, 5);
    int64_t  offset  = sign_extend((imm11_5 << 5) | imm4_0, 11);

    uint64_t ta = rsp->regs[rs1] + offset;   // target address

    switch (funct3) {
        case 0x0: *((uint8_t  *)ta) = (uint8_t)  rsp->regs[rs2]; break; // sb
        case 0x2: *((uint32_t *)ta) = (uint32_t) rsp->regs[rs2]; break; // sw
        case 0x3: *((uint64_t *)ta) =            rsp->regs[rs2]; break; // sd
    }
    rsp->pc += 4;
}
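To see the load and store handlers cooperate, the sketch below stores t0 with sw t0, 8(a0) (0x00552423) and reads it back with lw t1, 8(a0) (0x00852303), using a host array as memory. It assumes a hypothetical minimal struct rv_state (registers and pc only), uses int32_t * for lw so negative words are sign-extended, and skips writes to x0.

```c
#include <stdint.h>

// Hypothetical minimal stand-in for the lab's struct rv_state.
struct rv_state {
    uint64_t regs[32];
    uint64_t pc;
};

uint32_t get_bits(uint32_t iw, uint32_t start, uint32_t count) {
    return (iw >> start) & ((1u << count) - 1);
}

int64_t sign_extend(uint64_t value, int sign_bit) {
    int shift = 63 - sign_bit;
    return ((int64_t)(value << shift)) >> shift;
}

void emu_load(struct rv_state *rsp, uint32_t iw) {
    uint32_t rd     = get_bits(iw, 7, 5);
    uint32_t rs1    = get_bits(iw, 15, 5);
    uint32_t funct3 = get_bits(iw, 12, 3);
    int64_t  offset = sign_extend(get_bits(iw, 20, 12), 11);   // I-type immediate
    uint64_t ta     = rsp->regs[rs1] + offset;                 // target address
    uint64_t value  = 0;

    switch (funct3) {
        case 0x0: value = (uint64_t)(int64_t)*((int8_t  *)(uintptr_t)ta); break; // lb
        case 0x2: value = (uint64_t)(int64_t)*((int32_t *)(uintptr_t)ta); break; // lw
        case 0x3: value = *((uint64_t *)(uintptr_t)ta);                   break; // ld
    }
    if (rd != 0) {
        rsp->regs[rd] = value;                                  // x0 stays 0
    }
    rsp->pc += 4;
}

void emu_store(struct rv_state *rsp, uint32_t iw) {
    uint32_t rs1    = get_bits(iw, 15, 5);                      // base address
    uint32_t rs2    = get_bits(iw, 20, 5);                      // value to store
    uint32_t funct3 = get_bits(iw, 12, 3);
    uint64_t uimm   = ((uint64_t)get_bits(iw, 25, 7) << 5) | get_bits(iw, 7, 5);
    uint64_t ta     = rsp->regs[rs1] + sign_extend(uimm, 11);   // S-type immediate

    switch (funct3) {
        case 0x0: *((uint8_t  *)(uintptr_t)ta) = (uint8_t)rsp->regs[rs2];  break; // sb
        case 0x2: *((uint32_t *)(uintptr_t)ta) = (uint32_t)rsp->regs[rs2]; break; // sw
        case 0x3: *((uint64_t *)(uintptr_t)ta) = rsp->regs[rs2];           break; // sd
    }
    rsp->pc += 4;
}
```

After the two instructions the array element at byte offset 8 holds -5, t1 holds (uint64_t)-5 (sign-extended, because lw goes through int32_t *), and pc has advanced by 8.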

Loads vs. stores at a glance

| | Load (lw) | Store (sw) |
|---|-----------|------------|
| Format | I-type | S-type |
| C model | rd = *(uint32_t *)(rs1 + off) | *(uint32_t *)(rs1 + off) = rs2 |
| Dereference side | right-hand side (read) | left-hand side (write) |
| Has rd? | yes (destination) | no (uses rs2 as the value) |
| Immediate | contiguous [11:0] | split [11:5]+[4:0] |
| Direction | memory → register | register → memory |

9. The Decode/Dispatch Picture

Stepping back, every instruction in this lecture follows the same pipeline; only the field-extraction and execute logic differ by type:

flowchart TD
    A["Fetch: iw = *(uint32_t *)pc"] --> B["opcode = get_bits(iw, 0, 7)"]
    B --> C{"dispatch on opcode"}
    C -->|0x13| D["emu_i_type<br/>addi: imm[11:0], sign_extend(.,11)"]
    C -->|0x33| E["emu_r_type<br/>add/sub: rd, rs1, rs2"]
    C -->|0x63| F["emu_b_type<br/>combine 4 imm fields, PC-relative"]
    C -->|0x6F| G["emu_jal<br/>link rd=pc+4, pc+=imm"]
    C -->|0x03| H["emu_load (I-type)<br/>rd = *(cast)(rs1+off)"]
    C -->|0x23| I["emu_store (S-type)<br/>*(cast)(rs1+off) = rs2"]
    D --> J["update regs / memory, advance pc"]
    E --> J
    F --> J
    G --> J
    H --> J
    I --> J

Project 4 layers dynamic analysis on top of this dispatch: each handler increments counters in a struct rv_analysis (instructions executed, I/R-type count, load count, store count, jump count, branches taken vs. not taken). Because the emulator already touches every instruction at exactly these dispatch points, adding the counts is a matter of one increment per case — for example, bumping b_taken or b_not_taken inside emu_b_type, and ld_count/st_count inside the load/store handlers.
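The one-increment-per-case idea can be sketched on its own by classifying each fetched word by opcode. The struct layout and field names below are hypothetical (only ld_count and st_count come from the description above); branches are counted as a single b_count because whether a branch is taken is only known inside emu_b_type, where the b_taken/b_not_taken split belongs. The real Project 4 struct may differ.

```c
#include <stdint.h>

// Hypothetical counters; the Project 4 struct rv_analysis may differ.
struct rv_analysis {
    uint64_t insn_count;   // instructions executed
    uint64_t ir_count;     // I-type ALU and R-type
    uint64_t ld_count;     // loads
    uint64_t st_count;     // stores
    uint64_t j_count;      // jal
    uint64_t b_count;      // branches, taken or not
};

// Bump the counter for one instruction word, using the same opcode values
// as the dispatch diagram.
void count_insn(struct rv_analysis *rap, uint32_t iw) {
    rap->insn_count++;
    switch (iw & 0x7Fu) {                      // opcode = bits [6:0]
        case 0x13: case 0x33: rap->ir_count++; break;
        case 0x03:            rap->ld_count++; break;
        case 0x23:            rap->st_count++; break;
        case 0x6F:            rap->j_count++;  break;
        case 0x63:            rap->b_count++;  break;
        default:              break;           // opcodes this sketch ignores
    }
}
```

Run over one addi, lw, sw, jal, and beq word each, every category counter ends at 1 and insn_count at 5.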


Key Concepts

| Concept | Definition | Example |
|---------|------------|---------|
| Immediate | A constant value encoded directly inside the instruction word | addi t0, zero, 99 embeds 99 |
| get_bits(iw, start, count) | Extract count bits beginning at bit start (LSB position) | get_bits(iw, 20, 12) → I-type imm |
| Sign extension | Replicate the sign bit into the upper bits to preserve a signed value | sign_extend(0xFFE, 11) = -2 |
| Arithmetic shift right | >> on a signed type; fills with the sign bit | (int64_t)(v<<52) >> 52 |
| I-type immediate | Contiguous 12-bit signed field at bits [31:20] | addi, lw, jalr |
| B-type immediate | Branch offset scattered across 4 fields, implicit low 0 | beq, bne, blt, bge |
| PC-relative addressing | Target = current PC + signed offset (not absolute) | pc += imm if branch taken |
| JAL | Jump-and-link: save pc+4 into rd, then jump | call → jal ra, j → jal zero |
| Load (I-type) | rd = *(width *)(rs1 + offset) | lw t0, 8(a0) |
| Store (S-type) | *(width *)(rs1 + offset) = rs2 | sw t0, 8(a0) |
| Target address (TA) | The computed memory address base + offset | ta = regs[rs1] + offset |
| Pseudo-instruction | Assembler convenience mapped to real instruction(s) | li → addi, bgt → blt (operands swapped) |

Practice Problems

Problem 1: Decode an I-type Instruction

The instruction word 0x06300293 is an addi. Decode all five fields and state the equivalent assembly.

Solution: Binary: `0000 0110 0011 00000 000 00101 0010011`

| Field | Bit pattern | Value |
|-------|-------------|-------|
| opcode [6:0] | `0010011` | 0x13 (I-type ALU) |
| rd [11:7] | `00101` | 5 = `t0` |
| funct3 [14:12] | `000` | `addi` |
| rs1 [19:15] | `00000` | 0 = `zero` |
| imm [31:20] | `000001100011` | 64+32+2+1 = 99 |

Assembly: `addi t0, zero, 99` (i.e. `li t0, 99`), so `t0 = 99`.

Problem 2: Sign-Extend a 12-bit Immediate

The 12-bit field of an addi is 0b111111111110. Show that sign-extending it to 64 bits gives -2, and explain why zero-extension would be wrong.

Solution: The sign bit (bit 11) is `1`, so the value is negative. Verify with two's complement:
   1111 1111 1110
~  0000 0000 0001   (invert)
+              1   (add 1)
=  0000 0000 0010 = 2   ->   original = -2
Sign-extend replicates bit 11 upward:
0xFFFF_FFFF_FFFF_FFFE = -2   (correct)
Zero-extension would give `0x0000_0000_0000_0FFE = 4094`, which is wrong because it discards the sign: the upper bits must copy the sign bit, not be filled with 0. In code, `sign_extend(0xFFE, 11)` returns `-2` because `((int64_t)((uint64_t)0xFFE << 52)) >> 52` arithmetic-shifts the sign bit back down.

Problem 3: Reconstruct a Branch Immediate

A B-type instruction has these extracted fields: imm12 = 1, imm11 = 1, imm10_5 = 0b111111, imm4_1 = 0b1100. Reconstruct the full signed offset.

Solution: Combine the parts (bit 0 is implicitly 0):
uimm = (1 << 12) | (1 << 11) | (0b111111 << 5) | (0b1100 << 1);
bit:  12 11 10 9 8 7 6 5 4 3 2 1 0
       1  1  1 1 1 1 1 1 1 1 0 0 0
That is `0b1_1111_1111_1000`. The sign bit (bit 12) is 1, so the value is negative. As a 13-bit two's complement number:
   1 1111 1111 1000
~  0 0000 0000 0111
+                 1
=  0 0000 0000 1000 = 8   ->   value = -8
So `sign_extend(uimm, 12)` gives `-8`. A negative offset means this is a **backward** branch (e.g., the top of a loop): if taken, `pc += -8` moves the PC back two instructions.

Problem 4: Branch PC Update

The PC is 0x1000. A beq rs1, rs2, L has reconstructed imm = 12, and regs[rs1] == regs[rs2]. What is the new PC? What if the registers were not equal?

Solution: `beq` (`funct3 == 0x0`) takes the branch when the registers are equal.

- Registers equal → branch taken → `pc = 0x1000 + 12 = 0x100C`.
- Registers not equal → fall through → `pc = 0x1000 + 4 = 0x1004`.

The offset is PC-relative (added to the current PC), not an absolute address. Note that `0x100C - 0x1000 = 12` bytes, i.e. 3 instructions forward.

Problem 5: JAL Semantics for call and j

Explain what jal ra, offset and jal zero, offset each do, given the PC is 0x2000 and offset = 0x40. Why does writing the link to zero make jal zero behave like a plain jump?

Solution: `jal rd, offset` does two things: `regs[rd] = pc + 4` (link), then `pc += offset` (jump).

**`jal ra, offset`** (the `call` pseudo-instruction):

- `ra (x1) = 0x2000 + 4 = 0x2004`, the return address (instruction after the call).
- `pc = 0x2000 + 0x40 = 0x2040`, a jump into the function.
- Later, `ret` (`jalr zero, 0(ra)`) sets `pc = ra = 0x2004`, returning after the call.

**`jal zero, offset`** (the `j` pseudo-instruction):

- `zero (x0) = 0x2004`, but `x0` is hardwired to 0, so the write is a no-op (the link is discarded).
- `pc = 0x2000 + 0x40 = 0x2040`, the jump.

Because the link goes to `x0` and is thrown away, `jal zero` is "just jump" with no return address saved, exactly what an unconditional `j` needs. In the emulator we guard with `if (rd != 0)` so we never overwrite `x0`.

Problem 6: Load and Store as Pointer Dereferences

Translate lw t0, 8(a0) and sw t0, 8(a0) into C pointer expressions, and write the emulator line for each (a0 is rs1, t0 is rd/rs2).

Solution: **Load** `lw t0, 8(a0)` reads 4 bytes from `(a0 + 8)` into `t0`:
t0 = *((uint32_t *)(a0 + 8));
// emulator:
uint64_t ta = rsp->regs[rs1] + 8;          // a0 + offset
rsp->regs[rd] = *((uint32_t *)ta);         // lw uses uint32_t *
**Store** `sw t0, 8(a0)` writes the low 4 bytes of `t0` to `(a0 + 8)`:
*((uint32_t *)(a0 + 8)) = t0;
// emulator:
uint64_t ta = rsp->regs[rs1] + 8;          // a0 + offset
*((uint32_t *)ta) = (uint32_t)rsp->regs[rs2];
The pointer **cast** chooses the access width: `uint32_t *` reads/writes 4 bytes (`lw`/`sw`), `uint64_t *` reads/writes 8 (`ld`/`sd`), `uint8_t *` reads/writes 1 (`lb`/`sb`). The dereference is on the right for a load (read) and on the left for a store (write). Note `lw` is I-type while `sw` is S-type, so the two reconstruct their offsets differently even though the C model is symmetric.


Summary

  1. li is addi. The assembler rewrites li t0, 99 as addi t0, zero, 99; the constant 99 rides inside the instruction word as an immediate, and t0 = x0 + 99 = 99.

  2. I-type decode is shift-and-mask. Slice the 12-bit immediate with get_bits(iw, 20, 12), plus get_rd, get_rs1, get_funct3 for the other fields. We verified 0x06300293 = addi t0, zero, 99.

  3. Immediates are signed and must be sign-extended. Use sign_extend(value, sign_bit), which shifts the sign bit to bit 63 and arithmetic-shifts it back. This only works on a signed type (int64_t), where >> fills with the sign bit.

  4. Branch immediates are scattered and PC-relative. Reconstruct them in three steps — get parts, combine with shifts and OR, sign-extend (sign bit 12) — and remember the implicit low-order zero from instruction alignment.

  5. Branch logic is pc += imm if taken, else pc += 4. funct3 selects the comparison (beq, bne, blt, bge, bltu, bgeu); the condition reduces to a single boolean assignment.

  6. JAL links and jumps. jal rd, imm writes pc + 4 into rd (skipped when rd == zero) then adds the offset. call is jal ra; j is jal zero; ret returns to ra via jalr.

  7. Loads and stores are typed pointer dereferences. A load is rd = *(width *)(rs1 + offset) (I-type); a store is *(width *)(rs1 + offset) = rs2 (S-type). The pointer cast (uint8_t *, uint32_t *, uint64_t *) selects the access width for lb/lw/ld and sb/sw/sd, and for loads its signedness decides between sign- and zero-extension.

  8. Every instruction follows the same fetch-decode-dispatch-execute loop. Only the field extraction and execute body differ by type — which is also where Project 4 hangs its dynamic-analysis counters.