RISC-V Machine Code and Emulation¶

Overview¶

This lecture connects RISC-V assembly to the binary machine code the processor actually executes, and introduces the design of a software emulator that runs that machine code. We examine the 32-bit instruction word, decode the R-type format by hand and in C using shifts and masks, and lay out the emulator's processor state (registers, program counter, and stack). By the end you should understand the fetch-decode-execute cycle well enough to begin building the Lab 6 emulator.

Learning Objectives¶

Explain the relationship between C, assembly, and machine code
Read a 32-bit RISC-V instruction word from memory in C using pointers and casts
Decode the six instruction fields of an R-type instruction (opcode, rd, funct3, rs1, rs2, funct7)
Use shift-and-mask bitwise operations to extract bit fields from an instruction word
Describe the emulator's processor state: 32 registers, the PC, and the emulated stack
Distinguish interpretation from emulation
Trace the fetch-decode-execute cycle and the role of jalr/ret in stopping the emulator
Plan an incremental strategy for extending the emulator to new instructions

Prerequisites¶

RISC-V assembly: registers, instructions, control flow, and the calling convention (Project 2 and the RISC-V guide)
Bitwise operators in C: &, |, ^, ~, <<, >>, and masking (Project 3)
Binary, hexadecimal, and two's complement number systems
C pointers, casts, struct, and fixed-width integer types (uint32_t, uint64_t)

1. From C to Assembly to Machine Code¶

A processor does not run C and it does not run assembly text. It runs machine code: a stream of binary instruction words sitting in memory. The translation chain is:

flowchart LR
    A["C source<br>(.c)"] -->|"gcc compiles"| B["Assembly<br>(.s)"]
    B -->|"as assembles"| C["Machine code<br>(.o, binary)"]
    C -->|"processor fetches"| D["Execution"]

    style C fill:#f9f,stroke:#333,stroke-width:2px

The compiler (gcc) turns C into assembly.
The assembler (as) turns assembly mnemonics into 32-bit binary instruction words.
The processor fetches each word from memory, decodes it, and executes it.

For an emulator, machine code is the input. We write a program that reads those 32-bit words and simulates what the hardware would do with them. The key realization that makes Lab 6 work: the assembly functions in your project are assembled and linked into real machine code that lives at real memory addresses, so you can take the address of a function, read the bytes there, and decode them.

Bitwise Operators Bridge Assembly and Decoding¶

This lecture builds directly on the bitwise operators from Project 3. The same operations exist both as RISC-V instructions and as the C tools we use to take instructions apart:

Operation	C operator	RISC-V (reg / imm)	Use in decoding
AND	`&`	`and` / `andi`	Mask off unwanted upper bits
OR	`|`	`or` / `ori`	Combine split immediate pieces
XOR	`^`	`xor` / `xori`	Toggle/compare bits
Shift left logical	`<<`	`sll` / `slli`	Position a field
Shift right logical	`>>` (unsigned)	`srl` / `srli`	Bring a field to bit 0
Shift right arithmetic	`>>` (signed)	`sra` / `srai`	Sign-extend immediates

So and t0, t1, t2 (meaning t0 = t1 & t2) is an instruction the emulator must execute, and & is also the operator the emulator uses to pull t0, t1, and t2 out of that instruction's encoding.
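
The last two rows of the table hinge on a C detail that is easy to miss: the same `>>` operator behaves differently depending on the operand type. A minimal sketch follows; the helper names are made up for illustration and are not part of any starter code.

```c
#include <stdint.h>

// Logical right shift: zeros enter from the left (like srl / srli).
static uint64_t shift_right_logical(uint64_t value, unsigned amount) {
    return value >> amount;
}

// Arithmetic right shift: copies of the sign bit enter from the left
// (like sra / srai). Right-shifting a negative signed value is
// implementation-defined in C; GCC and Clang shift arithmetically.
static int64_t shift_right_arithmetic(int64_t value, unsigned amount) {
    return value >> amount;
}
```

Shifting the bit pattern 0xFFFFFFFFFFFFFFF0 right by 4 gives 0x0FFFFFFFFFFFFFFF with the logical version, while the arithmetic version applied to the same pattern (the signed value -16) gives -1.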

2. The Processor and the Emulator Side by Side¶

A real RISC-V processor holds its state in hardware and reads instructions from memory. Our emulator holds the same state in software (a C struct) and reads the same machine code from memory.

flowchart LR
    subgraph PROC["RISC-V Processor (hardware)"]
        R1["REGS"]
        PC1["PC  (PC = PC + 4)"]
        EX1["execute(iw)"]
    end

    subgraph EMU["RISC-V Emulator (software)"]
        R2["regs[32]"]
        PC2["pc"]
        ST2["stack[]"]
        EX2["execute(iw)"]
    end

    subgraph MEM["Memory"]
        S["STACK"]
        D["DATA"]
        C["CODE<br>0x00B50533<br>0x00008067"]
    end

    PC1 -.fetch iw.-> C
    PC2 -.fetch iw.-> C

Both the hardware and the emulator do the same loop: read the instruction word (iw) that PC points to, execute it, and update PC (usually PC = PC + 4 to step to the next 4-byte instruction). The only difference is that the emulator's registers and PC are just variables in a struct, and "executing" an instruction is a switch/if in C that updates those variables.

The two machine-code values from the handwritten notes are real:

0x00B50533 encodes add a0, a0, a1
0x00008067 encodes ret (which is jalr x0, 0(ra))

3. Processor State¶

To emulate a processor we must represent everything the processor "remembers" between instructions. RISC-V's RV64 state is small:

Registers — 32 general-purpose registers, each a 64-bit value (uint64_t). RV64 registers are 64 bits even though instructions are 32 bits.
PC (program counter) — a 64-bit value (uint64_t) holding the address of the next instruction to execute.
Memory — divided into regions. We care about:
- STACK — local variables and saved registers; grows downward.
- CODE — the machine code being executed.
- DATA — globals and other static data.

        Memory (high address at top)
   +------------------+
   |      STACK       |  <- grows downward, sp points here
   |        |         |
   |        v         |
   |                  |
   |       DATA       |  <- globals
   |                  |
   |       CODE       |  <- machine code, pc points here
   |  0x00B50533 ...  |
   +------------------+
        (low address)

Why 32 Registers Need 5 Bits¶

A register field in an instruction must be able to name any of the 32 registers (x0–x31). It takes 5 bits to do that, because 2^5 = 32. This is exactly why each of the rd, rs1, and rs2 fields is 5 bits wide. A student question in lecture was "why 5 bits?" — the answer is: 5 bits index 32 registers, no more and no less.

The Emulator `struct`¶

We capture the entire processor state in one C struct (provided in the Lab 6 starter code):

#include <stdint.h>

#define NREGS       32      // RISC-V has 32 general-purpose registers
#define STACK_SIZE  8192    // 8 KB emulated stack (grow if you need more)

struct rv_state {
    uint64_t regs[NREGS];        // x0..x31, each 64 bits
    uint64_t pc;                 // program counter
    uint8_t  stack[STACK_SIZE];  // emulated stack memory (byte addressable)
};

regs[NREGS] holds the 32 registers. regs[0] corresponds to x0 (always 0).
pc is the program counter — the address of the next instruction.
stack[STACK_SIZE] is a byte array that is the emulated stack. The stack pointer (sp, which is regs[2]) will point into this array. 8 KB is enough for the Lab 6 programs, but recursion-heavy or local-variable-heavy programs may need a larger value.
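
When printing emulator state for debugging, it helps to translate a register number such as 2 or 10 back into its ABI name. The lookup below is a small sketch; `rv_abi_names` and `rv_reg_name` are hypothetical names, not something the Lab 6 starter is known to provide.

```c
#include <stdint.h>

// ABI names for x0..x31, indexed by register number.
static const char *const rv_abi_names[32] = {
    "zero", "ra", "sp", "gp", "tp", "t0", "t1", "t2",
    "s0", "s1", "a0", "a1", "a2", "a3", "a4", "a5",
    "a6", "a7", "s2", "s3", "s4", "s5", "s6", "s7",
    "s8", "s9", "s10", "s11", "t3", "t4", "t5", "t6",
};

// Return the ABI name for a register number, or "?" if out of range.
static const char *rv_reg_name(uint32_t reg) {
    return reg < 32 ? rv_abi_names[reg] : "?";
}
```

With this table, `rv_reg_name(2)` is `"sp"` and `rv_reg_name(10)` is `"a0"`, matching the indices used for `regs[2]` and `regs[10]` in these notes.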

4. The 32-bit Instruction Word¶

Every RISC-V instruction (in the base ISA) is exactly 32 bits = 4 bytes = 1 word. The bits are numbered from bit 31 (most significant) down to bit 0 (least significant). RISC-V defines six instruction formats that slice these 32 bits into different fields; the format is identified by the 7-bit opcode in bits [6:0], which is always in the same place.

R-type:  | funct7  | rs2 | rs1 | funct3 | rd       | opcode |   register ops
I-type:  |    imm[11:0]  | rs1 | funct3 | rd       | opcode |   addi, loads, jalr
S-type:  | imm[11:5]|rs2 | rs1 | funct3 | imm[4:0] | opcode |   stores
B-type:  |  imm     |rs2 | rs1 | funct3 | imm      | opcode |   branches
U-type:  |        imm[31:12]            | rd       | opcode |   lui, auipc
J-type:  |        imm (scattered)       | rd       | opcode |   jal, j

Design principle: the opcode ([6:0]) and the register fields (rd, rs1, rs2) sit in the same bit positions across every format. This keeps the hardware decoder simple, and it keeps our C decoder simple too: we can always grab the opcode the same way, then grab rd/rs1/rs2 the same way once we know the format uses them.

We decode in steps:

Look at the opcode to determine the instruction format.
Decode the rest of the word based on that format.
Use funct3 (and sometimes funct7) to pick the exact operation within the format.
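
The first of these steps can be sketched as a small classifier. The enum and the function name `rv_format_of` are assumptions made for this illustration; the opcode values are the standard base-ISA ones.

```c
#include <stdint.h>

enum rv_format { FMT_R, FMT_I, FMT_S, FMT_B, FMT_U, FMT_J, FMT_UNKNOWN };

// Map the 7-bit opcode (bits [6:0]) of an instruction word to its format.
static enum rv_format rv_format_of(uint32_t iw) {
    switch (iw & 0x7F) {
        case 0x33: return FMT_R;   // 0110011: add, sub, and, ...
        case 0x13: return FMT_I;   // 0010011: addi and other immediate arithmetic
        case 0x03: return FMT_I;   // 0000011: loads
        case 0x67: return FMT_I;   // 1100111: jalr
        case 0x23: return FMT_S;   // 0100011: stores
        case 0x63: return FMT_B;   // 1100011: branches
        case 0x37: return FMT_U;   // 0110111: lui
        case 0x17: return FMT_U;   // 0010111: auipc
        case 0x6F: return FMT_J;   // 1101111: jal
        default:   return FMT_UNKNOWN;
    }
}
```

Several opcodes share the I-type layout, which is why the emulator dispatches on the opcode rather than on the format alone.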

5. Worked Example: Decoding `add a0, a0, a1`¶

The worked example from the lecture is add a0, a0, a1, which in RISC-V means:

add a0, a0, a1      # a0 = a0 + a1   (rd = rs1 + rs2)

The three operands map to instruction fields:

        add   a0  ,  a0  ,  a1
              rd     rs1    rs2

The assembler produces the instruction word:

iw = 0x00B50533

Step 1 — Convert hex to binary, MSB on the left¶

bit:  31                                                 0
      0000 0000 1011 0101 0000 0101 0011 0011
      0    0    B    5    0    5    3    3

Step 2 — Slice into the R-type fields¶

 funct7    rs2    rs1   funct3   rd     opcode
0000000  01011  01010   000    01010   0110011
[31:25]  [24:20][19:15] [14:12][11:7]   [6:0]

Step 3 — Interpret each field¶

Field	Bits	Binary	Decimal	Meaning
opcode	[6:0]	`0110011`	51 (0x33)	R-type format
rd	[11:7]	`01010`	10	`a0` (x10) — destination
funct3	[14:12]	`000`	0	selects ADD/SUB family
rs1	[19:15]	`01010`	10	`a0` (x10) — first source
rs2	[24:20]	`01011`	11	`a1` (x11) — second source
funct7	[31:25]	`0000000`	0	ADD (not SUB)

Reading it back: opcode says R-type; funct3 = 000 with funct7 = 0000000 means ADD; rd = a0, rs1 = a0, rs2 = a1. So the word 0x00B50533 is exactly add a0, a0, a1. The decimal 10 and 11 come straight from the ABI register table: a0 = x10, a1 = x11.

graph LR
    IW["0x00B50533"] --> F7["funct7=0000000<br>ADD"]
    IW --> R2["rs2=01011<br>a1 (11)"]
    IW --> R1["rs1=01010<br>a0 (10)"]
    IW --> F3["funct3=000<br>ADD/SUB family"]
    IW --> RD["rd=01010<br>a0 (10)"]
    IW --> OP["opcode=0110011<br>R-type"]
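
A useful way to check a hand decode is to run it in reverse: pack the fields back together and confirm the original word comes out. The encoder below is a sketch written for these notes (the name `encode_r_type` is hypothetical); it does what the assembler does for R-type instructions.

```c
#include <stdint.h>

// Pack the six R-type fields into one 32-bit instruction word.
// Each field is masked to its width before being shifted into place.
static uint32_t encode_r_type(uint32_t funct7, uint32_t rs2, uint32_t rs1,
                              uint32_t funct3, uint32_t rd, uint32_t opcode) {
    return ((funct7 & 0x7F) << 25) |
           ((rs2    & 0x1F) << 20) |
           ((rs1    & 0x1F) << 15) |
           ((funct3 & 0x07) << 12) |
           ((rd     & 0x1F) << 7)  |
            (opcode & 0x7F);
}
```

Calling `encode_r_type(0, 11, 10, 0, 10, 0x33)` reproduces 0x00B50533, the word for add a0, a0, a1.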

R-Type Operation Table¶

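For the R-type instructions in this table the opcode is 0110011 (RV64's 32-bit-result variants such as addw share the R-type layout but use opcode 0111011). The exact operation is chosen by funct3 and funct7: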

Instruction	funct7	funct3	Operation
`add`	`0000000`	`000`	`rd = rs1 + rs2`
`sub`	`0100000`	`000`	`rd = rs1 - rs2`
`sll`	`0000000`	`001`	`rd = rs1 << rs2`
`srl`	`0000000`	`101`	`rd = rs1 >> rs2` (logical)
`sra`	`0100000`	`101`	`rd = rs1 >> rs2` (arithmetic)
`or`	`0000000`	`110`	`rd = rs1 | rs2`
`and`	`0000000`	`111`	`rd = rs1 & rs2`
`xor`	`0000000`	`100`	`rd = rs1 ^ rs2`
`mul`	`0000001`	`000`	`rd = rs1 * rs2` (M extension)

Note how add, sub, and mul all share funct3 = 000; only funct7 distinguishes them. That is why decoding R-type sometimes needs both fields.
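
One way to turn the table into code is to combine funct7 and funct3 into a single key and switch on it. This is only a sketch under that assumption: `r_type_alu` is a hypothetical helper, not the starter code's structure. It also shows a detail the table glosses over, namely that RV64 shifts use only the low 6 bits of rs2 as the shift amount.

```c
#include <stdbool.h>
#include <stdint.h>

// Compute the result of an R-type operation from the table above.
// Returns true and stores the value in *out when (funct3, funct7) is
// recognized; returns false otherwise.
static bool r_type_alu(uint32_t funct3, uint32_t funct7,
                       uint64_t a, uint64_t b, uint64_t *out) {
    uint32_t shamt = (uint32_t) (b & 0x3F);  // RV64: low 6 bits of rs2

    switch ((funct7 << 3) | funct3) {        // combine both selectors
        case (0x00 << 3) | 0x0: *out = a + b;      return true;  // add
        case (0x20 << 3) | 0x0: *out = a - b;      return true;  // sub
        case (0x01 << 3) | 0x0: *out = a * b;      return true;  // mul (M extension)
        case (0x00 << 3) | 0x1: *out = a << shamt; return true;  // sll
        case (0x00 << 3) | 0x4: *out = a ^ b;      return true;  // xor
        case (0x00 << 3) | 0x5: *out = a >> shamt; return true;  // srl
        case (0x20 << 3) | 0x5:                                  // sra
            // Signed right shift is arithmetic on GCC and Clang
            // (implementation-defined in standard C).
            *out = (uint64_t) ((int64_t) a >> shamt);
            return true;
        case (0x00 << 3) | 0x6: *out = a | b;      return true;  // or
        case (0x00 << 3) | 0x7: *out = a & b;      return true;  // and
        default:                                   return false;
    }
}
```

Keeping the arithmetic in a pure function like this makes it easy to test each operation without building a full processor state first.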

6. Reading the Instruction Word in C¶

Although RV64 data values are 64 bits, each instruction is 32 bits. To fetch an instruction we treat pc as a pointer to a 32-bit value and dereference it. Since the assembly functions in the project are real compiled machine code, we can point at them directly.

#include <stdint.h>
#include <stdio.h>

// add2_s is an assembly function linked into our program.
// Its address is the address of its first machine-code instruction.
extern uint64_t add2_s(uint64_t a0, uint64_t a1);

void decode_first_instructions(void) {
    // Take the address of the function and view it as 32-bit words.
    uint32_t *pc = (uint32_t *) add2_s;

    uint32_t iw = *pc;                 // fetch the first instruction word
    printf("pc = %p  iw = 0x%08X\n", (void *) pc, iw);

    pc = pc + 1;                       // advance one 32-bit word = 4 bytes
    iw = *pc;                          // fetch the second instruction word
    printf("pc = %p  iw = 0x%08X\n", (void *) pc, iw);
}

Key points:

(uint32_t *) add2_s casts the function pointer to a pointer-to-32-bit-word so that *pc reads exactly one instruction.
*pc dereferences pc to read the 32-bit instruction word from memory.
pc + 1 advances by one uint32_t, which is 4 bytes in pointer arithmetic — exactly the size of one instruction. (If pc were a uint8_t *, you would write pc + 4.)
Printing with %08X shows the word as 8 hex digits so you can compare it against your hand-decoded value.

In the emulator itself, the equivalent fetch using the pc field of rv_state is:

uint32_t iw = *((uint32_t *) rsp->pc);   // fetch the instruction at pc
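
The cast-and-dereference fetch is what these notes use. A slightly more defensive variant copies the four bytes with memcpy, which avoids relying on the pointer being suitably aligned or on the compiler's aliasing behavior. This is a sketch of that alternative; `fetch_iw` is a hypothetical name.

```c
#include <stdint.h>
#include <string.h>

// Read one 32-bit instruction word from the address held in pc.
// memcpy sidesteps alignment and strict-aliasing assumptions; the
// bytes are interpreted in the host's byte order, which matches the
// code's byte order when the emulator runs on the same machine.
static uint32_t fetch_iw(uint64_t pc) {
    uint32_t iw;
    memcpy(&iw, (const void *) (uintptr_t) pc, sizeof iw);
    return iw;
}
```

Either form yields the same word for properly aligned code; the memcpy form simply makes fewer assumptions.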

7. Extracting Fields with Shift and Mask¶

Decoding is pure bit manipulation: shift the field down to bit 0, then mask off everything above it. The general recipe to extract a field of n bits starting at bit position start:

field = (iw >> start) & ((1 << n) - 1)

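(1 << n) - 1 builds a mask of n ones. For example (1 << 3) - 1 = 0b111. In C, write the 1 as an unsigned constant (1u) and keep n below 32, because shifting a 32-bit value by 32 or more is undefined.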

Masking, Illustrated¶

Suppose we want the 7-bit opcode, bits [6:0]. We do not even need to shift — we just mask off everything above bit 6:

iw    = ...1 1 . 0 1  0110011      (top bits unknown, low 7 shown)
mask  = 0 0 . . . . . 0 1111111    (& with 0x7F = 0b1111111)
-------------------------------------------------
result= 0 0 . . . . . 0 0110011    (only bits [6:0] survive)

Anywhere the mask has a 0, the result is 0 (AND with 0 is 0); anywhere the mask has a 1, the original bit passes through (AND with 1 is the bit). So masking with 0x7F keeps only the low 7 bits — exactly the opcode.

A `get_bits` Helper¶

Doing shift-and-mask by hand for six fields invites off-by-one errors, so the starter code provides a helper. Using one function for every field keeps the logic consistent and easy to debug:

#include <stdint.h>

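// Extract `count` bits from `iw`, starting at bit position `start`.
// Expects start + count <= 32.
static inline uint32_t get_bits(uint32_t iw, uint32_t start, uint32_t count) {
    // Shifting a 32-bit value by 32 is undefined in C, so treat a
    // full-width request as "keep every bit".
    uint32_t mask = (count >= 32) ? 0xFFFFFFFFu : ((1u << count) - 1u);
    return (iw >> start) & mask;
}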

Field Extraction Functions¶

With get_bits, every R-type field becomes a one-liner:

static uint32_t get_opcode(uint32_t iw) { return get_bits(iw, 0, 7);  }
static uint32_t get_rd(uint32_t iw)     { return get_bits(iw, 7, 5);  }
static uint32_t get_funct3(uint32_t iw) { return get_bits(iw, 12, 3); }
static uint32_t get_rs1(uint32_t iw)    { return get_bits(iw, 15, 5); }
static uint32_t get_rs2(uint32_t iw)    { return get_bits(iw, 20, 5); }
static uint32_t get_funct7(uint32_t iw) { return get_bits(iw, 25, 7); }

Inline Equivalents¶

The same extractions written directly (handy to recognize when reading code):

uint32_t iw = 0x00B50533;

uint32_t opcode = iw         & 0x7F;   // bits [6:0]   = 0b1111111
uint32_t rd     = (iw >> 7)  & 0x1F;   // bits [11:7]  = 0b11111
uint32_t funct3 = (iw >> 12) & 0x7;    // bits [14:12] = 0b111
uint32_t rs1    = (iw >> 15) & 0x1F;   // bits [19:15] = 0b11111
uint32_t rs2    = (iw >> 20) & 0x1F;   // bits [24:20] = 0b11111
uint32_t funct7 = (iw >> 25) & 0x7F;   // bits [31:25] = 0b1111111

Running this on 0x00B50533 yields opcode=0x33, rd=10, funct3=0, rs1=10, rs2=11, funct7=0 — matching our hand decode. Printing intermediate values in hex while developing makes mistakes obvious.

8. Interpretation vs. Emulation¶

A natural question is whether we are building an interpreter or an emulator. Both terms apply, but with a useful distinction:

Interpretation generally means reading a high-level or intermediate representation (source code, bytecode, an AST) and carrying out its meaning, often without modeling a specific machine.
Emulation means faithfully reproducing the behavior of a specific target machine — its registers, memory model, and instruction semantics — so that programs written for that machine run as if on the real hardware.

Our project reads actual RISC-V machine code and reproduces the behavior of a RISC-V processor: registers, PC, stack, and per-instruction semantics. Even though it works by interpreting one instruction at a time, it is best described as an emulator because it models the target machine. In the lecture this is exactly the conclusion Greg reached: the program interprets instructions, but it is an emulator of a RISC-V CPU.

	Interpreter	Emulator
Input	Source / bytecode / AST	Target machine code
Models a specific CPU?	Not necessarily	Yes (registers, PC, memory)
Our Lab 6 project	reads & runs instructions	models a RISC-V CPU ✓

9. The Fetch-Decode-Execute Cycle¶

Every CPU, real or emulated, runs the same cycle forever: fetch, decode, execute, then update the PC.

flowchart LR
    A["Fetch<br>iw = *pc"] --> B["Decode<br>opcode, rd, funct3, ..."]
    B --> C["Execute<br>update regs / memory"]
    C --> D["Update pc<br>(pc += 4 or branch/jump)"]
    D --> A

In the emulator this becomes a loop over the rv_state:

uint64_t rv_emulate(struct rv_state *rsp) {
    while (rsp->pc != 0) {              // 0 (null) PC means "stop"
        rv_one(rsp);                   // do one fetch-decode-execute
        rsp->regs[0] = 0;              // x0 is hardwired to 0
    }
    return rsp->regs[10];              // return value is in a0 (x10)
}

A single instruction step dispatches on the opcode:

#include <stdio.h>    // printf
#include <stdlib.h>   // exit

void rv_one(struct rv_state *rsp) {
    uint32_t iw = *((uint32_t *) rsp->pc);   // FETCH
    uint32_t opcode = get_opcode(iw);        // DECODE (format)

    switch (opcode) {                        // EXECUTE (dispatch)
        case 0b0110011:  emu_r_type(rsp, iw);    break;  // R-type
        case 0b0010011:  emu_i_type(rsp, iw);    break;  // I-type arith
        case 0b1100111:  emu_jalr(rsp, iw);      break;  // jalr / ret
        // ... add more formats as you extend the emulator ...
        default:
            printf("Unknown opcode: 0x%X\n", opcode);
            exit(1);
    }
}

Two important details:

After each instruction we force regs[0] = 0. Register x0 is hardwired to zero on real hardware; if any instruction writes to it, the write must be discarded.
The format handler is responsible for updating pc. Most handlers do pc += 4; control instructions compute a new pc.

Executing an R-Type Instruction¶

void emu_r_type(struct rv_state *rsp, uint32_t iw) {
    uint32_t rd     = get_rd(iw);
    uint32_t rs1    = get_rs1(iw);
    uint32_t rs2    = get_rs2(iw);
    uint32_t funct3 = get_funct3(iw);
    uint32_t funct7 = get_funct7(iw);

    if (funct3 == 0b000 && funct7 == 0b0000000) {
        rsp->regs[rd] = rsp->regs[rs1] + rsp->regs[rs2];        // add
    } else if (funct3 == 0b000 && funct7 == 0b0100000) {
        rsp->regs[rd] = rsp->regs[rs1] - rsp->regs[rs2];        // sub
    } else if (funct3 == 0b000 && funct7 == 0b0000001) {
        rsp->regs[rd] = rsp->regs[rs1] * rsp->regs[rs2];        // mul
    } else if (funct3 == 0b111 && funct7 == 0b0000000) {
        rsp->regs[rd] = rsp->regs[rs1] & rsp->regs[rs2];        // and
    } else {
        printf("Unsupported R-type: funct3=%u funct7=%u\n", funct3, funct7);
        exit(1);
    }

    rsp->pc += 4;   // advance to next instruction
}

Because the registers are uint64_t, C arithmetic wraps on overflow exactly the way RISC-V hardware does, so no special handling is required for wrapping.
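
The wrapping claim can be checked directly. The helpers below are illustrative only (the names are made up); they show that unsigned 64-bit arithmetic wraps modulo 2^64 and that the same bits can be read back as a signed two's complement value when needed.

```c
#include <stdint.h>

// Unsigned 64-bit addition and subtraction wrap modulo 2^64, which is
// well-defined in C and matches what RV64 add and sub produce.
static uint64_t emu_add(uint64_t a, uint64_t b) { return a + b; }
static uint64_t emu_sub(uint64_t a, uint64_t b) { return a - b; }

// View a register value as a signed two's complement number. The
// conversion is implementation-defined for large values in standard
// C, but GCC and Clang simply reinterpret the bits.
static int64_t as_signed(uint64_t value) { return (int64_t) value; }
```

For example, `emu_sub(5, 7)` produces the bit pattern 0xFFFFFFFFFFFFFFFE, and `as_signed` of that pattern is -2, so one unsigned register array serves both signed and unsigned programs.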

10. Initializing the Emulator and Stopping It¶

Before emulating a function we set up the initial state with an init function, then run the loop, then read the result.

void rv_init(struct rv_state *rsp, uint64_t (*func)(),
             uint64_t a0, uint64_t a1, uint64_t a2, uint64_t a3) {
    // Zero out all state first.
    for (int i = 0; i < NREGS; i++)
        rsp->regs[i] = 0;

    // pc points at the first instruction of the target function.
    rsp->pc = (uint64_t) func;

    // Arguments go in a0..a3 (x10..x13).
    rsp->regs[10] = a0;
    rsp->regs[11] = a1;
    rsp->regs[12] = a2;
    rsp->regs[13] = a3;

    // sp (x2) points to the TOP of the emulated stack (it grows down).
    rsp->regs[2] = (uint64_t) &rsp->stack[STACK_SIZE];

    // ra (x1) = 0 acts as a halt sentinel (see below).
    rsp->regs[1] = 0;
}

// Usage:
struct rv_state state;
rv_init(&state, (uint64_t (*)()) quadratic_s, 2, 4, 6, 8);
uint64_t result = rv_emulate(&state);
printf("Emu: %" PRIu64 "\n", result);   // PRIu64 (from <inttypes.h>) is the portable format for uint64_t

The Stack Pointer Points to the Top¶

The diagram from the lecture shows sp pointing near the high end of the stack[] array, because the stack grows downward toward lower addresses. We initialize sp = &stack[STACK_SIZE] (one past the last byte). As the program pushes data (addi sp, sp, -16), sp moves down through the array; as it pops (addi sp, sp, 16), sp moves back up.

   rv_state                  emulated stack[]
  +---------+               +------------------+  high addr
  | regs[32]|        sp --> | stack[STACK_SIZE]|  <- sp starts here
  |   pc    |               |       ...        |
  | stack[] |               |    (grows down)  |
  +---------+               |       ...        |
                            | stack[0]         |  low addr
                            +------------------+
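
The push and pop movement in the diagram can be modeled in a few lines. This is a sketch, assuming a stand-alone `struct mini_stack` invented for the example rather than the real `rv_state`; the helpers mimic what a function prologue and epilogue do to sp.

```c
#include <stdint.h>
#include <string.h>

#define STACK_SIZE 8192

struct mini_stack {
    uint8_t  bytes[STACK_SIZE];
    uint64_t sp;              // emulated stack pointer (a host address)
};

// Point sp one past the last byte: the "empty stack" position.
static void stack_init(struct mini_stack *s) {
    s->sp = (uint64_t) (uintptr_t) &s->bytes[STACK_SIZE];
}

// Equivalent of: addi sp, sp, -16 ; sd value, 0(sp)
static void stack_push16(struct mini_stack *s, uint64_t value) {
    s->sp -= 16;
    memcpy((void *) (uintptr_t) s->sp, &value, sizeof value);
}

// Equivalent of: ld value, 0(sp) ; addi sp, sp, 16
static uint64_t stack_pop16(struct mini_stack *s) {
    uint64_t value;
    memcpy(&value, (const void *) (uintptr_t) s->sp, sizeof value);
    s->sp += 16;
    return value;
}

// Bytes currently in use, measured down from the top of the array.
static uint64_t stack_used(const struct mini_stack *s) {
    return (uint64_t) (uintptr_t) &s->bytes[STACK_SIZE] - s->sp;
}
```

After `stack_init`, `stack_used` is 0; one push makes it 16 and leaves sp at `&bytes[STACK_SIZE - 16]`; the matching pop returns it to 0. A check that `stack_used` never exceeds STACK_SIZE is a cheap way to catch emulated stack overflow.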

How Emulation Stops: `jalr` and `ret`¶

The emulator runs while (pc != 0). We need the function's final ret to make pc become 0. Here is the chain:

ret is a pseudo-instruction for jalr x0, 0(ra) — "jump to the address in ra."
jalr rd, offset(rs1) sets pc = regs[rs1] + offset, with the lowest bit of the result cleared, and saves the return address (the old pc + 4) in rd; when rd is x0, as in ret, nothing is saved.
We initialized ra = 0. So when the top-level function returns, ret computes pc = regs[ra] + 0 = 0.
The loop condition pc != 0 becomes false and emulation stops. The result is sitting in a0 (regs[10]).

void emu_jalr(struct rv_state *rsp, uint32_t iw) {
    uint32_t rs1 = get_rs1(iw);
    // ret = jalr x0, 0(ra): rd is x0 (no link), offset is 0.
    rsp->pc = rsp->regs[rs1];   // pc = ra; for the top call ra == 0 -> stop
}
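
The handler above covers only ret. Once a program uses jalr to call through a register, the link and offset matter too. The following is a sketch of a fuller handler under stated assumptions: `rv_state_sketch` and `emu_jalr_full` are names invented here, and the field extraction is written inline so the block stands on its own.

```c
#include <stdint.h>

#define NREGS 32

struct rv_state_sketch {
    uint64_t regs[NREGS];
    uint64_t pc;
};

// Handle jalr rd, offset(rs1), including the link and offset cases
// that a ret-only handler can ignore.
static void emu_jalr_full(struct rv_state_sketch *rsp, uint32_t iw) {
    uint32_t rd  = (iw >> 7)  & 0x1F;
    uint32_t rs1 = (iw >> 15) & 0x1F;

    // 12-bit immediate in bits [31:20], sign-extended. Casting iw to
    // int32_t and shifting right relies on arithmetic shift for signed
    // values (implementation-defined in C; true for GCC and Clang).
    int64_t offset = (int64_t) ((int32_t) iw >> 20);

    // Compute the target before writing rd, in case rd == rs1.
    uint64_t target = (rsp->regs[rs1] + (uint64_t) offset) & ~(uint64_t) 1;

    if (rd != 0)
        rsp->regs[rd] = rsp->pc + 4;    // link: address of the next instruction

    rsp->pc = target;
}
```

For the word 0x00008067 (ret) with ra = 0 this still sets pc to 0, so the halt sentinel keeps working unchanged.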

flowchart TD
    A["rv_init: ra = 0"] --> B["emulate instructions"]
    B --> C{"pc != 0 ?"}
    C -->|yes| B
    C -->|no| D["return regs[a0]"]
    E["ret = jalr x0, 0(ra)"] -->|"pc = ra = 0"| C
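
To see the whole stop mechanism in action without any assembly files, the two instruction words from earlier in these notes can be placed in a C array and emulated directly. This is a deliberately tiny sketch: the `mini_` names are invented for the example, and it understands only add and ret.

```c
#include <stdint.h>

#define NREGS 32

struct mini_state {
    uint64_t regs[NREGS];
    uint64_t pc;
};

static uint32_t bits(uint32_t iw, uint32_t start, uint32_t count) {
    return (iw >> start) & ((1u << count) - 1u);
}

// Execute one instruction; returns 0 on success, -1 if unsupported.
static int mini_step(struct mini_state *st) {
    uint32_t iw = *(const uint32_t *) (uintptr_t) st->pc;      // fetch
    uint32_t opcode = bits(iw, 0, 7);                           // decode
    uint32_t rd  = bits(iw, 7, 5);
    uint32_t rs1 = bits(iw, 15, 5);
    uint32_t rs2 = bits(iw, 20, 5);

    if (opcode == 0x33 && bits(iw, 12, 3) == 0 && bits(iw, 25, 7) == 0) {
        st->regs[rd] = st->regs[rs1] + st->regs[rs2];           // add
        st->pc += 4;
    } else if (iw == 0x00008067u) {
        st->pc = st->regs[1];                                   // ret: pc = ra
    } else {
        return -1;
    }
    st->regs[0] = 0;                                            // x0 stays 0
    return 0;
}

// Emulate the two-word program "add a0, a0, a1; ret" on a and b.
static uint64_t mini_run_add(uint64_t a, uint64_t b) {
    static const uint32_t code[] = { 0x00B50533u, 0x00008067u };
    struct mini_state st = { .regs = {0}, .pc = (uint64_t) (uintptr_t) code };

    st.regs[10] = a;   // a0
    st.regs[11] = b;   // a1
    st.regs[1]  = 0;   // ra = 0: halt sentinel

    while (st.pc != 0) {
        if (mini_step(&st) != 0)
            break;     // unsupported instruction: stop rather than loop
    }
    return st.regs[10];
}
```

`mini_run_add(5, 7)` executes the add, then the ret loads pc from ra (which is 0), the loop condition fails, and the function returns 12 from a0.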

11. Extending the Emulator (Lab 6)¶

The Lab 6 starter gives you R-type decoding and the framework that compares your emulator's output against the C and assembly versions of each test program. Your job is to add enough instructions to run quadratic_s, midpoint_s, max3_s, and get_bitseq_s. The expected output looks like:

$ ./lab06 quadratic 2 4 6 8
C: 36
Asm: 36
Emu: 36

$ ./lab06 max3 3 8 5
C: 8
Asm: 8
Emu: 8

All three lines must agree. The instructions you will likely add include mv, sub, li, jal (and the j pseudo-instruction built on it; call is formally auipc plus jalr, though the linker usually relaxes a call to a nearby function into a single jal), jalr (ret), and the conditional branches (beq, bne, blt, bge, the bCC family). The best approach is incremental: run a program, see which instruction is reported as unsupported, decode it, implement it, retest.

flowchart TD
    A["Pick a target program<br>(start with quadratic)"] --> B["Run it; see which<br>instruction is unsupported"]
    B --> C["Identify the format<br>from the opcode"]
    C --> D["Decode the fields<br>with get_bits"]
    D --> E["Implement the operation<br>and update pc"]
    E --> F{"Emu == C == Asm ?"}
    F -->|no| B
    F -->|yes| G["Move to next program"]

Beyond R-Type: Other Formats You Will Need¶

These are summarized here for reference; the immediates for I/B/J types must be reassembled from their fields and sign-extended before use (sign extension uses arithmetic right shift, exactly the sra idea from Project 3).

Format	Used by	Field notes
I-type	`addi`, `li`, `lw`, `lb`, `jalr`	12-bit immediate in `[31:20]`, sign-extended
B-type	`beq`, `bne`, `blt`, `bge`	scattered 13-bit immediate; if taken `pc += imm`, else `pc += 4`
J-type	`jal`, `j`, `call`	scattered 21-bit immediate; `pc += imm`

// Sign-extend the low (start+1) bits of value to a full 64-bit signed value.
static int64_t sign_extend(uint64_t value, int start) {
    int shift = 63 - start;
    return ((int64_t) (value << shift)) >> shift;  // arithmetic shift right
}

For example, an I-type addi is decoded as:

void emu_i_type(struct rv_state *rsp, uint32_t iw) {
    uint32_t rd     = get_rd(iw);
    uint32_t rs1    = get_rs1(iw);
    uint32_t funct3 = get_funct3(iw);
    int64_t  imm    = sign_extend(get_bits(iw, 20, 12), 11);

    if (funct3 == 0b000) {                         // addi (also li, mv)
        rsp->regs[rd] = rsp->regs[rs1] + imm;
    } else {
        printf("Unsupported I-type funct3=%u\n", funct3);
        exit(1);
    }
    rsp->pc += 4;
}

Note that li a0, 5 is addi a0, zero, 5 and mv a0, a1 is addi a0, a1, 0, so implementing addi correctly gives you li and mv for free.
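
The "scattered" B-type and J-type immediates from the table can be reassembled with the same shift, mask, and OR tools. The sketch below repeats `get_bits` and `sign_extend` so it stands alone; `get_b_imm` and `get_j_imm` are hypothetical names, and the bit positions follow the standard base-ISA encodings.

```c
#include <stdint.h>

static uint32_t get_bits(uint32_t iw, uint32_t start, uint32_t count) {
    return (iw >> start) & ((1u << count) - 1u);
}

// Sign-extend the low (start+1) bits of value to 64 bits.
static int64_t sign_extend(uint64_t value, int start) {
    int shift = 63 - start;
    return ((int64_t) (value << shift)) >> shift;
}

// B-type: imm[12|10:5] sits in bits [31:25], imm[4:1|11] in bits [11:7].
// Bit 0 of the immediate is always 0 and is not stored.
static int64_t get_b_imm(uint32_t iw) {
    uint64_t imm = 0;
    imm |= (uint64_t) get_bits(iw, 8, 4)  << 1;    // imm[4:1]
    imm |= (uint64_t) get_bits(iw, 25, 6) << 5;    // imm[10:5]
    imm |= (uint64_t) get_bits(iw, 7, 1)  << 11;   // imm[11]
    imm |= (uint64_t) get_bits(iw, 31, 1) << 12;   // imm[12] (sign)
    return sign_extend(imm, 12);
}

// J-type: imm[20|10:1|11|19:12] sits in bits [31:12].
static int64_t get_j_imm(uint32_t iw) {
    uint64_t imm = 0;
    imm |= (uint64_t) get_bits(iw, 21, 10) << 1;   // imm[10:1]
    imm |= (uint64_t) get_bits(iw, 20, 1)  << 11;  // imm[11]
    imm |= (uint64_t) get_bits(iw, 12, 8)  << 12;  // imm[19:12]
    imm |= (uint64_t) get_bits(iw, 31, 1)  << 20;  // imm[20] (sign)
    return sign_extend(imm, 20);
}
```

As a check, 0x00B50463 (beq a0, a1 with a forward offset) yields 8, and 0xFFDFF06F (a jal x0 that jumps back one instruction) yields -4. A taken branch or jump then does `pc += imm`.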

Key Concepts¶

Concept	Definition	Example
Machine code	Binary encoding of instructions executed by the CPU	`0x00B50533`
Instruction word (`iw`)	The 32-bit value encoding one instruction	`add a0,a0,a1` → `0x00B50533`
opcode	7-bit field in `[6:0]` identifying the format	`0110011` = R-type
funct3 / funct7	Sub-fields selecting the exact operation	`funct3=000, funct7=0` → `add`
rd / rs1 / rs2	5-bit register fields (5 bits → 32 registers)	`rd=01010` → `a0`
Shift and mask	Extract a field: `(iw >> start) & mask`	`(iw>>12)&0x7` → funct3
`get_bits`	Helper to extract `count` bits at `start`	`get_bits(iw,7,5)` → rd
Emulator	Software that models a target CPU's state and instructions	`struct rv_state`
PC (program counter)	Address of the next instruction; usually `pc += 4`	`pc = (uint64_t) func`
Fetch-decode-execute	The per-instruction cycle every CPU runs	`iw=*pc; decode; execute`
Halt sentinel	`ra = 0` so the final `ret` makes `pc = 0`, stopping the loop	`while (pc != 0)`

Practice Problems¶

Problem 1: Decode an R-Type Instruction¶

Decode the instruction word 0x40A60633. What RISC-V assembly instruction does it represent?

Solution

**Step 1: Convert to binary**

0x40A60633 = 0100 0000 1010 0110 0000 0110 0011 0011

**Step 2: Slice into R-type fields**

 funct7    rs2    rs1   funct3   rd     opcode
0100000  01010  01100   000    01100   0110011

**Step 3: Interpret**

Field	Binary	Decimal	Meaning
opcode	`0110011`	51	R-type
rd	`01100`	12	`a2`
funct3	`000`	0	ADD/SUB family
rs1	`01100`	12	`a2`
rs2	`01010`	10	`a0`
funct7	`0100000`	32	SUB (bit 30 set)

Since `funct3 = 000` and `funct7 = 0100000`, this is **SUB**.

**Answer:** `sub a2, a2, a0`

Problem 2: Extract a Field in C¶

Write a C expression (without get_bits) that extracts rs1 from an instruction word iw, and one that extracts funct7.

Solution

`rs1` is bits `[19:15]` — shift down by 15 and mask 5 bits:

uint32_t rs1 = (iw >> 15) & 0x1F;   // 0x1F = 0b11111

`funct7` is bits `[31:25]` — shift down by 25 and mask 7 bits:

uint32_t funct7 = (iw >> 25) & 0x7F;   // 0x7F = 0b1111111

Using the helper, these are `get_bits(iw, 15, 5)` and `get_bits(iw, 25, 7)`.

Problem 3: Why 5-Bit Register Fields?¶

The rd, rs1, and rs2 fields are each exactly 5 bits. Why 5? What is the maximum register number they can encode, and what happens if a design had only 4 bits?

Solution

RISC-V has **32** general-purpose registers (`x0`–`x31`). To name any one of 32 things you need `ceil(log2(32)) = 5` bits, because `2^5 = 32`. A 5-bit field encodes values `0`–`31`, which maps exactly to `x0`–`x31`. With only **4 bits** you could address only `2^4 = 16` registers (`x0`–`x15`), which is not enough — you could not name `x16`–`x31` at all. So 5 bits is the minimum field width that covers all 32 registers.

Problem 4: Trace `rv_one` on an `add`¶

Given rsp->regs[10] = 5, rsp->regs[11] = 7, rsp->pc pointing at an instruction whose word is 0x00B50533, trace what rv_one does and give the resulting register and PC state.

Solution

**Fetch:** `iw = 0x00B50533`. **Decode:** `opcode = 0x33` → R-type, so `emu_r_type` is called. Inside:

rd     = 10 (a0)
rs1    = 10 (a0)
rs2    = 11 (a1)
funct3 = 0
funct7 = 0      -> this is add

**Execute:**

rsp->regs[10] = rsp->regs[10] + rsp->regs[11];  // 5 + 7 = 12
rsp->pc += 4;

**Result:** `regs[10]` (a0) becomes `12`; `regs[11]` (a1) is unchanged at `7`; `pc` advanced by 4. Back in the loop, `regs[0]` is forced to 0 (no effect here since a0 was written, not x0).

Problem 5: Where Does the Stack Pointer Start?¶

In rv_init, why do we set sp = &stack[STACK_SIZE] rather than &stack[0]? What would go wrong with &stack[0]?

Solution

The RISC-V stack grows **downward** (toward lower addresses). A function allocates stack space with `addi sp, sp, -N` (subtracting), and frees it with `addi sp, sp, +N` (adding). For there to be room to grow downward, `sp` must start at the **top** (high-address end) of the array, which is `&stack[STACK_SIZE]` (one past the last valid byte, the conventional "empty stack" position). If we started at `&stack[0]` (the bottom), the very first `addi sp, sp, -16` would move `sp` to `&stack[-16]` — *below* the array — and any store through `sp` would write out of bounds, corrupting memory or crashing. Starting at the top gives the full 8 KB of room to grow down.

Problem 6: How Does the Emulator Stop?¶

Explain the chain of events that causes rv_emulate's loop to terminate when the top-level function executes ret.

Solution

1. `rv_init` sets `regs[1]` (`ra`) to `0`.
2. The loop runs `while (rsp->pc != 0)`.
3. `ret` is the pseudo-instruction `jalr x0, 0(ra)`.
4. The `jalr` handler sets `pc = regs[rs1] + offset`. Here `rs1 = ra` and `offset = 0`, so `pc = regs[ra] + 0 = 0`.
5. Control returns to the loop; the condition `pc != 0` is now false, so the loop exits.
6. The function's return value, which lives in `a0` (`regs[10]`), is returned by `rv_emulate`.

The `ra = 0` initialization is the "halt sentinel": there is no real instruction at address 0, but we never fetch from it because the loop checks `pc != 0` first.

Summary¶

Machine code is binary, and each base-ISA instruction is fixed at 32 bits. The compiler emits assembly, the assembler emits 32-bit instruction words, and the processor (or our emulator) fetches and executes those words directly.
An emulator mirrors the processor in software. Its state is a struct rv_state with 32 64-bit registers, a 64-bit pc, and an 8 KB stack[] array: the registers and PC a real RISC-V CPU keeps, plus a slice of memory for the stack.
The opcode ([6:0]) picks the format; funct3/funct7 pick the operation. R-type packs funct7 | rs2 | rs1 | funct3 | rd | opcode, with register fields 5 bits wide because 5 bits address 32 registers.
Decoding is shift-and-mask. (iw >> start) & mask pulls out each field; a single get_bits helper keeps the six extractions consistent and easy to debug. Worked example: 0x00B50533 decodes to add a0, a0, a1.
Fetch the instruction word with a pointer cast. iw = *((uint32_t *) pc) reads one 32-bit instruction; advancing a uint32_t * by 1 steps forward 4 bytes.
The project is best called an emulator. It interprets one instruction at a time but models a specific machine (RISC-V), so "emulator" is the accurate term.
Fetch-decode-execute runs in a loop until pc == 0. rv_init sets ra = 0 so the top-level ret (jalr x0, 0(ra)) drives pc to 0 and stops the loop, leaving the answer in a0.
Extend the emulator incrementally. Run a target program, implement whatever instruction it reports as unsupported (mv, sub, li, jal, jalr, the bCC branches), and retest until Emu, C, and Asm outputs all match.
