RISC-V Machine Code and Emulation¶
Overview¶
This lecture connects RISC-V assembly to the binary machine code the processor actually executes, and introduces the design of a software emulator that runs that machine code. We examine the 32-bit instruction word, decode the R-type format by hand and in C using shifts and masks, and lay out the emulator's processor state (registers, program counter, and stack). By the end you should understand the fetch-decode-execute cycle well enough to begin building the Lab 6 emulator.
Learning Objectives¶
- Explain the relationship between C, assembly, and machine code
- Read a 32-bit RISC-V instruction word from memory in C using pointers and casts
- Decode the six instruction fields of an R-type instruction (opcode, rd, funct3, rs1, rs2, funct7)
- Use shift-and-mask bitwise operations to extract bit fields from an instruction word
- Describe the emulator's processor state: 32 registers, the PC, and the emulated stack
- Distinguish interpretation from emulation
- Trace the fetch-decode-execute cycle and the role of `jalr`/`ret` in stopping the emulator
- Plan an incremental strategy for extending the emulator to new instructions
Prerequisites¶
- RISC-V assembly: registers, instructions, control flow, and the calling convention (Project 2 and the RISC-V guide)
- Bitwise operators in C: `&`, `|`, `^`, `~`, `<<`, `>>`, and masking (Project 3)
- Binary, hexadecimal, and two's complement number systems
- C pointers, casts, `struct`, and fixed-width integer types (`uint32_t`, `uint64_t`)
1. From C to Assembly to Machine Code¶
A processor does not run C and it does not run assembly text. It runs machine code: a stream of binary instruction words sitting in memory. The translation chain is:
flowchart LR
A["C source<br>(.c)"] -->|"gcc compiles"| B["Assembly<br>(.s)"]
B -->|"as assembles"| C["Machine code<br>(.o, binary)"]
C -->|"processor fetches"| D["Execution"]
style C fill:#f9f,stroke:#333,stroke-width:2px
- The compiler (`gcc`) turns C into assembly.
- The assembler (`as`) turns assembly mnemonics into 32-bit binary instruction words.
- The processor fetches each word from memory, decodes it, and executes it.
For an emulator, machine code is the input. We write a program that reads those 32-bit words and simulates what the hardware would do with them. The key realization that makes Lab 6 work: the assembly functions in your project are compiled into real machine code that lives at real memory addresses, so you can take the address of a function, read the bytes there, and decode them.
Bitwise Operators Bridge Assembly and Decoding¶
This lecture builds directly on the bitwise operators from Project 3. The same operations exist both as RISC-V instructions and as the C tools we use to take instructions apart:
| Operation | C operator | RISC-V (reg / imm) | Use in decoding |
|---|---|---|---|
| AND | `&` | `and` / `andi` | Mask off unwanted upper bits |
| OR | `\|` | `or` / `ori` | Combine split immediate pieces |
| XOR | `^` | `xor` / `xori` | Toggle/compare bits |
| Shift left logical | `<<` | `sll` / `slli` | Position a field |
| Shift right logical | `>>` (unsigned) | `srl` / `srli` | Bring a field to bit 0 |
| Shift right arithmetic | `>>` (signed) | `sra` / `srai` | Sign-extend immediates |
So and t0, t1, t2 (meaning t0 = t1 & t2) is an instruction the emulator must execute, and & is also the operator the emulator uses to pull t0, t1, and t2 out of that instruction's encoding.
2. The Processor and the Emulator Side by Side¶
A real RISC-V processor holds its state in hardware and reads instructions from memory. Our emulator holds the same state in software (a C struct) and reads the same machine code from memory.
flowchart LR
subgraph PROC["RISC-V Processor (hardware)"]
R1["REGS"]
PC1["PC (PC = PC + 4)"]
EX1["execute(iw)"]
end
subgraph EMU["RISC-V Emulator (software)"]
R2["regs[32]"]
PC2["pc"]
ST2["stack[]"]
EX2["execute(iw)"]
end
subgraph MEM["Memory"]
S["STACK"]
D["DATA"]
C["CODE<br>0x00B50533<br>0x00008067"]
end
PC1 -.fetch iw.-> C
PC2 -.fetch iw.-> C
Both the hardware and the emulator do the same loop: read the instruction word (iw) that PC points to, execute it, and update PC (usually PC = PC + 4 to step to the next 4-byte instruction). The only difference is that the emulator's registers and PC are just variables in a struct, and "executing" an instruction is a switch/if in C that updates those variables.
The two machine-code values from the handwritten notes are real:
- `0x00B50533` encodes `add a0, a0, a1`
- `0x00008067` encodes `ret` (which is `jalr x0, 0(ra)`)
3. Processor State¶
To emulate a processor we must represent everything the processor "remembers" between instructions. RISC-V's RV64 state is small:
- Registers: 32 general-purpose registers, each a 64-bit value (`uint64_t`). RV64 registers are 64 bits even though instructions are 32 bits.
- PC (program counter): a 64-bit value (`uint64_t`) holding the address of the next instruction to execute.
- Memory: divided into regions. We care about:
    - STACK: local variables and saved registers; grows downward.
    - CODE: the machine code being executed.
    - DATA: globals and other static data.
Memory (high address at top)
+------------------+
| STACK | <- grows downward, sp points here
| | |
| v |
| |
| DATA | <- globals
| |
| CODE | <- machine code, pc points here
| 0x00B50533 ... |
+------------------+
(low address)
Why 32 Registers Need 5 Bits¶
A register field in an instruction must be able to name any of the 32 registers (x0–x31). It takes 5 bits to do that, because 2^5 = 32. This is exactly why each of the rd, rs1, and rs2 fields is 5 bits wide. A student question in lecture was "why 5 bits?" — the answer is: 5 bits index 32 registers, no more and no less.
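To make the 5-bit index concrete, the sketch below maps a register field to its ABI name. The table follows the standard RISC-V calling convention; the helper name `reg_name` is an assumption for illustration and is not part of the Lab 6 starter code.

```c
#include <stdint.h>

// ABI names for x0..x31, in register-number order (RISC-V calling convention).
static const char *const abi_names[32] = {
    "zero", "ra", "sp", "gp", "tp", "t0", "t1", "t2",
    "s0",   "s1", "a0", "a1", "a2", "a3", "a4", "a5",
    "a6",   "a7", "s2", "s3", "s4", "s5", "s6", "s7",
    "s8",   "s9", "s10", "s11", "t3", "t4", "t5", "t6"
};

// Hypothetical helper: map a 5-bit register field to its ABI name.
// Masking with 0x1F keeps the index in 0..31 whatever value is passed in.
static const char *reg_name(uint32_t field) {
    return abi_names[field & 0x1F];
}
```

Because a 5-bit field can only hold 0 through 31, every possible field value lands on a valid table entry; `reg_name(10)` is `a0` and `reg_name(2)` is `sp`.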
The Emulator struct¶
We capture the entire processor state in one C struct (provided in the Lab 6 starter code):
#include <stdint.h>
#define NREGS 32 // RISC-V has 32 general-purpose registers
#define STACK_SIZE 8192 // 8 KB emulated stack (grow if you need more)
struct rv_state {
uint64_t regs[NREGS]; // x0..x31, each 64 bits
uint64_t pc; // program counter
uint8_t stack[STACK_SIZE]; // emulated stack memory (byte addressable)
};
- `regs[NREGS]` holds the 32 registers. `regs[0]` corresponds to `x0` (always 0).
- `pc` is the program counter: the address of the next instruction.
- `stack[STACK_SIZE]` is a byte array that is the emulated stack. The stack pointer (`sp`, which is `regs[2]`) will point into this array. 8 KB is enough for the Lab 6 programs, but recursion-heavy or local-variable-heavy programs may need a larger value.
4. The 32-bit Instruction Word¶
Every RISC-V instruction (in the base ISA) is exactly 32 bits = 4 bytes = 1 word. The bits are numbered from bit 31 (most significant) down to bit 0 (least significant). RISC-V defines six instruction formats that slice these 32 bits into different fields; the format is identified by the 7-bit opcode in bits [6:0], which is always in the same place.
R-type: | funct7 | rs2 | rs1 | funct3 | rd | opcode | register ops
I-type: | imm[11:0] | rs1 | funct3 | rd | opcode | addi, loads, jalr
S-type: | imm[11:5]|rs2 | rs1 | funct3 | imm[4:0] | opcode | stores
B-type: | imm |rs2 | rs1 | funct3 | imm | opcode | branches
U-type: | imm[31:12] | rd | opcode | lui, auipc
J-type: | imm (scattered) | rd | opcode | jal, j
Design principle: the opcode ([6:0]) and the register fields (rd, rs1, rs2) sit in the same bit positions across every format. This keeps the hardware decoder simple, and it keeps our C decoder simple too: we can always grab the opcode the same way, then grab rd/rs1/rs2 the same way once we know the format uses them.
We decode in steps:
- Look at the opcode to determine the instruction format.
- Decode the rest of the word based on that format.
- Use funct3 (and sometimes funct7) to pick the exact operation within the format.
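The first of these steps can be sketched as a lookup from opcode to format. The opcode values are the standard RV32I/RV64I base opcodes; the enum and the function name `format_of` are assumptions made for this example, and the RV64-only word opcodes (such as the one used by `addw`) are deliberately left out.

```c
#include <stdint.h>

// Hypothetical format tags for illustration.
enum rv_format { FMT_R, FMT_I, FMT_S, FMT_B, FMT_U, FMT_J, FMT_UNKNOWN };

// Classify an instruction word by its 7-bit opcode (bits [6:0]).
static enum rv_format format_of(uint32_t iw) {
    switch (iw & 0x7F) {
    case 0x33: return FMT_R;   // register-register ALU ops (add, sub, ...)
    case 0x13: return FMT_I;   // immediate ALU ops (addi, ...)
    case 0x03: return FMT_I;   // loads (lw, lb, ...)
    case 0x67: return FMT_I;   // jalr
    case 0x23: return FMT_S;   // stores
    case 0x63: return FMT_B;   // conditional branches
    case 0x37: return FMT_U;   // lui
    case 0x17: return FMT_U;   // auipc
    case 0x6F: return FMT_J;   // jal
    default:   return FMT_UNKNOWN;
    }
}
```

Once the format is known, the remaining fields can be extracted with the layout for that format, and funct3/funct7 narrow the choice down to one operation.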
5. Worked Example: Decoding add a0, a0, a1¶
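The worked example from the lecture is add a0, a0, a1, which in RISC-V means a0 = a0 + a1.
The three operands map to instruction fields: rd = a0 (x10), rs1 = a0 (x10), and rs2 = a1 (x11).
The assembler produces the instruction word 0x00B50533.
Step 1 — Convert hex to binary, MSB on the left¶
hex:    0    0    B    5    0    5    3    3
binary: 0000 0000 1011 0101 0000 0101 0011 0011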
Step 2 — Slice into the R-type fields¶
funct7 rs2 rs1 funct3 rd opcode
0000000 01011 01010 000 01010 0110011
[31:25] [24:20][19:15] [14:12][11:7] [6:0]
Step 3 — Interpret each field¶
| Field | Bits | Binary | Decimal | Meaning |
|---|---|---|---|---|
| opcode | [6:0] | `0110011` | 51 (0x33) | R-type format |
| rd | [11:7] | `01010` | 10 | a0 (x10): destination |
| funct3 | [14:12] | `000` | 0 | selects ADD/SUB family |
| rs1 | [19:15] | `01010` | 10 | a0 (x10): first source |
| rs2 | [24:20] | `01011` | 11 | a1 (x11): second source |
| funct7 | [31:25] | `0000000` | 0 | ADD (not SUB) |
Reading it back: opcode says R-type; funct3 = 000 with funct7 = 0000000 means ADD; rd = a0, rs1 = a0, rs2 = a1. So the word 0x00B50533 is exactly add a0, a0, a1. The decimal 10 and 11 come straight from the ABI register table: a0 = x10, a1 = x11.
graph LR
IW["0x00B50533"] --> F7["funct7=0000000<br>ADD"]
IW --> R2["rs2=01011<br>a1 (11)"]
IW --> R1["rs1=01010<br>a0 (10)"]
IW --> F3["funct3=000<br>ADD/SUB family"]
IW --> RD["rd=01010<br>a0 (10)"]
IW --> OP["opcode=0110011<br>R-type"]
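Running the decode in reverse is a useful cross-check on the field positions. The sketch below packs the six R-type fields back into a word; `encode_r_type` is a hypothetical helper written for this example, not something the starter code provides.

```c
#include <stdint.h>

// Pack the six R-type fields into a 32-bit instruction word.
// Each field is masked to its width, then shifted to its bit position.
static uint32_t encode_r_type(uint32_t funct7, uint32_t rs2, uint32_t rs1,
                              uint32_t funct3, uint32_t rd, uint32_t opcode) {
    return ((funct7 & 0x7F) << 25)   // bits [31:25]
         | ((rs2    & 0x1F) << 20)   // bits [24:20]
         | ((rs1    & 0x1F) << 15)   // bits [19:15]
         | ((funct3 & 0x07) << 12)   // bits [14:12]
         | ((rd     & 0x1F) << 7)    // bits [11:7]
         |  (opcode & 0x7F);         // bits [6:0]
}
```

Calling `encode_r_type(0, 11, 10, 0, 10, 0x33)` reproduces `0x00B50533`, the word for `add a0, a0, a1`.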
R-Type Operation Table¶
For all R-type instructions the opcode is 0110011. The exact operation is chosen by funct3 and funct7:
| Instruction | funct7 | funct3 | Operation |
|---|---|---|---|
| `add` | `0000000` | `000` | `rd = rs1 + rs2` |
| `sub` | `0100000` | `000` | `rd = rs1 - rs2` |
| `sll` | `0000000` | `001` | `rd = rs1 << rs2` |
| `srl` | `0000000` | `101` | `rd = rs1 >> rs2` (logical) |
| `sra` | `0100000` | `101` | `rd = rs1 >> rs2` (arithmetic) |
| `or` | `0000000` | `110` | `rd = rs1 \| rs2` |
| `and` | `0000000` | `111` | `rd = rs1 & rs2` |
| `xor` | `0000000` | `100` | `rd = rs1 ^ rs2` |
| `mul` | `0000001` | `000` | `rd = rs1 * rs2` (M extension) |
Note how add, sub, and mul all share funct3 = 000; only funct7 distinguishes them. That is why decoding R-type sometimes needs both fields.
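One way to handle the two-field lookup is to combine funct7 and funct3 into a single key and switch on it. This is a sketch under that assumption; the macro `R_KEY` and the function `r_type_name` are hypothetical names, and only the rows of the table above are covered.

```c
#include <stddef.h>
#include <stdint.h>

// Combine funct7 (7 bits) and funct3 (3 bits) into one 10-bit key.
#define R_KEY(funct7, funct3) (((funct7) << 3) | (funct3))

// Return the mnemonic for an R-type funct3/funct7 pair, or NULL if the
// pair is not one of the rows in the table above.
static const char *r_type_name(uint32_t funct3, uint32_t funct7) {
    switch (R_KEY(funct7, funct3)) {
    case R_KEY(0x00, 0x0): return "add";
    case R_KEY(0x20, 0x0): return "sub";
    case R_KEY(0x01, 0x0): return "mul";
    case R_KEY(0x00, 0x1): return "sll";
    case R_KEY(0x00, 0x4): return "xor";
    case R_KEY(0x00, 0x5): return "srl";
    case R_KEY(0x20, 0x5): return "sra";
    case R_KEY(0x00, 0x6): return "or";
    case R_KEY(0x00, 0x7): return "and";
    default:               return NULL;
    }
}
```

A name lookup like this is also handy for the "unsupported instruction" message, since it reports a mnemonic instead of raw field values.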
6. Reading the Instruction Word in C¶
Although RV64 data values are 64 bits, each instruction is 32 bits. To fetch an instruction we treat pc as a pointer to a 32-bit value and dereference it. Since the assembly functions in the project are real compiled machine code, we can point at them directly.
#include <stdint.h>
#include <stdio.h>
// add2_s is an assembly function linked into our program.
// Its address is the address of its first machine-code instruction.
extern uint64_t add2_s(uint64_t a0, uint64_t a1);
void decode_first_instructions(void) {
// Take the address of the function and view it as 32-bit words.
uint32_t *pc = (uint32_t *) add2_s;
uint32_t iw = *pc; // fetch the first instruction word
printf("pc = %p iw = 0x%08X\n", (void *) pc, iw);
pc = pc + 1; // advance one 32-bit word = 4 bytes
iw = *pc; // fetch the second instruction word
printf("pc = %p iw = 0x%08X\n", (void *) pc, iw);
}
Key points:
- `(uint32_t *) add2_s` casts the function pointer to a pointer-to-32-bit-word so that `*pc` reads exactly one instruction.
- `*pc` dereferences `pc` to read the 32-bit instruction word from memory.
- `pc + 1` advances by one `uint32_t`, which is 4 bytes in pointer arithmetic, exactly the size of one instruction. (If `pc` were a `uint8_t *`, you would write `pc + 4`.)
- Printing with `%08X` shows the word as 8 hex digits so you can compare it against your hand-decoded value.
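The pointer cast reads the word in the host's byte order, which is fine here because the emulator itself runs on RISC-V. For comparison, the sketch below assembles the word byte by byte; it relies on the fact that RISC-V stores instructions little-endian. The helper name `fetch_le` is an assumption for illustration.

```c
#include <stdint.h>

// Assemble one 32-bit instruction word from four bytes in memory,
// lowest address first (little-endian), without a pointer cast.
static uint32_t fetch_le(const uint8_t *mem) {
    return  (uint32_t) mem[0]
         | ((uint32_t) mem[1] << 8)
         | ((uint32_t) mem[2] << 16)
         | ((uint32_t) mem[3] << 24);
}
```

For `add a0, a0, a1` the bytes in memory are `33 05 B5 00`, so a byte-level hex dump shows the word "backwards" relative to `0x00B50533`.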
In the emulator itself, the equivalent fetch using the pc field of rv_state is uint32_t iw = *((uint32_t *) rsp->pc); this is the first line of rv_one in Section 9.
7. Extracting Fields with Shift and Mask¶
Decoding is pure bit manipulation: shift the field down to bit 0, then mask off everything above it. The general recipe to extract a field of n bits starting at bit position start is field = (iw >> start) & ((1 << n) - 1).
The subexpression (1 << n) - 1 builds a mask of n ones. For example, (1 << 3) - 1 = 0b111.
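One edge case is worth seeing in code: shifting a 32-bit value by 32 is undefined behavior in C, so the mask formula only works as written for n below 32. The sketch below guards that case; `low_mask` and `extract_field` are hypothetical names, and the instruction fields in this lecture never need the full-width case.

```c
#include <stdint.h>

// Build a mask of n low-order ones, for 0 <= n <= 32.
// n == 32 is handled separately because (1u << 32) is undefined in C.
static uint32_t low_mask(unsigned n) {
    return (n >= 32) ? 0xFFFFFFFFu : ((1u << n) - 1u);
}

// Shift the field down to bit 0, then mask off everything above it.
static uint32_t extract_field(uint32_t iw, unsigned start, unsigned n) {
    return (iw >> start) & low_mask(n);
}
```

Using `1u` rather than `1` keeps the whole computation unsigned, which avoids surprises when the shifted bit reaches bit 31.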
Masking, Illustrated¶
Suppose we want the 7-bit opcode, bits [6:0]. We do not even need to shift — we just mask off everything above bit 6:
iw = ...1 1 . 0 1 0110011 (top bits unknown, low 7 shown)
mask = 0 0 . . . . . 0 1111111 (& with 0x7F = 0b1111111)
-------------------------------------------------
result= 0 0 . . . . . 0 0110011 (only bits [6:0] survive)
Anywhere the mask has a 0, the result is 0 (AND with 0 is 0); anywhere the mask has a 1, the original bit passes through (AND with 1 is the bit). So masking with 0x7F keeps only the low 7 bits — exactly the opcode.
A get_bits Helper¶
Doing shift-and-mask by hand for six fields invites off-by-one errors, so the starter code provides a helper. Using one function for every field keeps the logic consistent and easy to debug:
#include <stdint.h>
// Extract `count` bits from `iw`, starting at bit position `start`.
static inline uint32_t get_bits(uint32_t iw, uint32_t start, uint32_t count) {
uint32_t mask = (1u << count) - 1u; // count ones
return (iw >> start) & mask;
}
Field Extraction Functions¶
With get_bits, every R-type field becomes a one-liner:
static uint32_t get_opcode(uint32_t iw) { return get_bits(iw, 0, 7); }
static uint32_t get_rd(uint32_t iw) { return get_bits(iw, 7, 5); }
static uint32_t get_funct3(uint32_t iw) { return get_bits(iw, 12, 3); }
static uint32_t get_rs1(uint32_t iw) { return get_bits(iw, 15, 5); }
static uint32_t get_rs2(uint32_t iw) { return get_bits(iw, 20, 5); }
static uint32_t get_funct7(uint32_t iw) { return get_bits(iw, 25, 7); }
Inline Equivalents¶
The same extractions written directly (handy to recognize when reading code):
uint32_t iw = 0x00B50533;
uint32_t opcode = iw & 0x7F; // bits [6:0], mask 0b1111111
uint32_t rd = (iw >> 7) & 0x1F; // bits [11:7], mask 0b11111
uint32_t funct3 = (iw >> 12) & 0x7; // bits [14:12], mask 0b111
uint32_t rs1 = (iw >> 15) & 0x1F; // bits [19:15], mask 0b11111
uint32_t rs2 = (iw >> 20) & 0x1F; // bits [24:20], mask 0b11111
uint32_t funct7 = (iw >> 25) & 0x7F; // bits [31:25], mask 0b1111111
Running this on 0x00B50533 yields opcode=0x33, rd=10, funct3=0, rs1=10, rs2=11, funct7=0 — matching our hand decode. Printing intermediate values in hex while developing makes mistakes obvious.
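A small sketch of that debugging habit: decode all six fields into one struct and print them on a single line. The struct `r_fields` and both function names are assumptions made for this example; they are not part of the starter code.

```c
#include <stdint.h>
#include <stdio.h>

// Hypothetical container for the six R-type fields.
struct r_fields {
    uint32_t opcode, rd, funct3, rs1, rs2, funct7;
};

// Decode every R-type field with the same shift-and-mask pattern.
static struct r_fields decode_r(uint32_t iw) {
    struct r_fields f;
    f.opcode = iw & 0x7F;
    f.rd     = (iw >> 7)  & 0x1F;
    f.funct3 = (iw >> 12) & 0x07;
    f.rs1    = (iw >> 15) & 0x1F;
    f.rs2    = (iw >> 20) & 0x1F;
    f.funct7 = (iw >> 25) & 0x7F;
    return f;
}

// Print the word and its fields so they can be compared with a hand decode.
static void print_r_fields(uint32_t iw) {
    struct r_fields f = decode_r(iw);
    printf("iw=0x%08X opcode=0x%02X rd=%u funct3=%u rs1=%u rs2=%u funct7=0x%02X\n",
           (unsigned) iw, (unsigned) f.opcode, (unsigned) f.rd,
           (unsigned) f.funct3, (unsigned) f.rs1, (unsigned) f.rs2,
           (unsigned) f.funct7);
}
```

Calling `print_r_fields` on each fetched word while developing gives a trace that can be checked line by line against the assembly source.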
8. Interpretation vs. Emulation¶
A natural question is whether we are building an interpreter or an emulator. Both terms apply, but with a useful distinction:
- Interpretation generally means reading a high-level or intermediate representation (source code, bytecode, an AST) and carrying out its meaning, often without modeling a specific machine.
- Emulation means faithfully reproducing the behavior of a specific target machine — its registers, memory model, and instruction semantics — so that programs written for that machine run as if on the real hardware.
Our project reads actual RISC-V machine code and reproduces the behavior of a RISC-V processor: registers, PC, stack, and per-instruction semantics. Even though it works by interpreting one instruction at a time, it is best described as an emulator because it models the target machine. In the lecture this is exactly the conclusion Greg reached: the program interprets instructions, but it is an emulator of a RISC-V CPU.
| | Interpreter | Emulator |
|---|---|---|
| Input | Source / bytecode / AST | Target machine code |
| Models a specific CPU? | Not necessarily | Yes (registers, PC, memory) |
| Our Lab 6 project | reads & runs instructions | models a RISC-V CPU ✓ |
9. The Fetch-Decode-Execute Cycle¶
Every CPU — real or emulated — runs the same three-step cycle forever:
flowchart LR
A["Fetch<br>iw = *pc"] --> B["Decode<br>opcode, rd, funct3, ..."]
B --> C["Execute<br>update regs / memory"]
C --> D["Update pc<br>(pc += 4 or branch/jump)"]
D --> A
In the emulator this becomes a loop over the rv_state:
uint64_t rv_emulate(struct rv_state *rsp) {
while (rsp->pc != 0) { // 0 (null) PC means "stop"
rv_one(rsp); // do one fetch-decode-execute
rsp->regs[0] = 0; // x0 is hardwired to 0
}
return rsp->regs[10]; // return value is in a0 (x10)
}
A single instruction step dispatches on the opcode:
void rv_one(struct rv_state *rsp) {
uint32_t iw = *((uint32_t *) rsp->pc); // FETCH
uint32_t opcode = get_opcode(iw); // DECODE (format)
switch (opcode) { // EXECUTE (dispatch)
case 0b0110011: emu_r_type(rsp, iw); break; // R-type
case 0b0010011: emu_i_type(rsp, iw); break; // I-type arith
case 0b1100111: emu_jalr(rsp, iw); break; // jalr / ret
// ... add more formats as you extend the emulator ...
default:
printf("Unknown opcode: 0x%X\n", opcode);
exit(1);
}
}
Two important details:
- After each instruction we force `regs[0] = 0`. Register `x0` is hardwired to zero on real hardware; if any instruction writes to it, the write must be discarded.
- The format handler is responsible for updating `pc`. Most handlers do `pc += 4`; control instructions compute a new `pc`.
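An alternative to resetting `regs[0]` after every step is to route all register writes through one helper that drops writes to `x0`. This is only a sketch of that design choice; `write_reg` is a hypothetical name, and the lecture's loop uses the reset instead.

```c
#include <stdint.h>

#define NREGS 32

// Write `value` to register `rd`, discarding the write when rd is x0.
// With this helper, regs[0] never changes from its initial 0.
static void write_reg(uint64_t regs[NREGS], uint32_t rd, uint64_t value) {
    if (rd != 0) {
        regs[rd] = value;
    }
}
```

The trade-off is that every handler must remember to call the helper, whereas the reset in the loop protects `x0` even when a handler writes `regs[rd]` directly.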
Executing an R-Type Instruction¶
void emu_r_type(struct rv_state *rsp, uint32_t iw) {
uint32_t rd = get_rd(iw);
uint32_t rs1 = get_rs1(iw);
uint32_t rs2 = get_rs2(iw);
uint32_t funct3 = get_funct3(iw);
uint32_t funct7 = get_funct7(iw);
if (funct3 == 0b000 && funct7 == 0b0000000) {
rsp->regs[rd] = rsp->regs[rs1] + rsp->regs[rs2]; // add
} else if (funct3 == 0b000 && funct7 == 0b0100000) {
rsp->regs[rd] = rsp->regs[rs1] - rsp->regs[rs2]; // sub
} else if (funct3 == 0b000 && funct7 == 0b0000001) {
rsp->regs[rd] = rsp->regs[rs1] * rsp->regs[rs2]; // mul
} else if (funct3 == 0b111 && funct7 == 0b0000000) {
rsp->regs[rd] = rsp->regs[rs1] & rsp->regs[rs2]; // and
} else {
printf("Unsupported R-type: funct3=%u funct7=%u\n", funct3, funct7);
exit(1);
}
rsp->pc += 4; // advance to next instruction
}
Because the registers are uint64_t, an unsigned type, C arithmetic on them wraps modulo 2^64 on overflow exactly the way RISC-V hardware does, so no special handling is required for wrapping.
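The wrapping behavior can be checked directly. The two functions below are hypothetical stand-ins for the `add` and `sub` cases, written only to show the arithmetic; note that the same guarantee does not hold for signed types, where overflow is undefined behavior in C.

```c
#include <stdint.h>

// Unsigned 64-bit addition and subtraction wrap modulo 2^64,
// which matches what the RISC-V add and sub instructions do.
static uint64_t emu_add(uint64_t a, uint64_t b) { return a + b; }
static uint64_t emu_sub(uint64_t a, uint64_t b) { return a - b; }
```

For example, `emu_add(UINT64_MAX, 1)` is `0`, and `emu_sub(5, 7)` is the bit pattern `0xFFFFFFFFFFFFFFFE`, which reads as `-2` when interpreted as a two's complement value.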
10. Initializing the Emulator and Stopping It¶
Before emulating a function we set up the initial state with an init function, then run the loop, then read the result.
void rv_init(struct rv_state *rsp, uint64_t (*func)(),
uint64_t a0, uint64_t a1, uint64_t a2, uint64_t a3) {
// Zero out all state first.
for (int i = 0; i < NREGS; i++)
rsp->regs[i] = 0;
// pc points at the first instruction of the target function.
rsp->pc = (uint64_t) func;
// Arguments go in a0..a3 (x10..x13).
rsp->regs[10] = a0;
rsp->regs[11] = a1;
rsp->regs[12] = a2;
rsp->regs[13] = a3;
// sp (x2) points to the TOP of the emulated stack (it grows down).
rsp->regs[2] = (uint64_t) &rsp->stack[STACK_SIZE];
// ra (x1) = 0 acts as a halt sentinel (see below).
rsp->regs[1] = 0;
}
// Usage:
struct rv_state state;
rv_init(&state, (uint64_t (*)()) quadratic_s, 2, 4, 6, 8);
uint64_t result = rv_emulate(&state);
printf("Emu: %lu\n", result);
The Stack Pointer Points to the Top¶
The diagram from the lecture shows sp pointing near the high end of the stack[] array, because the stack grows downward toward lower addresses. We initialize sp = &stack[STACK_SIZE] (one past the last byte). As the program pushes data (addi sp, sp, -16), sp moves down through the array; as it pops (addi sp, sp, 16), sp moves back up.
rv_state emulated stack[]
+---------+ +------------------+ high addr
| regs[32]| sp --> | stack[STACK_SIZE]| <- sp starts here
| pc | | ... |
| stack[] | | (grows down) |
+---------+ | ... |
| stack[0] | low addr
+------------------+
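The push and pop movement can be sketched with the same state struct, repeated here so the block compiles on its own. The helpers `store64`, `load64`, and `sp_in_bounds` are assumptions for illustration; they model what a function prologue such as `addi sp, sp, -16` followed by `sd ra, 0(sp)` does to the emulated stack.

```c
#include <stdint.h>
#include <string.h>

#define NREGS 32
#define STACK_SIZE 8192

struct rv_state {
    uint64_t regs[NREGS];
    uint64_t pc;
    uint8_t stack[STACK_SIZE];
};

// Store and load 8 bytes at an emulated address (a host address here).
static void store64(uint64_t addr, uint64_t value) {
    memcpy((void *) (uintptr_t) addr, &value, sizeof value);
}

static uint64_t load64(uint64_t addr) {
    uint64_t value;
    memcpy(&value, (const void *) (uintptr_t) addr, sizeof value);
    return value;
}

// Check that sp (regs[2]) still lies within stack[0] .. stack[STACK_SIZE].
static int sp_in_bounds(const struct rv_state *s) {
    uint64_t lo = (uint64_t) (uintptr_t) &s->stack[0];
    uint64_t hi = (uint64_t) (uintptr_t) &s->stack[STACK_SIZE];
    return s->regs[2] >= lo && s->regs[2] <= hi;
}
```

A bounds check like `sp_in_bounds` is a cheap way to catch a runaway recursion or a wrong `sp` initialization before it corrupts host memory.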
How Emulation Stops: jalr and ret¶
The emulator runs while (pc != 0). We need the function's final ret to make pc become 0. Here is the chain:
- `ret` is a pseudo-instruction for `jalr x0, 0(ra)`: "jump to the address in `ra`."
- `jalr rd, offset(rs1)` sets `pc = regs[rs1] + offset` (and, if `rd != 0`, saves the return address in `rd`).
- We initialized `ra = 0`. So when the top-level function returns, `ret` computes `pc = regs[ra] + 0 = 0`.
- The loop condition `pc != 0` becomes false and emulation stops. The result is sitting in `a0` (`regs[10]`).
void emu_jalr(struct rv_state *rsp, uint32_t iw) {
uint32_t rs1 = get_rs1(iw);
// ret = jalr x0, 0(ra): rd is x0 (no link), offset is 0.
rsp->pc = rsp->regs[rs1]; // pc = ra; for the top call ra == 0 -> stop
}
flowchart TD
A["rv_init: ra = 0"] --> B["emulate instructions"]
B --> C{"pc != 0 ?"}
C -->|yes| B
C -->|no| D["return regs[10] (a0)"]
E["ret = jalr x0, 0(ra)"] -->|"pc = ra = 0"| C
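The handler shown earlier covers only `ret`. A fuller sketch of `jalr`, under the assumption that you also want the offset and the link register handled, might look like the following. The reduced struct `rv_core` and the name `jalr_full` are hypothetical; clearing bit 0 of the target follows the ISA definition of `jalr`.

```c
#include <stdint.h>

#define NREGS 32

// Hypothetical reduced state: just the registers and pc.
struct rv_core {
    uint64_t regs[NREGS];
    uint64_t pc;
};

static void jalr_full(struct rv_core *c, uint32_t iw) {
    uint32_t rd  = (iw >> 7)  & 0x1F;
    uint32_t rs1 = (iw >> 15) & 0x1F;
    // The 12-bit immediate sits in bits [31:20]. Casting to int32_t first
    // makes >> sign-extend it (arithmetic shift on gcc and clang).
    int64_t imm = (int64_t) ((int32_t) iw >> 20);

    // Compute the target before writing rd, in case rd and rs1 are the same.
    uint64_t target = (c->regs[rs1] + (uint64_t) imm) & ~(uint64_t) 1;

    if (rd != 0) {
        c->regs[rd] = c->pc + 4;   // link: address of the next instruction
    }
    c->pc = target;
}
```

With `ra = 0` and the word `0x00008067` (`ret`), this sets `pc` to 0 and writes no link register, so the halt sentinel still works.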
11. Extending the Emulator (Lab 6)¶
The Lab 6 starter gives you R-type decoding and the framework that compares your emulator's output against the C and assembly versions of each test program. Your job is to add enough instructions to run quadratic_s, midpoint_s, max3_s, and get_bitseq_s. For each program, the expected output is three result lines: one from the C version, one from the assembly version (Asm), and one from the emulator (Emu).
All three lines must agree. The instructions you will likely add include mv, sub, li, jal (and the j/call pseudo-instructions built on it), jalr (ret), and the conditional branches (beq, bne, blt, bge: the bCC family). The best approach is incremental: run a program, see which instruction is reported as unsupported, decode it, implement it, retest.
flowchart TD
A["Pick a target program<br>(start with quadratic)"] --> B["Run it; see which<br>instruction is unsupported"]
B --> C["Identify the format<br>from the opcode"]
C --> D["Decode the fields<br>with get_bits"]
D --> E["Implement the operation<br>and update pc"]
E --> F{"Emu == C == Asm ?"}
F -->|no| B
F -->|yes| G["Move to next program"]
Beyond R-Type: Other Formats You Will Need¶
These are summarized here for reference; the immediates for I/B/J types must be reassembled from their fields and sign-extended before use (sign extension uses arithmetic right shift, exactly the sra idea from Project 3).
| Format | Used by | Field notes |
|---|---|---|
| I-type | `addi`, `li`, `lw`, `lb`, `jalr` | 12-bit immediate in `[31:20]`, sign-extended |
| B-type | `beq`, `bne`, `blt`, `bge` | scattered 13-bit immediate; if taken `pc += imm`, else `pc += 4` |
| J-type | `jal`, `j`, `call` | scattered 21-bit immediate; `pc += imm` |
// Sign-extend the low (start+1) bits of value to a full 64-bit signed value.
static int64_t sign_extend(uint64_t value, int start) {
int shift = 63 - start;
return ((int64_t) (value << shift)) >> shift; // arithmetic shift right
}
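Reassembling the scattered B-type and J-type immediates is where most decoding mistakes happen, so a sketch may help. The bit positions follow the RISC-V base ISA encoding; `get_bits` and `sign_extend` are repeated from this lecture so the block stands alone, and the names `b_type_imm` and `j_type_imm` are assumptions for this example.

```c
#include <stdint.h>

static uint32_t get_bits(uint32_t iw, uint32_t start, uint32_t count) {
    return (iw >> start) & ((1u << count) - 1u);
}

// Sign-extend the low (start+1) bits of value (arithmetic right shift).
static int64_t sign_extend(uint64_t value, int start) {
    int shift = 63 - start;
    return ((int64_t) (value << shift)) >> shift;
}

// B-type: imm[12]=iw[31], imm[11]=iw[7], imm[10:5]=iw[30:25], imm[4:1]=iw[11:8].
// imm[0] is always 0, so the result is a 13-bit signed byte offset.
static int64_t b_type_imm(uint32_t iw) {
    uint64_t imm = ((uint64_t) get_bits(iw, 31, 1) << 12)
                 | ((uint64_t) get_bits(iw, 7, 1)  << 11)
                 | ((uint64_t) get_bits(iw, 25, 6) << 5)
                 | ((uint64_t) get_bits(iw, 8, 4)  << 1);
    return sign_extend(imm, 12);
}

// J-type: imm[20]=iw[31], imm[19:12]=iw[19:12], imm[11]=iw[20], imm[10:1]=iw[30:21].
// imm[0] is always 0, so the result is a 21-bit signed byte offset.
static int64_t j_type_imm(uint32_t iw) {
    uint64_t imm = ((uint64_t) get_bits(iw, 31, 1)  << 20)
                 | ((uint64_t) get_bits(iw, 12, 8)  << 12)
                 | ((uint64_t) get_bits(iw, 20, 1)  << 11)
                 | ((uint64_t) get_bits(iw, 21, 10) << 1);
    return sign_extend(imm, 20);
}
```

As a check, the word `0x00B50463` (`beq a0, a1` with a forward offset of 8) gives `b_type_imm` of 8, and `0xFFDFF06F` (a `jal x0` that jumps back one instruction) gives `j_type_imm` of -4.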
For example, an I-type addi is decoded as:
void emu_i_type(struct rv_state *rsp, uint32_t iw) {
uint32_t rd = get_rd(iw);
uint32_t rs1 = get_rs1(iw);
uint32_t funct3 = get_funct3(iw);
int64_t imm = sign_extend(get_bits(iw, 20, 12), 11);
if (funct3 == 0b000) { // addi (also li, mv)
rsp->regs[rd] = rsp->regs[rs1] + imm;
} else {
printf("Unsupported I-type funct3=%u\n", funct3);
exit(1);
}
rsp->pc += 4;
}
Note that li a0, 5 is addi a0, zero, 5 and mv a0, a1 is addi a0, a1, 0, so implementing addi correctly gives you li and mv for free.
Key Concepts¶
| Concept | Definition | Example |
|---|---|---|
| Machine code | Binary encoding of instructions executed by the CPU | `0x00B50533` |
| Instruction word (`iw`) | The 32-bit value encoding one instruction | `add a0,a0,a1` → `0x00B50533` |
| opcode | 7-bit field in `[6:0]` identifying the format | `0110011` = R-type |
| funct3 / funct7 | Sub-fields selecting the exact operation | funct3=000, funct7=0 → add |
| rd / rs1 / rs2 | 5-bit register fields (5 bits → 32 registers) | rd=01010 → a0 |
| Shift and mask | Extract a field: `(iw >> start) & mask` | `(iw>>12)&0x7` → funct3 |
| `get_bits` | Helper to extract `count` bits at `start` | `get_bits(iw,7,5)` → rd |
| Emulator | Software that models a target CPU's state and instructions | `struct rv_state` |
| PC (program counter) | Address of the next instruction; usually `pc += 4` | `pc = (uint64_t) func` |
| Fetch-decode-execute | The per-instruction cycle every CPU runs | `iw=*pc; decode; execute` |
| Halt sentinel | `ra = 0` so the final `ret` makes `pc = 0`, stopping the loop | `while (pc != 0)` |
Practice Problems¶
Problem 1: Decode an R-Type Instruction¶
Decode the instruction word 0x40A60633. What RISC-V assembly instruction does it represent?
Solution:
**Step 1: Convert to binary**
0x40A60633 = 0100 0000 1010 0110 0000 0110 0011 0011
**Step 2: Slice into R-type fields**
funct7 rs2 rs1 funct3 rd opcode
0100000 01010 01100 000 01100 0110011
[31:25] [24:20][19:15] [14:12][11:7] [6:0]
**Step 3: Interpret**
| Field | Binary | Decimal | Meaning |
|---|---|---|---|
| opcode | 0110011 | 51 | R-type |
| rd | 01100 | 12 | a2 |
| funct3 | 000 | 0 | ADD/SUB family |
| rs1 | 01100 | 12 | a2 |
| rs2 | 01010 | 10 | a0 |
| funct7 | 0100000 | 32 | SUB (bit 30 set) |
Since `funct3 = 000` and `funct7 = 0100000`, this is **SUB**.
**Answer:** `sub a2, a2, a0`
Problem 2: Extract a Field in C¶
Write a C expression (without get_bits) that extracts rs1 from an instruction word iw, and one that extracts funct7.
Solution:
`rs1` is bits `[19:15]`, so shift down by 15 and mask 5 bits: `(iw >> 15) & 0x1F`.
`funct7` is bits `[31:25]`, so shift down by 25 and mask 7 bits: `(iw >> 25) & 0x7F`.
Using the helper, these are `get_bits(iw, 15, 5)` and `get_bits(iw, 25, 7)`.
Problem 3: Why 5-Bit Register Fields?¶
The rd, rs1, and rs2 fields are each exactly 5 bits. Why 5? What is the maximum register number they can encode, and what happens if a design had only 4 bits?
Solution:
RISC-V has **32** general-purpose registers (`x0`–`x31`). To name any one of 32 things you need `ceil(log2(32)) = 5` bits, because `2^5 = 32`. A 5-bit field encodes values `0`–`31`, which maps exactly to `x0`–`x31`.
With only **4 bits** you could address only `2^4 = 16` registers (`x0`–`x15`), which is not enough: you could not name `x16`–`x31` at all. So 5 bits is the minimum field width that covers all 32 registers.
Problem 4: Trace rv_one on an add¶
Given rsp->regs[10] = 5, rsp->regs[11] = 7, rsp->pc pointing at an instruction whose word is 0x00B50533, trace what rv_one does and give the resulting register and PC state.
Solution:
**Fetch:** `iw = 0x00B50533`.
**Decode:** `opcode = 0x33` → R-type, so `emu_r_type` is called. Inside, the fields decode to `rd = 10`, `rs1 = 10`, `rs2 = 11`, `funct3 = 0`, and `funct7 = 0`, which selects `add`.
**Execute:** `regs[10] = regs[10] + regs[11] = 5 + 7 = 12`, then `pc += 4`.
**Result:** `regs[10]` (a0) becomes `12`; `regs[11]` (a1) is unchanged at `7`; `pc` advanced by 4. Back in the loop, `regs[0]` is forced to 0 (no effect here since a0 was written, not x0).
Problem 5: Where Does the Stack Pointer Start?¶
In rv_init, why do we set sp = &stack[STACK_SIZE] rather than &stack[0]? What would go wrong with &stack[0]?
Solution:
The RISC-V stack grows **downward** (toward lower addresses). A function allocates stack space with `addi sp, sp, -N` (subtracting), and frees it with `addi sp, sp, +N` (adding). For there to be room to grow downward, `sp` must start at the **top** (high-address end) of the array, which is `&stack[STACK_SIZE]` (one past the last valid byte, the conventional "empty stack" position).
If we started at `&stack[0]` (the bottom), the very first `addi sp, sp, -16` would move `sp` to `&stack[-16]`, *below* the array, and any store through `sp` would write out of bounds, corrupting memory or crashing. Starting at the top gives the full 8 KB of room to grow down.
Problem 6: How Does the Emulator Stop?¶
Explain the chain of events that causes rv_emulate's loop to terminate when the top-level function executes ret.
Solution:
1. `rv_init` sets `regs[1]` (`ra`) to `0`.
2. The loop runs `while (rsp->pc != 0)`.
3. `ret` is the pseudo-instruction `jalr x0, 0(ra)`.
4. The `jalr` handler sets `pc = regs[rs1] + offset`. Here `rs1 = ra` and `offset = 0`, so `pc = regs[ra] + 0 = 0`.
5. Control returns to the loop; the condition `pc != 0` is now false, so the loop exits.
6. The function's return value, which lives in `a0` (`regs[10]`), is returned by `rv_emulate`.
The `ra = 0` initialization is the "halt sentinel": there is no real instruction at address 0, but we never fetch from it because the loop checks `pc != 0` first.
Further Reading¶
- RISC-V ISA Specification (Volume 1) — Chapter 2 (RV32I), Chapter 5 (RV64I), Chapter 24 (instruction listings), Chapter 25 (pseudo-instructions)
- RISC-V Emulation Guide
- Lab 6: RISC-V Emulation
- RISC-V Reference Guide
- Course Key Concepts
- Source: handwritten lecture notes (PDF)
- Computer Organization and Design: RISC-V Edition (Patterson & Hennessy), Chapter 2
Summary¶
- Machine code is binary, fixed at 32 bits. The compiler emits assembly, the assembler emits 32-bit instruction words, and the processor (or our emulator) fetches and executes those words directly.
- An emulator mirrors the processor in software. Its state is a `struct rv_state` with 32 64-bit registers, a 64-bit `pc`, and an 8 KB `stack[]` array, mirroring the state a real RISC-V CPU keeps.
- The opcode (`[6:0]`) picks the format; funct3/funct7 pick the operation. R-type packs `funct7 | rs2 | rs1 | funct3 | rd | opcode`, with register fields 5 bits wide because 5 bits address 32 registers.
- Decoding is shift-and-mask. `(iw >> start) & mask` pulls out each field; a single `get_bits` helper keeps the six extractions consistent and easier to debug. Worked example: `0x00B50533` decodes to `add a0, a0, a1`.
- Fetch the instruction word with a pointer cast. `iw = *((uint32_t *) pc)` reads one 32-bit instruction; advancing a `uint32_t *` by 1 steps forward 4 bytes.
- The project is best called an emulator. It interprets one instruction at a time but models a specific machine (RISC-V), so "emulator" is the accurate term.
- Fetch-decode-execute runs in a loop until `pc == 0`. `rv_init` sets `ra = 0` so the top-level `ret` (`jalr x0, 0(ra)`) drives `pc` to 0 and stops the loop, leaving the answer in `a0`.
- Extend the emulator incrementally. Run a target program, implement whatever instruction it reports as unsupported (`mv`, `sub`, `li`, `jal`, `jalr`, the `bCC` branches), and retest until `Emu`, `C`, and `Asm` outputs all match.