← Back to Course
# Processor Instruction Decoding ## CS 315 Computer Architecture --- ## What is Decoding? The processor loops forever: **fetch → decode → execute → write back**
flowchart LR PC[PC] --> IMEM[Instruction Memory] IMEM -->|"IW 32 bits"| DEC[Decode] DEC --> EX["Execute: RegFile + ALU"] EX --> WB[Write Back] WB --> PCU[PC update] PCU --> PC
- **Instruction memory** hands us a raw 32-bit value (IW) - **Decoding** splits IW into *operands* and *control signals* --- ## Two Products of Decoding | Decoder | Output | Consumed by | |---------|--------|-------------| | **RegDecoder** | `rs1`, `rs2`, `rd` | Register file | | **ImmDecoder** | sign-extended immediate | ALU input mux | | **InstDecoder** | `RFW`, `ALUOp`, `ALUSrcB`, ... | Muxes, ALU, RegFile |
In Lab 9
you
were the decoder — you manually typed register numbers and set the ALU op. Today we replace you with circuits.
--- ## Target Program The first program we want to run automatically: ```text first_s: li a0, 1 # addi a0, zero, 1 (I-type) li a1, 2 # addi a1, zero, 2 (I-type) add a2, a0, a1 # R-type unimp # end-of-program marker ``` We need exactly **two instruction forms**: - **I-type**: `addi` - **R-type**: `add` --- ## RISC-V Instruction Word Layout ```text 31 25 24 20 19 15 14 12 11 7 6 0 ┌────────┬───────┬───────┬──────┬───────┬────────┐ │ funct7 │ rs2 │ rs1 │funct3│ rd │ opcode │ └────────┴───────┴───────┴──────┴───────┴────────┘ 7 bits 5 bits 5 bits 3 bits 5 bits 7 bits ``` - **opcode** — broad instruction family - **funct3** — sub-divides within opcode - **funct7** — final disambiguation (e.g., `add` vs. `sub`) - **rd, rs1, rs2** — register operands --- ## The RegDecoder Register fields sit in **fixed bit positions** across all formats — no logic needed, just a splitter.
flowchart LR IW["IW (32)"] --> SP[Splitter] SP -->|"bits 11-7"| RD["rd (5)"] SP -->|"bits 19-15"| RS1["rs1 (5)"] SP -->|"bits 24-20"| RS2["rs2 (5)"]
- `rs1` → `ReadReg0` (RR0) → `RD0` - `rs2` → `ReadReg1` (RR1) → `RD1` - `rd` → `WriteReg` (WR) — destination for write-back
The splitter always produces all three fields. Unused fields (e.g.,
rs2
for
addi
) are simply ignored because control will not enable them.
--- ## Immediates Are Harder Two complications vs. registers: 1. **Bits are scattered** — each format reassembles them differently 2. **Must be sign-extended** to 64 bits ```text I-type imm[11:0] = IW[31:20] (addi, lw) S-type imm[11:5] = IW[31:25], imm[4:0] = IW[11:7] (sw, sd) B-type imm[12|10:5|4:1|11], implicit imm[0]=0 (beq, blt) U-type imm[31:12] = IW[31:12] (lui) J-type imm[20|10:1|11|19:12], implicit imm[0]=0 (jal) ``` For **Part 1** we only need the **I-type** immediate. --- ## Sign Extension A narrow immediate must fill 64 bits — replicate the sign bit upward.
flowchart LR IMM["imm (12 bits)"] --> LOW["low 12 bits of result"] SIGN["sign bit imm[11]"] --> MUX{"2-to-1 MUX"} Z["0x0...0 (52 bits)"] --> MUX O["0xF...F (52 bits)"] --> MUX MUX --> HIGH["upper 52 bits of result"] LOW --> OUT["imm (64 bits)"] HIGH --> OUT
```c int64_t sign_extend(uint64_t value, int sign_bit_index) { int shift = 63 - sign_bit_index; // e.g., 52 for I-type return ((int64_t)(value << shift)) >> shift; } ``` --- ## ImmDecoder: Two Design Options **Option 1 — one output per type** (selection in datapath): ```text IW --/32--> [ ImmDecoder ] --/64--> imm-i --/64--> imm-s --/64--> imm-b --/64--> imm-u --/64--> imm-j ``` **Option 2 — single selected output** (selection inside decoder): ```text IW --/32--> [ ImmDecoder ] --/64--> imm ImmSel ----> ``` For Part 1: use Option 1, wire only `imm-i` to the ALUSrcB mux. --- ## The InstDecoder: Three Stages `IW → control signals` in three steps:
flowchart LR A["1. Slice opcode, funct3, funct7"] --> B["2. Match each instruction (comparators + AND)"] B --> C["3. Map match to control signals"]
- **Stage 1**: splitter extracts discriminating fields - **Stage 2**: equality comparators + AND produce one-hot matches - **Stage 3**: priority encoder + ROM look up the control word --- ## Stage 1: Slice the Discriminating Fields
flowchart LR IW["IW (32)"] --> S[Splitter] S -->|"6-0"| OPC["opc (7)"] S -->|"14-12"| F3["f3 (3)"] S -->|"31-25"| F7["f7 (7)"]
RISC-V layers decoding: - **opcode** → format / broad family - **funct3** → sub-divides within opcode - **funct7** → last disambiguation (`add` vs. `sub`) The number of fields compared depends on the instruction. --- ## Stage 2: Instruction Detectors **ADDI** — I-type, opcode `0010011`, funct3 `000`, no funct7 check:
flowchart LR OPC[opc] --> C1["= 0010011"] F3[f3] --> C2["= 000"] C1 --> A((AND)) C2 --> A A --> ADDI["is_addi"]
**ADD** — R-type, must check all three fields:
flowchart LR OPC2[opc] --> D1["= 0110011"] F32[f3] --> D2["= 000"] F72[f7] --> D3["= 0000000"] D1 --> B((AND)) D2 --> B D3 --> B B --> ADD["is_add"]
--- ## Detectors in C The same logic you wrote in the emulator, now as gates: ```c uint32_t opc = iw & 0x7F; // bits 6-0 uint32_t f3 = (iw >> 12) & 0x7; // bits 14-12 uint32_t f7 = (iw >> 25) & 0x7F; // bits 31-25 int is_addi = (opc == 0b0010011) && (f3 == 0b000); int is_add = (opc == 0b0110011) && (f3 == 0b000) && (f7 == 0b0000000); ``` Each new instruction adds one more detector like these.
The spreadsheet keeps a "paste-ready"
0b…
column so you can copy constants directly into Digital comparator fields.
--- ## Stage 3: Priority Encoder → inum Feed one-hot detector outputs into a **priority encoder**:
flowchart LR A0["is_addi (in0)"] --> PE[Priority Encoder] A1["is_add (in1)"] --> PE X2["in2.."] --> PE PE -->|"3 bits"| INUM["inum"]
| inst | inum | |------|------| | `addi` | 0 | | `add` | 1 | **Why priority, not plain?** A plain encoder is undefined for zero or multiple inputs. A priority encoder produces a safe, deterministic output in both cases. --- ## Control Signals for Part 1 | Signal | Bits | Meaning | |--------|------|---------| | `RFW` | 1 | Register File Write enable | | `ALUOp` | 3 | ALU operation (`000`=add, `001`=sub, ...) | | `ALUSrcB` | 1 | `0`=register RD1, `1`=immediate | | inst | inum | RFW | ALUOp | ALUSrcB | |------|------|-----|-------|---------| | `addi` | 0 | 1 | 000 | 1 | | `add` | 1 | 1 | 000 | 0 | The only difference: **ALUSrcB** — does B come from a register or the immediate? --- ## Implementation A: Mux Tree Pack each instruction's control bits into a 5-bit constant `[RFW | ALUOp | ALUSrcB]`: ```text addi (inum 0): 1 000 1 = 0b10001 add (inum 1): 1 000 0 = 0b10000 ```
flowchart LR C0["0b10001 (addi)"] --> M{"MUX sel=inum"} C1["0b10000 (add)"] --> M M -->|"5 bits"| CTRL[control word] CTRL --> SP[Splitter] SP -->|"bit 0"| ASB[ALUSrcB] SP -->|"bits 3-1"| AOP[ALUOp] SP -->|"bit 4"| RFW[RFW]
Works, but grows by one hard-coded constant per instruction. --- ## Implementation B: Control ROM A mux-of-constants *is* a lookup table. Replace it with a **ROM**:
flowchart LR INUM["inum (3)"] --> ROM[ROM] ROM -->|"Q 5 bits"| CTRL[control word] CTRL --> SP[Splitter] SP -->|"bit 0"| ASB[ALUSrcB] SP -->|"bits 3-1"| AOP[ALUOp] SP -->|"bit 4"| RFW[RFW]
```text address data meaning 0 0b10001 addi: RFW=1 ALUOp=000 ALUSrcB=1 1 0b10000 add: RFW=1 ALUOp=000 ALUSrcB=0 2-7 0b00000 unused / safe default (RFW=0) ``` --- ## Why ROM is Better | | Mux tree | ROM | |---|----------|-----| | Add instruction | new mux input + hand-edit literal | add a row to spreadsheet | | Source of truth | scattered constants in schematic | one spreadsheet | | Automation | manual | Python script generates `.hex` |
The spreadsheet
Output bits
column is the ROM data at address INUM. Keep bit order consistent: low bit on the right.
--- ## The Control Spreadsheet One row per instruction: | Section | Columns | |---------|---------| | Identity | Instruction, Mnemonic, Format, INUM | | Inputs | opcode, funct3, funct7 (use `x` for don't-care) | | Control outputs | RFW, ALUOp, ALUSrcB, ... | | Convenience | `0b…` literals, Output bits (ROM data) | Build incrementally: get `addi` + `add` working, then add the next group. --- ## ROM Bit-Order Convention **Layout**: `[ RFW | ALUOp | ALUSrcB ]` — low bit on the right ```text bit position: 4 3 2 1 0 field: RFW ALUOp ALUSrcB addi → RFW=1, ALUOp=000, ALUSrcB=1 → 1_000_1 = 0b10001 add → RFW=1, ALUOp=000, ALUSrcB=0 → 1_000_0 = 0b10000 ``` After the ROM, a **splitter** recovers individual lines: - bits 0-0 → `ALUSrcB` - bits 3-1 → `ALUOp` - bit 4-4 → `RFW` --- ## Recipe for Adding a New Instruction
flowchart TD P[Pick instruction or group] --> C["1. Add/modify components"] C --> D["2. Add/modify datapath wires"] D --> M["3. Add/modify muxes"] M --> I["4. Update InstDecoder + spreadsheet/ROM"] I --> T[Test, then repeat] T --> P
--- ## The Mux Input-0 Convention When you add a **new control output** for a new instruction: - Wire the **original (pre-existing) path** to **input 0** of any new mux - Existing instructions default the new control line to `0` - Their behavior is automatically preserved **Example**: When adding `JAL`, put PC+4 on input 0 of the `PCsel` mux. Every non-jump instruction sets `PCsel=0` and keeps fetching sequentially.
You must fill in the new control column for
all existing
instructions, not just the new one.
--- ## End-to-End Trace: `add a2, a0, a1` IW = `0x00B50633` = `0000000 01011 01010 000 01100 0110011` **RegDecoder:** ```text rd = IW[11:7] = 01100 = 12 -> a2 rs1 = IW[19:15] = 01010 = 10 -> a0 (RD0 = 1) rs2 = IW[24:20] = 01011 = 11 -> a1 (RD1 = 2) ``` **InstDecoder:** ```text opc=0110011, f3=000, f7=0000000 -> is_add=1 -> inum=1 ROM[1] = 0b10000 -> RFW=1, ALUOp=000, ALUSrcB=0 ``` **Execute:** `ALU.A=1, ALU.B=RD1=2, result=3 -> a2=3` --- ## Contrast: `addi a0, zero, 1` Same pipeline, different control word: ```text opc=0010011, f3=000 -> is_addi=1 -> inum=0 ROM[0] = 0b10001 -> RFW=1, ALUOp=000, ALUSrcB=1 ``` **Execute:** `ALU.A=RD0=0 (zero), ALU.B=imm-i=1, result=1 -> a0=1`
**`add`** - ALUSrcB = **0** (register) - B = RD1 = a1
**`addi`** - ALUSrcB = **1** (immediate) - B = imm-i = 1
--- ## Extending to `sub` `sub` shares opcode (`0110011`) and funct3 (`000`) with `add` — only **funct7** differs: ```text add: funct7 = 0000000 sub: funct7 = 0100000 ``` Without the funct7 comparator both detectors fire — wrong ALU op runs. New ROM row for `sub` (inum 2, ALUOp=001): ```text RFW=1, ALUOp=001, ALUSrcB=0 -> 0b10010 ``` This is why RISC-V layers opcode → funct3 → funct7. --- ## Extending to `lw`: 4-Step Walkthrough **1. Components** — add Data Memory (RAM) **2. Datapath** — route ALU result to RAM address; RAM data output toward RegFile write-data **3. Muxes** — add `M2R` (Mem2Reg) mux: input 0 = ALU result, input 1 = memory data **4. InstDecoder** — add detector `is_lw = (opc==0000011) & (f3==010)`; new spreadsheet column `M2R`; existing instructions get `M2R=0` ```text lw row: RFW=1, ALUOp=000(add), ALUSrcB=1, M2R=1 ``` --- ## Key Concepts Reference | Concept | Summary | |---------|---------| | **IW** | 32-bit value from instruction memory | | **RegDecoder** | Splitter: `rd=IW[11:7]`, `rs1=IW[19:15]`, `rs2=IW[24:20]` | | **ImmDecoder** | Reassembles scattered bits + sign-extends to 64 bits | | **Sign extension** | Replicate sign bit into all upper positions | | **InstDecoder** | Comparators + AND → priority encoder → ROM | | **inum** | Compact internal instruction number (ROM address) | | **Control word** | Concatenated control bits; `[RFW\|ALUOp\|ALUSrcB]` | | **Control ROM** | `inum` → control word lookup table | --- ## Project06 Incremental Grading Project06 requires **four directories**: `part1`, `part2`, `part3`, `final` Each directory has matching spreadsheet sheets — concrete evidence of incremental development.
Build one instruction-group at a time. A small, working decoder is far easier to debug than one large decoder built all at once.
The spreadsheet → ROM pipeline is the **single source of truth**. A Python script generates the ROM `.hex` file from the spreadsheet automatically. --- ## Summary 1. **RegDecoder** is just a splitter — register fields are in fixed positions across all formats 2. **ImmDecoder** reassembles scattered bits and sign-extends to 64 bits; two interface options (per-type vs. selected) 3. **InstDecoder** identifies instructions via comparators + AND gates on `opcode`, `funct3`, `funct7` 4. **Priority encoder** collapses one-hot matches into compact `inum` 5. **Control ROM** maps `inum` → control word; contents come from the spreadsheet 6. **Adding an instruction** = 4-step recipe: components, datapath, muxes, decoder; new mux input 0 = original path