Processor Instruction Decoding¶

Overview¶

This lecture turns the partial single-cycle processor from Lab 9 into a machine that can execute programs automatically — without you manually typing register numbers and ALU operations into the datapath. The missing piece is decoding: taking the raw 32-bit instruction word (IW) fetched from instruction memory and (1) extracting the register and immediate operands the datapath needs, and (2) generating the control signals that steer the muxes, the ALU, and the register file. We build three decoders — the RegDecoder, the ImmDecoder, and the InstDecoder — and develop a spreadsheet-driven, ROM-backed methodology for the control logic so that adding a new instruction is a small, repeatable edit rather than a redesign. This is the conceptual backbone of Project06.

Learning Objectives¶

Explain the role of decoding in the fetch–decode–execute cycle of a single-cycle processor
Build a RegDecoder that extracts rs1, rs2, and rd from the instruction word with a splitter
Build an ImmDecoder and compare the two design options (per-type outputs vs. a selected output)
Sign-extend a narrow immediate to a full 64-bit operand and explain why it is required
Identify a specific instruction by comparing the opcode, funct3, and funct7 fields with equality comparators and AND gates
Use a priority encoder to collapse one-hot instruction matches into a compact instruction number (inum)
Map inum to control signals with either a mux tree or a ROM, and split the control word into individual control lines
Drive the control spreadsheet that generates the ROM contents, and follow the 4-step recipe for adding new instructions

Prerequisites¶

The single-cycle processor components: PC, instruction memory (ROM), register file, ALU (Lab 10, Project06 Part 1)
RISC-V instruction formats — R, I, S, B, U, J — and which bit fields carry the opcode, funct3, funct7, registers, and immediates (Lab03, Project04)
Two's complement and sign extension (Project03, Lab: Bit Manipulation)
Digital combinational components: splitters, equality comparators, multiplexers, decoders, priority encoders, and ROM (Lab09, Project05)
Bit manipulation: masking and shifting to extract bit fields (Lab: Bit Manipulation)

1. Where Decoding Fits in the Processor¶

A single-cycle processor repeats the same loop forever: fetch the instruction at the PC, figure out what it means, do it, and advance the PC. The hardware version of "figure out what it means" is decoding.

flowchart LR
    PC[PC] --> IMEM[Instruction Memory]
    IMEM -->|IW 32 bits| DEC[Decode]
    DEC --> EX[Execute: RegFile + ALU]
    EX --> WB[Write Back]
    WB --> PCU[PC update]
    PCU --> PC

    style DEC fill:#f9f,stroke:#333,stroke-width:2px

The instruction memory hands us a single 32-bit value — the instruction word (IW). By itself the IW is just bits. Decoding splits it into two completely different kinds of information:

Kind of information	Who produces it	Who consumes it
Operands — `rs1`, `rs2`, `rd`, immediate	RegDecoder, ImmDecoder	RegFile, ALU input muxes
Control — RFW, ALUOp, ALUSrcB, ...	InstDecoder	RegFile write-enable, ALU, muxes

The key mental model for the whole lecture is two short "equations":

RegDecoder / ImmDecoder:   IW  -->  rs1, rs2, rd, imm     (operand data)
InstDecoder:               IW  -->  control signals       (what to do)

In Lab 9 you were the decoder: you read the instruction, typed the register numbers into the read/write selectors, and set the ALU operation by hand. Today we replace you with three circuits so the processor can run a program on its own.

The first program we want to run automatically is the one from the Project06 guide:

first_s:
    li   a0, 1          # really: addi a0, zero, 1   (I-type)
    li   a1, 2          # really: addi a1, zero, 2   (I-type)
    add  a2, a0, a1     # R-type
    unimp               # end-of-program marker

To run this we need exactly two instruction forms working: an I-type (addi) and an R-type (add). That is our target for the first decoder.

2. The RegDecoder¶

The RegDecoder answers: which registers does this instruction touch? Whenever a RISC-V format includes rd, rs1, or rs2, that field occupies the same bit positions (R-type has all three; I-type lacks rs2; S- and B-type lack rd; U- and J-type have only rd), so extraction is trivial: it is one splitter, no logic.

IW (32 bits)
  bits  6-0   opcode
  bits 11-7   rd      <- destination register (5 bits)
  bits 14-12  funct3
  bits 19-15  rs1     <- first source register (5 bits)
  bits 24-20  rs2     <- second source register (5 bits)
  bits 31-25  funct7

flowchart LR
    IW["IW (32)"] --> SP[Splitter]
    SP -->|"bits 11-7"| RD["rd (5)"]
    SP -->|"bits 19-15"| RS1["rs1 (5)"]
    SP -->|"bits 24-20"| RS2["rs2 (5)"]

These three 5-bit outputs wire straight into the register file (see Project06 Part 1):

rs1 → ReadReg0 (RR0), producing RD0
rs2 → ReadReg1 (RR1), producing RD1
rd → WriteReg (WR), the destination for the write-back

A subtle but important point: the splitter always produces all three fields, even for instructions that do not use them. An addi has no rs2, but bits 24-20 still contain something. That is fine — those bits are simply ignored downstream because the control logic will not enable anything that reads rs2 for an addi. Decoders extract; control decides what matters.
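The splitter can be modelled in a few lines of C. This is a minimal sketch rather than part of the project code; the names `RegFields`, `bits`, and `reg_decode` are invented for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical C model of the RegDecoder: three fixed bit slices of the IW. */
typedef struct {
    uint8_t rd;   /* IW[11:7]  */
    uint8_t rs1;  /* IW[19:15] */
    uint8_t rs2;  /* IW[24:20] */
} RegFields;

/* Return `width` bits of `iw`, starting at bit position `lo`. */
uint32_t bits(uint32_t iw, int lo, int width) {
    assert(lo >= 0 && width > 0 && width < 32 && lo + width <= 32);
    return (iw >> lo) & ((1u << width) - 1u);
}

/* Always produces all three fields, whether or not the instruction uses them. */
RegFields reg_decode(uint32_t iw) {
    RegFields f;
    f.rd  = (uint8_t)bits(iw, 7, 5);
    f.rs1 = (uint8_t)bits(iw, 15, 5);
    f.rs2 = (uint8_t)bits(iw, 20, 5);
    return f;
}
```

For `addi a0, zero, 1` (`0x00100513`) the model returns `rs2 = 1`, because bits 24-20 overlap the low bits of the immediate. That is the "contains something, but is ignored" case described above.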

3. The ImmDecoder¶

Immediates are harder than registers for two reasons:

The bits are scattered. RISC-V deliberately keeps the sign bit (always IW[31]) and the register fields in the same positions across formats to simplify hardware, but the remaining immediate bits are split up and reordered differently for each format. An I-type immediate is a clean 12-bit field; B-type and J-type immediates are chopped into pieces and reassembled with an implicit low-order zero.
They must be sign-extended. The encoded immediate is narrow (12 or 20 bits) but the datapath is 64 bits wide. A negative immediate like -1 is 0xFFF in 12 bits and must become 0xFFFFFFFFFFFFFFFF in 64 bits.

Immediate layouts by format¶

I-type   imm[11:0]  = IW[31:20]                         (e.g. addi, lw, jalr)
S-type   imm[11:5]  = IW[31:25],  imm[4:0] = IW[11:7]   (e.g. sw, sd)
B-type   imm[12|10:5|4:1|11], implicit imm[0]=0          (e.g. beq, blt)
U-type   imm[31:12] = IW[31:12]                          (e.g. lui)
J-type   imm[20|10:1|11|19:12], implicit imm[0]=0        (e.g. jal)

For the first processor we only need the I-type immediate (for addi). Later, JAL needs the J-type immediate and the branches need the B-type immediate. The ImmDecoder is built once to produce all of them.
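The B-type and J-type reassembly is easier to check in code than in prose. The sketch below follows the bit placement of the RISC-V base encoding; the function names `imm_b` and `imm_j` are invented for illustration, and the helper relies on the compiler performing an arithmetic right shift on signed values (true for GCC and Clang, but implementation-defined in C).

```c
#include <assert.h>
#include <stdint.h>

/* Shift-based sign extension; assumes arithmetic right shift of signed values. */
int64_t sign_extend(uint64_t value, int sign_bit_index) {
    assert(sign_bit_index >= 0 && sign_bit_index < 64);
    int shift = 63 - sign_bit_index;
    return ((int64_t)(value << shift)) >> shift;
}

/* B-type: imm[12|10:5] = IW[31|30:25], imm[4:1|11] = IW[11:8|7], imm[0] = 0. */
int64_t imm_b(uint32_t iw) {
    uint64_t imm = 0;
    imm |= (uint64_t)((iw >> 31) & 0x1)  << 12;
    imm |= (uint64_t)((iw >> 7)  & 0x1)  << 11;
    imm |= (uint64_t)((iw >> 25) & 0x3F) << 5;
    imm |= (uint64_t)((iw >> 8)  & 0xF)  << 1;
    return sign_extend(imm, 12);          /* sign bit is imm[12] */
}

/* J-type: imm[20|10:1|11|19:12] = IW[31|30:21|20|19:12], imm[0] = 0. */
int64_t imm_j(uint32_t iw) {
    uint64_t imm = 0;
    imm |= (uint64_t)((iw >> 31) & 0x1)   << 20;
    imm |= (uint64_t)((iw >> 12) & 0xFF)  << 12;
    imm |= (uint64_t)((iw >> 20) & 0x1)   << 11;
    imm |= (uint64_t)((iw >> 21) & 0x3FF) << 1;
    return sign_extend(imm, 20);          /* sign bit is imm[20] */
}
```

In hardware each `|=` line corresponds to one splitter output wired into the matching input range of a merger; no gates are involved until the sign extension.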

Sign extension¶

Sign extension copies the immediate's most-significant bit (its sign bit) into every higher bit position. In hardware this is a mux that selects the upper bits:

sign bit s = imm[top]
   if s == 0:  upper bits = 0x000...0   (positive: zero-fill)
   if s == 1:  upper bits = 0xFFF...F   (negative: one-fill)

flowchart LR
    IMM["imm (12 bits)"] --> LOW["low 12 bits of result"]
    SIGN["sign bit imm[11]"] --> MUX{"2-to-1 MUX"}
    Z["0x0...0 (52 bits)"] --> MUX
    O["0xF...F (52 bits)"] --> MUX
    MUX --> HIGH["upper 52 bits of result"]
    LOW --> OUT["imm (64 bits)"]
    HIGH --> OUT

In Digital this is one mux per sign-extender: input 0 is all zeros, input 1 is all ones, and the sign bit is the selector. The C model is exactly the sign_extend you wrote in the emulator:

int64_t sign_extend(uint64_t value, int sign_bit_index) {
    int shift = 63 - sign_bit_index;       // e.g. 63 - 11 = 52 for I-type
    return ((int64_t)(value << shift)) >> shift;  // arithmetic right shift fills with sign
}
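The shift-based version above gives the right answer but hides the mux. A closer model of the circuit, written as a sketch for the 12-bit I-type case only (the name `sign_extend_mux_12` is invented for illustration), selects the upper 52 bits from two constants:

```c
#include <assert.h>
#include <stdint.h>

/* Mux-style model of the 12-bit sign extender: the sign bit selects
 * between an all-zeros and an all-ones constant for the upper 52 bits. */
uint64_t sign_extend_mux_12(uint32_t imm12) {
    assert(imm12 <= 0xFFFu);                    /* input must fit in 12 bits */
    static const uint64_t mux_in[2] = {
        0x0000000000000ULL,                     /* input 0: 52 zero bits */
        0xFFFFFFFFFFFFFULL                      /* input 1: 52 one bits  */
    };
    unsigned sel  = (imm12 >> 11) & 1u;         /* selector = imm[11]    */
    uint64_t high = mux_in[sel];                /* upper 52 bits         */
    return (high << 12) | (uint64_t)imm12;      /* merge high and low    */
}
```

The result is returned as an unsigned bit pattern, since the wire in the datapath carries bits rather than a signed number: `0xFFF` becomes `0xFFFFFFFFFFFFFFFF` and `0x001` stays `0x1`.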

Two design options for the ImmDecoder¶

The lecture presented two interface choices for packaging the ImmDecoder. They compute the same immediates; they differ in how the result reaches the datapath.

Option #1 — one output per immediate type. The box takes the 32-bit IW and produces a separate 64-bit output for each format: imm-i, imm-s, imm-b, imm-u, imm-j. The datapath (the ALUSrcB mux) then picks which one to use.

        +------------------+
        |                  |--/64--> imm-i
        |                  |--/64--> imm-s
IW --/32| ImmDecoder       |--/64--> imm-b
        |   (Option #1)    |--/64--> imm-u
        |                  |--/64--> imm-j
        +------------------+

Option #2 — one output, selected internally. The box takes the 32-bit IW and an ImmSel control input that says which format to decode, and produces a single 64-bit imm output.

              ImmSel
                |
        +-------v----------+
IW --/32|   ImmDecoder     |--/64--> imm
        |   (Option #2)    |
        +------------------+

	Option #1 (per-type outputs)	Option #2 (selected output)
Extra control line	none	needs `ImmSel` from InstDecoder
Wiring at top level	one wire per type into a mux	a single `imm` wire
Selection happens	in the datapath mux	inside the ImmDecoder
Mux fan-in	grows with formats	constant

For the first processor (only addi needs an immediate) the simplest move is to take Option #1 and wire just the imm-i output to the ALUSrcB mux. As more formats come online, either option works; the instructor planned to write up an alternate version of the guide using Option #2. The trade-off is "where does the format selection live" — in the datapath, or inside the decoder driven by a control line.
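The two interfaces can be compared side by side in C. This sketch covers only the I-type and S-type immediates to stay short; the type and function names, and the `ImmSel` encoding (0 = I, 1 = S), are assumptions made for illustration and are not taken from the project guide.

```c
#include <assert.h>
#include <stdint.h>

/* Assumes arithmetic right shift of signed values (GCC/Clang behaviour). */
int64_t sign_extend(uint64_t value, int sign_bit_index) {
    assert(sign_bit_index >= 0 && sign_bit_index < 64);
    int shift = 63 - sign_bit_index;
    return ((int64_t)(value << shift)) >> shift;
}

int64_t imm_i(uint32_t iw) { return sign_extend(iw >> 20, 11); }
int64_t imm_s(uint32_t iw) {
    return sign_extend(((uint64_t)(iw >> 25) << 5) | ((iw >> 7) & 0x1F), 11);
}

/* Option #1: every immediate is always produced; the datapath mux picks one. */
typedef struct { int64_t i; int64_t s; } ImmOutputs;
ImmOutputs imm_decode_all(uint32_t iw) {
    ImmOutputs out = { imm_i(iw), imm_s(iw) };
    return out;
}

/* Option #2: a control input chooses the format inside the decoder. */
enum ImmSel { IMM_I = 0, IMM_S = 1 };   /* hypothetical encoding */
int64_t imm_decode_sel(uint32_t iw, enum ImmSel sel) {
    assert(sel == IMM_I || sel == IMM_S);
    return (sel == IMM_S) ? imm_s(iw) : imm_i(iw);
}
```

Both versions compute identical values. The only difference is whether the caller (the datapath) or the decoder holds the selection, which mirrors the table above.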

4. The InstDecoder: Recognizing an Instruction¶

The InstDecoder is the interesting one. Its job is IW → control signals. It works in three stages:

Slice out the discriminating fields of the IW (opcode, funct3, funct7).
Match those fields against the encodings of each supported instruction to produce a one-hot "this is instruction k" signal.
Map the match to control-line values.

This section covers stages 1 and 2. Section 5 covers stage 3.

Slicing the fields¶

A splitter pulls out exactly the bits that distinguish one instruction from another:

IW (32 bits)
   bits  6-0   --> opc   (opcode, 7 bits)
   bits 14-12  --> f3    (funct3, 3 bits)
   bits 31-25  --> f7    (funct7, 7 bits)

flowchart LR
    IW["IW (32)"] --> S[Splitter]
    S -->|"6-0"| OPC["opc (7)"]
    S -->|"14-12"| F3["f3 (3)"]
    S -->|"31-25"| F7["f7 (7)"]

Why these three fields? RISC-V layers its decoding:

opcode picks the format / broad family (I-type arithmetic vs. R-type arithmetic vs. loads vs. branches ...).
funct3 sub-divides within an opcode (add/sub vs. sll vs. slt ... all share the R-type opcode but differ in funct3).
funct7 disambiguates the last remaining ambiguity (e.g. add vs. sub share opcode and funct3, and differ only in funct7).

So the number of fields you must compare depends on the instruction.

Matching with equality comparators + AND¶

Each supported instruction becomes a small "detector": compare the necessary fields to the instruction's encoding with equality comparators, then AND the results. The output is 1 exactly when the current IW is that instruction.

ADDI is an I-type. Its opcode is 0010011 and its funct3 is 000. We do not care about funct7 (I-type does not have one — those bits are part of the immediate). Two comparators, one AND:

opc ==(7) 0010011 ----+
                      AND ---> is_addi
f3  ==(3)    000   ---+

flowchart LR
    OPC[opc] --> C1["= 0010011 (7-bit)"]
    F3[f3] --> C2["= 000 (3-bit)"]
    C1 --> A((AND))
    C2 --> A
    A --> ADDI["is_addi"]

ADD is an R-type. Its opcode is 0110011, funct3 is 000, and funct7 is 0000000. Because R-type arithmetic packs many operations under one opcode, we must check all three fields. Three comparators, one 3-input AND:

opc ==(7) 0110011 ----+
f3  ==(3)    000   ----AND ---> is_add
f7  ==(7) 0000000 ----+

flowchart LR
    OPC[opc] --> C1["= 0110011 (7-bit)"]
    F3[f3] --> C2["= 000 (3-bit)"]
    F7[f7] --> C3["= 0000000 (7-bit)"]
    C1 --> A((AND))
    C2 --> A
    C3 --> A
    A --> ADD["is_add"]

This is exactly the hand-decode you did in the emulator, rendered as gates. In C the same two detectors are:

uint32_t opc = iw         & 0x7F;   // bits  6-0
uint32_t f3  = (iw >> 12) & 0x7;    // bits 14-12
uint32_t f7  = (iw >> 25) & 0x7F;   // bits 31-25

int is_addi = (opc == 0b0010011) && (f3 == 0b000);
int is_add  = (opc == 0b0110011) && (f3 == 0b000) && (f7 == 0b0000000);

Each instruction you add to the processor adds one more detector like these. The detectors fan into the next stage.

The handwritten "0b" convention¶

In the notes the comparison constants are written with a leading 0b (e.g. 0b 0010011). That is not a circuit element — it is a reminder that the spreadsheet keeps a "paste-ready" column so you can copy 0b0010011 straight into a Digital comparator's value field without re-typing it in binary by hand.

5. From One-Hot Matches to Control Signals¶

We now have a one-hot bundle of detector outputs (is_addi, is_add, ...). The cleanest way to turn that into control lines is in two steps: encode the matches into a small instruction number (inum), then look up the control word for that inum.

Step 1: Priority encoder → inum¶

Feed the detector outputs into a priority encoder. With is_addi on input 0 and is_add on input 1, the encoder outputs a 3-bit inum:

        Priority Encoder
   is_addi --> in0 \
   is_add  --> in1  |--> inum (3 bits)
   (unused)-> in2   |
   (unused)-> in3  /
   (unused)-> in4 /

flowchart LR
    A0["is_addi"] --> PE[Priority Encoder]
    A1["is_add"] --> PE
    X2["(in2)"] --> PE
    X3["(in3)"] --> PE
    X4["(in4)"] --> PE
    PE -->|"3 bits"| INUM["inum"]

Why a priority encoder and not a plain one? A plain binary encoder assumes exactly one input is high. The priority encoder is robust: if more than one input is high by mistake, it outputs the number of the highest-priority active input, and if no input is high (an unrecognized instruction) it still outputs a defined value, zero. Note that inum = 0 is also the number assigned to addi below, so inum alone cannot tell "no match" apart from addi; if unrecognized instructions must be harmless, either reserve a slot whose control word has RFW = 0 or gate RFW with a separate "some detector fired" signal. Each supported instruction is assigned a fixed inum:

inst	inum
`addi`	0
`add`	1

inum is the compact "name" of the instruction inside the processor. Everything past this point is driven by inum, not by the raw IW.
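The encoder's behaviour for zero, one, or several active inputs can be sketched in C. Two things here are assumptions rather than facts from the lecture: the highest-numbered active input wins, and the model exposes an extra `any` flag that reports whether anything matched. The names `PriEnc` and `priority_encode` are invented for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of a priority encoder.
 * Bit i of `matches` is detector i (bit 0 = is_addi, bit 1 = is_add). */
typedef struct {
    unsigned inum;  /* number of the highest-numbered active input, 0 if none */
    int any;        /* 1 if at least one input is active, else 0 */
} PriEnc;

PriEnc priority_encode(uint32_t matches, int n_inputs) {
    assert(n_inputs > 0 && n_inputs <= 32);
    PriEnc out = { 0, 0 };
    /* Scan from the highest input down so the highest active one wins. */
    for (int i = n_inputs - 1; i >= 0; i--) {
        if ((matches >> i) & 1u) {
            out.inum = (unsigned)i;
            out.any = 1;
            break;
        }
    }
    return out;
}
```

With no match the model returns `inum = 0` and `any = 0`; with only `is_addi` active it returns `inum = 0` and `any = 1`. The `inum` values are the same in both cases, which is why a separate signal is needed if the two must be told apart.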

Step 2: inum → control word¶

Now we map inum to the actual control-line values. The lecture showed two equivalent implementations.

For the first processor the control signals are:

Signal	Bits	Meaning
`RFW`	1	Register File Write enable — write the result back to `rd`?
`ALUOp`	3	Which ALU operation (`000`=add, `001`=sub, ...)
`ALUSrcB`	1	ALU B operand: `0` = register `RD1`, `1` = immediate

The control values per instruction:

inst	inum	RFW (1)	ALUOp (3)	ALUSrcB (1)
`addi`	0	1	000	1
`add`	1	1	000	0

Read the rows as intent:

addi a0, zero, 1 — compute zero + 1 in the ALU and write a0. So RFW=1 (yes, write), ALUOp=000 (add), ALUSrcB=1 (B comes from the immediate 1).
add a2, a0, a1 — compute a0 + a1 and write a2. So RFW=1, ALUOp=000 (add), ALUSrcB=0 (B comes from register a1 = RD1).

The only difference between the two is ALUSrcB: where does the second ALU operand come from?

6. Implementation A — Mux Tree¶

The first implementation packs each instruction's control bits into a constant and uses a mux selected by inum.

Concatenate the control bits into one word, lowest bit position on the right (this ordering convention matters — keep it consistent everywhere). With the layout [ RFW | ALUOp | ALUSrcB ] the word is 5 bits wide (1 + 3 + 1):

                bit:  4   3 2 1   0
                     RFW   ALUOp  ALUSrcB
addi (inum 0):        1    0 0 0    1      = 1_0001
add  (inum 1):        1    0 0 0    0      = 1_0000

A mux selects between these two constants using inum:

   1_0001 --> in0 \
   1_0000 --> in1  >-- (sel = inum) --> control (5 bits)
                  /

flowchart LR
    C0["1_0001 (addi)"] --> M{"MUX (sel = inum)"}
    C1["1_0000 (add)"] --> M
    M -->|"5 bits"| CTRL["control word"]

Then a splitter breaks the 5-bit control word back into individual control lines. The bit ranges follow the layout above:

control (5 bits)
   bit  0-0  --> ALUSrcB
   bits 3-1  --> ALUOp
   bit  4-4  --> RFW

flowchart LR
    CTRL["control (5)"] --> SP[Splitter]
    SP -->|"0-0"| ASB["ALUSrcB"]
    SP -->|"3-1"| AOP["ALUOp"]
    SP -->|"4-4"| RFW["RFW"]

Those three lines wire to the datapath:

ALUSrcB → the ALUSrcB mux selector
ALUOp → the ALU operation input
RFW → the register file WriteEn

This works, but the mux input list grows by one constant per instruction, and you must hand-edit a 5-bit binary literal for each. That is where the ROM comes in.
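The pack, select, and split steps above can be written out as a small C sketch, assuming the `[ RFW | ALUOp | ALUSrcB ]` layout from this section. The names `Control`, `control_pack`, `control_mux`, and `control_split` are invented for illustration.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    unsigned rfw;      /* bit 4     */
    unsigned aluop;    /* bits 3-1  */
    unsigned alusrcb;  /* bit 0     */
} Control;

/* Concatenate the control bits into one 5-bit word. */
uint8_t control_pack(Control c) {
    assert(c.rfw <= 1 && c.aluop <= 7 && c.alusrcb <= 1);
    return (uint8_t)((c.rfw << 4) | (c.aluop << 1) | c.alusrcb);
}

/* The mux: one constant per instruction, selected by inum. */
uint8_t control_mux(unsigned inum) {
    static const uint8_t inputs[2] = {
        0x11,  /* in0: addi = 0b10001 */
        0x10   /* in1: add  = 0b10000 */
    };
    assert(inum < 2);
    return inputs[inum];
}

/* The splitter: recover the individual control lines. */
Control control_split(uint8_t word) {
    Control c;
    c.alusrcb = word & 0x1u;
    c.aluop   = (word >> 1) & 0x7u;
    c.rfw     = (word >> 4) & 0x1u;
    return c;
}
```

Packing and splitting must agree on the bit order. If one side treats `ALUSrcB` as the high bit and the other as the low bit, every instruction still produces a control word, just the wrong one, which is why the ordering convention has to be kept consistent.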

7. Implementation B — Control ROM¶

The cleaner implementation replaces the mux tree with a ROM. The insight: a mux selected by inum that returns a constant control word is a lookup table — and a ROM is exactly a lookup table in hardware.

inum --/3--> [  ROM  ] --Q--/5--> control

flowchart LR
    INUM["inum (3)"] --> ROM["ROM"]
    ROM -->|"Q (5 bits)"| CTRL["control"]
    CTRL --> SP[Splitter]
    SP -->|"0-0"| ASB["ALUSrcB"]
    SP -->|"3-1"| AOP["ALUOp"]
    SP -->|"4-4"| RFW["RFW"]

The ROM address is inum (3 bits → up to 8 instructions).
The ROM data width equals the number of control bits (here, 5).
The ROM contents are the control words from the spreadsheet, indexed by inum:

address (inum)   data (control word)
    0            0b10001     <- addi:  RFW=1 ALUOp=000 ALUSrcB=1
    1            0b10000     <- add:   RFW=1 ALUOp=000 ALUSrcB=0
    2..7         0b00000     <- unused / safe default (no write)

The ROM output Q feeds the same splitter as before to recover ALUSrcB, ALUOp, and RFW. The ROM and mux implementations are functionally identical; the ROM is preferred because:

Adding an instruction is just adding a row to the ROM contents — no new mux input, no re-wiring.
The contents come straight from the spreadsheet's "Output bits" column.
A small Python script can generate the ROM .hex file directly from the spreadsheet (provided for Project06).

This is why the lecture stressed the spreadsheet → ROM pipeline: the spreadsheet is the single source of truth, and the ROM is its hardware realization.
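In software terms the ROM is a constant array indexed by `inum`. The sketch below assumes the 3-bit address and 5-bit data width used in this section; the names `CONTROL_ROM` and `control_rom_read` are invented for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define ROM_ADDR_BITS 3
#define ROM_SIZE (1u << ROM_ADDR_BITS)   /* 8 entries */

/* Contents indexed by inum; entries not listed default to 0 (RFW = 0). */
static const uint8_t CONTROL_ROM[ROM_SIZE] = {
    [0] = 0x11,  /* addi: RFW=1 ALUOp=000 ALUSrcB=1 (0b10001) */
    [1] = 0x10   /* add:  RFW=1 ALUOp=000 ALUSrcB=0 (0b10000) */
};

/* Address in, data out: the whole "control logic" is one array read. */
uint8_t control_rom_read(unsigned inum) {
    assert(inum < ROM_SIZE);
    return CONTROL_ROM[inum];
}
```

Supporting a third instruction means adding one initializer line; `control_rom_read` itself does not change, which is the software analogue of "no new mux input, no re-wiring."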

The control spreadsheet¶

The spreadsheet systematizes everything above. One row per instruction, organized into sections:

Section	Columns	Purpose
Identity	`Instruction`, `Mnemonic`, `Format`, `INUM`	name, 4-char abbrev, format, lookup key
Inputs	`opcode`, `funct3`, `funct7`, `funct6`	field encodings (use `x` for don't-care)
Control outputs	`RFW`, `ALUOp`, `ALUSrcB`, ...	what each component should do
Convenience	`0b…` columns, `Output bits`	paste-ready binary + the concatenated control word

The Output bits column is the concatenation of the control outputs in the agreed bit order — exactly the value that goes into the ROM at address INUM. Build it incrementally: get addi and add working, then add the next instruction.
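The spreadsheet-to-ROM step can be sketched as a table of rows plus a function that computes each row's Output bits and stores it at address INUM. This is an illustration in C, not the Python script provided for Project06; the struct layout and the names `SheetRow`, `SHEET`, `output_bits`, and `build_rom` are assumptions.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* One spreadsheet row: identity plus control outputs. */
typedef struct {
    const char *mnemonic;
    unsigned inum;
    unsigned rfw, aluop, alusrcb;
} SheetRow;

static const SheetRow SHEET[] = {
    { "addi", 0, 1, 0, 1 },
    { "add",  1, 1, 0, 0 },
};

/* The "Output bits" column: [ RFW | ALUOp | ALUSrcB ], low bit on the right. */
uint8_t output_bits(const SheetRow *r) {
    return (uint8_t)((r->rfw << 4) | (r->aluop << 1) | r->alusrcb);
}

/* Fill a ROM image: unused addresses stay 0, each row lands at its inum. */
void build_rom(uint8_t rom[], unsigned rom_size) {
    memset(rom, 0, rom_size);
    for (size_t i = 0; i < sizeof SHEET / sizeof SHEET[0]; i++) {
        assert(SHEET[i].inum < rom_size);
        rom[SHEET[i].inum] = output_bits(&SHEET[i]);
    }
}
```

Calling `build_rom` with an 8-entry array yields `0x11` at address 0, `0x10` at address 1, and zeros elsewhere, matching the ROM contents listed earlier.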

8. The Recipe for Adding a New Instruction¶

The whole design is built so that growth is mechanical. To add a new instruction (or a group), follow these four steps (from the notes):

Add new components if needed, or modify existing components. (e.g. a new comparator block, a wider ALU, a Branch Unit.)
Add or modify the datapath — the value wires/buses that carry data between components.
Often this requires new or modified muxes — a new operand source usually means a new mux input or a wider selector.
Update the InstDecoder, possibly with new control-line outputs — add the detector, assign an inum, and fill in the spreadsheet row (and any new control columns) so the ROM produces the right control word.

flowchart TD
    P[Pick instruction or group] --> C[1. Add/modify components]
    C --> D[2. Add/modify datapath wires]
    D --> M[3. Add/modify muxes]
    M --> I[4. Update InstDecoder + spreadsheet/ROM]
    I --> T[Test, then repeat]
    T --> P

A crucial habit when you add a new control output: you must fill in that column for the existing instructions too, not just the new one. The convention that saves you here is to wire each new mux so that input 0 is the original (pre-existing) path. Then existing instructions almost always get the new control line set to 0, which keeps their behavior unchanged. For example, when JAL/JALR introduce a PCsel mux to choose PC+4 vs. a jump target, put PC+4 on input 0; every non-jump instruction sets PCsel = 0 and continues to fetch sequentially.
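The "input 0 is the original path" convention can be illustrated with the PCsel example. In this sketch the new control bit is placed at bit 5, above the existing five bits; that placement is an assumption chosen so the existing control words keep their numeric values, and the names `pc_mux` and `pcsel_of` are invented for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* New mux: input 0 = PC+4 (the pre-existing path), input 1 = jump target. */
uint64_t pc_mux(unsigned pcsel, uint64_t pc_plus_4, uint64_t jump_target) {
    assert(pcsel <= 1);
    return pcsel ? jump_target : pc_plus_4;
}

/* Hypothetical widened layout: [ PCsel | RFW | ALUOp | ALUSrcB ], 6 bits.
 * An old 5-bit word read under this layout has PCsel = 0. */
unsigned pcsel_of(uint8_t control_word) {
    return (control_word >> 5) & 0x1u;
}
```

Under this layout the existing `addi` and `add` words (`0b10001`, `0b10000`) read back with `PCsel = 0`, so they keep fetching sequentially. The PCsel column still has to be filled in explicitly for every row in the spreadsheet.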

Incremental development is graded. Project06 requires part1, part2, part3, and final directories with matching spreadsheet sheets — concrete evidence that you grew the design one instruction-group at a time rather than building one giant decoder up front (which is far harder to debug).

9. End-to-End Trace: Executing `add a2, a0, a1`¶

Suppose a0 = 1, a1 = 2, and the PC points at add a2, a0, a1. Its machine encoding is 0x00B50633. Let us walk every decoder.

IW = 0x00B50633 = 0000000 01011 01010 000 01100 0110011
                  funct7  rs2   rs1   f3  rd    opcode

RegDecoder (splitter):

rd  = IW[11:7]  = 01100 = 12  -> x12 = a2
rs1 = IW[19:15] = 01010 = 10  -> x10 = a0
rs2 = IW[24:20] = 01011 = 11  -> x11 = a1

So RD0 = reg[a0] = 1 and RD1 = reg[a1] = 2; the write target is a2.

ImmDecoder: produces immediates, but add will ignore them (ALUSrcB=0).

InstDecoder:

opc = IW[6:0]   = 0110011   -> matches ADD opcode
f3  = IW[14:12] = 000       -> matches ADD funct3
f7  = IW[31:25] = 0000000   -> matches ADD funct7
=> is_add = 1  =>  inum = 1
ROM[1] = 0b10000  ->  RFW=1, ALUOp=000, ALUSrcB=0

Datapath with those controls:

ALU.A = RD0 = 1
ALU.B = (ALUSrcB=0) ? RD1 : imm  = RD1 = 2
ALU.R = (ALUOp=000 add) = 1 + 2 = 3
RegFile: WR=a2, WD=3, WE=RFW=1   ->  a2 <- 3
PC <- PC + 4

Result: a2 = 3, exactly what add a2, a0, a1 should do — and the processor did it with no manual help. Contrast addi a0, zero, 1: same flow but inum=0, ROM[0]=0b10001, so ALUSrcB=1 selects the immediate, ALU.B = imm-i = 1, and a0 <- zero + 1 = 1.
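The whole trace fits in one C function that performs a single cycle for the two supported instructions. It is a sketch under stated assumptions, not the reference design: the `Cpu` struct and `cpu_step` are invented names, the model returns 0 for an unrecognized word as a convenience for stopping, writes to x0 are suppressed inside the step, and the immediate relies on an arithmetic right shift of a signed value.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    uint64_t pc;
    uint64_t reg[32];
} Cpu;

/* Control ROM indexed by inum: [ RFW | ALUOp | ALUSrcB ]. */
static const uint8_t CONTROL_ROM[8] = { 0x11 /* addi */, 0x10 /* add */ };

/* Execute one instruction. Returns 1 on success, 0 if no detector fired. */
int cpu_step(Cpu *cpu, const uint32_t *imem, unsigned n_words) {
    assert(cpu->pc % 4 == 0 && cpu->pc / 4 < n_words);
    uint32_t iw = imem[cpu->pc / 4];                     /* fetch */

    /* RegDecoder: fixed bit slices. */
    unsigned rd = (iw >> 7) & 0x1F, rs1 = (iw >> 15) & 0x1F, rs2 = (iw >> 20) & 0x1F;
    /* ImmDecoder, I-type only (assumes arithmetic right shift). */
    int64_t imm_i = ((int32_t)iw) >> 20;

    /* InstDecoder: slice, match, encode, look up, split. */
    unsigned opc = iw & 0x7F, f3 = (iw >> 12) & 0x7, f7 = (iw >> 25) & 0x7F;
    int is_addi = (opc == 0x13) && (f3 == 0);               /* 0010011 / 000 */
    int is_add  = (opc == 0x33) && (f3 == 0) && (f7 == 0);  /* 0110011 / 000 / 0000000 */
    if (!is_addi && !is_add) return 0;                   /* model-only stop */
    unsigned inum = is_add ? 1u : 0u;
    uint8_t ctl = CONTROL_ROM[inum];
    unsigned alusrcb = ctl & 1u, aluop = (ctl >> 1) & 7u, rfw = (ctl >> 4) & 1u;

    /* Datapath: operand mux, ALU, write-back, PC update. */
    uint64_t a = cpu->reg[rs1];
    uint64_t b = alusrcb ? (uint64_t)imm_i : cpu->reg[rs2];
    uint64_t r = (aluop == 0) ? a + b : 0;               /* only add is modelled */
    if (rfw && rd != 0) cpu->reg[rd] = r;                /* x0 stays zero */
    cpu->pc += 4;
    return 1;
}
```

Stepping this model over `{0x00100513, 0x00200593, 0x00B50633}` (the two `addi` instructions and the `add` from first_s) leaves `a0 = 1`, `a1 = 2`, and `a2 = 3`, the same values as the hand trace above.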

Key Concepts¶

Concept	Definition	Example
Instruction word (IW)	The 32-bit value fetched from instruction memory	`0x00B50633` for `add a2,a0,a1`
RegDecoder	Splitter that extracts `rs1`, `rs2`, `rd` from the IW	`rd = IW[11:7]`
ImmDecoder	Builds and sign-extends per-format immediates	`imm-i = IW[31:20]`, sign-extended to 64 bits
Sign extension	Replicating the sign bit into upper bits	`0xFFF` (12-bit −1) → `0xFFFF...FFFF`
InstDecoder	Maps IW to control signals	`IW → RFW, ALUOp, ALUSrcB`
opcode / funct3 / funct7	Fields that identify an instruction	`add`: `0110011 / 000 / 0000000`
Instruction detector	Comparators + AND that fire for one instruction	`is_add = (opc=..) & (f3=..) & (f7=..)`
Priority encoder	Encodes one-hot matches into a binary `inum`	`is_add=1` → `inum=1`
inum	Compact internal instruction number	`addi=0`, `add=1`
Control word	Concatenated control bits, low bit on the right	`addi` → `0b10001`
Control ROM	Lookup table: `inum` address → control word data	`ROM[1] = 0b10000`
RFW / ALUOp / ALUSrcB	Register-write / ALU-op / ALU-B-source signals	`ALUSrcB=1` selects the immediate

Practice Problems¶

Problem 1: Extract the fields¶

For the instruction word IW = 0x00208133 (which is add x2, x1, x2), give opcode, funct3, funct7, rd, rs1, and rs2.

Solution:

Write the 32 bits and split by field. `0x00208133`:

0000000 00010 00001 000 00010 0110011
funct7  rs2   rs1   f3  rd    opcode

opcode = 0110011 = 0x33   (R-type)
funct3 = 000
funct7 = 0000000
rd     = 00010 = 2   -> x2
rs1    = 00001 = 1   -> x1
rs2    = 00010 = 2   -> x2

This is `add x2, x1, x2`. Note the RegDecoder is purely a splitter — no logic, just slicing fixed bit ranges.

Problem 2: Build the detector for `sub`¶

sub is R-type with opcode 0110011, funct3 000, funct7 0100000. Notice it shares opcode and funct3 with add. Draw/describe the detector and explain why funct7 is essential here.

Solution:

opc ==(7) 0110011 ----+
f3  ==(3)    000   ----AND ---> is_sub
f7  ==(7) 0100000 ----+

`add` and `sub` are *identical* in opcode and funct3 — both `0110011 / 000`. The only field that distinguishes them is `funct7`: `0000000` for `add` versus `0100000` for `sub`. If you omitted the funct7 comparator, the `add` and `sub` detectors would both fire for either instruction, the priority encoder would mis-encode, and the wrong ALU operation would run. This is exactly why RISC-V layers decoding: opcode → funct3 → funct7.

Problem 3: Read the control word¶

The control layout is [ RFW(bit4) | ALUOp(bits 3-1) | ALUSrcB(bit0) ]. Decode the control word 0b11010 into RFW, ALUOp, and ALUSrcB. What does it tell the processor to do?

Solution:

Split `0b11010` (= bits `1 101 0`):

bit  0-0  ALUSrcB = 0      -> ALU B comes from register RD1
bits 3-1  ALUOp   = 101    -> ALU operation #5 (some op, depends on ALU encoding)
bit  4-4  RFW     = 1      -> write the ALU result back to rd

So: take both operands from registers, run ALU op `101`, and write the result to the destination register. It is a register-register (R-type-style) data-processing instruction whose ALU op is `101`. (In the Part 1 ALU, ops are `000`=add, `001`=sub, `010`=mul, `011`=sll, `100`=srl; `101` would be the next operation you add.)

Problem 4: ROM contents¶

You support three instructions: addi (inum 0), add (inum 1), and sub (inum 2). Using the layout from Problem 3 and ALU ops add=000, sub=001, write the ROM contents for addresses 0–3.

Solution:

Fill the control table, then concatenate `[RFW | ALUOp | ALUSrcB]`:

inst	inum	RFW	ALUOp	ALUSrcB	control word
`addi`	0	1	000	1	`0b10001`
`add`	1	1	000	0	`0b10000`
`sub`	2	1	001	0	`0b10010`

ROM contents:

address 0:  0b10001   (addi)
address 1:  0b10000   (add)
address 2:  0b10010   (sub)
address 3:  0b00000   (unused: RFW=0, no write -> safe default)

`sub` differs from `add` only in `ALUOp` (`001` vs `000`), which becomes bits 3-1 = `001`, so its word is `0b10010`. The unused slot has `RFW=0` so an unrecognized `inum` cannot corrupt a register.

Problem 5: Why a priority encoder?¶

Your detectors are wired to a plain binary encoder. A bug causes both is_add and is_addi to fire at once. What goes wrong, and how does a priority encoder change the outcome?

Solution:

A plain binary encoder is only defined for **one-hot** inputs. If `is_addi` (input 0) and `is_add` (input 1) are both high, its output is unspecified: depending on the implementation it may, for example, OR the two codes together (`000 | 001 = 001` here, and an arbitrary mix for larger codes), so the `inum` that reaches the ROM is not guaranteed to name either instruction.

A **priority encoder** assigns priorities. With multiple inputs active it deterministically outputs the code of the highest-priority active input (commonly the highest-numbered), so the result is always a valid, predictable `inum`. When nothing matches it outputs a defined value, typically 0. Because 0 is also the `inum` of `addi`, a "no match" is only harmless if the design separately keeps `RFW` low in that case or reserves an unused slot whose ROM word has `RFW=0`. The real fix is to remove the bug (the overlapping detectors), but the priority encoder makes the failure deterministic and easy to reproduce rather than unpredictable.

Problem 6: Add `lw` end-to-end¶

lw (load word) is an I-type (opcode 0000011, funct3 010) that computes address = rs1 + imm and writes the loaded value to rd. Walk through the 4-step recipe at a high level: what component, datapath, mux, and decoder changes are needed?

Solution:

**1. Components.** Add Data Memory (a RAM component) to hold loadable values.
**2. Datapath.** Route the ALU result (the computed address `rs1 + imm`) to the RAM address input, and route the RAM data output back toward the register file write-data line.
**3. Muxes.** The value written to `rd` can now come from the ALU *or* from memory. Add a write-data-source mux (Mem2Reg / M2R, or expand the existing WDsel mux): input 0 = ALU result (existing path), input 1 = memory data. Also ensure `ALUSrcB=1` so the ALU adds the immediate (address = `rs1 + imm`).
**4. InstDecoder / spreadsheet.** Add a detector `is_lw = (opc==0000011) & (f3==010)`, assign it a new `inum`, and add a control column for the new mux (call it `M2R`). In the spreadsheet, set the *existing* instructions' `M2R = 0` (they take the ALU result on input 0) and `lw`'s row to `RFW=1, ALUOp=000(add), ALUSrcB=1, M2R=1`. Regenerate the ROM contents. Then test.

The pattern is identical for every new instruction: detector → inum → spreadsheet row (+ any new control column, with existing rows defaulting to 0) → ROM → test.

Summary¶

Decoding splits the 32-bit IW into operands and control. The RegDecoder and ImmDecoder produce data (rs1, rs2, rd, immediates); the InstDecoder produces control signals (RFW, ALUOp, ALUSrcB).
The RegDecoder is just a splitter — register fields sit in fixed bit positions across formats, so rd = IW[11:7], rs1 = IW[19:15], rs2 = IW[24:20] need no logic.
The ImmDecoder reassembles scattered immediate bits and sign-extends them to 64 bits. Two interface options exist: per-type outputs (Option #1) with selection in the datapath, or a single output chosen by an ImmSel control line (Option #2).
The InstDecoder identifies an instruction by comparing opcode, funct3, and (when needed) funct7 with equality comparators and AND gates, producing a one-hot match per supported instruction.
A priority encoder collapses the one-hot matches into a small inum, which becomes the instruction's internal name and the address into the control lookup.
The control word is produced by inum — first shown as a mux tree over constant control words, then more cleanly as a ROM whose contents come straight from the control spreadsheet; a splitter recovers the individual control lines.
The spreadsheet → ROM pipeline is the source of truth. Each instruction is one row; the Output bits column is the ROM data at address INUM. Keep the bit-order convention consistent (low bit on the right).
Adding an instruction is a 4-step, repeatable edit — components, datapath, muxes, decoder — and you build the processor incrementally (new mux input 0 = original path, so existing instructions default new control lines to 0), which Project06 grades directly.
