Lab: Processor ALU
Overview
This hands-on lab session continues building the single-cycle RISC-V processor from Lab09, focusing on the three components you wire together by hand before any control logic exists: the Register File, the ALU (Arithmetic Logic Unit), and the supporting combinational glue that connects them. We start by mapping a RISC-V instruction like add t0, t1, t2 onto the register-file control signals (WR, RR0, RR1, RD0, RD1), then build a decoder with enable so exactly one register updates per clock cycle. We design the ALU as a bank of combinational functional units (ADD, SUB, MUL, SLL, SRL) whose results are selected by a final multiplexer driven by the 3-bit ALUOp. Along the way we correct a common mistake — trying to multiply two full 64-bit operands — and finish with a 4-bit equality comparator built from XNOR gates, the building block you will reuse in the branch unit later.
Learning Objectives
- Map a RISC-V three-operand instruction onto Register File control signals (`WR`, `RR0`, `RR1`, `RD0`, `RD1`)
- Distinguish combinational logic (ALU, decoders, comparators) from sequential logic (PC, register file) and know which needs a `CLK`
- Build a decoder with enable that produces a one-hot write signal gated by `WE`
- Read register values out of a register file using a multiplexer with `x0` hardwired to zero
- Design an ALU as a parallel bank of functional units selected by a result multiplexer driven by `ALUOp`
- Implement subtraction as `A + (-B)` and explain why multiplication uses only the lower 32 bits of each operand
- Manage data-path widths (32-bit vs 64-bit) correctly using splitters
- Build an N-bit equality comparator from XNOR gates and an AND tree
Prerequisites
- Combinational logic: gates, truth tables, sum-of-products, multiplexers, decoders (Lab08)
- Sequential logic: D flip-flops, N-bit registers with `CLK`/`CLR`/`EN`, counters (Lab09)
- Adders: half adder, full adder, ripple-carry adder, subtraction via two's complement (Lab08)
- RISC-V register conventions and the 64-bit register set `x0`–`x31` (Project04)
- Familiarity with the Digital schematic simulator (components, splitters, tunnels, probes)
- The Lab09 spec: see /assignments/lab09/
1. Where the ALU Fits in the Processor
Recall the major components of the single-cycle processor we are assembling incrementally across Lab09 and Lab10:
flowchart LR
PC[PC Register] --> IM[Instruction Memory ROM]
IM --> ID[Instruction Decode]
ID --> RF[Register File]
RF -->|RD0| ALU[ALU]
RF -->|RD1| MUX[ALUSrcB MUX]
IMM[Immediate] --> MUX
MUX --> ALU
ALU -->|R| RF
ALU --> DM[Data Memory RAM]
PC --> ADD4["PC + 4"]
ADD4 --> PC
style ALU fill:#f9f,stroke:#333,stroke-width:2px
style RF fill:#bbf,stroke:#333,stroke-width:2px
Today's focus: the Register File (a sequential component, it holds state) and the ALU (a combinational component, no state), plus the decoders and comparators that glue them together.
Combinational vs Sequential
This distinction drives almost every wiring decision in the lab.
| Property | Combinational | Sequential |
|---|---|---|
| Holds state? | No | Yes |
| Needs `CLK`? | No | Yes |
| Output depends on | Current inputs only | Inputs and stored state |
| Examples here | ALU, decoder, comparator, MUX | PC register, register file |
| Circuit shape | Directed acyclic graph (no loops) | Has feedback through storage |
The takeaway: the ALU has no CLK input. It is pure combinational logic — give it A, B, and ALUOp and the result R settles after the gate delays. The register file and PC are sequential — they only change on a clock edge.
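The same split can be sketched in C. This is only a software analogy with hypothetical names (`adder`, `reg64`, `reg64_clock`): the combinational block is a pure function, while the sequential block keeps a stored value that changes only when the clock function is called. Giving `CLR` priority over `EN` is an assumption of this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Combinational: output is a pure function of the current inputs. */
uint64_t adder(uint64_t a, uint64_t b) {
    return a + b;
}

/* Sequential: a 64-bit register that holds state between clock edges. */
typedef struct {
    uint64_t q; /* stored value, visible on the output at all times */
} reg64;

/* Model of one rising clock edge. Between calls, q keeps its value.
   Assumption: CLR wins over EN when both are high. */
void reg64_clock(reg64 *r, uint64_t d, int en, int clr) {
    assert(r != NULL);
    if (clr) {
        r->q = 0;
    } else if (en) {
        r->q = d;
    }
}
```

Calling `adder` twice with the same inputs always gives the same answer, whereas the value seen in `reg64` depends on which clock edges came before.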
2. From Instruction to Register-File Signals
Start with the canonical R-type instruction and ask: which register-file control line does each operand drive?
For add t0, t1, t2 the processor must:
- Read the two source registers `rs1` (`t1`) and `rs2` (`t2`). These selectors drive `RR0` and `RR1`. The register file responds on its read-data outputs `RD0` and `RD1`.
- Compute `RD0 + RD1` in the ALU.
- Write the result back into the destination register `rd` (`t0`). That selector drives `WR`, and the write only happens when `WE` (write enable) is high.
The whole datapath for one R-type instruction is just:
RR0 = rs1 ──► RegFile ──► RD0 ──┐
├──► ALU ──► R ──► WD ──► RegFile[WR] (if WE=1)
RR1 = rs2 ──► RegFile ──► RD1 ──┘
WR = rd
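Once instruction decode exists, the three selectors come straight out of the instruction word. The sketch below assumes the standard RISC-V R-type layout (`rd` in bits 11:7, `rs1` in bits 19:15, `rs2` in bits 24:20, opcode `0x33`); the type `rf_select` and the function `decode_rtype` are hypothetical names, not part of the lab spec.

```c
#include <assert.h>
#include <stdint.h>

/* Register-file selectors for one R-type instruction. */
typedef struct {
    uint8_t rr0; /* rs1, instruction bits 19:15 */
    uint8_t rr1; /* rs2, instruction bits 24:20 */
    uint8_t wr;  /* rd,  instruction bits 11:7  */
} rf_select;

/* Slice the three 5-bit register fields out of a 32-bit R-type word. */
rf_select decode_rtype(uint32_t insn) {
    assert((insn & 0x7Fu) == 0x33u); /* this sketch handles R-type only */
    rf_select s;
    s.wr  = (uint8_t)((insn >> 7)  & 0x1Fu);
    s.rr0 = (uint8_t)((insn >> 15) & 0x1Fu);
    s.rr1 = (uint8_t)((insn >> 20) & 0x1Fu);
    return s;
}
```

For `add t0, t1, t2` (encoded as `0x007302B3`, with t0 = x5, t1 = x6, t2 = x7) this yields `wr = 5`, `rr0 = 6`, `rr1 = 7`. In this lab you set those three values by hand instead.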
The Register File Interface
| Signal | Width | Direction | Meaning |
|---|---|---|---|
| `RR0` | 5 bits | in | Selects which register to read onto RD0 |
| `RR1` | 5 bits | in | Selects which register to read onto RD1 |
| `WR` | 5 bits | in | Selects the destination register to write |
| `WE` | 1 bit | in | Write enable: only write when high |
| `WD` | 64 bits | in | Data to write into register WR |
| `CLK` | 1 bit | in | Clock: writes happen on the edge |
| `CLR` | 1 bit | in | Clear all registers to zero |
| `RD0` | 64 bits | out | Value of register RR0 |
| `RD1` | 64 bits | out | Value of register RR1 |
| `x0`–`x31` | 64 bits | out | Each register's value (for the dashboard) |
The selectors are 5 bits because there are 32 registers and 2^5 = 32. We have two read ports (RR0/RD0 and RR1/RD1) and one write port (WR/WD/WE) so a single instruction can read two operands and write one result in the same clock cycle.
Naming gotcha corrected in lab: there are 32 registers, `x0` through `x31`, not `x0` through `x32`. `x0` is the zero register and is read-only; `x1`–`x31` are writable.
3. The Register File Write Path: Decoder with Enable
The hard part of the write path is: given a 5-bit WR selector, update exactly one of the 31 writable registers, and only when WE is asserted. This is a decoder with enable.
A plain 5-to-32 decoder takes the 5-bit selector and drives exactly one of 32 outputs high (one-hot encoding). But a plain decoder always asserts one output. We do not want any register to update when WE is low, so we AND each decoder output with WE.
Decoder with Enable
sel (5) ──►┌──────────┐ 0 ──►──[ AND ]──► r0 (write-enable for x0)
│ │ 1 ──►──[ AND ]──► r1
│ 5-to-32 │ 2 ──►──[ AND ]──► r2
│ decoder │ . . .
│ │31 ──►──[ AND ]──► r31
└──────────┘
▲
WE ────────────────────────┘ (one input of every AND gate)
Each AND gate has two inputs: one decoder output line and the shared WE. The result r0..r31 is a gated one-hot signal: at most one line is high, and only when WE = 1. Line ri becomes the enable input (EN) of register xi.
flowchart LR
SEL["WR (5 bits)"] --> DEC["5-to-32 Decoder"]
DEC -->|line 0| A0["AND"]
DEC -->|line 1| A1["AND"]
DEC -->|line 2| A2["AND"]
DEC -->|line 31| A31["AND"]
WE["WE"] --> A0
WE --> A1
WE --> A2
WE --> A31
A0 --> R0["x0 EN"]
A1 --> R1["x1 EN"]
A2 --> R2["x2 EN"]
A31 --> R31["x31 EN"]
Why gate with WE?
Without the WE gate, the decoder would always enable one register, so that register would be overwritten on every clock edge — even during instructions that should not write (a branch, a store, or a stalled cycle). Gating with WE makes the write conditional. This is the same idea as the global EN that lets you pause the whole processor: when EN = 0, no state element updates.
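A bit-level C model of the gated decoder may help when checking your wiring. It is a sketch with a hypothetical function name (`decode_with_enable`); bit i of the returned word stands for line `ri`.

```c
#include <assert.h>
#include <stdint.h>

/* Bit i of the result models line r_i, the EN input of register x_i. */
uint32_t decode_with_enable(uint8_t wr, int we) {
    assert(wr < 32);                          /* WR is a 5-bit selector */
    uint32_t one_hot = (uint32_t)1 << wr;     /* plain 5-to-32 decoder */
    uint32_t gate = we ? 0xFFFFFFFFu : 0u;    /* WE fanned out to all 32 ANDs */
    return one_hot & gate;                    /* one AND gate per line */
}
```

With `WE = 1` and `WR = 5` only bit 5 is set; with `WE = 0` the result is all zeros regardless of `WR`. To keep `x0` unwritable in this model you could additionally clear bit 0 of the result.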
Common Mistakes
- Forgetting the `WE` gate. Symptom: registers change on cycles where nothing should be written. Every `ri` line must pass through an AND with `WE`.
- Writing to `x0`. `x0` must always read as zero. The clean fix is to never enable `x0`'s register: either omit `r0` entirely or force it low. The emulator will also be updated to reject writes to `x0`.
- Wrong selector width. Using 4 bits instead of 5 only addresses 16 registers. You need 5 bits for 32.
Alternate Approach (MUX-based gating)
There is a second way to gate the write, shown in lab as the "Alt Approach": instead of ANDing every decoder line with WE, feed WR through a small MUX so the effective write selector is forced to a no-op value when WE is low. Both styles work; the AND-gated decoder is the clearer of the two and is what most students used.
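One plausible reading of the MUX-based style, sketched in C. The text does not say which selector value counts as the no-op, so this sketch assumes it is selector 0 (`x0`), which only works if `x0` has no writable register behind it. The name `effective_wr` is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: selector 0 (x0) is the no-op target because x0 ignores writes. */
uint8_t effective_wr(uint8_t wr, int we) {
    assert(wr < 32);     /* WR is a 5-bit selector */
    return we ? wr : 0;  /* 2-input, 5-bit MUX with WE as the select line */
}
```

The effective selector then feeds a plain decoder. When `WE` is low the decoder still asserts line 0, so this approach depends on that line driving nothing.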
Tunnels in Digital: tunnels are fine for clean point-to-point connections (e.g., routing each `xi` to the dashboard), but use them sparingly. Overusing tunnels hides the data path and makes bugs harder to find.
4. The Register File Read Path: MUX Selection
Reading is simpler than writing. To produce RD0, take all 32 register values as inputs to a 32-input, 64-bit-wide multiplexer and use RR0 (5 bits) as the select line. Do the same with a second MUX driven by RR1 to produce RD1.
The critical detail: x0 is hardwired to the constant 0. Input 0 of the read MUX is not a register output — it is a 64-bit constant zero. In hardware a constant is literally wired: each bit is tied to ground (0) or to the supply voltage (1).
x0 = 0 ─── constant, every bit tied to ground
x1 = Q ───────► from register x1's flip-flops
x2 = Q ───────► from register x2
.
x31 = Q ──────► from register x31
In Digital you drop a Constant component set to 0 (64-bit). In Verilog the same idea is a literal:
// x0 is hardwired to zero in hardware
wire [63:0] x0 = 64'd0; // 64-bit constant zero
// A 4-bit constant 0101 (decimal 5) is fixed at construction:
wire [3:0] five = 4'b0101; // bit3=0, bit2=1, bit1=0, bit0=1
These constants are not programmable — they are decided when the circuit is built. That is exactly why x0 can never be anything but zero: reading it routes the MUX to a hardwired constant, and (from Section 3) we never enable writing to it.
flowchart LR
Z["Constant 0 (64-bit)"] --> M["32-input 64-bit MUX"]
X1["x1 register"] --> M
X2["x2 register"] --> M
X31["x31 register"] --> M
RR0["RR0 (5 bits)"] -->|select| M
M --> RD0["RD0 (64 bits)"]
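In software terms, a read port is an indexed lookup with one special case. The sketch below uses hypothetical names (`read_port`, `NUM_REGS`) and models input 0 of the MUX as a constant instead of a stored value.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_REGS 32

/* One read port: a 32-input MUX selected by a 5-bit register number.
   regs[0] is never consulted because input 0 is a wired constant zero. */
uint64_t read_port(const uint64_t regs[NUM_REGS], uint8_t rr) {
    assert(rr < NUM_REGS);
    return (rr == 0) ? UINT64_C(0) : regs[rr];
}
```

Two calls, one with `RR0` and one with `RR1`, model the two read ports. Even if the array slot for `x0` held garbage, `read_port(regs, 0)` would still return 0.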
5. The ALU: Interface and Operations
The ALU is the arithmetic heart of the processor. Its block interface for Lab09:
| Signal | Width | Meaning |
|---|---|---|
| `A` | 64 bits | First operand |
| `B` | 64 bits | Second operand |
| `ALUOp` | 3 bits | Selects the operation |
| `R` | 64 bits | Result |
ALUOp is 3 bits because we need to choose among five operations (2^3 = 8 codes available, five used):
| `ALUOp` | Operation | Result `R` | RISC-V instructions |
|---|---|---|---|
| `0b000` | `add` | `A + B` | add, addi, address calc, branch target |
| `0b001` | `sub` | `A - B` | sub, comparisons |
| `0b010` | `mul` | `A * B` (low 64 bits) | mul |
| `0b011` | `sll` | `A << B` | sll, slli |
| `0b100` | `srl` | `A >> B` (logical) | srl, srli |
The ALU does more than data-processing math: load/store instructions use add/sub to compute target addresses, and branches use add to compute the branch target address. So a correct adder is load-bearing across the whole processor.
Spec note: the Lab09 guide originally listed inconsistent `ALUOp` values; the encoding above (add=000, sub=001, mul=010, sll=011, srl=100) is the one to implement. Your `alu.dig` must have inputs `A`, `B`, `ALUOp` and output `R`.
6. ALU Internal Structure: Parallel Units + Output MUX
The standard ALU design is a bank of functional units that all compute in parallel, followed by a single multiplexer that selects which result to expose on R. Because the ALU is combinational, every unit computes on every cycle; the MUX simply picks the one we want.
┌──────┐
A,B ────►│ ADD │──► A+B ──┐
└──────┘ │
┌──────┐ │
A,B ────►│ SUB │──► A-B ──┤
└──────┘ │ ┌─────────┐
┌──────┐ ├──►│ result │──► R
A,B ────►│ MUL │──► A*B ──┤ │ MUX │
└──────┘ │ └─────────┘
┌──────┐ │ ▲
A,B ────►│ SLL │──► A<<B ─┤ │
└──────┘ │ ALUOp (3)
┌──────┐ │
A,B ────►│ SRL │──► A>>B ─┘
└──────┘
flowchart LR
A["A (64)"] --> ADD
A --> SUB
A --> MUL
A --> SLL
A --> SRL
B["B (64)"] --> ADD
B --> SUB
B --> MUL
B --> SLL
B --> SRL
ADD["ADD"] --> MUX["Result MUX"]
SUB["SUB"] --> MUX
MUL["MUL"] --> MUX
SLL["SLL"] --> MUX
SRL["SRL"] --> MUX
OP["ALUOp (3)"] -->|select| MUX
MUX --> R["R (64)"]
In Digital you use the built-in Add, Sub, Mul, and shift components (or your own from Lab08) at 64-bit width, and a 64-bit MUX with a 3-bit select. The C model of the ALU makes the structure obvious:
#include <stdint.h>

// ALUOp encodings (0b literals need C23 or the GCC/Clang extension)
#define ALU_ADD 0b000
#define ALU_SUB 0b001
#define ALU_MUL 0b010
#define ALU_SLL 0b011
#define ALU_SRL 0b100

uint64_t alu(uint64_t A, uint64_t B, uint8_t ALUOp) {
    switch (ALUOp) {
    case ALU_ADD: return A + B;
    case ALU_SUB: return A - B;
    // Low 64 bits of the product. The lab circuit multiplies only the
    // low 32 bits of each operand instead (see Section 7).
    case ALU_MUL: return A * B;
    case ALU_SLL: return A << (B & 63); // shift amount is low 6 bits
    case ALU_SRL: return A >> (B & 63); // logical (A is unsigned)
    default:      return 0;             // unused ALUOp codes
    }
}
The switch is the software analog of the output MUX: in hardware all five branches "execute" simultaneously and the MUX selects one; in software we evaluate only the selected branch. Either way, ALUOp is the selector.
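To make the software model look more like the hardware, you can compute every unit's output first and index into the results afterwards. This is a sketch under stated assumptions: the name `alu_parallel` is hypothetical, and unused `ALUOp` codes are assumed to select 0.

```c
#include <assert.h>
#include <stdint.h>

/* Hardware-shaped ALU model: all units compute, then a MUX picks one. */
uint64_t alu_parallel(uint64_t a, uint64_t b, uint8_t aluop) {
    assert(aluop < 8);                    /* ALUOp is 3 bits wide */
    unsigned sh = (unsigned)(b & 63);     /* shifters use the low 6 bits of B */
    /* Every functional unit produces a value regardless of aluop. */
    const uint64_t unit[8] = {
        a + b,   /* 000 add */
        a - b,   /* 001 sub */
        a * b,   /* 010 mul (low 64 bits) */
        a << sh, /* 011 sll */
        a >> sh, /* 100 srl */
        0, 0, 0  /* unused codes, assumed to read as 0 */
    };
    return unit[aluop];                   /* the result MUX */
}
```

The array plays the role of the MUX inputs and the index plays the role of the select line, so adding an operation means adding one unit and one MUX input.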
Subtraction = A + (−B)
You do not need a separate subtractor circuit. Subtraction reuses the adder via two's complement: A - B = A + (~B) + 1. In a hardware adder, invert every bit of B and set the initial carry-in to 1.
A failing subtract test was diagnosed in lab. The usual culprit is forgetting the +1 (the carry-in), which gives A + ~B = A - B - 1, one less than the correct result.
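The off-by-one is easy to reproduce in C. The sketch below models an adder with a carry-in and builds both the correct and the buggy subtractor from it; all three function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* 64-bit adder with a 1-bit carry-in; arithmetic wraps modulo 2^64. */
uint64_t adder_cin(uint64_t a, uint64_t b, unsigned cin) {
    assert(cin <= 1);
    return a + b + cin;
}

/* Correct: invert B and set carry-in to 1, so A + ~B + 1 = A - B. */
uint64_t sub_correct(uint64_t a, uint64_t b) {
    return adder_cin(a, ~b, 1);
}

/* Buggy: carry-in left at 0, so the result is A - B - 1. */
uint64_t sub_buggy(uint64_t a, uint64_t b) {
    return adder_cin(a, ~b, 0);
}
```

For A = 7 and B = 3, `sub_correct` returns 4 and `sub_buggy` returns 3, which matches the symptom of a test that is always one too small.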
7. Multiplication: The 64-bit Trap and the Fix
The naive instinct is to feed both full 64-bit operands into a multiplier. This is the mistake crossed out in red in the lab notes.
╳ ┌──────────────┐
A ─┤ │
(64) │ 64x64 MUL │──► (would be 128-bit product!)
B ─┤ │
(64) └──────────────┘
✗ Do not build a custom 64x64 multiplier
Two problems:
- The product is 128 bits. Multiplying two 64-bit numbers can produce up to a 128-bit result, but our ALU result `R` is only 64 bits.
- Building a custom 64×64 multiplier is wasteful and error-prone. Use the standard library component, not a hand-rolled relay-based circuit.
The fix used in lab: multiply only the lower 32 bits of each operand. The product of two 32-bit values fits in 64 bits, which exactly matches R. Use splitters to take bits 31–0 of A and bits 31–0 of B, feed those two 32-bit values into the multiplier, and take the 64-bit result.
bits 31-0
A (64) ──►[splitter]──► A[31:0] (32) ──┐
│ ┌──────┐
├─►│ MUL │──► R (64)
B (64) ──►[splitter]──► B[31:0] (32) ──┘ │32x32 │
bits 31-0 └──────┘
flowchart LR
A["A (64)"] --> SA["splitter: A[31:0]"]
B["B (64)"] --> SB["splitter: B[31:0]"]
SA -->|"32 bits"| MUL["32x32 MUL"]
SB -->|"32 bits"| MUL
MUL -->|"64-bit product"| R["R (64)"]
In C this is exactly:
// Multiply low 32 bits of each operand; 32x32 -> 64 fits in R
uint64_t mul_low32(uint64_t A, uint64_t B) {
    uint32_t a = (uint32_t)A;          // low 32 bits of A
    uint32_t b = (uint32_t)B;          // low 32 bits of B
    return (uint64_t)a * (uint64_t)b;  // 64-bit product
}
An alternative ALU that produces a full 128-bit result (using a wider output and two product halves) was discussed but deferred. For Lab09, the 32×32→64 approach is what you implement.
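It is worth knowing when the 32×32 approach gives the same answer as keeping the low 64 bits of a full 64×64 product. The sketch below compares the two models; the function names are hypothetical and the asserts in `check_mul_models` document the cases.

```c
#include <assert.h>
#include <stdint.h>

/* Low 64 bits of the full 64x64 product (upper 64 bits are discarded). */
uint64_t mul_low64(uint64_t a, uint64_t b) {
    return a * b;
}

/* The lab approach: only bits 31..0 of each operand reach the multiplier. */
uint64_t mul_low32_operands(uint64_t a, uint64_t b) {
    return (uint64_t)(uint32_t)a * (uint64_t)(uint32_t)b;
}

void check_mul_models(void) {
    /* Both operands fit in 32 bits: the two models agree. */
    assert(mul_low32_operands(7, 3) == mul_low64(7, 3));

    /* An operand with bits above 31 set: the models disagree. */
    uint64_t big = UINT64_C(1) << 32;
    assert(mul_low64(big, 2) == (UINT64_C(1) << 33));
    assert(mul_low32_operands(big, 2) == 0);
}
```

The two models also differ for negative values held in 64-bit two's complement, because their upper 32 bits are all ones. For the small non-negative operands used in the lab tests this does not matter.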
Width management with splitters
Mixing 32-bit and 64-bit wires is the #1 source of red error wires in Digital. Rules of thumb:
- Use a splitter to extract a bit range (e.g., take bits 31–0 of a 64-bit wire to get a 32-bit wire).
- Use a splitter/merger the other direction to combine narrow wires into a wide one.
- Match component data-bit settings to your wire widths. A 32-bit MUL output cannot drive a 64-bit wire directly without widening.
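The rules above have direct C counterparts in shifts, masks, and casts. This sketch uses hypothetical helper names (`bits`, `merge32`, `zext32`) to mirror what a splitter or merger does to a wire.

```c
#include <assert.h>
#include <stdint.h>

/* Splitter: extract bits hi..lo (inclusive) of a 64-bit value. */
uint64_t bits(uint64_t x, unsigned hi, unsigned lo) {
    assert(hi < 64 && lo <= hi);
    unsigned width = hi - lo + 1;
    /* Shifting a 64-bit value by 64 is undefined in C, so handle it apart. */
    uint64_t mask = (width == 64) ? UINT64_MAX
                                  : ((UINT64_C(1) << width) - 1);
    return (x >> lo) & mask;
}

/* Merger: combine two 32-bit wires into one 64-bit wire. */
uint64_t merge32(uint32_t hi, uint32_t lo) {
    return ((uint64_t)hi << 32) | (uint64_t)lo;
}

/* Widening: drive a 64-bit wire from a 32-bit value, upper bits zero. */
uint64_t zext32(uint32_t x) {
    return (uint64_t)x;
}
```

`bits(A, 31, 0)` corresponds to the splitter output `A[31:0]` that feeds the multiplier in this section.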
8. The Equality Comparator (Building Block for Branches)
The last component in lab is a 4-bit equality comparator. Two values are equal when every bit position matches. "Bits match" is exactly the XNOR function (it outputs 1 when both inputs are equal). AND all the per-bit XNOR results together and you get a single eq signal.
XOR vs XNOR truth table
| a | b | XOR (a≠b) | XNOR (a=b) |
|---|---|---|---|
| 0 | 0 | 0 | 1 |
| 0 | 1 | 1 | 0 |
| 1 | 0 | 1 | 0 |
| 1 | 1 | 0 | 1 |
XNOR is the bit-equality test: XNOR(a,b) = 1 iff a == b. (XOR is the bit-difference test, useful for the not-equal/bne case.)
4-bit comparator structure
A (4) ─►[splitter]─► a3 a2 a1 a0
B (4) ─►[splitter]─► b3 b2 b1 b0
XNOR(a0,b0) ─┐
XNOR(a1,b1) ─┤
XNOR(a2,b2) ─┼──[ AND ]──► eq (1 iff A == B)
XNOR(a3,b3) ─┘
flowchart LR
A["A (4)"] --> SA["splitter"]
B["B (4)"] --> SB["splitter"]
SA --> X0["XNOR bit0"]
SB --> X0
SA --> X1["XNOR bit1"]
SB --> X1
SA --> X2["XNOR bit2"]
SB --> X2
SA --> X3["XNOR bit3"]
SB --> X3
X0 --> AND["AND (all bits equal)"]
X1 --> AND
X2 --> AND
X3 --> AND
AND --> EQ["eq"]
The Boolean expression for one bit's equality is eq_i = (āᵢ·b̄ᵢ) + (aᵢ·bᵢ), which is precisely XNOR. The full word equality is the AND of all four:
$$eq = eq_3 \cdot eq_2 \cdot eq_1 \cdot eq_0, \qquad eq_i = \overline{a_i \oplus b_i}$$
This scales: an N-bit comparator is N XNOR gates feeding one N-input AND. You will reuse this comparator (and its sub-based cousins for <, >=) in the Branch Unit when you add beq, bne, blt, and bge later. For now it cements the combinational-design pattern: per-bit gate, then combine.
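The N-bit pattern can be sketched in C with bitwise operators, where one 64-bit word carries all the per-bit XNOR results at once. The function name `eq_nbit` is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* N-bit equality: per-bit XNOR, then AND across the low n bits.
   Bits above position n-1 are ignored, like unused splitter outputs. */
int eq_nbit(uint64_t a, uint64_t b, unsigned n) {
    assert(n >= 1 && n <= 64);
    uint64_t mask = (n == 64) ? UINT64_MAX : ((UINT64_C(1) << n) - 1);
    uint64_t xnor = ~(a ^ b) & mask; /* bit i is XNOR(a_i, b_i) */
    return xnor == mask;             /* N-input AND: every bit must be 1 */
}
```

With `n = 4` this matches the comparator built in lab: `eq_nbit(0xA, 0xA, 4)` is 1 and `eq_nbit(0xA, 0xB, 4)` is 0.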
9. Putting It Together: Running a Program by Hand
In Lab09 there is no control unit yet, so you play the role of the control logic, toggling the inputs manually. The top-level circuit lab09.dig must expose inputs CLK, CLR, RR0, RR1, WR, WE, ALUSrcB, ALUOp, Imm and outputs T0, T1. ALUSrcB is the select on a MUX that chooses whether ALU input B comes from RD1 (a register) or from Imm (an immediate).
Worked example — execute addi t0, t0, 1 so that T0 becomes 1:
Goal: t0 = t0 + 1 (t0 starts at 0)
1. Set CLR=1, pulse CLK to clear all registers (t0 = 0).
2. Set RR0 = t0's number (read current t0 onto RD0 = 0)
3. Set ALUSrcB = 1 (B comes from Imm, not RD1)
4. Set Imm = 1
5. Set ALUOp = 0b000 (add) -> R = RD0 + Imm = 0 + 1 = 1
6. Set WR = t0's number, WE = 1 (destination is t0)
7. Pulse CLK (on the edge, R=1 is written into t0)
8. Observe T0 = 1
The four programs the autograder checks build on this same recipe:
| Program | Setup | Result |
|---|---|---|
| `addi t0, t0, 1` | add, ALUSrcB=1, Imm=1 | T0 = 1 |
| `li t1, 2` | add (li = addi t1, x0, 2), Imm=2 | T1 = 2 |
| `addi t0, t0, -1` | add, Imm=-1 | T0 = -1 (0xFFFFFFFFFFFFFFFF) |
| `li t0,1; li t1,1; sub t0,t0,t1` | two li, then sub with ALUOp=001, ALUSrcB=0 | T0 = 0 |
Notice li is a pseudo-instruction: li t1, 2 is really addi t1, x0, 2 (add the immediate to the always-zero register). And addi t0, t0, -1 exercises that your adder and immediate path correctly handle two's-complement negatives — -1 in 64-bit two's complement is all ones.
flowchart LR
CLR["CLR / CLK"] --> RF["Register File"]
RR0["RR0 = rs1"] --> RF
RF -->|RD0| ALU["ALU"]
RF -->|RD1| MUXB["ALUSrcB MUX"]
IMM["Imm"] --> MUXB
SRCB["ALUSrcB"] -->|select| MUXB
MUXB -->|B| ALU
OP["ALUOp"] --> ALU
ALU -->|R = WD| RF
WR["WR = rd"] --> RF
WE["WE"] --> RF
RF --> T0["T0"]
RF --> T1["T1"]
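The hand-run procedure can be rehearsed in software before you touch the circuit. The sketch below is a minimal C model of the datapath in this section, with hypothetical names (`regfile`, `controls`, `clock_edge`). It assumes `Imm` arrives already sign-extended to 64 bits, that t0 and t1 are x5 and x6, and that multiplication uses the low 32 bits of each operand.

```c
#include <assert.h>
#include <stdint.h>

enum { ALU_ADD = 0, ALU_SUB = 1, ALU_MUL = 2, ALU_SLL = 3, ALU_SRL = 4 };
enum { T0 = 5, T1 = 6 }; /* RISC-V ABI: t0 = x5, t1 = x6 */

typedef struct { uint64_t x[32]; } regfile;

/* The inputs you toggle by hand for one instruction. */
typedef struct {
    uint8_t  rr0, rr1, wr;
    int      we, alu_src_b;
    uint8_t  alu_op;
    uint64_t imm; /* assumed already sign-extended to 64 bits */
} controls;

static uint64_t alu(uint64_t a, uint64_t b, uint8_t op) {
    switch (op) {
    case ALU_ADD: return a + b;
    case ALU_SUB: return a - b;
    case ALU_MUL: return (uint64_t)(uint32_t)a * (uint64_t)(uint32_t)b;
    case ALU_SLL: return a << (b & 63);
    case ALU_SRL: return a >> (b & 63);
    default:      return 0;
    }
}

/* One rising clock edge with the given control inputs held steady. */
void clock_edge(regfile *rf, controls c) {
    assert(c.rr0 < 32 && c.rr1 < 32 && c.wr < 32);
    uint64_t rd0 = c.rr0 ? rf->x[c.rr0] : 0; /* read MUX, x0 reads as 0 */
    uint64_t rd1 = c.rr1 ? rf->x[c.rr1] : 0;
    uint64_t b   = c.alu_src_b ? c.imm : rd1; /* ALUSrcB MUX */
    uint64_t r   = alu(rd0, b, c.alu_op);
    if (c.we && c.wr != 0) {                  /* gated write, x0 never written */
        rf->x[c.wr] = r;
    }
}
```

Starting from a cleared register file, `addi t0, t0, 1` corresponds to `rr0 = T0`, `wr = T0`, `we = 1`, `alu_src_b = 1`, `alu_op = ALU_ADD`, `imm = 1`, and one call to `clock_edge` leaves `x[T0]` equal to 1.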
Key Concepts
| Concept | Definition | Example |
|---|---|---|
| Combinational logic | Output depends only on current inputs; no state, no clock | ALU, decoder, MUX, comparator |
| Sequential logic | Output depends on inputs and stored state; clocked | PC register, register file |
| Register file | Storage of 32 registers with 2 read ports, 1 write port | RR0/RR1 → RD0/RD1, WR/WD/WE |
| Decoder with enable | One-hot decode of a selector, gated by an enable line | 5-to-32 decoder, each output ANDed with WE |
| One-hot encoding | Exactly one line high at a time | Decoder output selecting one register |
| Hardwired constant | A value fixed by wiring bits to 0 (ground) or 1 (supply) | x0 = 64'd0, 4'b0101 |
| ALU | Bank of functional units + output MUX selected by `ALUOp` | `ALUOp=000` → A+B |
| Subtraction via two's complement | `A - B = A + (~B) + 1` | Reuse the adder, carry-in = 1 |
| Low-32 multiply | Multiply only bits 31–0 so the 64-bit product fits `R` | `(uint64_t)(uint32_t)A * (uint32_t)B` |
| Splitter | Extracts/combines bit ranges to manage wire widths | take A[31:0] from a 64-bit wire |
| XNOR equality | Per-bit equality test; AND all bits for word equality | eq = AND(XNOR(aᵢ,bᵢ)) |
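Practice Problems
Problem 1: Map the instruction to signals
For sub a2, a0, a1, identify which register number drives WR, RR0, and RR1, and what ALUOp should be. (RISC-V ABI: a0=x10, a1=x11, a2=x12.)
Solution
| Signal | Value | Reason |
|---|---|---|
| `WR` | 12 | `rd` is `a2` = `x12` |
| `RR0` | 10 | `rs1` is `a0` = `x10` |
| `RR1` | 11 | `rs2` is `a1` = `x11` |
| `ALUOp` | `0b001` | sub |
| `WE` | 1 | the result is written back |

The ALU computes `R = RD0 - RD1`, which is written to `x12` (`a2`) on the next clock edge because `WE = 1`.
Problem 2: Why gate the decoder with WE?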
A student builds a 5-to-32 decoder and wires each output directly to a register's EN, with no WE gate. The program runs but every register changes every cycle. Explain the bug and the fix.
Solution
A plain decoder **always** drives exactly one output high. With no `WE` gate, the selected register's `EN` is high on every clock edge, so it is overwritten every cycle, even on instructions that should not write (branches, stores, or stall cycles).
**Fix:** AND each decoder output line with `WE`: `ri = (decoder line i) AND WE`. Now `ri` is high only when `WE = 1`, so writes are conditional. This is a *gated one-hot* signal: at most one line high, and only when writing is requested.
Problem 3: Trace the ALU
Given A = 0x0000000000000007, B = 0x0000000000000003, compute R for each ALUOp: 000, 001, 010, 011, 100.
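Solution
| `ALUOp` | Operation | `R` |
|---|---|---|
| `000` | add | 7 + 3 = 10 (`0xA`) |
| `001` | sub | 7 - 3 = 4 |
| `010` | mul | 7 * 3 = 21 (`0x15`) |
| `011` | sll | 7 << 3 = 56 (`0x38`) |
| `100` | srl | 7 >> 3 = 0 |

Check `sll`: `7 = 0b111`, shift left by 3 = `0b111000 = 56`. Check `srl`: `7 >> 3 = 0` because all three set bits shift off the bottom.
Problem 4: The subtract off-by-one bug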
A student's sub produces results that are always one too small (A - B - 1). What did they forget, and how does two's-complement subtraction work?
Solution
They forgot the **carry-in of 1**. Two's-complement subtraction is: `A - B = A + (~B) + 1`. The `~B` (bitwise NOT) is the one's complement; adding 1 makes it the two's complement, i.e. `-B`. If you invert `B` but leave carry-in at 0, you compute `A + ~B = A + (-B - 1) = A - B - 1`.
**Fix:** set the adder's carry-in to 1 when subtracting (or, in Digital, simply use the built-in **Sub** component, which handles this internally).
Problem 5: Why not multiply full 64-bit operands?
Explain the two problems with feeding both full 64-bit operands into a multiplier, and how the lab solution avoids them.
Solution
**Problem 1: result width.** Multiplying two 64-bit numbers can produce up to a **128-bit** product, but the ALU result `R` is only 64 bits. The upper 64 bits would be lost (overflow), and a 128-bit result needs a wider output than we have.
**Problem 2: cost.** A custom 64×64 multiplier is large and error-prone; you should use the standard library component, not a hand-built relay circuit.
**Solution.** Multiply only the **lower 32 bits** of each operand. A 32×32 product fits exactly in 64 bits: `(2^32 - 1)^2 = 2^64 - 2^33 + 1 < 2^64`. Use splitters to extract bits 31–0 of `A` and `B`. In C: `(uint64_t)(uint32_t)A * (uint64_t)(uint32_t)B`.
Problem 6: Build a 2-bit equality comparator
Using XNOR and AND gates, write the Boolean expression for a 2-bit comparator eq where A = a1a0 and B = b1b0. Then give the truth-table row count and the eq value when A = 0b10, B = 0b10.
Solution
`eq = XNOR(a1,b1) AND XNOR(a0,b0)`
Equivalently, expanding XNOR: `eq = (ā1·b̄1 + a1·b1) · (ā0·b̄0 + a0·b0)`
The truth table has `2^(2+2) = 16` rows (all combinations of two 2-bit inputs). For `A = 0b10, B = 0b10`: bit0 `0==0 → XNOR=1`, bit1 `1==1 → XNOR=1`, so `eq = 1 AND 1 = 1`. The values are equal, as expected.
Further Reading
- Processor Design guides: Part 1, Part 2, Part 3
- Lab09 assignment spec: Lab 09
- Course key concepts: Key Concepts, Key Concepts (All)
- Source PDF: CS315-01 2025-10-29 Lab Processor ALU
- RISC-V Instruction Set Manual
- Digital simulator documentation
- Two's complement (Wikipedia)
Summary
- The ALU is combinational, the register file is sequential. Only sequential components (PC, register file) take a `CLK`; the ALU settles purely from `A`, `B`, and `ALUOp`.
- An instruction maps directly onto register-file signals: `rd → WR`, `rs1 → RR0`, `rs2 → RR1`, with results read on `RD0`/`RD1` and written from `WD` when `WE = 1`.
- The write path is a decoder with enable: a 5-to-32 one-hot decoder whose outputs are ANDed with `WE` so exactly one register updates, and only when writing is requested.
- The read path is a multiplexer per read port, with `x0` hardwired to a 64-bit constant zero; constants are fixed wiring, never programmable.
- The ALU is a bank of parallel functional units (ADD, SUB, MUL, SLL, SRL) feeding one output MUX selected by the 3-bit `ALUOp` (add=000, sub=001, mul=010, sll=011, srl=100).
- Subtraction reuses the adder as `A + (~B) + 1`; forgetting the `+1` carry-in is the classic off-by-one bug.
- Multiplication uses only the lower 32 bits of each operand so the 64-bit product fits `R`; never build a custom 64×64 multiplier, and use splitters to manage 32-bit vs 64-bit widths.
- Equality comparison is per-bit XNOR ANDed together, the combinational pattern you reuse to build the branch unit later.