Lab: Processor ALU¶

Overview¶

This hands-on lab session continues building the single-cycle RISC-V processor from Lab09, focusing on the three components you wire together by hand before any control logic exists: the Register File, the ALU (Arithmetic Logic Unit), and the supporting combinational glue that connects them. We start by mapping a RISC-V instruction like add t0, t1, t2 onto the register-file signals (WR, RR0, RR1, RD0, RD1), then build a decoder with enable so that at most one register updates per clock cycle, and only when WE is high. We design the ALU as a bank of combinational functional units (ADD, SUB, MUL, SLL, SRL) whose results are selected by a final multiplexer driven by the 3-bit ALUOp. Along the way we correct a common mistake (trying to multiply two full 64-bit operands) and finish with a 4-bit equality comparator built from XNOR gates, the building block you will reuse in the branch unit later.

Learning Objectives¶

Map a RISC-V three-operand instruction onto Register File control signals (WR, RR0, RR1, RD0, RD1)
Distinguish combinational logic (ALU, decoders, comparators) from sequential logic (PC, register file) and know which needs a CLK
Build a decoder with enable that produces a one-hot write signal gated by WE
Read register values out of a register file using a multiplexer with x0 hardwired to zero
Design an ALU as a parallel bank of functional units selected by a result multiplexer driven by ALUOp
Implement subtraction as A + (-B) and explain why multiplication uses only the lower 32 bits of each operand
Manage data-path widths (32-bit vs 64-bit) correctly using splitters
Build an N-bit equality comparator from XNOR gates and an AND tree

Prerequisites¶

Combinational logic: gates, truth tables, sum-of-products, multiplexers, decoders (Lab08)
Sequential logic: D flip-flops, N-bit registers with CLK/CLR/EN, counters (Lab09)
Adders: half adder, full adder, ripple-carry adder, subtraction via two's complement (Lab08)
RISC-V register conventions and the 64-bit register set x0–x31 (Project04)
Familiarity with the Digital schematic simulator (components, splitters, tunnels, probes)
The Lab09 spec: see /assignments/lab09/

1. Where the ALU Fits in the Processor¶

Recall the major components of the single-cycle processor we are assembling incrementally across Lab09 and Lab10:

flowchart LR
    PC[PC Register] --> IM[Instruction Memory ROM]
    IM --> ID[Instruction Decode]
    ID --> RF[Register File]
    RF -->|RD0| ALU[ALU]
    RF -->|RD1| MUX[ALUSrcB MUX]
    IMM[Immediate] --> MUX
    MUX --> ALU
    ALU -->|R| RF
    ALU --> DM[Data Memory RAM]
    PC --> ADD4["PC + 4"]
    ADD4 --> PC

    style ALU fill:#f9f,stroke:#333,stroke-width:2px
    style RF fill:#bbf,stroke:#333,stroke-width:2px

Today's focus: the Register File (a sequential component, it holds state) and the ALU (a combinational component, no state), plus the decoders and comparators that glue them together.

Combinational vs Sequential¶

This distinction drives almost every wiring decision in the lab.

Property	Combinational	Sequential
Holds state?	No	Yes
Needs `CLK`?	No	Yes
Output depends on	Current inputs only	Inputs and stored state
Examples here	ALU, decoder, comparator, MUX	PC register, register file
Circuit shape	Directed acyclic graph (no loops)	Has feedback through storage

The takeaway: the ALU has no CLK input. It is pure combinational logic — give it A, B, and ALUOp and the result R settles after the gate delays. The register file and PC are sequential — they only change on a clock edge.
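
The difference can be sketched in C, as a rough software analogy rather than a description of the Digital components. The names adder, reg64, and reg64_clock are made up for this illustration: a combinational block is a pure function, while a sequential block is state that changes only when a clock-edge function is called.

```c
#include <assert.h>
#include <stdint.h>

// Combinational: a pure function of its current inputs, no stored state.
static uint64_t adder(uint64_t a, uint64_t b) {
    return a + b;
}

// Sequential: a register holds state that only changes on a clock edge.
typedef struct {
    uint64_t q;  // stored value, visible on the output between edges
} reg64;

// Model one clock edge: capture d only if en is high.
static void reg64_clock(reg64 *r, uint64_t d, int en) {
    if (en) {
        r->q = d;
    }
}

// Hypothetical check: the adder output follows its inputs immediately,
// while the register output changes only when reg64_clock is called.
static void check_comb_vs_seq(void) {
    reg64 r = { 0 };
    uint64_t sum = adder(r.q, 1);   // combinational result is available at once
    assert(sum == 1);
    assert(r.q == 0);               // register still holds 0: no edge yet
    reg64_clock(&r, sum, 1);        // clock edge with EN = 1
    assert(r.q == 1);
    reg64_clock(&r, 99, 0);         // clock edge with EN = 0: value is held
    assert(r.q == 1);
}
```

Note that adder takes no clock argument at all, which mirrors the ALU having no CLK pin.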

2. From Instruction to Register-File Signals¶

Start with the canonical R-type instruction and ask: which register-file control line does each operand drive?

        rd    rs1   rs2
 add    t0,   t1,   t2          R = RD0 + RD1
         |     |     |
         v     v     v
        WR    RR0   RR1

For add t0, t1, t2 the processor must:

Read the two source registers rs1 (t1) and rs2 (t2). These selectors drive RR0 and RR1. The register file responds on its read-data outputs RD0 and RD1.
Compute RD0 + RD1 in the ALU.
Write the result back into the destination register rd (t0). That selector drives WR, and the write only happens when WE (write enable) is high.

The whole datapath for one R-type instruction is just:

   RR0 = rs1 ──► RegFile ──► RD0 ──┐
                                   ├──► ALU ──► R ──► WD ──► RegFile[WR]  (if WE=1)
   RR1 = rs2 ──► RegFile ──► RD1 ──┘
   WR  = rd
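
In this lab you set WR, RR0, and RR1 by hand, but it can help to see where those numbers come from in the encoded instruction. The sketch below is an illustration only: rf_select and decode_rtype are hypothetical names, and the field positions are the standard RISC-V R-type layout (rd in bits 11:7, rs1 in bits 19:15, rs2 in bits 24:20).

```c
#include <assert.h>
#include <stdint.h>

// Register-file selectors for one R-type instruction (hypothetical type).
typedef struct {
    uint8_t wr;   // destination selector (rd)
    uint8_t rr0;  // first source selector (rs1)
    uint8_t rr1;  // second source selector (rs2)
} rf_select;

// Extract the three 5-bit register fields from a 32-bit R-type encoding.
static rf_select decode_rtype(uint32_t inst) {
    rf_select s;
    s.wr  = (inst >> 7)  & 0x1F;   // rd  = bits 11:7
    s.rr0 = (inst >> 15) & 0x1F;   // rs1 = bits 19:15
    s.rr1 = (inst >> 20) & 0x1F;   // rs2 = bits 24:20
    return s;
}

static void check_decode_rtype(void) {
    // 0x007302B3 encodes add t0, t1, t2 (t0 = x5, t1 = x6, t2 = x7).
    rf_select s = decode_rtype(0x007302B3u);
    assert(s.wr == 5 && s.rr0 == 6 && s.rr1 == 7);
}
```

Each field is 5 bits wide, which matches the 5-bit selector width of the register file.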

The Register File Interface¶

Signal	Width	Direction	Meaning
`RR0`	5 bits	in	Selects which register to read onto `RD0`
`RR1`	5 bits	in	Selects which register to read onto `RD1`
`WR`	5 bits	in	Selects the destination register to write
`WE`	1 bit	in	Write enable: only write when high
`WD`	64 bits	in	Data to write into register `WR`
`CLK`	1 bit	in	Clock — writes happen on the edge
`CLR`	1 bit	in	Clear all registers to zero
`RD0`	64 bits	out	Value of register `RR0`
`RD1`	64 bits	out	Value of register `RR1`
`x0`–`x31`	64 bits	out	Each register's value (for the dashboard)

The selectors are 5 bits because there are 32 registers and 2^5 = 32. We have two read ports (RR0/RD0 and RR1/RD1) and one write port (WR/WD/WE) so a single instruction can read two operands and write one result in the same clock cycle.

Naming gotcha corrected in lab: there are 32 registers, x0 through x31 — not x0 through x32. x0 is the zero register and is read-only; x1–x31 are writable.

3. The Register File Write Path: Decoder with Enable¶

The hard part of the write path is: given a 5-bit WR selector, update exactly one of the 31 writable registers, and only when WE is asserted. This is a decoder with enable.

A plain 5-to-32 decoder takes the 5-bit selector and drives exactly one of 32 outputs high (one-hot encoding). But a plain decoder always asserts one output. We do not want any register to update when WE is low, so we AND each decoder output with WE.

 Decoder with Enable

   sel (5) ──►┌──────────┐ 0 ──►──[ AND ]──► r0   (write-enable for x0)
              │          │ 1 ──►──[ AND ]──► r1
              │  5-to-32 │ 2 ──►──[ AND ]──► r2
              │  decoder │ . . .
              │          │31 ──►──[ AND ]──► r31
              └──────────┘
                              ▲
   WE ────────────────────────┘  (one input of every AND gate)

Each AND gate has two inputs: one decoder output line and the shared WE. The result r0..r31 is a gated one-hot signal: at most one line is high, and only when WE = 1. Line ri becomes the enable input (EN) of register xi.

flowchart LR
    SEL["WR (5 bits)"] --> DEC["5-to-32 Decoder"]
    DEC -->|line 0| A0["AND"]
    DEC -->|line 1| A1["AND"]
    DEC -->|line 2| A2["AND"]
    DEC -->|line 31| A31["AND"]
    WE["WE"] --> A0
    WE --> A1
    WE --> A2
    WE --> A31
    A0 --> R0["x0 EN"]
    A1 --> R1["x1 EN"]
    A2 --> R2["x2 EN"]
    A31 --> R31["x31 EN"]
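
As a minimal sketch of the same circuit in C (decode_en is a name chosen for this example, not part of the lab files), bit i of the returned word stands for line ri:

```c
#include <assert.h>
#include <stdint.h>

// Decoder with enable: bit i of the result models line ri.
// sel is the 5-bit WR selector, we is the write-enable input.
static uint32_t decode_en(uint8_t sel, int we) {
    uint32_t one_hot = (uint32_t)1 << (sel & 0x1F);  // plain 5-to-32 decoder
    uint32_t gate = we ? 0xFFFFFFFFu : 0u;           // WE fanned out to every AND
    return one_hot & gate;                           // 32 AND gates in parallel
}

static void check_decode_en(void) {
    assert(decode_en(5, 1) == 0x00000020u);   // only r5 is high
    assert(decode_en(5, 0) == 0);             // WE low: no line is high
    assert(decode_en(31, 1) == 0x80000000u);  // only r31 is high
}
```

The bitwise AND with a mask of all ones or all zeros plays the role of the 32 two-input AND gates that share WE.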

Why gate with WE?¶

Without the WE gate, the decoder would always enable one register, so that register would be overwritten on every clock edge — even during instructions that should not write (a branch, a store, or a stalled cycle). Gating with WE makes the write conditional. This is the same idea as the global EN that lets you pause the whole processor: when EN = 0, no state element updates.

Common Mistakes¶

Forgetting the WE gate. Symptom: registers change on cycles where nothing should be written. Every ri line must pass through an AND with WE.
Writing to x0. x0 must always read as zero. The clean fix is to never enable x0's register — either omit r0 entirely or force it low. The emulator will also be updated to reject writes to x0.
Wrong selector width. Using 4 bits instead of 5 only addresses 16 registers. You need 5 bits for 32.

Alternate Approach (MUX-based gating)¶

There is a second way to gate the write, shown in lab as the "Alt Approach": instead of ANDing every decoder line with WE, feed WR through a small MUX so the effective write selector is forced to a no-op value when WE is low. Both styles work; the AND-gated decoder is the clearer of the two and is what most students used.
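
The notes do not say which no-op value the MUX selects. One plausible reading, sketched below as an assumption, is that the selector is forced to 0 (x0) and that x0's enable line is left unconnected, so selecting it writes nothing. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

// Assumption: the "no-op value" is selector 0, i.e. x0.
static uint8_t effective_wr(uint8_t wr, int we) {
    return we ? (uint8_t)(wr & 0x1F) : 0;   // 2-input MUX on the selector
}

// MUX-based gating: decode the effective selector, never drive line 0.
static uint32_t write_lines_mux(uint8_t wr, int we) {
    uint32_t one_hot = (uint32_t)1 << effective_wr(wr, we);
    return one_hot & ~(uint32_t)1;          // line 0 is left unconnected
}

// AND-based gating from the main text, with line 0 also left unconnected.
static uint32_t write_lines_and(uint8_t wr, int we) {
    uint32_t one_hot = (uint32_t)1 << (wr & 0x1F);
    return (we ? one_hot : 0) & ~(uint32_t)1;
}

// Under the assumption above, both styles drive identical enable lines.
static void check_alt_gating(void) {
    for (unsigned wr = 0; wr < 32; wr++) {
        for (int we = 0; we <= 1; we++) {
            assert(write_lines_mux((uint8_t)wr, we) ==
                   write_lines_and((uint8_t)wr, we));
        }
    }
    assert(write_lines_mux(5, 1) == 0x20u);
    assert(write_lines_mux(5, 0) == 0);
}
```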

Tunnels in Digital: tunnels are fine for clean point-to-point connections (e.g., routing each xi to the dashboard), but use them sparingly. Overusing tunnels hides the data path and makes bugs harder to find.

4. The Register File Read Path: MUX Selection¶

Reading is simpler than writing. To produce RD0, take all 32 register values as inputs to a 32-input, 64-bit-wide multiplexer and use RR0 (5 bits) as the select line. Do the same with a second MUX driven by RR1 to produce RD1.

   x0  ──►┐
   x1  ──►│  32-input
   x2  ──►│   64-bit  ──► RD0
    .     │    MUX
    .  ──►│
   x31 ──►┘
            ▲
   RR0 (5) ─┘   (select)

The critical detail: x0 is hardwired to the constant 0. Input 0 of the read MUX is not a register output — it is a 64-bit constant zero. In hardware a constant is literally wired: each bit is tied to ground (0) or to the supply voltage (1).

   x0 = 0    ─── constant, every bit tied to ground
   x1 = Q ───────► from register x1's flip-flops
   x2 = Q ───────► from register x2
    .
   x31 = Q ──────► from register x31

In Digital you drop a Constant component set to 0 (64-bit). In Verilog the same idea is a literal:

// x0 is hardwired to zero in hardware
wire [63:0] x0 = 64'd0;        // 64-bit constant zero

// A 4-bit constant 0101 (decimal 5) is fixed at construction:
wire [3:0]  five = 4'b0101;    // bit3=0, bit2=1, bit1=0, bit0=1

These constants are not programmable — they are decided when the circuit is built. That is exactly why x0 can never be anything but zero: reading it routes the MUX to a hardwired constant, and (from Section 3) we never enable writing to it.

flowchart LR
    Z["Constant 0 (64-bit)"] --> M["32-input 64-bit MUX"]
    X1["x1 register"] --> M
    X2["x2 register"] --> M
    X31["x31 register"] --> M
    RR0["RR0 (5 bits)"] -->|select| M
    M --> RD0["RD0 (64 bits)"]
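
A small C sketch of one read port, assuming the registers are modeled as an array (read_port is a hypothetical helper). Input 0 of the MUX is the constant, so whatever happens to sit in slot 0 of the array is never visible:

```c
#include <assert.h>
#include <stdint.h>

// One read port: a 32-input MUX whose input 0 is a hardwired constant.
static uint64_t read_port(const uint64_t regs[32], uint8_t rr) {
    uint64_t inputs[32];
    inputs[0] = 0;                       // constant zero, not a register output
    for (int i = 1; i < 32; i++) {
        inputs[i] = regs[i];             // Q outputs of registers x1..x31
    }
    return inputs[rr & 0x1F];            // RR0 or RR1 is the select
}

static void check_read_port(void) {
    uint64_t regs[32] = { 0 };
    regs[0] = 0xDEADu;                   // deliberately bogus: must be ignored
    regs[5] = 42;
    assert(read_port(regs, 0) == 0);     // x0 always reads as zero
    assert(read_port(regs, 5) == 42);
}
```

RD0 and RD1 would each be one call to read_port with a different selector, matching the two MUXes in the circuit.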

5. The ALU: Interface and Operations¶

The ALU is the arithmetic heart of the processor. Its block interface for Lab09:

            ALUOp (3)
              │
        ┌─────┴─────┐
  A ───►│           │
 (64)   │    ALU    │───► R (64)
  B ───►│           │
 (64)   └───────────┘

Signal	Width	Meaning
`A`	64 bits	First operand
`B`	64 bits	Second operand
`ALUOp`	3 bits	Selects the operation
`R`	64 bits	Result

ALUOp is 3 bits because we need to choose among five operations: 2 bits give only 2^2 = 4 codes, while 3 bits give 2^3 = 8, of which five are used:

`ALUOp`	Operation	Result `R`	RISC-V instructions
`0b000`	`add`	`A + B`	`add`, `addi`, address calc, branch target
`0b001`	`sub`	`A - B`	`sub`, comparisons
`0b010`	`mul`	`A[31:0] * B[31:0]` (64-bit product, see Section 7)	`mul`
`0b011`	`sll`	`A << B`	`sll`, `slli`
`0b100`	`srl`	`A >> B` (logical)	`srl`, `srli`

The ALU does more than data-processing math: load/store instructions use add to compute the effective address (base register plus a sign-extended offset, so a negative offset is still an addition), and branches use add to compute the branch target address. So a correct adder is load-bearing across the whole processor.

Spec note: the Lab09 guide originally listed inconsistent ALUOp values; the encoding above (add=000, sub=001, mul=010, sll=011, srl=100) is the one to implement. Your alu.dig must have inputs A, B, ALUOp and output R.

6. ALU Internal Structure: Parallel Units + Output MUX¶

The standard ALU design is a bank of functional units that all compute in parallel, followed by a single multiplexer that selects which result to expose on R. Because the ALU is combinational, every unit computes on every cycle; the MUX simply picks the one we want.

            ┌──────┐
   A,B ────►│ ADD  │──► A+B ──┐
            └──────┘          │
            ┌──────┐          │
   A,B ────►│ SUB  │──► A-B ──┤
            └──────┘          │   ┌─────────┐
            ┌──────┐          ├──►│  result │──► R
   A,B ────►│ MUL  │──► A*B ──┤   │   MUX   │
            └──────┘          │   └─────────┘
            ┌──────┐          │        ▲
   A,B ────►│ SLL  │──► A<<B ─┤        │
            └──────┘          │     ALUOp (3)
            ┌──────┐          │
   A,B ────►│ SRL  │──► A>>B ─┘
            └──────┘

flowchart LR
    A["A (64)"] --> ADD
    A --> SUB
    A --> MUL
    A --> SLL
    A --> SRL
    B["B (64)"] --> ADD
    B --> SUB
    B --> MUL
    B --> SLL
    B --> SRL
    ADD["ADD"] --> MUX["Result MUX"]
    SUB["SUB"] --> MUX
    MUL["MUL"] --> MUX
    SLL["SLL"] --> MUX
    SRL["SRL"] --> MUX
    OP["ALUOp (3)"] -->|select| MUX
    MUX --> R["R (64)"]

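In Digital you use the built-in Add, Sub, Mul, and shift components (or your own from Lab08) and a 64-bit MUX with a 3-bit select. Every unit is 64 bits wide except the multiplier, which takes the low 32 bits of each operand and produces a 64-bit product (Section 7). The C model of the ALU makes the structure obvious: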

#include <stdint.h>

// ALUOp encodings (3-bit select for the result MUX).
// Binary literals (0b...) are standard in C23 and a GCC/Clang extension before that.
#define ALU_ADD 0b000
#define ALU_SUB 0b001
#define ALU_MUL 0b010
#define ALU_SLL 0b011
#define ALU_SRL 0b100

uint64_t alu(uint64_t A, uint64_t B, uint8_t ALUOp) {
    switch (ALUOp) {
        case ALU_ADD: return A + B;
        case ALU_SUB: return A - B;
        // 32x32 -> 64: only the low 32 bits of each operand (Section 7)
        case ALU_MUL: return (uint64_t)(uint32_t)A * (uint64_t)(uint32_t)B;
        case ALU_SLL: return A << (B & 63);  // shift amount is low 6 bits
        case ALU_SRL: return A >> (B & 63);  // logical (A is unsigned)
        default:      return 0;              // unused codes 101-111
    }
}

The switch is the software analog of the output MUX: in hardware all five branches "execute" simultaneously and the MUX selects one; in software we evaluate only the selected branch. Either way, ALUOp is the selector.
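
To make the hardware view concrete, here is a sketch that mimics the circuit more literally: every unit computes its result first, then ALUOp indexes into the results like a MUX select. The name alu_parallel is made up for this example, and the MUL unit follows the low-32-bit approach from Section 7.

```c
#include <assert.h>
#include <stdint.h>

// All functional units compute, then the result MUX picks one.
static uint64_t alu_parallel(uint64_t A, uint64_t B, uint8_t ALUOp) {
    uint64_t unit[8] = { 0 };                 // MUX inputs; unused codes give 0
    unsigned shamt = (unsigned)(B & 63);      // shift amount is low 6 bits of B

    unit[0] = A + B;                                           // ADD
    unit[1] = A - B;                                           // SUB
    unit[2] = (uint64_t)(uint32_t)A * (uint64_t)(uint32_t)B;   // MUL (low 32 bits)
    unit[3] = A << shamt;                                      // SLL
    unit[4] = A >> shamt;                                      // SRL (logical)

    return unit[ALUOp & 7];                   // ALUOp is the MUX select
}

static void check_alu_parallel(void) {
    assert(alu_parallel(7, 3, 0) == 10);
    assert(alu_parallel(7, 3, 1) == 4);
    assert(alu_parallel(7, 3, 2) == 21);
    assert(alu_parallel(7, 3, 3) == 56);
    assert(alu_parallel(7, 3, 4) == 0);
    assert(alu_parallel(7, 3, 5) == 0);       // unused code
}
```

The array index is the software stand-in for the 3-bit select line; nothing is skipped, one result is simply chosen.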

Subtraction = A + (−B)¶

You do not need a separate subtractor circuit. Subtraction reuses the adder via two's complement: A - B = A + (~B) + 1. In a hardware adder, invert every bit of B and set the initial carry-in to 1.

   A - B  ≡  A + (NOT B) + 1
            └── invert B ──┘ └ carry-in = 1

A subtract-failing test was diagnosed in lab; the usual culprit is forgetting the +1 (the carry-in), which gives A + ~B = A - B - 1, off by one.
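
The identity and the off-by-one bug can both be checked with a few lines of C. This is a sketch with hypothetical helper names, where add_with_carry stands in for the hardware adder and its carry-in pin:

```c
#include <assert.h>
#include <stdint.h>

// Stand-in for a 64-bit adder with a 1-bit carry-in.
static uint64_t add_with_carry(uint64_t a, uint64_t b, unsigned cin) {
    return a + b + (cin & 1u);
}

// Correct subtraction: invert B and set carry-in to 1.
static uint64_t sub_via_add(uint64_t a, uint64_t b) {
    return add_with_carry(a, ~b, 1);
}

// The buggy version: B is inverted but carry-in is left at 0.
static uint64_t sub_missing_carry(uint64_t a, uint64_t b) {
    return add_with_carry(a, ~b, 0);
}

static void check_subtraction(void) {
    assert(sub_via_add(7, 3) == 4);
    assert(sub_missing_carry(7, 3) == 3);     // A - B - 1: one too small
    assert(sub_via_add(0, 1) == UINT64_MAX);  // -1 is all ones
}
```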

7. Multiplication: The 64-bit Trap and the Fix¶

The naive instinct is to feed both full 64-bit operands into a multiplier. This is the mistake crossed out in red in the lab notes.

   ╳  ┌──────────────┐
   A ─┤              │
 (64) │  64x64 MUL   │──► (would be 128-bit product!)
   B ─┤              │
 (64) └──────────────┘
   ✗ Do not build a custom 64x64 multiplier

Two problems:

The product is 128 bits. Multiplying two 64-bit numbers can produce up to a 128-bit result, but our ALU result R is only 64 bits.
Building a custom 64×64 multiplier is wasteful and error-prone. Use the standard library component, not a hand-rolled relay-based circuit.

The fix used in lab: multiply only the lower 32 bits of each operand. The product of two 32-bit values fits in 64 bits, which exactly matches R. Use splitters to take bits 31–0 of A and bits 31–0 of B, feed those two 32-bit values into the multiplier, and take the 64-bit result.

                  bits 31-0
   A (64) ──►[splitter]──► A[31:0] (32) ──┐
                                          │  ┌──────┐
                                          ├─►│ MUL  │──► R (64)
   B (64) ──►[splitter]──► B[31:0] (32) ──┘  │32x32 │
                  bits 31-0                   └──────┘

flowchart LR
    A["A (64)"] --> SA["splitter: A[31:0]"]
    B["B (64)"] --> SB["splitter: B[31:0]"]
    SA -->|"32 bits"| MUL["32x32 MUL"]
    SB -->|"32 bits"| MUL
    MUL -->|"64-bit product"| R["R (64)"]

In C this is exactly:

// Multiply low 32 bits of each operand; 32x32 -> 64 fits in R
uint64_t mul_low32(uint64_t A, uint64_t B) {
    uint32_t a = (uint32_t)A;          // low 32 bits of A
    uint32_t b = (uint32_t)B;          // low 32 bits of B
    return (uint64_t)a * (uint64_t)b;  // 64-bit product
}

An alternative ALU that produces a full 128-bit result (using a wider output and two product halves) was discussed but deferred. For Lab09, the 32×32→64 approach is what you implement.
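
Two consequences of this choice are worth checking numerically. The sketch below (mul_lo32 is a name chosen for this example) shows that the largest possible 32×32 product still fits in 64 bits, and that any bits above bit 31 of an operand are simply ignored:

```c
#include <assert.h>
#include <stdint.h>

// Low-32 multiply, same idea as the lab circuit.
static uint64_t mul_lo32(uint64_t A, uint64_t B) {
    return (uint64_t)(uint32_t)A * (uint64_t)(uint32_t)B;
}

static void check_mul_width(void) {
    // Largest case: (2^32 - 1)^2 = 2^64 - 2^33 + 1, which fits in 64 bits.
    assert(mul_lo32(UINT32_MAX, UINT32_MAX) == 0xFFFFFFFE00000001ull);

    // Upper bits of an operand are dropped by the splitter:
    // A = 0x1_0000_0002 has low 32 bits equal to 2, so the product is 6.
    assert(mul_lo32(0x100000002ull, 3) == 6);

    // A plain 64-bit multiply of the same operands gives a different value.
    assert(0x100000002ull * 3 == 0x300000006ull);
}
```

So the lab multiplier agrees with an ordinary multiply only when both operands fit in 32 bits.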

Width management with splitters¶

Mixing 32-bit and 64-bit wires is the #1 source of red error wires in Digital. Rules of thumb:

Use a splitter to extract a bit range (e.g., take bits 31–0 of a 64-bit wire to get a 32-bit wire).
Use a splitter/merger the other direction to combine narrow wires into a wide one.
Match component data-bit settings to your wire widths. A 32-bit MUL output cannot drive a 64-bit wire directly without widening.
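
In C the same width bookkeeping is done with casts and shifts. The helpers below are hypothetical names for the three splitter uses just listed:

```c
#include <assert.h>
#include <stdint.h>

// Splitter: extract bits 31-0 of a 64-bit wire.
static uint32_t split_low(uint64_t w) {
    return (uint32_t)w;
}

// Splitter: extract bits 63-32 of a 64-bit wire.
static uint32_t split_high(uint64_t w) {
    return (uint32_t)(w >> 32);
}

// Merger: combine two 32-bit wires into one 64-bit wire.
static uint64_t merge(uint32_t hi, uint32_t lo) {
    return ((uint64_t)hi << 32) | lo;
}

// Widening: drive a 64-bit wire from a 32-bit one by padding with zeros.
static uint64_t zero_extend(uint32_t narrow) {
    return merge(0, narrow);
}

static void check_splitters(void) {
    uint64_t w = 0x1122334455667788ull;
    assert(split_low(w) == 0x55667788u);
    assert(split_high(w) == 0x11223344u);
    assert(merge(split_high(w), split_low(w)) == w);   // split then merge round-trips
    assert(zero_extend(0xFFFFFFFFu) == 0x00000000FFFFFFFFull);
}
```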

8. The Equality Comparator (Building Block for Branches)¶

The last component in lab is a 4-bit equality comparator. Two values are equal when every bit position matches. "Bits match" is exactly the XNOR function (it outputs 1 when both inputs are equal). AND all the per-bit XNOR results together and you get a single eq signal.

XOR vs XNOR truth table¶

a	b	XOR (a≠b)	XNOR (a=b)
0	0	0	1
0	1	1	0
1	0	1	0
1	1	0	1

XNOR is the bit-equality test: XNOR(a,b) = 1 iff a == b. (XOR is the bit-difference test, useful for the not-equal/bne case.)

4-bit comparator structure¶

   A (4) ─►[splitter]─► a3 a2 a1 a0
   B (4) ─►[splitter]─► b3 b2 b1 b0

   XNOR(a0,b0) ─┐
   XNOR(a1,b1) ─┤
   XNOR(a2,b2) ─┼──[ AND ]──► eq   (1 iff A == B)
   XNOR(a3,b3) ─┘

flowchart LR
    A["A (4)"] --> SA["splitter"]
    B["B (4)"] --> SB["splitter"]
    SA --> X0["XNOR bit0"]
    SB --> X0
    SA --> X1["XNOR bit1"]
    SB --> X1
    SA --> X2["XNOR bit2"]
    SB --> X2
    SA --> X3["XNOR bit3"]
    SB --> X3
    X0 --> AND["AND (all bits equal)"]
    X1 --> AND
    X2 --> AND
    X3 --> AND
    AND --> EQ["eq"]

The Boolean expression for one bit's equality is eq_i = (āᵢ·b̄ᵢ) + (aᵢ·bᵢ), which is precisely XNOR. The full word equality is the AND of all four:

   eq = XNOR(a0,b0) · XNOR(a1,b1) · XNOR(a2,b2) · XNOR(a3,b3)

This scales: an N-bit comparator is N XNOR gates feeding one N-input AND. You will reuse this comparator (and its sub-based cousins for <, >=) in the Branch Unit when you add beq, bne, blt, and bge later. For now it cements the combinational-design pattern: per-bit gate, then combine.
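
The per-bit-then-combine pattern can be sketched in C as a loop over bit positions. The function eq_nbit is a hypothetical name and assumes n is at most 64:

```c
#include <assert.h>
#include <stdint.h>

// N-bit equality: one XNOR per bit position, then AND everything together.
static int eq_nbit(uint64_t a, uint64_t b, unsigned n) {
    int eq = 1;                                  // identity value for AND
    for (unsigned i = 0; i < n; i++) {
        unsigned ai = (unsigned)((a >> i) & 1);  // splitter output a_i
        unsigned bi = (unsigned)((b >> i) & 1);  // splitter output b_i
        unsigned xnor = !(ai ^ bi);              // 1 iff the two bits match
        eq = eq && xnor;                         // feed the AND
    }
    return eq;
}

static void check_eq_nbit(void) {
    assert(eq_nbit(0xA, 0xA, 4) == 1);
    assert(eq_nbit(0xA, 0xB, 4) == 0);    // bit 0 differs
    assert(eq_nbit(0x1A, 0x0A, 4) == 1);  // bit 4 is outside the 4-bit compare
}
```

In hardware the loop body is replicated N times side by side; the loop is only a compact way to write that replication.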

9. Putting It Together: Running a Program by Hand¶

In Lab09 there is no control unit yet, so you play the role of the control logic, toggling the inputs manually. The top-level circuit lab09.dig must expose inputs CLK, CLR, RR0, RR1, WR, WE, ALUSrcB, ALUOp, Imm and outputs T0, T1. ALUSrcB is the select on a MUX that chooses whether ALU input B comes from RD1 (a register) or from Imm (an immediate).

Worked example — execute addi t0, t0, 1 so that T0 becomes 1:

Goal: t0 = t0 + 1   (t0 starts at 0)

1. Set CLR=1, pulse CLK to clear all registers (t0 = 0), then set CLR=0 so later clock pulses can write.
2. Set RR0 = 5             (t0 is x5; reads current t0 onto RD0 = 0)
3. Set ALUSrcB = 1         (B comes from Imm, not RD1)
4. Set Imm = 1
5. Set ALUOp = 0b000       (add)  ->  R = RD0 + Imm = 0 + 1 = 1
6. Set WR = 5, WE = 1      (destination is t0 = x5)
7. Pulse CLK               (on the edge, R=1 is written into t0)
8. Observe T0 = 1

The four programs the autograder checks build on this same recipe:

Program	Setup	Result
`addi t0, t0, 1`	add, `ALUSrcB=1`, `Imm=1`	`T0 = 1`
`li t1, 2`	add (li = `addi t1, x0, 2`), `ALUSrcB=1`, `Imm=2`	`T1 = 2`
`addi t0, t0, -1`	add, `ALUSrcB=1`, `Imm=-1`	`T0 = -1` (`0xFFFFFFFFFFFFFFFF`)
`li t0,1; li t1,1; sub t0,t0,t1`	two `li`, then `sub` with `ALUOp=001`, `ALUSrcB=0`	`T0 = 0`

Notice li is a pseudo-instruction: li t1, 2 is really addi t1, x0, 2 (add the immediate to the always-zero register). And addi t0, t0, -1 exercises that your adder and immediate path correctly handle two's-complement negatives — -1 in 64-bit two's complement is all ones.

flowchart LR
    CLR["CLR / CLK"] --> RF["Register File"]
    RR0["RR0 = rs1"] --> RF
    RF -->|RD0| ALU["ALU"]
    RF -->|RD1| MUXB["ALUSrcB MUX"]
    IMM["Imm"] --> MUXB
    SRCB["ALUSrcB"] -->|select| MUXB
    MUXB -->|B| ALU
    OP["ALUOp"] --> ALU
    ALU -->|R = WD| RF
    WR["WR = rd"] --> RF
    WE["WE"] --> RF
    RF --> T0["T0"]
    RF --> T1["T1"]
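
The four programs can also be traced with a small C model of the top-level circuit. This is a sketch under stated assumptions, not the autograder: all names are hypothetical, only add and sub are modeled, and each program is assumed to start from cleared registers.

```c
#include <assert.h>
#include <stdint.h>

enum { ALU_ADD = 0, ALU_SUB = 1 };   // ALUOp codes 000 and 001
enum { X0 = 0, T0 = 5, T1 = 6 };     // ABI numbers: t0 = x5, t1 = x6

typedef struct { uint64_t x[32]; } machine;   // x[0] is never written

// The inputs you would toggle by hand before one clock pulse.
typedef struct {
    uint8_t rr0, rr1, wr, alu_op;
    int we, alu_src_b;
    int64_t imm;
} controls;

static void clear_regs(machine *m) {          // CLR = 1 plus a clock pulse
    for (int i = 0; i < 32; i++) m->x[i] = 0;
}

static uint64_t read_reg(const machine *m, uint8_t sel) {
    return m->x[sel & 31];                    // x[0] stays 0, so x0 reads as zero
}

// One clock pulse with CLR = 0: read, compute, write back if WE is high.
static void pulse_clk(machine *m, controls c) {
    uint64_t rd0 = read_reg(m, c.rr0);
    uint64_t rd1 = read_reg(m, c.rr1);
    uint64_t b = c.alu_src_b ? (uint64_t)c.imm : rd1;        // ALUSrcB MUX
    uint64_t r = (c.alu_op == ALU_SUB) ? rd0 - b : rd0 + b;  // add/sub only
    if (c.we && (c.wr & 31) != 0) m->x[c.wr & 31] = r;       // x0 is read-only
}

static void addi(machine *m, uint8_t rd, uint8_t rs1, int64_t imm) {
    controls c = { .rr0 = rs1, .wr = rd, .we = 1, .alu_src_b = 1,
                   .alu_op = ALU_ADD, .imm = imm };
    pulse_clk(m, c);
}

static void sub(machine *m, uint8_t rd, uint8_t rs1, uint8_t rs2) {
    controls c = { .rr0 = rs1, .rr1 = rs2, .wr = rd, .we = 1,
                   .alu_src_b = 0, .alu_op = ALU_SUB };
    pulse_clk(m, c);
}

static void check_programs(void) {
    machine m;
    clear_regs(&m); addi(&m, T0, T0, 1);      // addi t0, t0, 1
    assert(m.x[T0] == 1);
    clear_regs(&m); addi(&m, T1, X0, 2);      // li t1, 2
    assert(m.x[T1] == 2);
    clear_regs(&m); addi(&m, T0, T0, -1);     // addi t0, t0, -1
    assert(m.x[T0] == UINT64_MAX);            // -1 is all ones
    clear_regs(&m);                           // li t0,1; li t1,1; sub t0,t0,t1
    addi(&m, T0, X0, 1); addi(&m, T1, X0, 1); sub(&m, T0, T0, T1);
    assert(m.x[T0] == 0);
}
```

Each helper call corresponds to one round of setting the inputs and pulsing CLK once, so the sub program takes three pulses after the clear.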

Key Concepts¶

Concept	Definition	Example
Combinational logic	Output depends only on current inputs; no state, no clock	ALU, decoder, MUX, comparator
Sequential logic	Output depends on inputs and stored state; clocked	PC register, register file
Register file	Storage of 32 registers with 2 read ports, 1 write port	`RR0/RR1 → RD0/RD1`, `WR/WD/WE`
Decoder with enable	One-hot decode of a selector, gated by an enable line	5-to-32 decoder, each output ANDed with `WE`
One-hot encoding	Exactly one line high at a time	Decoder output selecting one register
Hardwired constant	A value fixed by wiring bits to 0 (ground) or 1 (supply)	`x0 = 64'd0`, `4'b0101`
ALU	Bank of functional units + output MUX selected by `ALUOp`	`ALUOp=000 → A+B`
Subtraction via two's complement	`A - B = A + (~B) + 1`	Reuse the adder, carry-in = 1
Low-32 multiply	Multiply only bits 31–0 so the 64-bit product fits `R`	`(uint32_t)A * (uint32_t)B`
Splitter	Extracts/combines bit ranges to manage wire widths	take `A[31:0]` from a 64-bit wire
XNOR equality	Per-bit equality test; AND all bits for word equality	`eq = AND(XNOR(aᵢ,bᵢ))`

Practice Problems¶

Problem 1: Map the instruction to signals¶

For sub a2, a0, a1, identify which register number drives WR, RR0, and RR1, and what ALUOp should be. (RISC-V ABI: a0=x10, a1=x11, a2=x12.)

Solution:

 sub a2, a0, a1
       rd  rs1 rs2

 WR   = a2 = x12   (destination)
 RR0  = a0 = x10   (rs1 -> RD0)
 RR1  = a1 = x11   (rs2 -> RD1)
 ALUOp = 0b001     (subtract: R = RD0 - RD1)
 ALUSrcB = 0       (B comes from RD1, not Imm)
 WE   = 1          (we are writing a result)

The ALU computes `R = RD0 - RD1`, which is written to `x12` (`a2`) on the next clock edge because `WE = 1`.

Problem 2: Why gate the decoder with WE?¶

A student builds a 5-to-32 decoder and wires each output directly to a register's EN, with no WE gate. The program runs, but whichever register WR selects is overwritten on every clock edge, even during cycles that should not write. Explain the bug and the fix.

Solution:

A plain decoder **always** drives exactly one output high. With no `WE` gate, the selected register's `EN` is high on every clock edge, so it is overwritten every cycle — even on instructions that should not write (branches, stores, or stall cycles). **Fix:** AND each decoder output line with `WE`:

   ri = decoder_line_i  AND  WE

Now `ri` is high only when `WE = 1`, so writes are conditional. This is a *gated one-hot* signal: at most one line high, and only when writing is requested.

Problem 3: Trace the ALU¶

Given A = 0x0000000000000007, B = 0x0000000000000003, compute R for each ALUOp: 000, 001, 010, 011, 100.

Solution:

 ALUOp 000 (add):  R = A + B  = 7 + 3 = 0x0A           (10)
 ALUOp 001 (sub):  R = A - B  = 7 - 3 = 0x04           (4)
 ALUOp 010 (mul):  R = A * B  = 7 * 3 = 0x15           (21)
 ALUOp 011 (sll):  R = A << B = 7 << 3 = 0x38          (56)
 ALUOp 100 (srl):  R = A >> B = 7 >> 3 = 0x00          (0)

Check `sll`: `7 = 0b111`, shift left by 3 = `0b111000 = 56`. Check `srl`: `7 >> 3 = 0` because all three set bits shift off the bottom.

Problem 4: The subtract off-by-one bug¶

A student's sub produces results that are always one too small (A - B - 1). What did they forget, and how does two's-complement subtraction work?

Solution:

They forgot the **carry-in of 1**. Two's-complement subtraction is:

   A - B = A + (~B) + 1

The `~B` (bitwise NOT) is the one's complement; adding 1 makes it the two's complement, i.e. `-B`. If you invert `B` but leave carry-in at 0, you compute `A + ~B = A + (-B - 1) = A - B - 1`. **Fix:** set the adder's carry-in to 1 when subtracting (or, in Digital, simply use the built-in **Sub** component, which handles this internally).

Problem 5: Why not multiply full 64-bit operands?¶

Explain the two problems with feeding both full 64-bit operands into a multiplier, and how the lab solution avoids them.

Solution:

**Problem 1: result width.** Multiplying two 64-bit numbers can produce up to a **128-bit** product, but the ALU result `R` is only 64 bits. The upper 64 bits would be lost (overflow), and a 128-bit result needs a wider output than we have.

**Problem 2: cost.** A custom 64×64 multiplier is large and error-prone; you should use the standard library component, not a hand-built relay circuit.

**Solution.** Multiply only the **lower 32 bits** of each operand. A 32×32 product fits exactly in 64 bits:

   A[31:0] (32) ──┐
                  ├──► MUL ──► R (64)   product always fits
   B[31:0] (32) ──┘

Use splitters to extract bits 31–0 of `A` and `B`. In C: `(uint64_t)(uint32_t)A * (uint64_t)(uint32_t)B`.

Problem 6: Build a 2-bit equality comparator¶

Using XNOR and AND gates, write the Boolean expression for a 2-bit comparator eq where A = a1a0 and B = b1b0. Then give the truth-table row count and the eq value when A = 0b10, B = 0b10.

Solution:

   eq = XNOR(a0, b0) AND XNOR(a1, b1)

Equivalently, expanding XNOR:

   eq = ((a0·b0) + (a0'·b0')) · ((a1·b1) + (a1'·b1'))

The truth table has `2^(2+2) = 16` rows (all combinations of two 2-bit inputs). For `A = 0b10, B = 0b10`: bit0 `0==0 → XNOR=1`, bit1 `1==1 → XNOR=1`, so `eq = 1 AND 1 = 1`. The values are equal, as expected.
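
If you want to check the whole truth table rather than one row, a short C sketch can enumerate all 16 input combinations. The names eq2 and count_equal_rows are hypothetical:

```c
#include <assert.h>

// 2-bit comparator: XNOR each bit pair, then AND the two results.
static int eq2(unsigned a, unsigned b) {
    unsigned x0 = !((a & 1u) ^ (b & 1u));                  // XNOR(a0, b0)
    unsigned x1 = !(((a >> 1) & 1u) ^ ((b >> 1) & 1u));    // XNOR(a1, b1)
    return x0 && x1;
}

// Walk all 16 rows and count how many have eq = 1.
static int count_equal_rows(void) {
    int count = 0;
    for (unsigned a = 0; a < 4; a++) {
        for (unsigned b = 0; b < 4; b++) {
            assert(eq2(a, b) == (a == b));   // gate-level result matches ==
            count += eq2(a, b);
        }
    }
    return count;
}
```

Only the four rows where A and B are identical produce eq = 1, so count_equal_rows returns 4.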

Summary¶

The ALU is combinational, the register file is sequential. Only sequential components (PC, register file) take a CLK; the ALU settles purely from A, B, and ALUOp.
An instruction maps directly onto register-file signals: rd → WR, rs1 → RR0, rs2 → RR1, with operand values read on RD0/RD1 and the result written through WD when WE = 1.
The write path is a decoder with enable: a 5-to-32 one-hot decoder whose outputs are ANDed with WE so at most one register updates, and only when writing is requested.
The read path is a multiplexer per read port, with x0 hardwired to a 64-bit constant zero — constants are fixed wiring, never programmable.
The ALU is a bank of parallel functional units (ADD, SUB, MUL, SLL, SRL) feeding one output MUX selected by the 3-bit ALUOp (add=000, sub=001, mul=010, sll=011, srl=100).
Subtraction reuses the adder as A + (~B) + 1; forgetting the +1 carry-in is the classic off-by-one bug.
Multiplication uses only the lower 32 bits of each operand so the 64-bit product fits R; never build a custom 64×64 multiplier, and use splitters to manage 32-bit vs 64-bit widths.
Equality comparison is per-bit XNOR ANDed together, the combinational pattern you reuse to build the branch unit later.
