# Lab: Processor ALU

## CS 315 Computer Architecture

---

## What We Are Building
flowchart LR PC["PC Register"] --> IM["Instruction Memory"] IM --> ID["Instruction Decode"] ID --> RF["Register File"] RF -->|"RD0"| ALU["ALU"] RF -->|"RD1"| MUX["ALUSrcB MUX"] IMM["Immediate"] --> MUX MUX --> ALU ALU -->|"R"| RF style ALU fill:#f9f,stroke:#333,stroke-width:2px style RF fill:#bbf,stroke:#333,stroke-width:2px
**Today's focus**: Register File + ALU

---

## Learning Objectives

- Map a RISC-V instruction onto Register File signals (`WR`, `RR0`, `RR1`)
- Distinguish **combinational** vs **sequential** logic
- Build a **decoder with enable** for the write path
- Design the ALU as parallel functional units + output MUX
- Implement subtraction as `A + (~B) + 1`
- Handle 64-bit vs 32-bit widths with splitters
- Build an N-bit equality comparator from XNOR gates

---

## Combinational vs Sequential

| Property | Combinational | Sequential |
|----------|---------------|------------|
| Holds state? | No | Yes |
| Needs `CLK`? | No | Yes |
| Output depends on | Current inputs only | Inputs **and** state |
| Examples | ALU, decoder, MUX | PC, register file |

The ALU has no `CLK`: it is pure combinational logic. Give it `A`, `B`, and `ALUOp`, and `R` settles after gate delays.
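As a rough software analogy for the table above, a combinational block behaves like a pure function, while a sequential element behaves like a variable that changes only when a clock step is applied. This is a sketch only: the names `adder`, `reg64_t`, and `reg64_clock` are hypothetical and not part of the lab.

```c
#include <stdint.h>

/* Combinational: the output depends only on the current inputs. */
uint64_t adder(uint64_t a, uint64_t b) {
    return a + b;
}

/* Sequential: a 64-bit register that holds state between clock edges. */
typedef struct {
    uint64_t q; /* stored value */
} reg64_t;

/* Model one rising clock edge: capture d only when enable is set. */
void reg64_clock(reg64_t *r, uint64_t d, int enable) {
    if (enable) {
        r->q = d;
    }
}
```

Calling `adder` twice with the same inputs always gives the same result, whereas the value in `reg64_t` depends on which clock steps have already happened.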
---

## From Instruction to Register Signals

```text
      rd  rs1 rs2
add   t0, t1, t2   ->  R = RD0 + RD1
      |   |   |
      v   v   v
      WR  RR0 RR1
```

For `add t0, t1, t2` the processor:

1. **Reads** `t1` and `t2` via `RR0`/`RR1` → `RD0`/`RD1`
2. **Computes** `RD0 + RD1` in the ALU
3. **Writes** result into `t0` via `WR`/`WD` when `WE=1`

---

## Register File Interface

| Signal | Width | Meaning |
|--------|-------|---------|
| `RR0`, `RR1` | 5 bits | Select registers to read |
| `RD0`, `RD1` | 64 bits | Read data outputs |
| `WR` | 5 bits | Destination register |
| `WE` | 1 bit | Write enable |
| `WD` | 64 bits | Data to write |
| `CLK` / `CLR` | 1 bit | Clock / clear |

5-bit selectors because 2⁵ = 32 registers. Two read ports + one write port = one instruction per cycle.
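The mapping `rd→WR`, `rs1→RR0`, `rs2→RR1` can be sketched in C as three 5-bit field extractions from the 32-bit instruction word. The bit positions used here are the standard RISC-V R-type layout; the type `rf_signals_t` and the function `decode_rtype` are hypothetical names introduced only for this illustration.

```c
#include <stdint.h>

/* Register-file control signals derived from one R-type instruction. */
typedef struct {
    uint8_t wr;  /* rd  -> WR  */
    uint8_t rr0; /* rs1 -> RR0 */
    uint8_t rr1; /* rs2 -> RR1 */
} rf_signals_t;

/* Extract the three 5-bit register fields (RISC-V R-type layout). */
rf_signals_t decode_rtype(uint32_t inst) {
    rf_signals_t s;
    s.wr  = (inst >> 7)  & 0x1F; /* bits 11:7  */
    s.rr0 = (inst >> 15) & 0x1F; /* bits 19:15 */
    s.rr1 = (inst >> 20) & 0x1F; /* bits 24:20 */
    return s;
}
```

For example, `add t0, t1, t2` encodes as `0x007302B3`, and decoding it yields `WR = 5`, `RR0 = 6`, `RR1 = 7`.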
---

## Write Path: Decoder with Enable

A plain 5-to-32 decoder drives exactly one output high, but it *always* drives one output. We must gate each line with `WE`:

```text
WR (5) ──►┌──────────┐ 0 ──►[ AND ]──► r0  (EN for x0)
          │ 5-to-32  │ 1 ──►[ AND ]──► r1
          │ decoder  │ .       .        .
          │          │31 ──►[ AND ]──► r31
          └──────────┘            ▲
WE ───────────────────────────────┘
```

Result: **gated one-hot** (at most one line high, only when `WE=1`)

---

## Decoder with Enable (Diagram)
flowchart LR SEL["WR (5 bits)"] --> DEC["5-to-32 Decoder"] WE["WE"] --> A0["AND"] & A1["AND"] & A2["AND"] & A31["AND"] DEC -->|"line 0"| A0 DEC -->|"line 1"| A1 DEC -->|"line 2"| A2 DEC -->|"line 31"| A31 A0 --> R0["x0 EN"] A1 --> R1["x1 EN"] A2 --> R2["x2 EN"] A31 --> R31["x31 EN"]
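The gated one-hot behavior can be sketched in C as a function that returns a 32-bit mask of enable lines. The name `decode_with_enable` is hypothetical, and packing the 32 enable lines into one `uint32_t` is an assumption made for this illustration.

```c
#include <stdint.h>

/* 5-to-32 decoder with enable: returns a 32-bit mask of register enables.
 * Bit i corresponds to the enable line for register x<i>. */
uint32_t decode_with_enable(uint8_t wr, int we) {
    uint32_t one_hot = (uint32_t)1 << (wr & 0x1F); /* plain decoder */
    return we ? one_hot : 0;                        /* AND every line with WE */
}
```

With `we = 0` the mask is all zeros, so no register is enabled. Line 0 still goes high for `wr = 0`; keeping `x0` at zero is handled separately by never storing into it.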
---

## Common Write-Path Mistakes

Forgetting the `WE` gate: one register gets overwritten every clock cycle, even during branches or stores.

- **No WE gate** → selected register updates on every edge
- **Writing to x0** → x0 must always be zero; never enable its register
- **Wrong selector width** → 4 bits only addresses 16 registers; need 5

---

## Read Path: MUX Selection

To produce `RD0`: 32-input, 64-bit-wide MUX with `RR0` as select. Same for `RD1`/`RR1`.
flowchart LR Z["Constant 0 (64-bit)"] --> M["32-input 64-bit MUX"] X1["x1 register"] --> M X2["x2 register"] --> M X31["x31 register"] --> M RR0["RR0 (5 bits)"] -->|"select"| M M --> RD0["RD0 (64 bits)"]
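A minimal C model of the register file ties the read MUX and the gated write path together. It is a sketch under stated assumptions: `regfile_t`, `rf_read`, and `rf_clock` are hypothetical names, and one call to `rf_clock` stands in for one rising clock edge.

```c
#include <stdint.h>

typedef struct {
    uint64_t regs[32]; /* regs[0] is unused: x0 is a constant */
} regfile_t;

/* Read port: a 32-input MUX selected by rr; input 0 is the constant 0. */
uint64_t rf_read(const regfile_t *rf, uint8_t rr) {
    rr &= 0x1F;
    return (rr == 0) ? 0 : rf->regs[rr];
}

/* Write port, evaluated once per clock edge. */
void rf_clock(regfile_t *rf, uint8_t wr, uint64_t wd, int we) {
    wr &= 0x1F;
    if (we && wr != 0) { /* gated by WE; x0 is never enabled */
        rf->regs[wr] = wd;
    }
}
```

Reads are combinational (a plain function of the current contents), while writes take effect only when `rf_clock` is called with `we` set.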
**x0 is hardwired to constant 0**: tied to ground, not a register

---

## Hardwired Constants in Verilog

```verilog
// x0 hardwired to zero: not a register, just ground
wire [63:0] x0 = 64'd0;

// A 4-bit constant 5 is fixed at construction:
wire [3:0] five = 4'b0101;
```

Constants are **not programmable**: they are fixed when the circuit is built. That is why `x0` can never be anything but zero.

---

## ALU Interface

```text
         ALUOp (3)
             |
       +-----+-----+
 A --->|           |
 (64)  |    ALU    |---> R (64)
 B --->|           |
 (64)  +-----------+
```

| `ALUOp` | Operation | RISC-V instructions |
|---------|-----------|---------------------|
| `000` | `A + B` (add) | `add`, `addi`, address calc |
| `001` | `A - B` (sub) | `sub`, comparisons |
| `010` | `A * B` (mul) | `mul` |
| `011` | `A << B` (sll) | `sll`, `slli` |
| `100` | `A >> B` (srl) | `srl`, `srli` |

---

## ALU Internal Structure
flowchart LR A["A (64)"] --> ADD & SUB & MUL & SLL & SRL B["B (64)"] --> ADD & SUB & MUL & SLL & SRL ADD["ADD"] --> MUX["Result MUX"] SUB["SUB"] --> MUX MUL["MUL"] --> MUX SLL["SLL"] --> MUX SRL["SRL"] --> MUX OP["ALUOp (3)"] -->|"select"| MUX MUX --> R["R (64)"]
All units compute in parallel; MUX selects the result we want.

---

## ALU in C (the Software Analog)

```c
#include <stdint.h>

/* Binary literals (0b...) need C23 or the GCC/Clang extension. */
#define ALU_ADD 0b000
#define ALU_SUB 0b001
#define ALU_MUL 0b010
#define ALU_SLL 0b011
#define ALU_SRL 0b100

uint64_t alu(uint64_t A, uint64_t B, uint8_t ALUOp) {
    switch (ALUOp) {
    case ALU_ADD: return A + B;
    case ALU_SUB: return A - B;
    case ALU_MUL: return A * B; // low 64 bits
    case ALU_SLL: return A << (B & 63);
    case ALU_SRL: return A >> (B & 63);
    default:      return 0;
    }
}
```

The `switch` is the software analog of the output MUX.

---

## Subtraction = A + (~B) + 1

No separate subtractor needed: reuse the adder via two's complement.

```text
A - B  =  A  +  (~B)     +     1
              invert B   carry-in = 1
```

Classic bug: forgetting the `+1` carry-in gives `A + ~B = A - B - 1` (off by one).
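The identity and the off-by-one bug can both be checked with a few lines of C. The function names below are hypothetical; `add_with_carry` stands in for the shared hardware adder with its carry-in pin.

```c
#include <stdint.h>

/* Shared adder with an explicit carry-in, as in the hardware. */
uint64_t add_with_carry(uint64_t a, uint64_t b, unsigned carry_in) {
    return a + b + (carry_in & 1u);
}

/* Correct: invert B and set carry-in to 1. */
uint64_t sub_via_adder(uint64_t a, uint64_t b) {
    return add_with_carry(a, ~b, 1);
}

/* Buggy: carry-in left at 0, so the result is A - B - 1. */
uint64_t sub_missing_carry(uint64_t a, uint64_t b) {
    return add_with_carry(a, ~b, 0);
}
```

With `A = 7` and `B = 3`, `sub_via_adder` returns 4 while `sub_missing_carry` returns 3.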
In Digital: simply use the built-in **Sub** component, which handles this internally.

---

## Multiplication: The 64-bit Trap

**Wrong approach**: feed full 64-bit operands into a multiplier.

- Two 64-bit values produce a **128-bit** product, which doesn't fit in `R`
- Building a custom 64x64 multiplier is wasteful and error-prone

**Correct approach**: multiply only the lower 32 bits of each operand.
flowchart LR A["A (64)"] --> SA["splitter: A[31:0]"] B["B (64)"] --> SB["splitter: B[31:0]"] SA -->|"32 bits"| MUL["32x32 MUL"] SB -->|"32 bits"| MUL MUL -->|"64-bit product"| R["R (64)"]
---

## Multiplication in C

```c
#include <stdint.h>

// Multiply low 32 bits; 32x32 -> 64 always fits in R
uint64_t mul_low32(uint64_t A, uint64_t B) {
    uint32_t a = (uint32_t)A; // low 32 bits of A
    uint32_t b = (uint32_t)B; // low 32 bits of B
    return (uint64_t)a * (uint64_t)b;
}
```

Use **splitters** in Digital to extract `A[31:0]` and `B[31:0]` from the 64-bit wires.

---

## Width Management with Splitters

Mixing 32-bit and 64-bit wires is the #1 source of red error wires in Digital.

| Situation | Tool |
|-----------|------|
| Extract bit range (64→32) | splitter |
| Combine narrow into wide (32→64) | splitter/merger |
| Component data-bit mismatch | adjust component settings |

A 32-bit MUL output cannot drive a 64-bit wire directly without widening, so always match widths.
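In software terms, a splitter is a shift-and-truncate and a merger is a shift-and-OR. The helpers below are a hypothetical sketch (none of these names come from Digital) of the three width operations in the table above.

```c
#include <stdint.h>

/* Splitter: extract a 32-bit range from a 64-bit wire. */
uint32_t low32(uint64_t x)  { return (uint32_t)x; }         /* x[31:0]  */
uint32_t high32(uint64_t x) { return (uint32_t)(x >> 32); } /* x[63:32] */

/* Merger: combine two 32-bit wires into one 64-bit wire. */
uint64_t merge64(uint32_t hi, uint32_t lo) {
    return ((uint64_t)hi << 32) | lo;
}

/* Widen a 32-bit value to 64 bits by tying the upper half to 0. */
uint64_t zero_extend32(uint32_t x) {
    return merge64(0, x);
}
```

The explicit casts play the same role as matching data-bit settings in the circuit: every width change is visible rather than implied.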
---

## Equality Comparator: XNOR

Two values are equal when **every** bit position matches. XNOR tests bit equality:

| a | b | XOR (a≠b) | XNOR (a=b) |
|---|---|-----------|------------|
| 0 | 0 | 0 | 1 |
| 0 | 1 | 1 | 0 |
| 1 | 0 | 1 | 0 |
| 1 | 1 | 0 | 1 |

`XNOR(a, b) = 1` iff `a == b`

---

## 4-bit Equality Comparator
flowchart LR A["A (4)"] --> SA["splitter"] B["B (4)"] --> SB["splitter"] SA --> X0["XNOR bit0"] SB --> X0 SA --> X1["XNOR bit1"] SB --> X1 SA --> X2["XNOR bit2"] SB --> X2 SA --> X3["XNOR bit3"] SB --> X3 X0 & X1 & X2 & X3 --> AND["AND (4-input)"] AND --> EQ["eq"]
```text
eq = XNOR(a0,b0) AND XNOR(a1,b1) AND XNOR(a2,b2) AND XNOR(a3,b3)
```

---

## Comparator Scales to N Bits

An N-bit comparator is N XNOR gates feeding one N-input AND.

```text
eq = AND of XNOR(aᵢ, bᵢ) for all i in 0..N-1
```

You will reuse this comparator (and sub-based cousins for `<`, `>=`) in the Branch Unit for `beq`, `bne`, `blt`, `bge`.
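The N-bit comparator can be mirrored gate by gate in C: one XNOR per bit position, all ANDed together. The function name `equal_nbit` is hypothetical, and the loop is written for clarity rather than speed.

```c
#include <stdint.h>

/* N-bit equality built gate by gate: one XNOR per bit, all ANDed together.
 * n must be in the range 1..64; bits at position n and above are ignored. */
int equal_nbit(uint64_t a, uint64_t b, unsigned n) {
    int eq = 1; /* identity value for AND */
    for (unsigned i = 0; i < n; i++) {
        unsigned ai = (unsigned)((a >> i) & 1u);
        unsigned bi = (unsigned)((b >> i) & 1u);
        unsigned xnor = !(ai ^ bi); /* 1 when the two bits match */
        eq = eq && xnor;
    }
    return eq;
}
```

For instance, `equal_nbit(0xA, 0xB, 4)` is 0 because bit 0 differs, while `equal_nbit(0x1A, 0x0A, 4)` is 1 because only the low four bits are compared.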
---

## Top-Level Circuit: lab09.dig

Inputs exposed: `CLK`, `CLR`, `RR0`, `RR1`, `WR`, `WE`, `ALUSrcB`, `ALUOp`, `Imm`

Outputs: `T0` (= x5), `T1` (= x6)
flowchart LR CLK["CLK/CLR"] --> RF["Register File"] RR0["RR0"] --> RF RF -->|"RD0"| ALU["ALU"] RF -->|"RD1"| MUXB["ALUSrcB MUX"] IMM["Imm"] --> MUXB SRCB["ALUSrcB"] -->|"select"| MUXB MUXB -->|"B"| ALU OP["ALUOp"] --> ALU ALU -->|"R = WD"| RF WR["WR"] --> RF WE["WE"] --> RF RF --> T0["T0"] & T1["T1"]
---

## Running a Program by Hand

No control unit yet: you play the role of control logic. Example: `addi t0, t0, 1`

```text
1. CLR=1, pulse CLK    -> all registers = 0
2. RR0 = 5 (t0)        -> RD0 = 0
3. ALUSrcB = 1         -> B comes from Imm
4. Imm = 1
5. ALUOp = 000         -> R = 0 + 1 = 1
6. WR = 5, WE = 1      -> destination is t0
7. Pulse CLK           -> t0 = 1
8. Observe T0 = 1
```

---

## Autograder Test Programs

| Program | Key Signals | Expected Result |
|---------|-------------|-----------------|
| `addi t0, t0, 1` | `ALUSrcB=1, Imm=1, ALUOp=000` | `T0 = 1` |
| `li t1, 2` | (`addi t1, x0, 2`) `Imm=2` | `T1 = 2` |
| `addi t0, t0, -1` | `Imm=-1, ALUOp=000` | `T0 = 0xFFF...F` |
| `li t0,1; li t1,1; sub t0,t0,t1` | two `li`, then `ALUOp=001, ALUSrcB=0` | `T0 = 0` |

`li t1, 2` is a pseudo-instruction: really `addi t1, x0, 2`, which adds the immediate to the always-zero `x0`.
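The hand-driven procedure can be rehearsed in software before wiring it up. The sketch below is not the lab circuit: `cpu_t`, `step`, and `run_sub_program` are hypothetical names, only the add and sub ALU operations are modeled, and `Imm` is assumed to arrive already sign-extended to 64 bits (consistent with the `0xFFF...F` result in the table).

```c
#include <stdint.h>

enum { ALU_ADD = 0, ALU_SUB = 1 }; /* ALUOp codes 000 and 001 */

typedef struct {
    uint64_t x[32]; /* register file; x[0] stays 0 */
} cpu_t;

/* Read port with x0 hardwired to 0. */
static uint64_t read_reg(const cpu_t *c, unsigned r) {
    return (r & 31u) == 0 ? 0 : c->x[r & 31u];
}

/* Apply one set of control signals, then pulse CLK once. */
void step(cpu_t *c, unsigned rr0, unsigned rr1, unsigned wr, int we,
          int alu_src_b, unsigned alu_op, int64_t imm) {
    uint64_t a = read_reg(c, rr0);
    uint64_t b = alu_src_b ? (uint64_t)imm : read_reg(c, rr1); /* ALUSrcB MUX */
    uint64_t r = (alu_op == ALU_SUB) ? a - b : a + b;          /* ALU */
    if (we && (wr & 31u) != 0) {
        c->x[wr & 31u] = r; /* write-back on the clock edge */
    }
}

/* Run li t0,1; li t1,1; sub t0,t0,t1 and return T0. */
uint64_t run_sub_program(void) {
    cpu_t c = {{0}};                     /* CLR: all registers = 0 */
    step(&c, 0, 0, 5, 1, 1, ALU_ADD, 1); /* li t0, 1 (addi t0, x0, 1) */
    step(&c, 0, 0, 6, 1, 1, ALU_ADD, 1); /* li t1, 1 (addi t1, x0, 1) */
    step(&c, 5, 6, 5, 1, 0, ALU_SUB, 0); /* sub t0, t0, t1 */
    return c.x[5];                       /* T0 */
}
```

Each `step` call corresponds to setting the exposed inputs and pulsing `CLK` once, so one call models one instruction.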
---

## Key Concept Reference

| Concept | Key Point |
|---------|-----------|
| Combinational vs Sequential | ALU has no CLK; register file does |
| Register file signals | `rd→WR`, `rs1→RR0`, `rs2→RR1` |
| Decoder with enable | One-hot outputs ANDed with `WE` |
| x0 hardwired | Constant 0, never written |
| ALU structure | Parallel units, output MUX by `ALUOp` |
| Subtraction | `A + (~B) + 1`, carry-in = 1 |
| Multiply | Use `A[31:0] * B[31:0]` → 64-bit result |
| Equality | N XNOR gates + N-input AND |

---

## Practice: Map the Instruction

For `sub a2, a0, a1`, identify `WR`, `RR0`, `RR1`, `ALUOp`:

```text
sub a2, a0, a1
    rd  rs1 rs2

WR      = x12  (a2, destination)
RR0     = x10  (a0, rs1 -> RD0)
RR1     = x11  (a1, rs2 -> RD1)
ALUOp   = 001  (subtract: R = RD0 - RD1)
ALUSrcB = 0    (B from RD1, not Imm)
WE      = 1
```

---

## Practice: Trace the ALU

Given `A = 7`, `B = 3`:

```text
ALUOp 000 (add): R = 7 + 3  = 10  (0x0A)
ALUOp 001 (sub): R = 7 - 3  = 4   (0x04)
ALUOp 010 (mul): R = 7 * 3  = 21  (0x15)
ALUOp 011 (sll): R = 7 << 3 = 56  (0x38)
ALUOp 100 (srl): R = 7 >> 3 = 0   (0x00)
```

Check `sll`: `7 = 0b111`, shift left 3 = `0b111000 = 56`

Check `srl`: all three set bits shift off the bottom → 0

---

## Summary

1. **ALU is combinational** (no CLK); register file is sequential (has CLK)
2. **R-type signals**: `rd→WR`, `rs1→RR0`, `rs2→RR1`; write when `WE=1`
3. **Decoder with enable**: 5-to-32 decoder, each output ANDed with `WE`
4. **Read path**: 32-input MUX per port; `x0` hardwired to constant 0
5. **ALU**: parallel functional units (ADD/SUB/MUL/SLL/SRL) + output MUX by `ALUOp`
6. **Subtraction**: `A + (~B) + 1`; forgetting `+1` causes off-by-one
7. **Multiply**: use `A[31:0] * B[31:0]`; the 64-bit result fits `R`
8. **Equality comparator**: per-bit XNOR, AND all bits; reused in branch unit