# Processor Components

## CS 315 Computer Architecture

---

## Where We Are

Everything so far feeds into processor design:

- **C coding** → the programs we execute
- **RISC-V assembly & machine code** → what the hardware reads
- **RISC-V emulator** → software fetch-decode-execute
- **Cache design** → fast memory access
- **Digital design** → gates, MUXes, decoders, flip-flops

Today: build the **hardware** version of fetch-decode-execute.
---

## ISA vs. Microarchitecture

**ISA** = the hardware/software **contract**: what instructions exist and what they do.

**Microarchitecture** = the **implementation**: how the hardware carries it out.
```text
Software (compilers, programs)
─────────────────────────────────────  ← ISA (the interface)
Hardware (the processor)
```

- Software is written *to* the ISA
- Hardware is built *to satisfy* the ISA
- Same program runs on any conforming processor

---

## ISA vs. Microarchitecture: Comparison

| | ISA | Microarchitecture |
|---|-----|-------------------|
| **Nature** | Specification / interface | Implementation |
| **Answers** | *What* does it do? | *How* is it done? |
| **Visible to SW?** | Yes | No (hidden) |
| **Examples** | RISC-V, x86, ARM | single-cycle, pipelined |
| **Change breaks SW?** | Yes | No |

One ISA can have **many** microarchitectures.

---

## Digital Design Approaches

| Approach | What it is | Example |
|----------|------------|---------|
| **Schematic entry** | Draw and wire components visually | Digital simulator |
| **HDL** | Describe hardware in code, synthesize | Verilog, VHDL |
CS 315 uses **schematic entry in Digital**, so you can see and trace the data path directly.
---

## Moore's Law and Processor Complexity

> **Moore's Law:** Transistor count doubles roughly every 1.5 years.

More transistors → more room for complexity:
flowchart LR M["Moore's Law\n2x transistors / ~1.5 yr"] --> C["More silicon"] C --> A["Caches"] C --> B["Multi-core"] C --> D["GPUs"] C --> E["Neural engines"]
CS 315 starts with the **simplest** design: a single-cycle processor.

---

## Single-Cycle vs. Later Designs

| Design | Key Property |
|--------|-------------|
| **Single-cycle** | One complete instruction per clock cycle |
| **Multi-cycle** | Break instruction into stages, reuse hardware |
| **Pipelined** | Overlap stages of multiple instructions |

**Course roadmap:**

```text
Lab 9         →  Lab 10    →  Project 6
(components)     (extend)     (full processor)
```

---

## Single-Cycle Processor: Major Components
flowchart LR PC["PC\nReg"] --> ROM["Inst\nMemory"] ROM --> DEC["Inst\nDecoder"] ROM --> RDEC["Reg/Imm\nDecoder"] RDEC --> RF["Register\nFile"] DEC --> RF DEC --> ALU RF --> ALU["ALU"] ALU --> RAM["Data\nMemory"] RAM --> RF ALU --> RF PC --> ADD["PC+4"] ADD --> PC
---

## Component Summary

| Component | Role |
|-----------|------|
| **PC register** (64-bit) | Address of current instruction |
| **+4 adder** | Advance to next instruction |
| **Instruction memory (ROM)** | Stores 32-bit machine code |
| **Instruction decoder** | Produces control lines from IW |
| **Register decoder** | Extracts rs1, rs2, rd fields |
| **Immediate decoder** | Extracts/sign-extends immediates |
| **Register file** | 2 read ports + 1 write port |
| **ALU** | Arithmetic, logic, address math |
| **Data memory (RAM)** | Loads and stores |

---

## One Clock Cycle, Step by Step
flowchart LR A["PC selects\ninstruction"] --> B["Fetch IW\nfrom ROM"] B --> C["Decode IW\ncontrol+reg+imm"] C --> D["Read regs\nRD0, RD1"] D --> E["ALU\ncomputes"] E --> F["RAM access\nload/store"] F --> G["Write back\non rising edge"] G --> H["PC = PC+4\nor branch"]
All state updates (register writes, memory writes, PC advance) happen on the **rising clock edge**.
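The split between combinational settling and edge-triggered state can be sketched in software. This is a minimal model, not the Digital circuit: the names `next_pc`, `PC`, and `rising_edge` are hypothetical, and the branch decision is passed in as a plain flag.

```python
MASK64 = (1 << 64) - 1  # the PC is a 64-bit register


def next_pc(pc, branch_taken=False, branch_target=0):
    """Combinational logic: choose PC+4 or the branch target."""
    return (branch_target if branch_taken else pc + 4) & MASK64


class PC:
    """Edge-triggered register: the value changes only in rising_edge()."""

    def __init__(self):
        self.q = 0

    def rising_edge(self, d):
        self.q = d


pc = PC()
d = next_pc(pc.q)      # settles during the cycle; pc.q has not changed yet
before = pc.q
pc.rising_edge(d)      # the state update happens here
after = pc.q
pc.rising_edge(next_pc(pc.q, branch_taken=True, branch_target=0x40))
print(before, after, hex(pc.q))  # prints: 0 4 0x40
```

Computing `d` first and committing it later mirrors the hardware: the adder output is available throughout the cycle, but the register ignores it until the edge.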
---

## The Clock and "Complete Instruction"

```text
      ____      ____      ____
     |    |    |    |    |    |
_____|    |____|    |____|    |____
     ^         ^         ^
  complete  complete  complete
  instr 1   instr 2   instr 3
```

- Between edges: signals propagate **combinationally**
- The period must accommodate the **slowest** instruction
- Single-cycle is simple but not fast: every instruction pays for the worst case

---

## The Register File: Specification

- **32 logical registers**: X0–X31 (64 bits each)
- **Two read ports**: read two registers simultaneously
- **One write port**: write one register per cycle
- **X0 is hardwired to 0**: reads always return 0, writes discarded
- Result: **31 physical registers** (X1–X31), X0 wired to constant 0
- $2^5 = 32$ → selectors are **5 bits** wide
- RV64 registers → data buses are **64 bits** wide
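The 5-bit selector width matches the register fields of a RISC-V instruction word, which is what the register decoder extracts. A small sketch, assuming the standard R-type field positions (rd in bits 11:7, rs1 in bits 19:15, rs2 in bits 24:20); the function name `reg_fields` is made up for illustration.

```python
def reg_fields(iw):
    """Extract the three 5-bit register selectors from a 32-bit instruction word."""
    rd = (iw >> 7) & 0x1F    # bits 11:7
    rs1 = (iw >> 15) & 0x1F  # bits 19:15
    rs2 = (iw >> 20) & 0x1F  # bits 24:20
    return rs1, rs2, rd


# Machine code for: add a2, a0, a1   (a0 = x10, a1 = x11, a2 = x12)
IW_ADD = 0x00B50633
print(reg_fields(IW_ADD))  # prints: (10, 11, 12)
```

Each mask is `0x1F` because five bits are exactly enough to name one of 32 registers.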
---

## Register File: Interface

| Signal | Width | Meaning |
|--------|-------|---------|
| **RR0** | 5 | Read register 0: selects output on RD0 |
| **RR1** | 5 | Read register 1: selects output on RD1 |
| **RD0** | 64 | Value of register named by RR0 |
| **RD1** | 64 | Value of register named by RR1 |
| **WR** | 5 | Write register: destination |
| **WD** | 64 | Write data: value to store |
| **WE** | 1 | Write enable: write only when WE=1 |
| **CLK** | 1 | Writes happen on rising edge |
| **CLR** | 1 | Synchronously reset all registers to 0 |

---

## Why Two Read Ports?

```asm
add a2, a0, a1    # a2 = a0 + a1
```

In one cycle: read **two** sources and write **one** destination.
flowchart LR RR0["RR0 = a0"] --> RF["Register File"] RR1["RR1 = a1"] --> RF RF --> RD0["RD0 = value of a0"] RF --> RD1["RD1 = value of a1"] RD0 --> ALU RD1 --> ALU ALU --> WD["WD = a0 + a1"] WD --> RF WR["WR = a2, WE=1"] --> RF
Reads are **combinational**. Writes are **synchronous** (rising edge, WE=1).

---

## Adding CLR to a Digital Register

Digital's built-in register has no CLR input. We **wrap** it:

1. **2-to-1 MUX** on `D`: when CLR=1, load 0; when CLR=0, load real D
2. **OR gate** on `EN`: `EN_in = EN OR CLR`, which ensures clear triggers a write

```text
D_in  = CLR ? 0 : D      (MUX)
EN_in = EN OR CLR        (OR gate)
```

| CLR | EN | Register loads on rising edge |
|-----|----|-------------------------------|
| 0 | 0 | holds current value |
| 0 | 1 | D (normal write) |
| 1 | 0 | 0 (cleared) |
| 1 | 1 | 0 (CLR wins) |

---

## CLR Wrapper: Schematic
flowchart LR D["D (data in)"] --> MUX{"MUX\n0: D\n1: 0"} Z["0 (constant)"] --> MUX CLR["CLR"] --> MUX MUX --> RD["D_in"] EN["EN"] --> OR{"OR"} CLR --> OR OR --> RE["EN_in"] CLK["CLK"] --> RC["CLK"] subgraph REG["Wrapped 64-bit Register"] RD RC RE Q["Q (out)"] end Q --> OUT["output"]
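The wrapper's truth table can be checked with a short behavioral model. This is only a sketch of the MUX and OR gate described above; the class name `ClrRegister` and its method are hypothetical, not part of Digital.

```python
MASK64 = (1 << 64) - 1


class ClrRegister:
    """64-bit register with enable, wrapped to add a synchronous clear."""

    def __init__(self, value=0):
        self.q = value & MASK64

    def rising_edge(self, d, en, clr):
        d_in = 0 if clr else d & MASK64  # 2-to-1 MUX on D
        en_in = en or clr                # OR gate on EN
        if en_in:
            self.q = d_in
        return self.q


# Start from 7 each time, present D=9, and try all four CLR/EN combinations.
results = []
for clr, en in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    reg = ClrRegister(7)
    results.append(reg.rising_edge(d=9, en=en, clr=clr))
print(results)  # prints: [7, 9, 0, 0]
```

The four results line up with the four rows of the table: hold, normal write, clear, and clear winning over a write.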
This wrapped register is used for the **PC** and for every physical register X1–X31.

---

## Register File: Write Path

Only one register is written per cycle, and only when WE=1.

**5-to-32 decoder with enable:**

- Input: WR (5 bits)
- Enable: WE
- Output: 32 one-hot lines → lines 1–31 each drive one register's EN (line 0 is left unconnected because X0 has no register)

```text
WR (5) ──┐
         ├─► 5-to-32 Decoder ──► EN_x1  → X1.EN
WE (1) ──┘   with Enable     ──► EN_x2  → X2.EN
                             ──► ...
                             ──► EN_x31 → X31.EN

WD (64) ──────────────────────────► D of every register (shared)
CLK ──────────────────────────────► CLK of every register (shared)
```

---

## Register File: Read Path

Two **independent 32-to-1 MUXes** (64 bits wide each):
flowchart TD subgraph REGS["Physical Registers"] X0["X0 = 0 (hardwired)"] X1["X1 ... X31"] end X0 --> M0["RD0 MUX 32:1"] X1 --> M0 X0 --> M1["RD1 MUX 32:1"] X1 --> M1 RR0["RR0 (sel)"] --> M0 RR1["RR1 (sel)"] --> M1 M0 --> RD0["RD0 (64)"] M1 --> RD1["RD1 (64)"]
RR0 and RR1 are independent selectors (they may even name the same register), so both outputs are read **simultaneously and combinationally**.

---

## Register File: Full Structure
flowchart LR RR0in["RR0 (5)"] --> RM0["RD0 MUX"] RR1in["RR1 (5)"] --> RM1["RD1 MUX"] WRin["WR (5)"] --> DEC["Decoder+EN"] WEin["WE (1)"] --> DEC WDin["WD (64)"] --> BUS["shared D"] CLKin["CLK"] --> CLKB["shared CLK"] DEC --> REGS["Regs X1..X31"] BUS --> REGS CLKB --> REGS X0c["X0 = 0"] --> RM0 X0c --> RM1 REGS --> RM0 REGS --> RM1 RM0 --> RD0out["RD0 (64)"] RM1 --> RD1out["RD1 (64)"]
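The write decoder, shared data bus, and read MUXes can be modeled together in a few lines. This is a behavioral sketch under stated assumptions, not the schematic itself: `write_decoder`, `RegisterFile`, and the example values 5 and 3 are illustrative choices.

```python
MASK64 = (1 << 64) - 1


def write_decoder(wr, we):
    """5-to-32 decoder with enable: one-hot list, all zeros when WE=0."""
    return [int(bool(we) and i == wr) for i in range(32)]


class RegisterFile:
    def __init__(self):
        # Index 0 stands in for the constant 0; it is never written.
        self.x = [0] * 32

    def read(self, rr0, rr1):
        """Two 32:1 MUXes: combinational, no clock involved."""
        return self.x[rr0], self.x[rr1]

    def rising_edge(self, wr, wd, we, clr=0):
        en = write_decoder(wr, we)
        for i in range(1, 32):  # decoder line 0 drives nothing
            if clr:
                self.x[i] = 0
            elif en[i]:
                self.x[i] = wd & MASK64


rf = RegisterFile()
rf.rising_edge(wr=10, wd=5, we=1)      # a0 (X10) = 5
rf.rising_edge(wr=11, wd=3, we=1)      # a1 (X11) = 3
rf.rising_edge(wr=0, wd=99, we=1)      # write to X0 is discarded
a, b = rf.read(10, 11)                 # both sources read at once
rf.rising_edge(wr=12, wd=a + b, we=1)  # add a2, a0, a1
print(rf.read(12, 0))  # prints: (8, 0)
```

Skipping index 0 in the write loop plays the role of leaving the decoder's line 0 unconnected, which is why the write of 99 never shows up on a read of X0.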
---

## X0: Hardwired Zero

X0 is **not** a physical register: its output is the constant 0.

- No flip-flop, no D input, no enable line
- Reads of X0 always return 0
- Writes to X0 are silently ignored (decoder output 0 is not connected to any register)
**Why?** The RISC-V ISA guarantees X0 = 0. Building a flip-flop for it wastes hardware.

Result: **32 logical registers** but only **31 physical registers**.

---

## Trace: `add a2, a0, a1`

Assume `a0=5` (X10), `a1=3` (X11), dest `a2` (X12), WE=1:

```text
RR0 = 10            RR1 = 11
RD0 = 5             RD1 = 3        ← combinational reads

ALU: A=5, B=3, ALUop=000 (add)  →  R = 8

WR = 12, WD = 8, WE = 1
Decoder asserts EN only on line 12.
X12 latches 8 on the RISING CLOCK EDGE.
```
Reads and ALU computation settle during the cycle. The state change happens **only at the rising edge**.
---

## The ALU (Preview)

The ALU is **purely combinational**: no clock, no state.

| ALUop | Operation |
|-------|-----------|
| `000` | add |
| `001` | sub |
| `010` | mul |
| `011` | sll (shift left logical) |
| `100` | srl (shift right logical) |

- Inputs: A (64), B (64), ALUop (3)
- Output: R (64)
- Also computes **branch/memory addresses** (address arithmetic = addition)

---

## Data Memory (Preview)

- **RAM** component in Digital
- Holds the **stack** (function params, preserved registers)
- Address from ALU is a **byte address** → shift right by 3 for 64-bit word index
Data memory and the immediate extender are **out of scope for Lab 9**. They arrive in Lab 10 / Project 6.
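The ALUop table and the byte-address rule above can be modeled as pure functions, which is a handy way to predict expected outputs before wiring anything. This is a sketch only: the names `alu` and `word_index` are hypothetical, and using the low 6 bits of B as the shift amount is an assumption borrowed from RV64 shift semantics rather than something the lab specifies here.

```python
MASK64 = (1 << 64) - 1


def alu(a, b, aluop):
    """Combinational 64-bit ALU using the ALUop encoding from the table."""
    a &= MASK64
    b &= MASK64
    shamt = b & 0x3F  # assumption: shift amount is the low 6 bits of B
    if aluop == 0b000:
        r = a + b
    elif aluop == 0b001:
        r = a - b
    elif aluop == 0b010:
        r = a * b
    elif aluop == 0b011:
        r = a << shamt
    elif aluop == 0b100:
        r = a >> shamt
    else:
        raise ValueError("ALUop not in the table")
    return r & MASK64  # wrap to 64 bits, like the hardware bus


def word_index(byte_addr):
    """Convert a byte address to a 64-bit word index (8 bytes per word)."""
    return byte_addr >> 3


print(alu(5, 3, 0b000))                       # prints: 8
print(hex(alu(0, 1, 0b001)))                  # prints: 0xffffffffffffffff
print(word_index(alu(0x100, 16, 0b000)))      # prints: 34
```

Masking the result to 64 bits is what makes `0 - 1` come out as all ones, the two's complement encoding of -1.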
---

## Lab 9 Deliverables (due Mon Nov 3)

Build and test core components:

- **64-bit PC register with CLR**
- **Register file**: 32 logical / 31 physical registers, 2 reads + 1 write
- **ALU**: add, sub, mul, sll, srl
- **Dashboard**: splitters, tunnels, probes showing register/PC state

Key autograder filenames:

- `lab09.dig` (inputs: `CLK`, `CLR`, `RR0`, `RR1`, `WR`, `WE`, `ALUSrcB`, `ALUOp`, `Imm`; outputs: `T0`, `T1`)
- `alu.dig` (inputs: `A`, `B`, `ALUOp`; output: `R`)

---

## Lab 9 Test Programs

```asm
addi t0, t0, 1     # T0 = 1
li   t1, 2         # T1 = 2
addi t0, t0, -1    # T0 = 0xFFFFFFFFFFFFFFFF (-1 in 64-bit)
li   t0, 1         # T0 = 1
li   t1, 1         # T1 = 1
sub  t0, t0, t1    # T0 = 0
```

Drive inputs manually to simulate each instruction. Note that `addi t0, t0, -1` yields `0xFFFFFFFFFFFFFFFF` only if T0 is 0 beforehand, so treat it as the start of a separate program run from cleared registers (CLR).

---

## Project 6 Roadmap (due Nov 17)

Extend Lab 9 into a full **single-cycle RISC-V processor**:

| Part | Component |
|------|-----------|
| Part 1 | PC + instruction memory + decoder |
| Part 2 | Register file + register decoder |
| Part 3 | ALU + data memory + load/store |
| Final | Complete processor with branch control |
Incremental submissions required: `part1`, `part2`, `part3`, `final`
---

## Key Concepts

| Concept | Key Point |
|---------|-----------|
| **ISA** | SW/HW contract: what instructions do |
| **Microarchitecture** | Implementation: how they are done |
| **Single-cycle** | One instruction per clock cycle |
| **Moore's Law** | 2x transistors / ~1.5 yr |
| **PC register** | 64-bit; advances to PC+4 each cycle (or to a branch target) |
| **Register file** | 2 read ports (combinational) + 1 write (synchronous) |
| **X0** | Hardwired 0; 32 logical / 31 physical regs |
| **CLR wrapper** | MUX + OR around Digital register |
| **Decoder+EN** | One-hot write select, gated by WE |
| **Read MUX tree** | Two 32:1 MUXes for simultaneous reads |

---

## Summary

1. **ISA is the interface** between software and hardware; microarchitecture is the implementation
2. **Single-cycle processor**: PC → ROM → decode → register file → ALU → RAM, all in one clock period
3. **Rising clock edge** latches all state: register writes, memory writes, PC advance
4. **Register file**: 2 combinational read ports (32:1 MUXes) + 1 synchronous write port (decoder+EN)
5. **X0 is hardwired to 0**: 31 physical registers, not 32
6. **CLR wrapper**: `D_in = CLR ? 0 : D`, `EN_in = EN OR CLR`, used for the PC and every physical register
7. **Lab 9** builds PC register, register file, and ALU, the foundation for Project 6's full processor