# Processor Components

## CS 315 Computer Architecture

---

## Where We Are

Everything so far feeds into processor design:

- **C coding** → the programs we execute
- **RISC-V assembly & machine code** → what the hardware reads
- **RISC-V emulator** → software fetch-decode-execute
- **Cache design** → fast memory access
- **Digital design** → gates, MUXes, decoders, flip-flops

<div class="info-box">
Today: build the <strong>hardware</strong> version of fetch-decode-execute.
</div>

---

## ISA vs. Microarchitecture

<div class="highlight-box">
<strong>ISA</strong> = the hardware/software <em>contract</em> — what instructions exist and what they do.
<br>
<strong>Microarchitecture</strong> = the <em>implementation</em> — how the hardware carries it out.
</div>

```text
        Software (compilers, programs)
   ─────────────────────────────────────  ← ISA (the interface)
        Hardware (the processor)
```

- Software is written *to* the ISA
- Hardware is built *to satisfy* the ISA
- Same program runs on any conforming processor

---

## ISA vs. Microarchitecture — Comparison

| | ISA | Microarchitecture |
|---|-----|-------------------|
| **Nature** | Specification / interface | Implementation |
| **Answers** | *What* does it do? | *How* is it done? |
| **Visible to SW?** | Yes | No (hidden) |
| **Examples** | RISC-V, x86, ARM | single-cycle, pipelined |
| **Change breaks SW?** | Yes | No |

One ISA can have **many** microarchitectures.

---

## Digital Design Approaches

| Approach | What it is | Example |
|----------|------------|---------|
| **Schematic entry** | Draw and wire components visually | Digital simulator |
| **HDL** | Describe hardware in code, synthesize | Verilog, VHDL |

<div class="info-box">
CS 315 uses <strong>schematic entry in Digital</strong> — you can see and trace the data path directly.
</div>

---

## Moore's Law and Processor Complexity

> **Moore's Law:** The number of transistors on a chip doubles roughly every 2 years (often quoted as ~18 months).

More transistors → more room for complexity:

<div class="mermaid">
flowchart LR
    M["Moore's Law<br/>2x transistors / ~1.5 yr"] --> C["More silicon"]
    C --> A["Caches"]
    C --> B["Multi-core"]
    C --> D["GPUs"]
    C --> E["Neural engines"]
</div>

CS 315 starts with the **simplest** design: a single-cycle processor.

---

## Single-Cycle vs. Later Designs

| Design | Key Property |
|--------|-------------|
| **Single-cycle** | One complete instruction per clock cycle |
| **Multi-cycle** | Break instruction into stages, reuse hardware |
| **Pipelined** | Overlap stages of multiple instructions |

**Course roadmap:**

```text
Lab 9         →  Lab 10     →  Project 6
(components)     (extend)      (full processor)
```

---

## Single-Cycle Processor: Major Components

<div class="mermaid">
flowchart LR
    PC["PC\nReg"] --> ROM["Inst\nMemory"]
    ROM --> DEC["Inst\nDecoder"]
    ROM --> RDEC["Reg/Imm\nDecoder"]
    RDEC --> RF["Register\nFile"]
    DEC --> RF
    DEC --> ALU
    RF --> ALU["ALU"]
    ALU --> RAM["Data\nMemory"]
    RAM --> RF
    ALU --> RF
    RDEC -->|imm| ALU
    RF -->|store data| RAM
    PC --> ADD["PC+4"]
    ADD --> PC
</div>

---

## Component Summary

| Component | Role |
|-----------|------|
| **PC register** (64-bit) | Address of current instruction |
| **+4 adder** | Advance to next instruction |
| **Instruction memory (ROM)** | Stores 32-bit machine code |
| **Instruction decoder** | Produces control signals from the instruction word (IW) |
| **Register decoder** | Extracts rs1, rs2, rd fields |
| **Immediate decoder** | Extracts/sign-extends immediates |
| **Register file** | 2 read ports + 1 write port |
| **ALU** | Arithmetic, logic, address math |
| **Data memory (RAM)** | Loads and stores |
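To make the register decoder's job concrete, here is a minimal Python sketch that slices the three register fields out of a 32-bit instruction word. It assumes the standard RISC-V bit positions (rd in bits 11:7, rs1 in 19:15, rs2 in 24:20); the function name `decode_reg_fields` is hypothetical, and in Digital the same slicing would typically be done with a splitter rather than shifts and masks.

```python
def decode_reg_fields(iw: int) -> dict:
    """Pull the three 5-bit register selectors out of a 32-bit instruction word."""
    return {
        "rd":  (iw >> 7) & 0x1F,    # bits 11:7  -> destination, drives WR
        "rs1": (iw >> 15) & 0x1F,   # bits 19:15 -> first source, drives RR0
        "rs2": (iw >> 20) & 0x1F,   # bits 24:20 -> second source, drives RR1
    }


# add a2, a0, a1 encodes as 0x00B50633 (R-type, opcode 0110011)
fields = decode_reg_fields(0x00B50633)
print(fields)  # {'rd': 12, 'rs1': 10, 'rs2': 11}
```

The same three shifts apply to every format that has these fields, which is why the register decoder can work in parallel with the instruction decoder.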

---

## One Clock Cycle, Step by Step

<div class="mermaid">
flowchart LR
    A["PC selects\ninstruction"] --> B["Fetch IW\nfrom ROM"]
    B --> C["Decode IW\ncontrol+reg+imm"]
    C --> D["Read regs\nRD0, RD1"]
    D --> E["ALU\ncomputes"]
    E --> F["RAM access\nload/store"]
    F --> G["Write back\non rising edge"]
    G --> H["PC = PC+4\nor branch"]
</div>

<div class="info-box">
All state updates (register writes, memory writes, PC advance) happen on the <strong>rising clock edge</strong>.
</div>

---

## The Clock and "Complete Instruction"

```text
        ____      ____      ____
       |    |    |    |    |    |
  _____|    |____|    |____|    |____
       ^         ^         ^
   complete  complete  complete
   instr 1   instr 2   instr 3
```

- Between edges: signals propagate **combinationally**
- The period must accommodate the **slowest** instruction
- Single-cycle is simple but not fast — every instruction pays for the worst case
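A quick worked example of the worst-case rule, using made-up delays (the picosecond figures below are illustrative assumptions, not measurements of any real datapath): the clock period is the maximum delay over all instructions, so every faster instruction sits idle for the rest of the cycle.

```python
# Hypothetical critical-path delays per instruction, in picoseconds.
latency_ps = {
    "add (R-type)": 400,
    "addi": 400,
    "ld (load)": 600,
    "sd (store)": 550,
    "beq (branch)": 450,
}

# A single-cycle clock must cover the slowest instruction.
period_ps = max(latency_ps.values())
freq_ghz = 1000 / period_ps          # a 1 GHz clock has a 1000 ps period

print(f"clock period = {period_ps} ps, frequency = {freq_ghz:.2f} GHz")
for name, t in latency_ps.items():
    print(f"{name:14s} needs {t} ps, idle {period_ps - t} ps")
```

With these numbers the load sets a 600 ps period (about 1.67 GHz), and every `add` wastes 200 ps of each cycle.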

---

## The Register File — Specification

- **32 logical registers**: X0–X31 (64 bits each)
- **Two read ports**: read two registers simultaneously
- **One write port**: write one register per cycle
- **X0 is hardwired to 0**: reads always return 0, writes discarded
- Result: **31 physical registers** (X1–X31), X0 wired to constant 0

<div class="highlight-box">
<code>2<sup>5</sup> = 32</code> → selectors are <strong>5 bits</strong> wide<br>
RV64 registers → data buses are <strong>64 bits</strong> wide
</div>

---

## Register File — Interface

| Signal | Width | Meaning |
|--------|-------|---------|
| **RR0** | 5 | Read register 0 — selects output on RD0 |
| **RR1** | 5 | Read register 1 — selects output on RD1 |
| **RD0** | 64 | Value of register named by RR0 |
| **RD1** | 64 | Value of register named by RR1 |
| **WR** | 5 | Write register — destination |
| **WD** | 64 | Write data — value to store |
| **WE** | 1 | Write enable — write only when WE=1 |
| **CLK** | 1 | Writes happen on rising edge |
| **CLR** | 1 | Synchronously reset all registers to 0 |

---

## Why Two Read Ports?

```asm
add a2, a0, a1    # a2 = a0 + a1
```

In one cycle: read **two** sources and write **one** destination.

<div class="mermaid">
flowchart LR
    RR0["RR0 = a0"] --> RF["Register File"]
    RR1["RR1 = a1"] --> RF
    RF --> RD0["RD0 = value of a0"]
    RF --> RD1["RD1 = value of a1"]
    RD0 --> ALU
    RD1 --> ALU
    ALU --> WD["WD = a0 + a1"]
    WD --> RF
    WR["WR = a2, WE=1"] --> RF
</div>

Reads are **combinational**. Writes are **synchronous** (rising edge, WE=1).

---

## Adding CLR to a Digital Register

Digital's built-in register has no CLR input. We **wrap** it:

1. **2-to-1 MUX** on `D`: when CLR=1, pass the constant 0; when CLR=0, pass the real `D`
2. **OR gate** on `EN`: `EN_in = EN OR CLR`, so asserting CLR forces a load even when EN=0

```text
D_in  = CLR ? 0 : D    (MUX)
EN_in = EN OR CLR      (OR gate)
```

| CLR | EN | Register loads on rising edge |
|-----|----|-------------------------------|
| 0 | 0 | holds current value |
| 0 | 1 | D (normal write) |
| 1 | 0 | 0 (cleared) |
| 1 | 1 | 0 (CLR wins) |
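The wrapper's next-state logic fits in a few lines. This Python sketch models one wrapped register at a rising edge (the function name is hypothetical); looping over all four CLR/EN combinations reproduces the table above.

```python
def wrapped_register_next(q: int, d: int, en: int, clr: int) -> int:
    """Value a CLR-wrapped register holds after the next rising edge."""
    d_in = 0 if clr else d          # 2-to-1 MUX: CLR selects the constant 0
    en_in = en | clr                # OR gate: CLR forces a load
    return d_in if en_in else q     # plain register: load D_in only when EN_in = 1


for clr in (0, 1):
    for en in (0, 1):
        print(f"CLR={clr} EN={en} -> {wrapped_register_next(7, 42, en, clr)}")
# CLR=0 EN=0 -> 7 (hold), CLR=0 EN=1 -> 42, CLR=1 EN=0 -> 0, CLR=1 EN=1 -> 0
```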

---

## CLR Wrapper — Schematic

<div class="mermaid">
flowchart LR
    D["D (data in)"] --> MUX{"MUX\n0: D\n1: 0"}
    Z["0 (constant)"] --> MUX
    CLR["CLR"] --> MUX
    MUX --> RD["D_in"]
    EN["EN"] --> OR{"OR"}
    CLR --> OR
    OR --> RE["EN_in"]
    CLK["CLK"] --> RC["CLK"]
    subgraph REG["Wrapped 64-bit Register"]
        RD
        RC
        RE
        Q["Q (out)"]
    end
    Q --> OUT["output"]
</div>

This wrapped register is used for the **PC** and for every physical register X1–X31.

---

## Register File — Write Path

At most one register is written per cycle, and only when WE=1.

**5-to-32 decoder with enable:**
- Input: WR (5 bits)
- Enable: WE
- Output: 32 lines, at most one high (none when WE=0); lines 1–31 drive the EN of X1–X31, and line 0 is left unconnected because X0 has no register

```text
WR (5) ──┐
         ├──► 5-to-32 Decoder ──► EN_x1  → X1.EN
WE (1) ──┘    with Enable     ──► EN_x2  → X2.EN
                              ──► ...
                              ──► EN_x31 → X31.EN

WD (64) ──────────────────────────► D of every register (shared)
CLK ──────────────────────────────► CLK of every register (shared)
```
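A behavioral sketch of the write-select decoder (hypothetical Python, not Digital's component): line *i* goes high only when WE=1 and WR=*i*, and since nothing is attached to line 0, a write aimed at X0 simply has no effect.

```python
def decoder_5to32(wr: int, we: int) -> list[int]:
    """5-to-32 decoder with enable: line wr is 1 only when WE=1."""
    return [1 if (we and i == wr) else 0 for i in range(32)]


lines = decoder_5to32(12, 1)
# Lines 1..31 drive X1.EN..X31.EN; line 0 has no register attached.
enables = {f"X{i}": lines[i] for i in range(1, 32)}
print([name for name, en in enables.items() if en])   # ['X12']
print(sum(decoder_5to32(12, 0)))                       # 0: WE=0 disables all lines
print(sum(decoder_5to32(0, 1)[1:]))                    # 0: a write to X0 enables nothing
```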

---

## Register File — Read Path

Two **independent 32-to-1 MUXes** (64 bits wide each):

<div class="mermaid">
flowchart TD
    subgraph REGS["Physical Registers"]
        X0["X0 = 0 (hardwired)"]
        X1["X1 ... X31"]
    end
    X0 --> M0["RD0 MUX 32:1"]
    X1 --> M0
    X0 --> M1["RD1 MUX 32:1"]
    X1 --> M1
    RR0["RR0 (sel)"] --> M0
    RR1["RR1 (sel)"] --> M1
    M0 --> RD0["RD0 (64)"]
    M1 --> RD1["RD1 (64)"]
</div>

RR0 and RR1 select registers independently (they may even name the same register); both reads happen **simultaneously and combinationally**.

---

## Register File — Full Structure

<div class="mermaid">
flowchart LR
    RR0in["RR0 (5)"] --> RM0["RD0 MUX"]
    RR1in["RR1 (5)"] --> RM1["RD1 MUX"]
    WRin["WR (5)"] --> DEC["Decoder+EN"]
    WEin["WE (1)"] --> DEC
    WDin["WD (64)"] --> BUS["shared D"]
    CLKin["CLK"] --> CLKB["shared CLK"]
    DEC --> REGS["Regs X1..X31"]
    BUS --> REGS
    CLKB --> REGS
    X0c["X0 = 0"] --> RM0
    X0c --> RM1
    REGS --> RM0
    REGS --> RM1
    RM0 --> RD0out["RD0 (64)"]
    RM1 --> RD1out["RD1 (64)"]
</div>
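Putting the pieces together, the following Python class is a behavioral model of the structure above, assuming the interface signals from the earlier table (the class and method names are hypothetical). `read` plays the role of the two combinational MUXes, and `rising_edge` is the only place state changes, mirroring the decoder+EN and CLR wrapper logic. The usage lines replay `add a2, a0, a1` with a0=5 and a1=3.

```python
MASK64 = (1 << 64) - 1


class RegisterFile:
    """Behavioral sketch: X1..X31 are physical, X0 is the constant 0."""

    def __init__(self) -> None:
        self.phys = [0] * 32        # slot 0 is never read or written

    def read(self, rr0: int, rr1: int) -> tuple[int, int]:
        """Combinational reads: two independent 32:1 MUXes, input 0 tied to 0."""
        rd0 = 0 if rr0 == 0 else self.phys[rr0]
        rd1 = 0 if rr1 == 0 else self.phys[rr1]
        return rd0, rd1

    def rising_edge(self, wr: int, wd: int, we: int, clr: int = 0) -> None:
        """Synchronous update: the only place register state changes."""
        for i in range(1, 32):                  # no register exists for X0
            en = 1 if (we and wr == i) else 0   # decoder + enable output i
            if en or clr:                       # EN_in = EN OR CLR
                self.phys[i] = 0 if clr else wd & MASK64   # D_in = CLR ? 0 : WD


rf = RegisterFile()
rf.rising_edge(wr=10, wd=5, we=1)           # preload a0 (X10) = 5
rf.rising_edge(wr=11, wd=3, we=1)           # preload a1 (X11) = 3
rd0, rd1 = rf.read(10, 11)                  # RR0 = 10, RR1 = 11
rf.rising_edge(wr=12, wd=rd0 + rd1, we=1)   # add a2, a0, a1 latches on the edge
rf.rising_edge(wr=0, wd=99, we=1)           # write to X0: silently dropped
print(rf.read(12, 0))                       # (8, 0)
```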

---

## X0: Hardwired Zero

<div class="highlight-box">
X0 is <strong>not</strong> a physical register — its output is the constant 0.
<ul>
<li>No flip-flop, no D input, no enable line</li>
<li>Reads of X0 always return 0</li>
<li>Writes to X0 are silently ignored (decoder never selects it)</li>
</ul>
</div>

**Why?** The RISC-V ISA guarantees X0 = 0. Building a flip-flop for it wastes hardware.

Result: **32 logical registers** but only **31 physical registers**.

---

## Trace: `add a2, a0, a1`

Assume `a0=5` (X10), `a1=3` (X11), dest `a2` (X12), WE=1:

```text
RR0 = 10          RR1 = 11
RD0 = 5           RD1 = 3      ← combinational reads

ALU: A=5, B=3, ALUop=000 (add) → R = 8

WR = 12,  WD = 8,  WE = 1
Decoder asserts EN only on line 12.

X12 latches 8 on the RISING CLOCK EDGE.
```

<div class="info-box">
Reads and ALU computation settle during the cycle. The state change happens <strong>only at the rising edge</strong>.
</div>

---

## The ALU (Preview)

The ALU is **purely combinational** — no clock, no state.

| ALUop | Operation |
|-------|-----------|
| `000` | add |
| `001` | sub |
| `010` | mul |
| `011` | sll (shift left logical) |
| `100` | srl (shift right logical) |

- Inputs: A (64), B (64), ALUop (3)
- Output: R (64)
- Also computes **branch/memory addresses** (address arithmetic = addition)
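A minimal Python sketch of this ALU, assuming the ALUop encoding in the table and 64-bit wraparound on every result. Two details are assumptions borrowed from the RV64 ISA rather than from the lab spec: `mul` keeps only the low 64 bits of the product, and shifts use only the low 6 bits of B. The function name `alu` is hypothetical.

```python
MASK64 = (1 << 64) - 1


def alu(a: int, b: int, alu_op: int) -> int:
    """Combinational ALU sketch: no state, and every result wraps to 64 bits."""
    a &= MASK64                 # treat inputs as 64-bit bus values
    b &= MASK64
    shamt = b & 0x3F            # assumption: RV64-style 6-bit shift amount
    if alu_op == 0b000:
        r = a + b               # add (also used for address arithmetic)
    elif alu_op == 0b001:
        r = a - b               # sub; masking below yields two's complement
    elif alu_op == 0b010:
        r = a * b               # mul: keep the low 64 bits of the product
    elif alu_op == 0b011:
        r = a << shamt          # sll
    elif alu_op == 0b100:
        r = a >> shamt          # srl: a is non-negative here, so zeros shift in
    else:
        raise ValueError(f"unsupported ALUop {alu_op:03b}")
    return r & MASK64


print(hex(alu(5, 3, 0b000)))                # 0x8
print(hex(alu(0, 1, 0b001)))                # 0xffffffffffffffff (-1)
print(hex(alu(1, 63, 0b011)))               # 0x8000000000000000
print(alu(0x8000000000000000, 63, 0b100))   # 1
```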

---

## Data Memory (Preview)

- **RAM** component in Digital
- Holds the **stack** (function params, preserved registers)
- Address from ALU is a **byte address** → shift right by 3 for 64-bit word index
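For example, with a hypothetical stack pointer of 0x1000, a load such as `ld t0, 16(sp)` makes the ALU produce byte address 0x1010, which selects word 514 of the RAM. The sketch assumes every access is 8-byte aligned (the low 3 bits are zero), which is what makes the shift by 3 exact; the helper name is hypothetical.

```python
def word_index(byte_addr: int) -> int:
    """Convert a byte address to a 64-bit (8-byte) word index."""
    if byte_addr & 0b111:
        raise ValueError("sketch assumes 8-byte-aligned accesses")
    return byte_addr >> 3       # same as // 8 for aligned addresses


sp = 0x1000                     # hypothetical stack pointer value
addr = sp + 16                  # ALU output for ld t0, 16(sp)
print(hex(addr), word_index(addr))   # 0x1010 514
```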

<div class="info-box">
Data memory and the immediate extender are <strong>out of scope for Lab 9</strong>. They arrive in Lab 10 / Project 6.
</div>

---

## Lab 9 Deliverables (due Mon Nov 3)

Build and test core components:

- **64-bit PC register with CLR**
- **Register file**: 32 logical / 31 physical registers, 2 reads + 1 write
- **ALU**: add, sub, mul, sll, srl
- **Dashboard**: splitters, tunnels, probes showing register/PC state

Key autograder filenames:
- `lab09.dig` — inputs: `CLK`, `CLR`, `RR0`, `RR1`, `WR`, `WE`, `ALUSrcB`, `ALUOp`, `Imm`; outputs: `T0`, `T1`
- `alu.dig` — inputs: `A`, `B`, `ALUOp`; output: `R`

---

## Lab 9 Test Programs

```asm
addi t0, t0, 1      # T0 = 1
li   t1, 2          # pseudo for addi t1, zero, 2 → T1 = 2
addi t0, t0, -1     # imm -1 sign-extends to 0xFFFFFFFFFFFFFFFF; T0 = 1 + (-1) = 0

li   t0, 1          # T0 = 1
li   t1, 1          # T1 = 1
sub  t0, t0, t1     # T0 = 0
```

Drive the inputs by hand to simulate each instruction, asserting CLR first so every register starts at 0.
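Before driving the circuit, it can help to know what T0 and T1 should show. This Python sketch replays both test programs at the register level, assuming all registers start at 0 and expanding `li rd, imm` into `addi rd, zero, imm`; the tuple format and the `run` helper are hypothetical. In the first program the final `addi t0, t0, -1` brings T0 back to 0, even though the sign-extended immediate on its own is 0xFFFFFFFFFFFFFFFF.

```python
MASK64 = (1 << 64) - 1


def run(program: list[tuple]) -> dict[str, int]:
    """Execute (op, rd, rs1, src) tuples and return the final register values."""
    regs = {"zero": 0, "t0": 0, "t1": 0}     # assumes CLR was asserted first
    for op, rd, rs1, src in program:
        a = regs[rs1]
        if op == "addi":
            result = (a + (src & MASK64)) & MASK64   # immediate as a 64-bit value
        elif op == "sub":
            result = (a - regs[src]) & MASK64
        else:
            raise ValueError(f"unsupported op {op}")
        if rd != "zero":                     # writes to X0 are discarded
            regs[rd] = result
    return regs


prog1 = [("addi", "t0", "t0", 1),      # addi t0, t0, 1
         ("addi", "t1", "zero", 2),    # li   t1, 2
         ("addi", "t0", "t0", -1)]     # addi t0, t0, -1
prog2 = [("addi", "t0", "zero", 1),    # li   t0, 1
         ("addi", "t1", "zero", 1),    # li   t1, 1
         ("sub", "t0", "t0", "t1")]    # sub  t0, t0, t1

for prog in (prog1, prog2):
    r = run(prog)
    print(f"T0 = {r['t0']:#018x}  T1 = {r['t1']:#018x}")
print(f"{-1 & MASK64:#018x}")          # 0xffffffffffffffff
```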

---

## Project 6 Roadmap (due Nov 17)

Extend Lab 9 into a full **single-cycle RISC-V processor**:

| Part | Component |
|------|-----------|
| Part 1 | PC + instruction memory + decoder |
| Part 2 | Register file + register decoder |
| Part 3 | ALU + data memory + load/store |
| Final | Complete processor with branch control |

<div class="highlight-box">
Incremental submissions required: <code>part1</code>, <code>part2</code>, <code>part3</code>, <code>final</code>
</div>

---

## Key Concepts

| Concept | Key Point |
|---------|-----------|
| **ISA** | SW/HW contract — what instructions do |
| **Microarchitecture** | Implementation — how they are done |
| **Single-cycle** | One instruction per clock cycle |
| **Moore's Law** | 2x transistors / ~1.5 yr |
| **PC register** | 64-bit; PC ← PC+4 each cycle (or a branch target) |
| **Register file** | 2 read ports (combinational) + 1 write (synchronous) |
| **X0** | Hardwired 0; 32 logical / 31 physical regs |
| **CLR wrapper** | MUX + OR around Digital register |
| **Decoder+EN** | One-hot write select, gated by WE |
| **Read MUX tree** | Two 32:1 MUXes for simultaneous reads |

---

## Summary

1. **ISA is the interface** between software and hardware; microarchitecture is the implementation

2. **Single-cycle processor**: PC → ROM → decode → register file → ALU → RAM, all in one clock period

3. **Rising clock edge** latches all state: register writes, memory writes, PC advance

4. **Register file**: 2 combinational read ports (32:1 MUXes) + 1 synchronous write port (decoder+EN)

5. **X0 is hardwired to 0** — 31 physical registers, not 32

6. **CLR wrapper**: `D_in = CLR ? 0 : D`, `EN_in = EN OR CLR` — used for PC and every register

7. **Lab 9** builds PC register, register file, and ALU — foundation for Project 6's full processor