Meeting Summary: CS 315-01 Lecture/Lab — Fall 2025
- Date: Oct 02, 2025
- Time: 08:09 AM Pacific
- Meeting ID: 886 4953 2573
- Instructor: Greg
Quick Recap
- The session covered debugging strategies for Project 4 and Lab 6, with demonstrations of custom test programs, GDB, and code instrumentation.
- Greg explained cache memory fundamentals, focusing on direct-mapped caching, address mapping, and the roles of tags and validity bits.
- The discussion compared direct-mapped, fully associative, and set-associative caches.
- Implementation guidance for Project 4 included extending block size and adding LRU for set-associative caches.
- Demonstrations showed how increasing block size can improve cache performance due to spatial locality.
Next Steps
- Students: Understand the provided direct-mapped cache (block size = 1) in the starter code.
- Students: Extend the direct-mapped cache to support a 4-word block size for Project 4.
- Students: Implement a set-associative cache with an LRU replacement policy for misses.
- Students: Review the cache memory guide to reinforce lecture concepts.
- Students: Test cache implementations with different configurations to evaluate performance.
- Greg: Fix formatting issues in the online cache guide.
Detailed Summary
Debugging Strategies for Project Labs
- Greg emphasized incremental development and careful inspection of code.
- Recommended approaches:
- Create targeted custom test programs.
- Use GDB for step-by-step debugging (set breakpoints, inspect registers, trace instruction execution).
- Instrument code with print statements to observe control flow and state.
- Students are encouraged to become comfortable with these tools to debug effectively.
Cache Memory Fundamentals
- The cache is a small amount of fast static RAM (SRAM) that sits between the CPU and main memory.
- Key concepts:
- Cache hits and misses; performance measured via hit/miss rates.
- Cache types: direct-mapped (focus of Project 4), fully associative, and set-associative.
- Direct-mapped cache components:
- Tag, data, and a valid bit.
- The tag identifies which memory block currently occupies a slot; the valid bit indicates whether the slot holds usable data.
Direct-Mapped Addressing and Mapping
- Addresses are 64-bit, but the cache stores 32-bit words; addresses are word-aligned (multiples of 4 bytes).
- Word index = byte address / 4.
- In a direct-mapped cache, each word maps to exactly one slot:
- Slot index = word index mod number_of_slots (e.g., with 4 slots, index ∈ {0,1,2,3}).
- Bit-level view:
- Shift the byte address right by 2 to get the word address.
- Use low bits of the word address for the slot index (mask depends on number_of_slots).
- Remaining higher bits form the tag.
- Checking a slot:
- If valid and tags match: hit.
- Otherwise: miss; load data from memory and update tag and valid bit.
- As the number of slots increases, the tag shrinks accordingly because more index bits are used.
Slot Index Calculation Techniques
- The slot index can be computed by:
- Modulus on the word index: slot = word_index % number_of_slots.
- Bit masking when number_of_slots is a power of two (e.g., slot = word_index & (number_of_slots - 1)).
- Extracting indices from a byte address:
- Right-shift by 2 to get the word address, then apply the mask.
Cache Operations and Memory Access Patterns
- Programs often access adjacent memory locations (spatial locality).
- Caches exploit this by transferring multi-word blocks to reduce average latency.
- Although the first access (miss) incurs a startup cost, fetching a block amortizes the cost across subsequent nearby accesses.
Block Indexing and Larger Block Sizes
- With block size > 1 word, the address decomposes into:
- Tag | Index | Word offset (to select a word within a block).
- Project 4 extends the direct-mapped cache to a 4-word block size:
- Requires adding a word offset and storing multiple words per slot.
- The cache must support both direct-mapped and set-associative modes, configured via an enum in RVEMU.h.
Associativity Concepts
- Direct-mapped: simple, fast, but less flexible (one slot per index).
- Fully associative: any block can go anywhere; best flexibility but high complexity—rare for general instruction/data caches.
- Set-associative: compromise approach; the index selects a set, and the block can occupy any “way” within that set.
- The provided simulator focuses on instruction memory references (no data cache simulation).
Set-Associative Cache: Implementation Overview
- Address breakdown:
- Tag | Set index | Word offset (for block size > 1).
- Lookup steps:
- Compute set index and tag from the address.
- Search all ways in the set for a valid entry with a matching tag.
- On hit: return data; update recency metadata.
- On miss: choose a replacement way; prefer an invalid (unused) way, otherwise evict the least recently used way.
- LRU policy:
- Track recency using a timestamp or a “reference counter” proxy updated on each memory reference.
- Handling fills:
- Load the entire block from memory, set tag/valid, initialize per-word data, and update LRU metadata.
Direct-Mapped Cache: Implementation Notes and Results
- Students should connect the provided pseudocode and data structures to the starter code to understand slot organization and tagging.
- Observed effect of block size:
- Increasing block size from 1 to 4 improved hit ratio from about 62% to 85%, demonstrating spatial locality benefits.
- Students are encouraged to experiment with configurations, consult the guide, and ask questions during office hours.