Lab 02: Getting Started with RISC-V Assembly

Overview

This was a hands-on lab session that introduced RISC-V assembly language programming through the machine code execution model: how a processor fetches, decodes, and executes instructions from memory using registers and the program counter. We walked through the structure of the given Lab02 starter code (a C main, a C implementation, and an assembly implementation linked together by a Makefile), set up GDB for instruction-level debugging, and worked through implementing simple arithmetic functions (add4, mul4) in RISC-V assembly. The session emphasized the workflow of writing assembly, building with make, debugging with GDB, keeping the Git repo clean, and running the autograder against the tests repository.

Learning Objectives

  • Describe the machine code execution model: processor, registers, PC, instruction word, and memory
  • Explain how C source becomes assembly, then object code, then a linked executable
  • Identify the role of the program counter (PC) and why PC = PC + 4 for normal instruction flow
  • Read and write basic three-operand RISC-V instructions (add, addi, mul)
  • Use the RISC-V calling convention: arguments in a0–a7, return value in a0
  • Set up and configure GDB for assembly-level debugging on the BeagleV machines
  • Step through assembly with GDB: breakpoints, stepping, examining registers, and return
  • Keep a Git repo clean with make clean, git status, git rm, and run the autograder

Prerequisites

  • Completed Lab01 (Hello World, Makefiles, Git/GitHub, the autograder)
  • Completed Project01 (RISC-V dev environment, C basics, number systems)
  • Access to a RISC-V dev environment (BeagleV machine or qemu-system-riscv64 VM)
  • Basic C programming: functions, arguments, return values
  • Familiarity with the shell and git from the command line

1. The Machine Code Execution Model

Everything we write — C, assembly, anything — ultimately runs as machine code: binary instructions stored in memory. To reason about assembly, you need a mental model of how a processor executes those instructions. This is the central diagram from today's lab.

flowchart LR
    subgraph PROC["PROCESSOR"]
        direction TB
        REGS["Registers<br/>x0 .. x31"]
        PC["PC<br/>(Program Counter)"]
        IW["IW<br/>(Instruction Word)"]
    end

    subgraph MEM["MEMORY"]
        direction TB
        STACK["STACK"]
        DATA["DATA"]
        CODE["CODE<br/>main()<br/>add t0, t1, t2"]
    end

    PC -->|address of next instruction| CODE
    CODE -->|fetch 32-bit instruction| IW
    IW -->|decode + execute| REGS

    style PROC fill:#e8f0ff,stroke:#333,stroke-width:2px
    style MEM fill:#fff0e8,stroke:#333,stroke-width:2px
    style CODE fill:#f9f,stroke:#333,stroke-width:1px

The two big boxes: Processor and Memory

The handwritten notes drew the machine as two large boxes connected by arrows.

PROCESSOR contains:

  • Registers — a small set of fast storage locations named x0 through x31. The processor can only do arithmetic on values that are in registers.
  • PC (Program Counter) — holds the memory address of the next instruction to execute.
  • IW (Instruction Word) — holds the current 32-bit instruction that was just fetched from memory and is being decoded/executed.

MEMORY is one large address space, conventionally divided into regions:

  • STACK — grows downward; holds function call frames, saved registers, and local variables.
  • DATA — globals and other statically allocated data.
  • CODE — the machine code instructions, including main() and the functions it calls. In the notes, the code region contained main() and an instruction add t0, t1, t2, with an arrow showing that each instruction is 32 bits wide.

Why the separation matters

A processor can only operate directly on register values. Memory holds both code and data. If a program needs to operate on data that lives in memory, it must first load that data into a register, compute, and then often store the result back to memory. This load/compute/store cycle is the heart of assembly programming.
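
As a rough illustration of that cycle, the C sketch below models memory as an array and a register as a local variable. The names `memory`, `MEM_WORDS`, and `add_to_memory` are invented for this example and are not part of the Lab02 starter code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MEM_WORDS 16

/* Hypothetical model: memory is an array of words, and the local variable
 * `reg` stands in for a single processor register such as t0. */
static int64_t memory[MEM_WORDS];

/* Add `amount` to the value stored at memory[index] and return the new value. */
int64_t add_to_memory(size_t index, int64_t amount) {
    assert(index < MEM_WORDS);      /* stay inside the modeled memory */
    int64_t reg = memory[index];    /* LOAD: memory -> register */
    reg = reg + amount;             /* COMPUTE: arithmetic happens in the register */
    memory[index] = reg;            /* STORE: register -> memory */
    return reg;
}
```

For instance, with `memory[3]` holding 40, `add_to_memory(3, 2)` returns 42 and leaves 42 in `memory[3]`.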


2. The Fetch–Decode–Execute Cycle

The processor runs a simple loop forever:

flowchart TD
    A["FETCH<br/>IW = memory[PC]<br/>(read 32-bit instruction)"] --> B["DECODE<br/>figure out the operation<br/>and operands"]
    B --> C["EXECUTE<br/>update register(s)<br/>and/or memory"]
    C --> D["UPDATE PC<br/>PC = PC + 4"]
    D --> A

    style A fill:#e8f0ff,stroke:#333
    style D fill:#ffe8e8,stroke:#333

  1. Fetch — Read the 32-bit instruction at the address in PC into the instruction word IW.
  2. Decode — Determine what operation the bits encode and which registers/values are involved.
  3. Execute — Perform the operation. This usually updates one or more registers, but can also read or write memory.
  4. Update PC — Advance to the next instruction.
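
The four steps above can be sketched as a toy simulator in C. This is a minimal model under stated assumptions, not how real hardware or the course tools work: `Cpu` and `run` are hypothetical names, only `add` and `addi` are decoded, and the program is assumed to be loaded at byte address 0. The bit-field positions follow the standard R-type and I-type instruction formats.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy processor state: 32 registers plus the PC. */
typedef struct {
    uint64_t x[32];   /* x0..x31; unsigned so addition wraps like hardware */
    uint64_t pc;      /* byte address of the next instruction */
} Cpu;

/* Run instructions from `code` until the PC leaves the program or an
 * unsupported instruction is fetched. Returns the number executed. */
size_t run(Cpu *cpu, const uint32_t *code, size_t n_instr) {
    assert(cpu->pc % 4 == 0);                       /* instructions are 4-byte aligned */
    size_t executed = 0;
    while (cpu->pc / 4 < n_instr) {
        uint32_t iw = code[cpu->pc / 4];            /* FETCH: IW = memory[PC] */

        uint32_t opcode = iw & 0x7f;                /* DECODE: split the bit fields */
        uint32_t rd     = (iw >> 7)  & 0x1f;
        uint32_t funct3 = (iw >> 12) & 0x7;
        uint32_t rs1    = (iw >> 15) & 0x1f;
        uint32_t rs2    = (iw >> 20) & 0x1f;
        uint32_t funct7 = iw >> 25;
        int64_t  imm    = (int64_t)(iw >> 20);      /* 12-bit immediate ... */
        if (imm & 0x800) imm -= 0x1000;             /* ... sign-extended */

        if (opcode == 0x33 && funct3 == 0 && funct7 == 0) {
            cpu->x[rd] = cpu->x[rs1] + cpu->x[rs2]; /* EXECUTE: add */
        } else if (opcode == 0x13 && funct3 == 0) {
            cpu->x[rd] = cpu->x[rs1] + (uint64_t)imm; /* EXECUTE: addi */
        } else {
            break;                                  /* anything else stops the toy model */
        }
        cpu->x[0] = 0;                              /* x0 is hardwired to 0 */
        cpu->pc += 4;                               /* UPDATE PC */
        executed++;
    }
    return executed;
}
```

Given the three words 0x007302b3 (`add t0, t1, t2`), 0x00150513 (`addi a0, a0, 1`), and 0x00008067 (`ret`), the model executes the first two instructions and stops at `ret` with the PC at 8, because `ret` is a control instruction this sketch does not implement.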

Why PC = PC + 4

The notes underlined PC = PC + 4 in red for emphasis. Each base RISC-V instruction is exactly 4 bytes (32 bits), and memory is byte-addressable. To move from one instruction to the next, the processor adds 4 to the program counter.

Address     Instruction
---------   -------------------
0x1000      add  t0, t1, t2     <- PC = 0x1000
0x1004      addi a0, a0, 1      <- after PC = PC + 4
0x1008      mul  a0, a0, a1     <- after another PC = PC + 4
0x100C      ret

PC = PC + 4 is the default. Control instructions (branches and jumps) override this default by setting the PC to a different address — that is how loops, if/else, function calls, and ret work. We will explore those later; in Lab02 the functions are straight-line code.
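
The address arithmetic in the listing above can be written as a one-line C helper. `instr_address` and `INSTR_BYTES` are names made up for this sketch, and it assumes straight-line code made only of 4-byte base instructions.

```c
#include <assert.h>
#include <stdint.h>

#define INSTR_BYTES 4u  /* size of one base RISC-V instruction */

/* Address of the n-th instruction (1-based) in straight-line code
 * whose first instruction is at `base`. */
uint64_t instr_address(uint64_t base, unsigned n) {
    assert(n >= 1);                     /* instructions are counted from 1 */
    assert(base % INSTR_BYTES == 0);    /* instructions are 4-byte aligned */
    return base + (uint64_t)(n - 1) * INSTR_BYTES;
}
```

With `base = 0x1000`, the fourth instruction is at `0x1000 + 3*4 = 0x100C`, which matches the address of `ret` in the listing.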


3. From C to Assembly to Machine Code

A key idea today: all code eventually runs as machine code. There are two paths to get there, and they meet at the object file (.o).

flowchart LR
    C["add4_c.c<br/>(C source)"] -->|"gcc (compile + assemble)"| OC["add4_c.o"]
    S["add4_s.s<br/>(assembly source)"] -->|"as (assemble)"| OS["add4_s.o"]
    M["add4.c<br/>(main)"] -->|gcc| OM["add4.o"]
    OC -->|link| EXE["add4<br/>(executable)"]
    OS -->|link| EXE
    OM -->|link| EXE
    EXE -->|load + run| RUN["machine code in memory"]

    style EXE fill:#f9f,stroke:#333,stroke-width:2px

Compiling vs. assembling

| Step | Input | Tool | Output |
| --- | --- | --- | --- |
| Compile | .c C source | gcc | .o object code (compiles + assembles) |
| Assemble | .s assembly source | as | .o object code |
| Link | one or more .o files | gcc (or ld) | executable |

A C compiler actually generates assembly internally, then assembles it to object code. So gcc can do everything in one step. The point is that assembly is a human-readable form of machine code: an assembler (as) translates assembly mnemonics like add into the exact binary the processor decodes.
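
To make "the exact binary" concrete, here is a small C sketch of what an assembler does for two R-type instructions. The function names are hypothetical; the field layout and the opcode/funct values for `add` and `mul` are taken from the standard RISC-V encoding, and registers are passed by their x-number (t0 = x5, t1 = x6, t2 = x7, a0 = x10, a1 = x11).

```c
#include <assert.h>
#include <stdint.h>

/* Pack the fields of an R-type instruction into one 32-bit word. */
uint32_t encode_rtype(uint32_t funct7, uint32_t rs2, uint32_t rs1,
                      uint32_t funct3, uint32_t rd, uint32_t opcode) {
    assert(rd < 32 && rs1 < 32 && rs2 < 32);   /* register numbers are 5 bits */
    return (funct7 << 25) | (rs2 << 20) | (rs1 << 15) |
           (funct3 << 12) | (rd << 7) | opcode;
}

/* add rd, rs1, rs2: opcode 0x33, funct3 0, funct7 0. */
uint32_t encode_add(uint32_t rd, uint32_t rs1, uint32_t rs2) {
    return encode_rtype(0x00, rs2, rs1, 0x0, rd, 0x33);
}

/* mul rd, rs1, rs2: same opcode and funct3 as add, but funct7 = 1. */
uint32_t encode_mul(uint32_t rd, uint32_t rs1, uint32_t rs2) {
    return encode_rtype(0x01, rs2, rs1, 0x0, rd, 0x33);
}
```

`encode_add(5, 6, 7)` yields 0x007302b3, the machine code for `add t0, t1, t2`. `add` and `mul` differ only in the funct7 field, so the two mnemonics map to nearly identical bit patterns.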

Building by hand (what the Makefile automates)

# Assemble the assembly implementation to object code
as -o add4_s.o add4_s.s

# Compile the C files and link everything together
gcc -o add4 add4.c add4_c.c add4_s.o

# Or let gcc both assemble and link in one command:
gcc -o add4 add4.c add4_c.c add4_s.s

In Lab02 the provided Makefile runs these steps for you, so you normally just type make.


4. The Lab02 Starter Structure

You are given a Makefile, C files, and assembly files (.s). This same three-file pattern is used for most of our assembly programming. The point of the pattern is to compare a C implementation against your assembly implementation of the same function, so you can check correctness immediately.

For the example program add2 you are given:

add2.c      add2_c.c      add2_s.s

| File | Role |
| --- | --- |
| add2.c | The main program: parses arguments, calls both versions, prints results |
| add2_c.c | The C implementation of the function (the reference answer) |
| add2_s.s | The assembly implementation (where you write RISC-V) |

When you build and run it:

benson@beagle4:lab02-starter$ ./add2 3 4
C: 7
Asm: 7

The add2 program calls the C version and the assembly version with the same arguments. The goal is to make the two results match. You are also given a full implementation of mul2 to study as a worked example.

What main does (conceptually)

// add2.c (sketch of the given main)
#include <stdio.h>
#include <stdlib.h>

int add2_c(int a, int b);   // prototype for the C version
int add2_s(int a, int b);   // prototype for the assembly version

int main(int argc, char *argv[]) {
    int a = atoi(argv[1]);
    int b = atoi(argv[2]);
    printf("C:   %d\n", add2_c(a, b));
    printf("Asm: %d\n", add2_s(a, b));
    return 0;
}

The two prototypes are critical. They tell the C compiler the names and types of the functions implemented elsewhere. The linker then matches these names to the symbols in add2_c.o and add2_s.o. This is why naming conventions matter: the label in your assembly file must exactly match the name used in main, and it must be made visible with .global.


5. Anatomy of a RISC-V Instruction

Most RISC-V instructions have three operands: one destination (target) and two sources.

add  t0, t1, t2     # t0 = t1 + t2
     ^   ^   ^
     |   |   +--- src2  (second source register)
     |   +------- src1  (first source register)
     +----------- dst   (destination register)

This is exactly the instruction drawn in the code region of the execution-model diagram: add t0, t1, t2. Read it as an assignment: destination = source1 OP source2.

Instruction parts

| Part | Example | Meaning |
| --- | --- | --- |
| Mnemonic | add, mul, addi | The operation name |
| Destination | t0 | Register that receives the result |
| Source 1 | t1 | First input register |
| Source 2 | t2 (or an immediate) | Second input register or constant |
| Comment | # t0 = t1 + t2 | Everything after # is ignored |

Immediates

Some instructions take a constant (an immediate) instead of a second register. By convention the mnemonic ends in i:

addi t0, t1, 10     # t0 = t1 + 10   (immediate constant)
li   t0, 9          # t0 = 9         (pseudo-instruction)
addi t0, zero, 9    # t0 = 0 + 9 = 9 (what li expands to)

li t0, 9 and addi t0, zero, 9 do the same thing. The zero register (x0) is hardwired to 0, which makes it handy for loading constants and copying values.


6. Registers and the Calling Convention

A RISC-V processor has 32 registers (x0–x31) plus the PC. In RV64 each register is 64 bits (8 bytes) wide. We almost always use the ABI names, which describe each register's conventional role.

| Hardware | ABI Name | Role |
| --- | --- | --- |
| x0 | zero | Always 0 (writes ignored) |
| x1 | ra | Return address |
| x2 | sp | Stack pointer |
| x5–x7, x28–x31 | t0–t6 | Temporaries (caller-saved) |
| x10–x17 | a0–a7 | Function arguments / return value (caller-saved) |
| x8–x9, x18–x27 | s0–s11 | Saved registers (callee-saved) |

The rules you need for Lab02

  • Arguments are passed in a0, a1, a2, a3, ... (up to a7).
  • The return value is placed in a0.
  • Temporaries t0–t6 are free scratch space inside a leaf function, so there is no need to save them.
  • A leaf function (one that does not call any other function) does not need to touch the stack.

Lab02 functions are all leaf functions

add2, mul2, add4, and mul4 only do arithmetic and return. They make no further function calls, so you only need argument registers (a0–a3) and temporaries (t0–t6). You do not need to allocate stack space, save ra, or save any s registers. That work comes later in Project02.

Mapping the function signature to registers

For int add4(int a, int b, int c, int d):

a0 = a    (first argument)
a1 = b    (second argument)
a2 = c    (third argument)
a3 = d    (fourth argument)
a0 = return value (written before ret)
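
The mapping above can be modeled in C to show both sides of a call. This is only a sketch: `ArgRegs`, `place_args`, and `add4_model` are hypothetical names, and the struct stands in for the eight argument registers rather than any real API.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the eight argument registers a0..a7. */
typedef struct {
    int64_t a[8];
} ArgRegs;

/* Caller side: place up to eight arguments in a0..a7, in order. */
void place_args(ArgRegs *regs, const int *args, int count) {
    assert(count >= 0 && count <= 8);   /* only a0..a7 carry arguments here */
    for (int i = 0; i < count; i++) {
        regs->a[i] = args[i];
    }
}

/* Callee side: a leaf function that sums a0..a3 and leaves the result in a0. */
void add4_model(ArgRegs *regs) {
    regs->a[0] = regs->a[0] + regs->a[1];   /* a0 = a + b */
    regs->a[0] = regs->a[0] + regs->a[2];   /* a0 = (a + b) + c */
    regs->a[0] = regs->a[0] + regs->a[3];   /* a0 = (a + b + c) + d */
}
```

After `place_args` with the values 1, 2, 3, 4 and a call to `add4_model`, `a[0]` holds 10 while `a[1]` through `a[3]` are unchanged. The same register doubles as first argument and return value, which is why the first argument is overwritten.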

7. Writing Your First Assembly Function

A basic assembly function needs three things: a .global directive (so the linker can find it), a label matching the function name, and a ret at the end.

.global add2_s          # make the symbol visible to the linker
add2_s:                 # label = function entry point
    add  a0, a0, a1     # a0 = a0 + a1  (return value in a0)
    ret                 # return to caller (jumps to address in ra)

Walk through it against the execution model:

  1. The caller (main) puts a in a0 and b in a1, then calls add2_s (which sets ra to the return address and sets PC to the add2_s label).
  2. add a0, a0, a1 computes the sum and leaves it in a0.
  3. ret sets PC back to the address in ra, returning control to main. The result is already in a0, where main expects the return value.

Worked example: the given mul2

You are given mul2 as a complete example to study:

.global mul2_s
mul2_s:
    mul  a0, a0, a1     # a0 = a0 * a1
    ret

Identical structure to add2_s, just a different operation. Study this pattern — add4 and mul4 are extensions of it.


8. Lab02 Requirements: add4 and mul4

The lab asks you to write RISC-V implementations of two arithmetic functions. You are given the C implementations; you write the assembly.

add4 — add four 32-bit integers

// add4_c.c (given reference implementation)
int add4_c(int a, int b, int c, int d) {
    return a + b + c + d;
}

$ ./add4 1 2 3 4
C: 10
Asm: 10

Your assembly accumulates the four arguments. Because add is a three-operand instruction, chain the additions, reusing a0 as the running total:

.global add4_s
add4_s:
    add  a0, a0, a1     # a0 = a + b
    add  a0, a0, a2     # a0 = (a + b) + c
    add  a0, a0, a3     # a0 = (a + b + c) + d
    ret                 # return a0

mul4 — multiply four 32-bit integers

// mul4_c.c (given reference implementation)
int mul4_c(int a, int b, int c, int d) {
    return a * b * c * d;
}

$ ./mul4 1 2 3 4
C: 24
Asm: 24

.global mul4_s
mul4_s:
    mul  a0, a0, a1     # a0 = a * b
    mul  a0, a0, a2     # a0 = (a * b) * c
    mul  a0, a0, a3     # a0 = (a * b * c) * d
    ret                 # return a0

Requirement checklist

  1. Write RISC-V assembly implementations for the given arithmetic problems (you are given the C versions).
  2. The executable must be compiled with a Makefile (run make).
  3. Before committing, run make clean so you do not add build products (executables, .o files).
  4. The labs are graded with the autograder against the tests repository.

9. GDB Setup for Assembly Debugging

On the BeagleV machines you can install a GDB init file to get a nice text UI for assembly debugging. The file goes at ~/.config/gdb/gdbinit. Create any missing directories with mkdir -p.

$ cd
$ mkdir -p ~/.config/gdb
$ cd .config/gdb
$ cat > gdbinit
set auto-load safe-path /
set debuginfod enabled off
tui new-layout asm {-horizontal src 1 regs 1} 2 status 0 cmd 1
tui enable
layout asm
^d

^d means press CTRL-d to close the file you are typing into cat.

Common setup mistakes

  • No ~/.config directory yet: mkdir -p ~/.config/gdb creates the whole path at once.
  • Typo -2E / -2e: the instructor warned against a stray -2E flag or typo in the configuration; copy the gdbinit exactly as shown.
  • Wrong file name: it must be gdbinit (no dot) inside ~/.config/gdb, not .gdbinit in that folder.

This layout gives you three panes: the source/disassembly view, the registers view, and the command line. The registers view is the most valuable part for assembly work — you watch values change as you step.


10. Debugging Assembly with GDB

GDB lets you stop the fetch–decode–execute cycle at any instruction and inspect the machine state. The core workflow:

gdb ./add4          # start gdb on your executable

Essential commands

| Command | Short | What it does |
| --- | --- | --- |
| break add4_s | b add4_s | Set a breakpoint at the function label |
| run 1 2 3 4 | r 1 2 3 4 | Run with command-line arguments |
| stepi | si | Step one instruction (into calls) |
| nexti | ni | Step one instruction (over calls) |
| info registers | i r | Show all registers |
| info registers a0 | i r a0 | Show one register |
| print $a0 | p $a0 | Print a register's value |
| continue | c | Run until the next breakpoint |
| finish | fin | Run until the current function returns (step out) |
| return | | Pop the current frame immediately, skipping the rest of the function |
| quit | q | Exit GDB |

A typical debugging session for add4

(gdb) break add4_s
Breakpoint 1 at 0x...: file add4_s.s
(gdb) run 1 2 3 4
Breakpoint 1, add4_s () at add4_s.s:3
(gdb) info registers a0 a1 a2 a3
a0  0x1   1
a1  0x2   2
a2  0x3   3
a3  0x4   4
(gdb) stepi                 # execute: add a0, a0, a1
(gdb) print $a0             # a0 should now be 3
$1 = 3
(gdb) stepi                 # execute: add a0, a0, a2
(gdb) print $a0             # a0 should now be 6
$2 = 6
(gdb) stepi                 # execute: add a0, a0, a3
(gdb) print $a0             # a0 should now be 10
$3 = 10
(gdb) continue

Use breakpoints strategically (set them at the function you care about), step instruction-by-instruction with stepi, and use finish to run to the end of the current function once you have seen what you need. Note that return is not the same thing: it abandons the rest of the function instead of executing it. Watching a0 accumulate is the fastest way to confirm your arithmetic is correct and your registers are right.
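
Before stepping, it can help to write down what a0 should hold after each instruction. The C helper below is one way to compute those expected values. `expected_a0_trace` is a name invented for this sketch, and it assumes the function under test is a chain of add instructions that accumulate into a0.

```c
#include <assert.h>
#include <stddef.h>

/* Fill `trace` with the value a0 should hold after each add in the chain
 * a0 = a0 + a1; a0 = a0 + a2; ...  args[0] is the initial a0.
 * `trace` must have room for n_args - 1 entries. */
void expected_a0_trace(const long *args, size_t n_args, long *trace) {
    assert(n_args >= 1);            /* need at least the initial a0 */
    long a0 = args[0];
    for (size_t i = 1; i < n_args; i++) {
        a0 += args[i];              /* one add instruction */
        trace[i - 1] = a0;          /* value to compare with print $a0 */
    }
}
```

For the arguments 1, 2, 3, 4 the trace is 3, 6, 10, which matches the values printed in the session above. If GDB shows a different number at some step, the instruction just executed is the one to inspect.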


11. Git Hygiene and Running the Autograder

A recurring theme today: keep your repository clean. Do not commit build artifacts (executables, .o files). Always make clean first.

make clean              # delete executables and .o build products
git status              # verify only source files are listed
git rm --cached add4    # if a build product was already tracked, untrack it
git commit -a -m "Lab02: add4 and mul4 in assembly"
git push

flowchart TD
    A["Edit add4_s.s / mul4_s.s"] --> B["make"]
    B --> C{"C and Asm<br/>results match?"}
    C -->|No| D["gdb ./add4<br/>step + inspect a0"]
    D --> A
    C -->|Yes| E["make clean"]
    E --> F["git status<br/>(only sources?)"]
    F --> G["git commit -a"]
    G --> H["git push"]
    H --> I["Run autograder<br/>vs tests repo"]

    style C fill:#fff0c0,stroke:#333
    style I fill:#d0f0d0,stroke:#333

The autograder and tests repo

  • Get the test cases by running git pull in the tests repository (https://github.com/USF-CS315-F25/tests).
  • The autograder lives at https://github.com/phpeterson-usf/autograder.
  • Your lab receives the score the autograder reports.

Tests repo access issues

If the autograder cannot find the test directory: verify the test directory path (use tab completion to confirm it exists), re-add/re-clone the tests repo if needed, and make sure your GitHub email registration matches so the repository is visible to you. If you are still stuck, bring it to office hours.


Key Concepts

| Concept | Definition | Example |
| --- | --- | --- |
| Machine code | Binary instructions the processor executes directly | A 32-bit encoding of add t0, t1, t2 |
| Register | Fast storage inside the processor; the only place arithmetic happens | a0, t0, x5 |
| PC (program counter) | Address of the next instruction to fetch | PC = PC + 4 for sequential code |
| IW (instruction word) | The current 32-bit instruction being decoded/executed | Fetched from the CODE region |
| Fetch–decode–execute | The processor's core loop | Read IW from memory[PC], run it, advance PC |
| Assembling | Translating .s assembly to .o object code | as -o add4_s.o add4_s.s |
| Compiling | Translating .c C to .o object code | gcc -c add4_c.c |
| Linking | Combining .o files into one executable | gcc -o add4 add4.c add4_c.c add4_s.o |
| Three-operand form | op dst, src1, src2 meaning dst = src1 OP src2 | add a0, a0, a1 |
| Calling convention | Args in a0–a7, return value in a0 | a0=a, a1=b, ... return in a0 |
| Leaf function | A function that calls no other function | add4_s (no stack needed) |
| .global | Directive exposing a label to the linker | .global add4_s |

Practice Problems

Problem 1: PC arithmetic

A function's first instruction is at address 0x2000. Each instruction is one base RISC-V instruction. What is the address of the fourth instruction, assuming no branches or jumps?

Solution: Each base RISC-V instruction is **4 bytes**, and sequential execution does `PC = PC + 4`.

1st instruction: 0x2000
2nd instruction: 0x2004   (0x2000 + 4)
3rd instruction: 0x2008   (0x2004 + 4)
4th instruction: 0x200C   (0x2008 + 4)

The fourth instruction is at **`0x200C`** (which is `0x2000 + 12`, i.e. `0x2000 + 3*4`).

Problem 2: Translate C to assembly

Write a RISC-V assembly implementation add3_s for:

int add3_c(int a, int b, int c) {
    return a + b + c;
}

Solution: Arguments arrive in `a0`, `a1`, `a2`; the return value goes in `a0`.

.global add3_s
add3_s:
    add  a0, a0, a1     # a0 = a + b
    add  a0, a0, a2     # a0 = (a + b) + c
    ret                 # return a0

Each `add` is three-operand (`dst, src1, src2`), and reusing `a0` as the running total avoids needing any temporaries. Because it is a leaf function, no stack work is required.

Problem 3: Fix the bug

A student wrote this for mul4 but ./mul4 1 2 3 4 prints Asm: 6 instead of 24. Find and fix the bug.

.global mul4_s
mul4_s:
    mul  a0, a0, a1
    mul  a0, a0, a2
    ret

Solution: The function only multiplies **three** of the four arguments (`a0 * a1 * a2 = 1*2*3 = 6`). It never multiplies in the fourth argument `a3`. Add the missing instruction:

.global mul4_s
mul4_s:
    mul  a0, a0, a1     # a0 = a * b
    mul  a0, a0, a2     # a0 = (a*b) * c
    mul  a0, a0, a3     # a0 = (a*b*c) * d   <-- the missing instruction
    ret

Now `1*2*3*4 = 24`. This is a classic "forgot the last operand" bug, and GDB makes it obvious: step through and you will see `a0` stop changing at `6` instead of reaching `24`.

Problem 4: Read the registers

You stop in GDB at the start of add4_s and info registers shows a0=5, a1=10, a2=2, a3=1. After three stepi commands through the three add instructions in the worked example, what is in a0?

Solution:

start:   a0 = 5
add a0, a0, a1:  a0 = 5 + 10 = 15
add a0, a0, a2:  a0 = 15 + 2 = 17
add a0, a0, a3:  a0 = 17 + 1 = 18

`a0` is **18**. The function returns `5 + 10 + 2 + 1 = 18`, which matches the C reference. Confirm with `print $a0` after the third `stepi`.

Problem 5: Why does the .global matter?

You write add4_s but forget the .global add4_s line. The build fails at the link step with an "undefined reference to add4_s" error. Explain why, and how .global fixes it.

Solution: The `main` program (`add4.c`) declares a prototype `int add4_s(int, int, int, int);` and calls it. When the compiler produces `add4.o`, the call site references the symbol `add4_s` but does not contain its code. The **linker** is responsible for resolving that reference by finding `add4_s` in another object file. Without `.global add4_s`, the label `add4_s` in your assembly file is **local** to that object file, so the linker cannot see it and reports "undefined reference." The `.global add4_s` directive exports the symbol so the linker can match the call in `add4.o` to the code in `add4_s.o`.

.global add4_s      # without this line, the linker cannot find add4_s
add4_s:
    ...
    ret

Problem 6: Build the right way

You finished add4 and mul4. List the exact commands, in order, to build, verify cleanliness, and commit without adding build products.

Solution:

make                # build add4 and mul4
./add4 1 2 3 4      # quick check: C and Asm both print 10
./mul4 1 2 3 4      # quick check: C and Asm both print 24
make clean          # remove executables and .o files
git status          # confirm only .c / .s / Makefile are listed
git add -A          # (or add specific source files)
git commit -m "Lab02: implement add4 and mul4 in RISC-V assembly"
git push

The critical step is `make clean` **before** `git add`. If a build product (like the `add4` executable or `add4_s.o`) was already committed earlier, untrack it with `git rm --cached add4` and commit the removal. Then `git status` should show no `.o` files or executables.

Summary

  1. The machine code execution model has two parts: a processor (registers x0–x31, PC, instruction word IW) and memory (stack, data, and code). The processor can only compute on register values.

  2. The processor runs a fetch–decode–execute loop: read the 32-bit instruction at memory[PC] into IW, decode and execute it (updating registers and/or memory), then advance with PC = PC + 4 for normal sequential code.

  3. All code becomes machine code. C is compiled (gcc) and assembly is assembled (as) into object files (.o), which are then linked into one executable. Assembly is just a human-readable form of machine code.

  4. The Lab02 starter uses a three-file pattern (X.c main, X_c.c C version, X_s.s assembly version) so you can directly compare your assembly result against the C reference — they must print the same value.

  5. Most RISC-V instructions are three-operand: op dst, src1, src2 means dst = src1 OP src2. Arguments arrive in a0–a7, and the return value goes in a0.

  6. add4 and mul4 are leaf functions: chain add/mul into a0 and ret. No stack work is needed, but do not omit the boilerplate: each function needs a .global directive with its name, a matching label, and a ret.

  7. GDB is the tool for assembly debugging. Set up ~/.config/gdb/gdbinit for the TUI layout, then use break, run, stepi, and info registers/print $a0 to watch values change one instruction at a time.

  8. Keep the repo clean and verify with the autograder. Always make clean before committing so build products are never tracked, then git pull the tests repo and run the autograder for your score.