RISC-V Assembly Part 1: Basics, Instructions, and Registers¶
Overview¶
This session bridges two topics. First we finish the practical mechanics of Project 1: how to structure a multi-file C program with a Makefile, separate compilation, and shared helper code split into numhelpers.c and numhelpers.h. We then transition to the heart of the course, RISC-V assembly language. We introduce assembly as a human-readable form of machine code, dissect the anatomy of a single instruction (add t0, t1, t2), and meet the RISC-V register file: 32 registers, each 64 bits wide, named x0–x31 with friendlier ABI names like a0, t0, and zero. These fundamentals set up Lab 02, where you write your first assembly functions.
Learning Objectives¶
- Structure a multi-file C project using separate compilation and a Makefile
- Factor shared code into a .c source file with a matching .h header containing prototypes
- Explain the C build pipeline: source → object files → link → executable
- Describe what assembly language is and how it relates to machine code and the processor
- Identify the parts of a RISC-V instruction: mnemonic, destination operand, and source operands
- Name the 32 RISC-V registers by both their numeric (x0–x31) and ABI (a0, t0, zero, ...) names
- Explain the role of the zero register and the difference between add and addi
- Understand that RV64 registers are 64 bits wide even when manipulating 32-bit values
Prerequisites¶
- C programming basics: functions, printf, struct, bool, pointers (Project 1)
- Familiarity with the command line and gcc
- Number systems: decimal, binary, and hexadecimal (Project 1)
- Access to the RISC-V development environment (BeagleV machines or local VM)
1. Where We Are: Finishing Project 1¶
Before assembly, we tie off the build-system mechanics needed for Project 1. The assignment asks for two executables:
- numinfo: given a string, report whether it is a valid decimal (int), binary (bin), and/or hexadecimal (hex) value.
- numconv: convert a number between bases 2, 10, and 16.
Both programs share logic. For example, numinfo needs is_dec_digit() to classify characters, and numconv needs similar digit-handling code. Rather than copy-paste that code into both programs (redundancy is a code-quality deduction), we factor the common helpers into a third file that both programs compile against.
flowchart TD
A["numinfo.c<br/>(main + helpers it owns)"] -->|"#include numhelpers.h"| C["numhelpers.c<br/>is_dec_digit()<br/>is_bin_digit()<br/>is_hex_digit()<br/>..."]
B["numconv.c<br/>(main + helpers it owns)"] -->|"#include numhelpers.h"| C
C -.->|"declared in"| H["numhelpers.h<br/>(prototypes)"]
A -.-> H
B -.-> H
The key idea: shared functions live in one place (numhelpers.c), their prototypes live in a header (numhelpers.h), and each main program includes that header so the compiler knows the function signatures.
2. Separate Compilation and Headers¶
Why split into multiple files?¶
A single giant .c file is hard to read, hard to test, and forces a full rebuild every time you touch anything. Separate compilation lets us:
- Reuse code (numhelpers.c serves both numinfo and numconv)
- Organize code by responsibility
- Recompile only what changed (when wired up with a Makefile)
The header file holds prototypes¶
A C prototype (function declaration) tells the compiler a function's name, return type, and parameter types — without the body. When numinfo.c calls is_dec_digit('7'), the compiler must already know that is_dec_digit takes a char and returns a bool. The prototype provides that information.
Create numhelpers.h:
#ifndef NUMHELPERS_H
#define NUMHELPERS_H
#include <stdbool.h>
// Prototypes: type signatures only, no bodies.
bool is_dec_digit(char c);
bool is_bin_digit(char c);
bool is_hex_digit(char c);
bool is_dec_str(char *s);
bool is_bin_str(char *s);
bool is_hex_str(char *s);
#endif // NUMHELPERS_H
The #ifndef / #define / #endif trio is an include guard. It prevents the header's contents from being processed twice when the header is included through multiple paths. Repeated prototypes are harmless, but once a header defines a type (such as a struct), a second inclusion would cause redefinition errors.
Put the actual function bodies (definitions) in numhelpers.c:
#include <stdbool.h>
#include "numhelpers.h"
bool is_dec_digit(char c) {
    return c >= '0' && c <= '9';
}
bool is_bin_digit(char c) {
    return c == '0' || c == '1';
}
bool is_hex_digit(char c) {
    return is_dec_digit(c)
        || (c >= 'a' && c <= 'f')
        || (c >= 'A' && c <= 'F');
}
// ... is_dec_str, is_bin_str, is_hex_str ...
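The string-level helpers elided above can be written in several ways. The following is a minimal sketch, not the Project 1 reference solution. It assumes that a NULL or empty string is invalid and that prefixes such as 0x or 0b are handled elsewhere; the all_chars helper is a hypothetical name introduced only for this example. The digit classifiers are repeated so the block compiles on its own.

```c
#include <stdbool.h>
#include <stddef.h>

// Digit classifiers, repeated from numhelpers.c so this sketch is self-contained.
bool is_dec_digit(char c) { return c >= '0' && c <= '9'; }
bool is_bin_digit(char c) { return c == '0' || c == '1'; }
bool is_hex_digit(char c) {
    return is_dec_digit(c) || (c >= 'a' && c <= 'f') || (c >= 'A' && c <= 'F');
}

// Hypothetical helper (not from the notes): true when every character of s
// satisfies pred. Assumption: NULL and the empty string count as invalid.
bool all_chars(char *s, bool (*pred)(char)) {
    if (s == NULL || *s == '\0') {
        return false;
    }
    for (; *s != '\0'; s++) {
        if (!pred(*s)) {
            return false;
        }
    }
    return true;
}

bool is_dec_str(char *s) { return all_chars(s, is_dec_digit); }
bool is_bin_str(char *s) { return all_chars(s, is_bin_digit); }
bool is_hex_str(char *s) { return all_chars(s, is_hex_digit); }
```

With this sketch, a string such as "101" is accepted by all three functions, which fits numinfo reporting that one input can be valid in more than one base.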
Including the header in each program¶
At the top of both numinfo.c and numconv.c, include the standard library headers you need, then your own header. Note the difference in bracket style:
#include <stdio.h> // angle brackets: system/standard headers
#include <stdbool.h>
// ...
#include "numhelpers.h" // double quotes: your own project headers
- <stdio.h> uses angle brackets: the compiler searches the standard system include directories.
- "numhelpers.h" uses double quotes: the compiler searches the directory of the including file first, then the system directories.
Compiling and linking¶
Once the files are in place, you can build each executable by handing all the needed .c files to gcc at once:
gcc -o numinfo numinfo.c numhelpers.c
gcc -o numconv numconv.c numhelpers.c
Each command compiles the listed source files and links them into a single executable. numhelpers.c appears in both because both programs depend on those helpers.
3. The C Build Pipeline: Compiling, Assembling, and Linking¶
Even a one-file program is more than "compile and run." Understanding the pipeline now pays off when we mix C and assembly later.
flowchart LR
A["first.c<br/>(C source)"] -->|"compile + assemble<br/>gcc"| B["first.o<br/>(object code)"]
S["startup code<br/>(C runtime, libc)"] --> L["Linker"]
B --> L
L --> E["first<br/>(executable)"]
When you run gcc -o first first.c, several stages happen behind the scenes:
- Preprocess: expand #include and #define.
- Compile: translate C into assembly.
- Assemble: translate assembly into machine code (an object file, .o).
- Link: combine your object file(s) with startup code and the C library into a runnable executable.
The "startup" piece is important: your main is not actually the first code to run. The C runtime startup code sets up the stack, prepares argc/argv, calls main, and then turns main's return value into the process exit status.
Two paths to the same machine code¶
This course will write programs partly in C and partly in assembly. Consider two functions that do the same thing — one written in C (first_c.c) and one written in assembly (first_s.s):
first_c.c first_s.s
+-------------+ +-------------+
| main() { | | main: |
| ... | | ... |
| ... | | ... |
| } | | |
+-------------+ +-------------+
(C source) (assembly source)
A C compiler turns C into machine code; an assembler turns assembly into machine code. They meet at the same place — object files — which the linker stitches together. The naming convention we use throughout the course:
| Suffix | Meaning | Example |
|---|---|---|
| .c | C source | add2_c.c (C implementation) |
| .s | Assembly source | add2_s.s (assembly implementation) |
| .o | Object code (machine code, not yet linked) | add2_s.o |
| (none) | Linked executable | add2 |
In Lab 02 you'll get a Makefile, a main C file, a C implementation, and an assembly stub. The main program calls both the C version and the assembly version with the same arguments so you can confirm they produce identical results.
4. What Is Assembly Language?¶
Assembly language is a human-readable form of machine code (machine language).
A processor ultimately executes only machine code — raw binary values. Those binary patterns are nearly impossible for humans to read or write directly. Assembly language gives each machine instruction a readable textual form. There is (almost) a one-to-one correspondence between an assembly instruction and a machine-code instruction.
flowchart LR
A["Assembly<br/>add t0, t1, t2"] -->|"assembler (as)"| B["Machine code<br/>0x007302B3"]
B -->|"loaded + executed by"| C["Processor"]
- We use an assembler (as) to translate assembly into machine code, just as a compiler (gcc) translates C into machine code.
- The processor reads machine code from memory, decodes it, and executes it.
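To make the assembly-to-machine-code step concrete, here is a small sketch of what the assembler does for add. The field layout and the opcode/funct values come from the RISC-V ISA specification rather than from these notes, and the function names encode_rtype and encode_add are made up for this example.

```c
#include <assert.h>
#include <stdint.h>

// Pack the six fields of an R-type instruction into one 32-bit word.
// Layout, high to low: funct7[31:25] rs2[24:20] rs1[19:15] funct3[14:12] rd[11:7] opcode[6:0]
uint32_t encode_rtype(uint32_t funct7, uint32_t rs2, uint32_t rs1,
                      uint32_t funct3, uint32_t rd, uint32_t opcode) {
    assert(rs1 < 32 && rs2 < 32 && rd < 32); // register numbers are 5 bits wide
    return (funct7 << 25) | (rs2 << 20) | (rs1 << 15) |
           (funct3 << 12) | (rd << 7) | opcode;
}

// add uses opcode 0x33 with funct3 = 0 and funct7 = 0.
uint32_t encode_add(uint32_t rd, uint32_t rs1, uint32_t rs2) {
    return encode_rtype(0x00, rs2, rs1, 0x0, rd, 0x33);
}
```

Since t0, t1, and t2 are x5, x6, and x7, encode_add(5, 6, 7) produces 0x007302B3, the machine word for add t0, t1, t2. Note that the destination comes first in assembly but sits near the low end of the encoded word.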
The processor's core elements¶
When reasoning about assembly we keep three things in mind:
- Registers — a small set of very fast storage locations inside the processor. The processor can only do arithmetic directly on values in registers.
- Memory — holds both code (machine instructions) and data (globals, the stack, the heap). Memory is large but slower than registers.
- Instructions — the operations the processor knows how to perform.
The basic execution loop: the processor loads an instruction from memory, decodes it, and executes it. To compute on data that lives in memory, you must first load it into a register, operate on it there, and (often) store the result back to memory. This is called a load/store architecture.
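The load, operate, store pattern can be sketched with a toy machine in C. This is only an illustration: the Machine type, the 64-byte memory, and the function names are assumptions made for the example, and there is no bounds checking on addresses.

```c
#include <stdint.h>
#include <string.h>

// Toy machine: 32 registers plus a small byte-addressed memory.
typedef struct {
    int64_t x[32];
    uint8_t mem[64];
} Machine;

// Like lw: copy a 32-bit value from memory into a register (sign-extended).
void load_word(Machine *m, int rd, uint32_t addr) {
    int32_t v;
    memcpy(&v, &m->mem[addr], sizeof v);
    m->x[rd] = v;
}

// Like sw: copy the low 32 bits of a register out to memory.
void store_word(Machine *m, int rs, uint32_t addr) {
    uint32_t v = (uint32_t)m->x[rs];
    memcpy(&m->mem[addr], &v, sizeof v);
}

// Like add: arithmetic only ever touches registers.
void add_regs(Machine *m, int rd, int rs1, int rs2) {
    m->x[rd] = (int64_t)((uint64_t)m->x[rs1] + (uint64_t)m->x[rs2]);
}

// Compute mem[8] = mem[0] + mem[4] the load/store way.
int32_t sum_in_memory(int32_t a, int32_t b) {
    Machine m;
    memset(&m, 0, sizeof m);
    memcpy(&m.mem[0], &a, sizeof a);
    memcpy(&m.mem[4], &b, sizeof b);

    load_word(&m, 5, 0);   // t0 = mem[0]
    load_word(&m, 6, 4);   // t1 = mem[4]
    add_regs(&m, 5, 5, 6); // t0 = t0 + t1
    store_word(&m, 5, 8);  // mem[8] = t0

    int32_t result;
    memcpy(&result, &m.mem[8], sizeof result);
    return result;
}
```

The point of the sketch is that add_regs never sees memory: both operands must be loaded first, and the sum only reaches memory through an explicit store.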
RISC-V¶
RISC-V (pronounced "risk-five") is an open-standard instruction set architecture (ISA). Unlike proprietary ISAs such as x86 (Intel/AMD) or ARM, anyone can implement a RISC-V processor without licensing fees. It is a simple, clean, modular ISA built on reduced-instruction-set computer (RISC) principles, which makes it ideal for learning. This course uses the 64-bit variant with the multiply/divide extension (RV64IM).
5. Anatomy of an Instruction¶
RISC-V uses a strict, regular instruction format. Most instructions name an operation and then a destination followed by sources:
mnemonic dst, src1, src2
Consider the canonical example:
operands
/--------\
add t0, t1, t2
| | | |
| | | +-- source operand / register (src2)
| | +------ source operand / register (src1)
| +---------- destination operand / register (dst)
+--------------- instruction name (mnemonic)
Breaking it down:
- add is the mnemonic, the instruction name. A mnemonic is a short, memorable word standing in for a machine operation.
- t0 is the destination operand (a register). The result is written here.
- t1 and t2 are the source operands (registers). Their values are read as inputs.
So add t0, t1, t2 means t0 = t1 + t2.
The destination comes first, then the sources. This ordering mirrors an assignment statement: the thing being assigned is on the left.
Comments and syntax discipline¶
Comments begin with # and run to the end of the line:
add t0, t1, t2 # t0 = t1 + t2
RISC-V assembly is strict about formatting. Operands are separated by commas, and the instruction expects exactly the right number and type of operands. Getting the order wrong (sources before destination) is a common beginner mistake — the assembler will not warn you that you meant something else; it just encodes what you wrote.
6. The RISC-V Register File¶
RISC-V 64-bit: 32 registers, where each register holds a 64-bit value.
The registers are the processor's working storage. There are exactly 32 general-purpose registers, numbered x0 through x31. In the 64-bit variant (RV64), each one is 64 bits (8 bytes) wide.
x0 x1 x2 ... x31
+----+----+----+ +----+
| 64 | 64 | 64 | ... | 64 | bits each
+----+----+----+ +----+
Numeric names vs. ABI names¶
Every register has two names:
- A numeric name: x0, x1, ..., x31.
- An ABI name that describes its conventional role: a0, a1, ..., t0, t1, ..., zero, sp, ra, and so on.
The ABI (Application Binary Interface) names are far more common in practice because they document intent. Writing a0 instead of x10 tells the reader "this holds the first argument / return value." The two names refer to the exact same physical register.
| Register | ABI Name | Conventional Use |
|---|---|---|
| x0 | zero | Hardwired constant 0 |
| x1 | ra | Return address |
| x2 | sp | Stack pointer |
| x3 | gp | Global pointer |
| x4 | tp | Thread pointer |
| x5–x7 | t0–t2 | Temporaries |
| x8 | s0 / fp | Saved register / frame pointer |
| x9 | s1 | Saved register |
| x10–x11 | a0–a1 | Arguments / return values |
| x12–x17 | a2–a7 | Arguments |
| x18–x27 | s2–s11 | Saved registers |
| x28–x31 | t3–t6 | Temporaries |
For now, the registers you will use most are:
- a0–a7: function arguments. a0 also carries the return value.
- t0–t6: temporary scratch registers for intermediate calculations.
- zero: always reads as 0.
The zero register¶
Register x0 (zero) is special: it always reads as 0, and writes to it are discarded. This sounds useless but is surprisingly handy because it lets one instruction stand in for several operations:
addi a0, zero, 5 # a0 = 0 + 5 -> load the constant 5
add a0, a1, zero # a0 = a1 + 0 -> copy a1 into a0
sub a0, zero, a1 # a0 = 0 - a1 -> negate a1
Having a hardwired zero means the ISA does not need separate "load constant," "move register," or "negate" instructions — they all fall out of arithmetic with zero.
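The behavior of the zero register can be modeled in a few lines of C. This is a sketch under stated assumptions: RegFile and the op_ functions are illustrative names, not a real API, and addition goes through unsigned types so that wraparound does not rely on undefined behavior in C.

```c
#include <assert.h>
#include <stdint.h>

// Toy register file with 32 64-bit registers.
typedef struct {
    int64_t x[32];
} RegFile;

// Reads of x0 always produce 0, whatever the backing array holds.
int64_t reg_read(const RegFile *rf, int n) {
    assert(n >= 0 && n < 32);
    return n == 0 ? 0 : rf->x[n];
}

// Writes to x0 are silently discarded.
void reg_write(RegFile *rf, int n, int64_t value) {
    assert(n >= 0 && n < 32);
    if (n != 0) {
        rf->x[n] = value;
    }
}

// add rd, rs1, rs2
void op_add(RegFile *rf, int rd, int rs1, int rs2) {
    uint64_t sum = (uint64_t)reg_read(rf, rs1) + (uint64_t)reg_read(rf, rs2);
    reg_write(rf, rd, (int64_t)sum);
}

// sub rd, rs1, rs2
void op_sub(RegFile *rf, int rd, int rs1, int rs2) {
    uint64_t diff = (uint64_t)reg_read(rf, rs1) - (uint64_t)reg_read(rf, rs2);
    reg_write(rf, rd, (int64_t)diff);
}

// addi rd, rs1, imm (imm must fit the 12-bit signed immediate field)
void op_addi(RegFile *rf, int rd, int rs1, int64_t imm) {
    assert(imm >= -2048 && imm <= 2047);
    uint64_t sum = (uint64_t)reg_read(rf, rs1) + (uint64_t)imm;
    reg_write(rf, rd, (int64_t)sum);
}
```

In this model the three idioms above become op_addi(&rf, 10, 0, 5) to load 5 into a0, op_add(&rf, 10, 11, 0) to copy a1 into a0, and op_sub(&rf, 10, 0, 11) to negate a1, using register numbers 10 and 11 for a0 and a1.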
32-bit values in 64-bit registers¶
The machine uses 64-bit registers, but a great deal of our work is on 32-bit values (C int, uint32_t). That is fine: a 32-bit value simply occupies the low 32 bits of a 64-bit register. There are 32-bit-aware instructions (for example addw, lw) when the distinction matters, but conceptually you can manipulate 32-bit quantities inside the wider registers without difficulty. We will return to this when it affects sign extension and overflow.
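A short C model shows where the 32-bit versus 64-bit distinction appears. The functions below are illustrative stand-ins for add and addw, not real intrinsics. The final conversion to int32_t wraps on the usual two's complement targets, which is an assumption about the compiler rather than a guarantee of the C standard.

```c
#include <stdint.h>

// 64-bit add: the full register width participates.
int64_t rv_add(int64_t a, int64_t b) {
    return (int64_t)((uint64_t)a + (uint64_t)b);
}

// addw: add, keep the low 32 bits, then sign-extend them to 64 bits.
int64_t rv_addw(int64_t a, int64_t b) {
    uint32_t low = (uint32_t)((uint64_t)a + (uint64_t)b);
    return (int64_t)(int32_t)low;
}

// The value C sees when it treats the register as a 32-bit int.
int32_t low32(int64_t reg) {
    return (int32_t)(uint32_t)reg;
}
```

For small values the two agree exactly. At the 32-bit boundary they differ in the upper bits only: rv_add(2147483647, 1) is 2147483648, while rv_addw(2147483647, 1) is -2147483648, yet low32 gives the same answer for both.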
7. Loading Constants: add vs addi¶
A frequent need is putting a constant value into a register. RISC-V distinguishes between operating on two registers and operating on a register plus a small constant baked into the instruction (an immediate).
add — register + register¶
add t0, t1, t2 # t0 = t1 + t2
addi — register + immediate¶
addi t0, t1, 9 # t0 = t1 + 9
The i suffix means immediate. The constant is encoded directly inside the instruction word. The immediate field of addi is a 12-bit signed value, so it ranges from −2048 to 2047.
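The range follows from two's complement arithmetic on a 12-bit field. As a quick check, the sketch below tests whether a constant fits and decodes a raw 12-bit field back to its signed value; the function names are hypothetical helpers for this example.

```c
#include <stdbool.h>
#include <stdint.h>

// True when value can be encoded in a 12-bit two's complement field.
bool fits_imm12(int64_t value) {
    return value >= -2048 && value <= 2047;
}

// Interpret the low 12 bits of field as a signed immediate.
int32_t sign_extend12(uint32_t field) {
    field &= 0xFFFu;
    if (field & 0x800u) { // bit 11 is the sign bit
        return (int32_t)field - 4096;
    }
    return (int32_t)field;
}
```

For instance, the field 0xFFF decodes to -1 and 0x800 decodes to -2048, while 2048 does not fit and needs more than a single addi.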
Loading a constant with addi and zero¶
Combining addi with the zero register loads a literal value:
addi t0, zero, 9 # t0 = 0 + 9
The assembler also provides the pseudo-instruction li ("load immediate") that does the same thing more readably:
li t0, 9 # t0 = 9
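For constants that fit in the 12-bit immediate, these two forms are equivalent: li is just sugar for addi with zero. A pseudo-instruction is an assembler convenience that expands into one or more real instructions (a larger constant makes li expand into more than one); we will see more of them in later sessions.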
| Form | What it does | Notes |
|---|---|---|
| add rd, rs1, rs2 | rd = rs1 + rs2 | Both operands are registers |
| addi rd, rs1, imm | rd = rs1 + imm | imm is a 12-bit signed constant |
| addi rd, zero, imm | rd = imm | Idiom for loading a constant |
| li rd, imm | rd = imm | Pseudo-instruction; same as above |
There is no subi
RISC-V has no subtract-immediate instruction. To subtract a constant, add a negative immediate: addi t0, t0, -1 subtracts 1.
8. A First Assembly Function¶
We can now read and write a complete, simple function. Functions in assembly are marked by a label (a name followed by a colon) and made visible to the linker with the .global directive.
Consider this C function:
int add2_c(int a, int b) {
    return a + b;
}
The equivalent RISC-V assembly:
.global add2_s
# Arguments
# a0 - int a
# a1 - int b
# Return value goes in a0
add2_s:
add a0, a0, a1 # a0 = a + b
ret
Reading this top to bottom:
- .global add2_s: a directive that exports the label add2_s so other files (like the C main) can call it.
- add2_s: is a label naming this location in code; calling add2_s jumps here.
- add a0, a0, a1 adds the first two arguments. The result lands in a0.
- ret returns to the caller.
The calling convention in miniature¶
This works because of the calling convention everyone agrees on:
- Arguments arrive in a0, a1, a2, ... in order. The first argument a is in a0, the second b is in a1.
- The return value is placed in a0.
Since the sum needs to end up in a0 anyway, writing the result of a + b directly into a0 both computes the answer and puts it in the return register in one step. That is why add a0, a0, a1 is the entire body.
sequenceDiagram
participant C as C main()
participant ASM as add2_s
C->>ASM: a0 = 3, a1 = 4, call add2_s
Note over ASM: add a0, a0, a1 -> a0 = 7
ASM-->>C: return (a0 = 7)
Note over C: prints "Asm: 7"
Return values differ from C¶
In C, return a + b; is a single statement that hides where the value goes. In assembly, the return value is a convention: it is simply whatever happens to be in a0 when ret executes. Nothing forces you to put a meaningful value there — it is your responsibility to ensure a0 holds the intended result before returning.
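To see why this matters, consider a toy C model of the argument registers. It is only a sketch: ArgRegs, call_with, and the two body functions are names invented for this example. The caller places arguments in a0 and a1, runs the function body, and then reads back whatever a0 holds.

```c
#include <stdint.h>

// Toy model of the argument registers a0..a7.
typedef struct {
    int64_t a[8];
} ArgRegs;

// Body of add2_s: add a0, a0, a1
void add2_body(ArgRegs *r) {
    r->a[0] = r->a[0] + r->a[1];
}

// A buggy body that computes the sum into a2 and never moves it to a0.
void forgetful_body(ArgRegs *r) {
    r->a[2] = r->a[0] + r->a[1];
}

// What the caller does: place arguments, call, then read a0 after the return.
int call_with(void (*body)(ArgRegs *), int a, int b) {
    ArgRegs r = {{0}};
    r.a[0] = a;
    r.a[1] = b;
    body(&r);
    return (int)r.a[0];
}
```

Here call_with(add2_body, 3, 4) returns 7, while call_with(forgetful_body, 3, 4) returns 3: the caller still reads a0, which holds the untouched first argument. Nothing flags the mistake, which is the sense in which the return value is only a convention.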
9. Putting It Together: From C to a Running Program¶
Tying the build pipeline (Section 3) to assembly (Section 8): in Lab 02 the same logical function exists in both C and assembly, and a main program calls both.
// add2.c (the main driver)
#include <stdio.h>
#include <stdlib.h>
int add2_c(int a, int b); // C implementation (in add2_c.c)
int add2_s(int a, int b); // assembly implementation (in add2_s.s)
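int main(int argc, char *argv[]) {
    if (argc != 3) { // both operands are required
        fprintf(stderr, "usage: add2 <a> <b>\n");
        return 1;
    }
    int a = atoi(argv[1]);
    int b = atoi(argv[2]);
    printf("C: %d\n", add2_c(a, b));
    printf("Asm: %d\n", add2_s(a, b));
    return 0;
}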
The build assembles the .s file, compiles the .c files, and links everything together. By hand (the Makefile does this for you):
as -o add2_s.o add2_s.s # assemble the assembly file
gcc -o add2 add2.c add2_c.c add2_s.o # compile C + link in the object file
gcc can also do the assembling step itself if you hand it the .s file directly:
gcc -o add2 add2.c add2_c.c add2_s.s
Running it confirms both implementations agree:
./add2 3 4
C: 7
Asm: 7
This C-calls-assembly pattern is how we will validate every assembly function we write: the C version is the reference, and your assembly must match it.
Key Concepts¶
| Concept | Definition | Example |
|---|---|---|
| Separate compilation | Splitting a program across multiple source files compiled independently | numinfo.c + numhelpers.c |
| Header file | A .h file holding prototypes shared across .c files | numhelpers.h |
| Prototype | A function declaration (signature only, no body) | bool is_dec_digit(char c); |
| Include guard | #ifndef/#define/#endif preventing double inclusion | #ifndef NUMHELPERS_H |
| Object file | Machine code for one source file, not yet linked | add2_s.o |
| Linker | Combines object files + startup + libraries into an executable | gcc -o add2 ... |
| Assembly language | Human-readable form of machine code | add t0, t1, t2 |
| Assembler | Tool translating assembly to machine code | as |
| Mnemonic | The instruction name part of an instruction | add, addi, ret |
| Operand | A register or constant an instruction acts on | t0, t1, 9 |
| Register | Fast on-chip storage; 32 of them, 64 bits each in RV64 | a0, t0, x5 |
| ABI name | Conventional register name describing its role | a0 = x10 |
| zero register | x0, hardwired to 0, ignores writes | addi a0, zero, 5 |
| Immediate | A constant encoded inside an instruction | the 9 in addi t0, zero, 9 |
| Label | A named location in code, used as a target | add2_s: |
| .global | Directive exporting a label to the linker | .global add2_s |
Practice Problems¶
Problem 1: Header and Prototype¶
You are factoring is_hex_digit out of numinfo.c into numhelpers.c. What two things must you add so that numconv.c can also call it, and what include style do you use in numconv.c?
Click to reveal solution
1. Add the **prototype** to `numhelpers.h`: `bool is_hex_digit(char c);`
2. Add the **definition** (the function body) to `numhelpers.c`.

Then in `numconv.c`, include the header with **double quotes** because it is your own project header: `#include "numhelpers.h"`. Finally, build with both source files on the command line so the definition gets linked in: `gcc -o numconv numconv.c numhelpers.c`.
Problem 2: Read the Instruction¶
What does each of these instructions do? Write the equivalent C-style assignment.
Click to reveal solution
The destination is always the **first** operand. The last line is the common idiom for copying one register to another (the pseudo-instruction `mv a0, a1` has the same effect; assemblers expand it to `addi a0, a1, 0`).
Problem 3: Load a Constant Three Ways¶
Write three different (but equivalent) instructions or instruction sequences that leave the value 42 in register t0.
Click to reveal solution
The first two, `li t0, 42` and `addi t0, zero, 42`, are general and equivalent. A genuinely different third option using only `add`/`addi` is to build the value up in steps, for example `addi t0, zero, 40` followed by `addi t0, t0, 2`. The point: `li` is sugar for `addi rd, zero, imm` (when `imm` fits in 12 bits), and the `zero` register is what makes loading a constant possible without a dedicated "load constant" instruction.
Problem 4: Identify the Registers¶
A C function is declared with three integer parameters named x, y, and z, in that order.
On entry to its assembly implementation, which registers hold x, y, and z? Where must the return value go?
Click to reveal solution
- `x` → `a0`
- `y` → `a1`
- `z` → `a2`
- Return value → `a0`

Arguments fill `a0`, `a1`, `a2`, ... in order. The return value always goes in `a0`, which is why a function often computes its result directly into `a0`.
Problem 5: Write add3_s¶
Given this C reference, write the assembly implementation that returns a + b + c.
Click to reveal solution
The body is `add a0, a0, a1`, then `add a0, a0, a2`, then `ret`. `add` only takes two source registers, so you chain two additions, accumulating into `a0`. Since `a0` is both an argument register and the return register, the final sum is already in the right place when `ret` executes.
Problem 6: Why 64-bit registers for 32-bit data?¶
The course uses RV64, where every register is 64 bits, yet C int values are 32 bits. Explain how a 32-bit int lives in a 64-bit register, and why this is not a problem.
Click to reveal solution
A 32-bit value occupies the **low 32 bits** of the 64-bit register; the upper 32 bits are not needed to represent the value. Arithmetic such as `add a0, a0, a1` operates on the full 64-bit width, but as long as the inputs hold valid 32-bit values, the low 32 bits of the result are the correct 32-bit answer. It is not a problem because:

- We only read back the low 32 bits when we treat the value as an `int`.
- RISC-V provides word-width instructions (`addw`, `lw`, etc.) for the cases where the distinction between 32-bit and 64-bit behavior matters (sign extension, overflow at 32 bits).

So a 64-bit machine can comfortably manipulate 32-bit values; the extra width simply goes unused for those operations.
Further Reading¶
- RISC-V ISA Specification — the official standard
- RISC-V Assembly Programmer's Manual — practical assembly reference
- The RISC-V Reader — Patterson & Waterman textbook
- RISC-V reference materials: /guides/riscv/
- Project 1 specification: /assignments/project01/
- Lab 02 specification: /assignments/lab02/
- Source notes (PDF): "/notes/CS315-01 2025-09-02 RISCV Assembly 1.pdf"
Summary¶
- Project 1 uses separate compilation: shared helpers go in numhelpers.c, their prototypes in numhelpers.h, and both numinfo.c and numconv.c include the header and link against the helper file.
- A header holds prototypes, not bodies. Use angle brackets for system headers (<stdio.h>) and double quotes for your own ("numhelpers.h"), and guard headers with #ifndef/#define/#endif.
- The build pipeline goes source → object code → linked executable. The linker also pulls in startup code, so main is not literally the first thing that runs.
- Assembly language is a human-readable form of machine code. An assembler (as) translates it to machine code; a processor loads, decodes, and executes that machine code, operating only on values held in registers.
- An instruction has a mnemonic and operands: add t0, t1, t2 is mnemonic add, destination t0, and sources t1, t2, meaning t0 = t1 + t2. The destination comes first.
- RV64 has 32 registers, each 64 bits wide, named x0–x31 with ABI names like a0 (arguments/return), t0 (temporaries), and zero (hardwired 0). 32-bit values occupy the low bits of these wider registers.
- addi adds an immediate constant; add adds two registers. Loading a constant is done with addi rd, zero, imm, conveniently written as the pseudo-instruction li rd, imm.
- A simple assembly function is a .global label whose arguments arrive in a0, a1, ... and whose return value is left in a0 before ret. The C-calls-assembly pattern lets us check assembly against a C reference (C: 7 / Asm: 7).