RISC-V Assembly Part 1: Basics, Instructions, and Registers¶

Overview¶

This session bridges two topics. First we finish the practical mechanics of Project 1: how to structure a multi-file C program with a Makefile, separate compilation, and shared helper code split into numhelpers.c and numhelpers.h. We then transition to the heart of the course, RISC-V assembly language. We introduce assembly as a human-readable form of machine code, dissect the anatomy of a single instruction (add t0, t1, t2), and meet the RISC-V register file: 32 registers, each 64 bits wide, named x0–x31 with friendlier ABI names like a0, t0, and zero. These fundamentals set up Lab 02, where you write your first assembly functions.

Learning Objectives¶

Structure a multi-file C project using separate compilation and a Makefile
Factor shared code into a .c source file with a matching .h header containing prototypes
Explain the C build pipeline: source → object files → link → executable
Describe what assembly language is and how it relates to machine code and the processor
Identify the parts of a RISC-V instruction: mnemonic, destination operand, and source operands
Name the 32 RISC-V registers by both their numeric (x0–x31) and ABI (a0, t0, zero, ...) names
Explain the role of the zero register and the difference between add and addi
Understand that RV64 registers are 64 bits wide even when manipulating 32-bit values

Prerequisites¶

C programming basics: functions, printf, struct, bool, pointers (Project 1)
Familiarity with the command line and gcc
Number systems: decimal, binary, and hexadecimal (Project 1)
Access to the RISC-V development environment (BeagleV machines or local VM)

1. Where We Are: Finishing Project 1¶

Before assembly, we tie off the build-system mechanics needed for Project 1. The assignment asks for two executables:

numinfo — given a string, report whether it is a valid decimal (int), binary (bin), and/or hexadecimal (hex) value.
numconv — convert a number between bases 2, 10, and 16.

Both programs share logic. For example, numinfo needs is_dec_digit() to classify characters, and numconv needs similar digit-handling code. Rather than copy-paste that code into both programs (redundancy is a code-quality deduction), we factor the common helpers into a third file that both programs compile against.

flowchart TD
    A["numinfo.c<br/>(main + helpers it owns)"] -->|"#include numhelpers.h"| C["numhelpers.c<br/>is_dec_digit()<br/>is_bin_digit()<br/>is_hex_digit()<br/>..."]
    B["numconv.c<br/>(main + helpers it owns)"] -->|"#include numhelpers.h"| C
    C -.->|"declared in"| H["numhelpers.h<br/>(prototypes)"]
    A -.-> H
    B -.-> H

The key idea: shared functions live in one place (numhelpers.c), their prototypes live in a header (numhelpers.h), and each main program includes that header so the compiler knows the function signatures.

2. Separate Compilation and Headers¶

Why split into multiple files?¶

A single giant .c file is hard to read, hard to test, and forces a full rebuild every time you touch anything. Separate compilation lets us:

Reuse code (numhelpers.c serves both numinfo and numconv)
Organize code by responsibility
Recompile only what changed (when wired up with a Makefile)

The header file holds prototypes¶

A C prototype (function declaration) tells the compiler a function's name, return type, and parameter types — without the body. When numinfo.c calls is_dec_digit('7'), the compiler must already know that is_dec_digit takes a char and returns a bool. The prototype provides that information.

Create numhelpers.h:

#ifndef NUMHELPERS_H
#define NUMHELPERS_H

#include <stdbool.h>

// Prototypes: type signatures only, no bodies.
bool is_dec_digit(char c);
bool is_bin_digit(char c);
bool is_hex_digit(char c);
bool is_dec_str(char *s);
bool is_bin_str(char *s);
bool is_hex_str(char *s);

#endif // NUMHELPERS_H

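The #ifndef / #define / #endif trio is an include guard. It prevents the header's contents from being processed twice when the header is reached through more than one include path. Repeating a prototype is harmless, but repeating a definition (a struct, for example) is a compile error, so guarding every header is the safe habit.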
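To see the guard at work without creating extra files, the sketch below pastes the text of a small header into one file twice, which is what a double #include amounts to. The names DEMO_HELPERS_H, struct digit_range, and in_range are hypothetical and are not part of Project 1.

```c
#include <stdbool.h>

// ---- first "inclusion" of a hypothetical header ----
#ifndef DEMO_HELPERS_H
#define DEMO_HELPERS_H
struct digit_range { char lo; char hi; };     // a definition: may appear only once
bool in_range(struct digit_range r, char c);  // a prototype
#endif

// ---- second "inclusion" of the same text ----
#ifndef DEMO_HELPERS_H   // already defined, so everything up to #endif is skipped
#define DEMO_HELPERS_H
struct digit_range { char lo; char hi; };
bool in_range(struct digit_range r, char c);
#endif

// The definition that would normally live in the matching .c file.
bool in_range(struct digit_range r, char c) {
    return c >= r.lo && c <= r.hi;
}
```

If the two guard lines are deleted from both copies, the compiler should reject the second struct digit_range as a redefinition, which is the failure the guard exists to prevent.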

Put the actual function bodies (definitions) in numhelpers.c:

#include <stdbool.h>
#include "numhelpers.h"

bool is_dec_digit(char c) {
    return c >= '0' && c <= '9';
}

bool is_bin_digit(char c) {
    return c == '0' || c == '1';
}

bool is_hex_digit(char c) {
    return is_dec_digit(c)
        || (c >= 'a' && c <= 'f')
        || (c >= 'A' && c <= 'F');
}

// ... is_dec_str, is_bin_str, is_hex_str ...

Including the header in each program¶

At the top of both numinfo.c and numconv.c, include the standard library headers you need, then your own header. Note the difference in bracket style:

#include <stdio.h>      // angle brackets: system/standard headers
#include <stdbool.h>
// ...
#include "numhelpers.h" // double quotes: your own project headers

<stdio.h> uses angle brackets: the compiler searches the standard system include directories.
"numhelpers.h" uses double quotes: the compiler searches the directory of the including file first, then falls back to the system include directories.

Compiling and linking¶

Once the files are in place, you can build each executable by handing all the needed .c files to gcc at once:

gcc -o numinfo numinfo.c numhelpers.c
gcc -o numconv numconv.c numhelpers.c

Each command compiles the listed source files and links them into a single executable. numhelpers.c appears in both because both programs depend on those helpers.

3. The C Build Pipeline: Compiling, Assembling, and Linking¶

Even a one-file program is more than "compile and run." Understanding the pipeline now pays off when we mix C and assembly later.

flowchart LR
    A["first.c<br/>(C source)"] -->|"compile + assemble<br/>gcc"| B["first.o<br/>(object code)"]
    S["startup code<br/>(C runtime, libc)"] --> L["Linker"]
    B --> L
    L --> E["first<br/>(executable)"]

When you run gcc -o first first.c, several stages happen behind the scenes:

Preprocess — expand #include and #define.
Compile — translate C into assembly.
Assemble — translate assembly into machine code (an object file, .o).
Link — combine your object file(s) with startup code and the C library into a runnable executable.

The "startup" piece is important: your main is not actually the first code to run. The C runtime startup code sets up the stack, prepares argc/argv, calls main, and then turns main's return value into the process exit status.

Two paths to the same machine code¶

This course will write programs partly in C and partly in assembly. Consider two functions that do the same thing — one written in C (first_c.c) and one written in assembly (first_s.s):

first_c.c                first_s.s
+-------------+          +-------------+
| main() {    |          | main:       |
|   ...       |          |   ...       |
|   ...       |          |   ...       |
| }           |          |             |
+-------------+          +-------------+
   (C source)             (assembly source)

A C compiler turns C into machine code; an assembler turns assembly into machine code. They meet at the same place — object files — which the linker stitches together. The naming convention we use throughout the course:

Suffix	Meaning	Example
`.c`	C source	`add2_c.c` (C implementation)
`.s`	Assembly source	`add2_s.s` (assembly implementation)
`.o`	Object code (machine code, not yet linked)	`add2_s.o`
(none)	Linked executable	`add2`

In Lab 02 you'll get a Makefile, a main C file, a C implementation, and an assembly stub. The main program calls both the C version and the assembly version with the same arguments so you can confirm they produce identical results:

$ ./add2 3 4
C: 7
Asm: 7

4. What Is Assembly Language?¶

Assembly language is a human-readable form of machine code (machine language).

A processor ultimately executes only machine code — raw binary values. Those binary patterns are nearly impossible for humans to read or write directly. Assembly language gives each machine instruction a readable textual form. There is (almost) a one-to-one correspondence between an assembly instruction and a machine-code instruction.

flowchart LR
    A["Assembly<br/>add t0, t1, t2"] -->|"assembler (as)"| B["Machine code<br/>0x007302B3"]
    B -->|"loaded + executed by"| C["Processor"]

We use an assembler (as) to translate assembly into machine code, just as a compiler (gcc) translates C into machine code.
The processor reads machine code from memory, decodes it, and executes it.
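The near one-to-one mapping from assembly to machine code can be made concrete with a few shifts. The sketch below packs the fields of a register-register instruction into a 32-bit word using the R-type layout from the RISC-V base ISA; the function names encode_rtype and encode_add are made up for this illustration, and a real assembler handles many more formats.

```c
#include <stdint.h>

// Pack the six R-type fields into one 32-bit instruction word.
// Bit positions: funct7[31:25] rs2[24:20] rs1[19:15] funct3[14:12] rd[11:7] opcode[6:0]
uint32_t encode_rtype(uint32_t funct7, uint32_t rs2, uint32_t rs1,
                      uint32_t funct3, uint32_t rd, uint32_t opcode) {
    return (funct7 << 25) | (rs2 << 20) | (rs1 << 15)
         | (funct3 << 12) | (rd << 7) | opcode;
}

// add rd, rs1, rs2 uses opcode 0x33 with funct3 = 0 and funct7 = 0.
// Registers are passed by number (t0 = 5, t1 = 6, t2 = 7).
uint32_t encode_add(uint32_t rd, uint32_t rs1, uint32_t rs2) {
    return encode_rtype(0x00, rs2, rs1, 0x0, rd, 0x33);
}
```

Calling encode_add(5, 6, 7), that is add t0, t1, t2, yields 0x007302B3. Note that the destination comes first in the assembly text but sits in the middle of the encoded word.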

The processor's core elements¶

When reasoning about assembly we keep three things in mind:

Registers — a small set of very fast storage locations inside the processor. The processor can only do arithmetic directly on values in registers.
Memory — holds both code (machine instructions) and data (globals, the stack, the heap). Memory is large but slower than registers.
Instructions — the operations the processor knows how to perform.

The basic execution loop: the processor loads an instruction from memory, decodes it, and executes it. To compute on data that lives in memory, you must first load it into a register, operate on it there, and (often) store the result back to memory. This is called a load/store architecture.
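The load, operate, store pattern can be sketched as a toy model in C. This is only an illustration under simplifying assumptions: the struct machine type and the do_load, do_store, and do_add helpers are hypothetical, memory is indexed by word rather than by byte address, and no real instruction encoding is involved.

```c
#include <stdint.h>

// A toy machine: 32 registers plus a small data memory.
struct machine {
    uint64_t x[32];    // register file
    uint64_t mem[16];  // data memory, indexed by word for simplicity
};

// Load: memory -> register.
void do_load(struct machine *m, int rd, int addr)  { m->x[rd] = m->mem[addr]; }

// Store: register -> memory.
void do_store(struct machine *m, int rs, int addr) { m->mem[addr] = m->x[rs]; }

// Arithmetic reads and writes registers only, never memory.
void do_add(struct machine *m, int rd, int rs1, int rs2) {
    m->x[rd] = m->x[rs1] + m->x[rs2];
}

// mem[dst] = mem[a] + mem[b], done the load/store way.
void add_in_memory(struct machine *m, int dst, int a, int b) {
    do_load(m, 5, a);      // t0 (x5) = mem[a]
    do_load(m, 6, b);      // t1 (x6) = mem[b]
    do_add(m, 7, 5, 6);    // t2 (x7) = t0 + t1
    do_store(m, 7, dst);   // mem[dst] = t2
}
```

A single C statement such as c = a + b on memory-resident variables therefore becomes four steps in this model: two loads, one add, and one store.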

RISC-V¶

RISC-V (pronounced "risk-five") is an open-standard instruction set architecture (ISA). Unlike proprietary ISAs such as x86 (Intel/AMD) or ARM, anyone can implement a RISC-V processor without licensing fees. It is a simple, clean, modular ISA built on reduced-instruction-set computer (RISC) principles, which makes it ideal for learning. This course uses the 64-bit variant with the multiply/divide extension (RV64IM).

5. Anatomy of an Instruction¶

RISC-V uses a strict, regular instruction format. Most instructions name an operation and then a destination followed by sources:

mnemonic  destination,  source1,  source2

Consider the canonical example:

       operands
      /--------\
add   t0, t1, t2
|     |   |   |
|     |   |   +-- source operand / register (src2)
|     |   +------ source operand / register (src1)
|     +---------- destination operand / register (dst)
+--------------- instruction name (mnemonic)

Breaking it down:

add is the mnemonic — the instruction name. A mnemonic is a short, memorable word standing in for a machine operation.
t0 is the destination operand (a register). The result is written here.
t1 and t2 are the source operands (registers). Their values are read as inputs.

So add t0, t1, t2 means:

t0 = t1 + t2

The destination comes first, then the sources. This ordering mirrors an assignment statement: the thing being assigned is on the left.
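The same three roles (destination, first source, second source) are visible inside the encoded instruction. As a rough sketch, assuming the R-type field layout from the RISC-V base ISA, the following C code pulls the fields back out of a 32-bit word; struct rtype_fields and decode_rtype are hypothetical names used only here.

```c
#include <stdint.h>

// The fields of a register-register (R-type) instruction.
struct rtype_fields {
    uint32_t opcode, rd, funct3, rs1, rs2, funct7;
};

// Extract each field by shifting it down and masking off its width.
struct rtype_fields decode_rtype(uint32_t word) {
    struct rtype_fields f;
    f.opcode = word & 0x7F;          // bits 6..0
    f.rd     = (word >> 7)  & 0x1F;  // bits 11..7: destination register number
    f.funct3 = (word >> 12) & 0x7;   // bits 14..12
    f.rs1    = (word >> 15) & 0x1F;  // bits 19..15: first source register number
    f.rs2    = (word >> 20) & 0x1F;  // bits 24..20: second source register number
    f.funct7 = (word >> 25) & 0x7F;  // bits 31..25
    return f;
}
```

Decoding 0x007302B3 gives rd = 5, rs1 = 6, and rs2 = 7, which are the register numbers of t0, t1, and t2 in add t0, t1, t2.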

Comments and syntax discipline¶

Comments begin with # and run to end of line:

add t0, t1, t2   # t0 = t1 + t2

RISC-V assembly is strict about formatting. Operands are separated by commas, and the instruction expects exactly the right number and type of operands. Getting the order wrong (sources before destination) is a common beginner mistake — the assembler will not warn you that you meant something else; it just encodes what you wrote.

6. The RISC-V Register File¶

RISC-V 64-bit: 32 registers, where each register holds a 64-bit value.

The registers are the processor's working storage. There are exactly 32 general-purpose registers, numbered x0 through x31. In the 64-bit variant (RV64), each one is 64 bits (8 bytes) wide.

x0   x1   x2   ...   x31
+----+----+----+     +----+
| 64 | 64 | 64 | ... | 64 |   bits each
+----+----+----+     +----+

Numeric names vs. ABI names¶

Every register has two names:

A numeric name: x0, x1, ..., x31.
An ABI name that describes its conventional role: a0, a1, ..., t0, t1, ..., zero, sp, ra, and so on.

The ABI (Application Binary Interface) names are far more common in practice because they document intent. Writing a0 instead of x10 tells the reader "this holds the first argument / return value." The two names refer to the exact same physical register.

Register	ABI Name	Conventional Use
`x0`	`zero`	Hardwired constant 0
`x1`	`ra`	Return address
`x2`	`sp`	Stack pointer
`x3`	`gp`	Global pointer
`x4`	`tp`	Thread pointer
`x5`–`x7`	`t0`–`t2`	Temporaries
`x8`	`s0` / `fp`	Saved register / frame pointer
`x9`	`s1`	Saved register
`x10`–`x11`	`a0`–`a1`	Arguments / return values
`x12`–`x17`	`a2`–`a7`	Arguments
`x18`–`x27`	`s2`–`s11`	Saved registers
`x28`–`x31`	`t3`–`t6`	Temporaries

For now, the registers you will use most are:

a0–a7 — function arguments. a0 also carries the return value.
t0–t6 — temporary scratch registers for intermediate calculations.
zero — always reads as 0.
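Because the two naming schemes are just two labels for the same 32 registers, the mapping fits in a small lookup table. The sketch below mirrors the table above; the helper names abi_name and reg_number are invented for this example.

```c
#include <stddef.h>
#include <string.h>

// ABI name for each numeric register, indexed by the number after "x".
static const char *const ABI_NAMES[32] = {
    "zero", "ra", "sp", "gp", "tp", "t0", "t1", "t2",
    "s0",   "s1", "a0", "a1", "a2", "a3", "a4", "a5",
    "a6",   "a7", "s2", "s3", "s4", "s5", "s6", "s7",
    "s8",   "s9", "s10", "s11", "t3", "t4", "t5", "t6"
};

// x-number -> ABI name, or NULL if n is not a valid register number.
const char *abi_name(int n) {
    return (n >= 0 && n < 32) ? ABI_NAMES[n] : NULL;
}

// ABI name -> x-number, or -1 if the name is unknown.
// "fp" is accepted as the alternate name for s0 (x8).
int reg_number(const char *name) {
    if (strcmp(name, "fp") == 0) return 8;
    for (int i = 0; i < 32; i++) {
        if (strcmp(name, ABI_NAMES[i]) == 0) return i;
    }
    return -1;
}
```

For example, abi_name(10) returns "a0" and reg_number("t0") returns 5.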

The zero register¶

Register x0 (zero) is special: it always reads as 0, and writes to it are discarded. This sounds useless but is surprisingly handy because it lets one instruction stand in for several operations:

addi a0, zero, 5    # a0 = 0 + 5  -> load the constant 5
add  a0, a1, zero   # a0 = a1 + 0 -> copy a1 into a0
sub  a0, zero, a1   # a0 = 0 - a1 -> negate a1

Having a hardwired zero means the ISA does not need separate "load constant," "move register," or "negate" instructions — they all fall out of arithmetic with zero.
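One way to internalize the zero register is to model the register file in C, with reads of x0 forced to 0 and writes to x0 dropped. This is a simplified sketch, not how hardware is built; read_reg, write_reg, and the op_ functions are hypothetical names.

```c
#include <stdint.h>

enum { ZERO = 0, A0 = 10, A1 = 11 };  // register numbers used below

static uint64_t regs[32];

// x0 always reads as 0; every other register reads its stored value.
uint64_t read_reg(int n) { return n == 0 ? 0 : regs[n]; }

// Writes to x0 are silently discarded.
void write_reg(int n, uint64_t v) { if (n != 0) regs[n] = v; }

// Arithmetic wraps around modulo 2^64, as unsigned C arithmetic does.
void op_addi(int rd, int rs1, int64_t imm) { write_reg(rd, read_reg(rs1) + (uint64_t)imm); }
void op_add(int rd, int rs1, int rs2)      { write_reg(rd, read_reg(rs1) + read_reg(rs2)); }
void op_sub(int rd, int rs1, int rs2)      { write_reg(rd, read_reg(rs1) - read_reg(rs2)); }
```

With this model, op_addi(A0, ZERO, 5) loads a constant, op_add(A0, A1, ZERO) copies a1 into a0, and op_sub(A0, ZERO, A1) negates a1, matching the three idioms shown above.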

32-bit values in 64-bit registers¶

The machine uses 64-bit registers, but a great deal of our work is on 32-bit values (C int, uint32_t). That is fine: a 32-bit value simply occupies the low 32 bits of a 64-bit register. There are 32-bit-aware instructions (for example addw, lw) when the distinction matters, but conceptually you can manipulate 32-bit quantities inside the wider registers without difficulty. We will return to this when it affects sign extension and overflow.
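The difference between full-width and word-width addition can be sketched in C. The model below assumes the usual behavior of add (a 64-bit sum) and addw (a 32-bit sum whose result is sign-extended to 64 bits); the names sext32, model_add, and model_addw are illustrative only.

```c
#include <stdint.h>

// Sign-extend the low 32 bits of v to a 64-bit signed value.
int64_t sext32(uint64_t v) {
    uint32_t low = (uint32_t)v;       // keep only the low 32 bits
    int64_t wide = (int64_t)low;      // 0 .. 2^32 - 1, always representable
    if (low & 0x80000000u) {
        wide -= ((int64_t)1 << 32);   // bit 31 set: the value is negative
    }
    return wide;
}

// Model of add: a plain 64-bit addition.
uint64_t model_add(uint64_t a, uint64_t b) { return a + b; }

// Model of addw: add, keep the low 32 bits, sign-extend the result.
int64_t model_addw(uint64_t a, uint64_t b) { return sext32(a + b); }
```

For small inputs such as 3 and 4 both models return 7. For 2147483647 + 1, model_add returns 2147483648 while model_addw returns -2147483648, yet the low 32 bits of the two results are identical. This is the sense in which the distinction only matters for sign extension and overflow.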

7. Loading Constants: `add` vs `addi`¶

A frequent need is putting a constant value into a register. RISC-V distinguishes between operating on two registers and operating on a register plus a small constant baked into the instruction (an immediate).

`add` — register + register¶

add t0, t1, t2    # t0 = t1 + t2   (both sources are registers)

`addi` — register + immediate¶

addi t0, t1, 9    # t0 = t1 + 9    (second operand is a constant)

The i suffix means immediate. The constant is encoded directly inside the instruction word. For addi the immediate is a 12-bit signed value, so it ranges from −2048 to 2047.
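The range follows directly from 12-bit two's complement. As a small sketch (the helper names fits_imm12 and sext12 are hypothetical), the check and the interpretation of the 12-bit field look like this in C:

```c
#include <stdbool.h>
#include <stdint.h>

// True if v can be encoded as a 12-bit signed immediate.
bool fits_imm12(int64_t v) {
    return v >= -2048 && v <= 2047;
}

// Interpret the low 12 bits of field as a signed value,
// the way the processor does when it reads the immediate.
int32_t sext12(uint32_t field) {
    field &= 0xFFF;                    // keep only the 12 immediate bits
    return (field & 0x800)             // bit 11 is the sign bit
         ? (int32_t)field - 4096
         : (int32_t)field;
}
```

Here fits_imm12(2047) is true and fits_imm12(2048) is false, and the all-ones field 0xFFF decodes to -1.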

Loading a constant with `addi` and `zero`¶

Combining addi with the zero register loads a literal value:

addi t0, zero, 9    # t0 = 0 + 9 = 9

The assembler also provides the pseudo-instruction li ("load immediate") that does the same thing more readably:

li t0, 9            # t0 = 9   (assembler expands this to addi t0, zero, 9)

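For constants that fit in the 12-bit immediate range, these two forms are equivalent: li is just sugar. For larger constants the assembler expands li into a longer sequence of instructions. A pseudo-instruction is an assembler convenience that expands into one or more real instructions; we will see more of them in later sessions.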

Form	What it does	Notes
`add rd, rs1, rs2`	`rd = rs1 + rs2`	Both operands are registers
`addi rd, rs1, imm`	`rd = rs1 + imm`	`imm` is a 12-bit signed constant
`addi rd, zero, imm`	`rd = imm`	Idiom for loading a constant
`li rd, imm`	`rd = imm`	Pseudo-instruction; same as above

There is no subi

RISC-V has no subtract-immediate instruction. To subtract a constant, add a negative immediate: addi t0, t0, -1 subtracts 1.

8. A First Assembly Function¶

We can now read and write a complete, simple function. Functions in assembly are marked by a label (a name followed by a colon) and made visible to the linker with the .global directive.

Consider this C function:

// Returns the sum of two integers.
int add2_c(int a, int b) {
    return a + b;
}

The equivalent RISC-V assembly:

.global add2_s

# Arguments
# a0 - int a
# a1 - int b
# Return value goes in a0

add2_s:
    add a0, a0, a1    # a0 = a + b
    ret

Reading this top to bottom:

.global add2_s — a directive that exports the label add2_s so other files (like the C main) can call it.
add2_s: — a label naming this location in code; calling add2_s jumps here.
add a0, a0, a1 — adds the first two arguments. The result lands in a0.
ret — returns to the caller.

The calling convention in miniature¶

This works because of the calling convention everyone agrees on:

Arguments arrive in a0, a1, a2, ... in order. The first argument a is in a0, the second b is in a1.
The return value is placed in a0.

Since the sum needs to end up in a0 anyway, writing the result of a + b directly into a0 both computes the answer and puts it in the return register in one step. That is why add a0, a0, a1 is the entire body.

sequenceDiagram
    participant C as C main()
    participant ASM as add2_s
    C->>ASM: a0 = 3, a1 = 4, call add2_s
    Note over ASM: add a0, a0, a1  -> a0 = 7
    ASM-->>C: return (a0 = 7)
    Note over C: prints "Asm: 7"

Return values differ from C¶

In C, return a + b; is a single statement that hides where the value goes. In assembly, the return value is a convention: it is simply whatever happens to be in a0 when ret executes. Nothing forces you to put a meaningful value there — it is your responsibility to ensure a0 holds the intended result before returning.
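The "whatever is in a0" rule can be imitated in C by passing a register array to a function and reading slot 10 afterwards. This is only a model of the convention, with hypothetical names (add2_model, forgetful_model, call2), not a description of how the compiler implements calls.

```c
#include <stdint.h>

enum { T0 = 5, A0 = 10, A1 = 11 };  // register numbers used below

// Correct version: the sum is written into a0 before "returning".
void add2_model(int64_t x[32]) {
    x[A0] = x[A0] + x[A1];
}

// Buggy version: the sum is computed into t0 and a0 is never updated.
void forgetful_model(int64_t x[32]) {
    x[T0] = x[A0] + x[A1];
}

// Model of a call: place the arguments in a0 and a1, run the function,
// then take the return value from whatever a0 holds.
int64_t call2(void (*fn)(int64_t x[32]), int64_t a, int64_t b) {
    int64_t x[32] = {0};
    x[A0] = a;
    x[A1] = b;
    fn(x);
    return x[A0];
}
```

With arguments 3 and 4, call2(add2_model, 3, 4) returns 7, while call2(forgetful_model, 3, 4) returns 3: the caller simply sees the first argument, still sitting in a0.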

9. Putting It Together: From C to a Running Program¶

Tying the build pipeline (Section 3) to assembly (Section 8): in Lab 02 the same logical function exists in both C and assembly, and a main program calls both.

// add2.c  (the main driver)
#include <stdio.h>
#include <stdlib.h>

int add2_c(int a, int b);   // C implementation (in add2_c.c)
int add2_s(int a, int b);   // assembly implementation (in add2_s.s)

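int main(int argc, char *argv[]) {
    if (argc != 3) {                        // expect exactly two operands
        fprintf(stderr, "usage: %s a b\n", argv[0]);
        return 1;
    }
    int a = atoi(argv[1]);
    int b = atoi(argv[2]);
    printf("C: %d\n", add2_c(a, b));
    printf("Asm: %d\n", add2_s(a, b));
    return 0;
}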

The build assembles the .s file, compiles the .c files, and links everything together. By hand (the Makefile does this for you):

as -o add2_s.o add2_s.s         # assemble the assembly file
gcc -o add2 add2.c add2_c.c add2_s.o   # compile C + link in the object file

gcc can also do the assembling step itself if you hand it the .s file directly:

gcc -o add2 add2.c add2_c.c add2_s.s

Running it confirms both implementations agree:

$ ./add2 3 4
C: 7
Asm: 7

This C-calls-assembly pattern is how we will validate every assembly function we write: the C version is the reference, and your assembly must match it.
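The same comparison can be automated over several inputs instead of one command line at a time. The sketch below is a possible shape for such a check, not the lab's actual driver: count_mismatches and buggy_add2 are hypothetical, and add2_s is written in C here only so the example is self-contained (in the lab that symbol comes from add2_s.s).

```c
#include <stddef.h>

// Reference implementation.
int add2_c(int a, int b) { return a + b; }

// Stand-in for the assembly implementation (assumption: written in C here).
int add2_s(int a, int b) { return b + a; }

// A deliberately wrong implementation, to show what a failure looks like.
int buggy_add2(int a, int b) { return a - b; }

// Run both functions on a fixed set of inputs and count disagreements.
int count_mismatches(int (*ref)(int, int), int (*impl)(int, int)) {
    static const int cases[][2] = {
        {3, 4}, {0, 0}, {-5, 5}, {-7, -8}, {1000000, 2345}
    };
    int bad = 0;
    for (size_t i = 0; i < sizeof cases / sizeof cases[0]; i++) {
        int a = cases[i][0];
        int b = cases[i][1];
        if (ref(a, b) != impl(a, b)) {
            bad++;
        }
    }
    return bad;
}
```

Here count_mismatches(add2_c, add2_s) is 0, while the buggy version disagrees on four of the five cases (it happens to agree on 0 and 0), which is a reminder that a single passing input proves little.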

Key Concepts¶

Concept	Definition	Example
Separate compilation	Splitting a program across multiple source files compiled independently	`numinfo.c` + `numhelpers.c`
Header file	A `.h` file holding prototypes shared across `.c` files	`numhelpers.h`
Prototype	A function declaration (signature only, no body)	`bool is_dec_digit(char c);`
Include guard	`#ifndef/#define/#endif` preventing double inclusion	`#ifndef NUMHELPERS_H`
Object file	Machine code for one source file, not yet linked	`add2_s.o`
Linker	Combines object files + startup + libraries into an executable	`gcc -o add2 ...`
Assembly language	Human-readable form of machine code	`add t0, t1, t2`
Assembler	Tool translating assembly to machine code	`as`
Mnemonic	The instruction name part of an instruction	`add`, `addi`, `ret`
Operand	A register or constant an instruction acts on	`t0`, `t1`, `9`
Register	Fast on-chip storage; 32 of them, 64 bits each in RV64	`a0`, `t0`, `x5`
ABI name	Conventional register name describing its role	`a0` = `x10`
zero register	`x0`, hardwired to 0, ignores writes	`addi a0, zero, 5`
Immediate	A constant encoded inside an instruction	the `9` in `addi t0, zero, 9`
Label	A named location in code, used as a target	`add2_s:`
`.global`	Directive exporting a label to the linker	`.global add2_s`

Practice Problems¶

Problem 1: Header and Prototype¶

You are factoring is_hex_digit out of numinfo.c into numhelpers.c. What two things must you add so that numconv.c can also call it, and what include style do you use in numconv.c?

Solution

1. Add the **prototype** to `numhelpers.h`:

bool is_hex_digit(char c);

2. Add the **definition** (the function body) to `numhelpers.c`.

Then in `numconv.c`, include the header with **double quotes** because it is your own project header:

#include "numhelpers.h"

Finally, build with both source files on the command line so the definition gets linked in:

gcc -o numconv numconv.c numhelpers.c

Problem 2: Read the Instruction¶

What does each of these instructions do? Write the equivalent C-style assignment.

sub  t3, t1, t2
addi t0, t1, 100
add  a0, a1, zero

Solution

sub  t3, t1, t2     # t3 = t1 - t2
addi t0, t1, 100    # t0 = t1 + 100
add  a0, a1, zero   # a0 = a1 + 0  -> copies a1 into a0

The destination is always the **first** operand. The last line is the common idiom for copying one register to another (the pseudo-instruction `mv a0, a1` expands to exactly this).

Problem 3: Load a Constant Three Ways¶

Write three different (but equivalent) instructions or instruction sequences that leave the value 42 in register t0.

Solution

li   t0, 42         # pseudo-instruction (most readable)
addi t0, zero, 42   # what li expands to

These two are general and equivalent. A third, genuinely different option builds the value in two steps using only `addi`:

addi t0, zero, 40   # t0 = 40
addi t0, t0, 2      # t0 = 42

Note that `add t0, zero, t0` does not count as an answer: it only copies `t0` onto itself, so it leaves 42 in `t0` only if 42 was already there.

The point: `li` is sugar for `addi rd, zero, imm`, and the `zero` register is what makes loading a constant possible without a dedicated "load constant" instruction.

Problem 4: Identify the Registers¶

A C function is declared as:

int combine(int x, int y, int z);

On entry to its assembly implementation, which registers hold x, y, and z? Where must the return value go?

Solution

- `x` → `a0`
- `y` → `a1`
- `z` → `a2`
- Return value → `a0`

Arguments fill `a0`, `a1`, `a2`, ... in order. The return value always goes in `a0`, which is why a function often computes its result directly into `a0`.

Problem 5: Write add3_s¶

Given this C reference, write the assembly implementation that returns a + b + c.

int add3_c(int a, int b, int c) {
    return a + b + c;
}

Solution

.global add3_s

# a0 - int a
# a1 - int b
# a2 - int c

add3_s:
    add a0, a0, a1    # a0 = a + b
    add a0, a0, a2    # a0 = (a + b) + c
    ret

`add` only takes two source registers, so you chain two additions, accumulating into `a0`. Since `a0` is both an argument register and the return register, the final sum is already in the right place when `ret` executes.

Problem 6: Why 64-bit registers for 32-bit data?¶

The course uses RV64, where every register is 64 bits, yet C int values are 32 bits. Explain how a 32-bit int lives in a 64-bit register, and why this is not a problem.

Solution

A 32-bit value occupies the **low 32 bits** of the 64-bit register; the upper 32 bits are not needed to represent the value. Arithmetic such as `add a0, a0, a1` operates on the full 64-bit width, but as long as the inputs hold valid 32-bit values, the low 32 bits of the result are the correct 32-bit answer.

It is not a problem because:

- We only read back the low 32 bits when we treat the value as an `int`.
- RISC-V provides word-width instructions (`addw`, `lw`, etc.) for the cases where the distinction between 32-bit and 64-bit behavior matters (sign extension, overflow at 32 bits).

So a 64-bit machine can comfortably manipulate 32-bit values; the extra width simply goes unused for those operations.

Summary¶

Project 1 uses separate compilation: shared helpers go in numhelpers.c, their prototypes in numhelpers.h, and both numinfo.c and numconv.c include the header and link against the helper file.
A header holds prototypes, not bodies. Use angle brackets for system headers (<stdio.h>) and double quotes for your own ("numhelpers.h"), and guard headers with #ifndef/#define/#endif.
The build pipeline goes source → object code → linked executable. The linker also pulls in startup code, so main is not literally the first thing that runs.
Assembly language is a human-readable form of machine code. An assembler (as) translates it to machine code; a processor loads, decodes, and executes that machine code, operating only on values held in registers.
An instruction has a mnemonic and operands: add t0, t1, t2 is mnemonic add, destination t0, and sources t1, t2, meaning t0 = t1 + t2. The destination comes first.
RV64 has 32 registers, each 64 bits wide, named x0–x31 with ABI names like a0 (arguments/return), t0 (temporaries), and zero (hardwired 0). 32-bit values occupy the low bits of these wider registers.
addi adds an immediate constant; add adds two registers. Loading a constant is done with addi rd, zero, imm, conveniently written as the pseudo-instruction li rd, imm.
A simple assembly function is a .global label whose arguments arrive in a0, a1, ... and whose return value is left in a0 before ret. The C-calls-assembly pattern lets us check assembly against a C reference (C: 7 / Asm: 7).
