RISC-V Assembly Part 6: Byte Order, Two's Complement, and Strings¶
Overview¶
This lecture connects high-level C programs to their underlying memory representation. We treat memory as a byte-addressable array, examine how a multi-byte integer is laid out in memory (endianness), and explain why almost every modern machine uses two's complement to represent signed integers. We then look at how to convert between positive and negative values, how to widen a value with sign extension and narrow it with truncation, and finish with C strings as null-terminated arrays of character bytes and the RISC-V load/store instructions used to walk them one byte at a time. These ideas are the foundation for Project 3, where you manipulate strings and structs directly in RISC-V assembly.
Learning Objectives¶
- Describe the memory model used by a processor: registers, the program counter, and RAM partitioned into stack, heap, data, and code
- Explain that memory is a byte-addressable array and that a byte is 8 bits
- Distinguish big-endian from little-endian byte ordering and identify which order RISC-V uses
- Compare sign-magnitude, one's complement, and two's complement, and explain why two's complement won
- Convert a positive value to its negative two's complement representation (and back) using `invert(v) + 1`
- Widen a value to more bits with sign extension and narrow it with truncation
- Represent C strings as null-terminated byte arrays and access individual characters with `lb`/`sb`
- Use the correct load width (`lb`, `lw`, `ld`) for the data type you are accessing
Prerequisites¶
- RISC-V registers, instructions, and the fetch/decode/execute cycle (Assembly Parts 1–5)
- The RISC-V calling convention: argument registers `a0`–`a7`, return value in `a0`, the stack pointer `sp`
- Memory instructions: load and store with an optional offset, e.g. `lw a1, 0(a2)`
- Binary and hexadecimal number systems and base conversion in C
- C pointers: `&` (address-of), `*` (dereference), pointer arithmetic, and type casts
1. The Memory Model: Processor and RAM¶
Before we can talk about how data is laid out, we need a clear picture of the two halves of a computer and how they communicate.
flowchart LR
subgraph CPU["Processor (CPU)"]
REGS["Registers<br/>(x0–x31)"]
PC["PC<br/>(program counter)"]
EXEC["Execute<br/>instructions"]
end
subgraph RAM["Memory (RAM)"]
STACK["STACK"]
HEAP["HEAP"]
DATA["DATA"]
CODE["CODE"]
end
REGS -- "store" --> RAM
RAM -- "load" --> REGS
PC -- "fetch instruction" --> CODE
The processor can only compute on values that are held in its registers. Memory (RAM) is where both the program's code (machine instructions) and its data live. Two activities cross the boundary between CPU and memory:
- Load / store: data moves between registers and memory. A `load` copies bytes from memory into a register; a `store` copies bytes from a register out to memory.
- Instruction fetch: the program counter (`PC`) holds the address of the next instruction. The processor reads the instruction at `PC` out of the code region, decodes it, executes it, and then advances `PC` (usually `PC + 4` for the next instruction, or a branch/jump target).
RAM is conventionally drawn as a single tall column partitioned into regions. From high addresses down to low addresses:
| Region | What it holds | Grows |
|---|---|---|
| Stack | Function call frames, local variables, saved registers | Downward (toward lower addresses) |
| Heap | Dynamically allocated memory (`malloc`) | Upward (toward higher addresses) |
| Data | Global and static variables | Fixed |
| Code | The machine instructions of your program | Fixed |
The stack grows down and the heap grows up so that they can share the large gap of unused addresses between them. The key takeaway for this lecture is the layer below all of this: regardless of region, memory is just a long array of bytes.
2. Memory Is a Byte-Addressable Array¶
The smallest unit of memory that has its own address is the byte (8 bits). We say memory is byte addressable: every byte has a unique numeric address, starting at 0 and counting up. You can picture memory as one giant uint8_t array:
addr: ... 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
byte: | | | | | | | | | | | | | | | | |
+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
^
1 byte = 8 bits
A single byte can only represent 256 distinct values (0x00–0xFF). Most data we care about is larger than a byte, so larger values must occupy several consecutive bytes:
| C type | Bits | Bytes | RISC-V load |
|---|---|---|---|
| `char` / `uint8_t` | 8 | 1 | `lb` / `lbu` |
| `short` | 16 | 2 | `lh` / `lhu` |
| `int` / `uint32_t` | 32 | 4 | `lw` / `lwu` |
| `long` / pointer / `uint64_t` | 64 | 8 | `ld` |
This immediately raises a question: if an int is 4 bytes and memory addresses individual bytes, in what order do those 4 bytes get stored? That is the problem of byte ordering, or endianness.
3. Endianness: Big-Endian vs Little-Endian¶
Consider the declaration `int x = 0xFFAA1122;`.
The 32-bit value 0xFFAA1122 is made of four bytes. From most significant to least significant they are `0xFF`, `0xAA`, `0x11`, `0x22`.
The value occupies 4 consecutive byte addresses, but there are two reasonable conventions for which byte goes at the lowest address. Assume &x points at address 8 and the four bytes occupy addresses 8, 9, 10, 11.
Big-endian stores the most significant byte first (at the lowest address), so addresses 8, 9, 10, 11 hold `FF AA 11 22`.
Little-endian stores the least significant byte first (at the lowest address), so addresses 8, 9, 10, 11 hold `22 11 AA FF`.
Putting them side by side, reading from low address (&x) upward:
| Address | Big-endian | Little-endian |
|---|---|---|
| `&x + 0` (8) | `0xFF` (MSB) | `0x22` (LSB) |
| `&x + 1` (9) | `0xAA` | `0x11` |
| `&x + 2` (10) | `0x11` | `0xAA` |
| `&x + 3` (11) | `0x22` (LSB) | `0xFF` (MSB) |
flowchart LR
subgraph V["int x = 0xFFAA1122"]
direction TB
msb["MSB 0xFF | 0xAA | 0x11 | 0x22 LSB"]
end
V --> BE["Big-endian:<br/>FF at &x,<br/>then AA, 11, 22"]
V --> LE["Little-endian:<br/>22 at &x,<br/>then 11, AA, FF"]
Two important facts:
- RISC-V (and x86) are little-endian. The byte at the lowest address is the least significant byte.
- There is no universal standard. Different architectures chose differently. The names come from Gulliver's Travels, in which the Lilliputians went to war over which end of a soft-boiled egg to crack: the big end or the little end. The choice is arbitrary, but everyone on a given machine must agree.
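To make the two layouts concrete, here is a minimal C sketch that writes a 32-bit value into a byte buffer in each order. It uses shifts rather than pointer casts, so the result does not depend on the host's own byte order. The names `store_be32` and `store_le32` are illustrative and not from the lecture.

```c
#include <stdint.h>

// Write v into buf[0..3] with the most significant byte at the lowest index.
void store_be32(uint8_t buf[4], uint32_t v) {
    buf[0] = (uint8_t)(v >> 24);
    buf[1] = (uint8_t)(v >> 16);
    buf[2] = (uint8_t)(v >> 8);
    buf[3] = (uint8_t)v;
}

// Write v into buf[0..3] with the least significant byte at the lowest index.
void store_le32(uint8_t buf[4], uint32_t v) {
    buf[0] = (uint8_t)v;
    buf[1] = (uint8_t)(v >> 8);
    buf[2] = (uint8_t)(v >> 16);
    buf[3] = (uint8_t)(v >> 24);
}
```

Calling `store_le32(buf, 0xFFAA1122)` should leave `22 11 AA FF` in `buf`, which matches the little-endian column of the table.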
Why Endianness Matters¶
Inside a single machine, endianness is invisible: you store an int and load it back and get the same value. It becomes visible in two situations:
- Byte-level access. If you cast an `int *` to a `char *` and read byte 0, which byte you get depends on endianness (see "Inspecting Bytes in C" below).
- Networking. When two machines exchange raw bytes, a little-endian sender and a big-endian receiver will disagree on what a multi-byte field means. Network protocols therefore define a canonical network byte order (big-endian), and code converts to/from host order. TCP/IP handles reliable, in-order delivery of the bytes; agreeing on their interpretation is a separate, application-level concern.
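The host-to-network conversion can be sketched in a few lines of C. The helpers below are hypothetical stand-ins written for illustration; in real code the POSIX functions `htonl` and `ntohl` play this role. The sketch assumes nothing about the host's byte order.

```c
#include <stdint.h>
#include <string.h>

// Hypothetical helper: return a uint32_t whose in-memory bytes hold v in
// big-endian (network) order, whatever the host's byte order is.
uint32_t to_network32(uint32_t v) {
    uint8_t bytes[4] = {
        (uint8_t)(v >> 24), (uint8_t)(v >> 16), (uint8_t)(v >> 8), (uint8_t)v
    };
    uint32_t out;
    memcpy(&out, bytes, sizeof out);
    return out;
}

// Inverse: read the in-memory bytes of a network-order value back into host order.
uint32_t from_network32(uint32_t n) {
    uint8_t bytes[4];
    memcpy(bytes, &n, sizeof bytes);
    return ((uint32_t)bytes[0] << 24) | ((uint32_t)bytes[1] << 16) |
           ((uint32_t)bytes[2] << 8) | (uint32_t)bytes[3];
}
```

On a little-endian host such as RISC-V, `to_network32` effectively reverses the four bytes; on a big-endian host it leaves them alone.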
Inspecting Bytes in C¶
We can prove the machine is little-endian by reading the first byte of an int:
#include <stdio.h>
#include <stdint.h>
int main(void) {
int x = 0xFFAA1122;
uint8_t *p = (uint8_t *)&x; // treat the int as an array of bytes
printf("p[0] = 0x%02X\n", p[0]); // 0x22 on little-endian
printf("p[1] = 0x%02X\n", p[1]); // 0x11
printf("p[2] = 0x%02X\n", p[2]); // 0xAA
printf("p[3] = 0x%02X\n", p[3]); // 0xFF
if (p[0] == 0x22)
printf("little-endian\n");
else
printf("big-endian\n");
return 0;
}
Note how the cast works: &x is an int * (the address of a 4-byte value). Casting it to uint8_t * reinterprets that same address as the start of a byte array. The pointer value (the address) does not change; only how many bytes we read when we dereference does. That is the central idea of pointer casting: on a 64-bit target such as RV64, a pointer always holds a 64-bit address, and the pointed-to type decides the access size.
You can confirm the same thing in GDB by examining memory with `x/4xb &x`.
`x/4xb` means "examine 4 values, in hex, byte-sized." The bytes come out in little-endian order.
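Going the other direction, the sketch below assembles a 32-bit value from four consecutive bytes of a byte array, treating the lowest address as the least significant byte. This is roughly what a word load does on a little-endian machine. The function name `load_word_le` and the idea of modeling memory as a `uint8_t` array are assumptions made for illustration.

```c
#include <stddef.h>
#include <stdint.h>

// Assemble the 4 bytes at mem[addr..addr+3] into a 32-bit value,
// treating mem[addr] as the least significant byte (little-endian).
uint32_t load_word_le(const uint8_t *mem, size_t addr) {
    return (uint32_t)mem[addr]
         | ((uint32_t)mem[addr + 1] << 8)
         | ((uint32_t)mem[addr + 2] << 16)
         | ((uint32_t)mem[addr + 3] << 24);
}
```

With bytes `22 11 AA FF` stored at addresses 8 through 11, `load_word_le(mem, 8)` should give back `0xFFAA1122`.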
4. Binary Representation of Signed Integers¶
Unsigned binary is straightforward: 0b1010 is 8 + 2 = 10. But how do we represent negative numbers when all we have are 0s and 1s? In C, integer types are signed by default (int means signed int), and the representation that almost every machine uses is two's complement.
To compare the candidates, here is the full table of 4-bit patterns under three schemes. (With 4 bits there are 16 patterns.)
| Unsigned Decimal | Binary | Sign-Magnitude | Two's Complement |
|---|---|---|---|
| 0 | `0000` | 0 | 0 |
| 1 | `0001` | 1 | 1 |
| 2 | `0010` | 2 | 2 |
| 3 | `0011` | 3 | 3 |
| 4 | `0100` | 4 | 4 |
| 5 | `0101` | 5 | 5 |
| 6 | `0110` | 6 | 6 |
| 7 | `0111` | 7 | 7 |
| 8 | `1000` | -0 | -8 |
| 9 | `1001` | -1 | -7 |
| 10 | `1010` | -2 | -6 |
| 11 | `1011` | -3 | -5 |
| 12 | `1100` | -4 | -4 |
| 13 | `1101` | -5 | -3 |
| 14 | `1110` | -6 | -2 |
| 15 | `1111` | -7 | -1 |
Sign-Magnitude¶
The simplest idea: use the most significant bit as a sign flag (0 = positive, 1 = negative) and the remaining bits as the magnitude. So 0101 is +5 and 1101 is -5.
Sign-magnitude has two fatal problems:
- Two zeros. `0000` is `+0` and `1000` is `-0`. This wastes a pattern, and equality checks (`x == 0`) become awkward.
- Arithmetic does not "just work." Adding a positive and a negative number with plain binary addition gives the wrong answer. Watch:
0 1 0 1 (+5)
+ 1 0 1 1 (-3 in sign-magnitude)
---------
1 0 0 0 0 = 0 (after dropping carry) WRONG: 5 + (-3) should be 2
The hardware would need special-case logic (or lookup tables) to do signed addition. That does not scale.
Two's Complement¶
Two's complement keeps the MSB as a sign indicator (1 means negative) but assigns the negative values cleverly so that ordinary binary addition produces correct results. With the same example:
0 1 0 1 (+5)
+ 1 1 0 1 (-3 in two's complement)
---------
1 0 0 1 0 drop the carry out of the top:
0 0 1 0 = 2 CORRECT: 5 + (-3) = 2
The same grade-school add-with-carry circuit handles both positive and negative numbers. Two's complement wins because:
- Only one representation of zero (`0000`). This is why the negative range extends one further than the positive range: 4 bits give `-8 .. +7`, not `-7 .. +7`.
- The MSB is still the sign bit (0 = non-negative, 1 = negative), so checking the sign is one bit test.
- Addition, subtraction, and multiplication use the same hardware as unsigned — no special cases, no lookup tables. This scales to any bit width.
This is why, on essentially every modern computer, "the binary representation of a signed integer" means two's complement.
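The comparison can be checked mechanically. The sketch below decodes a 4-bit pattern under each scheme and performs plain 4-bit addition (add, then drop the carry out of the top bit). All three function names are illustrative, not from the lecture.

```c
#include <stdint.h>

// Interpret the low 4 bits as sign-magnitude: bit 3 is the sign, bits 2..0 the magnitude.
int decode_sign_magnitude4(uint8_t bits) {
    int magnitude = bits & 0x7;
    return (bits & 0x8) ? -magnitude : magnitude;
}

// Interpret the low 4 bits as two's complement: bit 3 carries a weight of -8.
int decode_twos_complement4(uint8_t bits) {
    int low = bits & 0x7;
    return (bits & 0x8) ? low - 8 : low;
}

// Plain 4-bit binary addition: add, then drop the carry out of bit 3.
uint8_t add4(uint8_t a, uint8_t b) {
    return (uint8_t)((a + b) & 0xF);
}
```

Feeding the two worked examples through `add4` reproduces the results above: `0101 + 1011` decodes to 0 under sign-magnitude (wrong), while `0101 + 1101` decodes to 2 under two's complement (correct).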
5. Converting Between Positive and Negative¶
The mechanical recipe to negate a two's complement value is `-v = invert(v) + 1`.
That is: flip every bit (the bitwise NOT / one's complement), then add 1.
Positive to Negative: 3 to -3 (4 bits)¶
Start with `3 = 0011`. Inverting every bit gives `1100`, and adding 1 gives `1101`. So -3 in 4-bit two's complement is 1101. Check against the table above: row 13 (1101) is indeed -3.
Negative to Positive: -3 back to 3¶
The beauty of two's complement is that the same operation (invert + 1) converts in the other direction too. Start with `-3 = 1101`. Inverting every bit gives `0010`, and adding 1 gives `0011 = 3`.
invert(v) + 1 is its own inverse, so you only ever need to learn one procedure.
In C¶
#include <stdio.h>
#include <stdint.h>
int main(void) {
int8_t v = 3;
int8_t neg = ~v + 1; // invert bits, add one
printf("%d\n", neg); // -3
printf("0x%02X\n", (uint8_t)neg); // 0xFD (1111 1101 in 8 bits)
// The compiler does the same thing for unary minus:
printf("%d\n", -v); // -3
return 0;
}
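The unary `-` operator in C produces the same bit pattern as this invert + 1 operation. On RISC-V the compiler typically emits `neg`, a pseudoinstruction for `sub rd, zero, rs`, which yields identical bits.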
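The recipe has two edge cases worth seeing at a fixed width. The sketch below applies invert + 1 to a 4-bit pattern and keeps only the low 4 bits. The helper name `negate4` is hypothetical, chosen to match the 4-bit table in Section 4.

```c
#include <stdint.h>

// Negate a 4-bit two's complement pattern: invert, add 1, keep the low 4 bits.
// The arithmetic is done in unsigned so that wraparound is well defined.
uint8_t negate4(uint8_t bits) {
    return (uint8_t)((~(unsigned)bits + 1u) & 0xFu);
}
```

`negate4(0x3)` gives `0xD` and `negate4(0xD)` gives `0x3`, as expected. Zero negates to itself, and so does `1000` (-8), because +8 is not representable in 4 bits. The same holds for the most negative value at any width.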
6. Changing Bit Width: Sign Extension and Truncation¶
A value is stored in some fixed number of bits, but we often need to move it into a wider or narrower container — for example, loading a 4-bit or 8-bit value into a 64-bit register. The rule is: going from n bits to m bits where m > n, we must preserve both the value and its sign.
Widening: Sign Extension¶
To widen a two's complement value, copy the original sign bit (the MSB) into all the new high bits. This is sign extension.
Negative example: `1101` (-3) from 4 bits to 8 bits.
The sign bit is 1, so fill the new upper bits with 1: `1101` becomes `1111 1101`.
You can verify this is still -3: invert 11111101 to 00000010, add 1, get 00000011 = 3, so the original was -3. The value is preserved.
Positive example: `0011` (3) from 4 bits to 8 bits.
The sign bit is 0, so fill the new upper bits with 0: `0011` becomes `0000 0011`.
Visually, sign extension "drags" the top bit leftward across all the new positions.
Equivalently, the same value -3 shown at 4, 8, and 64 bits:
| Width | Bits (-3) | Hex |
|---|---|---|
| 4 | `1101` | `0xD` |
| 8 | `1111 1101` | `0xFD` |
| 64 | `1111…1111 1101` | `0xFFFFFFFFFFFFFFFD` |
This is exactly why Project 3's unstruct prints -99 as 0xFFFFFFFFFFFFFF9D: the negative value has been sign-extended to 64 bits, filling the top with Fs.
Important: Sign extension is only correct for signed values. For an unsigned value you zero-extend (fill the new high bits with 0). RISC-V provides both: `lb` sign-extends a loaded byte, while `lbu` zero-extends it.
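The difference between the two loads can be modeled in C. The functions below are illustrative stand-ins for `lb` and `lbu` on a 64-bit register, not part of any real API, and they avoid implementation-defined casts by doing the sign adjustment arithmetically.

```c
#include <stdint.h>

// Model of lb: widen one byte to 64 bits, preserving its signed value.
// Subtracting 2^8 when bit 7 is set is equivalent to filling bits 8..63 with 1s.
int64_t load_byte_signed(const uint8_t *addr) {
    int64_t value = *addr;          // 0..255
    if (value & 0x80) {
        value -= 256;
    }
    return value;
}

// Model of lbu: widen one byte to 64 bits by filling bits 8..63 with zeros.
uint64_t load_byte_unsigned(const uint8_t *addr) {
    return (uint64_t)*addr;
}
```

For a byte holding `0xFD`, the signed model yields -3 (register bits `0xFFFFFFFFFFFFFFFD`) while the unsigned model yields 253 (`0x00000000000000FD`).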
Sign Extension by Shifting¶
A common trick to sign-extend a sub-word value that is sitting in the low bits of a register is "shift left all the way, then arithmetic-shift right all the way." The arithmetic right shift (sra / srai) replicates the sign bit as it shifts:
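int32_t v = 0b1110; // we *mean* -2 as a 4-bit value
v = (int32_t)((uint32_t)v << 28) >> 28; // shift left as unsigned (no signed overflow), then SRA back
// v is now -2 (0xFFFFFFFE) on compilers where >> on a negative int is arithmetic (GCC, Clang)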
In assembly, the right shift must be the arithmetic form (srai), not the logical form (srli), or you would zero-fill and get the wrong answer for negatives.
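In C, the shift idiom relies on `>>` being an arithmetic shift for negative values, which the language leaves implementation-defined. As an alternative, here is a minimal sketch of a different technique: XOR with the sign bit, then subtract it. It produces the same result without shifting a negative number. The name `sign_extend` and the restriction to widths of 1 to 32 bits are assumptions made to keep the example short.

```c
#include <stdint.h>

// Sign-extend the low `bits` bits of value to 64 bits.
// Assumes 1 <= bits <= 32. Uses XOR-and-subtract rather than shifts.
int64_t sign_extend(uint64_t value, unsigned bits) {
    uint64_t mask = (1ULL << bits) - 1;        // keep only the low `bits` bits
    int64_t v = (int64_t)(value & mask);       // non-negative, fits easily
    int64_t sign = (int64_t)1 << (bits - 1);   // weight of the sign bit
    return (v ^ sign) - sign;                  // flips the sign bit, then removes its weight
}
```

For example, `sign_extend(0xE, 4)` treats `1110` as a 4-bit value and returns -2, and `sign_extend(0xFD, 8)` returns -3.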
Narrowing: Truncation¶
Going the other way — from more bits to fewer — you simply keep the low bits and discard the high bits. This is truncation.
int32_t big = -3; // 0xFFFFFFFD
int8_t small = (int8_t)big; // keep low 8 bits: 0xFD = -3 (value survives)
int32_t huge = 300; // 0x0000012C
int8_t tiny = (int8_t)huge; // keep low 8 bits: 0x2C = 44 (value LOST!)
Truncation is safe only when the value actually fits in the narrower type. -3 fits in 8 bits, so it survives. 300 does not fit in a signed 8-bit range (-128 .. 127), so the result wraps to 44. Always make sure a value fits before narrowing.
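One way to follow that advice is to check the range before narrowing. The sketch below pairs a range check with a helper that shows which 8 bits survive truncation. Both function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

// True when v can be narrowed to int8_t without changing its value.
bool fits_in_int8(int32_t v) {
    return v >= INT8_MIN && v <= INT8_MAX;
}

// The 8 bits that survive truncation, returned as an unsigned pattern.
// Converting to uint32_t first keeps the masking well defined for negatives.
uint8_t low8(int32_t v) {
    return (uint8_t)((uint32_t)v & 0xFFu);
}
```

`fits_in_int8(-3)` is true and `low8(-3)` is `0xFD`, so the value survives. `fits_in_int8(300)` is false and `low8(300)` is `0x2C`, which is the 44 seen above.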
7. Strings: Arrays of Character Bytes¶
A C string is an array of characters (bytes) terminated by a special null byte '\0' (the value 0). There is no separate length field — the terminating zero is how code knows where the string ends.
Consider `char *s = "foo";`.
s is a pointer to the first character. The string "foo" occupies four bytes in memory (three letters plus the terminator):
address byte value (hex / char)
grows up
+-----------+
s[3] | '\0' 0 | <- terminator, value 0
+-----------+
s[2] | 'o' 6F |
+-----------+
s[1] | 'o' 6F |
+-----------+
s[0] | 'f' 66 | <- s points here
+-----------+
Each character is one byte holding its ASCII code ('f' = 0x66 = 102, 'o' = 0x6F = 111). The final s[3] holds 0, which is not the digit '0' (that would be 0x30) — it is the integer zero that marks end-of-string. Walking a string means starting at s and reading bytes until you hit the zero.
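Because the terminator is a real byte in memory, a string always occupies one more byte than its visible length. A small sketch, using a hypothetical helper name `string_footprint`, makes the count explicit:

```c
#include <stddef.h>

// Number of bytes a C string occupies in memory: its characters plus the '\0'.
size_t string_footprint(const char *s) {
    size_t n = 0;
    while (s[n] != '\0') {   // walk until the terminating zero byte
        n++;
    }
    return n + 1;            // count the terminator too
}
```

`string_footprint("foo")` is 4, and even the empty string `""` occupies 1 byte.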
Choosing the Right Load Width¶
When you access memory in assembly, you must use the load instruction that matches the size of the thing you are reading. The handwritten notes list the three common widths:
| Instruction | Width | Reads | Typical C type |
|---|---|---|---|
| `lw` | 32 bits | a word (4 bytes) | int, uint32_t |
| `ld` | 64 bits | a doubleword (8 bytes) | long, pointer, uint64_t |
| `lb` | 8 bits | one byte (sign-extended) | char, a single string character |
For strings, characters are bytes, so we use lb to read a character and sb to write one.
# a0 points at a C string; load the first character
lb t0, 0(a0) # t0 = s[0], sign-extended to 64 bits
lb Touches Only the Low 8 Bits¶
When you load or store a byte, only the low 8 bits of the register are involved. The handwritten note "lb t0, (a0) ... lower 8 bits set to byte value" captures this:
- `lb t0, (a0)` reads one byte from memory and places it in the low 8 bits of `t0`, then sign-extends it into bits 8–63.
- `sb t0, (a0)` writes only the low 8 bits of `t0` out to one byte of memory; the upper 56 bits of `t0` are ignored.
register t0 (64 bits)
+----------------------------------+--------+
| sign-extended upper 56 bits | byte 0 | <- lb writes here / sb reads here
+----------------------------------+--------+
\______/
lower 8 bits = byte value
Use lbu instead of lb when the byte should be treated as unsigned (0–255) so that it is zero-extended rather than sign-extended.
Iterating a String in C¶
The canonical "string length" loop walks the pointer until it finds the null terminator:
int my_strlen(char *s) {
int len = 0;
while (*s != '\0') { // stop at the null byte
len++;
s++; // advance one byte (char is 1 byte)
}
return len;
}
Iterating a String in RISC-V Assembly¶
The same logic in RISC-V. We load one byte at a time with lb, stop when it is zero, and otherwise bump the count and advance the pointer by 1 (because each character is one byte):
# int my_strlen(char *s)
# a0 = s (pointer to string)
# returns length in a0
.global my_strlen
my_strlen:
li t0, 0 # t0 = len = 0
strlen_loop:
lb t1, 0(a0) # t1 = *s (current character byte)
beq t1, zero, strlen_done # if byte == '\0', stop
addi t0, t0, 1 # len++
addi a0, a0, 1 # s++ (advance one byte)
j strlen_loop
strlen_done:
mv a0, t0 # return value goes in a0
ret
Two details worth highlighting, both raised in lecture:
- Clear loop labels. Naming the loop (`strlen_loop`) and its exit (`strlen_done`) makes the control flow readable.
- Where the return value goes. By convention the result is returned in `a0`, so we copy the computed length there before `ret`.
String Copy in Assembly¶
Copying a string is the same idea with both a load and a store each iteration. Using index-based access (i in t2):
void my_strcpy(char *dst, char *src) {
int i = 0;
do {
dst[i] = src[i]; // copy a byte (including the final '\0')
} while (src[i++] != '\0');
}
# void my_strcpy(char *dst, char *src)
# a0 = dst, a1 = src
.global my_strcpy
my_strcpy:
li t2, 0 # i = 0
strcpy_loop:
add t3, a1, t2 # t3 = &src[i]
lb t1, 0(t3) # t1 = src[i]
add t4, a0, t2 # t4 = &dst[i]
sb t1, 0(t4) # dst[i] = src[i]
beq t1, zero, strcpy_done # stop AFTER copying the '\0'
addi t2, t2, 1 # i++
j strcpy_loop
strcpy_done:
ret
Because each element is one byte, the index offset is the byte offset — there is no * 4 scaling as there would be for an int array. Copying the null terminator before exiting is essential, otherwise the destination would not be a valid C string.
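The same copy can also be written by walking two pointers instead of an index, mirroring the style of `my_strlen`. This is a sketch of an alternative, not the lecture's version. The name `copy_string_ptr` is hypothetical, and the caller is assumed to provide a destination large enough for the string and its terminator.

```c
// Pointer-walking copy: both pointers advance one byte per character.
// Assumes dst has room for the whole string including the terminator.
void copy_string_ptr(char *dst, const char *src) {
    char c;
    do {
        c = *src++;        // load one byte, then advance src by 1
        *dst++ = c;        // store one byte, then advance dst by 1
    } while (c != '\0');   // the '\0' has already been copied when we stop
}
```

In assembly this form needs no address arithmetic per iteration beyond two `addi ..., 1` instructions, since each pointer is already the address of the next byte.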
Calling C Library Functions from Assembly¶
You do not always have to reimplement string routines. Project 3 notes that you can call the C library directly from assembly: declare the symbol .global, set the argument registers, and call it (remember to save any caller-saved registers you still need across the call).
Key Concepts¶
| Concept | Definition | Example |
|---|---|---|
| Byte addressable | Every byte in memory has its own unique address | &x, &x + 1, &x + 2, … |
| Byte | The smallest addressable unit; 8 bits | 0x00–0xFF |
| Endianness | The order in which the bytes of a multi-byte value are stored | RISC-V is little-endian |
| Big-endian | Most significant byte at the lowest address | 0xFFAA1122 → FF AA 11 22 |
| Little-endian | Least significant byte at the lowest address | 0xFFAA1122 → 22 11 AA FF |
| Sign-magnitude | MSB is sign, rest is magnitude; has two zeros and broken arithmetic | 1101 = -5 |
| Two's complement | Signed scheme where ordinary binary addition is correct | 1101 (4-bit) = -3 |
| Negate | `invert(v) + 1`; its own inverse | 0011 → 1101 (3 → -3) |
| Sign extension | Widening by replicating the sign bit into new high bits | 1101 (-3, 4b) → 1111 1101 (-3, 8b) |
| Zero extension | Widening an unsigned value by filling new high bits with 0 | lbu of 0xFD → 0x00000000000000FD |
| Truncation | Narrowing by keeping the low bits, discarding the high bits | (int8_t)0xFFFFFFFD = -3 |
| C string | Null-terminated array of character bytes | "foo" = 'f' 'o' 'o' '\0' |
| `lb` / `sb` | Load/store a single byte (low 8 bits of a register) | lb t0, 0(a0) reads s[0] |
Practice Problems¶
Problem 1: Byte Layout¶
The 32-bit value int y = 0x12345678; is stored at address 0x2000. Write out the byte at each of 0x2000, 0x2001, 0x2002, 0x2003 under both big-endian and little-endian, and state which one RISC-V uses.
Click to reveal solution
The four bytes from MSB to LSB are `0x12 0x34 0x56 0x78`.

| Address | Big-endian | Little-endian |
|---------|-----------|---------------|
| `0x2000` | `0x12` (MSB) | `0x78` (LSB) |
| `0x2001` | `0x34` | `0x56` |
| `0x2002` | `0x56` | `0x34` |
| `0x2003` | `0x78` (LSB) | `0x12` (MSB) |

RISC-V is **little-endian**, so reading a byte at `0x2000` (the lowest address) yields `0x78`, the least significant byte.

Problem 2: Negate a Value¶
Compute the 8-bit two's complement representation of -5. Then verify your answer by negating it back to +5.
Click to reveal solution
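Start with `+5 = 0000 0101`. Invert to get `1111 1010`, then add 1 to get `1111 1011`. So `-5 = 1111 1011 = 0xFB`. Verify by negating again: invert `1111 1011` to get `0000 0100`, then add 1 to get `0000 0101`. We get `+5` back, confirming `invert(v) + 1` is its own inverse.

Problem 3: Sign Extension¶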
The 4-bit two's complement value 1010 is to be widened to 8 bits. What is the 8-bit pattern, and what decimal value does it represent? What if 1010 were an unsigned 4-bit value loaded with lbu-style zero extension?
Click to reveal solution
As a **signed** 4-bit value, the sign bit of `1010` is `1`, so we sign-extend with `1`s: `1010` becomes `1111 1010`. Decimal value: invert `1111 1010` → `0000 0101`, add 1 → `0000 0110 = 6`, so the value is `-6`. (Check the 4-bit table: `1010` = `-6`.) Sign extension preserves the value.

As an **unsigned** 4-bit value, `1010` = 10, and zero extension gives `0000 1010`. Same low bits, but the upper bits are zeros, and the value is `+10`. This is why `lb` (sign-extend) and `lbu` (zero-extend) give different results for bytes with the top bit set.

Problem 4: What Does This C Print?¶
A three-line program declares `int x = 1`, points a byte pointer `p` at `&x`, and prints `p[0]`. On a RISC-V (little-endian) machine, what is printed, and why?
Click to reveal solution
`int x = 1` is `0x00000001`. The four bytes are `01 00 00 00` from low address to high on a **little-endian** machine, because the least significant byte (`0x01`) is stored first. `p` points at the lowest address (`&x`), so `p[0]` reads `0x01`. The program prints `1`. On a big-endian machine `p[0]` would be `0x00` and it would print `0`. This three-line program is a classic endianness detector.

Problem 5: String in Memory¶
Draw the byte-by-byte memory layout of char *s = "Hi!";. How many bytes does it occupy? What does s[3] hold?
Click to reveal solution
The string `"Hi!"` is 3 visible characters plus the null terminator = **4 bytes**: `s[0] = 'H'` (0x48), `s[1] = 'i'` (0x69), `s[2] = '!'` (0x21), `s[3] = '\0'` (0x00). `s[3]` holds the integer `0` (the null terminator `'\0'`), which marks the end of the string. It is *not* the character `'0'` (which would be `0x30`).

Problem 6: Strlen Trace¶
Trace my_strlen (from Section 7) on the string "Hi!". How many times does the loop body execute, and what is returned in a0?
Click to reveal solution
`a0` starts pointing at `'H'`. `t0` (len) starts at 0.

| Iteration | byte loaded (`lb t1`) | `beq` taken? | len after | a0 advanced to |
|-----------|----------------------|--------------|-----------|----------------|
| 1 | `'H'` (0x48) | no | 1 | `'i'` |
| 2 | `'i'` (0x69) | no | 2 | `'!'` |
| 3 | `'!'` (0x21) | no | 3 | `'\0'` |
| 4 | `'\0'` (0x00) | **yes → done** | 3 | — |

The loop body that increments runs **3 times** (once per visible character). On the 4th load the byte is `0`, so `beq t1, zero, strlen_done` branches out before incrementing. `mv a0, t0` puts `3` in `a0`, and the function returns **3**, the correct length.

Further Reading¶
- RISC-V Reference Guide — instruction set, registers, and calling convention
- Key Concepts — endianness, two's complement, and bit manipulation summaries
- Project 3: RISC-V Assembly Language Part 2 — strings, structs, and sign extension in practice
- Source notes: CS315-01 2025-09-16 RISC-V Assembly 6.pdf
- Two's complement (Wikipedia)
- Endianness (Wikipedia)
- ASCII (Wikipedia)
Summary¶
- The processor computes on registers; memory holds code and data. Loads and stores move bytes between registers and RAM, which is partitioned into stack, heap, data, and code.
- Memory is a byte-addressable array. A byte is 8 bits and is the smallest addressable unit; larger values occupy several consecutive bytes.
- Endianness is the order of those bytes. Big-endian stores the most significant byte at the lowest address; little-endian stores the least significant byte first. RISC-V is little-endian, and there is no universal standard, which is why network protocols define a canonical byte order.
- Two's complement is the standard for signed integers because it has a single representation of zero, keeps the MSB as a sign bit, and makes ordinary binary addition correct for both positive and negative numbers, beating sign-magnitude and one's complement.
- Negate with `invert(v) + 1`. Flip all the bits and add one. The same operation converts negative to positive, so it is its own inverse.
- Widen with sign extension, narrow with truncation. To go to more bits, replicate the sign bit into the new high bits (zero-extend for unsigned values). To go to fewer bits, keep the low bits and discard the rest, which is safe only when the value fits.
- C strings are null-terminated byte arrays. Each character is one byte holding an ASCII code, and a trailing `'\0'` (value 0) marks the end; code walks the string until it finds that zero.
- Use the load width that matches the data. `lb`/`sb` for single bytes (only the low 8 bits of a register), `lw` for 32-bit words, `ld` for 64-bit doublewords; advance a string pointer by 1 byte per character.