update

This commit is contained in:
Zheyuan Wu
2026-04-02 15:17:50 -05:00
parent 96f5304400
commit d6bc8375ce
3 changed files with 1025 additions and 2 deletions


@@ -1,3 +1,430 @@
# CSE4303 Introduction to Computer Security (Lecture 17)
> Due to lack of my attention, this lecture note is generated by AI to create continuations of the previous lecture note. I kept this warning because the note was created by AI.
#### Software security
### Administrative notes
#### Project details
- Project plan
- Thursday, `4/9` at the end of class
- `5%`
- Written document and presentation recording
- Thursday, `4/30` at `11:30 AM`
- `15%`
- View peer presentations and provide feedback
- Wednesday, `5/6` at `11:59 PM`
- `5%`
#### Upcoming schedule
- This week (`3/30`)
- software security lecture
- studio
- some time for studio on Tuesday
- Next week (`4/6`)
- fuzzing
- some time to discuss project ideas
- `4/13`
- Web security
- `4/20`
- Privacy and ethics overview
- time to work on projects
- course wrap-up
### Overview
#### Outline
- Context
- Prominent software vulnerabilities and exploits
- Buffer overflows
- Background: C code, compilation, memory layout, execution
- Baseline exploit
- Challenges
- Defenses, countermeasures, counter-countermeasures
Sources:
- SEED lab book
- Goodrich/Tamassia book
- Slides from Bryant/O'Hallaron (CMU), Dan Boneh (Stanford), Michael Hicks (UMD)
### Context
#### Context: computing stack (informal)
| Layer | Example |
| --- | --- |
| Application | web server, standalone app |
| Compiler / assembler | `gcc`, `clang` |
| OS: syscalls | `execve()`, `setuid()`, `write()`, `open()`, `fork()` |
| OS: processes, mem layout | Linux virtual memory layout |
| Architecture (ISA, execution) | x86, x86_64, ARM |
| Hardware | Intel Skylake processor |
- User control is strongest near the application / compiler level.
- System control becomes more important as we move down toward OS, architecture, and hardware.
### Prominent software vulnerabilities and exploits
#### Software security: categories
- Race conditions
- Privilege escalation
- Path traversal
- Environment variable modification
- Language-specific vulnerabilities
- Format string attack
- Buffer overflows
#### Buffer Overflows (BoFs)
- A buffer overflow is a bug that affects low-level code, typically in C and C++, with significant security implications.
- Normally, a program with this bug will simply crash.
- But an attacker can craft the input that triggers the bug so that the program does something much worse:
- Steal private information
- e.g. Heartbleed
- Corrupt valuable information
- Run code of the attacker's choice
#### Application behavior
- Slide contains a figure only.
- Intended point: normal application behavior can become attacker-controlled if input handling is unsafe.
#### BoFs: why do we care?
- Reference from slide: [IEEE Spectrum top programming languages 2025](https://spectrum.ieee.org/top-programming-languages-2025)
#### Critical systems in C/C++
- Most OS kernels and utilities
- `fingerd`
- X Window System server
- shell
- Many high-performance servers
- Microsoft IIS
- Apache `httpd`
- `nginx`
- Microsoft SQL Server
- MySQL
- `redis`
- `memcached`
- Many embedded systems
- Mars rover
- industrial control systems
- automobiles
A successful attack on these systems can be particularly dangerous.
#### Morris Worm
- Slide contains a figure / historical reference only.
- It is included as an example of how memory-corruption vulnerabilities mattered in practice.
#### Why do we still care?
- The slide references the NVD search page: [NVD vulnerability search](https://nvd.nist.gov/vuln/search)
- Why the drop?
- Memory-safe languages
- Rust
- Go
- Stronger defenses
- Fuzzing
- find bugs before release
- Change in development practices
- code review
- static analysis tools
- related engineering improvements
#### MITRE Top 25 2025
- Reference from slide: [MITRE CWE Top 25](http://cwe.mitre.org/top25/)
### Buffer overflows
#### Outline
- System Basics
- Application memory layout
- How function calls work under the hood
- `32-bit x86` only
- `64-bit x86_64` similar, but with important differences
- Buffer overflow
- Overwriting the return address pointer
- Point it to the injected shellcode
#### Buffer Overflows (BoFs)
- 2-minute version first, then all background / full version
#### Process memory layout: virtual address space
- Slide reference: [virtual address space reference](https://hungys.xyz/unix-prog-process-environment/)
#### Process memory layout: function calls
- Slide reference: [Tenouk function call figure 1](http://www.tenouk.com/Bufferoverflowc/Bufferoverflow2.html)
- Slide reference: [Tenouk function call figure 2](http://www.tenouk.com/Bufferoverflowc/Bufferoverflow4.html)
#### Process memory layout: compromised frame
- Slide reference: [Tenouk compromised frame figure](http://www.tenouk.com/Bufferoverflowc/Bufferoverflow4.html)
#### Computer System
High-level examples used in the slide:
```c
car *c = malloc(sizeof(car));
c->miles = 100;
c->gals = 17;
float mpg = get_mpg(c);
free(c);
```
```java
Car c = new Car();
c.setMiles(100);
c.setGals(17);
float mpg = c.getMPG();
```
Assembly-language example used in the slide:
```asm
get_mpg:
    pushq %rbp
    movq %rsp, %rbp
    ...
    popq %rbp
    ret
```
- The same computation can be viewed at multiple levels:
- C / Java source
- assembly language
- machine code
- operating system context
#### Little Theme 1: Representation
- All digital systems represent everything as `0`s and `1`s.
- The `0` and `1` are really two different voltage ranges in wires.
- Or magnetic positions on a disk, hole depths on a DVD, or even DNA.
- "Everything" includes:
- numbers
- integers and floating point
- characters
- building blocks of strings
- instructions
- directives to the CPU that make up a program
- pointers
- addresses of data objects stored in memory
- These encodings are stored throughout the computer system.
- registers
- caches
- memories
- disks
- They all need addresses.
- find an item
- find a place for a new item
- reclaim memory when data is no longer needed
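As a small illustration of the representation theme, the sketch below copies the bytes that make up a 32-bit integer into a byte array and back. The helper names `int_bytes` and `bytes_int` are hypothetical and not from the slides, and the byte order you observe depends on the machine (x86 is little-endian).

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not from the slides): expose the raw bytes
 * that represent a 32-bit unsigned integer in memory. */
void int_bytes(uint32_t value, unsigned char out[4])
{
    memcpy(out, &value, sizeof value);  /* copy the object representation */
}

/* Rebuild the integer from the same bytes: nothing is lost. */
uint32_t bytes_int(const unsigned char in[4])
{
    uint32_t value;
    memcpy(&value, in, sizeof value);
    return value;
}
```

Calling `int_bytes(0x41424344, b)` on an x86 machine should leave `0x44` in `b[0]`, since the least significant byte is stored first.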
#### Little Theme 2: Translation
- There is a big gap between how we think about programs / data and the `0`s and `1`s of computers.
- We need languages to describe what we mean.
- These languages must be translated one level at a time.
- Example point from the slide:
- we know Java as a programming language
- but we must work down to the `0`s and `1`s of computers
- we try not to lose anything in translation
- we encounter Java bytecode, C, assembly, and machine code
#### Little Theme 3: Control Flow
- How do computers orchestrate everything they are doing?
- Within one program:
- How are `if/else`, loops, and switches implemented?
- How do we track nested procedure calls?
- How do we know what to do upon `return`?
- At the operating-system level:
- library loading
- sharing system resources
- memory
- I/O
- disks
#### HW/SW Interface: Code / Compile / Run Times
- Code time
- user program in C
- `.c` file
- Compile time
- C compiler
- assembler
- Run time
- executable `.exe` file
- hardware executes it
- Note from slide:
- the compiler and assembler are themselves just programs developed using this same process
#### Assembly Programmer's View
- Programmer-visible CPU / memory state
- Program counter
- address of next instruction
- called `RIP` in x86-64
- Named registers
- heavily used program data
- together called the register file
- Condition codes
- store status information about most recent arithmetic operation
- used for conditional branching
- Memory
- byte-addressable array
- contains code and user data
- includes the stack for supporting procedures
#### Turning C into Object Code
- Code in files `p1.c` and `p2.c`
- Compile with:
```bash
gcc -Og p1.c p2.c -o p
```
- Notes from the slide
- `-Og` uses basic optimizations
- resulting machine code goes into file `p`
- Translation chain
- C program -> assembly program -> object program -> executable program
- Associated tools
- compiler
- assembler
- linker
- static libraries (`.a`)
#### Machine Instruction Example
- C code
```c
*dest = t;
```
- Meaning
- store value `t` where designated by `dest`
- Assembly
```asm
movq %rsi, (%rdx)
```
- Interpretation
- move 8-byte value to memory
- operands
- `t` is in register `%rsi`
- `dest` is in register `%rdx`
- `*dest` means memory `M[%rdx]`
- Object code
```text
0x400539: 48 89 32
```
- It is a 3-byte instruction stored at address `0x400539`.
#### IA32 Registers - 32 bits wide
- General-purpose register families shown in the slide
- `%eax`, `%ax`, `%ah`, `%al`
- `%ecx`, `%cx`, `%ch`, `%cl`
- `%edx`, `%dx`, `%dh`, `%dl`
- `%ebx`, `%bx`, `%bh`, `%bl`
- `%esi`, `%si`
- `%edi`, `%di`
- `%esp`, `%sp`
- `%ebp`, `%bp`
- Roles highlighted in the slide
- accumulator
- counter
- data
- base
- source index
- destination index
- stack pointer
- base pointer
#### Data Sizes
- Slide is primarily a figure summarizing common integer widths and sizes.
#### Assembly Data Types
- "Integer" data of `1`, `2`, `4`, or `8` bytes
- data values
- addresses / untyped pointers
- No aggregate types such as arrays or structures at the assembly level
- just contiguous bytes in memory
- Two common syntaxes
- `AT&T`
- used in the course, slides, textbook, GNU tools
- `Intel`
- used in Intel documentation and Intel tools
- Need to know which syntax you are reading because operand order may be reversed.
#### Three Basic Kinds of Instructions
- Transfer data between memory and register
- load
- `%reg = Mem[address]`
- store
- `Mem[address] = %reg`
- Perform arithmetic on register or memory data
- examples: addition, shifting, bitwise operations
- Control flow
- unconditional jumps to / from procedures
- conditional branches
#### Abstract Memory Layout
```text
High addresses
Stack <- local variables, procedure context
Dynamic Data <- heap, new / malloc
Static Data <- globals / static variables
Literals <- large constants such as strings
Instructions
Low addresses
```
#### The ELF File Format
- ELF = Executable and Linkable Format
- One of the most widely used binary object formats
- ELF is architecture-independent
- ELF file types
- Relocatable
- must be fixed by the linker before execution
- Executable
- ready for execution
- Shared
- shared libraries with linking information
- Core
- core dumps created when a program terminates with a fault
- Tools mentioned on slide
- `readelf`
- `file`
- `objdump -D`
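Tools such as `file` recognize an ELF binary by the four magic bytes at the start of the file (`0x7f`, `'E'`, `'L'`, `'F'`). A minimal sketch of that check, assuming the first bytes of the file have already been read into a buffer; the function name `looks_like_elf` is hypothetical:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: return 1 if the buffer starts with the ELF
 * magic number (0x7f 'E' 'L' 'F'), 0 otherwise. */
int looks_like_elf(const unsigned char *header, size_t len)
{
    static const unsigned char magic[4] = { 0x7f, 'E', 'L', 'F' };
    return len >= sizeof magic && memcmp(header, magic, sizeof magic) == 0;
}
```

This only inspects the magic number; it does not validate the rest of the header.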
#### Process Memory Layout (32-bit x86 machine)
- This slide is primarily a diagram.
- Key idea: a `32-bit x86` process has a standard virtual memory layout with code, static data, heap, and stack arranged in distinct regions.
We continue with the concrete runtime layout and the actual overflow mechanics in Lecture 18.


@@ -0,0 +1,594 @@
# CSE4303 Introduction to Computer Security (Lecture 18)
> Due to lack of my attention, this lecture note is generated by AI to create continuations of the previous lecture note. I kept this warning because the note was created by AI.
#### Software security
### Overview
#### Outline
- Context
- Prominent software vulnerabilities and exploits
- Buffer overflows
- Background: C code, compilation, memory layout, execution
- Baseline exploit
- Challenges
- Defenses, countermeasures, counter-countermeasures
### Buffer overflows
#### All programs are stored in memory
- The process's view of memory is that it owns all of it.
- For a `32-bit` process, the virtual address space runs from:
- `0x00000000`
- to `0xffffffff`
- In reality, these are virtual addresses.
- The OS and CPU map them to physical addresses.
#### The instructions themselves are in memory
- Program text is also stored in memory.
- The slide shows instructions such as:
```asm
0x4c2 sub $0x224,%esp
0x4c1 push %ecx
0x4bf mov %esp,%ebp
0x4be push %ebp
```
- Important point:
- code and data are both memory-resident
- control flow therefore depends on values stored in memory
#### Data's location depends on how it's created
- Static initialized data example
```c
static const int y = 10;
```
- Static uninitialized data example
```c
static int x;
```
- Command-line arguments and environment are set when the process starts.
- Stack data appears when functions run.
```c
int f() {
    int x;
    ...
}
```
- Heap data appears at runtime.
```c
malloc(sizeof(long));
```
- Summary from the slide
- Known at compile time
- text
- initialized data
- uninitialized data
- Set when process starts
- command line and environment
- Runtime
- stack
- heap
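The fragments above can be combined into one compilable sketch that touches each kind of storage once. The function name `touch_regions` and the values used are assumptions for illustration only.

```c
#include <stdlib.h>

static const int y = 10;   /* static initialized data */
static int x;              /* static uninitialized data (zeroed at start) */

/* Hypothetical demo: returns the sum of one value from each region so
 * every kind of storage is actually used. Returns -1 if malloc fails. */
long touch_regions(void)
{
    int on_stack = 1;                      /* stack: exists while this call runs */
    long *on_heap = malloc(sizeof(long));  /* heap: allocated at runtime */
    if (on_heap == NULL)
        return -1;
    *on_heap = 100;
    long total = y + x + on_stack + *on_heap;
    free(on_heap);
    return total;
}
```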
#### We are going to focus on runtime attacks
- Stack and heap grow in opposite directions.
- Compiler-generated instructions adjust the stack size at runtime.
- The stack pointer tracks the active top of the stack.
- Repeated `push` instructions place values onto the stack.
- The slides use the sequence:
- `push 1`
- `push 2`
- `push 3`
- `return`
- Heap allocation is apportioned by the OS and managed in-process by `malloc`.
- The lecture says: focusing on the stack for now.
```text
0x00000000 0xffffffff
Heap ---------------------------------> <--------------------------------- Stack
```
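To make the `push 1`, `push 2`, `push 3` sequence concrete, here is a toy model of a downward-growing stack. It is a sketch under stated assumptions, not real hardware: the stack is an array of 32-bit words, `sp` is an array index rather than an address, and the names `toy_stack`, `toy_init`, and `toy_push` are hypothetical.

```c
#include <stdint.h>

#define STACK_WORDS 16

/* Toy model (an assumption for illustration): sp starts one past the
 * highest slot and moves toward lower indices on each push. */
typedef struct {
    uint32_t mem[STACK_WORDS];
    int sp;
} toy_stack;

void toy_init(toy_stack *s) { s->sp = STACK_WORDS; }

/* push: move the stack pointer down, then store the value there.
 * Returns 0 on success, -1 if the toy stack is full. */
int toy_push(toy_stack *s, uint32_t value)
{
    if (s->sp == 0)
        return -1;
    s->sp--;
    s->mem[s->sp] = value;
    return 0;
}
```

After pushing 1, 2, and 3, the value 3 sits at `mem[sp]` (the top of the stack) and 1 sits in the highest slot.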
#### Stack layout when calling functions
Questions asked on the slide:
- What do we do when we call a function?
- What data need to be stored?
- Where do they go?
- How do we return from a function?
- What data need to be restored?
- Where do they come from?
Example used in the slide:
```c
void func(char *arg1, int arg2, int arg3)
{
    char loc1[4];
    int loc2;
    int loc3;
}
```
Important layout points:
- Arguments are pushed in reverse order of code.
- Local variables are pushed in the same order as they appear in the code.
- The slide then introduces two unknown slots between locals and arguments.
#### Accessing variables
Example:
```c
void func(char *arg1, int arg2, int arg3)
{
    char loc1[4];
    int loc2;
    int loc3;
    ...
    loc2++;
    ...
}
```
Question from the slide:
- Where is `loc2`?
Step-by-step answer developed in the slides:
- Its absolute address is undecidable at compile time.
- We do not know exactly where `loc2` is in absolute memory.
- We do not know how many arguments there are in general.
- But `loc2` is always a fixed offset before the frame metadata.
- This motivates the frame pointer.
Definitions from the slide:
- Stack frame
- the current function call's region on the stack
- Frame pointer
- `%ebp`
- Example answer
- `loc2` is at `-8(%ebp)`
#### Notation
- `%ebp`
- a memory address stored in the frame-pointer register
- `(%ebp)`
- the value at memory address `%ebp`
- like dereferencing a pointer
The slide sequence then shows:
```asm
pushl %ebp
movl %esp, %ebp
```
- Meaning:
- first save the old frame pointer on the stack
- then set the new frame pointer to the current stack pointer
#### Returning from functions
Example caller:
```c
int main()
{
    ...
    func("Hey", 10, -3);
    ...
}
```
Questions from the slides:
- How do we restore `%ebp`?
- How do we resume execution at the correct place?
Slide answers:
- Push `%ebp` before locals.
- Set `%ebp` to current `%esp`.
- Set `%ebp` to `(%ebp)` at return.
- Push next `%eip` before `call`.
- Set `%eip` to `4(%ebp)` at return.
#### Stack and functions: Summary
- Calling function
- push arguments onto the stack in reverse order
- push the return address
- the address of the instruction that should run after control returns
- jump to the function's address
- Called function
- push old frame pointer `%ebp` onto the stack
- set frame pointer `%ebp` to current `%esp`
- push local variables onto the stack
- access locals as offsets from `%ebp`
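- Returning function
- reset previous stack frame
- `%ebp = (%ebp)`
- jump back to return address
- `%eip = 4(%ebp)`
- here `4(%ebp)` is read relative to the callee's frame pointer, i.e. before `%ebp` is restored; in practice `leave` followed by `ret` does this by resetting `%esp` to `%ebp`, popping the saved `%ebp`, and then popping the return address into `%eip`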
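The call, prologue, and return steps summarized above can be traced with a toy machine. This is only a sketch under simplifying assumptions: memory is word-addressed, so `mem[ebp + 1]` plays the role of `4(%ebp)` and `mem[ebp + 2]` the role of `8(%ebp)`. All names (`toy_cpu`, `toy_call`, `toy_prologue`, `toy_return`) are hypothetical.

```c
#include <stdint.h>

#define WORDS 64

/* Toy 32-bit machine (illustrative assumption): word-addressed memory,
 * with esp/ebp holding word indices instead of byte addresses. */
typedef struct {
    uint32_t mem[WORDS];
    uint32_t esp, ebp, eip;
} toy_cpu;

static void push(toy_cpu *c, uint32_t v) { c->mem[--c->esp] = v; }
static uint32_t pop(toy_cpu *c) { return c->mem[c->esp++]; }

/* Caller side: push arguments in reverse order, push the return
 * address, then jump to the callee. */
void toy_call(toy_cpu *c, uint32_t callee, uint32_t ret_addr,
              const uint32_t *args, int nargs)
{
    for (int i = nargs - 1; i >= 0; i--)
        push(c, args[i]);
    push(c, ret_addr);
    c->eip = callee;
}

/* Callee prologue: save old ebp, set ebp = esp, reserve locals. */
void toy_prologue(toy_cpu *c, uint32_t nlocals)
{
    push(c, c->ebp);
    c->ebp = c->esp;
    c->esp -= nlocals;
}

/* Callee epilogue: drop locals, restore ebp, pop the return address
 * into eip. The arguments are left for the caller to clean up. */
void toy_return(toy_cpu *c)
{
    c->esp = c->ebp;
    c->ebp = pop(c);
    c->eip = pop(c);
}
```

In this model the saved frame pointer and the return address sit directly above the locals, which is why an overflow of a local buffer toward higher addresses reaches them.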
#### Quick overview (again)
- Buffer
- contiguous set of a given data type
- common in C
- all strings are buffers of `char`
- Overflow
- put more into the buffer than it can hold
- Question
- where does the extra data go?
- Slide answer
- now that we know memory layouts, we can reason about where the overwrite lands
#### A buffer overflow example
Example 1 from the slide:
```c
void func(char *arg1)
{
    char buffer[4];
    strcpy(buffer, arg1);
    ...
}
int main()
{
    char *mystr = "AuthMe!";
    func(mystr);
    ...
}
```
Step-by-step effect shown in the slides:
- Initial stack region includes:
- `buffer`
- saved `%ebp`
- saved `%eip`
- `&arg1`
- First 4 bytes copied:
- `A u t h`
- Remaining bytes continue writing:
- `M e ! \0`
- Because `strcpy` keeps copying until it sees `\0`, bytes go past the end of the buffer.
- In the example, upon return:
- `%ebp` becomes `0x0021654d`
- Result:
- segmentation fault
- shown as `SEGFAULT (0x00216551)` in the slide sequence
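The value `0x0021654d` is simply the four overflowing bytes `M e ! \0` read as a little-endian 32-bit word. A short sketch of that arithmetic, using a hypothetical helper `le32`:

```c
#include <stdint.h>

/* Hypothetical helper: interpret 4 bytes as a little-endian 32-bit
 * value, the way 32-bit x86 reads a saved register from the stack. */
uint32_t le32(const unsigned char b[4])
{
    return (uint32_t)b[0]
         | ((uint32_t)b[1] << 8)
         | ((uint32_t)b[2] << 16)
         | ((uint32_t)b[3] << 24);
}
```

With `'M' = 0x4d`, `'e' = 0x65`, `'!' = 0x21`, and `'\0' = 0x00`, the first byte written becomes the least significant byte, giving `0x0021654d`.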
#### A buffer overflow example: changing control data vs. changing program data
Example 2 from the slide:
```c
void func(char *arg1)
{
    int authenticated = 0;
    char buffer[4];
    strcpy(buffer, arg1);
    if (authenticated) { ... }
}
int main()
{
    char *mystr = "AuthMe!";
    func(mystr);
    ...
}
```
Step-by-step effect shown in the slides:
- Initial stack contains:
- `buffer`
- `authenticated`
- saved `%ebp`
- saved `%eip`
- `&arg1`
- Overflow writes:
- `A u t h` into `buffer`
- `M e ! \0` into `authenticated`
- Result:
- code still runs
- user now appears "authenticated"
Important lesson:
- A buffer overflow does not need to crash.
- It may silently change program data or logic.
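The effect can be reproduced without undefined behavior by modeling the frame as a struct. This is a sketch under assumptions: real compilers choose their own stack layout, the struct is assumed to have no padding between its fields, and the names `struct frame` and `simulate_overflow` are hypothetical.

```c
#include <string.h>

/* Stand-in for the stack frame in Example 2 (illustrative assumption).
 * The struct keeps `authenticated` directly after `buffer`. */
struct frame {
    char buffer[4];
    int authenticated;
};

/* Copy the input over the frame the way an unbounded strcpy would run
 * past `buffer`, but clamp the copy to the struct so the demo itself
 * stays within memory it owns. Returns the resulting flag. */
int simulate_overflow(const char *input)
{
    struct frame f;
    memset(&f, 0, sizeof f);            /* authenticated = 0 */

    size_t n = strlen(input) + 1;       /* include the terminating '\0' */
    if (n > sizeof f)
        n = sizeof f;
    memcpy(&f, input, n);               /* bytes 4..7 land in authenticated */

    return f.authenticated;
}
```

`simulate_overflow("Hey")` fits in the buffer and returns 0, while `simulate_overflow("AuthMe!")` returns a nonzero value, so a check like `if (authenticated)` would pass.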
#### `gets` vs `fgets`
Unsafe function shown in the slide:
```c
void vulnerable()
{
    char buf[80];
    gets(buf);
}
```
Safer version shown in the slide:
```c
void safe()
{
    char buf[80];
    fgets(buf, 64, stdin);
}
```
Even safer pattern from the next slide:
```c
void safer()
{
    char buf[80];
    fgets(buf, sizeof(buf), stdin);
}
```
Reference from slide:
- [List of vulnerable C functions](https://security.web.cern.ch/security/recommendations/en/codetools/c.shtml)
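One way to package the `fgets` pattern is a small wrapper that also checks for end of input and strips the trailing newline. This is a sketch, not something shown in the slides, and the name `read_line` is hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: read one line into buf (capacity cap, cap > 0),
 * never writing more than cap bytes. Returns 0 on success, -1 on
 * EOF or read error. */
int read_line(FILE *in, char *buf, size_t cap)
{
    if (fgets(buf, (int)cap, in) == NULL)
        return -1;
    buf[strcspn(buf, "\n")] = '\0';   /* drop the newline if one was read */
    return 0;
}
```

If the line is longer than the buffer, the rest stays in the stream for the next read, so a caller that needs whole lines still has to handle that case.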
#### User-supplied strings
- In the toy examples, the strings are constant.
- In reality they come from users in many ways:
- text input
- packets
- environment variables
- file input
- Validating assumptions about user input is extremely important.
#### What's the worst that could happen?
Using:
```c
char buffer[4];
strcpy(buffer, arg1);
```
- `strcpy` will let you write as much as you want until a `\0`.
- If attacker-controlled input is long enough, the memory past the buffer becomes "all ours" from the attacker's perspective.
- That raises the key question from the slide:
- what could you write to memory to wreak havoc?
#### Code injection
- Title-only transition slide.
- It introduces the move from accidental overwrite to deliberate attacker payloads.
#### High-level idea
Example used in the slide:
```c
void func(char *arg1)
{
    char buffer[4];
    sprintf(buffer, arg1);
    ...
}
```
Two-step plan shown in the slides:
- 1. Load my own code into memory.
- 2. Somehow get `%eip` to point to it.
The slide sequence draws this as:
- vulnerable buffer on stack
- attacker-controlled bytes placed in memory
- `%eip` redirected toward those bytes
#### This is nontrivial
- Pulling off this attack requires getting a few things really right, and some things only sorta right.
- The lecture says to think about what is tricky about the attack.
- Main security idea:
- the key to defending it is to make the hard parts really hard
#### Challenge 1: Loading code into memory
- The attacker payload must be machine-code instructions.
- already compiled
- ready to run
- We have to be careful in how we construct it.
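- It cannot contain any all-zero (`0x00`) bytes.
- otherwise string routines such as `strcpy` and `sprintf` stop copying at the first zero byte (`gets` and `scanf` stop at newline or whitespace instead, so those bytes must be avoided for them)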
- It cannot make use of the loader.
- because we are injecting the bytes directly
- It cannot use the stack.
- because we are in the process of smashing it
- The lecture then gives the name:
- shellcode
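The "no zero bytes" constraint can be checked mechanically. The sketch below scans a byte sequence for the first `0x00`; the helper name `first_null_byte` is hypothetical. It also shows why shellcode tends to zero a register with `xorl %eax, %eax` (bytes `31 c0`) rather than loading a small constant: an encoding such as `b8 0b 00 00 00` (`movl $0xb, %eax`) contains zero bytes.

```c
#include <stddef.h>

/* Hypothetical helper: return the index of the first 0x00 byte in a
 * byte sequence, or -1 if there is none. A string copy such as strcpy
 * would stop at that index. */
long first_null_byte(const unsigned char *bytes, size_t len)
{
    for (size_t i = 0; i < len; i++)
        if (bytes[i] == 0x00)
            return (long)i;
    return -1;
}
```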
#### What kind of code would we want to run?
- Goal: full-purpose shell
- code to launch a shell is called shellcode
- it is nontrivial to write shellcode that works as injected code
- no zeroes
- cannot use the stack
- no loader dependence
- there are many shellcodes already written
- there are even competitions for writing the smallest shellcode
- Goal: privilege escalation
- ideally, attacker goes from guest or non-user to root
#### Shellcode
High-level C version shown in the slides:
```c
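#include <stdio.h>
#include <unistd.h>   /* declares execve */
int main(void) {
    char *name[2];
    name[0] = "/bin/sh";   /* program to run */
    name[1] = NULL;        /* argv must end with NULL */
    execve(name[0], name, NULL);
    return 1;              /* reached only if execve fails */
}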
```
Assembly version shown in the slides:
```asm
xorl %eax, %eax
pushl %eax
pushl $0x68732f2f
pushl $0x6e69622f
movl %esp, %ebx
pushl %eax
...
```
Machine-code bytes shown in the slides:
```text
"\x31\xc0"
"\x50"
"\x68""//sh"
"\x68""/bin"
"\x89\xe3"
"\x50"
...
```
Important point from the slide:
- those machine-code bytes can become part of the attacker's input
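The two `pushl` immediates in the assembly are just the path string packed into 32-bit little-endian words. The sketch below decodes them; the helper name `imm_to_text` is hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper: write the 4 bytes of a 32-bit immediate in
 * little-endian order (as x86 stores them), followed by '\0' so the
 * result can be compared or printed as a C string. */
void imm_to_text(uint32_t imm, char out[5])
{
    for (int i = 0; i < 4; i++)
        out[i] = (char)((imm >> (8 * i)) & 0xff);
    out[4] = '\0';
}
```

`0x68732f2f` decodes to `//sh` and `0x6e69622f` to `/bin`. Because the stack grows downward, pushing `//sh` first and `/bin` second leaves `/bin//sh` in memory at `%esp`; the doubled slash pads the string to a multiple of four bytes so that neither immediate contains a zero byte.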
#### Challenge 2: Getting our injected code to run
- We cannot insert a fresh "jump into my code" instruction.
- We must use whatever code is already running.
#### Hijacking the saved `%eip`
- Strategy:
- overwrite the saved return address
- make it point into the injected bytes
- Core idea:
- when the function returns, the CPU loads the overwritten return address into `%eip`
Question raised by the slides:
- But how do we know the address?
Failure mode shown in the slide sequence:
- if the guessed address is wrong, the CPU tries to execute data bytes
- this is most likely not valid code
- result:
- invalid instruction
- CPU "panic" / crash
#### Challenge 3: Finding the return address
- If we do not have the code, we may not know how far the buffer is from the saved `%ebp`.
- One approach:
- try many different values
- Worst case:
- `2^32` possible addresses on `32-bit`
- `2^64` possible addresses on `64-bit`
- But without address randomization:
- the stack always starts from the same fixed address
- the stack grows, but usually not very deeply unless heavily recursive
#### Improving our chances: nop sleds
- `nop` is a single-byte instruction.
- Definition:
- it does nothing except move execution to the next instruction
- NOP sled idea:
- put a long sequence of `nop` bytes before the real malicious code
- now jumping anywhere in that region still works
- execution slides down into the payload
Why this helps:
- it increases the chance that an approximate address guess still succeeds
- the slides explicitly state:
- now we improve our chances of guessing by a factor of `#nops`
```text
[padding][saved return address guess][nop nop nop ...][malicious code]
```
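The "factor of `#nops`" claim can be checked with a toy model that counts how many starting addresses lead to the payload. This is a sketch under simplifying assumptions: "execution" only understands the single-byte x86 `nop` opcode `0x90`, any other byte before the payload counts as a crash, and the function names are hypothetical.

```c
#include <stddef.h>

#define NOP 0x90      /* x86 single-byte nop opcode */

/* Toy model (illustrative assumption): "execute" from index start.
 * A nop just advances to the next byte; execution succeeds only if it
 * reaches payload_start, the first byte of the real code. */
int reaches_payload(const unsigned char *mem, size_t len,
                    size_t start, size_t payload_start)
{
    size_t pc = start;
    while (pc < len) {
        if (pc == payload_start)
            return 1;
        if (mem[pc] != NOP)
            return 0;          /* landed on non-code bytes: crash */
        pc++;
    }
    return 0;
}

/* Count how many starting addresses in [0, len) succeed. */
size_t count_good_guesses(const unsigned char *mem, size_t len,
                          size_t payload_start)
{
    size_t good = 0;
    for (size_t s = 0; s < len; s++)
        good += (size_t)reaches_payload(mem, len, s, payload_start);
    return good;
}
```

With no sled, only the exact payload address works. With a sled of 8 nops directly in front of the payload, 9 starting addresses work, which matches the rough factor of `#nops` stated in the slides.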
#### Putting it all together
- Payload components shown in the slides:
- padding
- guessed return address
- NOP sled
- malicious code
- Constraint noted by the lecture:
- input has to start wherever the vulnerable `gets` / similar function begins writing
#### Buffer overflow defense #1: use secure bounds-checking functions
- User-level protection
- Replace unbounded routines with bounded ones.
- Prefer memory-safe languages where possible:
- Java
- Rust
- etc.
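As one possible shape for a bounded replacement of `strcpy`, the sketch below uses `snprintf` with the destination size and reports truncation to the caller. The helper name `copy_bounded` is hypothetical and this is not a function from the slides.

```c
#include <stdio.h>

/* Hypothetical helper: copy src into dst (capacity cap) without ever
 * writing past dst. Returns 0 if the whole string fit, -1 if it was
 * truncated or on error. dst is always NUL-terminated when cap > 0. */
int copy_bounded(char *dst, size_t cap, const char *src)
{
    int needed = snprintf(dst, cap, "%s", src);
    if (needed < 0 || (size_t)needed >= cap)
        return -1;
    return 0;
}
```

With a 4-byte destination, copying `"Hey"` succeeds, while copying `"AuthMe!"` stores only `"Aut"` and returns -1, so the caller can reject the input instead of overflowing.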
#### Buffer overflow defense #2: Address Space Layout Randomization (ASLR)
- Randomize starting address of program regions.
- Goal:
- prevent attacker from guessing / finding the correct address to put in the return-address slot
- OS-level protection
#### Buffer overflow counter-technique: NOP sled
- Counter-technique against uncertain addresses
- By jumping somewhere into a wide sled, exact address knowledge becomes less necessary
#### Buffer overflow defense #3: Canary
- Put a guard value between vulnerable local data and control-flow data.
- If overflow changes the canary, the program can detect corruption before returning.
- OS-level / compiler-assisted protection in the lecture framing
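The canary idea can be modeled with a struct that places a guard word between the buffer and a stand-in return address. This is a sketch under assumptions: the layout and the names `guarded_frame`, `CANARY`, and `copy_and_check` are hypothetical, and the canary is a fixed constant here, whereas real implementations pick a random value at program start.

```c
#include <stdint.h>
#include <string.h>

#define CANARY 0xdeadc0deu   /* fixed for the demo; real canaries are random */

/* Stand-in for a protected frame (illustrative layout assumption). */
struct guarded_frame {
    char buffer[8];
    uint32_t canary;
    uint32_t return_address;
};

/* Copy len bytes over the frame like an unchecked write starting at
 * buffer (clamped to the struct so the demo stays in bounds), then
 * check the canary as a function epilogue would.
 * Returns 1 if the canary is intact, 0 if corruption was detected. */
int copy_and_check(const void *input, size_t len)
{
    struct guarded_frame f;
    memset(&f, 0, sizeof f);
    f.canary = CANARY;
    f.return_address = 0x08048000u;

    if (len > sizeof f)
        len = sizeof f;
    memcpy(&f, input, len);

    return f.canary == CANARY;
}
```

Any write long enough to reach `return_address` must pass through `canary` first, so the check fails before the corrupted address would be used.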
#### Buffer overflow defense #4: No-execute bits (NX)
- Mark the stack as not executable.
- Requires hardware support.
- OS / hardware-level protection
#### Buffer overflow counter-technique: ret-to-libc and ROP
- Code in the C library is already stored at consistent addresses.
- Attacker can find code in the C library that has the desired effect.
- possibly heavily fragmented
- Then return to the necessary address or addresses in the proper order.
- This is the motivation behind:
- `ret-to-libc`
- Return-Oriented Programming (ROP)
We will continue from defenses / exploitation follow-ups in the next lecture.


@@ -20,5 +20,7 @@ export default {
CSE4303_L13: "Introduction to Computer Security (Lecture 13)",
CSE4303_L14: "Introduction to Computer Security (Lecture 14)",
CSE4303_L15: "Introduction to Computer Security (Lecture 15)",
CSE4303_L16: "Introduction to Computer Security (Lecture 16)",
CSE4303_L17: "Introduction to Computer Security (Lecture 17)",
CSE4303_L18: "Introduction to Computer Security (Lecture 18)"
}