update
@@ -1,3 +1,430 @@
|
|||||||
# CSE4303 Introduction to Computer Security (Lecture 17)
|
||||||
|
|
||||||
## Software security
|
> Because I did not give this lecture enough attention, this note was generated by AI as a continuation of the previous lecture note. I am keeping this warning because the note is AI-generated.
|
||||||
|
|
||||||
|
#### Software security
|
||||||
|
|
||||||
|
### Administrative notes
|
||||||
|
|
||||||
|
#### Project details
|
||||||
|
|
||||||
|
- Project plan
  - Thursday, `4/9` at the end of class
  - `5%`
- Written document and presentation recording
  - Thursday, `4/30` at `11:30 AM`
  - `15%`
- View peer presentations and provide feedback
  - Wednesday, `5/6` at `11:59 PM`
  - `5%`
|
||||||
|
|
||||||
|
#### Upcoming schedule
|
||||||
|
|
||||||
|
- This week (`3/20`)
  - software security lecture
  - studio
  - some time for studio on Tuesday
- Next week (`4/6`)
  - fuzzing
  - some time to discuss project ideas
- `4/13`
  - Web security
- `4/20`
  - Privacy and ethics overview
  - time to work on projects
  - course wrap-up
|
||||||
|
|
||||||
|
### Overview
|
||||||
|
|
||||||
|
#### Outline
|
||||||
|
|
||||||
|
- Context
|
||||||
|
- Prominent software vulnerabilities and exploits
|
||||||
|
- Buffer overflows
|
||||||
|
- Background: C code, compilation, memory layout, execution
|
||||||
|
- Baseline exploit
|
||||||
|
- Challenges
|
||||||
|
- Defenses, countermeasures, counter-countermeasures
|
||||||
|
|
||||||
|
Sources:
|
||||||
|
- SEED lab book
|
||||||
|
- Gilbert/Tamassia book
|
||||||
|
- Slides from Bryant/O'Hallaron (CMU), Dan Boneh (Stanford), Michael Hicks (UMD)
|
||||||
|
|
||||||
|
### Context
|
||||||
|
|
||||||
|
#### Context: computing stack (informal)
|
||||||
|
|
||||||
|
| Layer | Example |
| --- | --- |
| Application | web server, standalone app |
| Compiler / assembler | `gcc`, `clang` |
| OS: syscalls | `execve()`, `setuid()`, `write()`, `open()`, `fork()` |
| OS: processes, memory layout | Linux virtual memory layout |
| Architecture (ISA, execution) | x86, x86_64, ARM |
| Hardware | Intel Skylake processor |
|
||||||
|
|
||||||
|
- User control is strongest near the application / compiler level.
|
||||||
|
- System control becomes more important as we move down toward OS, architecture, and hardware.
|
||||||
|
|
||||||
|
### Prominent software vulnerabilities and exploits
|
||||||
|
|
||||||
|
#### Software security: categories
|
||||||
|
|
||||||
|
- Race conditions
|
||||||
|
- Privilege escalation
|
||||||
|
- Path traversal
|
||||||
|
- Environment variable modification
|
||||||
|
- Language-specific vulnerabilities
|
||||||
|
- Format string attack
|
||||||
|
- Buffer overflows
|
||||||
|
|
||||||
|
#### Buffer Overflows (BoFs)
|
||||||
|
|
||||||
|
- A buffer overflow is a bug that affects low-level code, typically in C and C++, with significant security implications.
|
||||||
|
- Normally, a program with this bug will simply crash.
|
||||||
|
- But an attacker can craft the conditions that trigger the bug to make the program do something much worse.
|
||||||
|
- Steal private information
|
||||||
|
- e.g. Heartbleed
|
||||||
|
- Corrupt valuable information
|
||||||
|
- Run code of the attacker's choice
|
||||||
|
|
||||||
|
#### Application behavior
|
||||||
|
|
||||||
|
- Slide contains a figure only.
|
||||||
|
- Intended point: normal application behavior can become attacker-controlled if input handling is unsafe.
|
||||||
|
|
||||||
|
#### BoFs: why do we care?
|
||||||
|
|
||||||
|
- Reference from slide: [IEEE Spectrum top programming languages 2025](https://spectrum.ieee.org/top-programming-languages-2025)
|
||||||
|
|
||||||
|
#### Critical systems in C/C++
|
||||||
|
|
||||||
|
- Most OS kernels and utilities
|
||||||
|
- `fingerd`
|
||||||
|
- X Window System server
|
||||||
|
- shell
|
||||||
|
- Many high-performance servers
|
||||||
|
- Microsoft IIS
|
||||||
|
- Apache `httpd`
|
||||||
|
- `nginx`
|
||||||
|
- Microsoft SQL Server
|
||||||
|
- MySQL
|
||||||
|
- `redis`
|
||||||
|
- `memcached`
|
||||||
|
- Many embedded systems
|
||||||
|
- Mars rover
|
||||||
|
- industrial control systems
|
||||||
|
- automobiles
|
||||||
|
|
||||||
|
A successful attack on these systems can be particularly dangerous.
|
||||||
|
|
||||||
|
#### Morris Worm
|
||||||
|
|
||||||
|
- Slide contains a figure / historical reference only.
|
||||||
|
- It is included as an example of how memory-corruption vulnerabilities mattered in practice.
|
||||||
|
|
||||||
|
#### Why do we still care?
|
||||||
|
|
||||||
|
- The slide references the NVD search page: [NVD vulnerability search](https://nvd.nist.gov/vuln/search)
|
||||||
|
- Why the drop?
|
||||||
|
- Memory-safe languages
|
||||||
|
- Rust
|
||||||
|
- Go
|
||||||
|
- Stronger defenses
|
||||||
|
- Fuzzing
|
||||||
|
- find bugs before release
|
||||||
|
- Change in development practices
|
||||||
|
- code review
|
||||||
|
- static analysis tools
|
||||||
|
- related engineering improvements
|
||||||
|
|
||||||
|
#### MITRE Top 25 2025
|
||||||
|
|
||||||
|
- Reference from slide: [MITRE CWE Top 25](http://cwe.mitre.org/top25/)
|
||||||
|
|
||||||
|
### Buffer overflows
|
||||||
|
|
||||||
|
#### Outline
|
||||||
|
|
||||||
|
- System Basics
|
||||||
|
- Application memory layout
|
||||||
|
- How a function call works under the hood
|
||||||
|
- `32-bit x86` only
|
||||||
|
- `64-bit x86_64` similar, but with important differences
|
||||||
|
- Buffer overflow
|
||||||
|
- Overwriting the return address pointer
|
||||||
|
- Point it to the injected shellcode
|
||||||
|
|
||||||
|
#### Buffer Overflows (BoFs)
|
||||||
|
|
||||||
|
- 2-minute version first, then all background / full version
|
||||||
|
|
||||||
|
#### Process memory layout: virtual address space
|
||||||
|
|
||||||
|
- Slide reference: [virtual address space reference](https://hungys.xyz/unix-prog-process-environment/)
|
||||||
|
|
||||||
|
#### Process memory layout: function calls
|
||||||
|
|
||||||
|
- Slide reference: [Tenouk function call figure 1](http://www.tenouk.com/Bufferoverflowc/Bufferoverflow2.html)
|
||||||
|
- Slide reference: [Tenouk function call figure 2](http://www.tenouk.com/Bufferoverflowc/Bufferoverflow4.html)
|
||||||
|
|
||||||
|
#### Process memory layout: compromised frame
|
||||||
|
|
||||||
|
- Slide reference: [Tenouk compromised frame figure](http://www.tenouk.com/Bufferoverflowc/Bufferoverflow4.html)
|
||||||
|
|
||||||
|
#### Computer System
|
||||||
|
|
||||||
|
High-level examples used in the slide:
|
||||||
|
|
||||||
|
```c
car *c = malloc(sizeof(car));
c->miles = 100;
c->gals = 17;
float mpg = get_mpg(c);
free(c);
```
|
||||||
|
|
||||||
|
```java
Car c = new Car();
c.setMiles(100);
c.setGals(17);
float mpg = c.getMPG();
```
|
||||||
|
|
||||||
|
Assembly-language example used in the slide:
|
||||||
|
|
||||||
|
```asm
get_mpg:
    pushq   %rbp
    movq    %rsp, %rbp
    ...
    popq    %rbp
    ret
```
|
||||||
|
|
||||||
|
- The same computation can be viewed at multiple levels:
|
||||||
|
- C / Java source
|
||||||
|
- assembly language
|
||||||
|
- machine code
|
||||||
|
- operating system context
|
||||||
|
|
||||||
|
#### Little Theme 1: Representation
|
||||||
|
|
||||||
|
- All digital systems represent everything as `0`s and `1`s.
|
||||||
|
- The `0` and `1` are really two different voltage ranges in wires.
|
||||||
|
- Or magnetic positions on a disk, hole depths on a DVD, or even DNA.
|
||||||
|
- "Everything" includes:
|
||||||
|
- numbers
|
||||||
|
- integers and floating point
|
||||||
|
- characters
|
||||||
|
- building blocks of strings
|
||||||
|
- instructions
|
||||||
|
- directives to the CPU that make up a program
|
||||||
|
- pointers
|
||||||
|
- addresses of data objects stored in memory
|
||||||
|
- These encodings are stored throughout the computer system.
|
||||||
|
- registers
|
||||||
|
- caches
|
||||||
|
- memories
|
||||||
|
- disks
|
||||||
|
- They all need addresses.
|
||||||
|
- find an item
|
||||||
|
- find a place for a new item
|
||||||
|
- reclaim memory when data is no longer needed
|
||||||
|
|
||||||
|
#### Little Theme 2: Translation
|
||||||
|
|
||||||
|
- There is a big gap between how we think about programs / data and the `0`s and `1`s of computers.
|
||||||
|
- We need languages to describe what we mean.
|
||||||
|
- These languages must be translated one level at a time.
|
||||||
|
- Example point from the slide:
|
||||||
|
- we know Java as a programming language
|
||||||
|
- but we must work down to the `0`s and `1`s of computers
|
||||||
|
- we try not to lose anything in translation
|
||||||
|
- we encounter Java bytecode, C, assembly, and machine code
|
||||||
|
|
||||||
|
#### Little Theme 3: Control Flow
|
||||||
|
|
||||||
|
- How do computers orchestrate everything they are doing?
|
||||||
|
- Within one program:
|
||||||
|
- How are `if/else`, loops, and switches implemented?
|
||||||
|
- How do we track nested procedure calls?
|
||||||
|
- How do we know what to do upon `return`?
|
||||||
|
- At the operating-system level:
|
||||||
|
- library loading
|
||||||
|
- sharing system resources
|
||||||
|
- memory
|
||||||
|
- I/O
|
||||||
|
- disks
|
||||||
|
|
||||||
|
#### HW/SW Interface: Code / Compile / Run Times
|
||||||
|
|
||||||
|
- Code time
|
||||||
|
- user program in C
|
||||||
|
- `.c` file
|
||||||
|
- Compile time
|
||||||
|
- C compiler
|
||||||
|
- assembler
|
||||||
|
- Run time
|
||||||
|
- executable `.exe` file
|
||||||
|
- hardware executes it
|
||||||
|
- Note from slide:
|
||||||
|
- the compiler and assembler are themselves just programs developed using this same process
|
||||||
|
|
||||||
|
#### Assembly Programmer's View
|
||||||
|
|
||||||
|
- Programmer-visible CPU / memory state
|
||||||
|
- Program counter
|
||||||
|
- address of next instruction
|
||||||
|
- called `RIP` in x86-64
|
||||||
|
- Named registers
|
||||||
|
- heavily used program data
|
||||||
|
- together called the register file
|
||||||
|
- Condition codes
|
||||||
|
- store status information about most recent arithmetic operation
|
||||||
|
- used for conditional branching
|
||||||
|
- Memory
|
||||||
|
- byte-addressable array
|
||||||
|
- contains code and user data
|
||||||
|
- includes the stack for supporting procedures
|
||||||
|
|
||||||
|
#### Turning C into Object Code
|
||||||
|
|
||||||
|
- Code in files `p1.c` and `p2.c`
|
||||||
|
- Compile with:
|
||||||
|
|
||||||
|
```bash
gcc -Og p1.c p2.c -o p
```
|
||||||
|
|
||||||
|
- Notes from the slide
|
||||||
|
- `-Og` uses basic optimizations
|
||||||
|
- resulting machine code goes into file `p`
|
||||||
|
- Translation chain
|
||||||
|
- C program -> assembly program -> object program -> executable program
|
||||||
|
- Associated tools
|
||||||
|
- compiler
|
||||||
|
- assembler
|
||||||
|
- linker
|
||||||
|
- static libraries (`.a`)
|
||||||
|
|
||||||
|
#### Machine Instruction Example
|
||||||
|
|
||||||
|
- C code
|
||||||
|
|
||||||
|
```c
*dest = t;
```
|
||||||
|
|
||||||
|
- Meaning
|
||||||
|
- store value `t` where designated by `dest`
|
||||||
|
- Assembly
|
||||||
|
|
||||||
|
```asm
movq %rsi, (%rdx)
```
|
||||||
|
|
||||||
|
- Interpretation
|
||||||
|
- move 8-byte value to memory
|
||||||
|
- operands
|
||||||
|
- `t` is in register `%rsi`
|
||||||
|
- `dest` is in register `%rdx`
|
||||||
|
- `*dest` means memory `M[%rdx]`
|
||||||
|
- Object code
|
||||||
|
|
||||||
|
```text
0x400539: 48 89 32
```
|
||||||
|
|
||||||
|
- It is a 3-byte instruction stored at address `0x400539`.
|
||||||
|
|
||||||
|
#### IA32 Registers - 32 bits wide
|
||||||
|
|
||||||
|
- General-purpose register families shown in the slide
|
||||||
|
- `%eax`, `%ax`, `%ah`, `%al`
|
||||||
|
- `%ecx`, `%cx`, `%ch`, `%cl`
|
||||||
|
- `%edx`, `%dx`, `%dh`, `%dl`
|
||||||
|
- `%ebx`, `%bx`, `%bh`, `%bl`
|
||||||
|
- `%esi`, `%si`
|
||||||
|
- `%edi`, `%di`
|
||||||
|
- `%esp`, `%sp`
|
||||||
|
- `%ebp`, `%bp`
|
||||||
|
- Roles highlighted in the slide
|
||||||
|
- accumulate
|
||||||
|
- counter
|
||||||
|
- data
|
||||||
|
- base
|
||||||
|
- source index
|
||||||
|
- destination index
|
||||||
|
- stack pointer
|
||||||
|
- base pointer
|
||||||
|
|
||||||
|
#### Data Sizes
|
||||||
|
|
||||||
|
- Slide is primarily a figure summarizing common integer widths and sizes.
|
||||||
|
|
||||||
|
#### Assembly Data Types
|
||||||
|
|
||||||
|
- "Integer" data of `1`, `2`, `4`, or `8` bytes
|
||||||
|
- data values
|
||||||
|
- addresses / untyped pointers
|
||||||
|
- No aggregate types such as arrays or structures at the assembly level
|
||||||
|
- just contiguous bytes in memory
|
||||||
|
- Two common syntaxes
|
||||||
|
- `AT&T`
|
||||||
|
- used in the course, slides, textbook, GNU tools
|
||||||
|
- `Intel`
|
||||||
|
- used in Intel documentation and Intel tools
|
||||||
|
- Need to know which syntax you are reading because operand order may be reversed.
|
||||||
|
|
||||||
|
#### Three Basic Kinds of Instructions
|
||||||
|
|
||||||
|
- Transfer data between memory and register
|
||||||
|
- load
|
||||||
|
- `%reg = Mem[address]`
|
||||||
|
- store
|
||||||
|
- `Mem[address] = %reg`
|
||||||
|
- Perform arithmetic on register or memory data
|
||||||
|
- examples: addition, shifting, bitwise operations
|
||||||
|
- Control flow
|
||||||
|
- unconditional jumps to / from procedures
|
||||||
|
- conditional branches
|
||||||
|
|
||||||
|
#### Abstract Memory Layout
|
||||||
|
|
||||||
|
```text
High addresses
  Stack          <- local variables, procedure context
  Dynamic Data   <- heap, new / malloc
  Static Data    <- globals / static variables
  Literals       <- large constants such as strings
  Instructions
Low addresses
```
|
||||||
|
|
||||||
|
#### The ELF File Format
|
||||||
|
|
||||||
|
- ELF = Executable and Linkable Format
|
||||||
|
- One of the most widely used binary object formats
|
||||||
|
- ELF is architecture-independent
|
||||||
|
- ELF file types
|
||||||
|
- Relocatable
|
||||||
|
- must be fixed by the linker before execution
|
||||||
|
- Executable
|
||||||
|
- ready for execution
|
||||||
|
- Shared
|
||||||
|
- shared libraries with linking information
|
||||||
|
- Core
|
||||||
|
- core dumps created when a program terminates with a fault
|
||||||
|
- Tools mentioned on slide
|
||||||
|
- `readelf`
|
||||||
|
- `file`
|
||||||
|
- `objdump -D`
|
||||||
|
|
||||||
|
#### Process Memory Layout (32-bit x86 machine)
|
||||||
|
|
||||||
|
- This slide is primarily a diagram.
|
||||||
|
- Key idea: a `32-bit x86` process has a standard virtual memory layout with code, static data, heap, and stack arranged in distinct regions.
|
||||||
|
|
||||||
|
We continue with the concrete runtime layout and the actual overflow mechanics in Lecture 18.
|
||||||
|
|||||||
594
content/CSE4303/CSE4303_L18.md
Normal file
@@ -0,0 +1,594 @@
|
|||||||
|
# CSE4303 Introduction to Computer Security (Lecture 18)
|
||||||
|
|
||||||
|
> Because I did not give this lecture enough attention, this note was generated by AI as a continuation of the previous lecture note. I am keeping this warning because the note is AI-generated.
|
||||||
|
|
||||||
|
#### Software security
|
||||||
|
|
||||||
|
### Overview
|
||||||
|
|
||||||
|
#### Outline
|
||||||
|
|
||||||
|
- Context
|
||||||
|
- Prominent software vulnerabilities and exploits
|
||||||
|
- Buffer overflows
|
||||||
|
- Background: C code, compilation, memory layout, execution
|
||||||
|
- Baseline exploit
|
||||||
|
- Challenges
|
||||||
|
- Defenses, countermeasures, counter-countermeasures
|
||||||
|
|
||||||
|
### Buffer overflows
|
||||||
|
|
||||||
|
#### All programs are stored in memory
|
||||||
|
|
||||||
|
- The process's view of memory is that it owns all of it.
|
||||||
|
- For a `32-bit` process, the virtual address space runs from:
|
||||||
|
- `0x00000000`
|
||||||
|
- to `0xffffffff`
|
||||||
|
- In reality, these are virtual addresses.
|
||||||
|
- The OS and CPU map them to physical addresses.
|
||||||
|
|
||||||
|
#### The instructions themselves are in memory
|
||||||
|
|
||||||
|
- Program text is also stored in memory.
|
||||||
|
- The slide shows instructions such as:
|
||||||
|
|
||||||
|
```asm
0x4c2   sub   $0x224,%esp
0x4c1   push  %ecx
0x4bf   mov   %esp,%ebp
0x4be   push  %ebp
```
|
||||||
|
|
||||||
|
- Important point:
|
||||||
|
- code and data are both memory-resident
|
||||||
|
- control flow therefore depends on values stored in memory
|
||||||
|
|
||||||
|
#### Data's location depends on how it's created
|
||||||
|
|
||||||
|
- Static initialized data example
|
||||||
|
|
||||||
|
```c
static const int y = 10;
```
|
||||||
|
|
||||||
|
- Static uninitialized data example
|
||||||
|
|
||||||
|
```c
static int x;
```
|
||||||
|
|
||||||
|
- Command-line arguments and environment are set when the process starts.
|
||||||
|
- Stack data appears when functions run.
|
||||||
|
|
||||||
|
```c
int f() {
    int x;
    ...
}
```
|
||||||
|
|
||||||
|
- Heap data appears at runtime.
|
||||||
|
|
||||||
|
```c
malloc(sizeof(long));
```
|
||||||
|
|
||||||
|
- Summary from the slide
|
||||||
|
- Known at compile time
|
||||||
|
- text
|
||||||
|
- initialized data
|
||||||
|
- uninitialized data
|
||||||
|
- Set when process starts
|
||||||
|
- command line and environment
|
||||||
|
- Runtime
|
||||||
|
- stack
|
||||||
|
- heap
|
||||||
|
|
||||||
|
#### We are going to focus on runtime attacks
|
||||||
|
|
||||||
|
- Stack and heap grow in opposite directions.
|
||||||
|
- Compiler-generated instructions adjust the stack size at runtime.
|
||||||
|
- The stack pointer tracks the active top of the stack.
|
||||||
|
- Repeated `push` instructions place values onto the stack.
|
||||||
|
- The slides use the sequence:
|
||||||
|
- `push 1`
|
||||||
|
- `push 2`
|
||||||
|
- `push 3`
|
||||||
|
- `return`
|
||||||
|
- Heap allocation is apportioned by the OS and managed in-process by `malloc`.
|
||||||
|
- The lecture says: focusing on the stack for now.
|
||||||
|
|
||||||
|
```text
0x00000000                                                          0xffffffff
Heap --------------------------------->     <--------------------------------- Stack
```
|
||||||
|
|
||||||
|
#### Stack layout when calling functions
|
||||||
|
|
||||||
|
Questions asked on the slide:
|
||||||
|
|
||||||
|
- What do we do when we call a function?
|
||||||
|
- What data need to be stored?
|
||||||
|
- Where do they go?
|
||||||
|
- How do we return from a function?
|
||||||
|
- What data need to be restored?
|
||||||
|
- Where do they come from?
|
||||||
|
|
||||||
|
Example used in the slide:
|
||||||
|
|
||||||
|
```c
void func(char *arg1, int arg2, int arg3)
{
    char loc1[4];
    int loc2;
    int loc3;
}
```
|
||||||
|
|
||||||
|
Important layout points:
|
||||||
|
|
||||||
|
- Arguments are pushed in reverse order of code.
|
||||||
|
- Local variables are pushed in the same order as they appear in the code.
|
||||||
|
- The slide then introduces two unknown slots between locals and arguments.
|
||||||
|
|
||||||
|
#### Accessing variables
|
||||||
|
|
||||||
|
Example:
|
||||||
|
|
||||||
|
```c
void func(char *arg1, int arg2, int arg3)
{
    char loc1[4];
    int loc2;
    int loc3;
    ...
    loc2++;
    ...
}
```
|
||||||
|
|
||||||
|
Question from the slide:
|
||||||
|
- Where is `loc2`?
|
||||||
|
|
||||||
|
Step-by-step answer developed in the slides:
|
||||||
|
|
||||||
|
- Its absolute address cannot be determined at compile time.
|
||||||
|
- We do not know exactly where `loc2` is in absolute memory.
|
||||||
|
- We do not know how many arguments there are in general.
|
||||||
|
- But `loc2` is always a fixed offset before the frame metadata.
|
||||||
|
- This motivates the frame pointer.
|
||||||
|
|
||||||
|
Definitions from the slide:
|
||||||
|
|
||||||
|
- Stack frame
|
||||||
|
- the current function call's region on the stack
|
||||||
|
- Frame pointer
|
||||||
|
- `%ebp`
|
||||||
|
- Example answer
|
||||||
|
- `loc2` is at `-8(%ebp)`
|
||||||
|
|
||||||
|
#### Notation
|
||||||
|
|
||||||
|
- `%ebp`
|
||||||
|
- a memory address stored in the frame-pointer register
|
||||||
|
- `(%ebp)`
|
||||||
|
- the value at memory address `%ebp`
|
||||||
|
- like dereferencing a pointer
|
||||||
|
|
||||||
|
The slide sequence then shows:
|
||||||
|
|
||||||
|
```asm
pushl %ebp
movl  %esp, %ebp
```
|
||||||
|
|
||||||
|
- Meaning:
|
||||||
|
- first save the old frame pointer on the stack
|
||||||
|
- then set the new frame pointer to the current stack pointer
|
||||||
|
|
||||||
|
#### Returning from functions
|
||||||
|
|
||||||
|
Example caller:
|
||||||
|
|
||||||
|
```c
int main()
{
    ...
    func("Hey", 10, -3);
    ...
}
```
|
||||||
|
|
||||||
|
Questions from the slides:
|
||||||
|
|
||||||
|
- How do we restore `%ebp`?
|
||||||
|
- How do we resume execution at the correct place?
|
||||||
|
|
||||||
|
Slide answers:
|
||||||
|
|
||||||
|
- Push `%ebp` before locals.
|
||||||
|
- Set `%ebp` to current `%esp`.
|
||||||
|
- Set `%ebp` to `(%ebp)` at return.
|
||||||
|
- Push next `%eip` before `call`.
|
||||||
|
- Set `%eip` to `4(%ebp)` at return.
|
||||||
|
|
||||||
|
#### Stack and functions: Summary
|
||||||
|
|
||||||
|
- Calling function
|
||||||
|
- push arguments onto the stack in reverse order
|
||||||
|
- push the return address
|
||||||
|
- the address of the instruction that should run after control returns
|
||||||
|
- jump to the function's address
|
||||||
|
- Called function
|
||||||
|
- push old frame pointer `%ebp` onto the stack
|
||||||
|
- set frame pointer `%ebp` to current `%esp`
|
||||||
|
- push local variables onto the stack
|
||||||
|
- access locals as offsets from `%ebp`
|
||||||
|
- Returning function
|
||||||
|
- reset previous stack frame
|
||||||
|
- `%ebp = (%ebp)`
|
||||||
|
- jump back to return address
|
||||||
|
- `%eip = 4(%ebp)`
|
||||||
|
|
||||||
|
#### Quick overview (again)
|
||||||
|
|
||||||
|
- Buffer
|
||||||
|
- contiguous set of a given data type
|
||||||
|
- common in C
|
||||||
|
- all strings are buffers of `char`
|
||||||
|
- Overflow
|
||||||
|
- put more into the buffer than it can hold
|
||||||
|
- Question
|
||||||
|
- where does the extra data go?
|
||||||
|
- Slide answer
|
||||||
|
- now that we know memory layouts, we can reason about where the overwrite lands
|
||||||
|
|
||||||
|
#### A buffer overflow example
|
||||||
|
|
||||||
|
Example 1 from the slide:
|
||||||
|
|
||||||
|
```c
void func(char *arg1)
{
    char buffer[4];
    strcpy(buffer, arg1);
    ...
}

int main()
{
    char *mystr = "AuthMe!";
    func(mystr);
    ...
}
```
|
||||||
|
|
||||||
|
Step-by-step effect shown in the slides:
|
||||||
|
|
||||||
|
- Initial stack region includes:
|
||||||
|
- `buffer`
|
||||||
|
- saved `%ebp`
|
||||||
|
- saved `%eip`
|
||||||
|
- `&arg1`
|
||||||
|
- First 4 bytes copied:
|
||||||
|
- `A u t h`
|
||||||
|
- Remaining bytes continue writing:
|
||||||
|
- `M e ! \0`
|
||||||
|
- Because `strcpy` keeps copying until it sees `\0`, bytes go past the end of the buffer.
|
||||||
|
- In the example, upon return:
|
||||||
|
- `%ebp` becomes `0x0021654d`
|
||||||
|
- Result:
|
||||||
|
- segmentation fault
|
||||||
|
- shown as `SEGFAULT (0x00216551)` in the slide sequence
|
||||||
|
|
||||||
|
#### A buffer overflow example: changing control data vs. changing program data
|
||||||
|
|
||||||
|
Example 2 from the slide:
|
||||||
|
|
||||||
|
```c
void func(char *arg1)
{
    int authenticated = 0;
    char buffer[4];
    strcpy(buffer, arg1);
    if (authenticated) { ... }
}

int main()
{
    char *mystr = "AuthMe!";
    func(mystr);
    ...
}
```
|
||||||
|
|
||||||
|
Step-by-step effect shown in the slides:
|
||||||
|
|
||||||
|
- Initial stack contains:
|
||||||
|
- `buffer`
|
||||||
|
- `authenticated`
|
||||||
|
- saved `%ebp`
|
||||||
|
- saved `%eip`
|
||||||
|
- `&arg1`
|
||||||
|
- Overflow writes:
|
||||||
|
- `A u t h` into `buffer`
|
||||||
|
- `M e ! \0` into `authenticated`
|
||||||
|
- Result:
|
||||||
|
- code still runs
|
||||||
|
- user now appears "authenticated"
|
||||||
|
|
||||||
|
Important lesson:
|
||||||
|
- A buffer overflow does not need to crash.
|
||||||
|
- It may silently change program data or logic.
|
||||||
|
|
||||||
|
#### `gets` vs `fgets`
|
||||||
|
|
||||||
|
Unsafe function shown in the slide:
|
||||||
|
|
||||||
|
```c
void vulnerable()
{
    char buf[80];
    gets(buf);
}
```
|
||||||
|
|
||||||
|
Safer version shown in the slide:
|
||||||
|
|
||||||
|
```c
void safe()
{
    char buf[80];
    fgets(buf, 64, stdin);
}
```
|
||||||
|
|
||||||
|
Even safer pattern from the next slide:
|
||||||
|
|
||||||
|
```c
void safer()
{
    char buf[80];
    fgets(buf, sizeof(buf), stdin);
}
```
|
||||||
|
|
||||||
|
Reference from slide:
|
||||||
|
- [List of vulnerable C functions](https://security.web.cern.ch/security/recommendations/en/codetools/c.shtml)
|
||||||
|
|
||||||
|
#### User-supplied strings
|
||||||
|
|
||||||
|
- In the toy examples, the strings are constant.
|
||||||
|
- In reality they come from users in many ways:
|
||||||
|
- text input
|
||||||
|
- packets
|
||||||
|
- environment variables
|
||||||
|
- file input
|
||||||
|
- Validating assumptions about user input is extremely important.
|
||||||
|
|
||||||
|
#### What's the worst that could happen?
|
||||||
|
|
||||||
|
Using:
|
||||||
|
|
||||||
|
```c
char buffer[4];
strcpy(buffer, arg1);
```
|
||||||
|
|
||||||
|
- `strcpy` will let you write as much as you want until a `\0`.
|
||||||
|
- If attacker-controlled input is long enough, the memory past the buffer becomes "all ours" from the attacker's perspective.
|
||||||
|
- That raises the key question from the slide:
|
||||||
|
- what could you write to memory to wreak havoc?
|
||||||
|
|
||||||
|
#### Code injection
|
||||||
|
|
||||||
|
- Title-only transition slide.
|
||||||
|
- It introduces the move from accidental overwrite to deliberate attacker payloads.
|
||||||
|
|
||||||
|
#### High-level idea
|
||||||
|
|
||||||
|
Example used in the slide:
|
||||||
|
|
||||||
|
```c
void func(char *arg1)
{
    char buffer[4];
    sprintf(buffer, arg1);
    ...
}
```
|
||||||
|
|
||||||
|
Two-step plan shown in the slides:
|
||||||
|
|
||||||
|
- 1. Load my own code into memory.
|
||||||
|
- 2. Somehow get `%eip` to point to it.
|
||||||
|
|
||||||
|
The slide sequence draws this as:
|
||||||
|
- vulnerable buffer on stack
|
||||||
|
- attacker-controlled bytes placed in memory
|
||||||
|
- `%eip` redirected toward those bytes
|
||||||
|
|
||||||
|
#### This is nontrivial
|
||||||
|
|
||||||
|
- Pulling off this attack requires getting a few things really right, and some things only sorta right.
|
||||||
|
- The lecture says to think about what is tricky about the attack.
|
||||||
|
- Main security idea:
|
||||||
|
- the key to defending it is to make the hard parts really hard
|
||||||
|
|
||||||
|
#### Challenge 1: Loading code into memory
|
||||||
|
|
||||||
|
- The attacker payload must be machine-code instructions.
|
||||||
|
- already compiled
|
||||||
|
- ready to run
|
||||||
|
- We have to be careful in how we construct it.
|
||||||
|
- It cannot contain any all-zero (`\0`) bytes.
|
||||||
|
- otherwise `sprintf`, `gets`, `scanf`, and similar routines stop copying
|
||||||
|
- It cannot make use of the loader.
|
||||||
|
- because we are injecting the bytes directly
|
||||||
|
- It cannot use the stack.
|
||||||
|
- because we are in the process of smashing it
|
||||||
|
- The lecture then gives the name:
|
||||||
|
- shellcode
|
||||||
|
|
||||||
|
#### What kind of code would we want to run?
|
||||||
|
|
||||||
|
- Goal: full-purpose shell
|
||||||
|
- code to launch a shell is called shellcode
|
||||||
|
- it is nontrivial to write shellcode that works as injected code
|
||||||
|
- no zeroes
|
||||||
|
- cannot use the stack
|
||||||
|
- no loader dependence
|
||||||
|
- there are many shellcodes already written
|
||||||
|
- there are even competitions for writing the smallest shellcode
|
||||||
|
- Goal: privilege escalation
|
||||||
|
- ideally, the attacker goes from guest (or no account at all) to root
|
||||||
|
|
||||||
|
#### Shellcode
|
||||||
|
|
||||||
|
High-level C version shown in the slides:
|
||||||
|
|
||||||
|
```c
#include <stdio.h>
#include <unistd.h>   /* declares execve */

int main() {
    char *name[2];
    name[0] = "/bin/sh";
    name[1] = NULL;
    execve(name[0], name, NULL);
}
```
|
||||||
|
|
||||||
|
Assembly version shown in the slides:
|
||||||
|
|
||||||
|
```asm
xorl  %eax, %eax
pushl %eax
pushl $0x68732f2f
pushl $0x6e69622f
movl  %esp, %ebx
pushl %eax
...
```
|
||||||
|
|
||||||
|
Machine-code bytes shown in the slides:
|
||||||
|
|
||||||
|
```text
"\x31\xc0"
"\x50"
"\x68""//sh"
"\x68""/bin"
"\x89\xe3"
"\x50"
...
```
|
||||||
|
|
||||||
|
Important point from the slide:
|
||||||
|
- those machine-code bytes can become part of the attacker's input
|
||||||
|
|
||||||
|
#### Challenge 2: Getting our injected code to run
|
||||||
|
|
||||||
|
- We cannot insert a fresh "jump into my code" instruction.
|
||||||
|
- We must use whatever code is already running.
|
||||||
|
|
||||||
|
#### Hijacking the saved `%eip`
|
||||||
|
|
||||||
|
- Strategy:
|
||||||
|
- overwrite the saved return address
|
||||||
|
- make it point into the injected bytes
|
||||||
|
- Core idea:
|
||||||
|
- when the function returns, the CPU loads the overwritten return address into `%eip`
|
||||||
|
|
||||||
|
Question raised by the slides:
|
||||||
|
- But how do we know the address?
|
||||||
|
|
||||||
|
Failure mode shown in the slide sequence:
|
||||||
|
- if the guessed address is wrong, the CPU tries to execute data bytes
|
||||||
|
- this is most likely not valid code
|
||||||
|
- result:
|
||||||
|
- invalid instruction
|
||||||
|
- CPU "panic" / crash
|
||||||
|
|
||||||
|
#### Challenge 3: Finding the return address

- If we do not have the code, we may not know how far the buffer is from the saved `%ebp`.
- One approach:
    - try many different values
- Worst case:
    - `2^32` possible addresses on `32-bit`
    - `2^64` possible addresses on `64-bit`
- But without address randomization:
    - the stack always starts from the same fixed address
    - the stack grows, but usually not very deep unless the program recurses heavily
    - so the plausible range of addresses is far smaller than the worst case

#### Improving our chances: nop sleds

- `nop` is a single-byte instruction (`0x90` on x86).
- Definition:
    - it does nothing except move execution to the next instruction
- NOP sled idea:
    - put a long sequence of `nop` bytes before the real malicious code
    - now jumping anywhere in that region still works
    - execution slides down into the payload

Why this helps:

- it increases the chance that an approximate address guess still succeeds
- the slides explicitly state:
    - now we improve our chances of guessing by a factor of `#nops`

```text
[padding][saved return address guess][nop nop nop ...][malicious code]
```

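The "factor of `#nops`" claim can be checked with simple arithmetic. The function below is a rough sketch, not something from the slides: `guesses_needed` is a hypothetical name, and it assumes the attacker steps through a known range of candidate addresses in strides as wide as the landing zone.

```c
#include <stdint.h>

/* Rough number of guesses needed to cover `range_bytes` candidate
   addresses when any address inside the sled, or the first byte of
   the payload itself, is a successful landing spot. */
uint64_t guesses_needed(uint64_t range_bytes, uint64_t sled_len) {
    uint64_t landing_zone = sled_len + 1;  /* sled bytes + payload start */
    return (range_bytes + landing_zone - 1) / landing_zone;  /* ceiling */
}
```

With no sled, a 65536-byte range needs up to 65536 guesses; with a 1023-byte sled the same range needs at most 64.
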
#### Putting it all together

- Payload components shown in the slides:
    - padding
    - guessed return address
    - NOP sled
    - malicious code
- Constraint noted by the lecture:
    - input has to start wherever the vulnerable `gets` / similar function begins writing

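The four components can be laid out in a byte buffer as sketched below. This is only an illustration of the layout: `build_layout` is a hypothetical helper, every offset and size is a caller-supplied assumption, the payload is treated as opaque bytes, and the address is written as a 32-bit little-endian value to match the x86 examples in these notes.

```c
#include <stdint.h>
#include <string.h>

/* Lay out: [padding][return-address guess][nop sled][payload].
   Returns the total length, or 0 if `out` is too small. */
size_t build_layout(unsigned char *out, size_t out_cap,
                    size_t ret_offset, uint32_t ret_guess, size_t sled_len,
                    const unsigned char *payload, size_t payload_len) {
    size_t total = ret_offset + 4 + sled_len + payload_len;
    if (total > out_cap) {
        return 0;
    }
    memset(out, 'A', ret_offset);              /* padding up to the slot */
    for (int i = 0; i < 4; i++) {              /* little-endian address */
        out[ret_offset + i] = (unsigned char)(ret_guess >> (8 * i));
    }
    memset(out + ret_offset + 4, 0x90, sled_len);   /* x86 nop bytes */
    memcpy(out + ret_offset + 4 + sled_len, payload, payload_len);
    return total;
}
```

Because the vulnerable function starts writing at the beginning of the buffer, `ret_offset` must equal the distance from the buffer start to the saved return address, which is exactly the unknown that Challenge 3 discusses.
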
#### Buffer overflow defense #1: use secure bounds-checking functions

- User-level protection
- Replace unbounded routines with bounded ones.
- Prefer secure languages where possible:
    - Java
    - Rust
    - etc.

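As one possible illustration of a bounded routine, the sketch below copies a string with `snprintf`, which never writes more than the size it is given and always NUL-terminates. The wrapper name `copy_bounded` is hypothetical; for reading input, the analogous change is replacing `gets` (removed from the language in C11) with `fgets`.

```c
#include <stdio.h>
#include <string.h>

/* Copy `src` into `dst` without ever writing past `dst_size` bytes.
   Returns 1 if the whole string fit, 0 if it was truncated. */
int copy_bounded(char *dst, size_t dst_size, const char *src) {
    if (dst_size == 0) {
        return 0;
    }
    int needed = snprintf(dst, dst_size, "%s", src);  /* NUL-terminates */
    return needed >= 0 && (size_t)needed < dst_size;
}
```

An oversized input is truncated instead of spilling into adjacent stack data, and the return value lets the caller decide whether truncation should be treated as an error.
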
#### Buffer overflow defense #2: Address Space Layout Randomization (ASLR)

- Randomize starting address of program regions.
- Goal:
    - prevent attacker from guessing / finding the correct address to put in the return-address slot
- OS-level protection

#### Buffer overflow counter-technique: NOP sled

- Counter-technique against uncertain addresses
- By jumping somewhere into a wide sled, exact address knowledge becomes less necessary

#### Buffer overflow defense #3: Canary

- Put a guard value between vulnerable local data and control-flow data.
- If overflow changes the canary, the program can detect corruption before returning.
- OS-level / compiler-assisted protection (in the lecture's framing)

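The canary check can be modeled in a few lines. This is a toy under stated assumptions: `struct guarded_frame`, `frame_enter`, and `frame_intact` are hypothetical names, and a real compiler inserts the equivalent logic into the function prologue and epilogue and picks a random secret at program start.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical frame with a guard value between buffer and return address. */
struct guarded_frame {
    char     buf[8];
    uint32_t canary;
    uint32_t saved_eip;
};

/* Prologue: place the secret guard value below the saved return address. */
void frame_enter(struct guarded_frame *f, uint32_t secret, uint32_t ret) {
    memset(f->buf, 0, sizeof f->buf);
    f->canary = secret;
    f->saved_eip = ret;
}

/* Epilogue: returns 1 if the guard is untouched, 0 if it was overwritten. */
int frame_intact(const struct guarded_frame *f, uint32_t secret) {
    return f->canary == secret;
}
```

A write that stays inside `buf` leaves the frame intact, while a contiguous overflow that reaches `saved_eip` must pass through `canary` first, so the epilogue check fails before the corrupted return address is used.
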
#### Buffer overflow defense #4: No-execute bits (NX)

- Mark the stack as not executable.
- Requires hardware support.
- OS / hardware-level protection

#### Buffer overflow counter-technique: ret-to-libc and ROP

- Code in the C library is already loaded into the process at consistent addresses, and it is marked executable, so NX does not block it.
- Attacker can find code in the C library that has the desired effect.
    - possibly heavily fragmented
- Then return to the necessary address or addresses in the proper order.
- This is the motivation behind:
    - `ret-to-libc`
    - Return-Oriented Programming (ROP)

We will continue from defenses / exploitation follow-ups in the next lecture.

@@ -20,5 +20,7 @@ export default {
   CSE4303_L13: "Introduction to Computer Security (Lecture 13)",
   CSE4303_L14: "Introduction to Computer Security (Lecture 14)",
   CSE4303_L15: "Introduction to Computer Security (Lecture 15)",
-  CSE4303_L16: "Introduction to Computer Security (Lecture 16)"
+  CSE4303_L16: "Introduction to Computer Security (Lecture 16)",
+  CSE4303_L17: "Introduction to Computer Security (Lecture 17)",
+  CSE4303_L18: "Introduction to Computer Security (Lecture 18)"
 }