Zheyuan Wu d6bc8375ce
2026-04-02 15:17:50 -05:00

# CSE4303 Introduction to Computer Security (Lecture 17)
> Due to my lack of attention, this lecture note was generated by AI as a continuation of the previous lecture note. I kept this warning because the note was created by AI.
#### Software security
### Administrative notes
#### Project details
- Project plan
  - Thursday, `4/9` at the end of class
  - `5%`
- Written document and presentation recording
  - Thursday, `4/30` at `11:30 AM`
  - `15%`
- View peer presentations and provide feedback
  - Wednesday, `5/6` at `11:59 PM`
  - `5%`
#### Upcoming schedule
- This week (`3/30`)
  - software security lecture
  - studio
  - some time for studio on Tuesday
- Next week (`4/6`)
  - fuzzing
  - some time to discuss project ideas
- `4/13`
  - Web security
- `4/20`
  - Privacy and ethics overview
  - time to work on projects
  - course wrap-up
### Overview
#### Outline
- Context
- Prominent software vulnerabilities and exploits
- Buffer overflows
  - Background: C code, compilation, memory layout, execution
  - Baseline exploit
  - Challenges
  - Defenses, countermeasures, counter-countermeasures
Sources:
- SEED lab book
- Gilbert/Tamassia book
- Slides from Bryant/O'Hallaron (CMU), Dan Boneh (Stanford), Michael Hicks (UMD)
### Context
#### Context: computing stack (informal)
| Layer | Example |
| --- | --- |
| Application | web server, standalone app |
| Compiler / assembler | `gcc`, `clang` |
| OS: syscalls | `execve()`, `setuid()`, `write()`, `open()`, `fork()` |
| OS: processes, mem layout | Linux virtual memory layout |
| Architecture (ISA, execution) | x86, x86_64, ARM |
| Hardware | Intel Sky Lake processor |
- User control is strongest near the application / compiler level.
- System control becomes more important as we move down toward OS, architecture, and hardware.
### Prominent software vulnerabilities and exploits
#### Software security: categories
- Race conditions
- Privilege escalation
- Path traversal
- Environment variable modification
- Language-specific vulnerabilities
  - Format string attack
  - Buffer overflows
#### Buffer Overflows (BoFs)
- A buffer overflow is a bug that affects low-level code, typically in C and C++, and has significant security implications.
- Normally, a program with this bug will simply crash.
- But an attacker can control the conditions that trigger the bug (typically through crafted input) and make the program do much worse:
  - Steal private information
    - e.g. Heartbleed
  - Corrupt valuable information
  - Run code of the attacker's choice
#### Application behavior
- Slide contains a figure only.
- Intended point: normal application behavior can become attacker-controlled if input handling is unsafe.
#### BoFs: why do we care?
- Reference from slide: [IEEE Spectrum top programming languages 2025](https://spectrum.ieee.org/top-programming-languages-2025)
#### Critical systems in C/C++
- Most OS kernels and utilities
  - `fingerd`
  - X windows server
  - shell
- Many high-performance servers
  - Microsoft IIS
  - Apache `httpd`
  - `nginx`
  - Microsoft SQL Server
  - MySQL
  - `redis`
  - `memcached`
- Many embedded systems
  - Mars rover
  - industrial control systems
  - automobiles
A successful attack on these systems can be particularly dangerous.
#### Morris Worm
- Slide contains a figure / historical reference only.
- It is included as an example of how memory-corruption vulnerabilities mattered in practice.
#### Why do we still care?
- The slide references the NVD search page: [NVD vulnerability search](https://nvd.nist.gov/vuln/search)
- Why the drop?
  - Memory-safe languages
    - Rust
    - Go
  - Stronger defenses
  - Fuzzing
    - find bugs before release
  - Change in development practices
    - code review
    - static analysis tools
    - related engineering improvements
#### MITRE Top 25 2025
- Reference from slide: [MITRE CWE Top 25](http://cwe.mitre.org/top25/)
### Buffer overflows
#### Outline
- System basics
  - Application memory layout
  - How function calls work under the hood
    - `32-bit x86` only
    - `64-bit x86_64` is similar, but with important differences
- Buffer overflow
  - Overwriting the return address pointer
  - Pointing it to injected shellcode
#### Buffer Overflows (BoFs)
- A 2-minute version first, then the full version with all the background
#### Process memory layout: virtual address space
- Slide reference: [virtual address space reference](https://hungys.xyz/unix-prog-process-environment/)
#### Process memory layout: function calls
- Slide reference: [Tenouk function call figure 1](http://www.tenouk.com/Bufferoverflowc/Bufferoverflow2.html)
- Slide reference: [Tenouk function call figure 2](http://www.tenouk.com/Bufferoverflowc/Bufferoverflow4.html)
#### Process memory layout: compromised frame
- Slide reference: [Tenouk compromised frame figure](http://www.tenouk.com/Bufferoverflowc/Bufferoverflow4.html)
#### Computer System
High-level examples used in the slide:
```c
car *c = malloc(sizeof(car));
c->miles = 100;
c->gals = 17;
float mpg = get_mpg(c);
free(c);
```
```java
Car c = new Car();
c.setMiles(100);
c.setGals(17);
float mpg = c.getMPG();
```
Assembly-language example used in the slide:
```asm
get_mpg:
pushq %rbp
movq %rsp, %rbp
...
popq %rbp
ret
```
- The same computation can be viewed at multiple levels:
  - C / Java source
  - assembly language
  - machine code
  - operating system context
#### Little Theme 1: Representation
- All digital systems represent everything as `0`s and `1`s.
  - The `0` and `1` are really two different voltage ranges in wires.
  - Or magnetic positions on a disk, hole depths on a DVD, or even DNA.
- "Everything" includes:
  - numbers
    - integers and floating point
  - characters
    - building blocks of strings
  - instructions
    - directives to the CPU that make up a program
  - pointers
    - addresses of data objects stored in memory
- These encodings are stored throughout the computer system.
  - registers
  - caches
  - memories
  - disks
- They all need addresses, so that the system can:
  - find an item
  - find a place for a new item
  - reclaim memory when data is no longer needed
#### Little Theme 2: Translation
- There is a big gap between how we think about programs / data and the `0`s and `1`s of computers.
- We need languages to describe what we mean.
- These languages must be translated one level at a time.
- Example point from the slide:
  - we know Java as a programming language
  - but we must work down to the `0`s and `1`s of computers
  - we try not to lose anything in translation
  - we encounter Java bytecode, C, assembly, and machine code
#### Little Theme 3: Control Flow
- How do computers orchestrate everything they are doing?
- Within one program:
  - How are `if/else`, loops, and switches implemented?
  - How do we track nested procedure calls?
  - How do we know what to do upon `return`?
- At the operating-system level:
  - library loading
  - sharing system resources
    - memory
    - I/O
    - disks
#### HW/SW Interface: Code / Compile / Run Times
- Code time
  - user program in C
  - `.c` file
- Compile time
  - C compiler
  - assembler
- Run time
  - executable `.exe` file
  - hardware executes it
- Note from slide:
  - the compiler and assembler are themselves just programs developed using this same process
#### Assembly Programmer's View
- Programmer-visible CPU / memory state
  - Program counter
    - address of the next instruction
    - called `RIP` in x86-64
  - Named registers
    - heavily used program data
    - together called the register file
  - Condition codes
    - store status information about the most recent arithmetic operation
    - used for conditional branching
  - Memory
    - byte-addressable array
    - contains code and user data
    - includes the stack for supporting procedures
#### Turning C into Object Code
- Code in files `p1.c` and `p2.c`
- Compile with:
  ```bash
  gcc -Og p1.c p2.c -o p
  ```
- Notes from the slide
  - `-Og` applies basic optimizations while keeping the code easy to debug
  - the resulting machine code goes into file `p`
- Translation chain
  - C program -> assembly program -> object program -> executable program
- Associated tools
  - compiler
  - assembler
  - linker
  - static libraries (`.a`)
#### Machine Instruction Example
- C code
  ```c
  *dest = t;
  ```
- Meaning
  - store value `t` where designated by `dest`
- Assembly
  ```asm
  movq %rsi, (%rdx)
  ```
- Interpretation
  - move an 8-byte (quad word) value to memory
  - operands
    - `t` is in register `%rsi`
    - `dest` is in register `%rdx`
    - `*dest` means memory `M[%rdx]`
- Object code
  ```text
  0x400539: 48 89 32
  ```
  - It is a 3-byte instruction stored at address `0x400539`.
#### IA32 Registers - 32 bits wide
| 32-bit register | Low 16 bits | 8-bit (high, low) | Role highlighted in the slide |
| --- | --- | --- | --- |
| `%eax` | `%ax` | `%ah`, `%al` | accumulate |
| `%ecx` | `%cx` | `%ch`, `%cl` | counter |
| `%edx` | `%dx` | `%dh`, `%dl` | data |
| `%ebx` | `%bx` | `%bh`, `%bl` | base |
| `%esi` | `%si` | (none) | source index |
| `%edi` | `%di` | (none) | destination index |
| `%esp` | `%sp` | (none) | stack pointer |
| `%ebp` | `%bp` | (none) | base pointer |
#### Data Sizes
- Slide is primarily a figure summarizing common integer widths and sizes.
#### Assembly Data Types
- "Integer" data of `1`, `2`, `4`, or `8` bytes
  - data values
  - addresses / untyped pointers
- No aggregate types such as arrays or structures at the assembly level
  - just contiguous bytes in memory
- Two common syntaxes
  - `AT&T`
    - used in the course, slides, textbook, GNU tools
  - `Intel`
    - used in Intel documentation and Intel tools
- You need to know which syntax you are reading, because the operand order is reversed: AT&T writes the source first (`movq %rsi, (%rdx)`), while Intel writes the destination first (`mov QWORD PTR [rdx], rsi`).
#### Three Basic Kinds of Instructions
- Transfer data between memory and register
  - load
    - `%reg = Mem[address]`
  - store
    - `Mem[address] = %reg`
- Perform arithmetic on register or memory data
  - examples: addition, shifting, bitwise operations
- Control flow
  - unconditional jumps to / from procedures
  - conditional branches
#### Abstract Memory Layout
```text
High addresses
Stack <- local variables, procedure context
Dynamic Data <- heap, new / malloc
Static Data <- globals / static variables
Literals <- large constants such as strings
Instructions
Low addresses
```
#### The ELF File Format
- ELF = Executable and Linkable Format
- One of the most widely used binary object formats
- ELF is architecture-independent
- ELF file types
  - Relocatable
    - must be fixed by the linker before execution
  - Executable
    - ready for execution
  - Shared
    - shared libraries with linking information
  - Core
    - core dumps created when a program terminates with a fault
- Tools mentioned on slide
  - `readelf`
  - `file`
  - `objdump -D`
#### Process Memory Layout (32-bit x86 machine)
- This slide is primarily a diagram.
- Key idea: a `32-bit x86` process has a standard virtual memory layout with code, static data, heap, and stack arranged in distinct regions.
We continue with the concrete runtime layout and the actual overflow mechanics in Lecture 18.