# CSE4303 Introduction to Computer Security (Lecture 17)

> Due to lack of my attention, this lecture note is generated by AI to create continuations of the previous lecture note. I kept this warning because the note was created by AI.

#### Software security

### Administrative notes

#### Project details

- Project plan
  - Thursday, `4/9` at the end of class
  - `5%`
- Written document and presentation recording
  - Thursday, `4/30` at `11:30 AM`
  - `15%`
- View peer presentations and provide feedback
  - Wednesday, `5/6` at `11:59 PM`
  - `5%`

#### Upcoming schedule

- This week (`3/20`)
  - software security lecture
  - studio
  - some time for studio on Tuesday
- Next week (`4/6`)
  - fuzzing
  - some time to discuss project ideas
- `4/13`
  - Web security
- `4/20`
  - Privacy and ethics overview
  - time to work on projects
  - course wrap-up

### Overview

#### Outline

- Context
- Prominent software vulnerabilities and exploits
- Buffer overflows
  - Background: C code, compilation, memory layout, execution
  - Baseline exploit
  - Challenges
  - Defenses, countermeasures, counter-countermeasures

Sources:

- SEED lab book
- Goodrich/Tamassia book
- Slides from Bryant/O'Hallaron (CMU), Dan Boneh (Stanford), Michael Hicks (UMD)

### Context

#### Context: computing stack (informal)

| Layer | Example |
| --- | --- |
| Application | web server, standalone app |
| Compiler / assembler | `gcc`, `clang` |
| OS: syscalls | `execve()`, `setuid()`, `write()`, `open()`, `fork()` |
| OS: processes, mem layout | Linux virtual memory layout |
| Architecture (ISA, execution) | x86, x86_64, ARM |
| Hardware | Intel Skylake processor |

- User control is strongest near the application / compiler level.
- System control becomes more important as we move down toward OS, architecture, and hardware.
### Prominent software vulnerabilities and exploits

#### Software security: categories

- Race conditions
- Privilege escalation
- Path traversal
- Environment variable modification
- Language-specific vulnerabilities
- Format string attack
- Buffer overflows

#### Buffer Overflows (BoFs)

- A buffer overflow is a bug that affects low-level code, typically in C and C++, with significant security implications.
- Normally, a program with this bug will simply crash.
- But an attacker can manipulate the conditions that trigger the bug to make the program do much worse:
  - Steal private information
    - e.g. Heartbleed
  - Corrupt valuable information
  - Run code of the attacker's choice

#### Application behavior

- Slide contains a figure only.
- Intended point: normal application behavior can become attacker-controlled if input handling is unsafe.

#### BoFs: why do we care?

- Reference from slide: [IEEE Spectrum top programming languages 2025](https://spectrum.ieee.org/top-programming-languages-2025)

#### Critical systems in C/C++

- Most OS kernels and utilities
  - `fingerd`
  - X windows server
  - shell
- Many high-performance servers
  - Microsoft IIS
  - Apache `httpd`
  - `nginx`
  - Microsoft SQL Server
  - MySQL
  - `redis`
  - `memcached`
- Many embedded systems
  - Mars rover
  - industrial control systems
  - automobiles

A successful attack on these systems can be particularly dangerous.

#### Morris Worm

- Slide contains a figure / historical reference only.
- It is included as an example of how memory-corruption vulnerabilities mattered in practice.

#### Why do we still care?

- The slide references the NVD search page: [NVD vulnerability search](https://nvd.nist.gov/vuln/search)
- Why the drop?
  - Memory-safe languages
    - Rust
    - Go
  - Stronger defenses
  - Fuzzing
    - find bugs before release
  - Change in development practices
    - code review
    - static analysis tools
    - related engineering improvements

#### MITRE Top 25 2025

- Reference from slide: [MITRE CWE Top 25](http://cwe.mitre.org/top25/)

### Buffer overflows

#### Outline

- System Basics
  - Application memory layout
  - How function calls work under the hood
  - `32-bit x86` only
  - `64-bit x86_64` similar, but with important differences
- Buffer overflow
  - Overwriting the return address pointer
  - Point it to injected shellcode

#### Buffer Overflows (BoFs)

- 2-minute version first, then all background / full version

#### Process memory layout: virtual address space

- Slide reference: [virtual address space reference](https://hungys.xyz/unix-prog-process-environment/)

#### Process memory layout: function calls

- Slide reference: [Tenouk function call figure 1](http://www.tenouk.com/Bufferoverflowc/Bufferoverflow2.html)
- Slide reference: [Tenouk function call figure 2](http://www.tenouk.com/Bufferoverflowc/Bufferoverflow4.html)

#### Process memory layout: compromised frame

- Slide reference: [Tenouk compromised frame figure](http://www.tenouk.com/Bufferoverflowc/Bufferoverflow4.html)

#### Computer System

High-level examples used in the slide:

```c
car *c = malloc(sizeof(car));
c->miles = 100;
c->gals = 17;
float mpg = get_mpg(c);
free(c);
```

```java
Car c = new Car();
c.setMiles(100);
c.setGals(17);
float mpg = c.getMPG();
```

Assembly-language example used in the slide:

```asm
get_mpg:
    pushq %rbp
    movq %rsp, %rbp
    ...
    popq %rbp
    ret
```

- The same computation can be viewed at multiple levels:
  - C / Java source
  - assembly language
  - machine code
  - operating system context

#### Little Theme 1: Representation

- All digital systems represent everything as `0`s and `1`s.
  - The `0` and `1` are really two different voltage ranges in wires.
  - Or magnetic positions on a disk, hole depths on a DVD, or even DNA.
- "Everything" includes:
  - numbers
    - integers and floating point
  - characters
    - building blocks of strings
  - instructions
    - directives to the CPU that make up a program
  - pointers
    - addresses of data objects stored in memory
- These encodings are stored throughout the computer system.
  - registers
  - caches
  - memories
  - disks
- They all need addresses.
  - find an item
  - find a place for a new item
  - reclaim memory when data is no longer needed

#### Little Theme 2: Translation

- There is a big gap between how we think about programs / data and the `0`s and `1`s of computers.
- We need languages to describe what we mean.
- These languages must be translated one level at a time.
- Example point from the slide:
  - we know Java as a programming language
  - but we must work down to the `0`s and `1`s of computers
  - we try not to lose anything in translation
  - we encounter Java bytecode, C, assembly, and machine code

#### Little Theme 3: Control Flow

- How do computers orchestrate everything they are doing?
- Within one program:
  - How are `if/else`, loops, and switches implemented?
  - How do we track nested procedure calls?
  - How do we know what to do upon `return`?
- At the operating-system level:
  - library loading
  - sharing system resources
    - memory
    - I/O
    - disks

#### HW/SW Interface: Code / Compile / Run Times

- Code time
  - user program in C
  - `.c` file
- Compile time
  - C compiler
  - assembler
- Run time
  - executable `.exe` file
  - hardware executes it
- Note from slide:
  - the compiler and assembler are themselves just programs developed using this same process

#### Assembly Programmer's View

- Programmer-visible CPU / memory state
  - Program counter
    - address of next instruction
    - called `RIP` in x86-64
  - Named registers
    - heavily used program data
    - together called the register file
  - Condition codes
    - store status information about most recent arithmetic operation
    - used for conditional branching
  - Memory
    - byte-addressable array
    - contains code and user data
    - includes the stack for supporting procedures

#### Turning C into Object Code

- Code in files `p1.c` and `p2.c`
- Compile with:

```bash
gcc -Og p1.c p2.c -o p
```

- Notes from the slide
  - `-Og` uses basic optimizations
  - resulting machine code goes into file `p`
- Translation chain
  - C program -> assembly program -> object program -> executable program
- Associated tools
  - compiler
  - assembler
  - linker
  - static libraries (`.a`)

#### Machine Instruction Example

- C code

```c
*dest = t;
```

- Meaning
  - store value `t` where designated by `dest`
- Assembly

```asm
movq %rsi, (%rdx)
```

- Interpretation
  - move 8-byte value to memory
  - operands
    - `t` is in register `%rsi`
    - `dest` is in register `%rdx`
    - `*dest` means memory `M[%rdx]`
- Object code

```text
0x400539: 48 89 32
```

- It is a 3-byte instruction stored at address `0x400539`.
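
The store `*dest = t` can be reproduced with a small C function. This is a minimal sketch, not the slide's original source: the name `store_value` and its unused first parameter are hypothetical, chosen only so that, under the System V x86-64 calling convention, `t` arrives in `%rsi` and `dest` in `%rdx`.

```c
#include <stdio.h>

/* Hypothetical helper (not from the slides): stores t at the address in dest.
 * Under the System V x86-64 calling convention the three arguments arrive in
 * %rdi (unused), %rsi (t) and %rdx (dest), so with gcc -Og the body is
 * typically a single store of the form: movq %rsi, (%rdx) */
void store_value(long unused, long t, long *dest) {
    (void)unused;   /* present only to shift t and dest into %rsi and %rdx */
    *dest = t;      /* the 8-byte store discussed above */
}
```

Compiling a file containing this function with `gcc -Og -S` on x86-64 Linux should show the store in the generated assembly. The exact registers and instruction address depend on the compiler, flags, and platform, so the output may differ from the slide.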
#### IA32 Registers

- 32 bits wide
- General-purpose register families shown in the slide
  - `%eax`, `%ax`, `%ah`, `%al`
  - `%ecx`, `%cx`, `%ch`, `%cl`
  - `%edx`, `%dx`, `%dh`, `%dl`
  - `%ebx`, `%bx`, `%bh`, `%bl`
  - `%esi`, `%si`
  - `%edi`, `%di`
  - `%esp`, `%sp`
  - `%ebp`, `%bp`
- Roles highlighted in the slide (in the same order as the registers above)
  - accumulator
  - counter
  - data
  - base
  - source index
  - destination index
  - stack pointer
  - base pointer

#### Data Sizes

- Slide is primarily a figure summarizing common integer widths and sizes.

#### Assembly Data Types

- "Integer" data of `1`, `2`, `4`, or `8` bytes
  - data values
  - addresses / untyped pointers
- No aggregate types such as arrays or structures at the assembly level
  - just contiguous bytes in memory
- Two common syntaxes
  - `AT&T`
    - used in the course, slides, textbook, GNU tools
  - `Intel`
    - used in Intel documentation and Intel tools
- Need to know which syntax you are reading because the operand order is reversed between the two: `AT&T` puts the source first, `Intel` puts the destination first.

#### Three Basic Kinds of Instructions

- Transfer data between memory and register
  - load
    - `%reg = Mem[address]`
  - store
    - `Mem[address] = %reg`
- Perform arithmetic on register or memory data
  - examples: addition, shifting, bitwise operations
- Control flow
  - unconditional jumps to / from procedures
  - conditional branches

#### Abstract Memory Layout

```text
High addresses
  Stack          <- local variables, procedure context
  Dynamic Data   <- heap, new / malloc
  Static Data    <- globals / static variables
  Literals       <- large constants such as strings
  Instructions
Low addresses
```

#### The ELF File Format

- ELF = Executable and Linkable Format
- One of the most widely used binary object formats
- ELF is architecture-independent
- ELF file types
  - Relocatable
    - must be fixed by the linker before execution
  - Executable
    - ready for execution
  - Shared
    - shared libraries with linking information
  - Core
    - core dumps created when a program terminates with a fault
- Tools mentioned on slide
  - `readelf`
  - `file`
  - `objdump -D`

#### Process Memory Layout (32-bit x86 machine)

- This slide is primarily a diagram.
- Key idea: a `32-bit x86` process has a standard virtual memory layout with code, static data, heap, and stack arranged in distinct regions.

We continue with the concrete runtime layout and the actual overflow mechanics in Lecture 18.
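
As a preview of that runtime layout, the regions can be observed from inside a running process. The following is a minimal sketch, not code from the slides: `print_layout`, `global_counter`, and `greeting` are hypothetical names, and the function prints one sample address from each region.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

int global_counter = 7;               /* lives in static data */
static const char *greeting = "hi";   /* the string itself lives with the literals */

/* Hypothetical helper: prints one sample address per memory region.
 * Returns 0 on success, -1 if the heap allocation fails. */
int print_layout(FILE *out) {
    int local = 0;                                /* lives on the stack */
    int *heap_cell = malloc(sizeof *heap_cell);   /* lives on the heap */
    if (heap_cell == NULL) {
        return -1;
    }
    fprintf(out, "stack   (local variable): %p\n", (void *)&local);
    fprintf(out, "heap    (malloc block):   %p\n", (void *)heap_cell);
    fprintf(out, "static  (global):         %p\n", (void *)&global_counter);
    fprintf(out, "literal (string):         %p\n", (void *)greeting);
    /* Function-to-object pointer conversion is not strictly ISO C;
     * going through uintptr_t works on common POSIX platforms. */
    fprintf(out, "code    (function):       %p\n",
            (void *)(uintptr_t)&print_layout);
    free(heap_cell);
    return 0;
}
```

Calling `print_layout(stdout)` from `main` prints the five addresses. On a typical Linux system the stack address is the highest and the code address is among the lowest, matching the abstract layout. The concrete values change from run to run when address space layout randomization is enabled, and the relative order is platform-dependent rather than guaranteed.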