NoteNextra-origin/content/CSE4303/CSE4303_L17.md
Zheyuan Wu d6bc8375ce
2026-04-02 15:17:50 -05:00

CSE4303 Introduction to Computer Security (Lecture 17)

Because I was not paying close attention during this lecture, this note was generated by AI as a continuation of the previous lecture note. I am keeping this warning to make clear that the note was created by AI.

Software security

Administrative notes

Project details

  • Project plan
    • Thursday, 4/9 at the end of class
    • 5%
  • Written document and presentation recording
    • Thursday, 4/30 at 11:30 AM
    • 15%
  • View peer presentations and provide feedback
    • Wednesday, 5/6 at 11:59 PM
    • 5%

Upcoming schedule

  • This week (3/30)
    • software security lecture
    • studio
    • some time for studio on Tuesday
  • Next week (4/6)
    • fuzzing
    • some time to discuss project ideas
  • 4/13
    • Web security
  • 4/20
    • Privacy and ethics overview
    • time to work on projects
    • course wrap-up

Overview

Outline

  • Context
  • Prominent software vulnerabilities and exploits
  • Buffer overflows
    • Background: C code, compilation, memory layout, execution
    • Baseline exploit
    • Challenges
    • Defenses, countermeasures, counter-countermeasures

Sources:

  • SEED lab book
  • Goodrich/Tamassia book
  • Slides from Bryant/O'Hallaron (CMU), Dan Boneh (Stanford), Michael Hicks (UMD)

Context

Context: computing stack (informal)

| Layer | Example |
| --- | --- |
| Application | web server, standalone app |
| Compiler / assembler | gcc, clang |
| OS: syscalls | execve(), setuid(), write(), open(), fork() |
| OS: processes, mem layout | Linux virtual memory layout |
| Architecture (ISA, execution) | x86, x86_64, ARM |
| Hardware | Intel Skylake processor |

  • User control is strongest near the application / compiler level.
  • System control becomes more important as we move down toward OS, architecture, and hardware.

Prominent software vulnerabilities and exploits

Software security: categories

  • Race conditions
  • Privilege escalation
  • Path traversal
  • Environment variable modification
  • Language-specific vulnerabilities
    • Format string attack
    • Buffer overflows

Buffer Overflows (BoFs)

  • A buffer overflow is a bug that affects low-level code, typically in C and C++, with significant security implications.
  • Normally, a program that hits this bug will simply crash.
  • But an attacker can craft inputs that make the program do something much worse than crash:
    • Steal private information
      • e.g. Heartbleed
    • Corrupt valuable information
    • Run code of the attacker's choice
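
To make the bug class above concrete, here is a minimal sketch in C. The unsafe pattern appears only inside a comment and is never executed. `copy_bounded` is a hypothetical helper written for this note (it is not from the slides) that shows one way to keep a copy inside the destination buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* UNSAFE pattern (shown only as a comment, never executed here):
 *
 *     char name[8];
 *     strcpy(name, input);   // writes past name[] if input has 8+ characters
 */

/* Hypothetical helper: copy src into dst without ever writing past dst_size.
 * Returns 0 if the whole string fit, -1 if it had to be truncated. */
int copy_bounded(char *dst, size_t dst_size, const char *src)
{
    assert(dst != NULL && src != NULL && dst_size > 0);

    size_t src_len = strlen(src);
    size_t n = (src_len < dst_size) ? src_len : dst_size - 1;

    memcpy(dst, src, n);   /* copies at most dst_size - 1 bytes */
    dst[n] = '\0';         /* the terminator always lands inside the buffer */

    return (src_len < dst_size) ? 0 : -1;
}
```

The caller passes the real size of the destination (for example `sizeof buf`), and a return value of -1 signals that the input was longer than the buffer instead of silently overwriting neighboring memory.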

Application behavior

  • Slide contains a figure only.
  • Intended point: normal application behavior can become attacker-controlled if input handling is unsafe.

BoFs: why do we care?

Critical systems in C/C++

  • Most OS kernels and utilities
    • fingerd
    • X windows server
    • shell
  • Many high-performance servers
    • Microsoft IIS
    • Apache httpd
    • nginx
    • Microsoft SQL Server
    • MySQL
    • redis
    • memcached
  • Many embedded systems
    • Mars rover
    • industrial control systems
    • automobiles

A successful attack on these systems can be particularly dangerous.

Morris Worm

  • Slide contains a figure / historical reference only.
  • It is included as an example of how memory-corruption vulnerabilities mattered in practice.

Why do we still care?

  • The slide references the NVD (National Vulnerability Database) vulnerability search page.
  • Why the drop?
    • Memory-safe languages
      • Rust
      • Go
    • Stronger defenses
    • Fuzzing
      • find bugs before release
    • Change in development practices
      • code review
      • static analysis tools
      • related engineering improvements

MITRE Top 25 2025

Buffer overflows

Outline

  • System Basics
    • Application memory layout
    • How function calls work under the hood
      • 32-bit x86 only
      • 64-bit x86_64 similar, but with important differences
  • Buffer overflow
    • Overwriting the return address
    • Pointing it at injected shellcode

Buffer Overflows (BoFs)

  • 2-minute version first, then all background / full version

Process memory layout: virtual address space

Process memory layout: function calls

Process memory layout: compromised frame

Computer System

High-level examples used in the slide:

C:

car *c = malloc(sizeof(car));
c->miles = 100;
c->gals = 17;
float mpg = get_mpg(c);
free(c);

Java:

Car c = new Car();
c.setMiles(100);
c.setGals(17);
float mpg = c.getMPG();

Assembly-language example used in the slide:

get_mpg:
    pushq   %rbp
    movq    %rsp, %rbp
    ...
    popq    %rbp
    ret
  • The same computation can be viewed at multiple levels:
    • C / Java source
    • assembly language
    • machine code
    • operating system context

Little Theme 1: Representation

  • All digital systems represent everything as 0s and 1s.
    • The 0 and 1 are really two different voltage ranges in wires.
    • Or magnetic positions on a disk, hole depths on a DVD, or even DNA.
  • "Everything" includes:
    • numbers
      • integers and floating point
    • characters
      • building blocks of strings
    • instructions
      • directives to the CPU that make up a program
    • pointers
      • addresses of data objects stored in memory
  • These encodings are stored throughout the computer system.
    • registers
    • caches
    • memories
    • disks
  • They all need addresses.
    • find an item
    • find a place for a new item
    • reclaim memory when data is no longer needed
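
The representation theme can be sketched in C by looking at the raw bytes behind a value. The helper names below are hypothetical and chosen for this note. The byte order produced by `u32_to_bytes` depends on the machine's endianness, so the sketch relies only on the round trip, not on a specific order.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Copy the raw bytes of a 32-bit integer into out[0..3], in the order the
 * machine stores them (that order depends on the platform's endianness). */
void u32_to_bytes(uint32_t value, unsigned char out[4])
{
    assert(out != NULL);
    memcpy(out, &value, sizeof value);
}

/* Rebuild the integer from the same four bytes. */
uint32_t u32_from_bytes(const unsigned char in[4])
{
    uint32_t value;
    assert(in != NULL);
    memcpy(&value, in, sizeof value);
    return value;
}

/* A pointer is also just a bit pattern: view it as an unsigned integer. */
uintptr_t pointer_bits(const void *p)
{
    return (uintptr_t)p;
}
```

The same four bytes could equally be read as four characters or as part of an instruction; nothing in memory records which interpretation was intended.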

Little Theme 2: Translation

  • There is a big gap between how we think about programs / data and the 0s and 1s of computers.
  • We need languages to describe what we mean.
  • These languages must be translated one level at a time.
  • Example point from the slide:
    • we know Java as a programming language
    • but we must work down to the 0s and 1s of computers
    • we try not to lose anything in translation
    • we encounter Java bytecode, C, assembly, and machine code

Little Theme 3: Control Flow

  • How do computers orchestrate everything they are doing?
  • Within one program:
    • How are if/else, loops, and switches implemented?
    • How do we track nested procedure calls?
    • How do we know what to do upon return?
  • At the operating-system level:
    • library loading
    • sharing system resources
      • memory
      • I/O
      • disks
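
For the question of how if/else and loops are implemented, one common way to picture it is to rewrite the C with `goto`, so that only conditional and unconditional jumps remain. This is an illustrative sketch, not the output of any particular compiler, and the function names are made up for this note.

```c
#include <assert.h>

/* Ordinary structured version. */
long absdiff(long x, long y)
{
    long result;
    if (x > y)
        result = x - y;
    else
        result = y - x;
    return result;
}

/* Same logic using only jumps and labels, which is closer to what
 * the machine executes: one conditional jump, one unconditional jump. */
long absdiff_goto(long x, long y)
{
    long result;
    if (x <= y)
        goto else_branch;
    result = x - y;
    goto done;
else_branch:
    result = y - x;
done:
    return result;
}

/* A loop lowered the same way: test, body, jump back to the test. */
long sum_to(long n)
{
    assert(n >= 0);
    long i = 1, total = 0;
loop_test:
    if (i > n)
        goto loop_exit;
    total += i;
    i++;
    goto loop_test;
loop_exit:
    return total;
}
```

Both versions of `absdiff` compute the same value; the second just makes the jumps explicit.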

HW/SW Interface: Code / Compile / Run Times

  • Code time
    • user program in C
    • .c file
  • Compile time
    • C compiler
    • assembler
  • Run time
    • executable .exe file
    • hardware executes it
  • Note from slide:
    • the compiler and assembler are themselves just programs developed using this same process

Assembly Programmer's View

  • Programmer-visible CPU / memory state
    • Program counter
      • address of next instruction
      • called RIP in x86-64
    • Named registers
      • heavily used program data
      • together called the register file
    • Condition codes
      • store status information about most recent arithmetic operation
      • used for conditional branching
  • Memory
    • byte-addressable array
    • contains code and user data
    • includes the stack for supporting procedures

Turning C into Object Code

  • Code in files p1.c and p2.c
  • Compile with:
gcc -Og p1.c p2.c -o p
  • Notes from the slide
    • -Og uses basic optimizations
    • resulting machine code goes into file p
  • Translation chain
    • C program -> assembly program -> object program -> executable program
  • Associated tools
    • compiler
    • assembler
    • linker
    • static libraries (.a)

Machine Instruction Example

  • C code
*dest = t;
  • Meaning
    • store value t where designated by dest
  • Assembly
movq %rsi, (%rdx)
  • Interpretation
    • move 8-byte value to memory
    • operands
      • t is in register %rsi
      • dest is in register %rdx
      • *dest means memory M[%rdx]
  • Object code
0x400539: 48 89 32
  • It is a 3-byte instruction stored at address 0x400539.
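
As a worked check of the object code above, the three bytes can be decoded by hand: 0x48 is a REX prefix with the W bit set (64-bit operand size), 0x89 is the opcode for a register-to-memory/register move, and 0x32 is the ModRM byte naming the operands. The sketch below decodes the prefix and the ModRM byte. The struct and function names are hypothetical, and the register table covers only numbers 0 to 7, ignoring the REX extension bits.

```c
#include <assert.h>
#include <stdint.h>

/* Fields of an x86-64 ModRM byte: mod (2 bits), reg (3 bits), rm (3 bits). */
struct modrm {
    unsigned mod;
    unsigned reg;
    unsigned rm;
};

struct modrm decode_modrm(uint8_t byte)
{
    struct modrm m;
    m.mod = (byte >> 6) & 0x3;
    m.reg = (byte >> 3) & 0x7;
    m.rm  = byte & 0x7;
    return m;
}

/* Register names for the 3-bit numbers 0..7 (no REX.R / REX.B extension). */
const char *reg64_name(unsigned number)
{
    static const char *const names[8] = {
        "%rax", "%rcx", "%rdx", "%rbx", "%rsp", "%rbp", "%rsi", "%rdi"
    };
    assert(number < 8);
    return names[number];
}

/* A REX prefix has the bit pattern 0100WRXB; W = 1 selects 64-bit operands. */
int rex_w_is_set(uint8_t prefix)
{
    return (prefix & 0xF0) == 0x40 && (prefix & 0x08) != 0;
}
```

Decoding 0x32 gives mod = 0, reg = 6, rm = 2: the source register is %rsi and the destination is the memory location addressed by %rdx with no displacement, which matches `movq %rsi, (%rdx)`.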

IA32 Registers - 32 bits wide

  • General-purpose register families shown in the slide
    • %eax, %ax, %ah, %al
    • %ecx, %cx, %ch, %cl
    • %edx, %dx, %dh, %dl
    • %ebx, %bx, %bh, %bl
    • %esi, %si
    • %edi, %di
    • %esp, %sp
    • %ebp, %bp
  • Roles highlighted in the slide
    • accumulate
    • counter
    • data
    • base
    • source index
    • destination index
    • stack pointer
    • base pointer
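
The register families above are nested views of the same 32 bits: %ax is the low 16 bits of %eax, %al is the low 8 bits, and %ah is bits 8 to 15. A small C sketch can model this with masks and shifts. The function names are hypothetical, and the code only emulates the bit slicing; it does not touch real registers.

```c
#include <assert.h>
#include <stdint.h>

/* %ax: low 16 bits of %eax */
uint16_t low16(uint32_t eax) { return (uint16_t)(eax & 0xFFFFu); }

/* %ah: bits 8..15 of %eax */
uint8_t high8_of_low16(uint32_t eax) { return (uint8_t)((eax >> 8) & 0xFFu); }

/* %al: low 8 bits of %eax */
uint8_t low8(uint32_t eax) { return (uint8_t)(eax & 0xFFu); }

/* Writing %al replaces only the lowest byte; the other 24 bits are kept. */
uint32_t write_low8(uint32_t eax, uint8_t al)
{
    return (eax & 0xFFFFFF00u) | al;
}

/* Small usage example with a recognizable bit pattern. */
void eax_family_example(void)
{
    uint32_t eax = 0x12345678u;
    assert(low16(eax) == 0x5678u);
    assert(high8_of_low16(eax) == 0x56u);
    assert(low8(eax) == 0x78u);
    assert(write_low8(eax, 0xFFu) == 0x123456FFu);
}
```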

Data Sizes

  • Slide is primarily a figure summarizing common integer widths and sizes.

Assembly Data Types

  • "Integer" data of 1, 2, 4, or 8 bytes
    • data values
    • addresses / untyped pointers
  • No aggregate types such as arrays or structures at the assembly level
    • just contiguous bytes in memory
  • Two common syntaxes
    • AT&T
      • used in the course, slides, textbook, GNU tools
    • Intel
      • used in Intel documentation and Intel tools
  • You need to know which syntax you are reading because the operand order differs: AT&T writes the source first and the destination second, while Intel writes the destination first.
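
The point that arrays and structures are just contiguous bytes at the assembly level can be sketched in C by doing the address arithmetic by hand. The `struct car` layout below is an assumption for illustration (the slides do not give the field types), and the helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed layout for illustration; the real field types are not given. */
struct car {
    int miles;
    int gals;
};

/* a[i] computed by hand: base address + i * element size. */
long element_at(const long *a, size_t i)
{
    assert(a != NULL);
    const unsigned char *base = (const unsigned char *)a;
    const long *slot = (const long *)(base + i * sizeof(long));
    return *slot;
}

/* p->gals computed by hand: base address + byte offset of the field. */
int gals_of(const struct car *p)
{
    assert(p != NULL);
    const unsigned char *base = (const unsigned char *)p;
    const int *field = (const int *)(base + offsetof(struct car, gals));
    return *field;
}
```

Neither helper checks that the index or offset stays inside the object, and the generated machine code does not either, which is exactly why out-of-bounds accesses go unnoticed at this level.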

Three Basic Kinds of Instructions

  • Transfer data between memory and register
    • load
      • %reg = Mem[address]
    • store
      • Mem[address] = %reg
  • Perform arithmetic on register or memory data
    • examples: addition, shifting, bitwise operations
  • Control flow
    • unconditional jumps to / from procedures
    • conditional branches
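
To connect the three instruction kinds to familiar code, the sketch below annotates a small C function with the kind of instruction each statement would roughly turn into. The function is hypothetical, and the mapping in the comments is approximate: an optimizing compiler may combine or reorder these steps.

```c
#include <assert.h>
#include <stddef.h>

/* Add delta to every positive element of a[0..n-1];
 * return how many elements were changed. */
size_t bump_positive(long *a, size_t n, long delta)
{
    assert(a != NULL || n == 0);

    size_t changed = 0;
    for (size_t i = 0; i < n; i++) {   /* control flow: conditional branch */
        long v = a[i];                 /* data transfer: load,  reg = Mem[address] */
        if (v > 0) {                   /* control flow: conditional branch */
            v = v + delta;             /* arithmetic on register data */
            a[i] = v;                  /* data transfer: store, Mem[address] = reg */
            changed++;                 /* arithmetic on register data */
        }
    }
    return changed;                    /* control flow: return to the caller */
}
```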

Abstract Memory Layout

High addresses
Stack              <- local variables, procedure context
Dynamic Data       <- heap, new / malloc
Static Data        <- globals / static variables
Literals           <- large constants such as strings
Instructions
Low addresses
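
The layout above can be explored with a small C sketch that takes one address from each region. All names are hypothetical. The actual numeric ordering depends on the platform, on address-space randomization, and on how the program was linked, so the sketch records the addresses without assuming any order. Converting a function pointer to an integer is not guaranteed by ISO C but works on common POSIX systems.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

int global_counter = 42;            /* static data */
const char *greeting = "hello";     /* the characters live with the literals */

struct region_sample {
    uintptr_t stack_addr;
    uintptr_t heap_addr;
    uintptr_t static_addr;
    uintptr_t literal_addr;
    uintptr_t code_addr;
};

struct region_sample sample_regions(void)
{
    struct region_sample s;
    int local = 0;                              /* stack: local variable */
    int *dynamic = malloc(sizeof *dynamic);     /* dynamic data: heap */
    assert(dynamic != NULL);

    s.stack_addr   = (uintptr_t)&local;
    s.heap_addr    = (uintptr_t)dynamic;
    s.static_addr  = (uintptr_t)&global_counter;
    s.literal_addr = (uintptr_t)greeting;
    s.code_addr    = (uintptr_t)&sample_regions;  /* instructions */

    free(dynamic);
    return s;
}
```

Printing the five fields with `%p`-style output on a Linux machine is a quick way to compare a real process against the diagram.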

The ELF File Format

  • ELF = Executable and Linkable Format
  • One of the most widely used binary object formats
  • ELF is architecture-independent
  • ELF file types
    • Relocatable
      • must be fixed by the linker before execution
    • Executable
      • ready for execution
    • Shared
      • shared libraries with linking information
    • Core
      • core dumps created when a program terminates with a fault
  • Tools mentioned on slide
    • readelf
    • file
    • objdump -D

Process Memory Layout (32-bit x86 machine)

  • This slide is primarily a diagram.
  • Key idea: a 32-bit x86 process has a standard virtual memory layout with code, static data, heap, and stack arranged in distinct regions.

We continue with the concrete runtime layout and the actual overflow mechanics in Lecture 18.