Virtual Memory Systems: Paging, Segmentation, and Swap
Virtual memory is the foundational abstraction through which modern operating systems present each process with the illusion of a private, contiguous address space larger than the physical RAM installed in the machine. This page describes the mechanics of paging, segmentation, and swap space; the causal pressures that drive virtual memory design; the classification distinctions between address translation schemes; and the tradeoffs that shape how these systems perform under real workloads. The reference covers x86-64 and ARM architectures as the dominant deployed platforms, with grounding in POSIX standards, IEEE specifications, and documented CPU vendor architecture manuals.
- Definition and scope
- Core mechanics or structure
- Causal relationships or drivers
- Classification boundaries
- Tradeoffs and tensions
- Common misconceptions
- Checklist or steps
- Reference table or matrix
- References
Definition and scope
Virtual memory is an operating system (OS) abstraction that decouples the address space visible to a process from the physical addresses of installed DRAM. Each process operates within a virtual address space — 48 bits wide in canonical form on current x86-64 implementations, yielding 256 TiB of addressable space per process (Intel 64 and IA-32 Architectures Software Developer's Manual, Vol. 1, §3.3.7) — while the OS and hardware Memory Management Unit (MMU) map virtual addresses to physical frames on demand.
The scope of virtual memory systems encompasses three interlocking subsystems: paging, which divides address space into fixed-size units; segmentation, which divides it into variable-size logical regions; and swap, which extends the effective address space by using persistent storage as a backing store when physical DRAM is insufficient. Together these form the memory management layer of the operating system that every process depends on, whether or not the application developer is aware of it.
For the broader placement of virtual memory within the storage and speed hierarchy, the memory hierarchy in computing reference establishes where virtual memory sits relative to registers, caches, DRAM, and NVMe storage, a relationship directly relevant to swap performance.
Core mechanics or structure
Paging
Paging divides both the virtual address space and physical memory into equal-size blocks. The virtual blocks are called pages; the physical blocks are called frames. The x86-64 architecture defines a default page size of 4 KiB, with optional large pages of 2 MiB (huge pages) and 1 GiB (gigantic pages) (Intel SDM Vol. 3A, §4.3).
Address translation occurs through a page table: a hierarchical data structure maintained by the OS and walked by the MMU's hardware page table walker. On x86-64, the standard paging mode uses a 4-level page table (PML4 → PDPT → PD → PT), each level indexed by 9 bits of the virtual address, with the remaining 12 bits serving as the byte offset within a 4 KiB page. Linux, under the mm subsystem, uses a 5-level page table when compiled with CONFIG_X86_5LEVEL, extending addressability to 57-bit virtual addresses and 128 PiB of virtual space (Linux kernel documentation, Documentation/arch/x86/x86_64/5level-paging.rst).
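The 9-bit-per-level indexing can be made concrete with a short sketch that splits a 48-bit virtual address into its four table indices and page offset. The constants follow the 4-level layout described above; the function name is illustrative:

```python
# Decompose a 48-bit x86-64 virtual address into 4-level page table
# indices and the byte offset within a 4 KiB page.
PAGE_SHIFT = 12                      # 4 KiB pages: low 12 bits are the offset
INDEX_BITS = 9                       # each level indexes 512 entries
INDEX_MASK = (1 << INDEX_BITS) - 1

def split_virtual_address(va: int) -> dict:
    return {
        "pml4":   (va >> 39) & INDEX_MASK,       # bits 47:39
        "pdpt":   (va >> 30) & INDEX_MASK,       # bits 38:30
        "pd":     (va >> 21) & INDEX_MASK,       # bits 29:21
        "pt":     (va >> 12) & INDEX_MASK,       # bits 20:12
        "offset": va & ((1 << PAGE_SHIFT) - 1),  # bits 11:0
    }

parts = split_virtual_address(0x7F1234567890)
```

Recombining the four indices and the offset reproduces the original address, which is exactly the walk the hardware page table walker performs in reverse.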
The Translation Lookaside Buffer (TLB) caches recently resolved virtual-to-physical mappings, avoiding full page table walks on every memory access. TLB miss rates and TLB shootdown costs under multi-core workloads are a principal performance variable in high-throughput systems.
Each page table entry contains the physical frame number, a present bit (P), a dirty bit (D), an accessed bit (A), and protection bits encoding read/write/execute permissions. When the present bit is zero, a hardware page fault (#PF exception) is raised, transferring control to the OS page fault handler.
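The PTE flag layout can be modeled with plain bit arithmetic. The bit positions below follow the x86-64 encoding (present = bit 0, writable = bit 1, accessed = bit 5, dirty = bit 6); the helper names are illustrative:

```python
# Model an x86-64 page table entry: physical frame number plus flag bits.
PTE_PRESENT  = 1 << 0
PTE_WRITABLE = 1 << 1
PTE_ACCESSED = 1 << 5
PTE_DIRTY    = 1 << 6

def make_pte(frame_number: int, flags: int) -> int:
    # The frame number occupies the upper bits; flags live in the low 12 bits.
    return (frame_number << 12) | flags

def pte_frame(pte: int) -> int:
    return pte >> 12

pte = make_pte(0xABCDE, PTE_PRESENT | PTE_WRITABLE)
# A cleared present bit is what makes the next access raise #PF.
faulting = pte & ~PTE_PRESENT
```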
Segmentation
Segmentation divides the address space into named, variable-length regions — code, data, stack, heap — each described by a segment descriptor containing a base address, a limit, and access rights. On x86-64 in 64-bit mode, hardware segmentation is largely vestigial: the CS, DS, SS, and ES segments have their base and limit fields ignored (treated as zero and maximum respectively), though FS and GS retain functional base registers used by OS kernels and threading libraries for per-CPU and per-thread data (Intel SDM Vol. 3A, §3.2.4).
Pure segmentation architectures, such as the original Intel 8086 and the protected-mode 80286, remain relevant in embedded contexts. The memory in embedded systems domain still deploys segment-based memory models on microcontrollers where flat paging is unavailable or undesirable.
ARM's Memory Protection Unit (MPU), defined in the ARMv7-M and ARMv8-M architecture profiles (ARMv8-M Architecture Reference Manual), provides a segment-like region model with up to 16 configurable regions (the exact count is implementation-defined), each assigned a base address, size, and access attributes — without full paging overhead.
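Both classic segmentation and MPU regions reduce to the same base-limit check before an address is formed. A minimal sketch, with illustrative names:

```python
# Segment-style translation: an offset is valid only if it falls within
# the segment limit; the linear address is then base + offset.
def segment_translate(base: int, limit: int, offset: int) -> int:
    if offset > limit:
        # On x86 protected mode the hardware raises #GP here.
        raise MemoryError("segment limit violation")
    return base + offset

addr = segment_translate(base=0x40000, limit=0xFFFF, offset=0x10)
```

Note there is no table walk at all: the check and the addition are the entire translation, which is the performance appeal of pure segmentation on small systems.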
Swap
Swap extends the effective memory capacity by designating a portion of block storage — historically a raw partition, now commonly a regular file such as a Linux swap file — as a backing store for evicted pages. When the OS page reclaim algorithm (in Linux, the PFRA, Page Frame Reclaiming Algorithm) selects a page for eviction, the page is written to swap, its present bit is cleared, and the freed frame is made available. On a subsequent access, a page fault triggers a swap-in operation, reading the page back from storage into a physical frame.
Swap performance is directly constrained by storage latency. Rotational hard disks deliver random read latencies of approximately 5–10 ms; NVMe and storage class memory devices reduce this to tens of microseconds, making NVMe-backed swap materially different in performance profile from HDD-backed swap. Cloud memory optimization practices in virtual machine environments increasingly rely on NVMe-backed swap or memory balloon drivers rather than conventional swap partitions.
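The latency gap can be made concrete with an effective-access-time calculation. The DRAM and fault latencies below are assumed round numbers consistent with the figures quoted above, and the fault rate is illustrative:

```python
# Effective access time (EAT) under demand paging:
# EAT = (1 - p) * t_mem + p * t_fault, where p is the page fault rate.
def effective_access_time(t_mem_ns: float, t_fault_ns: float, p: float) -> float:
    return (1 - p) * t_mem_ns + p * t_fault_ns

DRAM_NS = 100.0
HDD_FAULT_NS = 8e6       # ~8 ms rotational seek + read
NVME_FAULT_NS = 50e3     # ~50 us NVMe random read

# Even one fault per 100,000 accesses nearly doubles average latency
# with HDD-backed swap, while NVMe-backed swap adds well under 1%.
eat_hdd = effective_access_time(DRAM_NS, HDD_FAULT_NS, 1e-5)
eat_nvme = effective_access_time(DRAM_NS, NVME_FAULT_NS, 1e-5)
```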
Causal relationships or drivers
Three structural forces drive virtual memory architecture decisions:
1. Address space isolation. Multiprogramming requires that one process cannot read or overwrite another's memory without explicit OS mediation. Paging with per-process page tables is the mechanism that enforces this boundary. The POSIX standard (The Open Group Base Specifications Issue 8) mandates process isolation semantics, which all conforming OS implementations achieve through virtual address translation.
2. Physical memory scarcity relative to working set. Even with installed DRAM growing — server configurations commonly ship with 256 GiB to 6 TiB of physical RAM — aggregate workload working sets regularly exceed physical capacity on shared infrastructure. The OS must therefore implement page replacement policies (LRU approximation, clock algorithm, CLOCK-Pro) to decide which pages to evict. Memory capacity planning practice for enterprise and cloud workloads is directly shaped by working set analysis against available physical memory.
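The clock (second-chance) algorithm named above fits in a few lines: each frame carries a reference bit, and the sweeping hand clears a set bit rather than evicting, giving recently used pages one more pass. The class below is an illustrative toy, not kernel code:

```python
# CLOCK (second-chance) page replacement: sweep a circular frame list;
# a set reference bit is cleared and the hand advances; a clear bit
# marks the victim.
class Clock:
    def __init__(self, nframes: int):
        self.pages = [None] * nframes    # page resident in each frame
        self.refbit = [0] * nframes
        self.hand = 0

    def access(self, page):
        if page in self.pages:           # hit: set the reference bit
            self.refbit[self.pages.index(page)] = 1
            return None
        while self.refbit[self.hand]:    # second chance for referenced pages
            self.refbit[self.hand] = 0
            self.hand = (self.hand + 1) % len(self.pages)
        slot = self.hand
        victim = self.pages[slot]        # None while frames are still free
        self.pages[slot] = page
        self.refbit[slot] = 1
        self.hand = (slot + 1) % len(self.pages)
        return victim
```

With three frames, filling A, B, C, re-touching A, then faulting in D evicts A anyway (the full sweep clears every bit), but a page touched after that sweep, such as B below, survives the next eviction while C does not.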
3. Security isolation requirements. Speculative execution vulnerabilities disclosed beginning in January 2018 — Spectre (CVE-2017-5753, CVE-2017-5715) and Meltdown (CVE-2017-5754) — demonstrated that page table structures themselves can become side-channel attack vectors (NIST National Vulnerability Database, CVE-2017-5754). This drove the adoption of Kernel Page-Table Isolation (KPTI) in Linux 4.15 and equivalent mitigations in Windows and macOS, adding address-space switch and TLB flush costs at every kernel/user transition and reshaping how OS developers think about page table layout. Memory security and vulnerabilities is a separate domain reference that documents these attack classes in detail.
Classification boundaries
Virtual memory implementations divide along three orthogonal axes:
Translation scheme: Pure paging (fixed-size pages, no segmentation), pure segmentation (variable-size regions, no paging), or segmented paging (segments subdivided into pages, as in the original IA-32 protected mode). x86-64 in 64-bit mode is effectively pure paging with vestigial segment registers.
Page table structure: Flat single-level (impractical above small address spaces), multi-level hierarchical (2-, 3-, 4-, or 5-level as in x86-64), or inverted page tables (one entry per physical frame rather than per virtual page, used by IBM POWER and some HP PA-RISC implementations to bound table size).
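An inverted page table locates a translation by hashing the (process ID, virtual page number) pair rather than walking a per-process tree. The sketch below uses Python's built-in dict as a stand-in for the hash-and-chain structure; all names are illustrative:

```python
# Inverted page table: one entry per physical frame, keyed by
# (process_id, virtual_page_number). Table size is bounded by physical
# memory, not by the size of each virtual address space.
class InvertedPageTable:
    def __init__(self):
        self.index = {}              # (pid, vpn) -> physical frame number

    def map(self, pid: int, vpn: int, frame: int):
        self.index[(pid, vpn)] = frame

    def translate(self, pid: int, vpn: int, offset: int,
                  page_size: int = 4096) -> int:
        frame = self.index.get((pid, vpn))
        if frame is None:
            # No mapping: hardware/firmware falls back to the OS handler.
            raise LookupError("page fault")
        return frame * page_size + offset

ipt = InvertedPageTable()
ipt.map(pid=7, vpn=3, frame=42)
phys = ipt.translate(pid=7, vpn=3, offset=0x10)
```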
Backing store relationship: Demand paging (pages are loaded only on first access), prepaging (the OS speculatively loads adjacent pages), and copy-on-write (CoW) (fork creates shared mappings marked read-only; a write triggers duplication of only the written page). CoW is the mechanism behind Linux's fork(2) efficiency and container image layering.
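Copy-on-write sharing can be modeled with reference counts: fork shares frames read-only, and the first write to a shared frame triggers a private copy. The classes below are a toy model with illustrative names, not the kernel's data structures:

```python
# Toy copy-on-write model: frames are shared on fork and duplicated
# only when one side writes to a shared frame.
class Frame:
    def __init__(self, data):
        self.data = data
        self.refcount = 1

class AddressSpace:
    def __init__(self, frames=None):
        self.frames = frames or {}       # page number -> Frame

    def fork(self):
        # Share every frame; no page contents are copied here.
        for f in self.frames.values():
            f.refcount += 1
        return AddressSpace(dict(self.frames))

    def write(self, page, data):
        f = self.frames.get(page)
        if f is not None and f.refcount > 1:
            f.refcount -= 1              # break sharing: take a private copy
            f = None
        if f is None:
            f = Frame(data)
            self.frames[page] = f
        f.data = data

parent = AddressSpace()
parent.write(0, "hello")
child = parent.fork()                    # zero bytes of page data copied
child.write(0, "world")                  # first write duplicates just this page
```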
Tradeoffs and tensions
Page size vs. TLB coverage vs. fragmentation. Larger pages (2 MiB huge pages) reduce TLB pressure — a single TLB entry covers 512× more memory than a 4 KiB entry — but increase internal fragmentation and complicate memory allocation for workloads with irregular access patterns. Linux Transparent Huge Pages (THP) automates promotion of eligible anonymous mappings to 2 MiB pages, but has documented latency spikes during compaction. Memory bandwidth and latency profiling commonly reveals THP compaction as a source of unpredictable tail latency in latency-sensitive applications.
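The TLB-coverage claim is simple arithmetic. Assuming an illustrative 1,536-entry TLB (real TLBs have separate levels and page-size partitions), 4 KiB pages reach 6 MiB without a walk while 2 MiB pages reach 3 GiB:

```python
def tlb_coverage(entries: int, page_size: int) -> int:
    # Total bytes reachable without triggering a page table walk.
    return entries * page_size

KIB, MIB, GIB = 1024, 1024**2, 1024**3
small = tlb_coverage(1536, 4 * KIB)   # 4 KiB pages
huge = tlb_coverage(1536, 2 * MIB)    # 2 MiB huge pages
ratio = huge // small                 # the 512x factor cited above
```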
Swap depth vs. application performance. Aggressive use of swap allows the OS to reclaim physical memory for file cache, improving throughput on I/O-bound workloads. However, swap-induced latency when accessing evicted hot pages degrades response time for latency-sensitive services. Linux kernel parameter vm.swappiness (range 0–200 in kernels ≥ 5.8) controls the balance between anonymous page reclaim and file cache reclaim.
KPTI overhead vs. security posture. KPTI eliminates shared kernel mappings from user-mode page tables, closing Meltdown-class attack paths. The cost — measured at 5–30% throughput reduction on syscall-intensive workloads in early benchmarks (Red Hat Performance Tuning Guide, 2018 errata) — drove the adoption of PCID (Process-Context Identifiers, enabled via CR4.PCIDE), which tags TLB entries with an address-space identifier so that kernel/user transitions need not flush the entire TLB.
ECC memory and page fault interactions. ECC memory error correction hardware can silently correct single-bit errors in page table entries; uncorrected multi-bit errors produce corrupted PTE data, which the OS may interpret as page faults or access violations. This intersection is relevant to memory failure diagnosis and repair workflows in enterprise environments.
Common misconceptions
Misconception: Virtual memory is the same as swap. Swap is one component of a virtual memory system — the backing store used when physical frames are exhausted. Virtual memory as an abstraction exists and operates even when no swap is configured; its primary function is address space isolation and translation, not capacity extension.
Misconception: More swap eliminates the need for more RAM. Swap-in latency from block storage is orders of magnitude higher than DRAM latency. A system under heavy swap pressure is not functionally equivalent to a system with adequate physical memory; it is experiencing I/O-bound stall conditions that degrade throughput and response time proportionally to swap frequency. This is distinct from persistent memory technology, where byte-addressable NVDIMM devices narrow the latency gap between DRAM and storage.
Misconception: 64-bit systems have unlimited address space. The x86-64 canonical addressing constraint limits virtual addresses to 48 significant bits (or 57 with 5-level paging), not the full 64 bits. Addresses outside the canonical range generate a general protection fault (#GP), not a page fault.
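The canonical-form rule says bits 63 through 47 must all equal bit 47 (sign extension). A short predicate makes the boundary visible; the addresses checked are the highest user address, the first non-canonical address, and the lowest kernel address:

```python
# A 48-bit-canonical x86-64 address has bits 63:47 all equal to bit 47,
# i.e. the top 17 bits are all zeros or all ones.
def is_canonical_48(va: int) -> bool:
    top = va >> 47
    return top == 0 or top == 0x1FFFF

highest_user   = 0x00007FFFFFFFFFFF   # canonical (top bits all zero)
first_bad      = 0x0000800000000000   # non-canonical hole begins here
lowest_kernel  = 0xFFFF800000000000   # canonical (top bits all one)
```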
Misconception: Paging always outperforms segmentation. On workloads where logical object boundaries align naturally with segments, pure segmentation can avoid the page table walk overhead entirely. The performance comparison is workload-dependent and architecture-dependent, not absolute.
Misconception: Copy-on-write duplicates memory immediately on fork. CoW defers physical duplication until a write occurs. A fork(2) call on a process using 8 GiB of RAM does not immediately consume 16 GiB; physical frames are shared until modified pages require private copies. A related sharing idea appears in unified memory architecture designs, which pool physical memory across multiple consumers.
Checklist or steps
The following sequence describes the hardware and software steps executed during a demand-paging page fault resolution on a standard x86-64 Linux system:
- Faulting instruction detected — The MMU hardware encounters a page table entry with the present bit (P=0) or a permission violation during address translation.
- #PF exception raised — The CPU loads the faulting linear address into CR2, saves register state, and transfers control to the kernel's page fault handler (do_page_fault / exc_page_fault in Linux).
- Fault classification — The handler determines whether the fault is a legitimate access (demand paging, CoW, stack growth) or an illegal access (segmentation fault / SIGSEGV delivery).
- VMA lookup — The kernel searches the process's virtual memory area (VMA) tree for the VMA containing the faulting address to confirm the access is within a mapped region.
- Page allocation — A free physical frame is obtained from the buddy allocator (alloc_page()). If no free frame is available, the page reclaim path (kswapd or direct reclaim) is invoked.
- Backing store read (if swap-backed) — If the PTE encodes a swap entry, an I/O request is issued to the swap device; the faulting process sleeps on the I/O wait queue until completion.
- Page table update — The PTE is written with the physical frame number, P=1, and appropriate permission bits; the dirty and accessed bits are initialized.
- TLB invalidation — The kernel issues a TLB invalidation for the faulting virtual address on the local CPU (and IPI-based TLB shootdown to other CPUs sharing the page table if the mapping is shared).
- Instruction restart — The CPU re-executes the faulting instruction, which now resolves successfully through the populated PTE.
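The sequence above can be condensed into a toy demand pager: a missing mapping triggers allocation (evicting to swap when no frame is free), the page table entry is written, and the access then succeeds. Everything here is a simplified model with illustrative names:

```python
# Toy demand pager: resolve a virtual page access, faulting in a frame
# on demand and evicting to swap when physical frames run out.
class DemandPager:
    def __init__(self, nframes: int):
        self.free_frames = list(range(nframes))
        self.page_table = {}             # vpn -> frame (present pages only)
        self.swap = {}                   # vpn -> evicted frame's contents

    def access(self, vpn):
        if vpn in self.page_table:       # present bit set: no fault
            return self.page_table[vpn]
        # Page fault path: reclaim a frame if none are free.
        if not self.free_frames:
            victim, frame = next(iter(self.page_table.items()))
            self.swap[victim] = frame    # write victim out to swap
            del self.page_table[victim]  # clear its present bit
            self.free_frames.append(frame)
        frame = self.free_frames.pop()
        self.page_table[vpn] = frame     # PTE update; access now succeeds
        return frame

pager = DemandPager(nframes=2)
```

The victim choice here is simply the oldest resident page (FIFO); a real kernel would consult a replacement policy such as the clock algorithm.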
Reference table or matrix
| Feature | Pure Paging (x86-64) | Segmented Paging (IA-32 PM) | ARM MPU (ARMv7-M/v8-M) | Inverted Page Table (IBM POWER) |
|---|---|---|---|---|
| Address unit | Fixed 4 KiB / 2 MiB / 1 GiB pages | Variable segments + 4 KiB pages | Up to 16 fixed regions | Fixed pages, one entry per frame |
| Translation levels | 4-level (PML4→PT) or 5-level | 2-stage: segment selector + paging | Region register lookup | Global hash table |
| TLB structure | Per-core, multi-entry, tagged | Per-core, shared with segmentation | Locked or unlocked entries | Software-managed (hashed HTAB) |
| Virtual address width | 48-bit canonical (57-bit w/ LA57) | 32-bit (PAE widens physical, not virtual, addresses to 36-bit) | 32-bit | 64-bit |
| Swap support | Yes (Linux, Windows, macOS) | Yes (legacy OS/2, Windows 9x) | No (regions map physical memory directly; no paging) | Yes (AIX, Linux on POWER) |