GPU Memory Architecture: VRAM, GDDR, and Compute Workloads
GPU memory architecture defines how graphics processors store, access, and move data during rendering, simulation, and machine learning inference. This page covers the structural components of video RAM (VRAM), the GDDR and HBM memory standards that supply it, how bandwidth and capacity interact with compute workload demands, and the classification boundaries that determine which architecture is appropriate for a given application class. The distinctions between consumer, professional, and data center GPU memory configurations carry direct consequences for throughput, thermal design, and total system cost.
Definition and scope
A GPU memory subsystem is a dedicated pool of high-speed memory mounted on or adjacent to the graphics processing unit die, physically separate from system DRAM and governed by a memory controller integrated into the GPU itself. This separation from the CPU memory bus allows the GPU to sustain memory bandwidths that general-purpose DRAM channels cannot match — modern GDDR6X implementations, for example, achieve bandwidths exceeding 1,000 GB/s on wide 384-bit memory interfaces, as documented in NVIDIA's publicly published hardware specifications for the GeForce RTX 4090.
The scope of GPU memory architecture encompasses four components:
- Memory type — the DRAM cell technology (GDDR6, GDDR6X, HBM2e, HBM3)
- Interface width — the number of data lines connecting the GPU die to memory chips, measured in bits (64-bit to 384-bit for GDDR; 1,024-bit and above for HBM)
- Capacity — total addressable VRAM, ranging from 8 GB on mainstream consumer GPUs to 80 GB on data center accelerators such as the NVIDIA A100
- Memory controller architecture — partitioned into memory channels, each served by a sub-controller, with error correction optionally applied
Within the broader memory hierarchy in computing, VRAM functions as the GPU's main memory: it sits below the GPU's on-chip L1 and L2 caches, in a role analogous to system DRAM for the CPU. It is fast, local, and directly addressable by the shader cores without the latency penalty of crossing the PCIe bus.
JEDEC Solid State Technology Association (jedec.org) publishes the formal specification for GDDR6 (JESD250). GDDR6X, developed by Micron in partnership with NVIDIA, is not a JEDEC standard; it extends GDDR6 with PAM4 (Pulse Amplitude Modulation, 4-level) signaling, encoding two bits per symbol to double the effective data rate per pin without increasing the physical clock frequency.
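The signaling difference can be shown with a small calculation. This is an illustrative sketch: the 10.5 Gbaud symbol rate below is a hypothetical figure chosen for round numbers, not taken from any datasheet.

```python
# Illustration of NRZ vs. PAM4 per-pin data rates at the same symbol rate.
# The symbol rate here is a hypothetical value for illustration only.

def per_pin_rate_gbps(symbol_rate_gbaud: float, bits_per_symbol: int) -> float:
    """Effective data rate = symbol rate x bits encoded per symbol."""
    return symbol_rate_gbaud * bits_per_symbol

symbol_rate = 10.5                        # Gbaud, assumed for both cases
nrz = per_pin_rate_gbps(symbol_rate, 1)   # GDDR6-style NRZ: 1 bit per symbol
pam4 = per_pin_rate_gbps(symbol_rate, 2)  # GDDR6X-style PAM4: 2 bits per symbol
print(nrz, pam4)  # 10.5 21.0 -- PAM4 doubles throughput per pin
```

At equal symbol rates, PAM4 carries twice the data per pin, which is why GDDR6X reaches higher data rates without faster physical clocks.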
How it works
GPU memory operates through a parallelized burst-read/write model driven by the GPU's memory controller, which issues transactions to multiple DRAM chips simultaneously across the full interface width. The following discrete stages describe the memory access pipeline:
- Address mapping — the GPU memory controller maps shader core requests to physical DRAM rows, columns, and banks using a tiling scheme optimized for 2D spatial locality
- Bank activation — the controller asserts a row address strobe (RAS) to open a DRAM row in a specific bank, copying that row's data into a sense amplifier row buffer
- Column access — column address strobe (CAS) latency determines how many clock cycles elapse before data appears on the data bus; GDDR6 typically operates at CAS latencies of roughly 14–16 cycles at reference clocks
- Burst transfer — data is transferred in fixed burst lengths (BL8 for GDDR6) across all active interface lines simultaneously
- Precharge and refresh — the bank is precharged between accesses, and all banks undergo periodic DRAM refresh cycles to prevent data decay
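The staged pipeline above can be sketched as a toy latency model. All cycle counts below are illustrative placeholders, not datasheet timings; the point is the structure, where a row-buffer hit skips the activation stage.

```python
# Toy model of the access pipeline described above: bank activation (RAS),
# column access (CAS), then burst transfer. Timings are illustrative only.

def access_latency_ns(clock_ghz: float, trcd_cycles: int, cas_cycles: int,
                      burst_length: int, row_hit: bool) -> float:
    cycle_ns = 1.0 / clock_ghz
    activate = 0.0 if row_hit else trcd_cycles * cycle_ns  # RAS skipped on a row hit
    column = cas_cycles * cycle_ns                         # CAS latency
    burst = (burst_length / 2) * cycle_ns                  # DDR: 2 beats per clock
    return activate + column + burst

miss = access_latency_ns(1.25, trcd_cycles=16, cas_cycles=16,
                         burst_length=8, row_hit=False)
hit = access_latency_ns(1.25, trcd_cycles=16, cas_cycles=16,
                        burst_length=8, row_hit=True)
print(round(miss, 1), round(hit, 1))  # 28.8 16.0 -- hits skip activation cost
```

This is why the controller's tiling scheme matters: spatially coherent access patterns raise the row-hit rate and amortize the activation cost.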
Memory bandwidth and latency are the two primary performance axes. Bandwidth (GB/s) scales with interface width multiplied by data rate; latency (nanoseconds) is dominated by the CAS latency and round-trip signaling distance. High-throughput workloads such as matrix multiplication in neural network training are bandwidth-bound, while ray tracing traversal is more latency-sensitive.
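The bandwidth relationship above can be checked directly: interface width in bytes multiplied by the per-pin data rate. The 21 Gbps figure below is the GDDR6X per-pin rate used on the RTX 4090, which reproduces its documented bandwidth.

```python
# Peak bandwidth = (interface width / 8 bits per byte) x per-pin data rate.

def peak_bandwidth_gbs(interface_bits: int, data_rate_gbps: float) -> float:
    return interface_bits / 8 * data_rate_gbps

# A 384-bit interface at 21 Gbps per pin reproduces the ~1,000 GB/s class
# figure cited earlier for GDDR6X.
print(peak_bandwidth_gbs(384, 21.0))  # 1008.0 GB/s
```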
GDDR6 vs. HBM3 — structural contrast:
| Parameter | GDDR6 (384-bit, ×12 chips) | HBM3 (4-stack) |
|---|---|---|
| Peak bandwidth | ~936 GB/s | ~3,276 GB/s |
| Interface width | 384-bit | 4,096-bit |
| Capacity (typical) | 24 GB | 96 GB |
| Package form factor | Discrete DRAM packages on PCB | Stacked die on silicon interposer |
| Power efficiency | Moderate | High GB/s per watt |
High Bandwidth Memory (HBM) achieves its bandwidth advantage by stacking DRAM dies vertically and connecting them through silicon interposers using through-silicon vias (TSVs), placing memory physically adjacent to the GPU die rather than across a long PCB trace.
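The table's HBM3 figure follows from the same stack arithmetic: each stack contributes a 1,024-bit interface. The 6.4 Gbps per-pin rate below is the assumed rate that reproduces the table entry.

```python
# HBM bandwidth scales with stack count: each stack adds a 1,024-bit interface.
# A 6.4 Gbps per-pin rate (assumed) reproduces the ~3,276 GB/s table entry.

def hbm_bandwidth_gbs(stacks: int, bits_per_stack: int,
                      pin_rate_gbps: float) -> float:
    return stacks * bits_per_stack / 8 * pin_rate_gbps

print(hbm_bandwidth_gbs(4, 1024, 6.4))  # 3276.8 GB/s for a 4-stack configuration
```

Compare this with a 384-bit GDDR interface: HBM reaches its bandwidth through width (4,096 data lines) at moderate per-pin rates, rather than pushing each pin faster.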
Common scenarios
GPU memory architecture requirements differ substantially by workload class. Three operational categories define most professional and research deployment decisions:
Consumer gaming: Workloads are primarily rasterization and ray tracing with texture-heavy scenes. A 16 GB GDDR6X memory pool is sufficient for 4K rendering at current asset scales. Memory access patterns are spatially coherent, favoring the cache-friendly tiling structures in GDDR architectures.
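For a sense of scale, a single render target is small relative to a 16 GB pool; it is texture pools and multiple G-buffer targets that dominate real budgets. A minimal sketch, assuming one uncompressed RGBA8 (4 bytes per pixel) color target at 4K:

```python
# Size of one uncompressed 4K render target. Real frames carry multiple
# targets (G-buffer, depth, post-processing) plus much larger texture pools.

def render_target_mib(width: int, height: int, bytes_per_pixel: int) -> float:
    return width * height * bytes_per_pixel / 2**20

print(round(render_target_mib(3840, 2160, 4), 1))  # 31.6 MiB for 4K RGBA8
```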
Professional visualization and CAD: Applications such as Autodesk Maya rendering and ANSYS simulation require VRAM capacity above raw bandwidth because full scene geometry must reside in VRAM throughout the session. NVIDIA Quadro RTX and AMD Radeon Pro lines address this with ECC-enabled VRAM pools; ECC memory error correction is a mandatory qualification criterion for many enterprise workloads involving unrecoverable computation.
AI and machine learning training: Large language model (LLM) training is the most memory-intensive scenario in the current GPU market. A GPT-class model with 70 billion parameters requires approximately 140 GB of VRAM at FP16 precision to hold its weights alone, necessitating multi-GPU configurations with NVLink interconnects or tensor parallelism across nodes. These memory demands increasingly drive data center GPU procurement decisions, with the NVIDIA H100's 80 GB HBM3 subsystem representing the current production ceiling for single-card capacity in that class.
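The 140 GB figure follows from simple arithmetic: parameter count times bytes per parameter, counting weights only. Optimizer state, gradients, and activations add substantially more during training.

```python
# VRAM needed to hold model weights alone: parameter count x bytes per
# parameter. One billion parameters at N bytes each occupy N GB.
# Training adds optimizer state, gradients, and activations on top of this.

def weight_memory_gb(params_billion: float, bytes_per_param: int) -> float:
    return params_billion * bytes_per_param

print(weight_memory_gb(70, 2))  # 140 GB at FP16 (2 bytes per parameter)
print(weight_memory_gb(70, 1))  # 70 GB at INT8/FP8 (1 byte per parameter)
```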
Decision boundaries
Selecting a GPU memory architecture involves resolving tradeoffs across four measurable dimensions: bandwidth ceiling, capacity ceiling, error tolerance, and thermal/power budget.
Bandwidth vs. capacity: GDDR6/GDDR6X maximizes bandwidth per dollar on consumer platforms but is constrained to approximately 24 GB per card on current 384-bit interfaces. HBM3 delivers 3× to 4× the bandwidth of GDDR6 at scale but adds $10,000 or more to GPU unit cost (reflected in NVIDIA H100 SXM list pricing published by NVIDIA's OEM partners). For workloads that are bandwidth-saturated rather than capacity-constrained — such as real-time inference on smaller models — GDDR6X remains competitive.
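This boundary can be expressed as a rough triage rule. The sketch below is hedged: the thresholds are the approximate round numbers quoted on this page (24 GB and ~1,000 GB/s for current 384-bit GDDR6X), not vendor-published limits, and real procurement weighs cost, ECC, and power as well.

```python
# Rough capacity/bandwidth triage using the ceilings discussed on this page.
# Thresholds are this page's approximate figures, not vendor limits.

def suggest_memory_type(capacity_gb: float, bandwidth_gbs: float) -> str:
    if capacity_gb <= 24 and bandwidth_gbs <= 1008:
        return "GDDR6/GDDR6X"  # within current 384-bit consumer ceilings
    return "HBM3"              # capacity- or bandwidth-bound beyond GDDR

print(suggest_memory_type(16, 700))   # GDDR6/GDDR6X -- e.g. small-model inference
print(suggest_memory_type(80, 3000))  # HBM3 -- e.g. LLM training
```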
ECC requirements: Enterprise, scientific, and financial computing workloads typically mandate ECC-protected VRAM to detect and correct single-bit errors without aborting computation. Consumer GDDR6 implementations generally omit ECC or offer it as a capacity-reducing option. Professional GPU lines implement full ECC across the VRAM pool by default.
Unified memory architecture considerations: Apple Silicon and AMD APU platforms implement unified memory, where GPU and CPU share a single physical pool. This eliminates PCIe transfer overhead but couples GPU memory capacity to the total system RAM ceiling, which constrains large-model workloads.
Thermal design: GDDR6X operating at full PAM4 data rates generates substantially more heat per chip than GDDR6. Cards such as the RTX 4090 carry a 450-watt total board power (TBP) rating, a figure published by NVIDIA in its GPU specifications. HBM stacks run cooler per unit of bandwidth delivered but require liquid cooling or advanced vapor chamber solutions when integrated in high-density data center configurations.
For professionals navigating the full memory systems landscape — from DRAM fundamentals through storage-class and persistent tiers — GPU VRAM represents the highest-bandwidth, lowest-latency tier in the external memory hierarchy, purpose-built for parallel throughput rather than general-purpose flexibility.
References
- JEDEC JESD250 — GDDR6 Standard
- JEDEC — High Bandwidth Memory (HBM) Standard (JESD235)
- NVIDIA H100 Tensor Core GPU Architecture Whitepaper
- NVIDIA GeForce RTX 4090 Product Specifications
- AMD RDNA 3 Architecture Whitepaper
- IEEE Xplore — GPU Memory Architecture Survey
- JEDEC — LPDDR and GDDR Memory Technology Overview