NVIDIA GPU Query Engine reference architecture accelerates database queries 7.5x over Central Processing Unit

Token & cost optimization

July 1, 2026 4 min read

Curated by Oleksandr Kuzmenko, AI Product EngineerUpdated July 1, 2026Sources cited on every story

AI-assisted · editor-reviewedHow we use AI

NVIDIA GPU Query Engine reference architecture accelerates database queries 7.5x over Central Processing Unit

NVIDIA has detailed GQE, a reference architecture for running high-throughput SQL queries natively on GPUs. By leveraging NVLink-C2C and nvCOMP decompression, GQE delivers up to 25.5x speedups on analytical query workloads.

Impact: Medium

Why it matters

GQE addresses major memory and I/O bandwidth bottlenecks, allowing databases to perform high-throughput query execution directly on GPU hardware without exhausting SM resources.

TL;DR

01GQE accelerates SQL queries through three layers: query, data, and execution.
02It utilizes Blackwell's Decompression Engine for zero-SM LZ77 decompression.
03Delivers a 7.5x aggregate speedup on benchmarks, reaching up to 25.5x for individual queries.

Key facts

Average Speedup: 7.5x over CPU databases
Peak Query Speedup: 25.5x on TPC-H SF1000
Key Software Libraries: cuDF, CCCL, nvCOMP, nvSHMEM
Hardware Interconnect: NVIDIA NVLink-C2C

The GQE Architectural Layers

GQE transitions SQL queries to hardware-level execution through three primary layers: 1. Query Layer: Parses SQL into logical plans using Substrait, making it compatible with engines like Apache DataFusion, then compiles them into optimized physical plans. 2. Data Layer: Organizes CPU-side memory into non-contiguous column partitions grouped into row groups. It orchestrates asynchronous chunked transfers (cudaMemcpyBatchAsync) to GPU device memory on-demand. 3. Execution Layer: Schedules relational operator tasks across concurrent CUDA streams, overlapping data movement with computation.

Advanced GPU Compression and Decompression

To expand effective memory capacity, GQE integrates the nvCOMP library. On the Blackwell architecture, the hardware-level Blackwell Decompression Engine (DE) can decompress standard LZ77-based formats (such as LZ4, Snappy, and Deflate) at ultra-high throughput without consuming any streaming multiprocessor (SM) resources. Overall, GQE's optimizations deliver an aggregate 7.5x speedup over CPU databases, with individual query gains reaching up to 25.5x.

✓ When to use

When planning large-scale SQL query acceleration on high-bandwidth systems like NVIDIA GB200 NVL4.

#GQE#nvCOMP#cuDF#CCCL#nvSHMEM#Blackwell Decompression Engine

ShareShare on X Share on LinkedIn

Token & cost optimization

July 1, 2026 4 min read

Curated by Oleksandr Kuzmenko, AI Product EngineerUpdated July 1, 2026Sources cited on every story

AI-assisted · editor-reviewedHow we use AI

Impact: Medium

Why it matters

GQE addresses major memory and I/O bandwidth bottlenecks, allowing databases to perform high-throughput query execution directly on GPU hardware without exhausting SM resources.

TL;DR

01GQE accelerates SQL queries through three layers: query, data, and execution.
02It utilizes Blackwell's Decompression Engine for zero-SM LZ77 decompression.
03Delivers a 7.5x aggregate speedup on benchmarks, reaching up to 25.5x for individual queries.

Key facts

Average Speedup: 7.5x over CPU databases
Peak Query Speedup: 25.5x on TPC-H SF1000
Key Software Libraries: cuDF, CCCL, nvCOMP, nvSHMEM
Hardware Interconnect: NVIDIA NVLink-C2C

The GQE Architectural Layers

Advanced GPU Compression and Decompression

✓ When to use

When planning large-scale SQL query acceleration on high-bandwidth systems like NVIDIA GB200 NVL4.

#GQE#nvCOMP#cuDF#CCCL#nvSHMEM#Blackwell Decompression Engine

ShareShare on X Share on LinkedIn

NVIDIA GPU Query Engine reference architecture accelerates database queries 7.5x over Central Processing Unit

The GQE Architectural Layers

Advanced GPU Compression and Decompression

Related stories

Get the morning AI brief

NVIDIA GPU Query Engine reference architecture accelerates database queries 7.5x over Central Processing Unit

The GQE Architectural Layers

Advanced GPU Compression and Decompression

Related stories

Get the morning AI brief