Calculate VRAM for GGUF models from GPU layers and context length using an accurate formula.
For an explanation of how this works, see this blog post: https://oobabooga.github.io/blog/posts/gguf-vram-formula/
Number of model layers to offload to the GPU (--gpu-layers in llama.cpp).
--gpu-layers
Context length in tokens (--ctx-size in llama.cpp).
--ctx-size
KV cache quantization type.
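The parameters above can be combined into a rough VRAM estimate. The sketch below is a simple first-order approximation, not the fitted formula from the linked blog post: it assumes weight VRAM scales linearly with the fraction of offloaded layers, and computes the KV cache size from context length and attention shape. All function and parameter names are illustrative.

```python
def estimate_vram_gb(
    model_file_size_gb: float,          # size of the GGUF file on disk
    n_layers_total: int,                # total transformer layers in the model
    n_gpu_layers: int,                  # layers offloaded to the GPU (--gpu-layers)
    ctx_size: int,                      # context length in tokens (--ctx-size)
    n_kv_heads: int,                    # number of key/value heads
    head_dim: int,                      # dimension per attention head
    cache_bytes_per_elem: float = 2.0,  # 2.0 for an fp16 cache; ~1.0 for q8_0
) -> float:
    """Rough VRAM estimate in GiB for a partially offloaded GGUF model.

    Assumption: weight memory is proportional to the fraction of layers
    on the GPU; the KV cache is stored only for offloaded layers.
    """
    # Weights offloaded to the GPU.
    weights_gb = model_file_size_gb * (n_gpu_layers / n_layers_total)
    # KV cache: keys and values for each offloaded layer.
    kv_bytes = (
        2  # K and V tensors
        * n_gpu_layers
        * ctx_size
        * n_kv_heads
        * head_dim
        * cache_bytes_per_elem
    )
    return weights_gb + kv_bytes / 1024**3


# Hypothetical example: a ~4 GB GGUF with 32 layers, all offloaded,
# 8192-token context, 8 KV heads of dimension 128, fp16 cache.
print(round(estimate_vram_gb(4.1, 32, 32, 8192, 8, 128), 2))
```

This ignores per-backend overheads (compute buffers, CUDA context, fragmentation), which is why the blog post fits a formula to measurements instead.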