Calculate VRAM for GGUF models from GPU layers and context length using an accurate formula.
For an explanation of how this works, see this blog post: https://oobabooga.github.io/blog/posts/gguf-vram-formula/
Number of model layers to offload to the GPU (--gpu-layers in llama.cpp).
--gpu-layers
Context length in tokens (--ctx-size in llama.cpp).
--ctx-size
KV cache quantization type.
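The parameters above can be combined into a rough VRAM estimate. The sketch below is a simple first-order approximation, not the fitted formula from the linked blog post: it assumes weight VRAM scales linearly with the fraction of offloaded layers, and computes the KV cache size from context length and attention shape. All function and parameter names are illustrative.

```python
def estimate_vram_gb(
    model_file_size_gb: float,          # size of the GGUF file on disk
    n_layers_total: int,                # total transformer layers in the model
    n_gpu_layers: int,                  # layers offloaded to the GPU (--gpu-layers)
    ctx_size: int,                      # context length in tokens (--ctx-size)
    n_kv_heads: int,                    # number of key/value heads
    head_dim: int,                      # dimension per attention head
    cache_bytes_per_elem: float = 2.0,  # 2.0 for an fp16 cache; ~1.0 for q8_0
) -> float:
    """Rough VRAM estimate in GiB for a partially offloaded GGUF model.

    Assumption: weight memory is proportional to the fraction of layers
    on the GPU; the KV cache is stored only for offloaded layers.
    """
    # Weights offloaded to the GPU.
    weights_gb = model_file_size_gb * (n_gpu_layers / n_layers_total)
    # KV cache: keys and values for each offloaded layer.
    kv_bytes = (
        2  # K and V tensors
        * n_gpu_layers
        * ctx_size
        * n_kv_heads
        * head_dim
        * cache_bytes_per_elem
    )
    return weights_gb + kv_bytes / 1024**3


# Hypothetical example: a ~4 GB GGUF with 32 layers, all offloaded,
# 8192-token context, 8 KV heads of dimension 128, fp16 cache.
print(round(estimate_vram_gb(4.1, 32, 32, 8192, 8, 128), 2))
```

This ignores per-backend overheads (compute buffers, CUDA context, fragmentation), which is why the blog post fits a formula to measurements instead.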