Running even larger LLMs on small hardware

In a previous post [GM4] we ran Gemma 4 12B on a 16 GB RAM budget; today we'll push the envelope and run deepseek-r1:14b-qwen-distill-q4_K_M on the same budget. Last time we got away with fiddling with the model definition; today we'll also have to reconfigure the OS itself. While … Continue reading Running even larger LLMs on small hardware →

Running large LLMs on small hardware: Gemma 4 12B on a VRAM-constrained Radeon laptop

Google released Gemma 4 12B today. I'm a huge fan of the Gemma model family: the models have improved with each iteration and consistently perform on par with larger models. It didn't run at first because it needs more VRAM than my laptop has, but there's a workaround. Here are short instructions for how to run … Continue reading Running large LLMs on small hardware: Gemma 4 12B on a VRAM-constrained Radeon laptop →

Docker, Ollama, Ubuntu & Radeon GPU

Just a quickie: this is the command I'm using on my Acer Nitro laptop to run Ollama in Docker with GPU acceleration:

    group_id_video=$(getent group video | cut -d: -f3)
    group_id_render=$(getent group render | cut -d: -f3)
    docker run -d \
      --privileged \
      --device /dev/kfd \
      --device /dev/dri \
      --volume ollama:/root/.ollama \
      --volume "/some/path/ollama:/images" \
      --group-add …

Continue reading Docker, Ollama, Ubuntu & Radeon GPU →