-
Run a 35B AI Model on Mac Mini 16GB + Live Model Swap Guide (2026)
https://thoughts.jock.pl/p/local-llm-35b-mac-mini-gemma-swap-production-2026?r=nr6w1&triedRedirect=true
Justin sent me this. It points out a substantial difference between Ollama and llama.cpp. So what does --mmap do? Instead of loading the entire model file into RAM (which is what Ollama tried, and why it choked), llama.cpp memory-maps the file. The OS treats the model like a virtual address space backed by your SSD, so pages are read from disk only when they are actually touched. Also, Gemma 4 sounds fantastic.
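To make the memory-mapping idea concrete for myself, here is a minimal Python sketch of the general OS mechanism. It is not llama.cpp's code: the helper names (read_slice_mmap, make_dummy_file) and the 1 MiB dummy file standing in for a model file are my own assumptions, purely for illustration.

```python
import mmap
import os
import tempfile


def make_dummy_file(size):
    """Create a throwaway file of `size` bytes (hypothetical stand-in for a model file)."""
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as f:
        # Repeating 0..255 pattern so the content at any offset is predictable.
        f.write(bytes(i % 256 for i in range(size)))
    return path


def read_slice_mmap(path, offset, length):
    """Return `length` bytes starting at `offset` via a read-only memory map.

    The file is mapped into the process address space rather than read
    in full; the OS pages in only the regions that are accessed.
    """
    with open(path, "rb") as f:
        # Length 0 maps the whole file; ACCESS_READ makes the mapping read-only.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            return mm[offset:offset + length]


path = make_dummy_file(1 << 20)          # 1 MiB dummy file
chunk = read_slice_mmap(path, 4096, 8)   # touch only a small region of it
os.remove(path)
```

Slicing the map copies out only the requested bytes. With a file larger than RAM, the same pattern still works because the OS can evict untouched or clean pages and re-read them from the SSD later, which is presumably the effect the article relies on.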
-
Coding Agent with a Self-Hosted LLM using OpenCode and vLLM - YouTube
https://www.youtube.com/watch?v=0uZpuZQi7Zs&t=48s
In this video, we build a fully self-hosted coding agent powered by the 7B parameter Qwen 2.5 Coder model, running on a GPU instance in Lambda Cloud and serv...