I can't speak to using local models as agentic coding assistants, but I have a headless 128GB RAM machine serving llama.cpp with a number of local models that I use on a daily basis.
- Qwen3-VL picks up new images on a NAS, auto-captions them, and embeds the text descriptions in the image's EXIF metadata, which is used for fast search and organization in conjunction with a Qdrant vector database.
- Gemma3:27b is used for personal translation work (mostly English and Chinese).
- Llama3.1 spins up for sentiment analysis on text.
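A minimal sketch of the captioning step in that first workflow, assuming a llama.cpp server exposing its OpenAI-compatible API on localhost with a vision model loaded (the URL and file names here are placeholders; the EXIF write and Qdrant upsert are only noted in comments to keep it short):

```python
import base64
import json
import urllib.request

# Assumed endpoint: llama.cpp started with its built-in HTTP server,
# which serves an OpenAI-compatible chat completions API.
LLAMA_URL = "http://localhost:8080/v1/chat/completions"

def caption_payload(b64_image: str,
                    prompt: str = "Describe this image concisely.") -> dict:
    """Build the chat request body for a single-image captioning call."""
    return {
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{b64_image}"}},
            ],
        }],
    }

def caption_image(path: str) -> str:
    """Send one image to the llama.cpp server and return the caption text."""
    with open(path, "rb") as f:
        b64 = base64.b64encode(f.read()).decode()
    req = urllib.request.Request(
        LLAMA_URL,
        data=json.dumps(caption_payload(b64)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]

if __name__ == "__main__":
    caption = caption_image("photo.jpg")  # placeholder file name
    # Writing the caption into EXIF (e.g. with piexif) and upserting an
    # embedding into Qdrant would follow here; omitted from this sketch.
    print(caption)
```

The same server can also serve the embeddings endpoint, so the Qdrant half of the pipeline can reuse the one llama.cpp instance.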
Ah yeah, true, self-contained tasks like these are ideal. I'm mostly using it for coding, running a personal assistant, or doing research, where open-weights models aren't as strong yet.
Understood. Research would make me especially leery; I'd be afraid of losing any potential gains because I'd feel compelled to go and validate every claim (though I suppose you could mitigate that somewhat with search-engine tooling like Kagi's MCP server).