Do you run a local LLM? What problems are you solving?

x509 · Apr 11, 2026

I hope this is the right forum for this question. The main question is in the thread topic. I've had some success using Claude AI and ChatGPT, but I don't want to prompt for answers to really private topics.

Some additional questions:

Which local LLMs? What kind of hardware?

How do results compare with results from Claude AI or ChatGPT?

Do you need to train the model on your local data?

What else?

DreadKing · Apr 14, 2026

x509 said:
...

I run a simple local AI stack on Windows 11. For the interface, I run Open WebUI inside Docker, which connects to a model of my choice via Ollama.

To extend the model's capabilities, I write custom Python code that enables the model to directly manage tasks and files on my NAS within my local network.
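The actual code isn't shown in this post, so the following is only a hypothetical sketch of what such a helper might look like: a function a model could be allowed to call to list files on a share, with a check that stops it from wandering outside that share. The name `list_files` and its parameters are made up for illustration.

```python
from pathlib import Path


def list_files(root: str, subdir: str = ".") -> list[str]:
    """List entries under root/subdir, refusing paths that escape root.

    Hypothetical helper: `root` would be the mounted NAS share.
    """
    base = Path(root).resolve()
    target = (base / subdir).resolve()
    # Reject anything like "../.." that resolves outside the share.
    if target != base and base not in target.parents:
        raise ValueError(f"{subdir!r} is outside the allowed share")
    return sorted(p.name for p in target.iterdir())
```

The path check matters more than the listing itself: a model that can call file functions should only ever see the folder you hand it.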

Additionally, I use Stability Matrix to handle all my image and video generation locally.

[Q] Which local LLMs?
[A] It really comes down to personal preference and your specific hardware.
You can pull a huge variety of models from Ollama and easily switch between open-weight families from OpenAI (gpt-oss), DeepSeek, Google (Gemma), and others.
When choosing, you'll see different parameter counts (like 8B, 14B, or 70B). In simple terms, parameters are the internal variables the model learned during training; the higher the number, the 'smarter' and more nuanced the model generally is. However, larger models require more hardware:
8B models: Usually require ~8GB of VRAM
14B - 32B models: Require ~12GB–24GB of VRAM
70B+ models: Require multiple GPUs/Workstation.
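
As a rough illustration of where figures like these come from: the memory for the weights is approximately the parameter count times the bytes per parameter, plus some headroom for the context cache and runtime buffers. The sketch below assumes 4-bit quantization and a flat 20% overhead. Both numbers are assumptions for illustration, not values reported by Ollama.

```python
def estimate_vram_gb(params_billion: float, bits_per_param: float = 4.0,
                     overhead: float = 0.20) -> float:
    """Rough memory estimate in GB for a quantized model."""
    # 1 billion parameters at 8 bits each is about 1 GB of weights.
    weights_gb = params_billion * bits_per_param / 8
    return round(weights_gb * (1 + overhead), 1)


for size in (8, 14, 32, 70):
    print(f"{size}B @ 4-bit: ~{estimate_vram_gb(size)} GB")
```

Under these assumptions an 8B model lands near 5 GB and a 70B model near 42 GB; longer contexts or higher-precision quantization push the numbers up.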

[Q] What kind of hardware?
[A] Mine is:
Acer Predator Helios Neo 16
Intel Core Ultra 9 275HX (24-Core)
NVIDIA GeForce RTX 5070 Ti (12GB)
RAM 32GB DDR5-6400

[Q] How do results compare with results from Claude AI or ChatGPT?
[A] I use neither.

[Q] Do you need to train the model on your local data?
[A] Do YOU want to train the model on your data? If yes, then you end up with an agent that is specialized in that data.

x509 · Apr 14, 2026

DreadKing said:
I run a simple local AI stack on Windows 11. For the interface, I run Open WebUI inside Docker, which connects to a model of my choice via Ollama.

Thank you, and a big welcome to this forum. Lots of smart, experienced people to learn from, and a place to add your own contributions to the group's overall knowledge. My go-to forum for Windows.

I am a complete noob in this area. I hope you don't mind some noob-noob questions.

Is your configuration necessary or your preference? If a preference, what are your reasons?

DreadKing said:
When choosing, you'll see different parameter counts (like 8B, 14B, or 70B). In simple terms, parameters are the internal variables the model learned during training; the higher the number, the 'smarter' and more nuanced the model generally is. However, larger models require more hardware:

Thanks for this explanation. Very clear. I appreciate that, since I have a 16GB GPU, I can run models of 14B and above. What if a specific model requires, say, 20 or 24 GB? Does it run at all?

DreadKing said:
8B models: Usually require ~8GB of VRAM
14B - 32B models: Require ~12GB–24GB of VRAM
70B+ models: Require multiple GPUs/Workstation.

RAM 32GB DDR5-6400

I have 64 GB of DDR5-5200. I wish that I could afford to replace that with much faster RAM, but my rich uncle is relatively young and in excellent health.

DreadKing said:
[A] Do YOU want to train the model on your data? If yes, then you end up with an agent that is specialized in that data.

Again, as a complete noob, do I need to create that agent?

DreadKing · Apr 14, 2026

Glad to be here! Those aren't "noob" questions at all. Here is the breakdown:

1. Is this configuration necessary or a preference?

It’s a mix of both. Running Open WebUI inside Docker is a preference for stability; it keeps the AI environment isolated so it doesn't mess with my Windows system files. However, using Ollama is a functional necessity because it’s the most efficient way to handle model switching.
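
To make the model-switching point concrete: Ollama serves a local HTTP API (port 11434 by default), and the model is just one field in the request. This is a minimal sketch using only the Python standard library. The helper names `build_request` and `ask` are made up for illustration, and it assumes Ollama is running locally with the named model already pulled.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint


def build_request(model: str, prompt: str) -> urllib.request.Request:
    """Build a non-streaming generate request; only `model` changes when switching."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(model: str, prompt: str) -> str:
    """Send the prompt to a running Ollama server and return the reply text."""
    with urllib.request.urlopen(build_request(model, prompt)) as resp:
        return json.loads(resp.read())["response"]
```

Calling `ask("llama3.1:8b", "Say hello")` and then `ask("qwen2.5:14b", "Say hello")` would send the same prompt to two different models. Those tags are only examples; use whatever `ollama list` shows on your machine.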

2. What happens if a model needs more VRAM than you have? (The 16GB vs 24GB question) Great question. Yes, it will still run, but it won’t be fast.

The "Split" (Offloading): Ollama is smart. If a model needs 20GB and you only have 16GB of VRAM (Video RAM), it will keep as much of the model as fits on your GPU and "offload" the remaining 4GB or so to your system RAM, where the CPU handles it.
The Result: Because system RAM (even your DDR5) is much slower than GPU VRAM, the model’s "tokens per second" (typing speed) will drop significantly. It goes from "instant" to "reading speed" or slower.
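
A back-of-the-envelope way to see why the speed drops: generating each token means reading roughly all of the model's weights once, so the time per token is dominated by memory bandwidth. The sketch below assumes 600 GB/s for the GPU and 80 GB/s for dual-channel DDR5. Both bandwidth figures are illustrative assumptions, not measurements, and the result is an optimistic upper bound.

```python
def split_model(model_gb: float, vram_gb: float) -> tuple[float, float]:
    """Return (GB kept on the GPU, GB offloaded to system RAM)."""
    on_gpu = min(model_gb, vram_gb)
    return on_gpu, model_gb - on_gpu


def tokens_per_second(model_gb: float, vram_gb: float,
                      gpu_gbps: float = 600.0, ram_gbps: float = 80.0) -> float:
    """Upper-bound estimate, assuming each token reads every weight once."""
    on_gpu, in_ram = split_model(model_gb, vram_gb)
    seconds_per_token = on_gpu / gpu_gbps + in_ram / ram_gbps
    return round(1 / seconds_per_token, 1)


print(split_model(20, 16))        # (16, 4)
print(tokens_per_second(16, 16))  # model fits entirely in VRAM
print(tokens_per_second(20, 16))  # 4 GB spilled to system RAM
```

With these assumed bandwidths, the 4 GB sitting in system RAM takes about twice as long to read as the 16 GB on the GPU, which is why a small spill costs so much speed.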

3. Do you need to create an agent for your data? Short answer: No.
You don't need to. If you just want to talk to an AI about general topics, the base models from Ollama are fine.

The "Agent" approach: I created an agent because I wanted the AI to have a "job"—specifically managing my local network files.
The "Easy" way: If you just want the AI to know about your personal PDFs or documents without coding, Open WebUI (which I use) has a built-in feature called RAG (Retrieval-Augmented Generation). You just upload your files to the chat, and the AI "reads" them. It’s much easier than training a model from scratch!
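
For a feel of what RAG does under the hood, here is a toy sketch: split your documents into chunks, find the chunk most similar to the question, and paste it into the prompt. Real tools typically use an embedding model for the similarity step; this version swaps in plain word-overlap (cosine similarity on word counts) so it runs with no dependencies. It is not how Open WebUI implements the feature, and the sample documents are made up.

```python
import math
import re
from collections import Counter


def vectorize(text: str) -> Counter:
    """Bag-of-words vector: lowercase word counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def retrieve(question: str, chunks: list[str], k: int = 1) -> list[str]:
    """Return the k chunks most similar to the question."""
    q = vectorize(question)
    return sorted(chunks, key=lambda c: cosine(q, vectorize(c)), reverse=True)[:k]


def build_prompt(question: str, chunks: list[str]) -> str:
    """Prepend the best-matching chunk(s) to the question."""
    context = "\n".join(retrieve(question, chunks))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


docs = [
    "The NAS backup job runs every night at 02:00.",
    "The router admin password is stored in the password manager.",
    "Printer toner was last replaced in March.",
]
print(build_prompt("When does the NAS backup run?", docs))
```

The model never gets retrained; it just receives the relevant snippet alongside your question.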

Your 64GB of RAM is actually a huge advantage here. Even if your RAM is "slower" than mine, the sheer capacity means you can run much larger models (like a 70B model) by offloading them to that 64GB pool. It’ll be slower than a workstation, but it will work!

If you ever need an agent...
Since you're using AI, you can provide it with this prompt: 'Provide a step-by-step guide to installing Docker, Open WebUI, and Ollama models on Windows 11.'

x509 · Apr 14, 2026

DreadKing said:
Glad to be here! Those aren't "noob" questions at all. Here is the breakdown:

Thanks, but compared to my overall knowledge of Windows, major applications like MS Office, Adobe Lightroom, plus a raft of utilities, I am still a noob about AI. But I want to learn more, because AI is clearly an important technology for the future and I'm never too old to learn (if a bit old for other things).

z3r010 · Monday at 2:20 PM

I'm burning through credits so fast these days that I'm investigating whether a local LLM may be the better option. @x509, did you get any further down this rabbit hole?

x509 · Monday at 8:36 PM

z3r010 said:
I'm burning through credits so fast these days that I'm investigating whether a local LLM may be the better option. @x509, did you get any further down this rabbit hole?

I have a $20/month subscription to Claude AI, which I use mainly for writing PowerShell scripts, since I know nada about PowerShell. I also use it for fixing Excel spreadsheet errors. No other real uses.

z3r010 · Tuesday at 1:29 AM

I’ve been taking on projects for sites and servers far beyond what I would have considered before. I used to have a GitHub Copilot subscription on VS Code, but with the recent pricing change, my estimated cost would have jumped from the £40 of my old subscription to over £150 a month.

I switched to a $60 Cursor subscription, which seemed like the best value for my needs, but after just two days, I’d already used over 10% of my monthly credits. Yesterday, I ended up doing half my work with Google Antigravity on the £18.99 Google AI plan, so I’m starting to think that getting an Nvidia card and using Qwen or DeepSeek locally might be a better long-term option.

Steve C · Tuesday at 2:09 AM

z3r010 said:
I'm burning through credits so fast these days that I'm investigating whether a local LLM may be the better option. @x509, did you get any further down this rabbit hole?

I use Copilot and Gemini and have no usage limits, unlike ChatGPT.

z3r010 · Tuesday at 2:27 AM

Steve C said:
I use Copilot and Gemini and have no usage limits, unlike ChatGPT.

Using it for projects in something like VS Code or Antigravity is a completely different experience from chatting with it on an unlimited web interface, and it comes with high costs and limits.

Steve C · Tuesday at 3:36 AM

x509 said:
I hope this is the right forum for this question. The main question is in the thread topic. I've had some success using Claude AI and ChatGPT, but I don't want to prompt for answers to really private topics.

Some additional questions:

Which local LLMs? What kind of hardware?

How do results compare with results from Claude AI or ChatGPT?

Do you need to train the model on your local data?

What else?

I mainly use Copilot augmented with Gemini & ChatGPT - all free versions. You soon hit the daily use limit for the free version of ChatGPT. Use them all with extreme caution since they can all make up facts and be confidently wrong! This is why I call them Artificial Idiots! If you have time to waste you can cross-reference their answers and watch them argue with each other :lmao:

Alejandro85 · Tuesday at 9:59 AM

z3r010 said:
I'm burning through credits so fast these days that I'm investigating whether a local LLM may be the better option. @x509, did you get any further down this rabbit hole?

Consider that running a local AI would cost you a lot of money in terms of hardware and then power to run it (AI is very compute intensive). Unless you have very cheap electricity available, running a local one for similar usage would likely end up costing more in the long run.
ChatGPT credits are still very cheap, considering that OpenAI absorbs all that cost and is running at an immense loss, at least until the AI bubble bursts.

z3r010 · Tuesday at 11:14 AM

Alejandro85 said:
ChatGPT credits are still very cheap, considering that OpenAI absorbs all that cost and is running at an immense loss, at least until the AI bubble bursts.

I could easily blow through the $100 monthly allowance in a week or less if I went ahead with some of the things I’d like to do, especially if the credits were free like on a self-hosted setup.

Pricing – Codex | OpenAI Developers (developers.openai.com): "Codex is included in your ChatGPT Free, Go, Plus, Pro, Business, Edu, or Enterprise plan"

I ran Qwen3.5 this morning on my current system and it was okay-ish, hitting around 44 tokens/s while testing some of my usual tasks. If I swapped the GPU and PSU for an RTX 5090 and a 1200W PSU, it would definitely improve performance massively, and at under £3.5k, it could be cost-effective in the long run.
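
Whether that is cost-effective comes down to simple arithmetic: how many months of avoided subscription fees it takes to pay off the hardware, after subtracting electricity. In the sketch below the £3,500 hardware cost and the £150/month subscription estimate come from this thread; the 600 W draw, 4 hours a day of load, and £0.30 per kWh are placeholder assumptions to swap for your own numbers.

```python
def breakeven_months(hardware_cost: float, monthly_subscription: float,
                     watts: float, hours_per_day: float,
                     price_per_kwh: float) -> float:
    """Months until the hardware cost is recovered by avoided subscription fees."""
    # Electricity for a 30-day month at the given load.
    monthly_power = watts / 1000 * hours_per_day * 30 * price_per_kwh
    monthly_saving = monthly_subscription - monthly_power
    if monthly_saving <= 0:
        return float("inf")  # local never pays for itself at these rates
    return round(hardware_cost / monthly_saving, 1)


# Thread figures for cost and subscription; power figures are assumptions.
print(breakeven_months(3500, 150, watts=600, hours_per_day=4, price_per_kwh=0.30))
```

Under those assumptions the payback is a bit over two years against a £150/month bill. Against a cheap subscription the electricity alone can exceed the fee, and the function returns infinity.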
