Do you run a local LLM? What problems are you solving?


x509

Location
Western USA
OS
Windows 11 2H25
I hope this is the right forum for this question. The main question is in the thread topic. I have had some success using Claude AI and ChatGPT, but I don't want to ask them about really private topics.

Some additional questions:

Which local LLMs? What kind of hardware?

How do results compare with results from Claude AI or ChatGPT?

Do you need to train the model on your local data?

What else?
 
Windows Build/Version
26200.8117

My Computer

System One

  • OS
    Windows 11 2H25
    Computer type
    PC/Desktop
    Manufacturer/Model
    DIY
    CPU
    AMD 9900X
    Motherboard
    MSI X870E Carbon
    Memory
    64 GB
    Graphics Card(s)
    AMD 9070 XT
    Sound Card
    built-in
    Monitor(s) Displays
    Dell 24"
    Hard Drives
    Sabrent 1 TB NVMe, 4 x SSD (need to check models), 4 x 3.5" HDD, 8-16 TB, all WD
    PSU
    Seasonic 850
    Case
    Fractal Design North XL (which I like)
    Cooling
    Corsair AIO for CPU, fans for case
    Keyboard
    Das Keyboard 4
    Mouse
    Corsair M65 (white)
    Internet Speed
    1 TB download
    Browser
    Firefox
    Antivirus
    Bitdefender
    Other Info
    Also have Lenovo T14S laptop (me) and Lenovo Slim 71 (wife)
A simple local AI stack on Windows 11. For the interface, I run Open WebUI inside Docker, which connects to a model of my choice served by Ollama.
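For anyone wondering what "connects to a model served by Ollama" means in practice: Ollama runs a local HTTP server (by default on port 11434), and front-ends such as Open WebUI talk to that API. The minimal sketch below sends one prompt to Ollama's `/api/generate` endpoint using only the Python standard library. It assumes Ollama is installed and running and that a model has already been pulled; the model name `llama3.1:8b` is just an example.

```python
import json
import urllib.request

# Ollama's default local endpoint (assumes a standard install on this machine).
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(model: str, prompt: str) -> urllib.request.Request:
    """Build a non-streaming POST request for Ollama's /api/generate endpoint."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(model: str, prompt: str) -> str:
    """Send the prompt to the local model and return its text reply."""
    with urllib.request.urlopen(build_request(model, prompt), timeout=300) as resp:
        body = json.load(resp)
    # With "stream": False, Ollama returns a single JSON object; the text is in "response".
    return body["response"]
```

With Ollama running, `ask("llama3.1:8b", "In one sentence, what is a local LLM?")` should return the reply as a plain string. Nothing leaves your machine, which is the privacy point of the whole setup.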

To extend the model's capabilities, I write custom Python code that enables the model to directly manage tasks and files on my NAS within my local network.

Additionally, I use Stability Matrix to handle all my image and video generation locally.

[Q] Which local LLMs?
[A] It really comes down to personal preference and your specific hardware.
You can pull a huge variety of models from Ollama and easily switch between them, e.g. DeepSeek, Qwen, Google's Gemma or OpenAI's gpt-oss. (ChatGPT and Gemini themselves only run in the cloud.)
When choosing, you'll see different parameter counts (like 8B, 14B, or 70B). In simple terms, parameters are the internal variables the model learned during training; the higher the number, the 'smarter' and more nuanced the model generally is. However, larger models require more hardware:
8B models: Usually require ~8GB of VRAM
14B - 32B models: Require ~12GB–24GB of VRAM
70B+ models: Require multiple GPUs/Workstation.
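The VRAM tiers above come from simple arithmetic: the weights take roughly (number of parameters) × (bits per weight) ÷ 8 bytes, plus headroom for the context cache and runtime. The sketch below assumes about 4.8 bits per weight, in the range of the 4-bit quantized builds Ollama typically downloads by default, and a flat 20% overhead. Both figures are rough assumptions, and real usage grows with context length.

```python
def estimate_memory_gb(params_billions: float, bits_per_weight: float = 4.8,
                       overhead: float = 0.20) -> float:
    """Rough memory needed to load a model: weight bytes plus a flat overhead fraction.

    bits_per_weight=4.8 approximates a 4-bit quantized model (use 16 for FP16 weights).
    overhead=0.20 is an assumed allowance for the context (KV) cache and runtime buffers.
    """
    weight_gb = params_billions * 1e9 * bits_per_weight / 8 / 1e9
    return round(weight_gb * (1 + overhead), 1)


for size in (8, 14, 32, 70):
    print(f"{size}B model: ~{estimate_memory_gb(size)} GB")
```

By this estimate a 14B model (about 10 GB) fits on a 16 GB card with room to spare, while a 32B model (about 23 GB) does not and would be partly offloaded to system RAM.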

[Q] What kind of hardware?
[A] Mine is:
Acer Predator Helios Neo 16
Intel Core Ultra 9 275HX (24-Core)
NVIDIA GeForce RTX 5070 Ti (12GB)
RAM 32GB DDR5-6400

[Q] How do results compare with results from Claude AI or ChatGPT?
[A] I use neither.

[Q] Do you need to train the model on your local data?
[A] Do YOU want to train the model on your data? If so, the result is an agent oriented toward that data.
 

My Computer

System One

  • OS
    𓂀
A simple local AI stack on Windows 11. For the interface, I run Open WebUI inside Docker, which connects to a model of my choice served by Ollama.

Thank you, and a big welcome to this forum. There are lots of smart, experienced people to learn from, and you can add your contributions to the group's overall knowledge. It's my go-to forum for Windows.

I am a complete noob in this area. I hope you don't mind some noob-noob questions.

Is your configuration necessary or your preference? If a preference what are your reasons?
When choosing, you'll see different parameter counts (like 8B, 14B, or 70B). In simple terms, parameters are the internal variables the model learned during training; the higher the number, the 'smarter' and more nuanced the model generally is. However, larger models require more hardware:

Thanks for this explanation. Very clear. I take it that, since I have a 16GB GPU, I can run models of 14B and above. What if a specific model requires, say, 20 or 24 GB? Does it run at all?
8B models: Usually require ~8GB of VRAM
14B - 32B models: Require ~12GB–24GB of VRAM
70B+ models: Require multiple GPUs/Workstation.


RAM 32GB DDR5-6400
I have 64 GB of DDR5-5200. I wish that I could afford to replace that with much faster RAM, but my rich uncle is relatively young and in excellent health.


[A] Do YOU want to train the model on your data? If so, the result is an agent oriented toward that data.

Again, as a complete noob, do I need to create that agent?
 

Glad to be here! Those aren't "noob" questions at all. Here is the breakdown:

1. Is this configuration necessary or a preference?

It’s a mix of both. Running Open WebUI inside Docker is a preference for stability; it keeps the AI environment isolated so it doesn't mess with my Windows system files. Ollama is closer to a necessity for my setup, because it makes downloading models and switching between them very easy.

2. What happens if a model needs more VRAM than you have? (The 16GB vs 24GB question)
Great question. Yes, it will still run, but it won’t be fast.
  • The "Split" (Offloading): Ollama is smart. If a model needs 20GB and you only have 16GB of VRAM (Video RAM), it will load as much of the model as fits on your GPU (roughly 16GB) and "offload" the remaining ~4GB to your system RAM, where the CPU handles those layers.
  • The Result: Because system RAM (even your DDR5) is much slower than GPU VRAM, the model’s "tokens per second" (typing speed) will drop significantly. It goes from "instant" to "reading speed" or slower.
3. Do you need to create an agent for your data? Short answer: No.
You don't need to. If you just want to talk to an AI about general topics, the base models from Ollama are fine.
  • The "Agent" approach: I created an agent because I wanted the AI to have a "job"—specifically managing my local network files.
  • The "Easy" way: If you just want the AI to know about your personal PDFs or documents without coding, Open WebUI (which I use) has a built-in feature called RAG (Retrieval-Augmented Generation). You just upload your files to the chat, and the AI "reads" them. It’s much easier than training a model from scratch!
Your 64GB of RAM is actually a huge advantage here. Even if your RAM is "slower" than mine, the sheer capacity means you can run much larger models (like a 70B model) by offloading them to that 64GB pool. It’ll be slower than a workstation, but it will work!
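Why does offloading hurt speed so much? Generating each token means reading essentially all of the model's weights once, so speed is limited mostly by memory bandwidth. The sketch below estimates an upper bound on tokens per second when part of the model sits in VRAM and the rest in system RAM. The bandwidth figures are assumptions for illustration: about 640 GB/s for a card like the RX 9070 XT, and about 83 GB/s for dual-channel DDR5-5200 (2 channels × 8 bytes × 5200 MT/s). Real-world speeds will be lower.

```python
def offload_split(model_gb: float, vram_gb: float) -> tuple[float, float]:
    """Return (GB held in VRAM, GB offloaded to system RAM), ignoring VRAM headroom."""
    on_gpu = min(model_gb, vram_gb)
    return on_gpu, model_gb - on_gpu


def max_tokens_per_second(model_gb: float, vram_gb: float,
                          gpu_bw_gbs: float = 640.0, ram_bw_gbs: float = 83.2) -> float:
    """Bandwidth-bound ceiling: each token reads every weight once, the GPU-resident
    part at VRAM speed and the offloaded part at system-RAM speed."""
    on_gpu, in_ram = offload_split(model_gb, vram_gb)
    seconds_per_token = on_gpu / gpu_bw_gbs + in_ram / ram_bw_gbs
    return 1.0 / seconds_per_token


# Hypothetical sizes: a 20 GB model on a 24 GB vs a 16 GB card, and a ~50 GB 4-bit 70B model.
for model_gb, vram_gb in [(20, 24), (20, 16), (50, 16)]:
    print(f"{model_gb} GB model, {vram_gb} GB VRAM: "
          f"<= ~{max_tokens_per_second(model_gb, vram_gb):.0f} tokens/s")
```

Under these assumptions the ceiling drops from about 32 to about 14 tokens/s when just 4 GB spills into RAM, and to roughly 2 tokens/s for a 70B model held mostly in that 64GB pool: slow, but it runs.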

If you ever need an agent, start with the base stack.
Since you're already using AI, you can give it this prompt: 'Provide a step-by-step guide to installing Docker, Open WebUI, and Ollama models on Windows 11.'
 
Glad to be here! Those aren't "noob" questions at all. Here is the breakdown:
Thanks, but compared to my overall knowledge of Windows, major applications like MS Office and Adobe Lightroom, plus a raft of utilities, I am still a noob about AI. But I want to learn more, because AI is clearly an important technology for the future, and I'm never too old to learn (if a bit old for other things :p).
 

I'm burning through credits so fast these days that I'm investigating whether a local LLM may be the better option. @x509, did you get any further down this rabbit hole?
 

My Computers

System One System Two

  • OS
    Windows 11 Workstation
    Computer type
    PC/Desktop
    Manufacturer/Model
    doofenshmirtz evil incorporated
    CPU
    Ryzen 9 5950X
    Motherboard
    Asus ROG Crosshair VIII Formula
    Memory
    Corsair Vengeance RGB PRO Black 64GB (4x16GB) 3600MHz AMD Ryzen Tuned DDR4
    Graphics Card(s)
    ASUS AMD Radeon RX 6900 XT 16GB ROG Strix LC OC
    Sound Card
    Sound BlasterX Katana
    Monitor(s) Displays
    3 x27" Dell U2724D & 1 x 34" Dell U3415W
    Hard Drives
    Samsung 980 Pro 1TB M.2 2280 PCI-e 4.0 x4 NVMe Solid State Drive
    PSU
    ASUS ROG THOR 850W 80 Plus Platinum
    Case
    ASUS ROG Strix Helios Midi-Tower ARGB Gaming Case
    Cooling
    ASUS ROG Strix LC Performance RGB AIO CPU Liquid Cooler - 360mm
    Keyboard
    Logi Ergo
    Mouse
    Logitech MX Vertical
    Internet Speed
    900/100 Mbps
    Browser
    Chrome
    Antivirus
    Windows Defender, Malwarebytes Pro
    Other Info
    HP M281 Printer
    Logitech Brio Stream webcam
    Yeti X mic
  • Operating System
    Windows 10
    Computer type
    Laptop
    Manufacturer/Model
    Surface Laptop
    CPU
    i7
I'm burning through credits so fast these days that I'm investigating whether a local LLM may be the better option. @x509, did you get any further down this rabbit hole?
I have a $20/month subscription to Claude AI, which I use mainly for writing PowerShell scripts, since I know nada about PowerShell, and for fixing Excel spreadsheet errors. No other real uses.
 

I’ve been taking on projects for sites and servers far beyond what I would have considered before. I used to have a GitHub Copilot subscription in VS Code, but with the recent pricing change, my estimated cost would have jumped from £40 on my old subscription to over £150 a month.

I switched to a $60 Cursor subscription, which seemed like the best value for my needs, but after just two days, I’d already used over 10% of my monthly credits. Yesterday, I ended up doing half my work with Google Antigravity on the £18.99 Google AI plan, so I’m starting to think that getting an Nvidia card and using Qwen or Deepseek locally might be a better long-term option.
 

I'm burning through credits so fast these days that I'm investigating whether a local LLM may be the better option. @x509, did you get any further down this rabbit hole?
I use Copilot and Gemini and have no usage limits, unlike ChatGPT.
 

My Computer

System One

  • OS
    Windows 11 Pro
    Computer type
    PC/Desktop
    Manufacturer/Model
    Self build
    CPU
    Core i7-13700K
    Motherboard
    Asus TUF Gaming Plus WiFi Z790
    Memory
    64 GB Kingston Fury Beast DDR5
    Graphics Card(s)
    Gigabyte GeForce RTX 2060 Super Gaming OC 8G
    Sound Card
    Realtek S1200A
    Monitor(s) Displays
    Viewsonic VP2770 & Dell (secondary)
    Screen Resolution
    2560 x 1440
    Hard Drives
    Kingston KC3000 2TB NVME SSD & SATA HDDs & SSD
    PSU
    EVGA SuperNova G2 850W
    Case
    Nanoxia Deep Silence 1
    Cooling
    Noctua NH-D14
    Keyboard
    Microsoft Digital Media Pro
    Mouse
    Logitech Wireless
    Internet Speed
    80 Mb / s
    Browser
    Chrome
    Antivirus
    Defender, Malwarebytes Free & AdwCleaner
I use Copilot and Gemini and have no usage limits, unlike ChatGPT.
Using AI for projects in something like VS Code or Antigravity is a completely different experience from chatting in an unlimited web interface, and it comes with high costs and usage limits.
 

I hope this is the right forum for this question. The main question is in the thread topic. I have had some success using Claude AI and ChatGPT, but I don't want to ask them about really private topics.

Some additional questions:

Which local LLMs? What kind of hardware?

How do results compare with results from Claude AI or ChatGPT?

Do you need to train the model on your local data?

What else?
I mainly use Copilot augmented with Gemini and ChatGPT, all free versions. You soon hit the daily use limit for the free version of ChatGPT. Use them all with extreme caution, since they can all make up facts and be confidently wrong! This is why I call them Artificial Idiots! If you have time to waste, you can cross-reference their answers and watch them argue with each other :lmao:
 

I'm burning through credits so fast these days that I'm investigating whether a local LLM may be the better option. @x509, did you get any further down this rabbit hole?

Consider that running a local AI would cost you a lot of money in terms of hardware, and then in power to run it (AI is very compute-intensive). Unless you have very cheap electricity available, running a local model for similar usage would likely end up costing more in the long run.
ChatGPT credits are still very cheap, considering that OpenAI absorbs all that cost and is running at an immense loss, at least until the AI bubble bursts.
 

My Computer

System One

  • OS
    Windows 11
    Computer type
    PC/Desktop
ChatGPT credits are still very cheap, considering that OpenAI absorbs all that cost and is running at an immense loss, at least until the AI bubble bursts.
I could easily blow through the $100 monthly allowance in a week or less if I went ahead with some of the things I’d like to do, and I'd use even more if the credits were free, as they effectively are on a self-hosted setup.


I ran Qwen3.5 this morning on my current system and it was okay-ish, hitting around 44 tokens/s while testing some of my usual tasks. If I swapped the GPU and PSU for an RTX 5090 and a 1200W PSU, it would definitely improve performance massively, and at under £3.5k, it could be cost-effective in the long run.
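Whether local hardware pays for itself comes down to break-even arithmetic: the upfront cost divided by what you save each month once electricity is subtracted. The sketch below uses placeholder numbers loosely based on figures in this thread (a £3,500 upgrade, and the roughly £150/month Copilot estimate mentioned earlier). The average power draw, hours of use, and electricity price are assumptions to replace with your own.

```python
import math


def monthly_power_cost(watts: float, hours_per_day: float, price_per_kwh: float,
                       days: int = 30) -> float:
    """Electricity cost for a month: energy in kWh = watts x hours / 1000."""
    return watts * hours_per_day * days / 1000 * price_per_kwh


def breakeven_months(hardware_cost: float, cloud_cost_per_month: float,
                     power_cost_per_month: float) -> float:
    """Months until the hardware pays for itself; math.inf if it never saves money."""
    monthly_saving = cloud_cost_per_month - power_cost_per_month
    return hardware_cost / monthly_saving if monthly_saving > 0 else math.inf


# Placeholder assumptions: ~600 W average draw while working, 8 hours a day, £0.25 per kWh.
power = monthly_power_cost(watts=600, hours_per_day=8, price_per_kwh=0.25)
months = breakeven_months(hardware_cost=3500, cloud_cost_per_month=150,
                          power_cost_per_month=power)
print(f"Power: £{power:.2f}/month, break-even after about {months:.0f} months")
```

With these placeholder numbers, power costs about £36 a month and the upgrade pays for itself in roughly two and a half years. At £0.25/kWh and 8 hours a day, each extra 100 W of average draw adds about £6 a month.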
 
