Can You Build Your Own AI at Home?
Yes. Here’s what you actually need — and what the tutorials leave out.
Most guides on this subject are written by people who read about it. This one is written by someone who built it, and the AI that runs on it. That distinction matters — the gap between how something looks in a tutorial and how it actually behaves is exactly where most people quit.
You can have a private AI running on your own hardware, reachable from every device in your house, with no monthly fee and no data leaving your network. It takes an afternoon. It will frustrate you once. By the end you will have something that is genuinely yours.
Let’s build it.
What You’re Actually Building
Before any commands, here’s the plain English version.
A large language model is the same type of AI that powers ChatGPT and Claude. The difference is that companies like OpenAI run those on their servers — your prompts travel to their hardware, get processed, and the response comes back. You are renting access to someone else’s computer, and everything you type passes through it.
What we are doing is running the model on your hardware. Your prompts never leave. Nobody logs them. The AI doesn’t get smarter from your conversations unless you want it to. You pay for the hardware once and run it as long as you want.
The trade-off is real and worth naming upfront: local models are not as capable as the frontier models from OpenAI and Anthropic. For most everyday tasks — summarizing, drafting, answering questions, helping you think through a problem — you will barely notice the difference. For deep multi-step reasoning or cutting-edge research tasks, you will feel the gap. That is the honest version. Most tutorials skip it.
Our Setup — What We’re Working With
Every build is different. Here is the hardware this article is based on, so you have a real baseline to compare against.
:::spec-box The Westminster Node This is the machine Silas runs on. Your results may vary — but this is what we know works.
| Component | Spec |
|---|---|
| CPU | Intel Core i7 (12th gen) |
| RAM | 16GB DDR5 |
| GPU | NVIDIA RTX 4060 Ti — 8GB VRAM |
| Storage | 1TB NVMe SSD |
| OS | Windows 11 |
| Location | Westminster, MA |
What this runs comfortably: Models up to 9B parameters at full quality. Multiple models loaded simultaneously. Overnight autonomous tasks while doing other work.
The honest constraint: 8GB VRAM is the real limiting factor. It runs well but leaves less headroom than a 12GB or 16GB card would. If you are buying hardware specifically for this, 12GB VRAM is the sweet spot in 2026. :::

Before You Start — What You Need in Place
This is the section most tutorials skip. If you hit a wall in the first ten minutes, it is almost always one of these.
The Minimums
A reasonably modern PC or laptop. Windows, Mac, or Linux all work. You do not need the hardware in our spec box above — that is what we use, not what you need to start. A machine from the last five years with 8GB of RAM will run smaller models. A machine with a dedicated GPU will run larger ones faster. The GPU matters more than the CPU for this.
About 20GB of free storage. Models are large files. The one we recommend in this article is around 5GB. You will want room for a couple more as you experiment. An SSD is strongly preferred over a spinning hard drive — model load times on a spinning drive are painful.
A terminal you are comfortable opening.
On Windows: PowerShell or Command Prompt. On Mac: Terminal.
On Linux: whatever you already use. You do not need to be a command
line expert. Every command in this article is copy-paste ready.
If you have never opened a terminal before, spend five minutes with it
first — open it, type echo hello, press Enter. That is the whole
skill level required here.
A stable internet connection for the initial download. After the first model download everything runs offline. But that first download is 5–8GB so a slow connection means a long wait. Coffee is appropriate.
Makes It Significantly Better
A dedicated GPU with 8GB+ VRAM. Not required, but this is the single hardware upgrade that changes the experience most. A GPU with 8GB VRAM runs models that are meaningfully smarter and faster than what a CPU alone can handle. If you are on a laptop with integrated graphics, start with a smaller model and see how it feels before deciding whether to upgrade.
Windows 11 or a recent macOS. Ollama works on older systems but the experience is smoother on current OS versions. If you are on Windows 10, it still works — just be prepared for one extra troubleshooting step around execution policies if PowerShell gives you grief.
Basic familiarity with installing software. If you can install an app from a website and run an installer, you have everything you need. This is not a developer-only project.
You Do NOT Need
- Any coding experience
- A cloud account of any kind
- Docker (we use it later in the series, not here)
- Python pre-installed (Ollama is self-contained)
- Administrator access beyond what you already have
If you have the minimums above, you are ready. Let’s go.

Step 1: Install Ollama
Ollama is the tool that makes this possible. It handles downloading models, running them, and exposing them as a local API your other software can talk to. Think of it as the engine under everything else.
Go to ollama.com and download the installer for your operating system. Windows, Mac, and Linux are all supported.
Run the installer. It takes about 90 seconds. When it finishes, Ollama runs quietly in the background — you will see a small icon in your system tray on Windows.
That’s the whole installation. There is no license key, no account creation, no subscription. It is free and open source.

Step 2: Choose Your First Model
This is where most tutorials give you bad advice. They either recommend the smallest model because it’s “easy to run” or the largest model because it “sounds impressive.” Neither is right.
The practical rule in 2026: match the model to your VRAM.
| VRAM Available | Recommended Model | Why |
|---|---|---|
| 4–6GB | Gemma 3 4B or Llama 3.2 3B | Fast, capable, fits comfortably |
| 8GB | Qwen 3 8B or Llama 3.1 8B | Strong quality, our daily driver |
| 12GB+ | Llama 3.1 70B (quantized) | Near-frontier quality locally |
If you do not have a dedicated GPU, do not worry — Ollama runs on CPU too. It will be slower, but it works. Start with a 3B model and go from there.
Our recommendation if you have 8GB VRAM: Start with qwen3:8b.
It is what we run daily. It handles complex questions well, responds in
seconds, and fits comfortably in 8GB. It is the model Silas uses for
most tasks when operating locally.
To download and run it, open a terminal and type:
ollama run qwen3:8b
The first run downloads the model — about 5GB. After that it loads from your local drive in seconds. You will see a prompt appear. Type something. It will respond.

That is your local AI running. Everything from here is building on top of it.
Step 3: Make It Reach Every Device
Right now your AI is only accessible from the computer it’s running on. That is useful but limited. The goal is to make it available from your phone, your laptop, anywhere you are — without exposing it to the internet.
The tool for this is Tailscale.
Tailscale creates a private encrypted network between all your devices. Your phone, your laptop, and your AI server all behave as if they are on the same local network, even when they are not. Your server never touches the public internet. Nothing is exposed that should not be.
Setup takes about five minutes:
- Go to tailscale.com and create a free account
- Install Tailscale on the machine running Ollama
- Install Tailscale on your phone and any other devices you want connected
- Sign in on all devices with the same account
Once installed, Tailscale assigns each device a private IP address —
something like 100.x.x.x. From any device on your Tailscale network,
you can reach Ollama at:
http://[your-server-tailscale-ip]:11434
To make Ollama listen on that address instead of just localhost, run:
OLLAMA_HOST=0.0.0.0 ollama serve
On Windows, set OLLAMA_HOST=0.0.0.0 as an environment variable and
restart Ollama.
Now open a chat app that supports custom Ollama endpoints — we use Open WebUI (more on that in the next article) — point it at your Tailscale IP, and you have a private AI accessible from anywhere you take your phone.

Step 4: Make It Persistent
One thing tutorials skip: if your machine goes to sleep, your AI goes offline.
On Windows, go to Settings → System → Power → Screen and sleep and set sleep to Never when plugged in. Your machine will run continuously, Ollama will stay available, and every device on your Tailscale network will have access whenever they need it.
For the full setup we run — where the AI handles overnight tasks, monitors the house, and prepares morning briefings — the machine runs 24 hours a day. Our electricity cost for this is negligible compared to any cloud subscription.
What’s Going to Frustrate You
Every honest tutorial has this section. Here are the things that will catch you:
The first model download takes longer than you expect. 5–8GB over a typical home connection is 10–20 minutes. This is normal. Every subsequent load is instant because it reads from your drive.
Ollama on Windows sometimes doesn’t start automatically after reboot.
Set it to run at startup: search for “Task Scheduler” in Windows,
create a basic task that runs ollama serve at login. Or just add
it to your startup folder. It sounds annoying. It takes three minutes.
The first response feels slow. The model is loading into VRAM on first run. After that initial load, responses come back in seconds. If every response feels slow, your model is too large for your available VRAM — go down a size.
Open WebUI won’t connect.
99% of the time this is the Ollama host setting. Ollama by default only
listens on localhost. If you’re connecting from another device or
container, you need OLLAMA_HOST=0.0.0.0. Set it, restart Ollama,
try again.
Your phone can’t reach the server.
Check that Tailscale is running on both devices and both show as
connected in the Tailscale admin panel. If Ollama is running but
unreachable, re-check the OLLAMA_HOST setting above.
Where You Are Now
If you followed this through, you have:
- A local AI model running on your hardware
- No data leaving your network
- Access from every device you own via Tailscale
- A persistent server that runs whether you’re sitting at it or not
This is the foundation. Everything we build from here — memory, tools, autonomous tasks, voice, home automation integration — runs on top of what you just set up.
The next article covers giving your AI a proper interface with Open WebUI, and then the one after that is where things get genuinely interesting: giving it memory so it actually knows who you are.
Our Stack, If You Want to See the Full Picture
This article covers the foundation layer. Our complete setup goes further — local models for different tasks, memory that persists across sessions, overnight autonomous workflows, and integration with the physical space. We document all of it here, in the same voice, written by the people and the AI who built it.
In Practice Media — built by people who did it.
Next in the series: Setting Up Open WebUI — A Proper Interface for Your Local AI