Install DeepSeek R1 Locally: Mac, Windows & Linux
Quick Summary: DeepSeek R1 is one of the most powerful open-source reasoning models in 2026, matching GPT-4-class performance at zero licensing cost. This guide covers everything you need to run DeepSeek R1 on your own hardware, from choosing the right model size to installing Ollama on Windows, macOS, Linux, Android, and iOS, in under 30 minutes.
What Is DeepSeek R1?
DeepSeek R1 is a large language model developed by DeepSeek AI and released under the permissive MIT license. What makes it extraordinary is the training methodology: instead of relying solely on supervised fine-tuning, DeepSeek R1 uses reinforcement learning from scratch, a technique that allows the model to develop genuine chain-of-thought reasoning capabilities without being explicitly taught how to think step by step.
Released in early 2025 and continuously improved throughout 2026, DeepSeek R1 achieved benchmark scores comparable to OpenAI's o1 model on math, code, and science reasoning tasks, while being fully open-source and free to run locally. The full 671B parameter model uses a Mixture-of-Experts (MoE) architecture, meaning it only activates around 37B parameters per forward pass, which makes inference more efficient than a dense 671B model would be.
Beyond the flagship model, DeepSeek released a family of smaller distilled variants, trained by transferring knowledge from the 671B model into smaller dense models based on Llama and Qwen architectures. These distilled models bring surprisingly strong reasoning capability to hardware as modest as a laptop with 16GB RAM.
DeepSeek R1 Model Variants
DeepSeek R1 comes in multiple sizes to fit different hardware configurations. Here's the complete lineup:
| Model | Base | RAM (CPU) | VRAM (GPU) | Disk |
|---|---|---|---|---|
| DeepSeek-R1-Distill-1.5B | Qwen2.5 | 4GB | 2GB | 1.1GB |
| DeepSeek-R1-Distill-7B | Qwen2.5 | 8GB | 4GB | 4.7GB |
| DeepSeek-R1-Distill-8B | Llama 3 | 8GB | 5GB | 4.9GB |
| DeepSeek-R1-Distill-14B | Qwen2.5 | 16GB | 9GB | 9.0GB |
| DeepSeek-R1-Distill-32B | Qwen2.5 | 32GB | 20GB | 20GB |
| DeepSeek-R1-Distill-70B | Llama 3.3 | 64GB | 40GB | 43GB |
| DeepSeek-R1 671B (Full) | MoE | 256GB+ | Multi-GPU | 400GB+ |
Best Pick for Most Users: DeepSeek-R1-Distill-14B
If you have a mid-range PC or laptop with 16GB RAM or an 8GB GPU, the 14B distilled model is the sweet spot. It delivers reasoning quality far above its size, consistently outperforming many 30B+ models from other families on math and logic tasks. This is the model most developers and researchers settle on for daily use in 2026.
Step 1: Install Ollama (All Platforms)
Ollama is the easiest way to run DeepSeek R1 locally. It handles model downloading, GPU acceleration, quantization, and serving an API, all with a single command. Here's how to install it on each platform:
macOS
Works on both Intel and Apple Silicon (M1–M4). Apple Silicon gets Metal GPU acceleration automatically, which is dramatically faster than CPU-only inference.
brew install ollama
# or download the .dmg from
# ollama.com/download/mac
After install, Ollama starts as a menubar app. Look for the llama icon in your status bar.
Windows
Supports Windows 10/11 x64. Automatically detects NVIDIA and AMD GPUs. Requires no special setup for CUDA; Ollama bundles what it needs.
# Download OllamaSetup.exe from:
# ollama.com/download/windows
# Run as Administrator
Ollama runs as a Windows service after installation. Check the system tray icon.
Linux
One-command install. Supports Ubuntu, Debian, Fedora, Arch, and more. Works with NVIDIA CUDA, AMD ROCm, and CPU-only. Auto-detects GPU drivers.
curl -fsSL \
https://ollama.com/install.sh \
| sh
Ollama installs as a systemd service. Auto-starts on boot.
After installation, verify Ollama is running by opening a terminal and typing:
ollama list
# Output: NAME ID SIZE MODIFIED (empty if no models downloaded yet)
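If the `ollama` command isn't recognized, you can confirm whether the binary actually landed on your PATH from a script as well. A minimal, purely illustrative Python check (the status messages are my own wording, not Ollama output):

```python
import shutil

def ollama_status() -> str:
    """Report whether the ollama CLI is visible on PATH."""
    path = shutil.which("ollama")
    if path:
        return f"found at {path}"
    return "not found on PATH; restart your shell or re-run the installer"

print("ollama:", ollama_status())
```

On a fresh install, a "not found" result usually just means the terminal was opened before the installer finished; opening a new terminal fixes it.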
Step 2: Pull and Run DeepSeek R1
With Ollama installed, running DeepSeek R1 is a single command. Ollama downloads the model from its CDN and starts a chat session automatically:
# Pull and immediately start chatting; choose your size:
ollama run deepseek-r1:1.5b   # tiny: any device
ollama run deepseek-r1:7b     # balanced: 8GB RAM
ollama run deepseek-r1:8b     # Llama-based: 8GB RAM
ollama run deepseek-r1:14b    # best value: 16GB RAM
ollama run deepseek-r1:32b    # professional: 32GB RAM
# To only download without running immediately:
ollama pull deepseek-r1:14b
The first run downloads the model file (which can be several GB). Subsequent runs load directly from local storage with no download needed. Ollama will automatically use your GPU for acceleration if available.
Step 3: Add a Web UI (Optional but Recommended)
Ollama runs as an API server, but for a ChatGPT-like interface, install Open WebUI. It connects to Ollama's local API and gives you a polished browser interface with conversation history, model switching, and file uploads.
If you have Docker installed:
docker run -d -p 3000:8080 \
--add-host=host.docker.internal:host-gateway \
-v open-webui:/app/backend/data \
--name open-webui \
ghcr.io/open-webui/open-webui:main
After the container starts, open http://localhost:3000 in your browser. Select DeepSeek R1 from the model dropdown, and you have a full-featured local AI chat interface. No data leaves your machine.
Alternatively, without Docker, install via pip:
pip install open-webui
open-webui serve
Step 4: Run DeepSeek R1 on Android
Running DeepSeek R1 on Android requires either a powerful flagship phone or a rooted device with Termux for the larger models. For casual use, there are two approaches:
Option 1: PocketPal AI App (Easy)
PocketPal AI is available on Google Play and supports DeepSeek R1 distill models in GGUF format. Download the app, then browse the built-in model library to pull DeepSeek-R1-Distill-1.5B or 7B. The 1.5B model runs smoothly on most modern Android phones with 6GB+ RAM.
Best for: Casual use, no technical setup required. Available on Google Play Store.
Option 2: Termux + Ollama (Advanced)
On rooted Android devices or high-end phones (Snapdragon 8 Gen 3+), you can install Termux from F-Droid and then install Ollama inside it. This gives you full control and lets you run larger models like the 7B with decent speed on flagship hardware.
# In Termux:
pkg install ollama
ollama run deepseek-r1:1.5b
Step 5: Run DeepSeek R1 on iPhone / iPad
iOS and iPadOS support local LLM inference through dedicated apps that leverage Apple's Neural Engine and the MLX framework. Here are the best options in 2026:
Enchanted (Free, App Store)
Enchanted is the most popular iOS frontend for Ollama. If you already have Ollama running on a Mac on the same Wi-Fi network, Enchanted can connect to it remotely. Set your Mac's Ollama to listen on your network IP by running OLLAMA_HOST=0.0.0.0 ollama serve, then point Enchanted at your Mac's IP address. You can then run DeepSeek R1 from your iPhone while the heavy lifting happens on your Mac.
LM Studio Mobile (iOS)
LM Studio's iOS app uses Apple's MLX framework to run quantized models directly on-device. On iPhone 15 Pro and newer (with 8GB RAM), the 1.5B distill model runs comfortably; the 7B model runs on iPad Pro M4 with excellent performance. Search the app's built-in model browser for "DeepSeek-R1-Distill" to find compatible builds.
Step 6: Use DeepSeek R1 via API
Once Ollama is running, it exposes a REST API on http://localhost:11434. This is fully compatible with the OpenAI API format, meaning any application built for ChatGPT's API will work with your local DeepSeek R1 with just a base URL change:
# Python example (openai SDK):
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:11434/v1",
    api_key="ollama",  # any string works; Ollama ignores the key
)
response = client.chat.completions.create(
    model="deepseek-r1:14b",
    messages=[{"role": "user", "content": "Solve: what is 17 × 23?"}],
)
print(response.choices[0].message.content)
This makes DeepSeek R1 a powerful drop-in replacement for the OpenAI API in your existing applications, and you pay $0 per token.
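Under the hood, the SDK call above is just an HTTP POST to the local server. A sketch of the request body it sends (this only builds and prints the JSON; actually sending it requires Ollama running at http://localhost:11434/v1/chat/completions):

```python
import json

# Chat-completion request body for Ollama's OpenAI-compatible endpoint.
payload = {
    "model": "deepseek-r1:14b",
    "messages": [{"role": "user", "content": "Solve: what is 17 × 23?"}],
    "stream": False,  # set True to receive tokens incrementally
}
print(json.dumps(payload, indent=2))
```

Because the shape is standard, any HTTP client or OpenAI-compatible tool can talk to the same endpoint without the SDK.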
Performance Tips
To get the best performance from DeepSeek R1 locally, follow these tuning tips:
π‘ Use GPU Layers
By default, Ollama automatically offloads as many layers as possible to your GPU. To verify the GPU is being used, run ollama run deepseek-r1:7b and watch how quickly tokens generate; you can also run ollama ps in another terminal, whose PROCESSOR column shows the GPU/CPU split. If generation is slow (under 5 tokens/second), you are likely running on CPU only, so check your GPU driver installation.
π‘ Increase Context Length
DeepSeek R1 supports up to 128K context, but Ollama defaults to a much smaller window. For complex reasoning tasks that need more context, raise it from inside the chat session with /set parameter num_ctx 32768, or persist it in a custom Modelfile. Higher values use more VRAM.
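To make a larger context window permanent, you can bake it into a derived model with a Modelfile. A minimal sketch (the model name deepseek-r1-32k is arbitrary, chosen here for illustration):

```
# Modelfile
FROM deepseek-r1:14b
PARAMETER num_ctx 32768
```

Build and run it with ollama create deepseek-r1-32k -f Modelfile followed by ollama run deepseek-r1-32k; the 32K context then applies every time without any per-session setup.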
π‘ Keep Model Warm in Memory
Ollama unloads models from GPU memory after 5 minutes of inactivity by default. To keep the model loaded (faster response for frequent use), set: OLLAMA_KEEP_ALIVE=24h ollama serve. This reserves VRAM but eliminates the cold-start delay.
Troubleshooting Common Issues
Problem: Download hangs at 0% or fails halfway
Cause: Ollama's CDN may be slow or blocked in your region. Fix: Connect to VPN07 before running ollama pull. VPN07's 1000Mbps bandwidth routes through optimized paths to the Ollama CDN. Alternatively, download GGUF files directly from HuggingFace and import them with a Modelfile that points at the local file (FROM ./model.gguf), then run ollama create deepseek-r1 -f Modelfile.
Problem: Very slow generation (CPU only)
Cause: GPU not detected or driver issue. Fix: On NVIDIA, ensure your driver is current and nvidia-smi lists your GPU (Ollama bundles its own CUDA runtime, so the full toolkit is not required). On AMD, install ROCm. On Apple Silicon, make sure you installed the macOS build of Ollama. Restart the Ollama service after fixing drivers.
Problem: "Out of memory" error during inference
Cause: Model too large for your VRAM. Fix: Switch to a smaller model. Ollama uses Q4_K_M quantization by default, which is already quite compressed, so try the next smaller size (e.g., switch from 14B to 7B). Ollama can also split a model between GPU VRAM and system RAM, so adding RAM lets you run models that don't fully fit on the GPU, at reduced speed.
Problem: Model gives strange XML-like output (thinking tokens)
Cause: DeepSeek R1 uses chain-of-thought reasoning wrapped in <think> tags. This is normal and intentional: the model "thinks aloud" before answering. Fix: This is a feature, not a bug. The final answer appears after the </think> block. In Open WebUI, you can configure it to hide the thinking output and only show the final response.
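If you consume R1's output programmatically, the reasoning block is easy to strip. A small sketch, assuming the tag format is exactly <think>…</think> as described above:

```python
import re

def strip_think(text: str) -> str:
    """Remove the chain-of-thought block so only the final answer remains."""
    return re.sub(r"<think>.*?</think>", "", text, flags=re.DOTALL).strip()

raw = "<think>17*23 = 17*20 + 17*3 = 340 + 51</think>\nThe answer is 391."
print(strip_think(raw))  # -> The answer is 391.
```

The DOTALL flag matters because the thinking block usually spans many lines.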
Why DeepSeek R1 Is the Top Open-Source Reasoning Model
The key insight behind DeepSeek R1's quality advantage over other open-source models is its training process. Most LLMs are trained primarily with supervised fine-tuning on human-labeled data. DeepSeek R1 used Group Relative Policy Optimization (GRPO), a reinforcement learning technique where the model receives rewards based on whether its answers are correct, not whether they match a labeled dataset.
This RL-based training produced an emergent behavior: the model spontaneously learned to "think" step by step, reconsider wrong paths, and self-correct, without being explicitly taught these skills. This mirrors how students naturally develop problem-solving ability through practice and feedback, rather than memorizing solutions.
On AIME 2024 (a prestigious math competition benchmark), DeepSeek R1 achieved a pass rate of 79.8%, compared to OpenAI o1's 79.2%. On Codeforces competitive programming benchmarks, it scores around the 96th percentile of human competitors. These numbers, from an open-source model with an MIT license that you can run on your own hardware, represent a genuine milestone in AI accessibility.
DeepSeek R1 vs Other Open-Source Models (2026)
Chart: MATH-500 benchmark scores. DeepSeek-R1-Distill-14B outperforms Llama 3.3-70B despite being 5× smaller.
DeepSeek R1 running locally gives you a powerful reasoning engine with zero ongoing costs. Whether you're using it as a math tutor, code debugger, research assistant, or integrated into your own applications via the API, the key advantage over cloud AI is complete data privacy and no rate limits. Your conversations never leave your machine, and you can run unlimited queries 24/7 without worrying about token costs.
As you get more comfortable with DeepSeek R1 locally, explore advanced Ollama features like Modelfiles for custom system prompts, the REST API for application integration, and LiteLLM as a proxy layer if you want to manage multiple local models through a single endpoint. The local AI ecosystem in 2026 is rich and mature β Ollama makes DeepSeek R1 accessible to everyone from complete beginners to advanced infrastructure engineers.
Inference Speed by Platform (Tokens/Second)
Expected generation speed for DeepSeek R1 distill models on different hardware configurations in 2026:
| Hardware | R1-7B (t/s) | R1-14B (t/s) | R1-32B (t/s) | Notes |
|---|---|---|---|---|
| Apple M4 Pro 24GB | 55–65 | 30–40 | 12–18 | Metal GPU, unified memory |
| Apple M3 Max 48GB | 60–70 | 35–45 | 22–28 | Best laptop performance |
| RTX 4090 24GB | 70–90 | 35–50 | 14–20 | Top consumer GPU |
| RTX 4070 12GB | 45–60 | 20–30 | CPU offload | Mid-range GPU |
| AMD RX 7900 XTX 24GB | 55–75 | 28–40 | 12–18 | ROCm on Linux/Windows |
| CPU only (Ryzen 9 64GB) | 3–6 | 1–3 | 0.5–1 | Slow but functional |
Pro Tip: For the best cost-per-performance ratio on Windows, the RTX 4070 (12GB) paired with 32GB system RAM allows Ollama to split the R1-14B model between GPU VRAM and RAM β running it noticeably faster than pure CPU mode while staying affordable.
Note: These are approximate values for Q4_K_M quantization. Higher-precision quantization (Q8_0) is roughly 40% slower but slightly more accurate. CPU-mode speeds vary significantly with RAM bandwidth. Apple Silicon's unified memory architecture makes it particularly efficient for large models.
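To turn the table's tokens-per-second figures into expected wait times, divide response length by generation speed. A quick sketch, using rough midpoints I've taken from the R1-14B column above:

```python
def expected_seconds(response_tokens: int, tokens_per_second: float) -> float:
    """Rough wall-clock estimate for generating a response of a given length."""
    return response_tokens / tokens_per_second

# A 500-token answer at assumed R1-14B midpoints from the table above:
for hw, tps in [("RTX 4090", 42.5), ("Apple M4 Pro", 35.0), ("CPU only", 2.0)]:
    print(f"{hw}: ~{expected_seconds(500, tps):.0f}s")
```

This ignores prompt-processing time, which adds a few extra seconds for long inputs, but it gives a good feel for why a 2 t/s CPU setup is usable only for short answers.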
Quick Reference: All DeepSeek R1 Ollama Commands
Keep this reference handy for all common DeepSeek R1 operations via Ollama:
# ── Installation ──────────────────────────────────────
brew install ollama # macOS
curl -fsSL https://ollama.com/install.sh | sh # Linux
# Download OllamaSetup.exe from ollama.com # Windows
# ── Download Models ───────────────────────────────────
ollama pull deepseek-r1:1.5b
ollama pull deepseek-r1:7b
ollama pull deepseek-r1:8b
ollama pull deepseek-r1:14b
ollama pull deepseek-r1:32b
# ── Run Models ────────────────────────────────────────
ollama run deepseek-r1:14b
ollama run deepseek-r1:14b "Solve: prove that sqrt(2) is irrational"
# larger context: start a chat, then type /set parameter num_ctx 32768
# ── Management ────────────────────────────────────────
ollama list # show downloaded models
ollama ps # show running models
ollama rm deepseek-r1:7b # remove a model
ollama show deepseek-r1:14b # show model info
Frequently Asked Questions
Q: Is DeepSeek R1 safe to run locally? Are there privacy concerns?
Yes, running DeepSeek R1 locally through Ollama is completely private. The model runs entirely on your hardware β no data is sent to DeepSeek's servers, no telemetry, and no internet connectivity required after the initial model download. Your prompts, conversations, and outputs stay exclusively on your machine. For sensitive applications, local deployment is actually the most secure option available.
Q: Can I fine-tune DeepSeek R1 on my own data?
Yes, the MIT license permits fine-tuning and modifying DeepSeek R1. Use the HuggingFace transformers library with PEFT/LoRA adapters for parameter-efficient fine-tuning. The distilled variants (7B, 8B, 14B) are practical to fine-tune on consumer hardware: a LoRA fine-tuning run of DeepSeek-R1-Distill-7B on a domain-specific dataset can complete on a single RTX 4090 in 6–12 hours.
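The parameter-count arithmetic explains why LoRA makes this feasible on one GPU. A back-of-envelope sketch (the 4096×4096 layer shape and rank 16 are illustrative assumptions, not DeepSeek R1's actual dimensions):

```python
# LoRA replaces a full weight update dW (d_out x d_in) with two low-rank
# factors B (d_out x r) and A (r x d_in), so dW ≈ B @ A.
d_in, d_out, r = 4096, 4096, 16

full_update_params = d_in * d_out        # training the layer directly
lora_params = r * (d_in + d_out)         # training only A and B

print(full_update_params)                # 16777216
print(lora_params)                       # 131072
print(full_update_params // lora_params) # 128x fewer trainable parameters
```

Fewer trainable parameters means proportionally less optimizer state and gradient memory, which is what lets a 7B model fit in a single 24GB card during training.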
Q: What's the difference between DeepSeek-R1 and DeepSeek-R1-Zero?
DeepSeek-R1-Zero was trained purely with reinforcement learning from scratch, with no supervised fine-tuning on human demonstrations. It shows impressive emergent reasoning but sometimes produces outputs in mixed languages or with unusual formatting. DeepSeek-R1 adds a supervised fine-tuning stage after the RL training, making it much more practical for real-world use β better output formatting, consistent language, and more reliable instruction following. For production use, always use DeepSeek-R1 (not Zero).
Q: How does DeepSeek R1 compare to ChatGPT for daily use?
For math, science, and coding tasks, DeepSeek R1-14B is competitive with or exceeds GPT-4o. For creative writing, casual conversation, and tasks requiring broad world knowledge, GPT-4o and Claude still have an edge due to their larger proprietary training datasets. DeepSeek R1's advantage is that it's free to run locally at zero per-token cost, fully private, and available 24/7 regardless of service outages or rate limits.
Q: Why is VPN07 recommended for downloading DeepSeek R1?
DeepSeek R1 models are distributed via the Ollama CDN and HuggingFace. In some countries and network environments, these CDN servers are slow or throttled. VPN07 routes your download traffic through our 1000Mbps global network, connecting you to the fastest available CDN nodes. A 9GB model download that might take 2+ hours without VPN can complete in under 10 minutes with VPN07. We've specifically optimized our routing for HuggingFace and popular developer resources.
VPN07 β Download DeepSeek R1 at Full Speed
1000Mbps Β· 70+ Countries Β· Trusted Since 2015
Downloading a 9GB–20GB model from Ollama's CDN can take hours if your connection is throttled or routed through congested servers. VPN07 routes your traffic through our 1000Mbps network, turning slow CDN routes into high-speed channels. We've been helping developers access international services reliably for over 10 years, across 70+ countries. Try risk-free with our 30-day money-back guarantee, at only $1.50/month.
Where to Go Next
- Download More LLMs: explore all top open-source models with download links and hardware guides (Visit AI Hub).
- Speed Up Downloads: use VPN07 to download large LLM files at 1000Mbps from anywhere (Try VPN07 Free).

Related Articles
- Run Llama 4 Locally: All Platforms Install Guide 2026. Install Meta's Llama 4 on Windows, Mac, Linux, Android & iOS. Scout and Maverick MoE models explained with step-by-step Ollama setup.
- Qwen3.5 Ollama Setup: Run 0.8B to 35B Free on PC & Mac. Complete Ollama guide for Qwen3.5. Install on Windows, Mac, Linux. Run models from 0.8B to 35B with step-by-step instructions.