Install DeepSeek R1 Locally: Mac, Windows & Linux
Quick Summary: DeepSeek R1 is one of the most powerful open-source reasoning models in 2026, matching GPT-4-class performance at zero licensing cost. This guide covers everything you need to run DeepSeek R1 on your own hardware, from choosing the right model size to installing Ollama on Windows, macOS, Linux, Android, and iOS, in under 30 minutes.
What Is DeepSeek R1?
DeepSeek R1 is a large language model developed by DeepSeek AI and released under the permissive MIT license. What makes it extraordinary is the training methodology: instead of relying solely on supervised fine-tuning, DeepSeek R1 uses reinforcement learning from scratch, a technique that allows the model to develop genuine chain-of-thought reasoning capabilities without being explicitly taught how to think step by step.
Released in early 2025 and continuously improved throughout 2026, DeepSeek R1 achieved benchmark scores comparable to OpenAI's o1 model on math, code, and science reasoning tasks, while being fully open-source and free to run locally. The full 671B parameter model uses a Mixture-of-Experts (MoE) architecture, meaning it only activates around 37B parameters per forward pass, which makes inference more efficient than a dense 671B model would be.
Beyond the flagship model, DeepSeek released a family of smaller distilled variants, trained by transferring knowledge from the 671B model into smaller dense models based on Llama and Qwen architectures. These distilled models bring surprisingly strong reasoning capability to hardware as modest as a laptop with 16GB RAM.
DeepSeek R1 Model Variants
DeepSeek R1 comes in multiple sizes to fit different hardware configurations. Here's the complete lineup:
| Model | Base | RAM (CPU) | VRAM (GPU) | Disk |
|---|---|---|---|---|
| DeepSeek-R1-Distill-1.5B | Qwen2.5 | 4GB | 2GB | 1.1GB |
| DeepSeek-R1-Distill-7B | Qwen2.5 | 8GB | 4GB | 4.7GB |
| DeepSeek-R1-Distill-8B | Llama 3 | 8GB | 5GB | 4.9GB |
| DeepSeek-R1-Distill-14B | Qwen2.5 | 16GB | 9GB | 9.0GB |
| DeepSeek-R1-Distill-32B | Qwen2.5 | 32GB | 20GB | 20GB |
| DeepSeek-R1-Distill-70B | Llama 3.3 | 64GB | 40GB | 43GB |
| DeepSeek-R1 671B (Full) | MoE | 256GB+ | Multi-GPU | 400GB+ |
Best Pick for Most Users: DeepSeek-R1-Distill-14B
If you have a mid-range PC or laptop with 16GB RAM or an 8GB GPU, the 14B distilled model is the sweet spot. It delivers reasoning quality far above its size, consistently outperforming many 30B+ models from other families on math and logic tasks. This is the model most developers and researchers settle on for daily use in 2026.
Step 1: Install Ollama (All Platforms)
Ollama is the easiest way to run DeepSeek R1 locally. It handles model downloading, GPU acceleration, quantization, and serving an API, all with a single command. Here's how to install it on each platform:
macOS
Works on both Intel and Apple Silicon (M1–M4). Apple Silicon gets Metal GPU acceleration automatically, which is dramatically faster than CPU-only inference.
brew install ollama
# or download the .dmg from
# ollama.com/download/mac
After install, Ollama starts as a menubar app. Look for the llama icon in your status bar.
Windows
Supports Windows 10/11 x64. Automatically detects NVIDIA and AMD GPUs. Requires no special setup for CUDA; Ollama bundles what it needs.
# Download OllamaSetup.exe from:
# ollama.com/download/windows
# Run as Administrator
Ollama runs as a Windows service after installation. Check the system tray icon.
Linux
One-command install. Supports Ubuntu, Debian, Fedora, Arch, and more. Works with NVIDIA CUDA, AMD ROCm, and CPU-only. Auto-detects GPU drivers.
curl -fsSL \
https://ollama.com/install.sh \
| sh
Ollama installs as a systemd service. Auto-starts on boot.
After installation, verify Ollama is running by opening a terminal and typing:
ollama list
# Output: NAME ID SIZE MODIFIED (empty if no models downloaded yet)
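If the `ollama` command isn't recognized, you can confirm whether the binary actually landed on your PATH from a script as well. A minimal, purely illustrative Python check (the status messages are my own wording, not Ollama output):

```python
import shutil

def ollama_status() -> str:
    """Report whether the ollama CLI is visible on PATH."""
    path = shutil.which("ollama")
    if path:
        return f"found at {path}"
    return "not found on PATH; restart your shell or re-run the installer"

print("ollama:", ollama_status())
```

On a fresh install, a "not found" result usually just means the terminal was opened before the installer finished; opening a new terminal fixes it.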
Step 2: Pull and Run DeepSeek R1
With Ollama installed, running DeepSeek R1 is a single command. Ollama downloads the model from its CDN and starts a chat session automatically:
# Pull and immediately start chatting; choose your size:
ollama run deepseek-r1:1.5b   # tiny: any device
ollama run deepseek-r1:7b     # balanced: 8GB RAM
ollama run deepseek-r1:8b     # Llama-based: 8GB RAM
ollama run deepseek-r1:14b    # best value: 16GB RAM
ollama run deepseek-r1:32b    # professional: 32GB RAM
# To only download without running immediately:
ollama pull deepseek-r1:14b
The first run downloads the model file (which can be several GB). Subsequent runs load directly from local storage with no download needed. Ollama will automatically use your GPU for acceleration if available.
Step 3: Add a Web UI (Optional but Recommended)
Ollama runs as an API server, but for a ChatGPT-like interface, install Open WebUI. It connects to Ollama's local API and gives you a polished browser interface with conversation history, model switching, and file uploads.
If you have Docker installed:
docker run -d -p 3000:8080 \
--add-host=host.docker.internal:host-gateway \
-v open-webui:/app/backend/data \
--name open-webui \
ghcr.io/open-webui/open-webui:main
After the container starts, open http://localhost:3000 in your browser. Select DeepSeek R1 from the model dropdown, and you have a full-featured local AI chat interface. No data leaves your machine.
Alternatively, without Docker, install via pip:
pip install open-webui
open-webui serve
Step 4: Run DeepSeek R1 on Android
Running DeepSeek R1 on Android requires either a powerful flagship phone or a rooted device with Termux for the larger models. For casual use, there are two approaches:
Option 1: PocketPal AI App (Easy)
PocketPal AI is available on Google Play and supports DeepSeek R1 distill models in GGUF format. Download the app, then browse the built-in model library to pull DeepSeek-R1-Distill-1.5B or 7B. The 1.5B model runs smoothly on most modern Android phones with 6GB+ RAM.
Best for: Casual use, no technical setup required. Available on Google Play Store.
Option 2: Termux + Ollama (Advanced)
On rooted Android devices or high-end phones (Snapdragon 8 Gen 3+), you can install Termux from F-Droid and then install Ollama inside it. This gives you full control and lets you run larger models like the 7B with decent speed on flagship hardware.
# In Termux:
pkg install ollama
ollama run deepseek-r1:1.5b
Step 5: Run DeepSeek R1 on iPhone / iPad
iOS and iPadOS support local LLM inference through dedicated apps that leverage Apple's Neural Engine and the MLX framework. Here are the best options in 2026:
Enchanted (Free, App Store)
Enchanted is the most popular iOS frontend for Ollama. If you already have Ollama running on a Mac on the same Wi-Fi network, Enchanted can connect to it remotely. Set your Mac's Ollama to listen on your network IP by running OLLAMA_HOST=0.0.0.0 ollama serve, then point Enchanted at your Mac's IP address. You can then run DeepSeek R1 from your iPhone while the heavy lifting happens on your Mac.
LM Studio Mobile (iOS)
LM Studio's iOS app uses Apple's MLX framework to run quantized models directly on-device. On iPhone 15 Pro and newer (with 8GB RAM), the 1.5B distill model runs comfortably; the 7B model runs on iPad Pro M4 with excellent performance. Search the app's built-in model browser for "DeepSeek-R1-Distill" to find compatible builds.
Step 6: Use DeepSeek R1 via API
Once Ollama is running, it exposes a REST API on http://localhost:11434. This is fully compatible with the OpenAI API format, meaning any application built for ChatGPT's API will work with your local DeepSeek R1 with just a base URL change:
# Python example (openai SDK):
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:11434/v1",
    api_key="ollama",  # any string works; Ollama ignores the key
)
response = client.chat.completions.create(
    model="deepseek-r1:14b",
    messages=[{"role": "user", "content": "Solve: what is 17 × 23?"}],
)
print(response.choices[0].message.content)
This makes DeepSeek R1 a powerful drop-in replacement for the OpenAI API in your existing applications, and you pay $0 per token.
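Under the hood, the SDK call above is just an HTTP POST to the local server. A sketch of the request body it sends (this only builds and prints the JSON; actually sending it requires Ollama running at http://localhost:11434/v1/chat/completions):

```python
import json

# Chat-completion request body for Ollama's OpenAI-compatible endpoint.
payload = {
    "model": "deepseek-r1:14b",
    "messages": [{"role": "user", "content": "Solve: what is 17 × 23?"}],
    "stream": False,  # set True to receive tokens incrementally
}
print(json.dumps(payload, indent=2))
```

Because the shape is standard, any HTTP client or OpenAI-compatible tool can talk to the same endpoint without the SDK.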
Performance Tips
To get the best performance from DeepSeek R1 locally, follow these tuning tips:
π‘ Use GPU Layers
By default, Ollama automatically offloads as many layers as possible to your GPU. To verify the GPU is being used, run ollama run deepseek-r1:7b and watch how quickly tokens generate; you can also run ollama ps in another terminal, whose PROCESSOR column shows the GPU/CPU split. If generation is slow (under 5 tokens/second), you are likely running on CPU only, so check your GPU driver installation.
π‘ Increase Context Length
DeepSeek R1 supports up to 128K context, but Ollama defaults to a much smaller window. For complex reasoning tasks that need more context, raise it from inside the chat session with /set parameter num_ctx 32768, or persist it in a custom Modelfile. Higher values use more VRAM.
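To make a larger context window permanent, you can bake it into a derived model with a Modelfile. A minimal sketch (the model name deepseek-r1-32k is arbitrary, chosen here for illustration):

```
# Modelfile
FROM deepseek-r1:14b
PARAMETER num_ctx 32768
```

Build and run it with ollama create deepseek-r1-32k -f Modelfile followed by ollama run deepseek-r1-32k; the 32K context then applies every time without any per-session setup.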
π‘ Keep Model Warm in Memory
Ollama unloads models from GPU memory after 5 minutes of inactivity by default. To keep the model loaded (faster response for frequent use), set: OLLAMA_KEEP_ALIVE=24h ollama serve. This reserves VRAM but eliminates the cold-start delay.
Troubleshooting Common Issues
Problem: Download hangs at 0% or fails halfway
Cause: Ollama's CDN may be slow or blocked in your region. Fix: Connect to VPN07 before running ollama pull. VPN07's 1000Mbps bandwidth routes through optimized paths to the Ollama CDN. Alternatively, download GGUF files directly from HuggingFace and import them with a Modelfile that points at the local file (FROM ./model.gguf), then run ollama create deepseek-r1 -f Modelfile.
Problem: Very slow generation (CPU only)
Cause: GPU not detected or driver issue. Fix: On NVIDIA, ensure your driver is current and nvidia-smi lists your GPU (Ollama bundles its own CUDA runtime, so the full toolkit is not required). On AMD, install ROCm. On Apple Silicon, make sure you installed the macOS build of Ollama. Restart the Ollama service after fixing drivers.
Problem: "Out of memory" error during inference
Cause: Model too large for your VRAM. Fix: Switch to a smaller model. Ollama uses Q4_K_M quantization by default, which is already quite compressed, so try the next smaller size (e.g., switch from 14B to 7B). Ollama can also split a model between GPU VRAM and system RAM, so adding RAM lets you run models that don't fully fit on the GPU, at reduced speed.
Problem: Model gives strange XML-like output (thinking tokens)
Cause: DeepSeek R1 uses chain-of-thought reasoning wrapped in <think> tags. This is normal and intentional: the model "thinks aloud" before answering. Fix: This is a feature, not a bug. The final answer appears after the </think> block. In Open WebUI, you can configure it to hide the thinking output and only show the final response.
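If you consume R1's output programmatically, the reasoning block is easy to strip. A small sketch, assuming the tag format is exactly <think>…</think> as described above:

```python
import re

def strip_think(text: str) -> str:
    """Remove the chain-of-thought block so only the final answer remains."""
    return re.sub(r"<think>.*?</think>", "", text, flags=re.DOTALL).strip()

raw = "<think>17*23 = 17*20 + 17*3 = 340 + 51</think>\nThe answer is 391."
print(strip_think(raw))  # -> The answer is 391.
```

The DOTALL flag matters because the thinking block usually spans many lines.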
Why DeepSeek R1 Is the Top Open-Source Reasoning Model
The key insight behind DeepSeek R1's quality advantage over other open-source models is its training process. Most LLMs are trained primarily with supervised fine-tuning on human-labeled data. DeepSeek R1 used Group Relative Policy Optimization (GRPO), a reinforcement learning technique where the model receives rewards based on whether its answers are correct, not whether they match a labeled dataset.
This RL-based training produced an emergent behavior: the model spontaneously learned to "think" step by step, reconsider wrong paths, and self-correct, without being explicitly taught these skills. This mirrors how students naturally develop problem-solving ability through practice and feedback, rather than memorizing solutions.
On AIME 2024 (a prestigious math competition benchmark), DeepSeek R1 achieved a pass rate of 79.8%, compared to OpenAI o1's 79.2%. On Codeforces competitive programming benchmarks, it scores around the 96th percentile of human competitors. These numbers, from an open-source model with an MIT license that you can run on your own hardware, represent a genuine milestone in AI accessibility.
DeepSeek R1 vs Other Open-Source Models (2026)
Chart: MATH-500 benchmark scores. DeepSeek-R1-Distill-14B outperforms Llama 3.3-70B despite being 5× smaller.
DeepSeek R1 running locally gives you a powerful reasoning engine with zero ongoing costs. Whether you're using it as a math tutor, code debugger, research assistant, or integrated into your own applications via the API, the key advantage over cloud AI is complete data privacy and no rate limits. Your conversations never leave your machine, and you can run unlimited queries 24/7 without worrying about token costs.
As you get more comfortable with DeepSeek R1 locally, explore advanced Ollama features like Modelfiles for custom system prompts, the REST API for application integration, and LiteLLM as a proxy layer if you want to manage multiple local models through a single endpoint. The local AI ecosystem in 2026 is rich and mature β Ollama makes DeepSeek R1 accessible to everyone from complete beginners to advanced infrastructure engineers.
Inference Speed by Platform (Tokens/Second)
Expected generation speed for DeepSeek R1 distill models on different hardware configurations in 2026:
| Hardware | R1-7B (t/s) | R1-14B (t/s) | R1-32B (t/s) | Notes |
|---|---|---|---|---|
| Apple M4 Pro 24GB | 55–65 | 30–40 | 12–18 | Metal GPU, unified memory |
| Apple M3 Max 48GB | 60–70 | 35–45 | 22–28 | Best laptop performance |
| RTX 4090 24GB | 70–90 | 35–50 | 14–20 | Top consumer GPU |
| RTX 4070 12GB | 45–60 | 20–30 | CPU offload | Mid-range GPU |
| AMD RX 7900 XTX 24GB | 55–75 | 28–40 | 12–18 | ROCm on Linux/Windows |
| CPU only (Ryzen 9 64GB) | 3–6 | 1–3 | 0.5–1 | Slow but functional |
Pro Tip: For the best cost-per-performance ratio on Windows, the RTX 4070 (12GB) paired with 32GB system RAM allows Ollama to split the R1-14B model between GPU VRAM and RAM β running it noticeably faster than pure CPU mode while staying affordable.
Note: These are approximate values for Q4_K_M quantization. Higher-precision quantization (Q8_0) is roughly 40% slower but slightly more accurate. CPU-mode speeds vary significantly with RAM bandwidth. Apple Silicon's unified memory architecture makes it particularly efficient for large models.
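To turn the table's tokens-per-second figures into expected wait times, divide response length by generation speed. A quick sketch, using rough midpoints I've taken from the R1-14B column above:

```python
def expected_seconds(response_tokens: int, tokens_per_second: float) -> float:
    """Rough wall-clock estimate for generating a response of a given length."""
    return response_tokens / tokens_per_second

# A 500-token answer at assumed R1-14B midpoints from the table above:
for hw, tps in [("RTX 4090", 42.5), ("Apple M4 Pro", 35.0), ("CPU only", 2.0)]:
    print(f"{hw}: ~{expected_seconds(500, tps):.0f}s")
```

This ignores prompt-processing time, which adds a few extra seconds for long inputs, but it gives a good feel for why a 2 t/s CPU setup is usable only for short answers.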
Quick Reference: All DeepSeek R1 Ollama Commands
Keep this reference handy for all common DeepSeek R1 operations via Ollama:
# ── Installation ──────────────────────────────────────
brew install ollama # macOS
curl -fsSL https://ollama.com/install.sh | sh # Linux
# Download OllamaSetup.exe from ollama.com # Windows
# ── Download Models ───────────────────────────────────
ollama pull deepseek-r1:1.5b
ollama pull deepseek-r1:7b
ollama pull deepseek-r1:8b
ollama pull deepseek-r1:14b
ollama pull deepseek-r1:32b
# ── Run Models ────────────────────────────────────────
ollama run deepseek-r1:14b
ollama run deepseek-r1:14b "Solve: prove that sqrt(2) is irrational"
# larger context: start a chat, then type /set parameter num_ctx 32768
# ── Management ────────────────────────────────────────
ollama list # show downloaded models
ollama ps # show running models
ollama rm deepseek-r1:7b # remove a model
ollama show deepseek-r1:14b # show model info
Frequently Asked Questions
Q: Is DeepSeek R1 safe to run locally? Are there privacy concerns?
Yes, running DeepSeek R1 locally through Ollama is completely private. The model runs entirely on your hardware β no data is sent to DeepSeek's servers, no telemetry, and no internet connectivity required after the initial model download. Your prompts, conversations, and outputs stay exclusively on your machine. For sensitive applications, local deployment is actually the most secure option available.
Q: Can I fine-tune DeepSeek R1 on my own data?
Yes, the MIT license permits fine-tuning and modifying DeepSeek R1. Use the HuggingFace transformers library with PEFT/LoRA adapters for parameter-efficient fine-tuning. The distilled variants (7B, 8B, 14B) are practical to fine-tune on consumer hardware: a LoRA fine-tuning run of DeepSeek-R1-Distill-7B on a domain-specific dataset can complete on a single RTX 4090 in 6–12 hours.
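The parameter-count arithmetic explains why LoRA makes this feasible on one GPU. A back-of-envelope sketch (the 4096×4096 layer shape and rank 16 are illustrative assumptions, not DeepSeek R1's actual dimensions):

```python
# LoRA replaces a full weight update dW (d_out x d_in) with two low-rank
# factors B (d_out x r) and A (r x d_in), so dW ≈ B @ A.
d_in, d_out, r = 4096, 4096, 16

full_update_params = d_in * d_out        # training the layer directly
lora_params = r * (d_in + d_out)         # training only A and B

print(full_update_params)                # 16777216
print(lora_params)                       # 131072
print(full_update_params // lora_params) # 128x fewer trainable parameters
```

Fewer trainable parameters means proportionally less optimizer state and gradient memory, which is what lets a 7B model fit in a single 24GB card during training.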
Q: What's the difference between DeepSeek-R1 and DeepSeek-R1-Zero?
DeepSeek-R1-Zero was trained purely with reinforcement learning from scratch, with no supervised fine-tuning on human demonstrations. It shows impressive emergent reasoning but sometimes produces outputs in mixed languages or with unusual formatting. DeepSeek-R1 adds a supervised fine-tuning stage after the RL training, making it much more practical for real-world use β better output formatting, consistent language, and more reliable instruction following. For production use, always use DeepSeek-R1 (not Zero).
Q: How does DeepSeek R1 compare to ChatGPT for daily use?
For math, science, and coding tasks, DeepSeek R1-14B is competitive with or exceeds GPT-4o. For creative writing, casual conversation, and tasks requiring broad world knowledge, GPT-4o and Claude still have an edge due to their larger proprietary training datasets. DeepSeek R1's advantage is that it's free to run locally at zero per-token cost, fully private, and available 24/7 regardless of service outages or rate limits.
Q: Why is VPN07 recommended for downloading DeepSeek R1?
DeepSeek R1 models are distributed via the Ollama CDN and HuggingFace. In some countries and network environments, these CDN servers are slow or throttled. VPN07 routes your download traffic through our 1000Mbps global network, connecting you to the fastest available CDN nodes. A 9GB model download that might take 2+ hours without VPN can complete in under 10 minutes with VPN07. We've specifically optimized our routing for HuggingFace and popular developer resources.
VPN07 β Download DeepSeek R1 at Full Speed
1000Mbps Β· 70+ Countries Β· Trusted Since 2015
Downloading a 9GB–20GB model from Ollama's CDN can take hours if your connection is throttled or routed through congested servers. VPN07 routes your traffic through our 1000Mbps network, turning slow CDN routes into high-speed channels. We've been helping developers access international services reliably for over 10 years, across 70+ countries. Try risk-free with our 30-day money-back guarantee, at only $1.50/month.
Where to Go Next
- Download More LLMs: explore all top open-source models with download links and hardware guides (Visit AI Hub).
- Speed Up Downloads: use VPN07 to download large LLM files at 1000Mbps from anywhere (Try VPN07 Free).

Related Articles
- Run Llama 4 Locally: All Platforms Install Guide 2026. Install Meta's Llama 4 on Windows, Mac, Linux, Android & iOS. Scout and Maverick MoE models explained with step-by-step Ollama setup.
- Qwen3.5 Ollama Setup: Run 0.8B to 35B Free on PC & Mac. Complete Ollama guide for Qwen3.5. Install on Windows, Mac, Linux. Run models from 0.8B to 35B with step-by-step instructions.