Run DeepSeek R1 locally on macOS
Updated on February 4, 2025   •   Published on January 27, 2025   •   8 min read

Jarek Ceborski

What is DeepSeek R1?

DeepSeek R1 is a first-generation reasoning model developed by DeepSeek AI that excels at complex reasoning tasks, with performance comparable to OpenAI's o1 model. Released in January 2025, the model was trained using innovative reinforcement learning techniques to enhance reasoning capabilities. The research paper introducing DeepSeek R1 can be found at arXiv:2501.12948.

What makes DeepSeek R1 unique is its training approach - it was developed through large-scale reinforcement learning (RL) with minimal reliance on supervised fine-tuning, and the model naturally developed powerful reasoning behaviors as a result. While the initial DeepSeek-R1-Zero model (trained purely with RL) showed remarkable reasoning capabilities, it struggled with repetition and readability. The final DeepSeek R1 model addressed these issues by incorporating a small amount of cold-start supervised data before RL training.

Benefits of Running Locally

Running DeepSeek R1 locally on your Mac offers several advantages:

  1. Privacy: Your data stays on your device and doesn't go through external servers
  2. Offline Usage: No internet connection required once models are downloaded
  3. Cost-effective: No API costs or usage limitations
  4. Low Latency: Direct access without network delays
  5. Customization: Full control over model parameters and settings

For macOS users, options for running LLMs like DeepSeek R1 locally include specialized platforms such as Ollama or LM Studio, which make it easy to download and run models without complex setup.
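
If you want to confirm from a script that a local Ollama server is up and see which models it has installed, you can query its REST API. Here's a minimal Python sketch, assuming Ollama's default local endpoint (http://localhost:11434) and its documented /api/tags endpoint for listing installed models (requires `pip install requests`):

```python
# List the models currently installed in a local Ollama instance.
# Assumes Ollama's default endpoint: http://localhost:11434
import requests

try:
    resp = requests.get("http://localhost:11434/api/tags", timeout=5)
    resp.raise_for_status()
    models = resp.json().get("models", [])
    if models:
        for m in models:
            print(m["name"])
    else:
        print("Ollama is running, but no models are installed yet.")
except requests.ConnectionError:
    print("Ollama is not running on localhost:11434.")
```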

Understanding Model Variants

Distilled Models

Model distillation is a technique where a smaller model (the student) is trained to mimic the behavior of a larger model (the teacher). In DeepSeek R1's case, the researchers demonstrated that the reasoning patterns of the large 671B-parameter model could be effectively transferred to smaller models, making them more accessible while maintaining strong performance. Distilled models achieve better results than models of the same size trained directly with reinforcement learning.
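
DeepSeek's distilled variants were actually produced by fine-tuning smaller base models on reasoning traces generated by the full R1 model, but the classic logit-matching formulation of distillation is a useful mental model for how a student learns from a teacher. A minimal PyTorch sketch of that classic approach (illustrative only, not DeepSeek's exact recipe):

```python
# Classic knowledge distillation loss (Hinton et al.): the student is trained
# to match the teacher's softened output distribution. Shown for illustration;
# DeepSeek distilled by fine-tuning on R1-generated reasoning traces instead.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL divergence between softened teacher and student distributions."""
    soft_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    log_soft_student = F.log_softmax(student_logits / temperature, dim=-1)
    # Scale by T^2 so gradient magnitudes stay comparable across temperatures.
    return F.kl_div(log_soft_student, soft_teacher, reduction="batchmean") * temperature**2

# Toy usage: a batch of 4 positions over a 32-token vocabulary.
student = torch.randn(4, 32, requires_grad=True)
teacher = torch.randn(4, 32)
loss = distillation_loss(student, teacher)
loss.backward()
print(f"distillation loss: {loss.item():.4f}")
```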

Llama vs. Qwen Base Models

DeepSeek R1 distilled models are based on two different foundation model architectures, each with its own characteristics:

Llama-based models (8B and 70B variants):

  • Built on Meta's Llama 3 architecture, a standard decoder-only transformer with optimizations for compute efficiency
  • Key features:
    • Rotary positional embeddings (RoPE) for encoding token positions
    • Grouped Query Attention (GQA) for faster inference with a smaller key-value cache
    • Strong performance on English language tasks and coding
    • Extensively tested and widely adopted in the open-source community

Qwen-based models (1.5B, 7B, 14B, and 32B variants):

  • Based on Alibaba's Qwen 2.5 architecture, which introduces several architectural refinements
  • Key features:
    • Grouped-query attention for efficient inference
    • Large context window (up to 32K tokens)
    • A tokenizer with strong native support for Chinese text
    • Better handling of mixed Chinese-English content
    • Improved performance on mathematical reasoning tasks
    • Optimized for both academic and commercial applications

The choice between Llama and Qwen variants depends on your specific needs:

  • Choose Llama variants for:
    • Primarily English language applications
    • Code generation and analysis
    • Projects requiring extensive community support
    • Applications needing proven stability
  • Choose Qwen variants for:
    • Multilingual applications, especially involving Chinese
    • Mathematical and scientific tasks
    • Projects requiring longer context windows
    • Applications needing balanced performance across different domains

Hardware Requirements

Here's a breakdown of popular DeepSeek R1 models available on Ollama, along with their approximate sizes and hardware recommendations:

| Model | Parameters | Size | VRAM (approx.) | Recommended Mac |
|-------|------------|------|----------------|-----------------|
| deepseek-r1:1.5b | 1.5B | 1.1 GB | ~2 GB | M2/M3 MacBook Air (8GB RAM+) |
| deepseek-r1:7b | 7B | 4.7 GB | ~5 GB | M2/M3/M4 MacBook Pro (16GB RAM+) |
| deepseek-r1:8b | 8B | 4.9 GB | ~6 GB | M2/M3/M4 MacBook Pro (16GB RAM+) |
| deepseek-r1:14b | 14B | 9.0 GB | ~10 GB | M2/M3/M4 Pro MacBook Pro (32GB RAM+) |
| deepseek-r1:32b | 32B | 20 GB | ~22 GB | M2 Max/Ultra Mac Studio |
| deepseek-r1:70b | 70B | 43 GB | ~45 GB | M2 Ultra Mac Studio |
| deepseek-r1:1.5b-qwen-distill-q4_K_M | 1.5B | 1.1 GB | ~2 GB | M2/M3 MacBook Air (8GB RAM+) |
| deepseek-r1:7b-qwen-distill-q4_K_M | 7B | 4.7 GB | ~5 GB | M2/M3/M4 MacBook Pro (16GB RAM+) |
| deepseek-r1:8b-llama-distill-q4_K_M | 8B | 4.9 GB | ~6 GB | M2/M3/M4 MacBook Pro (16GB RAM+) |
| deepseek-r1:14b-qwen-distill-q4_K_M | 14B | 9.0 GB | ~10 GB | M2/M3/M4 Pro MacBook Pro (32GB RAM+) |
| deepseek-r1:32b-qwen-distill-q4_K_M | 32B | 20 GB | ~22 GB | M2 Max/Ultra Mac Studio |
| deepseek-r1:70b-llama-distill-q4_K_M | 70B | 43 GB | ~45 GB | M2 Ultra Mac Studio |

Note: VRAM (Video RAM) usage varies with the model, task, context length, and quantization; the figures above are approximations. Tags ending in q4_K_M are 4-bit quantized for lower resource usage (the short tags point to the same q4_K_M quantizations, which is why the sizes match).
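
A rough rule of thumb behind the sizes above: on-disk size ≈ parameters × bits per weight ÷ 8. Here's a small Python sketch of that estimate; the ~5 effective bits per weight for q4_K_M is an assumption (it mixes 4- and 6-bit blocks), and actual runtime memory adds KV-cache and buffer overhead on top:

```python
# Back-of-the-envelope model size estimate: parameters x bits per weight / 8.
# ~5 effective bits/weight for q4_K_M is an assumption; real files add
# metadata, and runtime memory needs KV-cache and buffers beyond this.

def approx_size_gb(params_billions: float, bits_per_weight: float = 5.0) -> float:
    """Approximate on-disk size in GB for a quantized model."""
    return params_billions * bits_per_weight / 8  # billions of bytes ~ GB

for params in (1.5, 7, 8, 14, 32, 70):
    print(f"{params:>4}B -> ~{approx_size_gb(params):.1f} GB")
```

Running this gives roughly 0.9, 4.4, 5.0, 8.8, 20.0, and 43.8 GB, which lines up reasonably well with the table.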

Step-by-Step Guide to Running DeepSeek R1 Locally on macOS Using Ollama and Kerlig

  1. Install and run Ollama

    • Go to ollama.com and download the macOS installer
    • Install Ollama on your Mac
    • Open Ollama after installation
    • Optionally, pre-download the model from Terminal with ollama pull deepseek-r1:7b
  2. Add DeepSeek R1 model to Kerlig

    • Download Kerlig and open it
    • Go to Settings → Integrations → Ollama
    • In Add Custom Model section:
      • Enter a display name (e.g., "DeepSeek R1 7B")
      • Enter the model name (e.g., deepseek-r1:7b)
    • Click Add
    • Toggle the switch to enable the model and start the download, then wait for it to finish (you can close Settings; the download continues in the background)
  3. Run DeepSeek R1

    • Open Kerlig
    • Enter your prompt - any question you want to ask
    • Select the newly added DeepSeek R1 7B as the model
    • Click Run or press Enter
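
Once downloaded, the model is also reachable from any script that speaks Ollama's REST API, not just from Kerlig. A minimal Python sketch, assuming the default endpoint and the deepseek-r1:7b tag added in step 2 (requires `pip install requests`):

```python
# Send a one-off prompt to the local DeepSeek R1 model via Ollama's REST API.
# Assumes Ollama's default endpoint: http://localhost:11434
import requests

payload = {
    "model": "deepseek-r1:7b",   # the tag added in step 2
    "prompt": "Why is the sky blue? Answer in two sentences.",
    "stream": False,             # return the full response as one JSON object
}
resp = requests.post("http://localhost:11434/api/generate", json=payload, timeout=300)
resp.raise_for_status()
# R1 models emit their chain of thought inside <think>...</think> tags
# before the final answer.
print(resp.json()["response"])
```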

Usage Recommendations

For optimal performance with DeepSeek R1 models:

  1. Choose a model size appropriate for your Mac's specifications
  2. Start with smaller models first to test performance
  3. Monitor system resources during initial usage
  4. Ensure adequate free storage space for model downloads
  5. Keep Ollama running in the background while using Kerlig
  6. Avoid adding system prompts - include all instructions within the user prompt
  7. For mathematical problems, include a directive like: "Please reason step by step, and put your final answer within \boxed{}" (see the sketch after this list)
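
To illustrate recommendations 6 and 7, here's a small Python sketch against Ollama's /api/chat endpoint: everything goes into a single user message (no system role), with the math directive appended. The deepseek-r1:7b tag is an assumption; substitute whichever model you added:

```python
# Demonstrates the two prompt recommendations above: no system message, and a
# step-by-step + \boxed{} directive for math problems.
# Assumes Ollama's default endpoint; requires `pip install requests`.
import requests

question = "What is the sum of all integers from 1 to 100?"
directive = "Please reason step by step, and put your final answer within \\boxed{}."

payload = {
    "model": "deepseek-r1:7b",
    # Everything goes in one user message -- no {"role": "system"} entry.
    "messages": [{"role": "user", "content": f"{question}\n{directive}"}],
    "stream": False,
}
resp = requests.post("http://localhost:11434/api/chat", json=payload, timeout=300)
resp.raise_for_status()
print(resp.json()["message"]["content"])
```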

Licensing

DeepSeek R1 models are licensed under the MIT License and support commercial use, including modifications and derivative works. Note that:

  • Qwen-based distilled models (1.5B, 7B, 14B, 32B) are derived from Qwen-2.5 series (Apache 2.0 License)
  • Llama-based 8B model is derived from Llama3.1-8B-Base
  • Llama-based 70B model is derived from Llama3.3-70B-Instruct

Troubleshooting

  • For large models, consider closing other resource-intensive applications
  • If a model fails to load, check your available system memory
  • Ensure Ollama is running before attempting to use models in Kerlig
  • If experiencing issues, try restarting Ollama and Kerlig
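
To put a couple of these checks in one place, here's a small diagnostic sketch, assuming macOS (for the hw.memsize sysctl) and Ollama's default port (requires `pip install requests`):

```python
# Quick diagnostics: how much physical RAM is installed, and is Ollama reachable?
# Assumes macOS (hw.memsize sysctl) and Ollama's default port 11434.
import subprocess
import requests

# Total physical memory via the macOS sysctl interface.
mem_bytes = int(subprocess.run(
    ["sysctl", "-n", "hw.memsize"], capture_output=True, text=True, check=True
).stdout.strip())
print(f"Physical RAM: {mem_bytes / 1024**3:.0f} GB")

# Is the Ollama server answering on its default port?
try:
    requests.get("http://localhost:11434/api/version", timeout=5).raise_for_status()
    print("Ollama server: running")
except requests.RequestException:
    print("Ollama server: not reachable -- start the Ollama app and retry")
```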

Conclusion

Remember that model downloads are persistent - once downloaded (by default to ~/.ollama/models), you can use them offline in future sessions.

By following these steps, you can successfully run DeepSeek R1 locally on your Mac.


FAQ: Frequently Asked Questions About DeepSeek R1

Q: Which model size should I choose?

Answer: Consider these recommendations:

  • Basic tasks: 1.5B model
  • Balanced performance: 7B/8B models
  • Maximum capabilities: 14B+ models (if hardware supports it)
  • Start with smaller models and scale up if needed

Q: What are the minimum hardware requirements for DeepSeek R1?

Answer: Requirements vary by model size:

  • 1.5B model: M2/M3 MacBook Air with 8GB RAM
  • 7B/8B models: M2/M3/M4 MacBook Pro with 16GB RAM
  • 14B model: M2/M3/M4 Pro MacBook Pro with 32GB RAM
  • 32B/70B models: M2 Max/Ultra Mac Studio

Q: How much storage space do the models require?

Answer: Storage requirements per model:

  • 1.5B model: ~1.1 GB
  • 7B model: ~4.7 GB
  • 14B model: ~9.0 GB
  • 32B model: ~20 GB
  • 70B model: ~43 GB

Q: What is the response speed of different model sizes?

Answer: Speed varies by model and hardware:

  • 1.5B model: 1-2 seconds on M2/M3 Macs
  • 7B/8B models: 2-4 seconds on 16GB RAM machines
  • 14B+ models: 4-8 seconds on higher-end hardware
  • Quantized (q4_K_M) versions can improve speed by roughly 30-50%

Q: What tasks is DeepSeek R1 best suited for?

Answer: DeepSeek R1 excels at:

  • Mathematics and complex problem-solving
  • Coding and algorithm challenges
  • Scientific reasoning
  • Step-by-step explanations
  • Analytical thinking tasks

Q: How do different model sizes compare in performance?

Answer: Performance scales with model size:

  • 1.5B model: 28.9% AIME 2024 accuracy, outperforming GPT-4o on certain math tasks
  • 7B model: 55.5% AIME 2024 accuracy
  • 14B model: 69.7% AIME 2024 accuracy, close to much larger models
  • All sizes demonstrate strong reasoning capabilities thanks to distillation

Q: How does DeepSeek R1 compare to other local LLMs?

Answer: DeepSeek R1 distinguishes itself by:

  • Outperforming larger models on math and coding tasks
  • Excelling at step-by-step problem solving
  • Strong scientific reasoning capabilities
  • Efficient resource usage through quantization
  • Competitive performance at smaller model sizes

Q: Can DeepSeek R1 run offline?

Answer: Yes, after downloading through Ollama, DeepSeek R1 runs completely offline, providing:

  • Complete privacy
  • No API costs
  • Reliable access
  • Lower latency

Q: Can multiple models run simultaneously?

Answer: While possible, it's recommended to run one model at a time for optimal performance, especially on machines with limited RAM.

Q: Does performance differ on Apple Silicon vs Intel Macs?

Answer: Yes, Apple Silicon Macs (M1/M2/M3/M4) offer significantly better performance due to:

  • Unified memory architecture, which lets the GPU access the full system RAM
  • High memory bandwidth, the main bottleneck for LLM inference
  • Metal GPU acceleration in Ollama that specifically targets Apple Silicon
