Run DeepSeek R1 locally on macOS
Updated on February 4, 2025   •   Published on January 27, 2025   •   8 min read

Jarek Ceborski

What is DeepSeek R1?

DeepSeek R1 is a first-generation reasoning model developed by DeepSeek AI that excels at complex reasoning tasks, with performance comparable to OpenAI's o1 model. Released in January 2025, the model was trained using innovative reinforcement learning techniques to enhance reasoning capabilities. The research paper introducing DeepSeek R1 can be found at arXiv:2501.12948.

What makes DeepSeek R1 unique is its training approach - it was developed through large-scale reinforcement learning (RL) with minimal reliance on supervised fine-tuning, and the model naturally developed powerful reasoning behaviors as a result. While the initial DeepSeek-R1-Zero model (trained purely with RL) showed remarkable reasoning capabilities, it struggled with repetition and readability. The final DeepSeek R1 model addressed these issues by incorporating a small amount of cold-start supervised data before RL training.

Benefits of Running Locally

Running DeepSeek R1 locally on your Mac offers several advantages:

  1. Privacy: Your data stays on your device and doesn't go through external servers
  2. Offline Usage: No internet connection required once models are downloaded
  3. Cost-effective: No API costs or usage limitations
  4. Low Latency: Direct access without network delays
  5. Customization: Full control over model parameters and settings

For macOS users, options for running LLMs like DeepSeek R1 locally include specialized platforms such as Ollama or LM Studio, which make it easy to download and run models without complex setup.
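
If you want to confirm from a script that a local Ollama server is up and see which models it has installed, you can query its REST API. Here's a minimal Python sketch, assuming Ollama's default local endpoint (http://localhost:11434) and its documented /api/tags endpoint for listing installed models (requires `pip install requests`):

```python
# List the models currently installed in a local Ollama instance.
# Assumes Ollama's default endpoint: http://localhost:11434
import requests

try:
    resp = requests.get("http://localhost:11434/api/tags", timeout=5)
    resp.raise_for_status()
    models = resp.json().get("models", [])
    if models:
        for m in models:
            print(m["name"])
    else:
        print("Ollama is running, but no models are installed yet.")
except requests.ConnectionError:
    print("Ollama is not running on localhost:11434.")
```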

Understanding Model Variants

Distilled Models

Model distillation is a technique where a smaller model (the student) is trained to mimic the behavior of a larger model (the teacher). In DeepSeek R1's case, the researchers demonstrated that the reasoning patterns of the large 671B-parameter model could be effectively transferred to smaller models, making them more accessible while maintaining strong performance. Distilled models achieve better results than models of the same size trained directly with reinforcement learning.
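
DeepSeek's distilled variants were actually produced by fine-tuning smaller base models on reasoning traces generated by the full R1 model, but the classic logit-matching formulation of distillation is a useful mental model for how a student learns from a teacher. A minimal PyTorch sketch of that classic approach (illustrative only, not DeepSeek's exact recipe):

```python
# Classic knowledge distillation loss (Hinton et al.): the student is trained
# to match the teacher's softened output distribution. Shown for illustration;
# DeepSeek distilled by fine-tuning on R1-generated reasoning traces instead.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL divergence between softened teacher and student distributions."""
    soft_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    log_soft_student = F.log_softmax(student_logits / temperature, dim=-1)
    # Scale by T^2 so gradient magnitudes stay comparable across temperatures.
    return F.kl_div(log_soft_student, soft_teacher, reduction="batchmean") * temperature**2

# Toy usage: a batch of 4 positions over a 32-token vocabulary.
student = torch.randn(4, 32, requires_grad=True)
teacher = torch.randn(4, 32)
loss = distillation_loss(student, teacher)
loss.backward()
print(f"distillation loss: {loss.item():.4f}")
```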

Llama vs. Qwen Base Models

DeepSeek R1 distilled models are based on two different foundation model architectures, each with its own characteristics:

Llama-based models (8B and 70B variants):

  • Built on Meta's Llama 3 architecture, a standard decoder-only transformer with optimizations for compute efficiency
  • Key features:
    • Rotary positional embeddings (RoPE) for encoding token positions
    • Grouped Query Attention (GQA) for faster inference with a smaller key-value cache
    • Strong performance on English language tasks and coding
    • Extensively tested and widely adopted in the open-source community

Qwen-based models (1.5B, 7B, 14B, and 32B variants):

  • Based on Alibaba's Qwen 2.5 architecture, which introduces several architectural refinements
  • Key features:
    • Grouped-query attention for efficient inference
    • Large context window (up to 32K tokens)
    • A tokenizer with strong native support for Chinese text
    • Better handling of mixed Chinese-English content
    • Improved performance on mathematical reasoning tasks
    • Optimized for both academic and commercial applications

The choice between Llama and Qwen variants depends on your specific needs:

  • Choose Llama variants for:
    • Primarily English language applications
    • Code generation and analysis
    • Projects requiring extensive community support
    • Applications needing proven stability
  • Choose Qwen variants for:
    • Multilingual applications, especially involving Chinese
    • Mathematical and scientific tasks
    • Projects requiring longer context windows
    • Applications needing balanced performance across different domains

Hardware Requirements

Here's a breakdown of popular DeepSeek R1 models available on Ollama, along with their approximate sizes and hardware recommendations:

| Model | Parameters | Size | VRAM (approx.) | Recommended Mac |
|-------|------------|------|----------------|-----------------|
| deepseek-r1:1.5b | 1.5B | 1.1 GB | ~2 GB | M2/M3 MacBook Air (8GB RAM+) |
| deepseek-r1:7b | 7B | 4.7 GB | ~5 GB | M2/M3/M4 MacBook Pro (16GB RAM+) |
| deepseek-r1:8b | 8B | 4.9 GB | ~6 GB | M2/M3/M4 MacBook Pro (16GB RAM+) |
| deepseek-r1:14b | 14B | 9.0 GB | ~10 GB | M2/M3/M4 Pro MacBook Pro (32GB RAM+) |
| deepseek-r1:32b | 32B | 20 GB | ~22 GB | M2 Max/Ultra Mac Studio |
| deepseek-r1:70b | 70B | 43 GB | ~45 GB | M2 Ultra Mac Studio |
| deepseek-r1:1.5b-qwen-distill-q4_K_M | 1.5B | 1.1 GB | ~2 GB | M2/M3 MacBook Air (8GB RAM+) |
| deepseek-r1:7b-qwen-distill-q4_K_M | 7B | 4.7 GB | ~5 GB | M2/M3/M4 MacBook Pro (16GB RAM+) |
| deepseek-r1:8b-llama-distill-q4_K_M | 8B | 4.9 GB | ~6 GB | M2/M3/M4 MacBook Pro (16GB RAM+) |
| deepseek-r1:14b-qwen-distill-q4_K_M | 14B | 9.0 GB | ~10 GB | M2/M3/M4 Pro MacBook Pro (32GB RAM+) |
| deepseek-r1:32b-qwen-distill-q4_K_M | 32B | 20 GB | ~22 GB | M2 Max/Ultra Mac Studio |
| deepseek-r1:70b-llama-distill-q4_K_M | 70B | 43 GB | ~45 GB | M2 Ultra Mac Studio |

Note: VRAM (Video RAM) usage varies with the model, task, context length, and quantization; the figures above are approximations. Tags ending in q4_K_M are 4-bit quantized for lower resource usage (the short tags point to the same q4_K_M quantizations, which is why the sizes match).
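
A rough rule of thumb behind the sizes above: on-disk size ≈ parameters × bits per weight ÷ 8. Here's a small Python sketch of that estimate; the ~5 effective bits per weight for q4_K_M is an assumption (it mixes 4- and 6-bit blocks), and actual runtime memory adds KV-cache and buffer overhead on top:

```python
# Back-of-the-envelope model size estimate: parameters x bits per weight / 8.
# ~5 effective bits/weight for q4_K_M is an assumption; real files add
# metadata, and runtime memory needs KV-cache and buffers beyond this.

def approx_size_gb(params_billions: float, bits_per_weight: float = 5.0) -> float:
    """Approximate on-disk size in GB for a quantized model."""
    return params_billions * bits_per_weight / 8  # billions of bytes ~ GB

for params in (1.5, 7, 8, 14, 32, 70):
    print(f"{params:>4}B -> ~{approx_size_gb(params):.1f} GB")
```

Running this gives roughly 0.9, 4.4, 5.0, 8.8, 20.0, and 43.8 GB, which lines up reasonably well with the table.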

Step-by-Step Guide to Running DeepSeek R1 Locally on macOS Using Ollama and Kerlig

  1. Install and run Ollama

    • Go to ollama.com and download the macOS installer
    • Install Ollama on your Mac
    • Open Ollama after installation
    • Optionally, pre-download the model from Terminal with ollama pull deepseek-r1:7b
  2. Add DeepSeek R1 model to Kerlig

    • Download Kerlig and open it
    • Go to Settings → Integrations → Ollama
    • In Add Custom Model section:
      • Enter a display name (e.g., "DeepSeek R1 7B")
      • Enter the model name (e.g., deepseek-r1:7b)
    • Click Add
    • Toggle the switch to enable the model and start the download, then wait for it to finish (you can close Settings; the download continues in the background)
  3. Run DeepSeek R1

    • Open Kerlig
    • Enter your prompt - any question you want to ask
    • Select the newly added DeepSeek R1 7B as the model
    • Click Run or press Enter
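
Once downloaded, the model is also reachable from any script that speaks Ollama's REST API, not just from Kerlig. A minimal Python sketch, assuming the default endpoint and the deepseek-r1:7b tag added in step 2 (requires `pip install requests`):

```python
# Send a one-off prompt to the local DeepSeek R1 model via Ollama's REST API.
# Assumes Ollama's default endpoint: http://localhost:11434
import requests

payload = {
    "model": "deepseek-r1:7b",   # the tag added in step 2
    "prompt": "Why is the sky blue? Answer in two sentences.",
    "stream": False,             # return the full response as one JSON object
}
resp = requests.post("http://localhost:11434/api/generate", json=payload, timeout=300)
resp.raise_for_status()
# R1 models emit their chain of thought inside <think>...</think> tags
# before the final answer.
print(resp.json()["response"])
```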

Usage Recommendations

For optimal performance with DeepSeek R1 models:

  1. Choose a model size appropriate for your Mac's specifications
  2. Start with smaller models first to test performance
  3. Monitor system resources during initial usage
  4. Ensure adequate free storage space for model downloads
  5. Keep Ollama running in the background while using Kerlig
  6. Avoid adding system prompts - include all instructions within the user prompt
  7. For mathematical problems, include a directive like: "Please reason step by step, and put your final answer within \boxed{}" (see the sketch after this list)
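
To illustrate recommendations 6 and 7, here's a small Python sketch against Ollama's /api/chat endpoint: everything goes into a single user message (no system role), with the math directive appended. The deepseek-r1:7b tag is an assumption; substitute whichever model you added:

```python
# Demonstrates the two prompt recommendations above: no system message, and a
# step-by-step + \boxed{} directive for math problems.
# Assumes Ollama's default endpoint; requires `pip install requests`.
import requests

question = "What is the sum of all integers from 1 to 100?"
directive = "Please reason step by step, and put your final answer within \\boxed{}."

payload = {
    "model": "deepseek-r1:7b",
    # Everything goes in one user message -- no {"role": "system"} entry.
    "messages": [{"role": "user", "content": f"{question}\n{directive}"}],
    "stream": False,
}
resp = requests.post("http://localhost:11434/api/chat", json=payload, timeout=300)
resp.raise_for_status()
print(resp.json()["message"]["content"])
```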

Licensing

DeepSeek R1 models are licensed under the MIT License and support commercial use, including modifications and derivative works. Note that:

  • Qwen-based distilled models (1.5B, 7B, 14B, 32B) are derived from Qwen-2.5 series (Apache 2.0 License)
  • Llama-based 8B model is derived from Llama3.1-8B-Base
  • Llama-based 70B model is derived from Llama3.3-70B-Instruct

Troubleshooting

  • For large models, consider closing other resource-intensive applications
  • If a model fails to load, check your available system memory
  • Ensure Ollama is running before attempting to use models in Kerlig
  • If experiencing issues, try restarting Ollama and Kerlig
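
To put a couple of these checks in one place, here's a small diagnostic sketch, assuming macOS (for the hw.memsize sysctl) and Ollama's default port (requires `pip install requests`):

```python
# Quick diagnostics: how much physical RAM is installed, and is Ollama reachable?
# Assumes macOS (hw.memsize sysctl) and Ollama's default port 11434.
import subprocess
import requests

# Total physical memory via the macOS sysctl interface.
mem_bytes = int(subprocess.run(
    ["sysctl", "-n", "hw.memsize"], capture_output=True, text=True, check=True
).stdout.strip())
print(f"Physical RAM: {mem_bytes / 1024**3:.0f} GB")

# Is the Ollama server answering on its default port?
try:
    requests.get("http://localhost:11434/api/version", timeout=5).raise_for_status()
    print("Ollama server: running")
except requests.RequestException:
    print("Ollama server: not reachable -- start the Ollama app and retry")
```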

Conclusion

Remember that model downloads are persistent - once downloaded (by default to ~/.ollama/models), you can use them offline in future sessions.

By following these steps, you can successfully run DeepSeek R1 locally on your Mac.


FAQ: Frequently Asked Questions About DeepSeek R1

Q: Which model size should I choose?

Answer: Consider these recommendations:

  • Basic tasks: 1.5B model
  • Balanced performance: 7B/8B models
  • Maximum capabilities: 14B+ models (if hardware supports it)
  • Start with smaller models and scale up if needed

Q: What are the minimum hardware requirements for DeepSeek R1?

Answer: Requirements vary by model size:

  • 1.5B model: M2/M3 MacBook Air with 8GB RAM
  • 7B/8B models: M2/M3/M4 MacBook Pro with 16GB RAM
  • 14B model: M2/M3/M4 Pro MacBook Pro with 32GB RAM
  • 32B/70B models: M2 Max/Ultra Mac Studio

Q: How much storage space do the models require?

Answer: Storage requirements per model:

  • 1.5B model: ~1.1 GB
  • 7B model: ~4.7 GB
  • 14B model: ~9.0 GB
  • 32B model: ~20 GB
  • 70B model: ~43 GB

Q: What is the response speed of different model sizes?

Answer: Speed varies by model and hardware:

  • 1.5B model: 1-2 seconds on M2/M3 Macs
  • 7B/8B models: 2-4 seconds on 16GB RAM machines
  • 14B+ models: 4-8 seconds on higher-end hardware
  • Quantized (q4_K_M) versions can improve speed by roughly 30-50%

Q: What tasks is DeepSeek R1 best suited for?

Answer: DeepSeek R1 excels at:

  • Mathematics and complex problem-solving
  • Coding and algorithm challenges
  • Scientific reasoning
  • Step-by-step explanations
  • Analytical thinking tasks

Q: How do different model sizes compare in performance?

Answer: Performance scales with model size:

  • 1.5B model: 28.9% AIME 2024 accuracy, outperforming GPT-4o on certain math tasks
  • 7B model: 55.5% AIME 2024 accuracy
  • 14B model: 69.7% AIME 2024 accuracy, close to much larger models
  • All sizes demonstrate strong reasoning capabilities thanks to distillation

Q: How does DeepSeek R1 compare to other local LLMs?

Answer: DeepSeek R1 distinguishes itself by:

  • Outperforming larger models on math and coding tasks
  • Excelling at step-by-step problem solving
  • Strong scientific reasoning capabilities
  • Efficient resource usage through quantization
  • Competitive performance at smaller model sizes

Q: Can DeepSeek R1 run offline?

Answer: Yes, after downloading through Ollama, DeepSeek R1 runs completely offline, providing:

  • Complete privacy
  • No API costs
  • Reliable access
  • Lower latency

Q: Can multiple models run simultaneously?

Answer: While possible, it's recommended to run one model at a time for optimal performance, especially on machines with limited RAM.

Q: Does performance differ on Apple Silicon vs Intel Macs?

Answer: Yes, Apple Silicon Macs (M1/M2/M3/M4) offer significantly better performance due to:

  • Unified memory architecture, which lets the GPU access the full system RAM
  • High memory bandwidth, the main bottleneck for LLM inference
  • Metal GPU acceleration in Ollama that specifically targets Apple Silicon
