Hey guys! We previously wrote that you can run R1 locally, but many of you were asking how. Our guide was a bit technical, so we at Unsloth collabed with Open WebUI (a lovely chat UI) to create this beginner-friendly, step-by-step guide for running the full DeepSeek-R1 Dynamic 1.58-bit model locally.
This guide is a summary, so I highly recommend you read the full guide (with pics) here:
- You don’t need a GPU to run this model, but having one will make it faster, especially if you have at least 24GB of VRAM.
- Try to have a sum of RAM + VRAM = 80GB+ to get decent tokens/s
To Run DeepSeek-R1:
1. Install Llama.cpp
- Download prebuilt binaries or build from source following this guide (a minimal build sketch is shown below).
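If you'd rather build from source, the steps look roughly like this. This is a minimal sketch assuming a Linux/macOS machine with git and CMake installed; the CUDA flag only applies to NVIDIA GPUs, and the flags in the official build guide take precedence if anything differs:

```bash
# Minimal build sketch (assumes git and CMake are installed; flags may vary by platform)
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp

# Add -DGGML_CUDA=ON only if you have an NVIDIA GPU with the CUDA toolkit installed
cmake -B build -DGGML_CUDA=ON
cmake --build build --config Release -j

# The server binary ends up in build/bin/
ls build/bin/llama-server
```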
2. Download the Model (1.58-bit, 131GB) from Unsloth
- Get the model from Hugging Face.
- Use Python to download it programmatically:
```python
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="unsloth/DeepSeek-R1-GGUF",
    local_dir="DeepSeek-R1-GGUF",
    allow_patterns=["*UD-IQ1_S*"],
)
```
- Once the download completes, you’ll find the model files in a directory structure like this:
```
DeepSeek-R1-GGUF/
├── DeepSeek-R1-UD-IQ1_S/
│   ├── DeepSeek-R1-UD-IQ1_S-00001-of-00003.gguf
│   ├── DeepSeek-R1-UD-IQ1_S-00002-of-00003.gguf
│   ├── DeepSeek-R1-UD-IQ1_S-00003-of-00003.gguf
```
- Ensure you know the path where the files are stored (you can sanity-check the download as shown below).
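A quick way to sanity-check the download (assuming you ran the Python snippet from your current working directory) is to list the split files; the three parts should total roughly 131GB:

```bash
# List the downloaded split GGUF files and their sizes
ls -lh DeepSeek-R1-GGUF/DeepSeek-R1-UD-IQ1_S/

# Total size of the directory; should be roughly 131GB for the 1.58-bit model
du -sh DeepSeek-R1-GGUF/DeepSeek-R1-UD-IQ1_S/
```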
3. Install and Run Open WebUI
- This is what Open WebUI looks like running R1 (screenshot in the full guide).
- If you don’t already have it installed, no worries! It’s a simple setup (a minimal install sketch is shown below). Just follow the Open WebUI docs here:
- Once installed, start the application. We’ll connect it to the DeepSeek-R1 model in a later step.
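As a rough sketch, one common way to install Open WebUI is via pip (the official docs also cover Docker and are the source of truth if anything here differs):

```bash
# One way to install Open WebUI (the pip method currently expects Python 3.11; see the docs)
pip install open-webui

# Start the app; by default the UI is served at http://localhost:8080
open-webui serve
```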
4. Start the Model Server with Llama.cpp
Now that the model is downloaded, the next step is to run it using Llama.cpp’s server mode.
🛠️ Before You Begin:
- Locate the llama-server Binary
- If you built Llama.cpp from source, the llama-server executable is located in llama.cpp/build/bin. Navigate to this directory using:

```bash
cd [path-to-llama-cpp]/llama.cpp/build/bin
```

Replace [path-to-llama-cpp] with your actual Llama.cpp directory. For example:

```bash
cd ~/Documents/workspace/llama.cpp/build/bin
```
- Point to Your Model Folder
- Use the full path to the downloaded GGUF files. When starting the server, specify the first part of the split GGUF files (e.g., DeepSeek-R1-UD-IQ1_S-00001-of-00003.gguf).
🚀 Start the Server
Run the following command:
```bash
./llama-server \
    --model /[your-directory]/DeepSeek-R1-GGUF/DeepSeek-R1-UD-IQ1_S/DeepSeek-R1-UD-IQ1_S-00001-of-00003.gguf \
    --port 10000 \
    --ctx-size 1024 \
    --n-gpu-layers 40
```
Example (If Your Model is in /Users/tim/Documents/workspace):
```bash
./llama-server \
    --model /Users/tim/Documents/workspace/DeepSeek-R1-GGUF/DeepSeek-R1-UD-IQ1_S/DeepSeek-R1-UD-IQ1_S-00001-of-00003.gguf \
    --port 10000 \
    --ctx-size 1024 \
    --n-gpu-layers 40
```
✅ Once running, the server will be available at:
http://127.0.0.1:10000
🖥️ Llama.cpp Server Running
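Before connecting Open WebUI, you can quickly check that the server responds. The sketch below assumes llama-server's usual endpoints (a /health route and the OpenAI-compatible /v1/chat/completions route); paths can vary slightly between llama.cpp versions:

```bash
# Quick health check (llama-server returns a small JSON status here)
curl http://127.0.0.1:10000/health

# Optional: send a short OpenAI-style chat request straight to the server
curl http://127.0.0.1:10000/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
          "messages": [{"role": "user", "content": "Hello! What is 1+1?"}],
          "temperature": 0.6
        }'
```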
5. Connect Llama.cpp to Open WebUI
- Open Admin Settings in Open WebUI.
- Go to Connections > OpenAI Connections.
- Add the following details:
- URL → http://127.0.0.1:10000/v1 (the server address from step 4; Open WebUI’s OpenAI connections expect the /v1 base path)
- Key → none (llama-server doesn’t require an API key)

You can also verify the endpoint from the command line, as shown below.
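To confirm the URL you entered is reachable (essentially what Open WebUI checks when it saves the connection), you can list the models the server exposes. This assumes llama-server's OpenAI-compatible /v1/models endpoint:

```bash
# List models exposed by the llama.cpp server; the DeepSeek-R1 GGUF should appear here
curl http://127.0.0.1:10000/v1/models
```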