Hey guys! We previously wrote that you can run R1 locally, but many of you were asking how. Our guide was a bit technical, so we at Unsloth collabed with Open WebUI (a lovely chat UI) to create this beginner-friendly, step-by-step guide for running the full DeepSeek-R1 Dynamic 1.58-bit model locally.
This guide is a summary, so I highly recommend you read the full guide (with pics) here:
- You don’t need a GPU to run this model, but having one will make it faster, especially if you have at least 24GB of VRAM.
- Try to have a sum of RAM + VRAM = 80GB+ to get decent tokens/s
To Run DeepSeek-R1:
1. Install Llama.cpp
- Download prebuilt binaries or build from source following this guide (a minimal build sketch is shown below).
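If you'd rather build from source, the steps look roughly like this. This is a minimal sketch assuming a Linux/macOS machine with git and CMake installed; the CUDA flag only applies to NVIDIA GPUs, and the flags in the official build guide take precedence if anything differs:

```bash
# Minimal build sketch (assumes git and CMake are installed; flags may vary by platform)
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp

# Add -DGGML_CUDA=ON only if you have an NVIDIA GPU with the CUDA toolkit installed
cmake -B build -DGGML_CUDA=ON
cmake --build build --config Release -j

# The server binary ends up in build/bin/
ls build/bin/llama-server
```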
2. Download the Model (1.58-bit, 131GB) from Unsloth
- Get the model from Hugging Face.
- Use Python to download it programmatically:
```python
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="unsloth/DeepSeek-R1-GGUF",
    local_dir="DeepSeek-R1-GGUF",
    allow_patterns=["*UD-IQ1_S*"],
)
```
- Once the download completes, you’ll find the model files in a directory structure like this:
```
DeepSeek-R1-GGUF/
├── DeepSeek-R1-UD-IQ1_S/
│   ├── DeepSeek-R1-UD-IQ1_S-00001-of-00003.gguf
│   ├── DeepSeek-R1-UD-IQ1_S-00002-of-00003.gguf
│   ├── DeepSeek-R1-UD-IQ1_S-00003-of-00003.gguf
```
- Ensure you know the path where the files are stored (you can sanity-check the download as shown below).
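A quick way to sanity-check the download (assuming you ran the Python snippet from your current working directory) is to list the split files; the three parts should total roughly 131GB:

```bash
# List the downloaded split GGUF files and their sizes
ls -lh DeepSeek-R1-GGUF/DeepSeek-R1-UD-IQ1_S/

# Total size of the directory; should be roughly 131GB for the 1.58-bit model
du -sh DeepSeek-R1-GGUF/DeepSeek-R1-UD-IQ1_S/
```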
3. Install and Run Open WebUI
- This is what Open WebUI looks like running R1 (screenshot in the full guide).
- If you don’t already have it installed, no worries! It’s a simple setup (a minimal install sketch is shown below). Just follow the Open WebUI docs here:
- Once installed, start the application. We’ll connect it to the DeepSeek-R1 model in a later step.
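As a rough sketch, one common way to install Open WebUI is via pip (the official docs also cover Docker and are the source of truth if anything here differs):

```bash
# One way to install Open WebUI (the pip method currently expects Python 3.11; see the docs)
pip install open-webui

# Start the app; by default the UI is served at http://localhost:8080
open-webui serve
```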
4. Start the Model Server with Llama.cpp
Now that the model is downloaded, the next step is to run it using Llama.cpp’s server mode.
🛠️ Before You Begin:
- Locate the llama-server Binary
- If you built Llama.cpp from source, the llama-server executable is located in llama.cpp/build/bin. Navigate to this directory using:

```bash
cd [path-to-llama-cpp]/llama.cpp/build/bin
```

Replace [path-to-llama-cpp] with your actual Llama.cpp directory. For example:

```bash
cd ~/Documents/workspace/llama.cpp/build/bin
```
- Point to Your Model Folder
- Use the full path to the downloaded GGUF files. When starting the server, specify the first part of the split GGUF files (e.g., DeepSeek-R1-UD-IQ1_S-00001-of-00003.gguf).
🚀 Start the Server
Run the following command:
```bash
./llama-server \
    --model /[your-directory]/DeepSeek-R1-GGUF/DeepSeek-R1-UD-IQ1_S/DeepSeek-R1-UD-IQ1_S-00001-of-00003.gguf \
    --port 10000 \
    --ctx-size 1024 \
    --n-gpu-layers 40
```
Example (If Your Model is in /Users/tim/Documents/workspace):
```bash
./llama-server \
    --model /Users/tim/Documents/workspace/DeepSeek-R1-GGUF/DeepSeek-R1-UD-IQ1_S/DeepSeek-R1-UD-IQ1_S-00001-of-00003.gguf \
    --port 10000 \
    --ctx-size 1024 \
    --n-gpu-layers 40
```
✅ Once running, the server will be available at:
http://127.0.0.1:10000
🖥️ Llama.cpp Server Running
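Before connecting Open WebUI, you can quickly check that the server responds. The sketch below assumes llama-server's usual endpoints (a /health route and the OpenAI-compatible /v1/chat/completions route); paths can vary slightly between llama.cpp versions:

```bash
# Quick health check (llama-server returns a small JSON status here)
curl http://127.0.0.1:10000/health

# Optional: send a short OpenAI-style chat request straight to the server
curl http://127.0.0.1:10000/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
          "messages": [{"role": "user", "content": "Hello! What is 1+1?"}],
          "temperature": 0.6
        }'
```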
5. Connect Llama.cpp to Open WebUI
- Open Admin Settings in Open WebUI.
- Go to Connections > OpenAI Connections.
- Add the following details:
- URL → http://127.0.0.1:10000/v1 (the server address from step 4; Open WebUI’s OpenAI connections expect the /v1 base path)
- Key → none (llama-server doesn’t require an API key)

You can also verify the endpoint from the command line, as shown below.
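To confirm the URL you entered is reachable (essentially what Open WebUI checks when it saves the connection), you can list the models the server exposes. This assumes llama-server's OpenAI-compatible /v1/models endpoint:

```bash
# List models exposed by the llama.cpp server; the DeepSeek-R1 GGUF should appear here
curl http://127.0.0.1:10000/v1/models
```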