
Running DeepSeek-R1-0528 685B Locally: A Comprehensive Hardware Guide to Building Your Own AI Powerhouse

June 8, 2025 · By LLM Hard Drive Store
Tags: deepseek, deepseek-r1-0528, MZ73-LM0, MZ73-LM1, local AI

In an era where artificial intelligence is transforming industries, running cutting-edge large language models (LLMs) on your own hardware is becoming increasingly appealing. This blog post walks through the hardware and software needed to set up a model like DeepSeek-R1-0528 locally.

What is DeepSeek-R1-0528?

DeepSeek-R1-0528 is the latest state-of-the-art LLM (as of June 8, 2025), known for its extensive parameter count and robust performance, making it a favorite among those looking to harness frontier-level intelligence without relying on cloud services.

Hardware Setup: Building the Beast

  • Motherboard: A dual-socket board such as the Gigabyte MZ73-LM0 or MZ73-LM1, with 24 DDR5 DIMM slots (12 memory channels per CPU) for maximum memory bandwidth. Maximum RAM capacity: 6TB.

  • CPU: Two AMD EPYC 9004- or 9005-series processors (e.g., the 9015 or 9115 for cost savings). For this workload, memory bandwidth matters more than raw core count.

  • RAM: Depending on your budget, choose between 24x32GB (768GB) and 24x64GB (1536GB) DDR5 RDIMMs. Capacity determines which quantizations fit in memory, and populating all channels maximizes bandwidth, which drives token generation speed. A quick way to verify the DIMM population is sketched after this list.

  • PSU: A power supply of 700W or more. If you plan to install a GPU to boost token speed, choose a unit with the appropriate PCIe power connectors.

  • Heatsink: A cooler for the SP5 socket (typically sourced from eBay or AliExpress); quieter replacement fans are worth considering for noise reduction.

  • SSD: A 1TB+ NVMe SSD to hold the roughly 700GB of model weights and speed up loading them into RAM.

  • GPU (Optional): Depending on your budget, a GPU such as an NVIDIA P40, V100, RTX 3090, or A100 can increase token speed.
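
If you want to confirm that all 24 DIMM slots are populated and running at their rated speed, a quick check from Linux is sketched below. This assumes dmidecode is installed; the exact field names vary slightly between dmidecode versions:

    # List every populated DIMM with its size and configured speed.
    sudo dmidecode --type memory | grep -E "Size|Configured Memory Speed" | grep -v "No Module"

All 24 modules should appear, each running at the DIMMs' rated transfer speed.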

In the BIOS, set the number of NUMA groups to 0 (NPS0) so that memory is interleaved across all DIMMs on both sockets, effectively doubling throughput; this is a must for optimal performance.
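
You can verify from Linux that the BIOS change took effect. With NUMA groups set to 0, the OS should report a single NUMA node spanning all installed memory (assumes the numactl package is installed):

    # With NPS0, expect a single node covering all the RAM, e.g.:
    #   available: 1 nodes (0)
    numactl --hardware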

This setup will cost approximately $6,000 to $20,000, depending on the components you choose.

Software Setup: Bringing It to Life

Operating System: Install Linux as the base OS.

  • llama.cpp: Follow the installation guide in the referenced PC-build article or in the llama.cpp repository to set up this open-source inference framework (a build sketch follows this list).

  • Model Download: Grab the roughly 700GB of Q8_0 quantized weights from Hugging Face or from the LLM Hard Drive Store.

  • Run the Model: Test it with a simple command such as llama-cli -m ./DeepSeek-R1.Q8_0-00001-of-00015.gguf -p "You are a helpful assistant". Expect a generation speed of 6-8 tokens per second, with a short load period followed by near-real-time responses.
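
As a concrete end-to-end sketch, the steps below build llama.cpp from source, fetch the quantized weights, and launch the model. The Hugging Face repository and file names here are illustrative; check the actual repository for the exact shard names before committing to a ~700GB download:

    # Build llama.cpp (CPU backend; add -DGGML_CUDA=ON at configure time for a GPU).
    git clone https://github.com/ggml-org/llama.cpp
    cd llama.cpp
    cmake -B build -DCMAKE_BUILD_TYPE=Release
    cmake --build build --config Release -j"$(nproc)"

    # Fetch the Q8_0 shards (illustrative repo and file pattern).
    huggingface-cli download unsloth/DeepSeek-R1-0528-GGUF \
        --include "*Q8_0*" --local-dir ./models

    # Point llama-cli at the first shard; the remaining split files are found automatically.
    ./build/bin/llama-cli -m ./models/DeepSeek-R1.Q8_0-00001-of-00015.gguf \
        -p "You are a helpful assistant"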

Test the different models provided by LLM Hard Drive Store to see which one fits your needs; serving each model over HTTP (sketched below) makes side-by-side testing easier.
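
One convenient way to compare models is llama-server, which ships with llama.cpp and exposes an OpenAI-compatible HTTP API. A minimal sketch (the port and context size are arbitrary choices):

    # Serve the model for interactive testing.
    ./build/bin/llama-server -m ./models/DeepSeek-R1.Q8_0-00001-of-00015.gguf \
        --host 0.0.0.0 --port 8080 --ctx-size 8192

    # Query it from another shell:
    curl http://localhost:8080/v1/chat/completions \
        -H "Content-Type: application/json" \
        -d '{"messages":[{"role":"user","content":"Hello!"}]}'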

Final Thoughts

Running DeepSeek-R1-0528 locally offers privacy, customization, and the satisfaction of a DIY AI powerhouse. With a generation speed near real-time and support for extensive context, it’s ideal for developers, researchers, or anyone curious about LLMs. However, the setup requires technical know-how and a significant upfront investment.

References:

PC Build: Run DeepSeek-V3-0324 671B

X post by Matthew Carrigan