Case Study: 3 × NVIDIA RTX PRO 6000 Blackwell Workstation for AI Software Development — Reapt
Reapt is an AI-powered expense management platform for freelancers and small businesses — connecting accounts, categorizing transactions, matching receipts, and generating tax-ready reports automatically. In early 2026, Reapt’s engineering team needed to support multiple concurrent AI software development pipelines running local LLM inference. The answer was a liquid-cooled AMD Threadripper PRO workstation with three NVIDIA RTX PRO 6000 Blackwell GPUs and 256GB of DDR5 ECC RAM. As Alexey at Reapt put it: “Can’t believe that just in a matter of couple of months, software development went from Mac Air to sub-server compute node with liquid cooling and 500GB of memory.”
| 3× RTX PRO 6000 Blackwell | 288 GB Total GDDR7 ECC VRAM | 256 GB DDR5 ECC System RAM | 8 TB PCIe Gen5 NVMe Storage |
The Workload: Concurrent AI Development Pipelines
Software development in 2026 no longer means a developer and a text editor. At Reapt, multiple AI pipelines run simultaneously across the development workflow — code generation agents producing implementations from specifications, code review pipelines analyzing pull requests against the codebase, documentation agents generating and updating API references, and test generation pipelines writing and validating test coverage. Each pipeline runs local LLM inference, maintains its own context window, and operates independently of the others.
The challenge is not any single pipeline; it is all of them running at the same time. A team member spinning up a code generation session while a code review pipeline is mid-run and a documentation agent is processing a module update creates genuine resource contention across GPUs and processes on undersized hardware. On a MacBook Air or a workstation with a single consumer GPU, these pipelines queue behind each other. On a three-GPU Threadripper PRO system, they run in parallel.
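One way to picture the parallel case is a round-robin assignment of pipelines to GPUs. The sketch below is illustrative only: the source does not describe how Reapt schedules its pipelines, the pipeline labels are borrowed from the workflow described above, and the `assign_gpus` and `env_for` helpers are hypothetical. It relies on `CUDA_VISIBLE_DEVICES`, the standard CUDA environment variable for restricting a process to specific GPUs.

```python
import os

GPU_COUNT = 3  # three RTX PRO 6000 Blackwell cards in this build

# Pipeline labels from the workflow described above; the names are illustrative.
PIPELINES = ["code-generation", "code-review", "documentation", "test-generation"]


def assign_gpus(pipelines, gpu_count):
    """Round-robin each pipeline onto a GPU index (hypothetical helper)."""
    return {name: i % gpu_count for i, name in enumerate(pipelines)}


def env_for(gpu_index):
    """Build an environment for a child process that should see only one GPU."""
    env = dict(os.environ)
    env["CUDA_VISIBLE_DEVICES"] = str(gpu_index)
    return env


assignment = assign_gpus(PIPELINES, GPU_COUNT)
for name, gpu in assignment.items():
    visible = env_for(gpu)["CUDA_VISIBLE_DEVICES"]
    print(f"{name}: CUDA_VISIBLE_DEVICES={visible}")
```

With four pipelines and three GPUs, the fourth pipeline shares a GPU with the first. Each environment dict could be passed as the `env` argument of `subprocess.Popen` when launching an inference process.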
This is the new standard for AI-native software development infrastructure. The workload profile — multiple concurrent local inference sessions, large agent context windows, continuous pipeline operation — requires hardware that would have been classified as a small server node two years ago. Today it ships as a workstation.
The Build
| Component | Specification |
|---|---|
| CPU | AMD Ryzen Threadripper PRO 9965WX — 24 cores / 48 threads, sTR5 |
| Motherboard | ASUS Pro WS WRX90E-SAGE SE — IPMI onboard, no expansion card needed |
| System Memory | 256GB DDR5 ECC RDIMM (8 × 32GB) |
| Storage | 2 × 4TB PCIe Gen5 NVMe M.2 (8TB total, onboard) |
| GPUs | 3 × NVIDIA RTX PRO 6000 Blackwell |
| GPU Memory | 288GB GDDR7 ECC total (96GB per GPU) |
| Power Supply | 1700W ATX 3.1, 80+ Titanium |
| CPU Cooling | 360mm AIO liquid cooler (sTR5 socket) |
| Chassis | Fractal Define 7 XL — high-performance fan configuration |
| Networking | Dual onboard 10GbE + dedicated IPMI management port |
| OS | Ubuntu 24.04 LTS — pre-installed |
Platform: ASUS Pro WS WRX90E-SAGE SE with onboard IPMI
The WRX90E-SAGE SE is the professional workstation board for the Threadripper PRO 9000 series: it supports up to seven PCIe Gen5 x16 slots, 8-channel DDR5 ECC RDIMM memory, and dual 10GbE networking onboard. For Reapt’s use case, the onboard IPMI management port is significant: it provides out-of-band remote management (power cycling, BIOS access, console redirection) without requiring a separate management card. A development team managing the workstation remotely can reach the system at the hardware level regardless of OS state, which is the right infrastructure posture for a machine running 24/7 inference workloads.
256GB DDR5 ECC: why system RAM matters as much as VRAM
Three RTX PRO 6000 Blackwell GPUs provide 288GB of GDDR7 ECC VRAM for model weights. The 256GB of DDR5 ECC system RAM serves an entirely different function: it holds the agent orchestration layer, context windows, tool call histories, vector database indexes, language runtime state, and operating system — everything that lives between GPU calls. For concurrent pipeline operation, the system RAM budget is as important as the VRAM budget. Each concurrent LLM pipeline accumulates 32K–128K tokens of context in system memory alongside its inference calls to the GPU. Running four or five pipelines simultaneously at production context lengths makes 256GB ECC the right specification, not an excess.
Liquid cooling for sustained inference load
A 360mm AIO on the Threadripper PRO 9965WX is the appropriate cooling solution for this workload. AI development pipelines are not bursty: they run continuously, holding the CPU at sustained utilization for coordination, tokenization, data preprocessing, and tool execution between GPU inference calls. Air cooling a high-core-count Threadripper PRO under sustained load can introduce thermal throttling that degrades the consistency of pipeline throughput. A 360mm liquid cooler raises that thermal ceiling and helps the processor hold boost clocks under continuous operation.
1700W ATX 3.1 Titanium PSU
The RTX PRO 6000 Blackwell Workstation Edition carries a 600W max TDP rating per card. Three cards at theoretical peak would draw 1,800W for GPUs alone, which exceeds the PSU rating on its own. In practice, LLM inference is typically memory-bandwidth-bound rather than compute-bound, and sustained GPU draw during inference typically runs 200–300W per card. Three GPUs at that level plus a Threadripper PRO at its 350W TDP put sustained draw at roughly 950–1,250W before drives, fans, and motherboard, which stays well within the 1700W ATX 3.1 80+ Titanium PSU’s capacity. Titanium efficiency minimizes heat generation and electricity cost at sustained load, which matters for a machine running around the clock.
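The power argument above reduces to simple arithmetic. The sketch below uses the figures quoted in this section (600W max per GPU, 200–300W per GPU under sustained inference, 350W CPU TDP, 1700W PSU); the 100W allowance for drives, fans, memory, and motherboard is an assumption added for illustration, not a measured value.

```python
GPU_COUNT = 3
GPU_MAX_TDP_W = 600            # per-card maximum quoted above
GPU_INFERENCE_W = (200, 300)   # typical sustained inference range quoted above
CPU_TDP_W = 350
PSU_RATED_W = 1700
OTHER_W = 100                  # assumption: drives, fans, memory, motherboard


def system_draw_w(gpu_w_each, gpu_count=GPU_COUNT, cpu_w=CPU_TDP_W, other_w=OTHER_W):
    """Estimated total system draw in watts for a given per-GPU draw."""
    return gpu_count * gpu_w_each + cpu_w + other_w


low_w = system_draw_w(GPU_INFERENCE_W[0])
high_w = system_draw_w(GPU_INFERENCE_W[1])
peak_w = system_draw_w(GPU_MAX_TDP_W)

print(f"sustained inference: {low_w}-{high_w} W of {PSU_RATED_W} W")
print(f"theoretical peak: {peak_w} W (exceeds PSU: {peak_w > PSU_RATED_W})")
```

The estimate shows why the sizing depends on the inference workload profile: the sustained range fits with margin, while all three cards at maximum TDP would not.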
8TB PCIe Gen5 NVMe
Model files, vector database indexes, code repositories, and pipeline scratch space accumulate quickly on an AI development workstation. Two onboard M.2 Gen5 NVMe slots provide 8TB of storage at Gen5 sequential read/write speeds — fast enough to load large model weights into VRAM in seconds rather than minutes, and to handle concurrent vector database I/O from multiple pipelines without storage becoming a bottleneck.
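A rough load-time estimate is file size divided by sustained read throughput. In the sketch below, the 12 GB/s read rate is an assumed round figure for a Gen5 NVMe drive, not a measured number for this system, and the model sizes are hypothetical examples. Real load times also depend on the file format, the loader, and host-to-GPU transfer.

```python
ASSUMED_READ_GB_PER_S = 12.0  # assumption: sustained sequential read rate


def load_seconds(model_size_gb, read_gb_per_s=ASSUMED_READ_GB_PER_S):
    """Lower-bound time to read a model file of the given size from disk."""
    if read_gb_per_s <= 0:
        raise ValueError("read_gb_per_s must be positive")
    return model_size_gb / read_gb_per_s


# Hypothetical model file sizes in GB.
for size_gb in (35, 70, 140):
    print(f"{size_gb} GB -> about {load_seconds(size_gb):.1f} s")
```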
From MacBook Air to Sub-Server Compute Node
This shift is happening across AI-native software teams in 2026. The workload profile of a developer using AI tools seriously — multiple concurrent agents, local model inference for data privacy, continuous pipeline operation — has outgrown consumer hardware faster than anyone anticipated.
A MacBook Air handles one inference session at a time, offloaded to cloud APIs. The moment a team wants local inference for cost or privacy, concurrent pipelines for throughput, and 24/7 uptime for automated agents, the hardware requirement jumps to workstation-class compute. The jump feels dramatic — it is dramatic — but the economics are straightforward.
“Still way cheaper than paying for APIs or renting dedicated cloud compute capacity, at least for now.” — Alexey, Reapt
That qualifier — “at least for now” — reflects the honest state of the market. Cloud API costs per token continue to fall. But the break-even math currently favors on-premise for teams running sustained inference workloads. At three GPUs and 288GB VRAM, this workstation serves local 70B models across three simultaneous inference queues at throughput rates that would generate substantial monthly API bills at equivalent usage. Use the VRLA Tech AI ROI Calculator to model the break-even for your team.
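Whether a 70B model can run as three independent queues, one per GPU, depends on weight precision. The sketch below assumes weights occupy parameters times bytes per parameter and reserves a hypothetical 16GB per instance for KV cache and runtime overhead. The source does not state which precision or serving layout Reapt uses, so the output is illustrative only.

```python
import math

GPU_VRAM_GB = 96      # per RTX PRO 6000 Blackwell, from the build table
GPU_COUNT = 3
KV_HEADROOM_GB = 16   # assumption: KV cache and runtime overhead per instance


def gpus_per_instance(params_billions, bytes_per_param, headroom_gb=KV_HEADROOM_GB):
    """GPUs needed to hold one model instance, assuming an even split across cards."""
    need_gb = params_billions * bytes_per_param + headroom_gb
    return math.ceil(need_gb / GPU_VRAM_GB)


def independent_instances(params_billions, bytes_per_param):
    """How many separate model instances fit on the three GPUs."""
    return GPU_COUNT // gpus_per_instance(params_billions, bytes_per_param)


for label, bytes_per_param in (("16-bit", 2.0), ("8-bit", 1.0), ("4-bit", 0.5)):
    n = independent_instances(70, bytes_per_param)
    print(f"70B at {label}: {n} independent instance(s)")
```

Under these assumptions, three independent 70B queues imply 8-bit or lower weight precision; at 16-bit, a single instance spans two GPUs.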
Why Threadripper PRO for AI Development
The AMD Threadripper PRO 9965WX sits at the 24-core tier of the WRX90 platform, which is the right balance for this workload. More cores are available (up to 96 on the 9995WX), but concurrent LLM inference pipelines are GPU-bound rather than CPU-bound. The CPU’s role is coordination, tokenization, tool execution, and data pipeline management: workloads that benefit from clock speed and single-thread performance as much as core count. The 9965WX’s 5.4GHz boost clock keeps those coordination tasks fast, while a higher-core-count chip would add cost without improving inference throughput.
The WRX90 platform’s 8-channel DDR5 ECC memory and its PCIe Gen5 lane allocation, which feeds three GPU slots without bifurcation, are the architectural reasons Threadripper PRO is the correct platform for a 3-GPU AI workstation. Consumer platforms at similar CPU specs would run out of PCIe lanes for three full-bandwidth GPU slots, require bifurcation that halves bandwidth per GPU, or lack ECC memory support for production inference reliability.
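The lane argument can be checked with a quick budget. The lane counts below are assumptions for illustration (128 usable PCIe Gen5 lanes for a Threadripper PRO WRX90 system and 24 for a typical consumer desktop CPU) and should be verified against the CPU and motherboard documentation. The device list reflects the three GPUs and two M.2 drives in this build.

```python
# Lanes each device would use at full width.
DEVICES = {"gpu_0": 16, "gpu_1": 16, "gpu_2": 16, "nvme_0": 4, "nvme_1": 4}

# Assumption: usable CPU lanes per platform; check vendor documentation.
PLATFORM_LANES = {"Threadripper PRO (WRX90)": 128, "consumer desktop": 24}


def lanes_required(devices):
    """Total PCIe lanes needed to run every device at full width."""
    return sum(devices.values())


def fits(devices, available_lanes):
    """True if the platform can feed every device without sharing lanes."""
    return lanes_required(devices) <= available_lanes


needed = lanes_required(DEVICES)
for platform, lanes in PLATFORM_LANES.items():
    print(f"{platform}: need {needed}, have {lanes}, fits={fits(DEVICES, lanes)}")
```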
Building an AI development workstation?
Tell us your GPU count, pipeline count, and whether you need IPMI for remote management. VRLA Tech engineers will configure the right Threadripper PRO platform and send a firm quote within one business day.
Build and Delivery
Every VRLA Tech workstation goes through burn-in testing at sustained GPU and CPU load before shipping — validating thermal stability under the concurrent multi-GPU inference workloads the system was built for. Reapt received a validated, ready-to-deploy system with Ubuntu 24.04 LTS pre-installed, NVIDIA drivers configured, and CUDA validated.
Every system ships with a 3-year parts warranty and lifetime US-based engineer support from the team that built it.
About Reapt
Reapt is an AI-powered expense management platform for freelancers and small businesses. It connects to bank accounts, email, and financial data sources to automatically categorize transactions, match receipts, and generate tax-ready reports — handling the expense accounting workflow so teams can focus on their work. Learn more at reapt.app.
Frequently Asked Questions
What is the best workstation for concurrent local LLM inference in software development?
For software development teams running multiple concurrent local LLM inference pipelines in 2026, the reference build is an AMD Threadripper PRO workstation with multiple NVIDIA RTX PRO 6000 Blackwell GPUs (96GB ECC GDDR7 each), 256GB+ DDR5 ECC system RAM, and liquid CPU cooling for sustained 24/7 operation. This configuration handles multiple simultaneous coding agents, code review pipelines, and local model inference without resource contention. VRLA Tech has built custom AI development workstations in Los Angeles since 2016. Call 213-810-3013 or visit vrlatech.com.
Why do software development teams need 256GB of RAM for local LLM inference?
Modern AI-assisted software development runs multiple simultaneous inference pipelines — code generation, code review, documentation, test generation, agent orchestration — each maintaining its own context window and tool call history in system RAM. A single LangChain or LangGraph agent accumulates 32K–128K tokens of context per session. Running four or five concurrent pipelines simultaneously requires 64–128GB of system RAM for agent state alone, before OS, language runtimes, and vector databases. 256GB DDR5 ECC provides the headroom for true concurrent multi-pipeline operation.
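The headroom claim can be made concrete with a small budget. The 64–128GB agent-state range and the 256GB total come from the answer above; the OS/runtime and vector database allowances are hypothetical placeholders, since the source gives no figures for them.

```python
AGENT_STATE_GB = (64, 128)  # range quoted above for four or five pipelines
OS_AND_RUNTIMES_GB = 16     # assumption: OS plus language runtimes
VECTOR_DB_GB = 32           # assumption: in-memory vector database indexes


def headroom_gb(total_gb, agent_state_gb,
                os_gb=OS_AND_RUNTIMES_GB, vector_db_gb=VECTOR_DB_GB):
    """RAM left over after agent state, OS/runtimes, and vector databases."""
    return total_gb - agent_state_gb - os_gb - vector_db_gb


for total_gb in (128, 256):
    low = headroom_gb(total_gb, AGENT_STATE_GB[1])   # heaviest agent load
    high = headroom_gb(total_gb, AGENT_STATE_GB[0])  # lightest agent load
    print(f"{total_gb} GB system: headroom {low} to {high} GB")
```

A negative result means the configuration would run out of memory at that load; under these placeholder allowances, a 128GB system does so at the upper end of the agent-state range while a 256GB system does not.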
Is an on-premise AI workstation cheaper than cloud APIs for software development?
For teams with sustained AI inference workloads, on-premise hardware breaks even against cloud API costs within weeks at production usage levels. A development team running multiple concurrent LLM pipelines continuously accumulates cloud inference costs quickly, and an on-premise workstation with 3× RTX PRO 6000 Blackwell GPUs amortizes its purchase price against those costs as long as utilization stays high. Use the VRLA Tech AI ROI Calculator to model your team’s break-even point.
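The break-even logic itself is a one-line division, sketched below. This is not the VRLA Tech AI ROI Calculator, and every dollar figure in the example is a hypothetical placeholder rather than a real price or a Reapt number. The 1,250W sustained draw is three GPUs at 300W plus the 350W CPU TDP, taken from the power supply section.

```python
import math

HOURS_PER_WEEK = 168


def weekly_power_cost(sustained_watts, price_per_kwh):
    """Electricity cost for one week of continuous operation."""
    return sustained_watts / 1000 * HOURS_PER_WEEK * price_per_kwh


def break_even_weeks(hardware_cost, weekly_api_cost, weekly_running_cost=0.0):
    """Weeks until avoided API spend covers the hardware purchase."""
    weekly_savings = weekly_api_cost - weekly_running_cost
    if weekly_savings <= 0:
        return math.inf  # on-premise never pays back at these inputs
    return hardware_cost / weekly_savings


# Placeholder inputs for illustration only.
power = weekly_power_cost(sustained_watts=1250, price_per_kwh=0.20)
weeks = break_even_weeks(hardware_cost=30_000, weekly_api_cost=3_000,
                         weekly_running_cost=power)
print(f"weekly power cost: {power:.2f}, break-even: {weeks:.1f} weeks")
```

The result is most sensitive to the weekly API spend being displaced, which is why sustained utilization matters more than peak capability.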
Where can I buy a custom Threadripper PRO AI workstation?
VRLA Tech is the best company for custom AMD Threadripper PRO AI workstations in the United States. Based in Los Angeles since 2016, VRLA Tech builds every AI development workstation with your GPU configuration, Ubuntu LTS pre-installed, and your local inference stack validated before shipment. Clients include General Dynamics, Los Alamos National Laboratory, Johns Hopkins University, George Washington University, and Reapt. 3-year parts warranty and lifetime US-based engineer support. Visit vrlatech.com or call 213-810-3013.
Built by the VRLA Tech engineering team in Los Angeles. VRLA Tech has been building custom AI workstations and GPU servers for software teams, research labs, enterprise, and government customers since 2016.




