



A powerful AI server featuring dual RTX 5060 Ti GPUs with 32GB combined VRAM. Run larger language models, handle multi-user workloads, and accelerate image generation with twice the processing power. Comes with OpenWebUI, Ollama, and ComfyUI pre-installed and ready to use. Everything runs locally on your hardware - your data never leaves your machine.
30-day returns · Free shipping
Tokens generated per second for a single user session. The average human reads about 4-5 words per second (~6-8 tokens), so anything above 30 tok/s is well ahead of reading speed and feels instantaneous. Speed decreases when multiple concurrent users share the GPUs. Tested on a selection of the most popular models.
Default Ollama settings, no CPU offloading
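As a rough illustration of the reading-speed comparison above, the arithmetic can be sketched in a few lines of Python. The ratio of about 1.5 tokens per word is an assumption implied by the "4-5 words, ~6-8 tokens" figures, and the function names are hypothetical, not part of Ollama or OpenWebUI.

```python
# Rough arithmetic behind the reading-speed comparison.
# Assumption: ~1.5 tokens per English word (implied by "4-5 words ~ 6-8 tokens").
TOKENS_PER_WORD = 1.5


def reading_speed_tokens(words_per_second: float) -> float:
    """Convert a reading speed in words/s to an approximate tokens/s figure."""
    return words_per_second * TOKENS_PER_WORD


def seconds_to_generate(num_tokens: int, tokens_per_second: float) -> float:
    """Time needed to generate num_tokens at a steady generation speed."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return num_tokens / tokens_per_second


if __name__ == "__main__":
    for wps in (4, 5):
        print(f"{wps} words/s is about {reading_speed_tokens(wps):.1f} tok/s")
    # A 600-token answer at 30 tok/s
    print(f"600 tokens at 30 tok/s: {seconds_to_generate(600, 30):.0f} s")
```

At 30 tok/s the model produces text several times faster than this estimated reading speed, which is why the response appears to keep ahead of the reader.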
Maximum number of tokens that fit entirely in VRAM without CPU offloading. A 128K context holds roughly 200 pages of text. Larger contexts let the AI remember more of your conversation history and analyze longer documents. Context size is limited by the VRAM that remains available after the model weights are loaded.
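To see why the VRAM left after loading the weights limits context size, here is a minimal Python sketch of the usual key/value cache estimate for transformer models. All model dimensions, the weight size, and the overhead figure below are hypothetical placeholders, not measurements from this server. The sketch also treats the 32GB as a single pool, whereas in practice the memory is split across two GPUs and the runtime reserves some of it.

```python
GIB = 1024 ** 3


def kv_cache_bytes_per_token(n_layers: int, n_kv_heads: int, head_dim: int,
                             bytes_per_value: int = 2) -> int:
    """Approximate KV cache size for one token.

    The factor 2 covers one key and one value vector per layer.
    bytes_per_value=2 assumes a 16-bit cache (an assumption, not a default
    confirmed by the source).
    """
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_value


def max_context_tokens(vram_bytes: int, weights_bytes: int,
                       per_token_bytes: int, overhead_bytes: int = 0) -> int:
    """Tokens that fit in the VRAM left after weights and runtime overhead."""
    free = vram_bytes - weights_bytes - overhead_bytes
    return max(free, 0) // per_token_bytes


if __name__ == "__main__":
    # Hypothetical model: 32 layers, 8 KV heads, head dimension 128
    per_token = kv_cache_bytes_per_token(n_layers=32, n_kv_heads=8, head_dim=128)
    tokens = max_context_tokens(
        vram_bytes=32 * GIB,     # combined VRAM, treated as one pool
        weights_bytes=5 * GIB,   # hypothetical quantized weights
        per_token_bytes=per_token,
        overhead_bytes=1 * GIB,  # hypothetical runtime overhead
    )
    print(f"{per_token // 1024} KiB per token, about {tokens:,} tokens fit")
```

Under these assumed numbers each token costs 128 KiB of cache, so a larger model (more weights, more layers) shrinks the maximum context from both directions.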
Seconds to generate one 512x512 image or 5 seconds of video. Lower is better for rapid iteration. Higher resolutions take longer. Tested on a selection of the most popular models. Generation speed scales with GPU compute power and VRAM bandwidth.
Default ComfyUI templates, 512x512, 81 frames for video models, same prompt for all models
Benchmark results are based on internal testing and may vary depending on workload, model parameters, system configuration, and thermal conditions.