Penguin Solutions Unveils GPU Memory Expansion and 11 TB CXL KV Cache Server
Penguin Solutions expanded its OriginAI portfolio with GPU memory appliances for NVIDIA RTX PRO 6000 and B300 GPUs to boost inference performance and utilization. It also launched a production-ready CXL-based MemoryAI KV cache server offering up to 11 TB of memory to cut latency and raise throughput in large-context AI inference.
1. OriginAI Platform Expansion
Penguin Solutions enhanced its OriginAI inference platform by integrating large memory appliances with NVIDIA RTX PRO 6000 and B300 GPU designs. The company says these upgrades draw on more than 3.3 billion GPU runtime hours of operating experience and 30 years of memory expertise to improve GPU utilization, deployment speed and infrastructure reliability for enterprise-scale AI workloads.
2. Launch of MemoryAI KV Cache Server
The company introduced what it describes as the industry’s first production-ready CXL-based MemoryAI KV cache server. The server pairs 3 TB of DDR5 with up to eight 1 TB CXL add-in cards, for a total memory capacity of 11 TB. It targets the memory-driven demands of AI inference, which the company puts at 70%, and aims to reduce time-to-first-token, lower latency and increase token throughput across GPU clusters.
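To give a sense of scale for the 11 TB figure, the capacity arithmetic above can be combined with the standard per-token KV cache size formula for transformer models. The sketch below is illustrative only: the model dimensions (80 layers, 8 KV heads, head dimension 128, 16-bit values) are hypothetical and do not come from Penguin Solutions, terabytes are assumed to be decimal, and the result ignores any memory the server reserves for other purposes.

```python
# Back-of-the-envelope KV cache sizing. All model parameters are
# hypothetical; nothing here is taken from Penguin Solutions' specifications.

TB = 10**12  # assumption: decimal terabytes


def server_capacity_bytes(ddr5_tb=3, cxl_cards=8, cxl_card_tb=1):
    """Total memory: DDR5 plus CXL add-in cards (3 + 8 x 1 = 11 TB)."""
    return (ddr5_tb + cxl_cards * cxl_card_tb) * TB


def kv_bytes_per_token(layers, kv_heads, head_dim, bytes_per_elem=2):
    """KV cache bytes stored for one token.

    The factor of 2 accounts for one key vector and one value vector
    per layer and per KV head.
    """
    return 2 * layers * kv_heads * head_dim * bytes_per_elem


def max_cached_tokens(capacity_bytes, per_token_bytes):
    """Upper bound on tokens whose KV entries fit in the given capacity."""
    return capacity_bytes // per_token_bytes


if __name__ == "__main__":
    capacity = server_capacity_bytes()
    # Hypothetical model shape, chosen only for illustration.
    per_token = kv_bytes_per_token(layers=80, kv_heads=8, head_dim=128)
    print(f"capacity: {capacity / TB:.0f} TB")
    print(f"KV cache per token: {per_token / 1024:.0f} KiB")
    print(f"max cached tokens: {max_cached_tokens(capacity, per_token):,}")
```

Under these assumed dimensions each token occupies 320 KiB of KV cache, so a fully populated server could in principle hold the cache for roughly 33.5 million tokens. Real capacity would depend on the actual model, the numeric precision used and how the serving stack shares memory across requests.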
3. Enterprise Use Cases and Cluster Management
OriginAI solutions support applications in financial services, healthcare and retail by delivering ultra-low latency for fraud detection, real-time diagnostics and personalized customer engagement. Penguin’s ICE ClusterWare software adds health monitoring, auto-remediation and workload isolation to maintain peak performance and data security in multi-tenant AI clusters.