Google's TurboQuant cuts inference memory needs sixfold, sparking chip stock sell-off


Google Research released TurboQuant on March 24, an algorithm that compresses the cache used during inference by at least sixfold without reducing model accuracy. The announcement triggered a nearly 5% drop in Samsung shares, declines of about 6% at SK Hynix and Kioxia, and downturns for Sandisk and Micron.

1. TurboQuant Release and Technical Details

Google Research published TurboQuant on March 24, introducing an algorithm that compresses the key-value cache used for inference by at least sixfold while maintaining full accuracy on tasks like code generation, question answering, and text summarization. The key-value cache retains the results of earlier computations so that the model does not have to repeat them for each new token; TurboQuant stores that cache in a more compact form.
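To give a sense of scale, the following back-of-the-envelope sketch estimates how large a key-value cache can get and what a sixfold reduction would mean. The model shape (32 layers, 8 key-value heads, head dimension 128, a 32,768-token context, 16-bit values) is a hypothetical assumption chosen for illustration and is not taken from Google's announcement; the function name kv_cache_bytes is likewise illustrative. The sketch only does the arithmetic and does not implement TurboQuant itself.

```python
def kv_cache_bytes(layers, kv_heads, head_dim, seq_len, batch=1, bytes_per_value=2):
    """Approximate key-value cache size in bytes for a transformer model."""
    # Factor of 2: each layer stores one key tensor and one value tensor.
    return 2 * layers * kv_heads * head_dim * seq_len * batch * bytes_per_value


GIB = 1024 ** 3

# Hypothetical model shape, assumed purely for illustration.
baseline = kv_cache_bytes(layers=32, kv_heads=8, head_dim=128, seq_len=32_768)

# Apply the "at least sixfold" compression figure reported in the article.
compressed = baseline / 6

print(f"16-bit cache:         {baseline / GIB:.2f} GiB")    # 4.00 GiB
print(f"after 6x compression: {compressed / GIB:.2f} GiB")  # 0.67 GiB
```

Under these assumptions, a single long-context request would drop from about 4 GiB of cache to under 1 GiB, which is the kind of saving that investors read as lower memory demand per served request.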

2. Impact on Memory Chip Stocks

Following the TurboQuant announcement, Samsung shares fell nearly 5%, SK Hynix and Kioxia each declined about 6%, and US-listed Sandisk and Micron also experienced downward pressure. Investors reacted to the prospect of lower memory requirements reducing near-term hardware demand.

3. Analyst Perspectives

Analysts cautioned that TurboQuant addresses only inference-phase memory, leaving the memory needs of AI training unchanged, and argued that more powerful models could eventually drive demand for higher-end hardware. Others attributed the sell-off to profit-taking after a sustained rally in cyclical chip stocks.

4. Future Deployment and Limitations

TurboQuant remains a laboratory prototype with no commercial deployment, and it does nothing to reduce the significant memory demands of model training. Detailed results and methodology are scheduled for presentation at ICLR 2026 in April.
