Google introduces TurboQuant: 6x KV cache compression, 8x LLM speedup with zero accuracy loss
Google Research has unveiled TurboQuant, a new quantization algorithm that cuts LLM key-value (KV) cache memory by 6x and accelerates inference by up to 8x without degrading output quality.