Running a large language model is expensive, and a surprising amount of that cost comes down to memory, not computation.