At the time, almost no one was doing this with language models. The only other group exploring RL for LLMs was DeepSeek, the ...
Researchers from Saarland University and the Max Planck Institute for Software Systems have, for the first time, shown that ...