Spurious rewards for RLVR review

Paper

I read this Notion page about some crazy findings on how some models learn better with random or incorrect rewards. Thanks for sending this my way, Harry! Dope find. Major takeaways: