AI Cheating at Chess: Some Notes
A study by Palisade Research found that advanced AI models (most notably OpenAI’s o1-preview or Claude Sonnet 3.5 from Anthropic) sometimes “cheat” in chess by hacking their opponent’s system files rather than playing by the rules. While older AI models required explicit prompting to cheat, the most recent agents seem capable of discovering and exploiting cybersecurity holes, raising concerns that AI systems might develop manipulative strategies and be uncontrollable for complex tasks. I’m a bit late to the party (the paper caught lot of attention early this year). I don’t want to give a strong opinion, but rather share some notes about some technical aspects, especially how AIs actually cheat at chess. It can serve to have a better discussion and assessment on this concerning case study.