This is an automated archive made by the Lemmit Bot.
The original was posted on /r/singularity by /u/FosterKittenPurrs on 2025-01-31 20:22:12+00:00.
I made a simple Python script to play Minesweeper via the CLI, to sidestep any vision issues and test this with all models. Its output looks like this:
Coordinates: x (horizontal) and y (vertical) start at 0
Current board:
0 0 0 1 - - 1 0 0
0 0 0 1 - - 2 0 0
0 0 0 1 3 - 3 1 0
0 0 0 0 1 2 - 1 0
1 1 1 0 0 1 1 1 0
- - 2 0 0 0 0 0 0
- - 3 1 1 0 0 0 0
- - - - 2 1 1 1 1
- - - - - - - - -
Enter action (r x y to reveal, f x y to flag):
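For illustration, here is a minimal sketch of how a prompt like the one above could be parsed. This is not the original script; the function name `parse_action` and the choice to return `None` on invalid input are my own assumptions.

```python
def parse_action(line, width, height):
    """Parse 'r x y' (reveal) or 'f x y' (flag) into (kind, x, y).

    Hypothetical helper, not taken from the original script.
    Returns None for anything malformed or out of bounds.
    """
    parts = line.split()
    if len(parts) != 3 or parts[0] not in ("r", "f"):
        return None
    try:
        x, y = int(parts[1]), int(parts[2])
    except ValueError:
        return None
    # x is horizontal, y is vertical, both starting at 0
    if not (0 <= x < width and 0 <= y < height):
        return None
    return parts[0], x, y


if __name__ == "__main__":
    print(parse_action("r 4 4", 9, 9))
    print(parse_action("f 9 0", 9, 9))
```

Rejecting bad input instead of raising keeps the game loop simple: the script can just re-prompt the model when it gets `None`.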
The script uses the same random seed each time, so the difficulty stays the same. I set it up so that the models' usual first move, r 4 4, reveals a large area, letting them demonstrate actual reasoning rather than guesswork. This board is trivial for anyone who knows how to play.
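A rough sketch of how a fixed seed and a guaranteed-safe opening move could work is below. It is an assumption about the approach, not the original script: the seed value, the mine count of 10, and all function names are made up for illustration.

```python
import random


def neighbours(x, y, width, height):
    """Yield the in-bounds cells adjacent to (x, y)."""
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            nx, ny = x + dx, y + dy
            if (dx or dy) and 0 <= nx < width and 0 <= ny < height:
                yield nx, ny


def make_board(width, height, n_mines, seed, safe):
    """Place mines reproducibly, keeping `safe` and its neighbours clear.

    Clearing the whole 3x3 block makes `safe` a zero cell, so revealing
    it always opens up an area. (Assumed design, not the author's.)
    """
    rng = random.Random(seed)  # private RNG, so the layout depends only on the seed
    sx, sy = safe
    candidates = [(x, y) for y in range(height) for x in range(width)
                  if abs(x - sx) > 1 or abs(y - sy) > 1]
    return set(rng.sample(candidates, n_mines))


def count_adjacent(mines, x, y, width, height):
    """Number of mines touching (x, y)."""
    return sum(cell in mines for cell in neighbours(x, y, width, height))


def reveal(mines, revealed, x, y, width, height):
    """Reveal (x, y), flood-filling through zero cells. Returns False on a mine."""
    if (x, y) in mines:
        return False
    stack = [(x, y)]
    while stack:
        cx, cy = stack.pop()
        if (cx, cy) in revealed:
            continue
        revealed.add((cx, cy))
        # A zero cell has no mined neighbours, so expanding from it is safe
        if count_adjacent(mines, cx, cy, width, height) == 0:
            stack.extend(neighbours(cx, cy, width, height))
    return True


def render(mines, revealed, width, height):
    """Draw the board: counts for revealed cells, '-' for hidden ones."""
    return "\n".join(
        " ".join(str(count_adjacent(mines, x, y, width, height))
                 if (x, y) in revealed else "-"
                 for x in range(width))
        for y in range(height))


if __name__ == "__main__":
    W, H = 9, 9
    mines = make_board(W, H, n_mines=10, seed=1234, safe=(4, 4))  # assumed values
    revealed = set()
    reveal(mines, revealed, 4, 4, W, H)
    print(render(mines, revealed, W, H))
```

Using a dedicated `random.Random(seed)` instance rather than the global generator means every run produces the same layout, which is what makes results comparable across models.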
I tried this with basically all the existing models a few days ago, including the big o1, the big r1, Sonnet 3.5, etc. None of them could solve it.
o3 can do it. That's just the plain o3-mini; I haven't tried o3-mini-high yet.
It might not seem like much, but it shows a clear ability for spatial reasoning that is well beyond any of the others. We kind of already knew this from the ARC stuff, I guess, but I’m impressed to see it in action.
Python script, for anyone curious: