This is an automated archive made by the Lemmit Bot.

The original was posted on /r/singularity by /u/FosterKittenPurrs on 2025-01-31 20:22:12+00:00.


I made a simple Python script for playing Minesweeper via the CLI, to take vision issues out of the picture and test this with all models. Its output looks like this:

Coordinates: x (horizontal) and y (vertical) start at 0

Current board:

0 0 0 1 - - 1 0 0

0 0 0 1 - - 2 0 0

0 0 0 1 3 - 3 1 0

0 0 0 0 1 2 - 1 0

1 1 1 0 0 1 1 1 0

- - 2 0 0 0 0 0 0

- - 3 1 1 0 0 0 0

- - - - 2 1 1 1 1

- - - - - - - - -

Enter action (r x y to reveal, f x y to flag):
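For anyone wondering how a prompt like that might be handled, here is a minimal sketch of parsing the reply. It is not taken from my script: `parse_action` is a hypothetical helper, and the 9x9 default size is simply read off the board shown above.

```python
def parse_action(line, width=9, height=9):
    """Parse 'r x y' (reveal) or 'f x y' (flag) into (action, x, y).

    Hypothetical helper for illustration; raises ValueError on bad input.
    """
    parts = line.split()
    if len(parts) != 3 or parts[0] not in ("r", "f"):
        raise ValueError("expected 'r x y' or 'f x y'")
    # int() itself raises ValueError if a coordinate is not a whole number
    x, y = int(parts[1]), int(parts[2])
    # x is horizontal, y is vertical, both starting at 0
    if not (0 <= x < width and 0 <= y < height):
        raise ValueError("coordinates out of range")
    return parts[0], x, y
```

Rejecting malformed replies with one exception type makes it easy to re-prompt the model instead of crashing the game.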

It uses the same random seed each time, so the difficulty stays the same. I made it so that the models' usual first move, r 4 4, reveals a large area, so they can demonstrate actual reasoning rather than guesswork. This board is trivial for anyone who knows how to play.
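To show what "actual reasoning" means here, this is a rough sketch of the most basic Minesweeper rule applied to the board above: if a number equals the count of hidden cells around it, all of those cells must be mines. The function names are made up for illustration and are not from my script, and the rule as written assumes no flags have been placed yet.

```python
# The example board from this post; '-' marks an unrevealed cell.
BOARD = """\
0 0 0 1 - - 1 0 0
0 0 0 1 - - 2 0 0
0 0 0 1 3 - 3 1 0
0 0 0 0 1 2 - 1 0
1 1 1 0 0 1 1 1 0
- - 2 0 0 0 0 0 0
- - 3 1 1 0 0 0 0
- - - - 2 1 1 1 1
- - - - - - - - -"""


def parse_board(text):
    """Turn the printed board into a list of rows; grid[y][x] is one cell."""
    return [row.split() for row in text.splitlines()]


def hidden_neighbors(grid, x, y):
    """Return (x, y) coordinates of the unrevealed cells adjacent to (x, y)."""
    cells = []
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            nx, ny = x + dx, y + dy
            if (dx or dy) and 0 <= ny < len(grid) and 0 <= nx < len(grid[0]):
                if grid[ny][nx] == "-":
                    cells.append((nx, ny))
    return cells


def certain_mines(grid):
    """Cells that must be mines: number == count of hidden neighbours."""
    mines = set()
    for y, row in enumerate(grid):
        for x, cell in enumerate(row):
            if cell.isdigit() and int(cell) > 0:
                hidden = hidden_neighbors(grid, x, y)
                if len(hidden) == int(cell):
                    mines.update(hidden)
    return mines


print(sorted(certain_mines(parse_board(BOARD))))
```

For example, the 3 at x=4, y=2 touches exactly three hidden cells, (4, 1), (5, 1) and (5, 2), so all three are mines. On this board the rule alone pins down seven mines, and the remaining safe cells follow from those.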

I tried this with basically all existing models a few days ago, including the big o1, the big r1, Sonnet 3.5, etc. None of them could solve it.

o3 can do it. That’s just the plain o3-mini; I haven’t tried o3-mini-high yet.

It might not seem like much, but it shows a clear ability for spatial reasoning that is well beyond any of the others. We kind of already knew this from the ARC stuff, I guess, but I’m impressed to see it in action.

Python script, for anyone curious: