The board uses the same random seed each time, so the difficulty stays the same. I made it so that the models' usual first move, r 4 4, reveals a large area, so they can demonstrate actual reasoning rather than guesswork. This board is trivial for anyone who knows how to play.
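To illustrate the setup described above, here is a minimal sketch, not the script from the post. The board size, mine count, seed value, and every function name are assumptions made up for the example. The 3x3 block around (4, 4) is kept mine-free, which is one simple way to make sure `r 4 4` opens up an area.

```python
import random

# All of these values are hypothetical; the post does not state them.
ROWS, COLS, MINES, SEED = 9, 9, 10, 42
FIRST_MOVE = (4, 4)  # the "r 4 4" opening move


def neighbors(r, c):
    """Yield the in-bounds cells adjacent to (r, c)."""
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            nr, nc = r + dr, c + dc
            if (dr or dc) and 0 <= nr < ROWS and 0 <= nc < COLS:
                yield nr, nc


def make_board(seed=SEED):
    """Place mines with a fixed seed so every run gets the same board."""
    rng = random.Random(seed)
    # Assumption: keep the first move and its neighbours mine-free.
    safe = {FIRST_MOVE, *neighbors(*FIRST_MOVE)}
    cells = [(r, c) for r in range(ROWS) for c in range(COLS)
             if (r, c) not in safe]
    return set(rng.sample(cells, MINES))


def count(mines, r, c):
    """Number of mines adjacent to (r, c)."""
    return sum(n in mines for n in neighbors(r, c))


def reveal(mines, revealed, r, c):
    """Reveal (r, c); flood-fill through zero cells. False means a mine was hit."""
    if (r, c) in mines:
        return False
    stack = [(r, c)]
    while stack:
        cell = stack.pop()
        if cell in revealed:
            continue
        revealed.add(cell)
        if count(mines, *cell) == 0:
            # A zero cell has no adjacent mines, so its neighbours are safe.
            stack.extend(neighbors(*cell))
    return True


def play(mines, revealed, command):
    """Handle a text command of the form 'r <row> <col>'."""
    action, r, c = command.split()
    if action != "r":
        raise ValueError(f"unknown action: {action}")
    return reveal(mines, revealed, int(r), int(c))


def render(mines, revealed):
    """Text view of the board: counts for revealed cells, '#' for hidden ones."""
    return "\n".join(
        " ".join(str(count(mines, r, c)) if (r, c) in revealed else "#"
                 for c in range(COLS))
        for r in range(ROWS))


if __name__ == "__main__":
    mines, revealed = make_board(), set()
    play(mines, revealed, "r 4 4")
    print(render(mines, revealed))
```

Because the generator is seeded, calling `make_board()` twice gives the same mine layout, so every model sees the same puzzle.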

I basically tried this with all existing models a few days ago, including the big o1, the big r1, Sonnet 3.5, etc. None of them could solve it.

o3 can do it. That's just the plain o3-mini; I haven't tried o3-mini-high yet.

It might not seem like much, but it shows a clear ability for spatial reasoning that is well beyond any of the others. We kind of already knew this from the ARC stuff, I guess, but I’m impressed to see it in action.

Python script, for anyone curious:
