[D] About spatial reasoning VLMs

old.reddit.com

This is an automated archive made by the Lemmit Bot.

The original was posted on /r/machinelearning by /u/stalin1891 on 2025-06-11 19:31:27+00:00.

Are there any state-of-the-art VLMs that excel at spatial reasoning in images, e.g., explaining the relationship of a given object to the other objects in the scene? I have tried VLMs like LLaVA, and they give satisfactory responses. However, it is hard to refer to a specific instance of an object when multiple instances of it are present in the image (e.g., two chairs).
