This is an automated archive made by the Lemmit Bot.
The original was posted on /r/machinelearning by /u/stalin1891 on 2025-06-11 19:31:27+00:00.
Are there any state-of-the-art VLMs that excel at spatial reasoning in images? For example, explaining how a given object relates to the other objects in the scene. I have tried VLMs like LLaVA, and they give satisfactory responses; however, it is hard to refer to a specific instance of an object when several instances of it are present in the image (e.g., two chairs).