Lemmit.Online bot to /r/Technology · English · 23 days ago

Study shows that Vision Language Models can't perform rudimentary visual analysis

arxiv.org

  • cross-posted to:
  • machinelearning
Vision Language Models are Biased
arxiv.org
Large language models (LLMs) memorize a vast amount of prior knowledge from the Internet that helps them on downstream tasks but may also notoriously sway their outputs towards wrong or biased answers. In this work, we test how knowledge about popular subjects hurts the accuracy of vision language models (VLMs) on standard, objective visual tasks of counting and identification. We find that state-of-the-art VLMs are strongly biased (e.g., unable to recognize that a 4th stripe has been added to a 3-stripe Adidas logo), scoring an average of 17.05% accuracy in counting (e.g., counting stripes in an Adidas-like logo) across 7 diverse domains, from animals, logos, chess, board games, and optical illusions to patterned grids. Removing image backgrounds nearly doubles accuracy (a gain of 21.09 percentage points), revealing that contextual visual cues trigger these biased responses. Further analysis of VLMs' reasoning patterns shows that counting accuracy initially rises with the number of thinking tokens, reaching ~40%, before declining with excessive reasoning. Our work presents an interesting failure mode in VLMs and a human-supervised automated framework for testing VLM biases. Code and data are available at: vlmsarebiased.github.io.
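For readers unsure how figures like these are typically derived, here is a minimal sketch of scoring counting answers and of the difference between a percentage-point gain and a relative gain. It is not the paper's code: the `Trial` record, its field names, the helper functions, and the toy data are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Trial:
    """One counting question (hypothetical schema, not the paper's)."""
    truth: int   # count actually visible in the modified image
    prior: int   # count the well-known original subject would have
    answer: int  # count reported by the model


def accuracy(trials):
    """Fraction of trials where the answer matches the image."""
    return sum(t.answer == t.truth for t in trials) / len(trials)


def bias_rate(trials):
    """Fraction of trials where the model gave the memorized count
    even though that count is wrong for the image shown."""
    biased = [t for t in trials if t.answer == t.prior and t.prior != t.truth]
    return len(biased) / len(trials)


def pp_gain(before, after):
    """Gain in percentage points between two accuracies given as fractions."""
    return (after - before) * 100


def relative_gain(before, after):
    """Ratio of the new accuracy to the old one (2.0 means 'doubled')."""
    return after / before


# Toy data only: four made-up trials, not results from the study.
trials = [
    Trial(truth=4, prior=3, answer=3),  # biased: answered the familiar count
    Trial(truth=4, prior=3, answer=4),  # correct
    Trial(truth=5, prior=4, answer=4),  # biased
    Trial(truth=6, prior=5, answer=7),  # wrong, but not the memorized count
]

print(f"accuracy:  {accuracy(trials):.2%}")
print(f"bias rate: {bias_rate(trials):.2%}")
print(f"gain: {pp_gain(0.25, 0.5):.2f} pp, {relative_gain(0.25, 0.5):.1f}x")
```

Separating the bias rate from plain error is the point of the sketch: a model can be wrong in many ways, but only answers that match the memorized count indicate the kind of prior-driven failure the abstract describes.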
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/technology by /u/Timely_Smoke324 on 2025-10-13 18:45:14+00:00.
