This is an automated archive made by the Lemmit Bot.
The original was posted on /r/machinelearning by /u/ToThePastMe on 2025-01-15 16:17:30+00:00.
There is this dataset (won't link here as I don't want my Kaggle and Reddit accounts associated) with a few input features (5-6) used to predict one target value.
But one of the features is basically perfectly linearly correlated with the target (>0.99).
An example would be data from a trucking company with a single model of trucks:
Target: truck fuel consumption per year
Features: driver's age, tire type, truck age, DISTANCE TRAVELED per year
Obviously, on average, fuel consumption will be roughly linearly proportional to the number of miles traveled. Normally you'd just use that to derive a new target like fuel / distance.
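A minimal sketch of that re-targeting in pandas, with made-up column names since the actual dataset isn't linked:

```python
import pandas as pd

# Hypothetical column names -- stand-ins for the real dataset's columns.
df = pd.read_csv("trucks.csv")

# Derive an intensity-style target: fuel used per unit of distance.
# This removes the near-perfect linear dependence on distance traveled,
# so the remaining features (driver age, tire type, truck age) actually
# have to explain the variance instead of distance doing all the work.
df["fuel_per_distance"] = df["fuel_per_year"] / df["distance_per_year"]

X = df[["driver_age", "tire_type", "truck_age"]]
y = df["fuel_per_distance"]
```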
Yet not a single person/notebook did this kind of normalization. So everyone's model has >0.99 accuracy, because that one feature drowns out everything else.
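For what it's worth, a one-line correlation check (again with hypothetical column names) would flag the issue immediately:

```python
# Correlation of each numeric feature with the raw target; a value near 1.0
# for distance_per_year is the red flag that any model is mostly just
# relearning fuel ≈ rate * distance.
print(df.corr(numeric_only=True)["fuel_per_year"].sort_values(ascending=False))
```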
Has anyone else noticed this? More and more, the code looks fine (data loading, training many types of models), maybe thanks to LLMs, but the decision-making process is often quite bad.