
The original was posted on /r/datahoarder by /u/The_Bukkake_Ninja on 2024-10-22 21:17:06+00:00.


I have a large volume of backed-up documents, photos and PST files that I have consolidated onto an Unraid server over the years from random discs and portable HDDs.

There are a lot of duplicate files on those drives. I’ve run a Czkawka analysis of the files using a BLAKE3 hash comparison. I followed this guide () on how to configure the comparison, i.e. checking a pre-hash of the first 2 KB of each file to eliminate files that clearly aren’t duplicates, then doing a full hash comparison of the remainder.
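
(For anyone curious, here’s a rough sketch of the two-pass idea I mean, assuming the third-party `blake3` Python package. This is just an illustration, not Czkawka’s actual code; the 2 KB pre-hash size and the chunked reads are my own choices.)

```python
# Two-pass duplicate detection sketch: cheap pre-hash first, full hash second.
# Requires the third-party "blake3" package (pip install blake3).
from collections import defaultdict
from pathlib import Path
import blake3

PREHASH_BYTES = 2 * 1024  # only hash the first 2 KB in the first pass

def prehash(path: Path) -> str:
    with path.open("rb") as f:
        return blake3.blake3(f.read(PREHASH_BYTES)).hexdigest()

def full_hash(path: Path) -> str:
    h = blake3.blake3()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1024 * 1024), b""):
            h.update(chunk)
    return h.hexdigest()

def find_duplicates(root: Path) -> list[list[Path]]:
    files = [p for p in root.rglob("*") if p.is_file()]

    # Pass 1: group by pre-hash; a unique pre-hash can't be a duplicate.
    by_prehash = defaultdict(list)
    for p in files:
        by_prehash[prehash(p)].append(p)

    # Pass 2: full-hash only the files that survived pass 1.
    groups = []
    for candidates in by_prehash.values():
        if len(candidates) < 2:
            continue
        by_full = defaultdict(list)
        for p in candidates:
            by_full[full_hash(p)].append(p)
        groups.extend(g for g in by_full.values() if len(g) > 1)
    return groups
```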

My (probably dumb) question is: is there any chance that a file flagged as a duplicate based on a full BLAKE3 hash comparison is, in fact, not a duplicate? My assumption is that this is basically mathematically impossible, but I wanted to check with people possessing greater expertise before I went and eliminated all but one copy.
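
(For context: BLAKE3 produces a 256-bit digest, so by the birthday bound the chance of any accidental collision among n files is roughly n²/2²⁵⁷, which is effectively zero even for billions of files. Still, if I wanted zero doubt before deleting, a byte-for-byte comparison of each flagged group would settle it. A minimal standard-library sketch, assuming the duplicate groups come from something like the pass above:)

```python
# Belt-and-suspenders check: byte-compare every flagged file against the copy
# being kept before anything is deleted. Standard library only.
import filecmp
from pathlib import Path

def confirm_group(group: list[Path]) -> bool:
    """Return True only if every file in the group is byte-identical to the first."""
    keep, *rest = group
    return all(filecmp.cmp(keep, p, shallow=False) for p in rest)
```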

Apologies if this has been fully answered in another thread - I’ve searched this subreddit, but given how bad Reddit search is, I could have missed it.