Lemmit.Online bot

This is an automated archive made by the Lemmit Bot.

The original was posted on /r/unraid by /u/mrc1600 on 2023-09-19 04:47:59.

I’m really at a loss here, and I don’t really know where to begin.

I’ve had Unraid going strong for a few years now. I used to run my primary desktop experience as a windows VM with GPU passthrough. I self host homeassistant in a VM. My list of Docker containers is quite long and inclusive of SWAG, Authelia, the ARRs and so on. I host a Plex server for my family and a couple of friends. I’m really not new to this, and I’m pretty confident in what I do know, but admit that there’s plenty of things that I don’t know that I don’t know.

Now the issue. I noticed that I was getting some odd notifications from LunaSea (which ties into the ARRs) telling me that Overseer had handled some requests that were quite old. For the most part I brushed it off, but I did notice that Sonarr had some odd items lined up in the activity queue, and had indicated that there was no file in the download folder. I’ve seen this before - usually something to do with permissions - but it was late and I was tired, so I shrugged it off. That was last night, and then today I went to work and never checked in on the server.

Until I got home and things settled down with family and the kids. Then as I’m looking at it, my wife tells me that some episodes of her show were just gone. As she’s telling me this it’s slowly dawning on me. Approximately 80% of my television library, and probably 50% of my movie library, is gone. Just vanished. There are entries in the file history on Radarr and Sonarr that show that sometime around 2:00am, the file was detected as missing on disk and removed from the movie database. I’ve seen 2:28 and 2:38 listed on the few files I’ve checked out, but haven’t had the heart to poke further at the corpse of my library, so I don’t know exactly when the catastrophe started, or how long it lasted.

Looking at my drives, everything is running as expected (though there are a very high number of writes to parity and reads from the array disks that held the media) and there are no hardware errors.

It appears that this was all limited to my /usr/multimedia share. The damage is extended beyond the folders that the ARRs and download clients have access to, though, as I use that share for comics, audiobooks, and so on. The gut punching part is that I had some family photos in that /multimedia share, and they’re affected as well. My other shares seem entirely unaffected, as far as I can tell.

The data loss is random and incomplete. Some TV shows are completely wiped out (the majority, really) while others have just a few files scattered about. I had all 7 Harry Potter movies and Audiobooks, now I’ve got 4 movies and 3 books.

I know the truth here: the data in question is gone, and I’m not asking for some miraculous way to recover it. Unless there’s something I’m missing, the files were deleted from disk and parity was updated. I don’t know how to hit undelete or bring things back out of the trash can before emptying it, so to speak, at the Unraid level. If that’s possible, I’d be overjoyed, but I’m not holding my breath.

What I am hoping for is a solution to prevent something like this from happening again. I guess I had always trusted Unraid to protect my data against a hardware failure, knowing that RAID is not a backup. I hadn’t factored in that the increasing complexity of my server would bring an increased likelihood of user/application error being the cause of data loss. Hell, I’m not even ruling out malicious intrusion, though I can’t seem to find any evidence of that in the logs.

Some more potentially important points.

I’m still having an issue with permissions, I think. I haven’t had the time to look into it yet, but the /multimedia/downloads folder, which is used by the ARRs and SABnzbd and so forth, is missing. Usually if something goes awry there, I do docker safe repair permissions, and the folders will be recreated. I’m stalled on downloading until I get that sorted out.
I run SWAG and host instances of Bitwarden, Apache Guacamole for remote access, a Firefox docker, a few of my media libraries, and a few others. All run with SSL, and are password protected with either native 2FA with Duo when possible (TOTP when not), or Duo via Authelia, when native 2FA is not offered.
One of those that I recently had served was NextCloud. I was so proud that I finally got it up and running that I gave it access to /mnt/usr so that I could use the external storage plugin to have external access to all shares of my server. I installed clients on my laptop, desktop, and phone, and had only used it for a couple of file transfers. Since that was the most recent thing that I had set up (though it’s been running for a few weeks now), out of an abundance of fear I’ve straight up deleted the docker container.
My TLD is managed via CloudFlare, and I’ve enabled some georestrictions there (I know, not much protection, but something) to narrow my attack surface.
My network hardware is Ubiquiti, and apart from their service issues right now, I’ve not been notified of any malicious invasions.
As I write this, I realize that I had given homeassistant SMB access to my /multimedia share ages ago. I did have to restore homeassistant from a backup just last night because there was something chewing through CPU cycles that I couldn’t pin down (probably something with ESPresense I had been working on), but that was done a few hours before the meltdown at 2am. I’ve since deleted the access to the share, since I wasn’t really using it anyway.
Last, and most important, is my self inflicted negligence. For the sake of convenience, I did the above and shared my /mnt/usr folder, which is probably not wise, with a docker container. I also have a rootshare share set up and shared via SMB on my local network. I don’t make use of user accounts for share access. I also do all of my media management in a single share so as to optimize symlinks instead of file transfer between shares. I think I’ll tighten up on the user and SMB share permissions a bit moving forward.
Nothing meaningful happened in the logs at that time, other than a string of “failed parsing crontab for user root” but those have been logged for days now.

So that’s my overlong story. If anybody can provide some advice as to where to look for a culprit, or how to prevent this sort of tragedy from happening again, I’m all ears. And not just for the benefit of my future self, but to serve as a cautionary tale for others regarding any poor practices I may have entertained. Please also let me know if there’s any more valuable information I can provide.

TLDR: Wife tells me an episode from her show is missing. Come to find that files throughout my /multimedia share have been randomly selected for deletion. About 80% of television shows, 50% of movies, the majority of my comics (managed by ARRs and Mylar), as well as random other family photos in that share are just gone, apparently occurring around 2am. Drives are all healthy and spinning normally, though array disk reads and parity writes are understandably high. Can’t find anything of use in any logs (Unraid or docker container), and am just simply at a loss.

Unexplained data loss
