this post was submitted on 24 Sep 2023
41 points (97.7% liked)


Hi, currently I have almost no backups and I want to change that. I have a PC with Nextcloud on a 500gb ssd that I also use for gaming (1tb system drive). Nextcloud would be used to store/sync images, documents, contacts, and calendar from my phone and laptop. I also have an old pc that has 2x 80gb, 120gb, 320gb, and 500gb hdd. I want to use it for other backups like OS snapshots, programming projects, etc., but it's not one big hdd, just a lot of small hdds. Should I store each backup on 2 drives? Can I automate this? Any suggestions would be helpful.

[–] [email protected] 15 points 1 year ago* (last edited 1 year ago) (1 children)

Don't use a synchronized folder as a backup solution (delete a file by mistake on your local replica -> the deletion gets replicated to the server -> you lose both copies).

old pc that has 2x 80gb, 120gb, 320gb, and 500gb hdd

You can make a JBOD array out of that using LVM (add all disks as PVs, create a single VG on top of them, create a single LV on top of that VG, create an ext4 filesystem on that LV, mount it somewhere, and access it over SFTP or another file transfer protocol).
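The LVM steps above can be sketched as a small Python helper that only builds the command list and executes nothing, so it is safe to read and print before touching any disk. The device paths, the volume group name `backupvg`, the logical volume name `backuplv`, and the mount point `/mnt/backup` are all placeholders I made up for illustration; check your real device names with `lsblk` first, since `pvcreate` and `mkfs.ext4` destroy existing data.

```python
from typing import List


def lvm_jbod_commands(
    disks: List[str],
    vg: str = "backupvg",            # hypothetical volume group name
    lv: str = "backuplv",            # hypothetical logical volume name
    mountpoint: str = "/mnt/backup"  # hypothetical mount point
) -> List[List[str]]:
    """Return the commands, in order, that pool `disks` into one ext4 volume.

    Nothing is executed here; the caller decides whether to run them as root.
    """
    if not disks:
        raise ValueError("need at least one disk")
    lv_path = f"/dev/{vg}/{lv}"
    return [
        ["pvcreate", *disks],                          # mark each disk as a PV
        ["vgcreate", vg, *disks],                      # one VG spanning all PVs
        ["lvcreate", "-l", "100%FREE", "-n", lv, vg],  # one LV using all space
        ["mkfs.ext4", lv_path],                        # ext4 on top of the LV
        ["mkdir", "-p", mountpoint],
        ["mount", lv_path, mountpoint],
    ]


if __name__ == "__main__":
    # Example device names only; yours will differ.
    for cmd in lvm_jbod_commands(["/dev/sdb", "/dev/sdc", "/dev/sdd"]):
        print(" ".join(cmd))
```

Printing the commands first and running them by hand is deliberate: a linear LV like this has no redundancy, so one wrong device name or one failed disk takes the whole pool with it.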

But if the disks are old, I wouldn't trust them as reliable backup storage. You can use them to store data that will be backed up somewhere else. Or as an expendable TEMP directory (this is what I do with my old disks).

My advice is to get a large disk for this PC and store backups on that. You don't necessarily need RAID (RAID is a high availability mechanism, not a backup). Set up backup software on this old PC to pull automatic daily backups from your server (and possibly other devices/desktops... personally I don't bother with that. Anything that is not on the server is expendable). I use rsnapshot for that: simple config file, basic deduplication, simple filesystem-backed backups so I can access the files without any special software. It gets the job done. There are a few threads here about backup software recommendations:

In addition I make regular, manual, offsite copies of the backup server's backups/ directory to removable media (stash the drive somewhere where a disaster that destroys the backup server will not also destroy the offsite backup drive).

Prefer pull-based backup strategies, where hosts being backed up do not have write access to the backup server (else a compromised host could alter previous backups).

Monitor correct execution of backups (my simple solution is to have cron create/update a state file after correct execution, and have the netdata agent check the date of last modification of this file. If it has not been modified in the last 24-25hrs, something is wrong and I get an alert).
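The state-file check described above could look roughly like this in Python. This is not the netdata configuration the commenter uses, just a minimal stand-alone sketch of the same idea; the function names and the 25-hour threshold constant are my own assumptions.

```python
import os
import time
from typing import Optional

MAX_AGE_SECONDS = 25 * 3600  # assumed threshold: daily backup plus 1h slack


def mark_backup_ok(state_file: str) -> None:
    """Call this as the last step of a successful backup run."""
    with open(state_file, "a"):
        pass                      # create the file if it does not exist yet
    os.utime(state_file, None)    # set mtime to "now"


def backup_is_stale(
    state_file: str,
    max_age: float = MAX_AGE_SECONDS,
    now: Optional[float] = None,
) -> bool:
    """True if the state file is missing or older than `max_age` seconds."""
    now = time.time() if now is None else now
    try:
        mtime = os.path.getmtime(state_file)
    except FileNotFoundError:
        return True               # never ran successfully: treat as a failure
    return (now - mtime) > max_age
```

A separate cron job or monitoring agent would call `backup_is_stale(...)` and raise an alert when it returns `True`. Treating a missing file as stale means a backup that has never succeeded is reported too.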

[–] [email protected] 2 points 1 year ago (2 children)

Thank you for your detailed response! I will check out JBOD arrays; if that won't work I will probably buy newer, larger disks.

[–] [email protected] 0 points 1 year ago

btrfs has this built in with additional redundancy, so that is by far the better option to combine multiple drives into one large pool.

[–] [email protected] 0 points 1 year ago* (last edited 1 year ago)

JBOD here just means "show me this bunch of old drives as a single drive/partition". It's just a recommendation to at least get something out of these drives, but don't use this as backup storage: these drives are old, and if a single one fails, you lose access to the whole array.

If you're not sure what to do with them, just get a USB/SATA dock or adapter, and treat them as old books: copy not-so-valuable stuff on them, and store them in a bookshelf with labels such as Old movies, Wikipedia dumps 2015-2022...

Definitely get a good, new drive for backup storage. And possibly another one for offsite backups.

[–] [email protected] 14 points 1 year ago* (last edited 1 year ago) (3 children)

I really love Kopia.

I mostly use it for cloud backups but it also works great for local/network storage as well.

It's really fast and efficient, supports cutting edge encryption and compression algorithms and the de-duplication and file-splitting features will let you generate frequent snapshots while costing you minimal storage.

Snapshots are also effortless to mount and it even supports error correction to protect against bit-flipping and other long-term storage risks.

It's also cross-platform and FOSS.

De-duplication prevents duplicate bits of data from being stored twice, even if they appear under different file names or were synced from different systems.

The rolling hash/file-splitting means if you modify a 25GB file and only change a couple MB then only the changed couple MB will need to be stored. This means you can spend a month modifying small parts of a massive file thousands of times and avoid storing a new 25GB file thousands of times to archive those changes.
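To make the rolling-hash and de-duplication idea concrete, here is a toy Python sketch. It is not Kopia's actual implementation or on-disk format; it uses a simple gear-style rolling hash that I picked for brevity, and the names `GEAR`, `MASK`, `split_chunks` and `ChunkStore` are all hypothetical. Chunk boundaries depend only on the most recent bytes, so a small edit changes the chunks around it and leaves the rest identical.

```python
import hashlib
import random
from typing import Dict, Iterator, List

# Deterministic table of random 32-bit values, one per possible byte value.
_table_rng = random.Random(0)
GEAR = [_table_rng.getrandbits(32) for _ in range(256)]
MASK = (1 << 11) - 1  # cut when the low 11 bits are zero: ~2 KiB average chunk


def split_chunks(data: bytes) -> Iterator[bytes]:
    """Content-defined chunking: boundaries depend only on recent bytes."""
    h = 0
    start = 0
    for i, b in enumerate(data):
        # Old bytes are shifted out of the 32-bit state as new ones arrive.
        h = ((h << 1) + GEAR[b]) & 0xFFFFFFFF
        if (h & MASK) == 0:
            yield data[start:i + 1]
            start = i + 1
    if start < len(data):
        yield data[start:]


class ChunkStore:
    """Stores each distinct chunk once, keyed by its SHA-256 digest."""

    def __init__(self) -> None:
        self.blobs: Dict[str, bytes] = {}

    def backup(self, data: bytes) -> List[str]:
        """Store `data` and return a manifest (ordered list of chunk keys)."""
        manifest = []
        for chunk in split_chunks(data):
            key = hashlib.sha256(chunk).hexdigest()
            self.blobs.setdefault(key, chunk)  # already-known chunks cost nothing
            manifest.append(key)
        return manifest

    def restore(self, manifest: List[str]) -> bytes:
        return b"".join(self.blobs[key] for key in manifest)
```

Backing up a file, changing a few bytes in the middle and backing it up again adds only the handful of chunks around the edit to `blobs`, while each manifest still restores its own full version. Real tools add minimum and maximum chunk sizes, compression and encryption on top of this.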

[–] [email protected] 2 points 1 year ago* (last edited 1 year ago)

Can second Kopia! The deduplication works like a charm.

I've recently started using Immich (I previously used Google Photos). And since I've backed up a recent Google Takeout archive (unzipped), backing up all of my images in Immich added just a couple hundred megabytes (over ~200GB of images).

I'm personally using https://www.idrive.com/object-storage-e2/ as the target, but any S3 compatible place and many other targets are possible as well.

Edit: also, don't entirely discount paying for some cloud storage for backups: I never wanted to do that since I wanted to host it myself, but there are multiple reasons to have one of your backup targets be cloud storage (yes, I know I'm in the selfhosted community):

  • it's definitely physically separate
  • most cloud storage has incredibly reliable storage (which is hard to replicate on most home-storage-budgets)
  • the cost can be very low even compared to buying disks (I pay 20$/year for 1TB, which can hold all of my valuable data easily, obviously not my "bulk stuff").
[–] [email protected] 1 points 1 year ago (1 children)

Thanks!

My first thought was: oh no, that's a KDE app

[–] [email protected] 5 points 1 year ago

Haha nope not KDE-related afaik!

Just a great FOSS project.

Did I mention it's also ridiculously fast?

It quite noticeably out-performs any other solution I've tried.

[–] [email protected] 1 points 1 year ago

Kopia sounds nice, thanks! I want to back up my Nextcloud to a Nextcloud of a friend. That should work with Kopia/WebDAV.

[–] [email protected] 7 points 1 year ago (1 children)

How old are these disks? I wouldn't trust anything of value to an HDD that old (better to save it on a bunch of good-quality DVDs or Blu-ray discs than to rely on such old disks).

[–] [email protected] 4 points 1 year ago (2 children)

Around 15 years. Should I buy something like 2x 1tb hdd and raid them together?

[–] [email protected] 2 points 1 year ago (1 children)

If I've learned something about selfhosting and backups it is that you can trust HDDs to spin for 3-5 years and should still do backups. I myself do backups to HDDs that are only powered on for these backups. I'm still not sure if that's enough.

Raid is more for an always-on solution, but not great for safe backups. They still might get damaged at the same time, because you bought them at the same time, from the same vendor and they have the same usage time.

[–] [email protected] 1 points 1 year ago (1 children)

Raid is more for an always-on solution, but not great for safe backups. They still might get damaged at the same time

Yes.

I believe it really depends on the amount of data you write to the disks. From my experience: if you have two disks of the same model and brand with the same powered-on hours, they might fail at the same time and you end up with nothing, so for most people it might not even be worth using RAID at all on a home NAS. Have a main disk that is always online to read from and write to, and a second disk that is turned on once a day to rsync all the data. That is most likely safer and more reliable, and you also get extra protection against accidental deletes.

[–] [email protected] 1 points 1 year ago

These kinds of issues are what drove me to use RaidZ2 (I went overboard and used 6 disks): if a second disk fails during resilvering after a broken disk, it'll still keep the data.

[–] [email protected] 1 points 1 year ago

One thing that RAID doesn't do is verify the integrity of your data on read. In other words: if you have silent data corruption somewhere you won't notice.

For many use cases that's acceptable, since it doesn't happen often, but personally I don't like it for any kind of archival/backups. That's why I picked ZFS, which stores and verifies checksums even on non-mirrored/non-raid storage. I've added RaidZ2 (similar to RAID 6, i.e. RAID 5 with a second parity disk) on top of it to be able to recover from checksum errors.
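ZFS does this per block inside the filesystem, which no script can replicate. As a rough user-space analogue for people on other filesystems, the sketch below records a SHA-256 checksum per file and later reports files whose content no longer matches. The function names are hypothetical, and like plain ZFS without redundancy it only detects corruption; it cannot repair it.

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def file_sha256(path: Path) -> str:
    """Hash a file in 1 MiB blocks so large files don't fill memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for block in iter(lambda: f.read(1 << 20), b""):
            h.update(block)
    return h.hexdigest()


def build_manifest(root: Path) -> Dict[str, str]:
    """Map each file's path (relative to `root`) to its checksum."""
    return {
        str(p.relative_to(root)): file_sha256(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def find_corrupt(root: Path, manifest: Dict[str, str]) -> List[str]:
    """Return relative paths that are missing or whose checksum changed."""
    bad = []
    for rel, expected in manifest.items():
        p = root / rel
        if not p.is_file() or file_sha256(p) != expected:
            bad.append(rel)
    return bad
```

Build the manifest right after a backup finishes, keep it somewhere other than the backup disk, and run `find_corrupt` periodically. A file that was changed on purpose shows up as well, so this fits archives that are not supposed to change.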

[–] [email protected] 3 points 1 year ago* (last edited 1 year ago)

Acronyms, initialisms, abbreviations, contractions, and other phrases which expand to something larger, that I've seen in this thread:

Fewer Letters   More Letters
NAS             Network-Attached Storage
NUC             Next Unit of Computing brand of Intel small computers
RAID            Redundant Array of Independent Disks for mass storage
SATA            Serial AT Attachment interface for mass storage

4 acronyms in this thread; the most compressed thread commented on today has 8 acronyms.

[Thread #163 for this sub, first seen 24th Sep 2023, 18:15]

[–] [email protected] 3 points 1 year ago

Yes, storing each backup in 2+ places is best.

You can of course automate it. If you're running backups from Linux use anacron rather than cron because anacron tries to run when it can (when the machine turns on), whereas cron doesn't run again if the machine was off when it was time to run.

rsync is the most straightforward solution. Pros: it won't copy files again if they haven't changed; it can copy remotely over ssh. Con: it has a bit of a learning curve.
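To show what "won't copy files again if they haven't changed" means in practice: by default rsync skips a file when size and modification time match on both sides. The Python sketch below imitates that quick check for a local directory tree. It is an illustration only, not a replacement for rsync (no deletions, no remote transfer, no delta algorithm), and the function names are my own.

```python
import shutil
from pathlib import Path
from typing import List


def needs_copy(src: Path, dst: Path) -> bool:
    """Quick check: copy if dst is missing or size/mtime differ."""
    if not dst.exists():
        return True
    s, d = src.stat(), dst.stat()
    # Compare whole seconds, since filesystems differ in timestamp precision.
    return s.st_size != d.st_size or int(s.st_mtime) != int(d.st_mtime)


def sync_tree(src_root: Path, dst_root: Path) -> List[str]:
    """Copy new or changed files from src_root to dst_root.

    Returns the relative paths that were actually copied.
    """
    copied = []
    for src in sorted(src_root.rglob("*")):
        if not src.is_file():
            continue
        rel = src.relative_to(src_root)
        dst = dst_root / rel
        if needs_copy(src, dst):
            dst.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(src, dst)  # copy2 also copies the modification time
            copied.append(str(rel))
    return copied
```

Running `sync_tree` twice in a row copies nothing the second time, because `shutil.copy2` preserves the modification time that the next quick check compares. For real backups I'd still reach for rsync itself.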

BorgBackup would be my next recommendation: it takes distinct backups but doesn't duplicate files between them. It has compression, encryption (optional) and you can run checks on the backups. Con: for remote use over ssh, borg has to be installed on the target machine too. Another potential con is that it doesn't store the files in a directly usable format like rsync does. Borg archives are similar to a zip archive – you can list the files, you can extract them, you can even mount them somewhere and then access the files directly – but you can't access them without borg.

[–] [email protected] 2 points 1 year ago

How much data are we talking about? I get confused. Is it 1.5TB or is it 2.5TB?
Then, how backed up do you want to be? Think about whether you REALLY need daily backups. RAID might be cool and flashy, but if you don't need it, running it only creates cost.

If you have about 2 TB of data then I would just buy 3 external HDDs of 2 TB each and replace them every 5 years. Then rotate them around every time you do a backup. Can you automate your backups? Not to the point of you not having to do anything, unless you choose to pay for 2 cloud storage providers and both offer to store your backups in an immutable state.

[–] [email protected] 2 points 1 year ago

I'm sure there are more elegant solutions out there, but here's my method:

I have an inexpensive hard drive dock connected to my NUC home server via USB (with UASP support). I rotate two large-capacity hard drives between work and home, ensuring that one is always off-site. The drives are wholly encrypted, so I manually decrypt and mount the drive, and run a backup script that pulls any changed data from all devices on the network. I then take that drive to work and bring the other one home.

I have a calendar reminder to do this each month, and I'll sometimes run a backup in between the usual schedule when we're working on important projects at home.

[–] [email protected] 0 points 1 year ago

I personally don't use automation, I just have a Veracrypt volume for storing backups and do them manually. Rarely full-system, mostly just home folder.