How to do backups

EXCELLENT INTRO: The Tao of Backup

The goal: avoid bit rot (random bit flips) or worse later down the line.

The main point that I see missing from lots of solutions is that they don't “scrub” your data very often, if at all. By scrub, I mean check every byte (typically done with a checksum) of both the source data and the backups to make sure they aren't silently corrupted over time because you haven't been accessing them.
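
A crude DIY version of scrubbing (just a sketch; the path is a placeholder) is to build a checksum manifest once and re-verify it periodically, which forces every byte to be re-read:

find /path/to/data -type f -exec sha256sum {} + > manifest.sha256
sha256sum -c --quiet manifest.sha256    # later: re-reads everything, prints only mismatches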

Thankfully, disk manufacturers include error-correction data by default, but it only seems to be applied when you actually read the data, although some drives apparently do background scrubbing. So if your backup script doesn't touch every byte of the drive, errors can accumulate and eventually screw you up! Not sure how long that takes exactly; it's a birthday/hash-collision problem. See Collision Probability

Ubuntu scrub solution (check for bad blocks): http://linux-sys-adm.com/how-to-check-for-bad-sectors-ubuntu-linux/
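
That howto presumably uses badblocks; either way, a read-only pass looks roughly like this (device name is a placeholder, double-check it first):

sudo badblocks -sv /dev/sdX    # -s progress, -v verbose; read-only scan for unreadable sectors

Note this only checks that sectors are readable; it doesn't compare contents against a checksum.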

On Windows you can run a full SMART scan with HDDScan, but I might try DiskCheckup next time for a full scan; it seems to give more information and has a better interface.

A good solution should scrub both the source data and the backups regularly, not just copy files.

I like Dropbox so far if you want an automated way to do this for <5-10 GB of data for free. Their 1 TB tier is $8.25/month. Just don't launch it on startup; it likes to scrub all your files before letting you use your computer :-)

Collision Probability

Google says they see about 5 bit errors per hour per 8 GB of RAM (ECC_memory). Wow… hadn't thought about ECC RAM, yikes! They indicated that the errors are correlated with high CPU activity, so the figure probably isn't directly applicable to our hard-drive backup case (although it actually is useful in another sense).

So… pick an error rate, scale it up to the size of your HD, then use the Birthday_problem#Cast_as_a_collision_problem calculation to find how many flips it takes before there's a reasonable probability of a collision. Then make sure we scrub more often than that!

How many years do we need to wait? Assume 5 flips per hour per 8 GB, and stop once there's a 50% probability of two flips landing in the same byte.
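
Rough sketch of the math (my assumptions: a 1 TB drive, and reusing the RAM figure above even though it almost certainly isn't the right rate for disks). With $d$ byte slots and $n$ random flips, the birthday approximation is

$p(n) \approx 1 - e^{-n(n-1)/(2d)}$, so $n_{50\%} \approx \sqrt{2d\ln 2} \approx 1.177\sqrt{d}$.

For $d = 10^{12}$ bytes (1 TB) that gives $n_{50\%} \approx 1.2 \times 10^{6}$ flips. Scaling 5 flips/hr per 8 GB up to 1 TB gives about 625 flips/hr, so $t \approx 1.2 \times 10^{6} / 625 \approx 1900$ hours, roughly 80 days. Under those (shaky) assumptions the answer is months, not years, so scrub at least every couple of months.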

Checksumming File Systems

BTRFS/ZFS/ReFS (Windows) keep a checksum of each file/sector? (not sure which) alongside the data in a custom file system. Apparently rsync doesn't keep checksums, so either keep multiple backups as a final arbiter in case a checksum doesn't match, or use a checksumming file system.
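
On BTRFS the scrub is built in; assuming the backup filesystem is mounted at /mnt/backup (placeholder path), it's roughly:

sudo btrfs scrub start /mnt/backup    # re-reads all data/metadata and verifies checksums
sudo btrfs scrub status /mnt/backup   # progress and error counts

(The ZFS equivalent is zpool scrub <poolname>.)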

Test hard drives long term

disk-filltest and maybe Data Test Program (dt). Other options: memtest86 for memory and mprime for the CPU. Both are included on http://www.ultimatebootcd.com/ and Hiren's BootCD.

disk-filltest

// For writing data (presumably the same invocation without -r, the verify-only flag)
sudo ./disk-filltest-64bit -C /media/nhergert/foo -s 0 -S 100

// For verifying data...
sudo ./disk-filltest-64bit -C /media/nhergert/foo -r -s 0 -S 100

3 months later, no data corruption on 750GB, interesting :-)

Can check S.M.A.R.T. data on Ubuntu using gsmartcontrol. Make sure to run it with sudo :-)
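
Command-line equivalent via smartmontools (device name is a placeholder):

sudo smartctl -t long /dev/sdX    # start the long/thorough self-test
sudo smartctl -a /dev/sdX         # later: dump attributes and self-test results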

Use nohup to log to a file and keep the job running after the ssh session closes. http://unix.stackexchange.com/questions/101529/can-a-shell-script-running-in-a-ssh-continue-to-run-if-the-ssh-instance-closes
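
Roughly like this (log file name is arbitrary; start it from a root shell via sudo -i so there's no password prompt once it's in the background):

nohup ./disk-filltest-64bit -C /media/nhergert/foo -r -s 0 -S 100 > filltest.log 2>&1 &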

Synology

Looks great for a DIY collaboration dropbox.

Unfortunately WebDAV doesn't work over the normal QuickConnect. You need to set up DDNS, which seems doable? https://www.synology.com/en-us/knowledgebase/DSM/tutorial/General/How_to_make_Synology_NAS_accessible_over_the_Internet#t3

Backup Data Reliably

Conclusion: Dropbox at roughly $10/TB/month is probably good. If you want to do it yourself, be sure to check the integrity of all your data often, either with a thorough S.M.A.R.T. self-test or with the BTRFS/ZFS “scrub” features.

Pre-done

Amazon Glacier is currently $0.007/GB/month to store, which works out to about $7/TB/month, in the same ballpark as Dropbox's $10/TB/month.

How do you ensure that the source data is not corrupted? For example, do you checksum all data every time the app is opened? How do you (or anyone) handle an I/O read error from an uncorrectable bad sector? Do you update the sector in the source file if it's corrupted and you have a good backup copy?

“Scrub”: force the hard drive to correct all bits using its ECC, or the file system to correct all bits using checksums/backups, on the source drive.

Optional questions if you know:

Thanks!

Nolan

DIY

rsync --dry-run --checksum --itemize-changes --exclude "*/homerot/*" --exclude "*/551_projects/*" -azR /media/nhergert/Ubuntu/NolanBackup/home/nhergert/DropboxArchive/ /media/nhergert/a5cbb8a7-e29f-4f30-b09f-e1e3bd17746d/home/nhergert/DropboxArchive
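
That's the dry-run/verify pass (--checksum makes rsync re-read and compare file contents instead of trusting size/mtime). The actual copy would presumably be the same command without --dry-run:

rsync --checksum --itemize-changes --exclude "*/homerot/*" --exclude "*/551_projects/*" -azR /media/nhergert/Ubuntu/NolanBackup/home/nhergert/DropboxArchive/ /media/nhergert/a5cbb8a7-e29f-4f30-b09f-e1e3bd17746d/home/nhergert/DropboxArchive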

BTRFS probably

Sync files using rsync (ssh key copied). On write of changed files, rsync will __ (unsure). Periodically run btrfs scrub to check the state of the bits in both backups.
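
Rough sketch of how that could be wired together with cron (paths, schedule, and mount point are my guesses, not the real setup):

# /etc/cron.d/backup (hypothetical)
0 3 * * *  nhergert  rsync -az /home/nhergert/DropboxArchive/ /mnt/backup/DropboxArchive/
0 4 1 * *  root      btrfs scrub start -B /mnt/backup    # monthly checksum scrub; -B waits for completion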

Won't

Bad?

ZFS / BTRFS protect against bit rot, and let you “scrub” the data to verify the checksums.

Ubuntu howto on BTRFS, how to test. Not sure about long-term file-structure reliability yet. But just include a copy of that OS on there and you should be fine…
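
Quick test setup (device and mount point are placeholders; mkfs destroys whatever is on that partition):

sudo mkfs.btrfs /dev/sdX1
sudo mount /dev/sdX1 /mnt/test
sudo btrfs scrub start -B /mnt/test    # should come back clean on a fresh filesystem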

I don't really want to use ZFS, as it's generally 64-bit only and unstable on 32-bit.

Datacenter Admin

Erasure coding is more storage-efficient than RAID mirroring: for example, a 10+4 erasure code stores 1.4x the original data versus 2x for a mirror. https://www.intel.com/content/www/us/en/storage/erasure-code-isa-l-solution-video.html. It just requires some CPU overhead. Can use Ceph, which probably supports Intel's ISA-L acceleration. https://en.wikipedia.org/wiki/Ceph_(software)