Posted on January 28, 2020
For years I have been trying to reduce my dependency on cloud-based services and move to more self-hosted solutions. This can leave you more vulnerable to data loss, since you have to manage the data yourself rather than relying on the large tech companies to do it for you.
Hardware failure, such as a dead hard drive, could lead to you losing all your important data in a self-hosted setup. A similar hardware failure in a cloud provider's data centre would not usually lead to any data loss, due to the sheer scale of the operation and the ongoing data replication that occurs there.
Important data for me consists of personal documents, memories and photos. At the moment, if I were to lose the memories and photos, they would be gone forever, which would be really sad. I don't use social media for documenting life events and memories like a lot of people do these days. I do plan to start printing my photos this year (but I haven't started yet...). Currently this data takes up around 1 TB of disk space, probably 95% of which is photos, and I expect it to double in size in the next two to three years.
If I were to use something like Google Drive as a backup for this, it would cost around €10 per month for 2 TB of storage. The problem is that you could be paying that €10 per month for years and years. I know there are probably cheaper solutions out there, but you will never own the storage. This is one of the reasons I didn't want to use cloud storage as part of my backup solution.
There is a well-known backup strategy referred to as the 3-2-1 backup strategy, which is supposed to give you sufficient protection against data loss. It's pretty simple and can be summarised in three quick points:
- Keep at least 3 copies of your data.
- Store them on at least 2 different types of storage media.
- Keep at least 1 copy offsite.
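The three rules are simple enough to check mechanically. Below is a minimal sketch, assuming a hypothetical `Copy` record that describes each copy of the data by its media type and whether it is kept offsite. It reports which of the 3-2-1 rules a set of copies fails. The inventory in the example is made up for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Copy:
    """One copy of the data set (the primary copy counts as one of the three)."""
    name: str
    media: str      # hypothetical labels, e.g. "internal-hdd", "usb-hdd", "cloud"
    offsite: bool   # True if this copy lives at a different physical location


def check_3_2_1(copies: list[Copy]) -> list[str]:
    """Return the 3-2-1 rules that these copies fail; an empty list means compliant."""
    problems = []
    if len(copies) < 3:
        problems.append(f"need at least 3 copies, found {len(copies)}")
    media_types = {c.media for c in copies}
    if len(media_types) < 2:
        problems.append(f"need at least 2 media types, found {len(media_types)}")
    if not any(c.offsite for c in copies):
        problems.append("need at least 1 offsite copy, found 0")
    return problems


if __name__ == "__main__":
    # Made-up inventory: three copies on two media types, all in the same house.
    before = [
        Copy("primary", "internal-hdd", offsite=False),
        Copy("local backup", "usb-hdd", offsite=False),
        Copy("second local backup", "usb-hdd", offsite=False),
    ]
    print(check_3_2_1(before))  # ['need at least 1 offsite copy, found 0']

    # Adding a Pi with an external drive at a friend's house satisfies the last rule.
    after = before + [Copy("offsite Pi", "usb-hdd", offsite=True)]
    print(check_3_2_1(after))  # []
```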
For a long time, I was missing the last point, which most people easily satisfy with cloud storage. Over the Christmas holidays, I tried to come up with a cost-effective offsite backup solution that would be secure, automated and reliable. I also had to find a friend who would host it for me. My offsite backup server consists of a Raspberry Pi 4 and a 2 TB USB 3.0 external hard drive. While this solution is not ideal, it does get the job done and has a couple of benefits. The main benefit is its small footprint and low power consumption, so it shouldn't really have any impact on my friend. The external HDD is LUKS encrypted, so if someone unplugged the drive from the Pi, they wouldn't be able to read the data.
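For the box to work without anyone typing a passphrase, the Pi has to unlock the LUKS drive on its own. The post doesn't say how that is done; one common approach is a key file kept on the Pi's SD card. The sketch below assumes that approach, and every device path, mapper name, mount point and key file location in it is a hypothetical placeholder. It only builds the `cryptsetup open` and `mount` commands and prints them unless `dry_run=False` is passed.

```python
import shlex
import subprocess

# Hypothetical placeholders, not the real setup.
DEVICE = "/dev/sda1"            # partition on the external USB drive
MAPPER_NAME = "backup"          # appears as /dev/mapper/backup once unlocked
MOUNT_POINT = "/mnt/backup"     # e.g. where the object storage data could live
KEY_FILE = "/root/backup.key"   # key file stored on the Pi's SD card


def unlock_and_mount_commands(device: str, mapper_name: str,
                              mount_point: str, key_file: str) -> list[list[str]]:
    """Build (but do not run) the commands that unlock a LUKS volume with a key file and mount it."""
    return [
        ["cryptsetup", "open", "--type", "luks", "--key-file", key_file, device, mapper_name],
        ["mount", f"/dev/mapper/{mapper_name}", mount_point],
    ]


def run(commands: list[list[str]], dry_run: bool = True) -> None:
    """Print each command (dry run) or execute it, stopping at the first failure."""
    for cmd in commands:
        if dry_run:
            print(shlex.join(cmd))
        else:
            subprocess.run(cmd, check=True)  # needs root on the Pi


if __name__ == "__main__":
    run(unlock_and_mount_commands(DEVICE, MAPPER_NAME, MOUNT_POINT, KEY_FILE))
```

On a standard Linux install, `/etc/crypttab` and `/etc/fstab` are the usual way to make this happen at boot; the script just spells out the two steps. A key file on the SD card still protects a drive that is taken on its own, but anyone who takes the Pi and the drive together gets both the lock and the key.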
This solution would also have to be plug and play, as it would be unfair to ask my friend to troubleshoot any issues with it. The backup server connects to my network using OpenVPN, which means I don't have to make any changes to my friend's network, such as port forwarding. Traffic is routed from my LAN (where the source data resides) back through the OpenVPN server to the offsite backup. Again, this is not ideal and makes performance fairly slow, but the files do eventually get there, which is all I really care about.
At this point I could have just set up a recurring job to run rsync between the two servers, but I wanted support for a wide range of backup clients (Windows machines, iPhones etc.). This led me to ask: "What is one of the most common object storage solutions out there?" Amazon S3, of course. S3 is widely used in enterprise, and the majority of companies delivering backup solutions already support it. That, in a roundabout way, is how I landed on Minio, an open-source, S3-compatible object storage solution that is used in private enterprise clouds. I'm not an expert in Minio at all, but I was very impressed with how easy it was to get running and how it just worked. Compared with my previous experiences with OpenStack Swift object storage, Minio feels simpler and more polished. I would recommend Minio to anyone looking to add object storage capabilities to their homelab. Yes, Minio is probably overkill for most home use cases, but it works very well!
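Because the remote box is meant to be plug and play, it helps to notice quickly when the tunnel or the Pi goes down. Minio exposes an unauthenticated liveness endpoint at `/minio/health/live` that answers HTTP 200 while the server is up, so a probe from the LAN side checks the VPN path and the object store in one go. This is a minimal standard-library sketch; the function name and the timeout are illustrative choices.

```python
import http.client
import urllib.request


def minio_is_live(endpoint: str, timeout: float = 5.0) -> bool:
    """Return True if Minio's liveness endpoint answers HTTP 200 within `timeout` seconds."""
    url = endpoint.rstrip("/") + "/minio/health/live"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (OSError, http.client.HTTPException):
        # OSError covers URLError/HTTPError, refused connections and timeouts
        # (tunnel down, Pi unplugged); HTTPException covers garbled responses.
        return False
```

For example, `minio_is_live("http://10.8.0.2:9000")` would probe a Pi at a hypothetical OpenVPN address on Minio's default API port 9000. Run from a scheduled job on the LAN, a `False` result means either the tunnel or Minio needs attention, without anyone having to log in to the friend's network.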
There were a number of backup software solutions I could have gone for, especially since I now had S3-compatible storage available, but there was one called Duplicati that I had been wanting to try out for ages. Duplicati appealed to me because it was open source and offered a lot of features straight out of the box. It supports multiple backends, so I could add a further backup server down the line. Two of its key features are incremental backups and data deduplication, which will help me save space on my little 2 TB external drive. Duplicati is also great for backing up to public cloud storage, as it can encrypt your files before uploading them. It has a web interface that makes it very easy to check the status of backups or to create new ones. I have a Duplicati container running within my LAN that has access to one copy of the data. This instance encrypts the data and sends it to my offsite backup server through the VPN tunnel.
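Deduplication is what makes repeated backups of a mostly unchanged photo library cheap. The toy below illustrates the general idea of block-level deduplication: files are split into fixed-size blocks, each block is identified by its SHA-256 hash, and a block that has been seen before is not stored again. It is not Duplicati's code or storage format (Duplicati also compresses and encrypts blocks into larger volume files), and the default block size here is an arbitrary assumption.

```python
from __future__ import annotations

import hashlib


class BlockStore:
    """Toy content-addressed store: each unique block is kept once, keyed by its SHA-256 digest."""

    def __init__(self, block_size: int = 100 * 1024) -> None:
        # Illustrative size: smaller blocks find more duplicates but mean more hashes to track.
        self.block_size = block_size
        self.blocks: dict[str, bytes] = {}

    def add_file(self, data: bytes) -> list[str]:
        """Split data into fixed-size blocks, store only unseen ones, return the file's block list."""
        manifest = []
        for offset in range(0, len(data), self.block_size):
            block = data[offset:offset + self.block_size]
            digest = hashlib.sha256(block).hexdigest()
            # A block seen before costs no extra space; the manifest just references it again.
            self.blocks.setdefault(digest, block)
            manifest.append(digest)
        return manifest

    def restore(self, manifest: list[str]) -> bytes:
        """Rebuild a file by concatenating its blocks in order."""
        return b"".join(self.blocks[d] for d in manifest)

    def stored_bytes(self) -> int:
        """Total bytes actually kept after deduplication."""
        return sum(len(b) for b in self.blocks.values())
```

Calling `add_file` a second time with an unchanged file adds nothing to `stored_bytes()`, which is the effect that keeps incremental runs small. One limitation of fixed-size blocks is that inserting bytes near the start of a file shifts every later block boundary; that matters little for photos, which are rarely edited in place.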
All of this gives me another layer of protection against data loss. However, I am thinking of going one step further and having an offline offsite backup, mainly due to the big rise in ransomware attacks over the past couple of years. This would consist of a large external hard drive that would be disconnected most of the time. I generally organise my photos monthly, so I would hope to update this backup monthly too, but if I lost a month or two of photos it wouldn't be the end of the world.
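With photos organised by month, catching up an offline drive mostly means finding the months it doesn't have yet. Below is a minimal sketch, assuming a hypothetical layout of one folder per month named like `2020-01` in both the photo library and the offline drive. It lists the month folders that still need copying.

```python
from __future__ import annotations

import re
from pathlib import Path

# Hypothetical layout: one folder per month, named YYYY-MM (e.g. "2020-01").
MONTH_DIR = re.compile(r"\d{4}-\d{2}")


def month_folders(root: Path) -> set[str]:
    """Names of the YYYY-MM folders directly under root; other entries are ignored.

    Raises FileNotFoundError if root does not exist (e.g. the offline drive is not mounted).
    """
    return {p.name for p in root.iterdir() if p.is_dir() and MONTH_DIR.fullmatch(p.name)}


def months_missing_from_backup(source: Path, backup: Path) -> list[str]:
    """Month folders in the source library but not yet on the offline drive, oldest first."""
    # YYYY-MM strings sort chronologically, so a plain sort gives oldest first.
    return sorted(month_folders(source) - month_folders(backup))
```

Run while the offline drive is briefly connected, something like `months_missing_from_backup(Path("/data/photos"), Path("/media/offline/photos"))` (both paths hypothetical) lists what to copy before unplugging it again. It only compares folder names, so photos added to a month that is already on the drive would not be flagged; a size or checksum comparison would be needed for that.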
If you are looking for more of an overview of my home setup and the rest of my self-hosted services, you can check it out here: https://www.careyscloud.ie/homelab2020