Fundamentals of backup

When starting to work as a group, it’s essential to share files. There are many ways to do it, such as a shared network space, or a cloud solution from which you grab what you need to work. Whichever you choose, you will need backups of everything, and this article focuses on why backup matters. This applies to individuals too, of course.

TL;DR

  • Backup, backup, backup.

TL;DR, extended

  • Backup is a fundamental component of any digital file management because, no matter what, something will go wrong. It could be a hardware failure, a software failure (file corruption, for example) or a human error, but shit happens, even to big companies (or small ones like ours).
  • The 3-2-1 backup rule is the most basic one: 3 copies of each of your files, on at least 2 different media, with at least 1 of them off-site.
  • You need to scale your hardware and solutions to your needs (space, speed, accessibility), as two video productions can have very different space requirements.
  • You can set up different backup scenarios depending on how sensitive your data is. Human-made data is often more important than automatically generated data. For example, a rendered image sequence that can be re-rendered is probably less important than the lighting task file an artist has been working on for many days.
  • Finally, don’t assume it’s working. Check it, set up notifications, and test your restore scenarios. Backups can save your project from failing, or even your company from closing.

No matter what, it’s going to fail

As a teacher, what I always say to my students about saving data is: “there are only 3 rules: backup, backup and backup”. Actually, there are many rules and recommendations about files and backups. And they don’t apply only to companies and studios, but to individuals too.

Data is usually stored on hard drives: complex devices relying on mechanical parts spinning at high speed. And like any mechanical part, they wear out and may break. Of course, there are SSDs and other technologies without moving parts, but they are still rather expensive for large capacities, and they come with their own set of issues.

You should never wonder if it’s going to break: it is going to break. The question is: when? If it’s not a hardware or software problem, it will be human error that destroys your data. So be prepared.
Ask yourself: how much am I going to lose if I break that laptop/pen drive/network directory? If the answer is not “just material”, but years of work or memories lost forever, you should care about backups.

3-2-1

The most common rule is the 3-2-1 backup rule:

  • have 3 copies of any of your files
  • store them on 2 different media
  • at least 1 should be offsite

Three copies really is the minimum. If you lose your main copy and have only one spare, what happens if something else breaks while you are restoring from it? You lose everything.

Storing them on two different media means not keeping all 3 copies on the same hard drive or NAS: if that storage burns, for example, you still lose everything.

Finally, keeping one copy off-site means your copies can’t all be destroyed at once if, sadly, the building burns, the room is flooded, or a thief takes your laptop and your backup hard drives (which I’ve witnessed). A cloud backup can do the job in that case.
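To make the rule concrete, here is a minimal Python sketch that checks a hand-written inventory of copies against 3-2-1. The `Copy` record, its `medium` and `site` labels, and the `check_321` function are hypothetical names for illustration; the script only checks what you declare, it does not inspect any real storage.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Copy:
    """One copy of a dataset, described by hand."""
    name: str
    medium: str  # storage device or media type, e.g. "nas", "usb-hdd", "cloud"
    site: str    # physical location, e.g. "studio", "home", "cloud"


def check_321(copies: list[Copy], main_site: str) -> list[str]:
    """Return the 3-2-1 rules that are broken (an empty list means all are met)."""
    problems = []
    if len(copies) < 3:
        problems.append(f"only {len(copies)} copies, need at least 3")
    if len({c.medium for c in copies}) < 2:
        problems.append("all copies are on the same medium")
    if all(c.site == main_site for c in copies):
        problems.append("no copy is stored off-site")
    return problems


copies = [
    Copy("working files", "nas", "studio"),
    Copy("nightly backup", "usb-hdd", "studio"),
    Copy("weekly backup", "cloud", "cloud"),
]
print(check_321(copies, main_site="studio"))      # → []
print(check_321(copies[:2], main_site="studio"))  # → ['only 2 copies, need at least 3', 'no copy is stored off-site']
```

Keeping the inventory explicit is the point: writing down where each copy lives is often enough to spot that the “off-site” drive has been sitting next to the NAS for a month.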

Shit Happens, Even to Big Shots

Shit happens all the time, and not only to individuals or small companies, but to big ones too. In my previous job at a big international studio, I witnessed human and hardware mistakes at a larger scale. It happens, and that is why there are people whose job is to handle it: relying on luck is not a good business model. That said, few companies talk about the incidents they run into, because it’s embarrassing. But Pixar did, in this little video:


And… it’s happened to us

At some point during the production of the feature film Dilili in Paris, I got a call late on a Sunday night. “Flavio, everything from Dilili is missing from the shared drive,” Damien told me. He was working on an add-on and wanted to test it on real data, so he had connected to the server and discovered the incident. Indeed, 2 TB had vanished during the night from Friday to Saturday, and 15 people would be coming back to work the next day with nothing to do. It took 24 hours of hard work to get everything back (and 5 more days to restore our off-site sync): monitoring closely, comparing and checking, setting priorities and restoring files sequence by sequence. That way, people could work while the backup was restoring the rest. It was a very stressful moment.

A lack of communication (someone had been working on Friday night, noticed something was wrong, but thought it was just a network issue and didn’t mention it) and of notifications (such a disappearance should have triggered alerts… if any had been set up) meant we reacted late. It could have been worse: had we only noticed on Monday, many people would have lost a day of work (or more).
Without a working backup, the company could have shut down that day, unable to recreate months of work.

Keep in mind that data loss can lead to business failure. The Herald Tribune had an article about this 7 years ago, with many examples. Handling data is a huge responsibility, and everyone, with or without IT support, individuals or companies, has to find solutions.

Scale to Your Needs

Not everyone needs a one-petabyte workspace allowing hundreds of users to work at the same time, so think about the needs of your upcoming project.
When we started the company three years ago, we bet that 8 TB (8,000 GB) of shared space would be enough for our part of the feature film Dilili in Paris. Yet we ordered 40 TB of hard drives, because the raw capacity you need is much larger than the usable space. Each backup must be at least as big as the space it protects; otherwise it fills up and stops working. So with 2 backups (to get 3 copies of your files), that is at least 16 TB of drives on top of the first 8. Then, if you use a safer RAID configuration, 8 TB of usable space no longer means 8 TB of drives, but 12 or 16. And you might keep one or two spare disks to react quickly to failures instead of waiting for a delivery.
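The arithmetic above can be written as a small estimate. This is a sketch under simple assumptions: each backup is exactly as big as the usable space, only the primary storage uses RAID, and `raid_factor` (raw capacity divided by usable capacity, for example 1.5 or 2.0 for the 12 or 16 TB cases) and the spare-disk size are inputs you choose. The function name and parameters are hypothetical, not from any real tool.

```python
def raw_capacity_tb(usable_tb: float, backups: int = 2, raid_factor: float = 1.0,
                    spare_disks: int = 0, disk_tb: float = 0.0) -> float:
    """Estimate the total raw disk capacity (TB) to buy for a given usable shared space.

    usable_tb   -- shared space the team actually works in
    backups     -- number of backup copies (2 backups + the working copy = 3 copies)
    raid_factor -- raw/usable ratio of the primary storage (1.0 = no redundancy)
    spare_disks -- drives kept on the shelf to replace a failed one quickly
    disk_tb     -- capacity of one spare drive
    """
    primary = usable_tb * raid_factor  # working space, including RAID redundancy
    backup = backups * usable_tb       # each backup at least as big as what it protects
    spares = spare_disks * disk_tb
    return primary + backup + spares


print(raw_capacity_tb(8))                   # → 24.0 (8 TB + 16 TB of backups)
print(raw_capacity_tb(8, raid_factor=1.5))  # → 28.0 (12 TB of RAID + 16 TB of backups)
```

Real setups often put the backups on RAID too, which pushes the total higher still; the point is that a modest usable space quickly turns into a much larger hardware order.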

So take the time to estimate, based on your previous work and on the information provided for the project, how much space you need to work and how much space you need to back up.

Every project has its own way of using space. For example, if your project has no hair or heavy VFX simulation, you avoid huge caches and bakes. If you are working with live-action footage, storing the raw plates will take a lot of space. If your project is 4K instead of 1080p, every single image can be about 4 times heavier. And are you working in 8, 10 or 16 bits per channel? Or even 32 (do you really need that?)? Those choices strongly affect disk usage (and read/write speed). Finally, are you working alone, or are 30 people (or more) working on the network at the same time? That drifts into another topic, networking, which we’ll skip for now.
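To see how resolution and bit depth multiply storage, the sketch below computes the uncompressed size of a frame and of one minute of footage. It assumes RGB images (3 channels; add one for alpha) at 24 frames per second, and the function names are hypothetical. Real file formats usually compress, and 10-bit data is often stored in 16-bit containers, so treat the numbers as orders of magnitude, not as what your files will actually weigh.

```python
def frame_bytes(width: int, height: int, channels: int = 3, bits_per_channel: int = 8) -> int:
    """Uncompressed size of one frame in bytes."""
    return width * height * channels * bits_per_channel // 8


def footage_gb(seconds: float, fps: int, width: int, height: int,
               channels: int = 3, bits_per_channel: int = 8) -> float:
    """Uncompressed size of a clip in gigabytes (10**9 bytes)."""
    frames = round(seconds * fps)
    return frames * frame_bytes(width, height, channels, bits_per_channel) / 1e9


RESOLUTIONS = {"1080p": (1920, 1080), "4K UHD": (3840, 2160)}

for name, (w, h) in RESOLUTIONS.items():
    for bits in (8, 10, 16, 32):
        mb = frame_bytes(w, h, bits_per_channel=bits) / 1e6
        gb = footage_gb(60, 24, w, h, bits_per_channel=bits)
        print(f"{name:>7} {bits:>2}-bit: {mb:6.1f} MB/frame, {gb:6.1f} GB per minute at 24 fps")
```

Since 3840 × 2160 is exactly 4 × 1920 × 1080, a 4K frame is 4 times a 1080p frame at the same bit depth, and going from 8 to 32 bits multiplies it by 4 again.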

List all these parameters before deciding which solution and hardware to buy and set up, and use those details to adapt and scale the solution to your needs.

Be smart, save humans

Sometimes you can’t afford to keep unlimited versions of every working file, and you have to prioritise what gets backed up. So how do you choose?
I think the answer is pretty straightforward: human work is usually the priority!
In most cases, what humans have made is the most valuable data: save it no matter what. It typically costs the most and is the most annoying and expensive to redo.
On the other hand, what is produced by computer scripts or processes, like renders, caches and bakes, can be redone without losing too much human time. So you can apply a different backup scenario to such data.

Sit down with the project supervisors and list priorities, work times and render times, and evaluate what you could sacrifice if you need to. Many strategies can apply depending on the project.

For example, you can have a dedicated space for renders that is not backed up until the shot is validated and you save and back up your master for delivery. Or maybe the render times are so long that you cannot afford to re-render everything, in which case renders become a priority too.
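One way to encode such a policy is a function that sorts paths into backup tiers. This sketch assumes a hypothetical project tree of the form `<project>/<shot>/<kind>/...` and hypothetical folder names for machine-generated data (`renders`, `cache`, `bakes`); your real layout and tier names will differ.

```python
from collections.abc import Collection
from pathlib import PurePosixPath

# Hypothetical names of folders holding machine-generated data.
GENERATED_DIRS = {"renders", "cache", "bakes"}


def backup_tier(path: str, validated_shots: Collection[str] = ()) -> str:
    """Classify a relative path like 'film/sh010/renders/v003/f0001.exr'.

    Returns:
      "always"            human-made work: back it up no matter what
      "validated-render"  generated data for a validated shot: back it up for delivery
      "regenerable"       generated data that can be redone: not backed up for now
    """
    parts = PurePosixPath(path).parts
    if not GENERATED_DIRS.intersection(parts):
        return "always"
    shot = parts[1] if len(parts) > 1 else ""  # assumes the <project>/<shot>/... layout
    return "validated-render" if shot in validated_shots else "regenerable"


print(backup_tier("film/sh010/lighting/sh010_lighting_v012.blend"))  # → always
print(backup_tier("film/sh010/renders/v003/f0001.exr"))              # → regenerable
print(backup_tier("film/sh010/renders/v003/f0001.exr", {"sh010"}))   # → validated-render
```

If render times are so long that renders cannot realistically be redone, the policy flips: you would simply stop treating `renders` as regenerable.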

On our most recent project, we decided to keep only one version of the lighting and compositing image sequences: they take up a lot of space, we were running short of it, and we could re-render them quickly (render times were very low on that project because of the technique used). The downside is that it limits the ability to compare versions.
When I was working on Despicable Me, I wrote tools that kept only the last 3 versions of the rendered lighting and compositing image sequences and deleted older ones. Even there, on a 70-million-euro project, we had to make choices, because we couldn’t afford to save everything.
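A retention rule like “keep the last 3 versions” fits in a few lines. This is not the tool mentioned above, just a minimal Python sketch assuming each version of a sequence lives in its own folder named `v001`, `v002` and so on under one render directory (a hypothetical layout). It defaults to a dry run so you can review what would be deleted before anything is removed.

```python
import re
import shutil
from pathlib import Path

# Hypothetical naming scheme: one folder per version, v001, v002, ...
VERSION_DIR = re.compile(r"v(\d+)")


def prune_versions(render_dir: Path, keep: int = 3, dry_run: bool = True) -> list[Path]:
    """Remove all but the `keep` highest-numbered version folders in render_dir.

    Returns the folders deleted (or that would be deleted when dry_run is True).
    Folders that don't match the naming scheme are never touched.
    """
    versions = sorted(
        (p for p in render_dir.iterdir() if p.is_dir() and VERSION_DIR.fullmatch(p.name)),
        key=lambda p: int(VERSION_DIR.fullmatch(p.name).group(1)),  # numeric, so v10 > v9
    )
    # versions[:-0] would be empty, so keep == 0 is handled explicitly.
    old = versions[:-keep] if keep > 0 else versions
    if not dry_run:
        for folder in old:
            shutil.rmtree(folder)
    return old
```

Sorting on the parsed number rather than the folder name avoids deleting the wrong versions if numbering ever outgrows its zero-padding.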

Assumption Is the Mother of All Fuck Ups

Work out your scenarios and procedures; there are many solutions depending on the situation. But once you have set up a solution, a backup rhythm, methods and schedules, stick to them! Never deviate from a procedure: that is when you can lose big.

Imagine two backups, A and B. While B is off-site, A stays on-site and is synced for one week. Then A is taken off-site for the next week, while B comes back on-site to be synced until the following swap. There is always an off-site copy, and you lose at most a week of work if something goes wrong on site.
But then a human has to stick to the procedure and make the swap every week. What if they go on vacation or get sick? Consider every situation while working out your scenarios.
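The rotation can be computed from a calendar, which helps whoever stands in for the usual person. A sketch, assuming a hypothetical start date (a Monday when drive A was first synced on-site) and a strict 7-day swap; `swap_overdue` flags the case where the swap was skipped, for instance because someone was on vacation.

```python
import datetime as dt

# Hypothetical: the Monday when drive A was first synced on-site.
ROTATION_START = dt.date(2024, 1, 1)


def drives_for(day: dt.date) -> tuple[str, str]:
    """Return (on_site, off_site) drive labels for a given day, swapping every 7 days."""
    weeks = (day - ROTATION_START).days // 7
    return ("A", "B") if weeks % 2 == 0 else ("B", "A")


def swap_overdue(last_swap: dt.date, today: dt.date, period_days: int = 7) -> bool:
    """True if the weekly swap has not been done on time."""
    return (today - last_swap).days > period_days


on_site, off_site = drives_for(dt.date.today())
print(f"Today drive {on_site} should be syncing on-site; drive {off_site} should be off-site.")
```

Counting weeks from a fixed date, rather than using calendar week numbers, keeps the alternation regular across years that have 53 ISO weeks.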

If you skip your backup task one week, you’ll likely forget the next one too, and from there, consider your backup useless. The best way is to automate as much as you can, so nobody has to remember to run the copy or sync. For example, you can set up a nightly task that runs when no one is working on the network. Then monitor it: check it, or set up notifications, to be sure everything is running fine.
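Monitoring can start very simply. The sketch below raises two kinds of alert: the last successful run is too old, or the protected data suddenly shrank, like the 2 TB that vanished in the Dilili incident. The function names and thresholds (`max_age` of 26 hours for a nightly job, `max_shrink` of 10%) are hypothetical examples; how you record the last success time and the previous size, and how you deliver alerts (email, chat and so on), is left to your own setup.

```python
import datetime as dt
from pathlib import Path


def tree_bytes(root: Path) -> int:
    """Total size in bytes of all regular files under root."""
    return sum(p.stat().st_size for p in root.rglob("*") if p.is_file())


def backup_alerts(last_success: dt.datetime, now: dt.datetime,
                  current_bytes: int, previous_bytes: int,
                  max_age: dt.timedelta = dt.timedelta(hours=26),
                  max_shrink: float = 0.10) -> list[str]:
    """Return human-readable alerts; an empty list means nothing looks wrong.

    Run the size check on the source before the nightly sync, so a mass
    deletion is caught before it gets mirrored into the backup.
    """
    alerts = []
    age = now - last_success
    if age > max_age:
        alerts.append(f"last successful backup finished {age} ago")
    if previous_bytes > 0:
        shrink = (previous_bytes - current_bytes) / previous_bytes
        if shrink > max_shrink:
            alerts.append(f"protected data shrank by {shrink:.0%} since the previous run")
    return alerts
```

Silence is the dangerous state here: the alert function only helps if something also tells you when the check itself stopped running.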

Seven years ago I lost an entire web server, and 10 websites with it, because the backup had not been working properly for months and I hadn’t noticed. Friends of mine lost their entire blog, and I felt terrible. Don’t assume it’s working; check that it is. Assumption is the mother of all fuck ups.

And even if the backups are running well, document your restore scenarios and try them before you actually need them. You might avoid (big) surprises…
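A restore drill can also be scripted. This Python sketch assumes the backup is a plain file-for-file mirror of the source (as with a simple sync). It “restores” a random sample of files into a scratch folder with `shutil.copy2`, standing in for whatever restore procedure your backup tool really uses, then compares SHA-256 hashes with the originals. `restore_drill` is a hypothetical name, and files modified since the last backup will also be reported, so run it right after a sync or treat those reports as items to check.

```python
import hashlib
import random
import shutil
from pathlib import Path


def sha256(path: Path) -> str:
    """Hash a file in 1 MiB chunks so large images don't need to fit in memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def restore_drill(source: Path, backup: Path, scratch: Path, sample: int = 20, seed=None) -> list[str]:
    """Restore a random sample of files from `backup` into `scratch` and compare them
    with the originals in `source`. Returns a list of problems (empty means success)."""
    files = sorted(p.relative_to(source) for p in source.rglob("*") if p.is_file())
    picked = random.Random(seed).sample(files, min(sample, len(files)))
    problems = []
    for rel in picked:
        backed_up = backup / rel
        if not backed_up.is_file():
            problems.append(f"missing from backup: {rel}")
            continue
        restored = scratch / rel
        restored.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(backed_up, restored)  # stand-in for your real restore procedure
        if sha256(restored) != sha256(source / rel):
            problems.append(f"content differs: {rel}")
    return problems
```

Sampling from the source rather than from the backup is deliberate: it is the only way to notice files that never made it into the backup at all.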

This is it for the essentials of backups. We may have forgotten things; if so, please tell us. Or maybe you’d like to share your own experiences? The comments are open, feel free to react.
In an upcoming article, we will explain in detail the hardware and setup we started the company with, and other solutions you might use.

Thanks for reading!
