Four simple rules for digital preservation!
by Heidi Winkler, Digital Preservation Librarian and Scholarly Communication Team member.
April 23-29, 2017 marks the American Library Association’s annual Preservation Week, a celebration of all things library preservation.
These are rules that I use in my work at the institutional level, and they are concepts that will take you far in the work you do on a personal level. These concepts are particularly vital for works-in-progress.
It would break my heart if you lost your only copy of your dissertation the day before it was due because of a hard drive crash, and I’m sure it would break yours, too. Therefore, keep all of this in mind whether you take part in formal open access efforts or not.
1. Have Multiple Copies in Multiple Locations
When I say “multiple copies,” I do mean multiple copies and not just “backing up” an item. You need to consciously create another copy of the work to be put in another geographic location. It is best practice to use common file formats like Microsoft Word, PDF, or JPEG. If you want to go gung ho in your preservation efforts, the Library of Congress has a list of Recommended File Formats.
It’s important to get the copies off your computer. Use a flash drive, external hard drive, CD/DVD, a cloud service, etc. And mix it up a bit – save something in Dropbox, email it to yourself, and also save it to a thumb drive.
Store the copies in physically separate locations. (And I don’t just mean keeping your laptop in your bedroom and your external hard drive on the coffee table in the living room.) Other locations mean: your office on campus, cloud services, servers in other locations, etc. Basically, just ensure that if a natural disaster hits the main place where you keep important digital documents and materials, those items are safe because there’s a copy somewhere else.
It is important to note, though, that simply uploading an item to a website is not digital preservation in and of itself. Just like the items it contains, the internet is fragile. If you’ve ever encountered even one “404” page, you know that links rot and expire. I highly recommend maintaining both an on- and off-line presence for your digital items.
2. Give Those Copies Good Names
Most of us probably are not very good at giving files names that are designed to last for the long term, and why should we? Most of the time we are the only ones who will see these files, and certainly we’ll understand our own codes to ourselves forever, right?
Scenario 1: If “blergh.docx” is my journal article, will I remember that in 15 years, long after publication and long after I’ve had lots of other way more important things to remember? No. (For example, the Libraries have alumni who can’t remember under which first or last name they submitted their thesis or dissertation.)
Scenario 2: Let’s say someone has my computer and needs to open my article, but I’m not there to tell them that it’s saved under “blergh.docx”. How would they ever be able to find it without a lot of detective work? Also, others could delete it by accident, having no idea they just deleted months’ worth of work, just because the file name meant nothing to them.
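To avoid both scenarios, the file name itself can say whose work it is, what it is, and when it was saved. Here is a minimal Python sketch of one possible naming pattern. The function name, the fields, and the order of the fields are my own assumptions for illustration, not an official standard.

```python
import re
from datetime import date


def descriptive_name(author: str, title: str, kind: str, when: date, extension: str) -> str:
    """Build a file name that explains itself (hypothetical pattern, adjust to taste)."""

    def slug(text: str) -> str:
        # Lowercase the text and replace anything that is not a letter or digit with a hyphen
        return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")

    # author_title_kind_date.extension, with the date in sortable YYYY-MM-DD form
    return f"{slug(author)}_{slug(title)}_{slug(kind)}_{when.isoformat()}.{extension.lstrip('.')}"


print(descriptive_name("Lastname", "Four Simple Rules", "article draft", date(2017, 4, 23), "docx"))
# prints lastname_four-simple-rules_article-draft_2017-04-23.docx
```

A name like this still makes sense to a stranger, or to you in 15 years, in a way that “blergh.docx” never will.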
3. Check on the Copies Regularly
The technical term for making sure files are still usable is “file fixity”: confirming that a digital object has not been corrupted or altered. You can tear a piece of paper and still be able to read and use the content on that paper. But if even one bit of a digital file becomes corrupted, you might never be able to open that file again.
Check for file fixity when you first move a copy to a new location, to make sure it copied correctly. Then open those copies at least once a year to make sure they still open. If you have a website, that’s also a good time to check for broken links.
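If you are comfortable with a little scripting, the “check it when you first copy it” step can be automated. The Python sketch below is one way to do it, using only the standard library. The function name copy_and_verify and the folder names in the usage note are hypothetical.

```python
import filecmp
import shutil
from pathlib import Path


def copy_and_verify(source: Path, destination_dir: Path) -> Path:
    """Copy a file into destination_dir and confirm the copy matches the original."""
    destination_dir.mkdir(parents=True, exist_ok=True)
    copy_path = destination_dir / source.name
    # copy2 copies the contents and, where possible, the timestamps
    shutil.copy2(source, copy_path)
    # shallow=False compares the actual contents, not just the size and date
    if not filecmp.cmp(source, copy_path, shallow=False):
        raise IOError(f"Copy at {copy_path} does not match {source}")
    return copy_path
```

You might call it as copy_and_verify(Path("dissertation.docx"), Path("E:/backups")), where both paths are placeholders for your own file and your own external drive. This only confirms the copy was good on the day you made it, so the yearly check still matters.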
And if you’re feeling especially hardcore about file fixity, download a checksum tool. There are plenty of good open source ones online. A checksum is a short string of letters and numbers calculated from the contents of a digital item. If you re-calculate it later and find that the value has changed, then you know that the item has been altered as well.
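To show what a checksum tool is doing under the hood, here is a small Python sketch that uses the standard hashlib module. I have assumed SHA-256 as the checksum type, which is one common choice among several. The function names are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1024 * 1024) -> str:
    """Return the SHA-256 checksum of a file as a string of hexadecimal characters."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # Read the file in chunks so that large files do not fill up memory
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def unchanged(path: Path, recorded_checksum: str) -> bool:
    """Compare the file's current checksum with the one recorded earlier."""
    return sha256_of(path) == recorded_checksum
```

The idea is to record the result of sha256_of when you make a copy, keep that value somewhere safe, and run unchanged against it at your yearly check. A result of False means the file's contents are no longer what you originally saved.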
4. Be Ready to Migrate the Copies to New Media
Finally, file migration. Your CDs, DVDs, thumb drives, and hard drives will break down over time, especially if you’re not actively using them. Plan to create new copies on different media every five years, at the least. Also be on the lookout for obsolescence. Think of floppy disks or VHS tapes; those are media that are basically obsolete. It is getting harder to find players for them, and when you do find them, they may not work anymore. If you’re not actively using these media, they’re deteriorating.
A catchier way of thinking about this is, “If it’s not running, it’s rotting!” And one day, our flash drives and DVDs will join those ranks of obsolete media. This threat is where multiple cloud storage providers come in handy. When you put something in the cloud, you’re putting it on servers that the provider actively maintains and replaces as hardware ages. That is often better than relying on fixed media alone. That said, third-party companies rise and fall every day, so don’t put all your digital eggs in one internet storage provider’s basket.
No one “digital preservation solution” exists. You can’t just download some program once and be done with it. Digital preservation is a series of steps that you need to perform on a regular basis. Preservation is an investment of time and resources from the very beginning of a project. But it’s absolutely worth it for those items that you want to keep long-term or those items that define your personal brand. If you can integrate even just one of the above rules into your daily work, you’re improving your chances for a better and hopefully more open digital future.