Hey again! This is the last post of a three-part blog post series. I strongly suggest you go back to Parts 1 and 2 to get some context. Part 1 explains what digital preservation is and why it is important. Part 2 explains what digital preservation storage is and provides information about storage standards, guidelines, and best practices.
What are the options for digital preservation storage implementation?
Now we’re getting into the nitty-gritty! This section [1] describes some of the storage infrastructure options for digital preservation.
Where to store?
At a high level, storage options can be broken down into two categories:
- Locally managed or “on-premise” (aka “on-prem” or “on-site”) storage
- Cloud or “outsourced” storage, e.g., Amazon Web Services, Microsoft Azure.
Based on an institution’s requirements, technical infrastructure, and resources, one or both options may be feasible. Before choosing one solution over another, compare the features of each against your need to manage digital collections over the long term. [2]
Decisions about what type of storage works best for an institution should be influenced by the following factors:
- The level of reliability or “uptime” required. Do you need immediate access to your digital masters, or can there be delays of minutes or hours in retrieving them?
- The number and types of users that need access to the content. Who will take responsibility for managing the digital content: digital collections managers only, the entire archives staff, or someone else? Will a version of the content also be publicly accessible?
- Types and amount of digital content. How much storage do you need? At what rate will it grow?
- Redundancy. Is an institution capable of safely managing two or more copies of its digital content locally, or must it rely on cloud storage?
Considering these issues alongside best practices such as those in the NDSA Levels of Digital Preservation, [3] the level of effort required, and the resources in place to support them will help an institution identify the best storage options for its situation. My suggestion: create a document or spreadsheet, gather the right people in a room, hash out some answers, and write them down. It’ll be a lot easier to take your requests to your administration or funders (because you’ll need, you know, money to buy this stuff) if you’ve got a team behind you and you’ve documented their decisions.
There are a variety of storage media options available for local digital preservation storage. Some are widely accepted as preservation-appropriate, while others are recognized as problematic due to their susceptibility to failure and obsolescence.
Removable media, such as portable hard drives, portable flash drives, or CDs and DVDs, are not considered viable as part of an overall preservation strategy. These media are highly susceptible to failure from degradation of the components that comprise them. The reality is that files are often stored on or burned to them and then the media itself is filed away or stored in boxes and not actively monitored. These media are also resource intensive to monitor for errors; human intervention is required to plug in a flash drive or play DVDs to identify errors in the files stored on them.
Two predominant storage media are considered good alternatives for digital preservation management: what are colloquially called “spinning disks” and “magnetic tape.” These are by no means the only options out there, but the information below gives some idea of what is available for managing digital content.
Spinning disk storage, part of an IT-managed networked storage environment, is commonly used for digital preservation storage. Spinning disk storage has quick response times (low latency [4]) and allows for active monitoring, such as fixity monitoring, to take place. This type of storage is often highest in cost because the media is expensive, the servers are always “on” and must be maintained in an environmentally controlled and secure area, and staff must be available to keep the servers up and running and to actively monitor the data on them.
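To make “fixity monitoring” a little more concrete, here is a minimal sketch in Python of what such a check might look like. It assumes SHA-256 checksums and a simple in-memory manifest; the function names (`sha256_of`, `build_manifest`, `check_fixity`) are made up for illustration and are not taken from any particular preservation tool.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1024 * 1024) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Record a checksum for every file under root, keyed by relative path."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def check_fixity(root: Path, manifest: dict[str, str]) -> dict[str, list[str]]:
    """Compare the files under root against a previously stored manifest."""
    current = build_manifest(root)
    return {
        # Files whose content no longer matches the recorded checksum.
        "changed": sorted(
            k for k in manifest if k in current and current[k] != manifest[k]
        ),
        # Files listed in the manifest that are no longer present.
        "missing": sorted(k for k in manifest if k not in current),
        # Files present now that were never recorded in the manifest.
        "unexpected": sorted(k for k in current if k not in manifest),
    }
```

In practice you would build the manifest once when content is ingested, keep it somewhere other than the storage it describes, and rerun the check on a schedule. On always-on disk storage this can run unattended, which is the advantage described above.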
Magnetic data tape, most commonly “LTO tape,” is typically used for either nearline [5] or offline [6] storage. Magnetic tape media is less expensive than spinning disk or other low latency storage options, and the cost of managing it over time is greatly reduced, especially for offline storage. As with other removable media, its mediated nature slows both access and preservation activities such as active fixity monitoring. However, magnetic data tape is much more reliable and far less prone to failure than portable drives and optical disc media such as CDs and DVDs. Magnetic data tape can be stored in tape libraries and loaded into tape robots, which can provide some automation for access and preservation activities.
Here’s the key point: Media that enables digital collections managers to actively monitor the health of their collections is always the best choice when deciding on storage options. Luckily, at the time of writing, this type of storage also tends to be the most prevalent. Whatever storage you choose, though, always back up your data at least once, and ideally twice or more (for three or more copies in total).
Each type of storage has its own financial and organizational implications, and each institution will need to weigh the factors above to come up with a solution that best suits their needs. In some cases, it will not be an either/or decision but a solution that uses both types of storage to their best effect for the institution’s unique situation.
For example, one institution might have a mandate to maintain all collections, whether digital or physical, onsite. In this case, they may opt for a local-only storage solution. Another institution might not have the infrastructure and staff to manage collections onsite, due to costs or personnel restrictions, and may opt instead for cloud storage (from Amazon, Microsoft, Google, etc.) or a provider like Preservica, Libnova, Ex Libris, Arkivum, or DuraCloud that offers a set of preservation services in addition to cloud storage. And, as is more and more often the case, an organization might opt for a hybrid approach. In this case, they may choose to keep a single online copy on local storage so they have quick access to files when they need them. Secondary and tertiary copies may be stored locally on online or nearline storage or in the cloud. Often, yet another copy is stored on magnetic data tape (such as LTO) in a different geographic location. These second and third backup copies tend to be versions of files that do not need to be accessed readily except for periodic fixity monitoring. This hybrid approach is an excellent way to (a) alleviate single points of technology failure by distributing content across storage solutions and (b) distribute content across geographically diverse locations.
An easy mnemonic for basic digital preservation storage requirements is 3-2-1, which means:
- Three copies of your data,
- On two different media types,
- In more than one geographic location.
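If you keep an inventory of where each copy lives, the 3-2-1 rule is easy to check automatically. The sketch below is one way to do it, under some assumptions of mine: the `StoredCopy` record and the media and location labels are hypothetical, and “more than one geographic location” is read as at least two distinct location labels.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StoredCopy:
    """One copy of a collection (hypothetical record for illustration)."""
    media: str     # e.g. "disk", "lto_tape", "cloud"
    location: str  # e.g. a city, campus, or cloud region


def meets_3_2_1(copies: list[StoredCopy]) -> bool:
    """Return True if the copies satisfy the 3-2-1 rule."""
    media_types = {c.media for c in copies}
    locations = {c.location for c in copies}
    return len(copies) >= 3 and len(media_types) >= 2 and len(locations) >= 2


# Example: the hybrid approach described above.
hybrid = [
    StoredCopy(media="disk", location="main campus"),
    StoredCopy(media="cloud", location="provider region A"),
    StoredCopy(media="lto_tape", location="offsite vault"),
]
print(meets_3_2_1(hybrid))
```

Note that three copies on the same kind of disk in the same server room would fail this check, which is exactly the situation the mnemonic is meant to catch.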
Now it is your turn to get your organization ready to answer the following questions to prepare for digital preservation storage. Again, copy these questions to a spreadsheet or document, get your team together (or, if you are a team of one, get comfortable), and discuss and document your answers!
- In your organization, what constitutes a digital asset that requires long-term digital preservation?
- What assets do you have now in digital form?
- Do you store multiple versions of differing quality of a single digital asset? If so, describe. E.g., access and master copies.
- What file types are you acquiring or creating through digitization? E.g., TIF, WAV, MP3.
- What plans do you have for further digitization or acquisition?
- How much (in GB, TB, or PB and total number of files) digital content do you have now?
- What is your estimated growth rate on an annual basis? (A ballpark figure is fine if that is all you have.)
- On what server(s) or other storage media are digital assets located now? List them all.
- How are you tracking what you have and where it is located? E.g., inventory, database.
- How often do you access original digital master assets? E.g., all the time, rarely ever because we have access copies.
- Do you have the server space to manage multiple copies of your assets on-site, or does it make sense to take advantage of the flexibility that cloud storage offers?
- Do you have the staff to manage the on-site storage, or is the cloud a more feasible option?
- Are there any other organizational requirements or constraints that need to be considered for the selection of digital storage? E.g., funding, administrative buy-in.
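Once you have a current size and a ballpark growth rate, a quick calculation can turn them into a storage figure to bring to your administration or funders. This is a rough sketch only: it assumes growth compounds annually at a steady rate and that every copy is a full copy, and the function name `projected_storage_tb` is made up for this example.

```python
def projected_storage_tb(
    current_tb: float,
    annual_growth_rate: float,
    years: int,
    copies: int = 3,
) -> float:
    """Estimate total storage needed after a number of years.

    Assumes compound annual growth (0.20 means 20% per year) and that
    each of the `copies` is a complete copy of the collection.
    """
    if current_tb < 0 or years < 0 or copies < 1:
        raise ValueError("current_tb and years must be >= 0, copies >= 1")
    collection_tb = current_tb * (1 + annual_growth_rate) ** years
    return collection_tb * copies


# Example: 10 TB today, growing about 20% a year, three copies, five years out.
estimate = projected_storage_tb(10, 0.20, years=5, copies=3)
print(f"Plan for roughly {estimate:.1f} TB")
```

With those example numbers the estimate works out to roughly 75 TB across all three copies. Your real growth will be lumpier than this, so treat the result as a starting point for discussion, not a purchase order.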
Once you have answers to these questions, you can begin to make decisions about what storage makes the most sense for your organization. Don’t worry if you don’t have a lot of storage options today. Remember that “perfect is the enemy of the good.” Good enough = OK for now. If you keep improving on good enough, even incrementally, eventually you’ll hit really good, and then awesome. Maybe you’ll never hit perfection, but as long as you keep working on it, you’ll soon have a better-than-good-enough digital preservation storage situation.
Also, check out “Chapter 4: Managing digital audiovisual collections” in Fundamentals of AV Preservation, written by (me and other) AVPeeps for the NEDCC.
Have more questions? Hit us up.
[1] This section is taken from Amy Rudersdorf’s chapter “Managing Digital Collections: Section 3: Storage Infrastructure,” part of the NEDCC’s Fundamentals of AV Preservation textbook.
[2] First steps can include a review of this comparison table.
[3] I write about the Levels of Digital Preservation in Part 1. Remember how I said that blog post was a good place to start?
[4] Latency is the measure of how quickly the storage infrastructure responds to requests for access to a digital file.
[5] In this case, digital content is available to users with some lag time, which can be a few seconds to a minute or longer.
[6] Here, digital content is stored on a piece of media that requires a human to connect it to a computer in order to access the data on it.