Hanover and Holiday Party

Hello Everyone!

I think I can speak for all of us residents when I say we’ve had a busy – and exciting – December.



Earlier this month, we took a resident road trip up to Hanover, New Hampshire, for the New England regional Code4Lib conference, which was hosted at Dartmouth College. We had a great time and learned a lot. We also had some fantastic gelato! I’d go back to Hanover just for that gelato…



Fellow residents Alice and Jeff presented and gave a really fantastic overview on what digital preservation is and how to get started. They did a great job of explaining things in a way that was easy to understand and comprehensive.


Aw yeah, file formats (Slide from Alice and Jeff’s presentation)


Stefanie and I presented on how to build a better digital preservation community. We discussed the pros and cons of current methods of communication among professionals (listservs, Twitter, blogs). To summarize: Pros: There’s a lot out there. Cons: Sometimes it’s hard to find what you need, sometimes it feels like there’s too much out there, and sometimes it feels like the same voices are speaking over and over. We also suggested some possible alternatives (resource-sharing websites, Meetups).

We then opened it up to discussion about ways to build community and got some really great feedback from the audience.  Someone suggested that maybe there should be a central directory of all resources, but then that raises the question: Who would maintain it? Someone suggested that the app Slack could be a good method of communication and mentioned that there’s an NE Code4Lib Slack.  Someone else made a good point, saying that perhaps it’s a good thing that there are so many resources and methods of  communication, because that’s evidence of a strong community.  People seemed very interested in the idea of having meet-ups, and we are hoping to plan an informal meet-up in the next couple of months…stay tuned for more information!

In other news, we had the host event here at Harvard yesterday. For those not familiar, each institution in NDSR has a host event during one month of the year. It’s an opportunity for us all to come together. I worked with Kristen and Andrea to plan it, and we decided that, with it being so close to the holidays, everyone might enjoy a nice NDSR Holiday party. So, we had snacks, as well as digital preservation-related crafts and carols!

If you would like to make your own digital preservation holiday crafts, I’ve included pictures below as a guide. As you can see, we have a binary garland chain. To make it, you simply cut 1’s and 0’s out of construction paper, cut little holes in them, and string them together. I would recommend looking up what one letter is in binary; if you try to do multiple letters, it might take you a while.
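If you would rather not look the letter up by hand, here is a minimal sketch that prints the eight 1’s and 0’s for a single letter. It assumes plain 8-bit ASCII, and the function name `letter_to_bits` is just something I made up for illustration.

```python
def letter_to_bits(letter: str) -> str:
    """Return the 8-bit binary string for one ASCII character."""
    if len(letter) != 1 or ord(letter) > 127:
        raise ValueError("expected exactly one ASCII character")
    # format(..., "08b") pads the binary form with leading zeros to 8 digits.
    return format(ord(letter), "08b")


# One letter means eight paper digits to cut out.
print(letter_to_bits("N"))  # prints 01001110
```

Each extra letter adds another eight digits to the chain, which is why one letter is plenty.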


Binary chain

We also have a three-legged stool ornament, both 3D-style and flat.


Celebrate digital preservation harmony

Finally, to spread digital-preservation cheer, consider singing some of these carols, which I re-wrote to celebrate all aspects of digital preservation, from OAIS to obsolescence:

The Twelve Days of Digital Preservation

Rudolph the Red-Line Reindeer

Floppy Disks

Rockin’ Around the 16363

Happy holidays to everyone! Please let me know in comments if you end up singing any of the carols. You know you want to! As my friend Buddy the Elf says, the best way to spread digital preservation cheer is singing loud for all to hear!

File Fixity and Proprietary Systems

Recently I’ve been wrapped up in finding and reading documentation for our systems at the John F. Kennedy Presidential Library.  Before I begin writing my own policies, I need to know what documentation is out there.  One challenge I’ve faced in collecting digital preservation documentation is the secrecy surrounding proprietary software and hardware.  Sometimes the barrier is poorly shared or outdated documentation rather than intentional secrecy.  This post will go over some of the challenges I’ve faced in addressing file fixity with proprietary tools.  To start off, here is a diagram of the tools used for management and storage in our digital archive:

JFK systems (2)

The JFK Library uses a digital asset management system (DAMS) called Documentum, created by EMC.  Documentum is mainly used by businesses for managing records and internal documents.  Although it was not created for digital archives, it has some clear advantages for managing a digital repository as large as ours.  We currently store over 70 terabytes and that number is growing as we continue to digitize with the goal of making all the presidential papers available online.  Documentum has excelled at making the digital archive accessible to novice JFK enthusiasts and experienced researchers alike.  

However, a system built for improved access may not be an ideal solution for digital preservation.  The storage system, Centera, uses a tool called MD5scrubber to ensure file integrity.  The tool creates an MD5 checksum for each blob.  Right now you might be thinking “Blob?  Does she mean file or folder?” No…I mean blob.  Centera stores the bit sequence for each digital file as a blob.  Blob stands for ‘Binary Large Object’ (although, depending on who you believe, that acronym may have been invented after the fact because people felt unprofessional saying ‘Blob’ all the time).  The MD5scrubber regularly validates the MD5 checksum for each blob, and if the checksum has changed, the blob is moved for review by EMC.  If the file has been corrupted, it is then recreated from the mirror copy at Iron Mountain.  Creating and automatically validating checksums is important for verifying that a digital object has not been altered or corrupted.  
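To make the idea of creating and validating a checksum concrete, here is a minimal sketch using Python’s standard `hashlib` module. This is not how MD5scrubber works internally (I have no visibility into that); it only illustrates the general technique on an ordinary file, and the function names are my own.

```python
import hashlib
from pathlib import Path


def md5_of_file(path: Path, chunk_size: int = 1024 * 1024) -> str:
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        # Read fixed-size chunks so large files never sit in memory whole.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_unchanged(path: Path, recorded_md5: str) -> bool:
    """Return True if the file still matches the checksum recorded earlier."""
    return md5_of_file(path) == recorded_md5.strip().lower()
```

If even one bit of the file changes, `is_unchanged` returns False, which is the signal a scrubbing process would act on.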

What worries me about this tool is that there is no input from the staff here at the Library.  Archivists are not even alerted if a file is corrupted. EMC staff maintains a seamless experience, despite file corruption and re-creation.  This is great for users who want immediate access to the digital archives, but as an archivist… I have trust issues.  I want our archivists to know how and when checksums are created, stored, and validated.  I also want our archivists to be able to manually validate a checksum when they believe it necessary.  Since Centera controls the checksums behind the scenes, our staff can’t view or manage file fixity information through Documentum.  We also can’t confirm that the fixity information is stored separately from the digital object, a practice recommended by TRAC to protect against malicious attacks.
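As a rough illustration of what I mean by archivists knowing how and when checksums are created, here is a sketch of a fixity manifest that lives in a different location from the objects it describes and can be audited on demand. Everything here is hypothetical: the JSON layout, the field names, and the functions `record_fixity` and `audit` are my own assumptions, not anything Centera or Documentum provides.

```python
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path


def _md5(path: Path) -> str:
    # Reads the whole file at once; fine for a sketch, not for large masters.
    return hashlib.md5(path.read_bytes()).hexdigest()


def record_fixity(object_path: Path, manifest_path: Path) -> dict:
    """Append a fixity entry to a manifest kept apart from the object."""
    entries = json.loads(manifest_path.read_text()) if manifest_path.exists() else []
    entry = {
        "path": str(object_path),
        "algorithm": "md5",
        "checksum": _md5(object_path),
        "recorded_at": datetime.now(timezone.utc).isoformat(),
    }
    entries.append(entry)
    manifest_path.write_text(json.dumps(entries, indent=2))
    return entry


def audit(manifest_path: Path) -> list[str]:
    """Return paths that are missing or no longer match the manifest."""
    failures = []
    for entry in json.loads(manifest_path.read_text()):
        path = Path(entry["path"])
        if not path.exists() or _md5(path) != entry["checksum"]:
            failures.append(entry["path"])
    return failures
```

Because the manifest records the algorithm and a timestamp alongside each checksum, staff could see when fixity information was created and rerun `audit` whenever they think it necessary.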

Until recently the staff here at the JFK Library was unsure if Centera or Documentum were managing file fixity at all.  One archivist began creating checksums and storing the MD5 value in the ‘technical description’ field, which also stores various information about digitization.  This will allow future archivists to manually validate file fixity, but it cannot be automatically checked since the MD5 is mixed with other information instead of parsed into a unique field. These checksums are stored with the digital object, so they are also vulnerable to attack.
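One possible way around the mixed field, sketched below, is to pull the checksum out of the free text with a pattern match so it can be compared automatically. This assumes the MD5 appears somewhere in the field as 32 hexadecimal characters; the sample description string is invented for illustration and is not a real record from our system.

```python
import re
from typing import Optional

# An MD5 digest is 32 hex characters; \b keeps longer hex strings from matching.
MD5_PATTERN = re.compile(r"\b[0-9a-fA-F]{32}\b")


def extract_md5(technical_description: str) -> Optional[str]:
    """Return the first MD5-looking value in a free-text field, or None."""
    match = MD5_PATTERN.search(technical_description)
    return match.group(0).lower() if match else None


# Hypothetical field contents, made up for this example.
sample = "Scanned at 600 dpi; MD5: D41D8CD98F00B204E9800998ECF8427E; uncompressed TIFF"
print(extract_md5(sample))  # prints d41d8cd98f00b204e9800998ecf8427e
```

A dedicated checksum field would still be the cleaner fix, since pattern matching can be fooled by any other 32-character hex string that ends up in the description.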

Here is where I explain my victory over the proprietary beast and describe the perfect system for file fixity…right?  

Sorry, but I’m not there yet. Just understanding the current situation feels like a victory at this point.  However, I do have some thoughts on a possible solution.  Introducing a third storage location, true digital preservation storage, could ease my worries about fixity and allow EMC to continue the great work of providing seamless access to our patrons.  

JFK systems (3)

If the preservation copies and associated metadata were sent to preservation storage, our archivists could maintain control over digital holdings without affecting our users’ experience.  I’m still in the research phase of the project, so I haven’t determined a specific solution yet.  Don’t read too much into the cloud shape used in the diagram; cloud storage is just one option I’m looking into.  A third storage location would improve on our existing digital preservation program and allow archivists a higher level of authority over the digital archives.


NDSR Boston’s NEDCC Field Trip

NDSR Boston took a field trip to scenic Andover, Massachusetts, this week to visit the facilities of the Northeast Document Conservation Center. Founded in 1973 by the State Libraries of the six New England states, NEDCC has become a leader in the preservation and conservation community, providing the highest quality conservation services to cultural heritage institutions in the region. It is fitting that NEDCC’s facilities are located in a renovated historic mill building.

The facility is organized into three functional areas: the conservation lab, the imaging lab and the audio lab.

The conservation lab addresses issues related to books, paper and photographs.  We met Todd, who described three different treatments for manuscripts with binding issues. He explained that each project is considered in its own context, which results in individualized treatments for each object. We also met with Amanda, a photography conservator. She took a moment to examine a few daguerreotypes of mine. She helped me date the photographs and gave me care and handling instructions.

The imaging lab performs preservation-level imaging, creating digital surrogates of many types of objects, including objects being treated in the conservation lab.  Terrence and David explained the imaging equipment and NEDCC’s approach to imaging. They explained that large production imaging and digitization is performed with sophisticated, high-quality digital camera equipment. They also demonstrated their custom-designed X/Y positioning table with a vacuum feature for holding materials in place. The table moves on two axes (X and Y), both front-to-back and side-to-side, beneath a stationary camera, allowing the greatest flexibility in capturing all types of materials.

The audio lab captures sound with sophisticated new imaging technology called IRENE. IRENE uses 2D and 3D cameras to photograph the grooves in audio cylinders and discs as they rotate. The captured images are then processed by software that converts the images to sound. The revolutionary procedure allows the audio recordings to be captured without additional wear and tear to the objects. Audio can even be captured from cylinders and discs that have been broken into pieces. It is very cool. The audio lab is expanding and will soon be capable of digitizing audio from magnetic tape.

We also met with Frances and Eva, who work in Preservation Services. They have an important role with perhaps a greater impact than their colleagues preserving objects in the labs. Preservation Services is the outreach and educational arm of the operation. They are active in the community: fielding questions, performing preservation assessments, attending conferences and educating their stakeholders with workshops and webinars. They survey the community to keep abreast of what projects people are working on and what questions are being asked.

While NEDCC is focused on the conservation of paper-based collections, they have a growing digital presence. In addition to the imaging and audio labs, they are increasingly providing consulting services and assessments for digital materials and collections. They are aware of the importance of digital stewardship and noted that there are challenges to be overcome by traditional preservation administrators in embracing their role in digital preservation. Aspects of preservation and conservation that are common to both digital and traditional preservation are the importance of organizational support and the need for long-term planning and risk management.

If you live or work in the Boston area and you have never been to NEDCC, I strongly encourage you to organize a tour for your cultural heritage institution or local working group. I guarantee that you will not be disappointed.

Thanks for checking in.

Jeff Erickson