Code4Lib

Last week I attended my first annual Code4Lib meeting in Philadelphia. Code4Lib started in 2003 as a mailing list and has since grown into a thriving community of hackers, cataloguers, designers, developers, librarians and even archivists.  This year was the 11th annual conference and there was a significant online presence, including an IRC channel, a Slack channel, and the hashtag #c4l16. All presentations and lightning talks from the conference were streamed live and the videos are still available on the Code4Lib YouTube channel.

[Image: Code4Lib 2016 Annual Conference logo]

The week started off with a day of pre-conference workshops. I attended the Code4Arc workshop which focused on how coding and tech is used slightly differently in the archives world. Since archives have different goals and use different descriptive standards it makes sense to carve out a space exclusive to archival concerns. One common interest was in how multiple tools are connected when they’re implemented in the same archive. Many attendees were implementing ArchivesSpace to handle archival descriptions and were concerned about interoperability with other tools. Another concern was regarding the processing and management of hybrid collections, which contain both analog and digital material. Digital is often addressed as completely separate from analog, but many collections come into the archive containing both and that relationship must be maintained. Archivists in the workshop called for tools to be inclusive of both digital and analog holdings, especially in regards to processing and description.

I joined NDSR-NYC alum Shira Peltzman to kick off the presentations portion of the conference with a discussion of Implementing ‘Good Enough’ Digital Preservation (video here). The goal of our presentation was to make digital preservation attainable, even for those with limited support. We began with a brief overview of the three tenets of digital preservation – bit preservation, content accessibility, and ongoing management – before diving into specific resources and strategies for implementing good enough digital preservation.

[Image: Shira Peltzman and me presenting on “Good Enough” Digital Preservation]

We defined ‘good enough’ as the most you can do with what you have – based on available staff and budget, collection needs, and institutional priorities. The main offerings from our presentation were expansions on the NDSA Levels of Digital Preservation. We mapped each recommendation to useful tools, resources and policy recommendations based on our experience in NDSR and beyond. We also proposed an additional level to address the access issues related to digital preservation, such as redaction of personal information and making finding aids publicly accessible. Since the NDSA levels are such a common tool for getting started with digital preservation, we hope that these additions will make it easier to move to the next level – no matter what level you are currently at.

Our talk ended with a call for engagement with our NDSA level additions and, more generally, a call to share policies and workflows with the community. The call for shared documentation was a common thread through many presentations at the conference. Dinah Handel and Ashley Blewer also discussed this in their talk “Free Your Workflows (and the rest will follow)” (video here). They made a great point about why people don’t share their documentation – because it’s scary! There’s the constant battle against imposter syndrome, the fear of public failure, not to mention the fear that as soon as a policy is polished enough to share widely it is already outdated. All of these are very real reasons to hesitate, but the advantages that come from shared documentation far outweigh them. And nowhere is this more true than in the realm of open source solutions. Open source projects often rely on the community to report bugs and help create complete and accessible documentation. Shared policies and workflows help to build that community, and help the field understand how tools and strategies are actually implemented.

If you are reading this and thinking, “I have documentation that I could share, but where would I put it?” – worry not! There are a few great places for sharing open documents.

  • Scalable Preservation Environments (SCAPE) collects published digital preservation policies.
  • Library Workflow Exchange collects all library-related workflows, including digital preservation workflows.
  • Community Owned digital Preservation Tool Registry (COPTR) is a wiki for all digital preservation tools, and provides a space for users to share their experiences with any given tool. This information is automatically pushed to the interactive tool grid created by Preserving digital Objects With Restricted Resources (POWRR).
  • GitHub is known as a repository for code, but it can also be a great storage option for workflows and documentation. It is especially useful for managing version control for living documents.

Do you know of another place to share digital preservation documentation? Let us know in the comments!

 


Digital Preservation UnConference

This Tuesday we hosted our first Digital Preservation UnConference at the John F. Kennedy Presidential Library.  We had a great turnout from a number of institutions around Boston and the larger New England community.  A range of topics around digital preservation was discussed, from social media and web archiving to wrangling data for a system migration.

As you may know, part of the residency program includes hosting an event at your institution.  At the JFK Library we immediately knew we wanted to host a public unconference.  This actually came up in discussions between my host mentor, Erica Boudreau, and me before I even arrived in Boston.  I had been to a few digital humanities and library-themed unconferences, and I was excited to see how this format could be used to address issues specific to digital preservation.

In planning for the event we created a WordPress site, including an UnConference 101, registration information, and directions to the event.  We used the website’s commenting function to allow attendees to propose sessions ahead of time.  We also created a Twitter handle, @jfkdigipres, for sharing updates and event information.

[Image: Attendees get ready for the day of UnConference-ing!]

The day started with brief opening remarks from me, followed by session proposals from the attendees.  Then we broke for coffee and attendees voted on the proposals.  Once voting was over, a few dedicated volunteers and I entered the winning proposals into the schedule.  And with that, the sessions were underway!

Volunteers and attendees collaborated on community notes which recorded the main points and resources discussed in each session.  If you couldn’t make it to the event, or are curious what happened in the sessions you missed, I highly recommend checking out these collaborative notes.  There are some great tools and ideas discussed there.

Since I’m currently looking into plans for a potential system migration, I led a discussion on content migration for digital preservation.  It was great to hear how others have dealt with, or are currently dealing with, large-scale migrations like this.  Someone made a great point about how system migration is iterative – you always have to keep an eye on the horizon, because your current system might lose support or fail to meet your collection requirements in the future.


Fellow resident Jeff Erickson led a discussion on preparing to use Archives Direct, a tool he’s currently researching for preserving content collected through the Mass. Memories Road Show.  The group also discussed the evolution of the tool Archivematica and the necessity of exit strategies when working with cloud storage providers.

The last session I attended was on personal digital collections and how public history is changing in the digital world.  Now that few people are writing physical letters, how will day-to-day communication be preserved in the future?  Will Twitter accounts and email inboxes be included in future donations of personal collections?  There were differing opinions on who is responsible for preserving these kinds of collections.  Historical societies and community archives have traditionally taken on these roles, but with limited staff and technical expertise, can they continue doing so in the born-digital world?

The event had a strong presence on Twitter, where tweets were shared with the hashtag #jfkdigipres.  We collected these tweets through a Storify page so we can preserve this discussion around digital preservation.

Overall the event was a great success!  I hope the conversations started here will continue both online and through future digital preservation events.

 

[Image: National Digital Stewardship Residents, past and present, in front of the John F. Kennedy Presidential Library]

 

Midyear Review and Reflections

With the mid-point of our National Digital Stewardship Residencies just around the corner I thought this would be a good time to reflect on the first half of my residency. This program has been an incredible experience as a new graduate, fresh out of library school – I graduated two weeks before moving to Boston, so really fresh. I’ve learned a lot already, and I look forward to five more months at the John F. Kennedy Presidential Library.

Having specialized in digital curation during school, I felt like I had a strong understanding of digital preservation. However, as in most fields, things are a little different once you start work in an institution. Making plans for a digital preservation program is difficult even with years of institutional knowledge; coming in as a short-term resident makes it that much harder. My first month was focused on getting to know the systems and their idiosyncrasies, interviewing staff about procedures, and reading all the documentation, past and present. It was important to look past how we do things now to see the history behind these decisions and consider the ways procedures could be improved. So much of documentation is about capturing decisions so you don’t have to guess at the reasoning behind current procedures.

Despite the difficulties of assessing an institution as a new archivist, there are significant benefits to this position. As new hires (and newcomers to the field in general), residents can be more objective in their assessments. They may be more willing to question previous decisions since they weren’t there for the original decision making.

I came to the JFK Library equipped with my digital preservation training and a ton of questions. Some were easily answered (Where’s the coffee?) and some I’m still working on (Which digital preservation storage option best fits our requirements?). I recently completed my gap analysis and I’m currently researching possible paths forward. More project updates from me and the other residents will be presented at the Mid-year Event, and I will make my slides available soon after.

In writing for the NDSR Boston blog I’ve discovered that writing about my work forces me to interrogate my own process and decisions. Since reflection and self-assessment are vital to improvement, I decided to write about archival work more often. I created a personal blog to help me achieve this goal, and I have been regularly posting project-related tidbits, workshop recaps, and more general insights on libraries and archives. If you are just starting out in the field, I highly recommend writing about your work as much as you can, from professional articles and presentations to informal blog posts – it’s worth it.

Residents aren’t just focusing on projects, though. Twenty percent of our time is dedicated to professional development activities. We recently presented brief project updates to the Preservation Administrators Interest Group at ALA Midwinter. I’ve also attended more informal meetings like the New England Code4Lib meeting and the metadata-themed #Mashcat event. It’s been great to step out of my digital preservation niche and learn about different kinds of digital library projects. I’m also currently working on a presentation for the national Code4Lib conference, to be held in Philadelphia in March. I will be co-presenting with NDSR alum Shira Peltzman on implementing ‘good enough’ digital preservation, because you can’t wait for the perfect situation to get started.

Now that I’ve been to all these events it’s time to host one of my own. Each resident is responsible for organizing some kind of event at their host institution. We have decided to make our event a public Digital Preservation UnConference to be held on February 23rd. All the current residents and mentors will be in attendance but we want to have discussions beyond NDSR to include everyone in the digital preservation field. I’m excited about the unconference model because the program is decided on by attendees. If there is something you want to talk about, but no one else has proposed a session – propose one yourself! More information for proposals and registration is available on the website.

File Fixity and Proprietary Systems

Recently I’ve been wrapped up in finding and reading documentation for our systems at the John F. Kennedy Presidential Library.  Before I begin writing my own policies, I need to know what documentation is out there.  One challenge I’ve faced in collecting digital preservation documentation is the secrecy surrounding proprietary software and hardware.  Sometimes the barrier is poorly shared or outdated documentation rather than intentional secrecy.  This post will go over some of the challenges I’ve faced in addressing file fixity with proprietary tools.  To start off, here is a diagram of the tools used for management and storage in our digital archive:

[Diagram: the tools currently used for management and storage in the JFK Library digital archive]

The JFK Library uses a digital asset management system (DAMS) called Documentum, created by EMC.  Documentum is mainly used by businesses for managing records and internal documents.  Although it was not created for digital archives, it has some clear advantages for managing a digital repository as large as ours.  We currently store over 70 terabytes and that number is growing as we continue to digitize with the goal of making all the presidential papers available online.  Documentum has excelled at making the digital archive accessible to novice JFK enthusiasts and experienced researchers alike.  

However, a system built for improved access may not be an ideal solution for digital preservation.  The storage system, Centera, uses a tool called MD5scrubber to ensure file integrity.  The tool creates an MD5 checksum for each blob.  Right now you might be thinking “Blob?  Does she mean file or folder?”  No…I mean blob.  Centera stores the bit sequence for each digital file as a blob.  Blob stands for ‘Binary Large Object’ (although depending on who you believe, that acronym may have been invented after the fact because people felt unprofessional saying ‘blob’ all the time).  The MD5scrubber regularly validates the MD5 checksum for each blob, and if the checksum has changed, the blob is moved for review by EMC.  If the file has been corrupted, it is then recreated from the mirror copy at Iron Mountain.  Creating and automatically validating checksums is important for verifying that a digital object has not been altered or corrupted.
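To make the concept concrete, here is a minimal sketch in Python of what generating and re-validating MD5 checksums looks like.  This is my own illustration of the general technique, not EMC’s implementation, and the filename and checksum in the manifest are made up.

```python
import hashlib
from pathlib import Path

def md5_checksum(path, chunk_size=8192):
    """Compute the MD5 checksum of a file, reading it in chunks so large
    preservation masters don't have to fit in memory."""
    md5 = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            md5.update(chunk)
    return md5.hexdigest()

def is_unchanged(path, stored_checksum):
    """Return True if the file still matches the checksum recorded at ingest."""
    return md5_checksum(path) == stored_checksum.lower()

# Hypothetical manifest of filenames and the checksums recorded at ingest.
manifest = {"jfkpof-001-001.tif": "9e107d9d372bb6826bd81d3542a419d6"}

for name, checksum in manifest.items():
    if Path(name).exists() and not is_unchanged(name, checksum):
        print(f"Fixity failure: {name} needs review")
```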

What worries me about this tool is that there is no input from the staff here at the Library.  Archivists are not even alerted if a file is corrupted.  EMC staff maintain a seamless experience, despite file corruption and re-creation.  This is great for users who want immediate access to the digital archives, but as an archivist… I have trust issues.  I want our archivists to know how and when checksums are created, stored, and validated.  I also want our archivists to be able to manually validate a checksum when they believe it necessary.  Since Centera controls the checksums behind the scenes, our staff can’t view or manage file fixity information through Documentum.  We also can’t confirm that the fixity information is stored separately from the digital object, a practice recommended by TRAC to protect against malicious attacks.

Until recently the staff here at the JFK Library were unsure whether Centera or Documentum was managing file fixity at all.  One archivist began creating checksums and storing the MD5 value in the ‘technical description’ field, which also stores various information about digitization.  This will allow future archivists to manually validate file fixity, but it cannot be checked automatically, since the MD5 is mixed in with other information instead of parsed into its own field.  These checksums are also stored with the digital object, so they are vulnerable to attack as well.
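Because an MD5 digest is always 32 hexadecimal characters, it can at least be pulled out of a mixed field programmatically.  The sketch below shows one way to do that and to write the recovered value to a manifest kept outside the DAMS; the field contents and filename are hypothetical examples, not our actual metadata.

```python
import csv
import re

# Hypothetical value from the 'technical description' field; the real field
# layout at the Library may differ.
technical_description = (
    "Scanned 2015-06-12; Epson 10000XL; 600 dpi; TIFF; "
    "MD5: 9e107d9d372bb6826bd81d3542a419d6"
)

# An MD5 digest is 32 hex characters, so a regular expression can find it.
match = re.search(r"\b[a-fA-F0-9]{32}\b", technical_description)

if match:
    recorded_md5 = match.group(0).lower()
    # Appending recovered checksums to a manifest stored outside the DAMS
    # keeps fixity information separate from the objects it describes.
    with open("fixity_manifest.csv", "a", newline="") as f:
        csv.writer(f).writerow(["jfkpof-001-001.tif", recorded_md5])
else:
    print("No MD5 found in the technical description field")
```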

Here is where I explain my victory over the proprietary beast and describe the perfect system for file fixity…right?  

Sorry, but I’m not there yet.  Just understanding the current situation feels like a victory at this point.  However, I do have some thoughts on a possible solution.  Introducing a third storage location, true digital preservation storage, could ease my worries about fixity and allow EMC to continue the great work of providing seamless access to our patrons.

[Diagram: the same systems with a third, dedicated digital preservation storage location added]

If the preservation copies and associated metadata were sent to preservation storage, our archivists could maintain control over our digital holdings without affecting our users’ experience.  I’m still in the research phase of the project, so I haven’t determined a specific solution yet.  Don’t read too much into the cloud shape used in the diagram; cloud storage is just one option I’m looking into.  A third storage location would improve on our existing digital preservation program and allow archivists a higher level of authority over the digital archives.
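As a rough idea of what that hand-off could look like, here is a sketch using the bagit-python library to package exported preservation copies and their metadata, with checksums, before transfer.  The directory path and bag metadata are placeholders, and this is not a recommendation of any particular storage solution.

```python
import bagit  # bagit-python, installable with: pip install bagit

# Hypothetical staging directory holding preservation masters exported from
# the DAMS along with their metadata files.
staging_dir = "export/jfkpof-series-01"

# Wrap the export in a BagIt bag. This generates checksums for every file
# and records them in manifests that travel with the content.
bag = bagit.make_bag(
    staging_dir,
    {"Source-Organization": "John F. Kennedy Presidential Library"},
    checksums=["md5", "sha256"],
)

# The bag can be re-validated after transfer to preservation storage to
# confirm nothing changed in transit.
if bag.is_valid():
    print("Bag is valid and ready for transfer")
```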

 

Policy Planning from iPres

I’m writing to you from the 12th International Conference on Digital Preservation (iPres) in Chapel Hill, North Carolina.  It’s been heartening to talk to other digital preservationists and see that we are all facing the same problems. It was also great meeting all my #digipres Twitter icons in person.

[Image: Conference lanyard with a hidden USB drive containing all the papers and session abstracts]

My favorite session so far was the Policy and Practice Documentation Clinic organized by Maureen Pennock and NDSR’s own Nancy McGovern.  Recently at the JFK Library I have been creating a framework for our digital preservation policy.  It has mostly consisted of reading other institutions’ policies and stealing all the best stuff – with a plan for attribution, of course.  If you want to follow in my thieving footsteps, SCAPE has put together a collection of published preservation policies and is continuing to collect policies as they are submitted.

The Policy and Practice Clinic taught me the importance of taking your time and not trying to create every policy and procedure in one go.  And with this new knowledge, a new plan!

  • Create a digital preservation principles statement – What are we dedicated to?  What are the principles that stand behind our digital preservation program?
  • Run the principle statements by key stakeholders – These are the administrators who provide funding, the IT team that will implement technology, and the archivists who will perform the preservation actions.  I need their help if this policy is going to be implemented beyond my nine months.  It’s important to include key stakeholders early and often.
  • Write down what digital preservation actions are happening now – It’s vital to understand what is happening to preserve digital content now in order to address gaps.  It’s much easier to say what you plan to do once you know what you are currently doing.  And speaking of plans…
  • Start writing a Digital Preservation Plan – Nancy McGovern made a great point about the difference between a policy and a plan.  A policy is ‘what we do’ and a plan is ‘what we will do.’  Since my focus is to improve digital preservation at the library, the plan is my first priority.
  • Go back to those key stakeholders – Remember when I said ‘early and often’?
  • Create procedure documents – We need to figure out how we are going to live up to the principles I’ve laid down.
  • Make sure the procedures are realistic – Who can tell me if it’s realistic?  You guessed it, the key stakeholders!

I’m sure this plan will require tweaks and updates as I go, and that is why I want stakeholders’ input at every step.  I’m only here for nine months (two of which are already behind me), so it’s incredibly important that they are invested in carrying the digital preservation torch after I’ve left.

After the presentations by Nancy and Maureen we split into groups and commiserated over our digital preservation woes.  I learned that if your institution doesn’t like the word ‘preservation,’ call it ‘long-term access’ – whatever it takes to get the buy-in.  My favorite idea was to make the technology enforce the digital preservation policy for you.  People are much more likely to perform preservation tasks if the system doesn’t give them a choice.

Maureen Pennock was also kind enough to tell us about a new development in the digital preservation world.  The Digital Preservation Handbook is getting an update!  Keep an eye out because the new handbook is coming in April 2016.

If this post was not enough iPres for you, you’re in luck!  Community notes were taken through Google Docs and they are available here.  And you can always read the iPres twitterverse by searching #ipres2015.

NDSR Residents Take the Stage

Last week the residents were invited to present at the Annual Meeting of the New England National Digital Stewardship Alliance.  After only one week at our host institutions, we were faced with the daunting task of introducing ourselves and explaining our projects.  Although it was intimidating at first, I think it gave us all the motivation to jump into the deep end of our projects.  We needed to learn as much as possible in those first days; there was no time to waste.  We fit a lot of information into each of our five minutes: our personal backgrounds, our host institutions’ context, and the scope of our projects.

[Image: NDSR residents on stage]

After the nerves of our own presentation subsided we participated in the ‘unconference’ sessions proposed the morning of the conference and voted on during lunch.  I joined “Implementing Practical Preservation Practices” not only because of the great alliteration, but also because this directly applies to my task of creating a digital preservation policy at the JFK Library. I learned so much from the other attendees, twelve people sitting around a tiny table throwing out ideas and asking each other to share experiences.  I love the unconference model because every session is decided on by the attendees so you know there’s passion for the topic.  I walked away with a long list of articles and tools to research as well as some new ideas on how to address digital storage, file fixity, and other preservation issues.

I think the best piece of advice I received from the conference is to create a vision for what you want to do and let the policy and practice follow.  You can tell I was pretty excited about this idea from my notes.

[Image: My notes, reading “create a vision for the policy”]