Ho Ho Ho, Merry World Day for Audiovisual Heritage!

The 27th of October, besides being the very special Fourth Day Before Halloween, was proclaimed World Day for Audiovisual Heritage by UNESCO in 2005. This year’s theme for the day is “Archives at Risk: Much More To Do” – something you don’t have to tell us digital stewards twice!

Films, radio and television programs, oral histories, music performances – these and countless other audiovisual treasures hold many of the primary records of the 20th and 21st centuries. The Official Website for Audiovisual Heritage 2014 notes that “it is estimated that we have no more than 10 to 15 years to transfer audiovisual records to digital to prevent their loss” – though much has been lost already. And as digital preservation professionals understand, this means not only creating digital surrogates of the records but also designing the means with which to preserve these digital surrogates.

In celebration of this internationally recognized day, I thought I’d share a little more about MIT’s digital audio preservation project – particularly the cool-cat collection we are starting with. MIT’s Lewis Music Library is a subject-specific library popular with faculty, alumni, and students alike – in fact, music is the second most popular minor here at MIT! My office is on the second floor, where on a given day I might see a student composing on computer software, or hear the tinkle of piano keys drifting up from downstairs, where bio-engineering majors who are piano virtuosos on the side stop by for a lunchtime performance. The Library offers some personal digitizing outlets as well, which you can read more about here.

In its special collections, the Lewis Music Library has 31 shelves full of recordings on reel-to-reel, audiocassette and videocassette tapes, phonographs, DAT tapes, and film. The impetus for my project was actually funding for a specific digitization initiative, which created the need to preserve and provide access to the content once it was transferred.

The first set of digital audio content we are testing in our workflow is a batch from the Herb Pomeroy collection. Herb Pomeroy was a jazz trumpeter and music educator from Massachusetts. In his early career, he played with such jazz luminaries as Lionel Hampton and Charlie Parker. In the 1950s, he put together his own big band, gaining national attention and playing at venues such as Carnegie Hall. Though he had an illustrious and influential career as a musician, he was also well known for his teaching career, helping to found the Jazz Workshop, teaching for 41 years at the Berklee College of Music, and serving for 22 years as the director of the MIT jazz ensemble the Techtonians – later known as the Festival Jazz Ensemble. You can check out interviews with Pomeroy from the Lewis Music Library’s Oral History Project here.

Photo of Herb Pomeroy, in the Berklee College of Music Photographic Collection by Alma Berk, (1940’s - 1980’s), BCA-003. College Archives, Stan Getz Library, Berklee College of Music.

The collection itself comprises recordings of performing groups he was coaching, as well as performances he did around town in big bands and smaller groups. He even played at the Chestnut Hill Mall fairly regularly! We will be digitizing a selected portion of the audio content and walking it through the workflow to test it and identify gaps. Simultaneously, we are evaluating Avalon Media System as a dissemination platform, so people everywhere can enjoy the collection once it is digitized.

I am delighted to be contributing to the preservation and access of such a cool collection because I think it’s these kinds of records that really underline the impact of the work we do as digital stewards and the significance of World Day for Audiovisual Heritage. Digitization and digital preservation not only ensure the endurance of our historical records and cultural heritage; they also mean expanded intellectual access – for everyone. And that is a true step towards the democratization of knowledge. I mean, c’mon… this work is so cool, you guys.

Happy heritage holiday,
Tricia P

Becoming a digital archivist, in a nutshell

Jen here, finally chiming in on this blog. (Sorry for the long silence, friends!)

I’ve spent the past couple months swimming (read: drowning) in information about digital preservation, digital repositories, and born-digital material. More specifically, I’ve been gathering any and all information out there about ingesting digital materials into digital repositories, ideally with some kind of digital preservation component involved. I’ve also been learning and playing around with Northeastern’s soon-to-be-launched (well, soft-launched) brand-new digital repository system. I’m still trying to take all of it in, but I’m finally feeling a little more prepared to take on these projects.


A little bit of back story: As I was completing the application for the NDSR program in Boston, I needed regular pep talks from my (very patient) partner to keep at it. These pep talks were approximately every hour. See, aside from a couple of really excellent digital libraries / digitization courses I took, my MLIS program didn’t emphasize the technical side of librarianship, let alone the specific technical aspects of digital archives. The instructor for a class on metadata architectures had to revise the syllabus when he realized that none of us had been taught XML in our core, required library technologies class. In an archival arrangement and description class, our professor was horrified to learn that none of us had the faintest idea how to define the word “checksum,” let alone generate a fixity check. I was confident in my ability to learn the technical skills required of digital archivists / digital preservationists and had been fairly successful as a self-taught student of a number of things, but I felt underprepared for the residency, and so I questioned my ability to even apply. I eventually submitted the (oh so awkward) application video and supporting documents and crossed my fingers, and somehow, on a Monday in early June, I got an email offering me the NDSR position at Northeastern University. Cue equal amounts of excitement about returning to the east coast for such a great opportunity and panic about feeling so very underprepared for this kind of position. (Cue also devastation about leaving beautiful Colorado, warm fuzzies about being closer to family, uncertainty about starting my career on the technical side instead of the people-facing / teaching side, eagerness to build up the technical side of my brain. Basically, I had a lot of feelings.)
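For anyone else whose program skipped this: a checksum is just a short digest computed from a file's bytes, and a fixity check recomputes that digest later to confirm the bytes haven't silently changed. Here's a minimal sketch using Python's standard library (the filename is invented):

```python
import hashlib
from pathlib import Path

def sha256_checksum(path, chunk_size=65536):
    """Compute a SHA-256 checksum, reading the file in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def fixity_check(path, stored_checksum):
    """Recompute the checksum and compare it to the value recorded at ingest."""
    return sha256_checksum(path) == stored_checksum

# Demo with a throwaway file
sample = Path("oral_history_001.wav")        # hypothetical filename
sample.write_bytes(b"pretend this is audio data")
stored = sha256_checksum(sample)             # recorded when the file enters the repository
print(fixity_check(sample, stored))          # True: the bytes are unchanged
sample.write_bytes(b"...bit rot happened...")
print(fixity_check(sample, stored))          # False: the file no longer matches
```

Preservation systems typically store a checksum like this at ingest and re-run the comparison on a schedule, which is what lets you catch corruption before your last good copy is gone.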

Fast forward two months and one cross-country road trip with two cats, my partner, and a precariously over-packed car to the first day of the NDSR immersion week. I’ll be honest and admit that I still felt pretty underprepared, but as we went through the Library of Congress’ DPOE modules, all of the disparate pieces of digital archives / digital preservation knowledge I’d picked up along the way started to make sense. Additionally, I got to know both my mentor at Northeastern and Northeastern’s archival collections a little better, and I immediately felt a strong ideological connection to this particular host institution. A lot of archives talk about incorporating “diversity” into their collections, but through my own research, I’ve found that this doesn’t always mean much. Northeastern’s commitment to documenting Boston’s underrepresented communities (e.g. African-American, Chinese, GLBTQ, and Latino communities) in their archives, though, is front and center in both mission and practice. I’d been worried that this residency would pull me away from the things I find most intriguing about archival practice (how we responsibly build truly inclusive, diverse collections), but as it turns out, the NDSR Boston powers-that-be did an excellent job in matching us to our host institutions. Now, instead of thinking of this as a 9-month break from the social justice side of archives, I’m thinking of this residency as a 9-month opportunity to build my technical skills in ways that are relevant to the kinds of collections I’m most interested in highlighting.

Fast forward another month and a half, and I’ve been burning through podcasts on the T for my daily one-hour-each-way commute to and from Northeastern. (Nerdette, TED Radio Hour, Radiolab, and Serial are my top favs, in case anyone was wondering.) The projects as initially described in Northeastern’s proposal may change slightly to more explicitly incorporate more concepts of digital preservation (like, for example, creating an actual digital preservation plan), but the first projects I’m working on are staying true to the initial proposal.

First, though, I’ve been in information-gathering mode. Think of a squirrel hoarding nuts for a long, cold winter. (I’m secretly crossing my fingers for a long, cold winter! I’ve missed them during my years in always-sunny Colorado!) That’s me, minus the bushy tail. This nut-hoarding currently looks like a mess of a Google doc with links to various case studies, success stories, failure stories, webinars, and digital archives white papers and manuals, but eventually, this will become a beautiful (informal) report that I’ll be able to refer back to all year. And likely in whatever job comes after this. And after that. And so on until the information is outdated.

I’ve also been gathering information on Northeastern’s systems in general: the new digital repository, the old digital repository, the management of our digital archives, etc. That, too, is turning into a short cheat-sheet/report, mostly for my own reference as I continue this residency.

(Fun fact: as I was looking for a squirrel cute enough for this blog post, I came across this science article that explains squirrels’ nut-hoarding tendencies as strategies for long-term savings. If squirrels are interested in long-term access (to food) and digital preservationists are interested in long-term access (to data), I think this means that squirrels are preservationists. Please don’t question my logic.)

Second on the docket is my first big project, which has already started (primarily in information-gathering/nut-hoarding mode).

Northeastern University has a strong digital humanities presence, and luckily, the digital humanities folks here seem to have a strong working relationship with Northeastern’s libraries. Our Digital Scholarship Group, for example, is located in the library (in the beautiful and recently constructed Digital Scholarship Commons) and the manager of Northeastern’s digital repository is technically under the auspices of the DSG. The DSG had a big hand in Northeastern becoming the digital home for the Our Marathon archive, a crowd-sourced digital archive that was created following the Boston Marathon bombing of 2013.

Our Marathon was a joint effort with a number of community partners, and documented the community’s response to the bombing. As you might guess, a lot of this is emotionally difficult stuff, and there have been a lot of comparisons to, for example, the 9/11 digital archive. While the Our Marathon digital archive was built, thankfully, in consultation with archives staff (which means that we have things like meaningful metadata and some documentation of donated materials; as digital humanities projects go, we’re in pretty ok shape), it was built on an Omeka platform without an explicit plan in place to eventually pull it into the archives’ digital collections. Basically: we have it, but we don’t have it in a stable location that will allow for long-term access. Yet.

That’s where I come in. Once the new digital repository is up and running, we’ll need to ingest all of this born-digital material and its accompanying metadata into the DRS. There are a ton of different file formats (audio, various formats of text, images), for one, and while there’s thankfully a ton of useful metadata, it’s all in Dublin Core. The DRS uses MODS. The metadata also doesn’t always reflect technical or administrative information – like, for example, all of the rights information we need. All of these agreements exist somewhere, but tracking them down will be a task. We’ve also determined that there is useful metadata that Omeka supports (e.g. geotagging) that may or may not translate to the DRS. And, of course, there’s the matter that Our Marathon, as it stands on Omeka, is currently being supported on a volunteer basis by a doctoral student whose time to dedicate to this project is pretty limited.
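To make the crosswalk problem concrete, here's a toy sketch of what mapping Dublin Core onto MODS looks like in code. The element mapping below is heavily simplified and the sample record is invented – the Library of Congress publishes a full DC-to-MODS crosswalk, and the real DRS mapping will be far more involved:

```python
import xml.etree.ElementTree as ET

MODS_NS = "http://www.loc.gov/mods/v3"

def dc_to_mods(record):
    """Map a few Dublin Core elements onto their rough MODS equivalents."""
    ET.register_namespace("", MODS_NS)
    mods = ET.Element(f"{{{MODS_NS}}}mods")

    # dc:title -> mods:titleInfo/mods:title
    title_info = ET.SubElement(mods, f"{{{MODS_NS}}}titleInfo")
    ET.SubElement(title_info, f"{{{MODS_NS}}}title").text = record["title"]

    # dc:creator -> mods:name/mods:namePart
    name = ET.SubElement(mods, f"{{{MODS_NS}}}name")
    ET.SubElement(name, f"{{{MODS_NS}}}namePart").text = record["creator"]

    # dc:date -> mods:originInfo/mods:dateCreated
    origin = ET.SubElement(mods, f"{{{MODS_NS}}}originInfo")
    ET.SubElement(origin, f"{{{MODS_NS}}}dateCreated").text = record["date"]

    return ET.tostring(mods, encoding="unicode")

# An invented sample record, expressed in Dublin Core terms
sample = {
    "title": "Memorial at Copley Square",
    "creator": "Anonymous contributor",
    "date": "2013-04-22",
}
print(dc_to_mods(sample))
```

Even in this toy version you can see the trap: anything the source metadata doesn't carry (rights statements, technical details, geotags) has nothing to map from, which is exactly the gap I'll be chasing down.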

My task, as I initially understood it, was going to be so easy! Just create a workflow for ingesting this already-full-of-metadata digital collection from one platform to another. Sure! Simple! Workflows are fun! (Nerd confession: I do genuinely enjoy creating workflows.) Oh, the sweet naivete of two-months-ago-Jen.

Stay tuned for how this all shakes out.

Signing off for now,

Free Code, Free Beer and Free Puppies

Shira Peltzman over at NDSR NY has already written up an excellent post about AMIA’s inaugural open-source track this year, but since it so heavily informs the work that I’m doing at WGBH, I wanted to take a little bit of a closer look at some of the themes that were running through this section of the conference.

(…ok, maybe I just wanted a reason to post a lot of puppy gifs. But we’ll get there.)

To recap, open-source software is built upon essentially free code — it’s available publicly and collaboratively developed, so anybody can, in theory, download it, implement it and improve upon it. Hack Day, which Joey and I posted about last week, is supposed to result in the development of open-source tools that can be freely used by the community. (The main site used for collaborative coding is GitHub, and knowledge of how to use GitHub is almost essential for working in the open-source community; the track this year actually started with a demo from LoC’s Lauren Sorensen about how to dip your feet into the GitHub waters. GitHub’s tools for submitting comments and changes to a project and tracking their implementation can actually be pretty useful for other things besides code — the PBCore Committee, for example, is using GitHub to receive comments and review updates to the PBCore metadata standard — but that’s a whole other story.)

Anyway, there are a lot of reasons why open-source is a pretty great thing for archives. For one thing, the fact that the code isn’t locked away behind a proprietary license makes it much more likely that people ten, twenty or fifty years in the future will be able to figure out how it worked and how to recreate or emulate it — reassuring, if you don’t want to run the risk of losing content due to obsolescence, or of uploading a bunch of material and metadata into a system that you can’t then get your content back out of. Additionally, open-source technology provides a lot more opportunities for archives to customize and control their own preservation solutions; as an archivist, it’s always fairly unnerving to feel like the survival of your content is completely in somebody else’s hands.

Open-source also tends to sound like a great solution financially for archives. Who wants to pay software licensing fees when you can just download code from GitHub that will do the same thing for free? However, this is where it gets a little tricky. WGBH’s Karen Cariani, in one of the most quotable moments of the open-source stream, explained it like this:

What people expect from open-source software is something like free beer.

Free beer is so close!  Yet … so far ….

However, what you actually tend to get is more like a free puppy.

This puppy just wants to be your friend

At this point you may be thinking, ‘but puppies are great! Way better than beer!’ It’s true, puppies are pretty great. They’re cute and they’re cuddly and when you contribute to their support you get a warm feeling inside of generally doing the right thing for the universe. Still, when someone gives you a puppy, it’s not exactly ‘free’ — now that you’ve got the puppy, you have the responsibility of shelling out a significant amount of cash on food, equipment, vet’s bills … not to mention the responsibility of housetraining it, taking it for daily walks, and cleaning up after it when it forgets all the training you gave it and pees on the floor. And this is all now going to be your job for pretty much the rest of the puppy’s lifespan.

Puppy dramatically fails to catch an easy throw

That’s basically what open-source software is like. You’re getting the initial code for free, and that’s pretty great — but once you’ve got the code, getting it to work is going to involve a significant amount of time, and probably a significant amount of money as well. When you work with a proprietary software company, training the dog and walking it and taking it to the vet (in other words, customizing it, updating it and checking it for bugs) are all part of the company’s job; you’re paying them to take care of all that hassle for you. If you’re jumping on the open-source train, either you then have to hire someone else to make your open-source software behave — and a lot of open-source companies fund themselves by hiring out developers to do that — or figuring out how to do it becomes your job. And if you’re an archivist in a financially-strapped institution, odds are you’re already doing at least two jobs.

The idea here isn’t to discourage people from using open-source tools; far from it! Karen Cariani made this analogy as part of her presentation about WGBH’s decision to work with Hydra, an open-source repository solution that’s being adopted by a number of large archives. (I talked about this a little in my post on change management, too.) All the great reasons that I mentioned above for archives to invest in open-source software remain really solid reasons to invest in open-source software. It’s just important to be aware that it is an investment, and not go in expecting to get a lot of exciting something for nothing.

The thing about open-source software, though, is that the more people become aware of the options, and start talking about them and using them and documenting them and contributing to them, the better and easier they all become for everybody. The power of open-source comes from an informed community. The importance of AMIA’s open-source track this year wasn’t even so much about the actual tools presented, although of course there were a lot of fantastic open-source tools presented (in addition to Hydra, the WebVTT standard for time-aligning metadata with web-streaming content got a lot of buzz, and I’ll never pass up an opportunity to give the QCTools project a shout-out, since it’s going to be a godsend for anyone whose job involves error-checking digital video files). But specific projects aside, in order to be part of the open-source community, it’s important to really understand what open-source is, and what it means — and the frank and open discussions about open-source at AMIA this year played a huge role in broadening that understanding.

– Rebecca (who does really like free puppies)

Keeping Change Afloat

A week or two after finding out that I was going to be heading up to Boston as the NDSR resident for WGBH, I got an email from NDSR DC alum Erica Titkemeyer, asking if I would use the experience from my project to fill in an empty slot for her panel “Sailing the Ship: Supporting and Managing Change at Large Institutions” at the Association of Moving Image Archivists 2014 conference in Savannah, October 7-11.

WGBH is certainly a large institution, and my projects definitely involve change management – the Media, Library and Archives department is squarely in the middle of the process of switching over from a proprietary Artesia digital asset management system to an open-source Hydra-based repository system. One of my responsibilities over the next nine months consists of streamlining and documenting the process of ingesting materials into the DAM during this transitional phase.

…the only problem was that the panel was scheduled for early in October, at which point I still wouldn’t have actually done much of any of that yet. As a result, I’ve spent the lead-up weeks to AMIA pestering most of the WGBH team for their accumulated wisdom about the process of envisioning and implementing a major change in their digital archives management. Here’s the very, very boiled-down version of what I’ve learned so far.


As a major archive operated out of a production institution that does not, in and of itself, have a mission to preserve, the Media, Library and Archives Department often finds itself trying to walk the line between competing directives and obligations. The decision to switch over to an open-source HydraDAM system was a deliberate choice to adopt a system that would serve the functions of an archive without trying to be all things to all people within the broader framework of WGBH. The expiration of the license on the old Artesia system served as the catalyst for the switch. Once the combined cost of renewing the license and adding back in all of the special features required for system functionality was taken into account – not to mention the risk of continuing to rely on the old LTO 4 tape robots used by the Artesia system, which would become obsolete relatively soon – the archival team was able to make the case for adopting an entirely new system without licensing requirements, with its own LTO 6 decks that could be departmentally controlled without having to rely on the broader WGBH infrastructure.


Of course, the switch has come with its share of challenges:

  1. Cost of development. Open-source tools and systems might not have licensing requirements, but that doesn’t mean they’re free – this was a big theme at AMIA this year, and I’m planning to do a separate post on it later, but for the time being let’s just say that it takes a lot of time and effort to customize an open-source system.
  2. Staff turnover partway through the process. Major shifts take a long time to complete, and in a large institution there are good odds that at least one key person will end up rotating out before it’s all over, leaving a gap in planning. At WGBH, the original developer ended up leaving at about the same time the project began. It took a year for the staff transition to be complete and the project to get back up to speed.
  3. Institutional policy shifts midway through. Again, the amount of time that these projects take to implement can be a real problem – especially when you’re dependent on approval from a larger infrastructure with different priorities than your own departmental goals.
  4. Staging material in transition. Right now, with the license expired, WGBH is no longer using Artesia – but the HydraDAM system won’t be up and running for another few months. Making sure all the content stays safe and accessible during this period means a lot of extra effort and temporary workarounds, not to mention a backlog of material.
  5. Resistance to change. Learning a new system is always difficult, and the more people required to learn the new system, the more complicated this becomes. Just next week, WGBH will be doing some staff tours through the new systems to help get everyone comfortable with them and help make the transition easier.

There are no real solutions to a lot of these problems – they’re all part and parcel of the price of change in a digital archive – but the WGBH team did have some tips to share about smoothing the way.


  1. Communication is key. This isn’t going to be news to anyone, but it bears repeating. Specific advice included talking personally with everybody involved, taking the time for reassurance (…including reminding people of the shortcomings of the old system, when necessary), saving the technical details for people who needed to know, being nice to your developers, and feeling comfortable asking the community for help.
  2. Minimize front-end change when possible. Obviously this isn’t always possible, and at WGBH it’s probably going to end up happening in stages; still, if you have a large user community that needs to interact with the system, it can be a good strategy to keep change on the back end, where it’s not as scary.
  3. Don’t over-plan the process. Any major change is going to involve difficulties and delays; while having a detailed project management plan can feel comforting, it also becomes stressful when the inevitable delays occur. WGBH opted for a looser high-level plan, with room built in to adapt to shifts in the schedule.
  4. Be OK saying goodbye. I’ve heard the number five years bandied around a few times when estimating how often digital asset management systems will need to be upgraded. Change is going to keep occurring, so it doesn’t pay to get too attached to any one way of doing things, or too frustrated with the constant need to go through the process of adaptation. In the long run, it’s the content that needs to be permanent – not the systems that manage it.

So that’s pretty much what I spoke about at AMIA (minus all the gratuitous Pirates of the Caribbean gifs; I mean, we did have ships right there in the title.)

The session also included fantastic presentations from Erica Titkemeyer, now the AV Conservator at UNC Chapel Hill’s Southern Folklife Collection, and Crystal Sanchez, the Digital Video Specialist for the Smithsonian’s Digital Asset Management system. Erica is in the process of getting the Southern Folklife Collection’s A/V materials digitized and included in Chapel Hill’s DAM system, and Crystal has spent the past year overseeing the implementation of an updated Artesia system throughout the Smithsonian. If they put their slides up, I’ll come back and link them here; they’re definitely worth checking out.

We ended the session at AMIA by asking if anybody else had advice or thoughts they wanted to share about their own change management processes, so I’ll do the same thing here — input is definitely welcome! In the meantime, Rebecca Fraimow, signing off.

Migration – What’s in a Word? Talk Versus Action

Week 3 is already coming to a close and, like Tricia, I am swimming in a sea of information – some tried-and-true best practices in digital preservation and some new approaches towards making it actually work. At Harvard Library, my host for the NDSR, we are grappling with format migration within Harvard’s Digital Repository Service (DRS to us acronym-inclined archivists), understanding that though migration has been oft-discussed in the field, few sustainable solutions have emerged for ensuring long-term access to digital objects across forms.

To back up for a moment: migration is the process of updating analog and digital objects to keep up with the ever-changing technological landscape, knowing that though the objects themselves might not deteriorate, the means and technologies for viewing and experiencing them often do. Migration has been a popular preservation action within libraries and archives for quite some time, and digital migration (from one digital format to another) has been met with some hand-wringing for several decades. Many studies have noted the loss of significant properties across iterative migrations – defining features of the original format such as color space, fonts, timing, and interactivity, to name a few. Without going into too much detail on other means of access such as emulation (recreating the original environment and external dependencies for the object), there are ongoing debates over which properties of an object should be the focus for preservation. However, everyone can agree that each format comes with its own special challenges and there is no monolithic way to preserve everything with a single click of the mouse. While Harvard is looking to institute a broad workflow and framework for file format migration, my project focuses on how to implement this while principally being concerned with the needs of three specific, now-obsolete formats — Kodak PhotoCD, RealAudio, and SMIL playlists.

Before diving into these three formats, I started my work by flinging myself madly into the immense body of literature around digital migration and how various institutions are doing it. Perhaps the first challenge I came up against was knowing when to distinguish theory from practice. Many new solutions have been put forth for any number of identified gaps in the workflow – monitoring obsolescence, creating agnostic containers for unsupported/undocumented formats (e.g. XML-based), implementing tools irrespective of their configurability across a repository workflow. While it was at many times tempting to discover such a resource and think “Eureka!”, I had to apply a fair degree of skepticism, particularly if certain sources were a few years old and the respective tools were, as yet, largely unused in any real workflows. Nonetheless, I was able to compile a bibliography and begin to map the core arguments of these resources to a hypothetical workflow that considers many possible migration strategies based on the specific challenges of the format (I will be finalizing this workflow map throughout October). One part of the workflow that, at this point in the research, seems most in need of refinement is tooling for identifying and validating formats – a significant step in the process, as you decide whether a file is indeed what it says it is. Identifying and validating a file empowers the digital steward to check off which significant properties will be the focus of preservation for that specific format and to determine the best tools and services (that means people too!) for doing the job. For example, jpylyzer is a tool for validating JPEG 2000 images, ensuring that the compression algorithms, header info, color space, etc., all comply with the standard. Once a file has been successfully validated, it can be passed off to the next step in the workflow (determined based on the format’s requirements) to ensure that significant properties are taken into account – for example, performing a post-migration comparison against the original format through QA tools (e.g. ImageMagick).
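To give a flavor of what that validation step feeds downstream: jpylyzer reports a per-file verdict in XML, and a workflow script can read that verdict to route files. The report below is a hand-written, heavily simplified stand-in for illustration only – real jpylyzer output is namespaced and carries far more property detail:

```python
import xml.etree.ElementTree as ET

# Invented, simplified stand-in for a jpylyzer-style report
report = """
<jpylyzer>
  <file>
    <fileInfo><fileName>scan_0042.jp2</fileName></fileInfo>
    <isValid>True</isValid>
  </file>
  <file>
    <fileInfo><fileName>scan_0043.jp2</fileName></fileInfo>
    <isValid>False</isValid>
  </file>
</jpylyzer>
"""

def triage(report_xml):
    """Sort files by verdict: valid ones move on in the workflow, the rest get flagged."""
    root = ET.fromstring(report_xml)
    return {
        f.find("fileInfo/fileName").text: f.find("isValid").text == "True"
        for f in root.findall("file")
    }

print(triage(report))   # {'scan_0042.jp2': True, 'scan_0043.jp2': False}
```

The point is less the parsing than the decision: a machine-readable verdict is what lets validation become an automated gate in the workflow rather than a person eyeballing each file.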

While tools exist for identification (DROID) and validation (FITS does both), they only cover a finite number of formats, which are generally already well-defined and documented. Herein lies the problem: what happens to all those rare, obsolete, “orphaned” formats? Beyond that, determining which parts of the process – choosing tools, and deciding which to use for which format – should be manual versus automated is also a major point of consideration in workflow design. Given the mass of material within a digital repository, a trustworthy tool for automating this process is desirable, for example Plato or Taverna. However, more research will need to be conducted to consider the existing architecture of the DRS and the stakeholders involved throughout the process (Tricia’s example below from MIT diagrams this nicely). This just goes to show that every institution is at a different place in solidifying these workflows, and there is not necessarily one model institution that has everything figured out (this is, of course, an ongoing process and no two institutions function – or collect – alike).
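For a sense of what signature-based identification involves under the hood, here is a toy sketch that matches a file's first bytes against a hard-coded table of magic numbers. A real tool like DROID does this properly by consulting the PRONOM signature registry, with offsets, variants, and priority rules this sketch ignores:

```python
# A hard-coded signature table -- a toy stand-in for a registry like PRONOM
SIGNATURES = [
    (b"\x00\x00\x00\x0cjP  \r\n\x87\n", "JPEG 2000 (JP2)"),
    (b".ra\xfd", "RealAudio"),
    (b"RIFF", "RIFF container (e.g. WAV)"),
]

def identify(leading_bytes):
    """Match a file's leading bytes against known magic numbers."""
    for magic, fmt in SIGNATURES:
        if leading_bytes.startswith(magic):
            return fmt
    return "unidentified"

print(identify(b".ra\xfd\x00 pretend audio stream"))  # RealAudio
print(identify(b"\x89PNG\r\n\x1a\n"))                 # unidentified
```

The second call is the orphaned-format problem in miniature: a perfectly healthy file that simply isn't in your table comes back "unidentified," and everything downstream of identification stalls.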

All these points are merely considerations at this stage as we look at what solutions could be applied for a more transparent and streamlined migration process. Next steps are delving deeply into the innards of the DRS and crystallizing how the various administrative pockets inform preservation at Harvard. As was noted in my initial research, tools that solve one part of the problem are great, but that doesn’t always guarantee compatibility with existing systems and processes. Sometimes the discovery of new solutions brings up new problems, but that’s what research is all about!

-Joey Heinen