Becoming a digital archivist, in a nutshell

Jen here, finally chiming in on this blog. (Sorry for the long silence, friends!)

I’ve spent the past couple months swimming (read: drowning) in information about digital preservation, digital repositories, and born-digital material. More specifically, I’ve been gathering any and all information out there about ingesting digital materials into digital repositories, ideally with some kind of digital preservation component involved. I’ve also learned and played around with Northeastern’s soon-to-be-launched (well, soft launched) brand new digital repository system. I’m still trying to take all of it in, but I’m finally feeling a little more prepared to take on these projects.


A little bit of back story: As I was completing the application for the NDSR program in Boston, I needed regular pep talks from my (very patient) partner to keep at it. These pep talks happened approximately every hour. See, aside from a couple of really excellent digital libraries / digitization courses I took, my MLIS program didn’t emphasize the technical side of librarianship, let alone the specific technical aspects of digital archives. The instructor for a class on metadata architectures had to revise the syllabus when he realized that none of us had been taught XML in our core, required library technologies class. In an archival arrangement and description class, our professor was horrified to learn that none of us had the faintest idea how to define the word “checksum,” let alone run a fixity check. I was confident in my ability to learn the technical skills required of digital archivists / digital preservationists and had been fairly successful as a self-taught student of a number of things, but I felt underprepared for the residency, and so I questioned my ability to even apply. I eventually submitted the (oh so awkward) application video and supporting documents and crossed my fingers, and somehow, on a Monday in early June, I got an email offering me the NDSR position at Northeastern University. Cue equal amounts of excitement about returning to the east coast for such a great opportunity and panic about feeling so very underprepared for this kind of position. (Cue also devastation about leaving beautiful Colorado, warm fuzzies about being closer to family, uncertainty about starting my career on the technical side instead of the people-facing / teaching side, and eagerness to build up the technical side of my brain. Basically, I had a lot of feelings.)

Fast forward two months and one cross-country road trip with two cats, my partner, and a precariously over-packed car to the first day of the NDSR immersion week. I’ll be honest and admit that I still felt pretty underprepared, but as we went through the Library of Congress’ DPOE modules, all of the disparate pieces of digital archives / digital preservation knowledge I’d picked up along the way started to make sense. Additionally, I got to know both my mentor at Northeastern and Northeastern’s archival collections a little better, and I immediately felt a strong ideological connection to this particular host institution. A lot of archives talk about incorporating “diversity” into their collections, but through my own research, I’ve found that this doesn’t always mean much. Northeastern’s commitment to documenting Boston’s underrepresented communities (e.g. African-American, Chinese, GLBTQ, and Latino communities) in their archives, though, is front and center in both mission and practice. I’d been worried that this residency would pull me away from the things I find most intriguing about archival practice (how we responsibly build truly inclusive, diverse collections), but as it turns out, the NDSR Boston powers-that-be did an excellent job in matching us to our host institutions. Now, instead of thinking of this as a 9-month break from the social justice side of archives, I’m thinking of this residency as a 9-month opportunity to build my technical skills in ways that are relevant to the kinds of collections I’m most interested in highlighting.

Fast forward another month and a half, and I’ve been burning through podcasts on the T for my daily one-hour-each-way commute to and from Northeastern. (Nerdette, TED Radio Hour, Radiolab, and Serial are my top favs, in case anyone was wondering.) The projects as initially described in Northeastern’s proposal may change slightly to incorporate digital preservation concepts more explicitly (like, for example, creating an actual digital preservation plan), but the first projects I’m working on are staying true to the initial proposal.

First, though, I’ve been in information-gathering mode. Think of a squirrel hoarding nuts for a long, cold winter. (I’m secretly crossing my fingers for a long, cold winter! I’ve missed them during my years in always-sunny Colorado!) That’s me, minus the bushy tail. This nut-hoarding currently looks like a mess of a Google doc with links to various case studies, success stories, failure stories, webinars, and digital archives white papers and manuals, but eventually, this will become a beautiful (informal) report that I’ll be able to refer back to all year. And likely in whatever job comes after this. And after that. And so on until the information is outdated.

I’ve also been gathering information on Northeastern’s systems in general: the new digital repository, the old digital repository, the management of our digital archives, etc. That, too, is turning into a short cheat-sheet/report, mostly for my own reference as I continue this residency.

(Fun fact: as I was looking for a squirrel cute enough for this blog post, I came across this science article that explains squirrels’ nut-hoarding tendencies as strategies for long-term savings. If squirrels are interested in long-term access (to food) and digital preservationists are interested in long-term access (to data), I think this means that squirrels are preservationists. Please don’t question my logic.)

Second on the docket is my first big project, which has already started (primarily in information-gathering/nut-hoarding mode).

Northeastern University has a strong digital humanities presence, and luckily, the digital humanities folks here seem to have a strong working relationship with Northeastern’s libraries. Our Digital Scholarship Group, for example, is located in the library (in the beautiful and recently constructed Digital Scholarship Commons) and the manager of Northeastern’s digital repository is technically under the auspices of the DSG. The DSG had a big hand in Northeastern becoming the digital home for the Our Marathon archive, a crowd-sourced digital archive that was created following the Boston Marathon bombing of 2013.

Our Marathon was a joint effort with a number of community partners, and documented the community’s response to the bombing. As you might guess, a lot of this is emotionally difficult stuff, and there have been a lot of comparisons to, for example, the 9/11 digital archive. While the Our Marathon digital archive was built, thankfully, in consultation with archives staff (which means that we have things like meaningful metadata and some documentation of donated materials; as digital humanities projects go, we’re in pretty ok shape), it was built on an Omeka platform without an explicit plan in place to eventually pull it into the archives’ digital collections. Basically: we have it, but we don’t have it in a stable location that will allow for long-term access. Yet.

That’s where I come in. Once the new digital repository is up and running, we’ll need to ingest all of this born-digital material and its accompanying metadata into the DRS. There are a ton of different file formats (audio, various formats of text, images), for one, and while there’s thankfully a ton of useful metadata, it’s all in Dublin Core. The DRS uses MODS. The metadata also doesn’t always reflect technical or administrative information – like, for example, all of the rights information we need. All of these agreements exist somewhere, but tracking them down will be a task. We’ve also determined that there is useful metadata that Omeka supports (e.g. geotagging) that may or may not translate to the DRS. And, of course, there’s the matter that Our Marathon, as it stands on Omeka, is currently being supported on a volunteer basis by a doctoral student whose time to dedicate to this project is pretty limited.
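For the curious, here is roughly what the Dublin Core to MODS piece of that puzzle could look like in code. This is only a minimal sketch, not the actual DRS ingest workflow: the mapping is loosely modeled on the usual Dublin Core to MODS correspondences (title to titleInfo, rights to accessCondition, and so on), and the `dc_to_mods` function, the dictionary shape of the input record, and the `geolocation` field name are all assumptions made up for illustration. Anything without an obvious MODS home gets set aside for a human to look at instead of being silently dropped.

```python
import xml.etree.ElementTree as ET

MODS_NS = "http://www.loc.gov/mods/v3"
ET.register_namespace("mods", MODS_NS)


def m(tag):
    """Return a namespace-qualified MODS tag name."""
    return f"{{{MODS_NS}}}{tag}"


def dc_to_mods(record):
    """Convert one record to a MODS element.

    `record` is a hypothetical dict of Dublin Core element names to lists
    of string values. Returns (mods_element, unmapped), where `unmapped`
    holds every field this sketch does not know how to translate.
    """
    mods = ET.Element(m("mods"))
    unmapped = {}
    for element, values in record.items():
        for value in values:
            if element == "title":
                title_info = ET.SubElement(mods, m("titleInfo"))
                ET.SubElement(title_info, m("title")).text = value
            elif element == "creator":
                name = ET.SubElement(mods, m("name"))
                ET.SubElement(name, m("namePart")).text = value
            elif element == "subject":
                subject = ET.SubElement(mods, m("subject"))
                ET.SubElement(subject, m("topic")).text = value
            elif element == "description":
                ET.SubElement(mods, m("abstract")).text = value
            elif element == "date":
                origin = ET.SubElement(mods, m("originInfo"))
                ET.SubElement(origin, m("dateCreated")).text = value
            elif element == "rights":
                ET.SubElement(mods, m("accessCondition")).text = value
            else:
                # No mapping chosen yet (e.g. geotags): keep for manual review.
                unmapped.setdefault(element, []).append(value)
    return mods, unmapped


# Made-up sample record, not a real Our Marathon item.
sample = {
    "title": ["Sample item"],
    "creator": ["Anonymous contributor"],
    "geolocation": ["42.35,-71.08"],
}
mods, unmapped = dc_to_mods(sample)
needs_rights = mods.find(m("accessCondition")) is None
print(ET.tostring(mods, encoding="unicode"))
print("Unmapped fields:", unmapped)
print("Rights statement missing:", needs_rights)
```

Returning the leftovers alongside the MODS record is deliberate: the hard part here isn't the fields that map cleanly, it's the geotags and missing rights statements that need a person to decide what happens to them.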

My task, as I initially understood it, was going to be so easy! Just create a workflow for ingesting this already-full-of-metadata digital collection from one platform to another. Sure! Simple! Workflows are fun! (Nerd confession: I do genuinely enjoy creating workflows.) Oh, the sweet naivete of two-months-ago-Jen.

Stay tuned for how this all shakes out.

Signing off for now,

Jen