The Last Day

Hello all,

Today is the last day of our residency program! Over the last nine months, we’ve attended and presented at numerous conferences, organized and supported community events, written blog posts here, for The Signal, and our personal blogs, participated in webinars and workshops, and…oh yeah, completed entire projects at our host institutions. We held a Capstone event this week that allowed us to share the final outcomes of our projects with the public and to reflect on what we’ve learned. As we move on to the next phase, I know the lessons learned, challenges faced, and successes we’ve had will stay with us, and most importantly, so will the network and community we’ve built as members of the NDSR program. Thank you to Nancy McGovern, Andrea Goethals, and Kristen Confalone for coordinating this program and guiding us through the process. And thank you to Nancy, Andrea, Erica Boudreau, Joanne Riley, Andrew Elder, and Alix Quan for your support as our hosts and mentors. We appreciate all of the assistance and encouragement we’ve gotten from the greater community as well.

 

Signing off,

NDSR Boston

Documentation & Policies

As we near the end of the residency, I’ve mostly focused my efforts on writing documentation and policies for the State Library of Massachusetts. Prior to my project, there was limited or outdated documentation and no policies in place regarding the management of the library’s digital content. This is the position that many institutions are in when beginning to consider digital preservation; it takes a lot of effort and commitment just to begin, but it is important to do so. Writing both documentation and policies will hopefully go a long way in establishing a digital preservation program at the State Library.

Documentation

In addition to much of the documentation being outdated, there also seemed to be a disconnect between what the documentation said and what the library staff was doing in practice (a common issue!). The documentation needed to be updated to reflect current practices, to act as a record of the decisions we’ve made through this process, and to ensure transparency in the State Library’s activities.

Knowing that documentation was a key deliverable of the project, I tried my best to document my activities as I went. For example, after meetings with institutions such as the Massachusetts State Archives and MassIT (the Massachusetts State IT Department), I typed up and organized my notes, then made them accessible to staff members through our shared server. These conversations informed some of our practice and I wanted the State Library staff to be able to refer to these notes later if needed. I also documented the process of testing and selecting a tool for batch downloading PDFs. We ultimately decided to use DownThemAll, a Firefox add-on, but I researched a few other options first. I included my notes on these options in documentation as a record of the decision-making process. In the last month, I made sure to review these notes and update them, as well as create documentation for any missing pieces.

We now have documentation on processes such as web statistics, batch downloading, renaming files with the command line, using Archive-It, creating and disseminating a survey to state libraries and archives, and our outreach methods. I hope this will help staff members better understand the decisions we made, the approaches we took, and how they can build on the work we’ve done through NDSR.

Policy Creation

In addition to documentation, policies are an important facet of a digital preservation program. Among other things, policies explicitly state an organization’s commitment to a program or project and define an organization’s role and operating principles.

I wrote policy statements for the State Library’s collection development activities and their digital preservation program. First and foremost, these policies are meant to be working documents that are continually reviewed and revised. They are starting points to build on as the organization changes over time.

The collection policy statement is the culmination of what we learned through our analysis of the web statistics. After conducting the assessment, we had a list of the high and low priority documents we aim to collect and a description of the value-based judgments that led us to categorize documents this way. I feel that this adds a level of transparency not previously in place. This statement will be used as a guide when identifying and selecting valuable content moving forward.

The digital preservation policy statement describes the legal mandate that the State Library faces, the scope of the program, challenges faced, the roles and responsibilities of the State Library in tackling digital preservation, and more. Defining details such as the file formats we accept, our focus on collaboration, our intended audience, and our guiding principles allows the State Library to move forward in its efforts to preserve digital content with this as a reference guide.

Updating these policies, or creating new ones, as the State Library delves deeper into digital preservation is key.

Early in the NDSR program, we, as a group, assessed our host institutions against a few benchmarks to get an understanding of where our organizations stood and to better understand the steps we should consider in moving them forward. Using the Five Organizational Stages of Digital Preservation benchmark, I concluded that the State Library was at Stage 2, meaning that they were advancing from understanding the need for digital preservation to taking on a digital preservation project (NDSR!). Though no formal policies were in place, the State Library understood they needed to better manage the digital content in their collections and demonstrated a commitment to doing so through participation in the NDSR program. My hope is that a successful NDSR project, an assessment of the scope of existing content, an increased knowledge of how similar institutions handle these tasks, and the development of documentation and policies will all help the State Library advance towards Stage 3, in which they build a long-term digital preservation program.

My fellow residents and I will be presenting our project posters at Harvard on Monday, May 23 from 3-4:30. Check here for more information and hope to see you there!


Sources

Anne R. Kenney and Nancy Y. McGovern, “The Five Organizational Stages of Digital Preservation”, http://quod.lib.umich.edu/cgi/t/text/text-idx?c=spobooks;idno=bbv9812.0001.001;rgn=div1;view=text;cc=spobooks;node=bbv9812.0001.001%3A11.

 

Web Statistics: CHECK

We’ve accomplished a big milestone here at the State Library: we have completed our review of the web statistics! One of the main objectives of my project was to perform a comprehensive assessment of Massachusetts state government publications, and we chose to use web statistics as a way of accomplishing this goal. The web statistics, gathered by Mass.gov, show us where on agency websites materials are posted and, after a categorization process, tell us what kinds of content agencies are producing. By implementing a priority ranking system, we can also see which kinds of documents are high priority or low priority (according to the collection policy statement we created at the beginning of this process).

We began working with the web stats as a means for identifying and selecting the content we want to preserve and provide access to through the DSpace repository. As the residents learned in our first few months of NDSR, the identification and selection of content are the first steps an institution should take in planning for current and future preservation needs. Reviewing the documents from the web statistics answered the questions of what content our producers create, what content we are required to keep, and what content we feel is most valuable to the library. The answers to these questions will inform an inventory of the kinds of content that agencies produce and will help us update the collection policy statement that we began working on in the fall. The policy statement is meant to be a living document that is continually updated as priorities or types of content change.

Having a policy statement then guides the selection of content for long-term preservation and access. Referring to documentation of our practices allows the staff to make well-informed decisions about what kinds of content are most valuable for the library and its patrons, and helps us maximize resources. Rather than spending time and energy capturing things like ephemeral material, we can allocate time and resources towards capturing things like reports or meeting materials. Our policy is something we can use to select materials, and it serves as justification for these decisions if a patron asks why we capture certain items and not others. Documenting these actions and procedures is an important step for the State Library in building their digital preservation practice.

So how many documents did we go through? All told, we reviewed and appraised over 75,000 documents, which is pretty incredible! Many of these documents are already in DSpace and many are low priority, so we do not need to catalog and ingest every single one. I’m currently compiling and analyzing the data we pulled from the statistics (which includes the total number per agency as well as the breakdown of monographic and serial documents). I’ll know more soon about how many high priority documents we need to handle, and then will be working on a plan for the low priority documents as well. All of this will be documented and included in my final report for the State Library. In addition to using the data collected from web statistics in the identification and selection process, the web statistics allow us to use quantitative data as justification for requesting additional resources. Knowing that we have only so many resources in place currently and seeing how much work needs to be done (with data to back that up), we can use this as proof of what resources we should add to handle the workload ahead.

This process was not always shiny or fancy, and at times it was an uphill climb (going through 10,000 documents from one agency was a particular low point for me!), but we continually fine-tuned the workflow until the whole staff got into a steady rhythm. This was a great lesson for me in designing and testing workflows over time, being flexible and open to new ideas, and keeping the big picture in mind. Some of the challenges included managing many, many spreadsheets at once, tracking progress over time (as each staff member was responsible for their own agencies, but I was in charge of the big picture so I needed to be kept up-to-date on everyone’s status without being overbearing), and ensuring we were capturing only the necessary data (which was part of the workflow evolution: we began by tracking lots of data, then boiled it down to the most essential to save time). Every tweak or change in the workflow was done in service of getting a better understanding of the scope of state publications, and ultimately I feel we’ve achieved that.

I’m taking the team out for a lunch next week to thank them for all of their help reviewing these and to celebrate this accomplishment. Again, this step meets a major goal for us and will help inform the next steps for my project. With a month left, I’ll be documenting this whole process and including much of my data collection in a final report for the State Library. Thanks for checking in!

-Stefanie

An Update from the State Library

In college, I took several courses that involved working closely with one of the many helpful librarians on campus. She would often refer to our projects as “iterative”– so much so that she would even laugh as she said it. Six months into my residency at the State Library of Massachusetts, the joke is on me as our process has been very iterative. This post will cover what we’ve been up to recently and what is ahead for us in the next few months.

A quick recap: we’re exploring more efficient ways of finding, downloading, and providing access to digital state publications. We’ve been working with web statistics downloaded from Mass.gov to assess the extent of digital publications and to determine what is most valuable to preserve for the Library and its users.

The web statistics workflow has, of course, evolved, requiring flexibility and an open mind. When we began using the statistics, each member of the project team was checking each URL listed, noting the type of document it was, then each of the team members would rank the document on a scale of 1-5 (1 being lowest priority, 5 being highest) using shared spreadsheets. Once we all had a solid understanding of what was highest and lowest priority, we determined that we didn’t need to each rank each type of document, so each staff member would tackle a different agency and enter their own priority rankings. We also created a new spreadsheet to consolidate that data into how many documents there were total and how many of each priority ranking. This gives a bigger picture assessment of how many state publications exist, and how many high priority documents we need to handle quickly. A few weeks later, we then decided to add a category in the spreadsheets to note whether these documents were series, serials, or monographs, which affects the way the items are cataloged. Though these are relatively minor changes in the workflow, they do reflect how important it is to continually check in with the project team about what’s working well and what could be improved. It is very iterative!
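As a rough illustration of the consolidation step described above, here is a minimal Python sketch. It assumes the shared spreadsheets have been exported to CSV; the column names (agency, url, priority, type) and the sample rows are hypothetical and were not part of our actual spreadsheets.

```python
import csv
import io
from collections import Counter, defaultdict


def summarize(rows):
    """Count documents per agency, and per priority ranking (1-5) within each agency."""
    totals = Counter()
    by_priority = defaultdict(Counter)
    for row in rows:
        agency = row["agency"]            # hypothetical column name
        priority = int(row["priority"])   # 1 = lowest priority, 5 = highest
        totals[agency] += 1
        by_priority[agency][priority] += 1
    return totals, by_priority


# Made-up sample data standing in for an exported spreadsheet.
sample = io.StringIO(
    "agency,url,priority,type\n"
    "Agency A,http://example.org/report.pdf,5,monograph\n"
    "Agency A,http://example.org/flyer.pdf,1,monograph\n"
    "Agency A,http://example.org/minutes-jan.pdf,5,serial\n"
    "Agency B,http://example.org/newsletter.pdf,2,serial\n"
)
totals, by_priority = summarize(csv.DictReader(sample))

for agency, count in sorted(totals.items()):
    print(agency, count, dict(by_priority[agency]))
```

The same approach could tally the series, serial, and monograph category by counting the type column instead of priority.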

While that process is ongoing, we are also examining how to download the thousands of publications we’ve reviewed through the web stats. I researched tools that would help us batch download PDF or Word docs from sites, taking into account the Library’s resources. Though CINCH, a tool developed by the State Library of North Carolina, fits our needs well, the installation requirements were not feasible for us. I began playing around with a Firefox add-on called DownThemAll! (yes, the exclamation mark is part of the name– though it is very exciting). DownThemAll (dTa) lets you upload a list of URLs and specify the folder in which you’d like the files saved; then, like magic, the files are fully downloaded (dTa has other features and functions, such as a download accelerator). Any files that fail to download are flagged as errors, so you can go back and check whether the cause was a 404 error or human error, for example.
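For readers curious what this kind of batch download looks like outside a browser add-on, here is a minimal sketch using only the Python standard library. It is not how DownThemAll works internally and is not a tool we used; the function name download_all and the file names in the usage note are made up for illustration.

```python
import shutil
import urllib.error
import urllib.request
from pathlib import Path
from urllib.parse import urlparse


def download_all(urls, folder):
    """Save each URL into folder; return (saved_paths, failures)."""
    folder = Path(folder)
    folder.mkdir(parents=True, exist_ok=True)
    saved, failures = [], []
    for url in urls:
        # Use the last path segment as the file name (same names overwrite each other).
        name = Path(urlparse(url).path).name or "index.html"
        target = folder / name
        try:
            with urllib.request.urlopen(url, timeout=30) as response, \
                    open(target, "wb") as out:
                shutil.copyfileobj(response, out)
            saved.append(target)
        except (urllib.error.URLError, ValueError, OSError) as err:
            # Record the failure (a 404, a mistyped URL, ...) for later review.
            failures.append((url, str(err)))
    return saved, failures


# Example usage, assuming a hypothetical urls.txt with one URL per line:
# urls = Path("urls.txt").read_text().split()
# saved, failures = download_all(urls, "downloads")
# for url, reason in failures:
#     print("FAILED:", url, reason)
```

Keeping the failures in a separate list mirrors the step of going back afterwards to check whether each error was a dead link or a typo in the URL list.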

The tool is free, easy, and works very well! My concern, however, is that it is not backed by an institution and it’s unclear how much funding or technical support they have. What if I come into work tomorrow and it’s gone? Who do I contact? Though they have some support help, it’s limited (for example, I emailed about an issue three weeks ago, and haven’t heard back). dTa works only with Firefox– what if there’s an issue with the browser and we can no longer access the tool? While the function of the tool works well and will be useful in the short term, I don’t see it being a sustainable solution for batch downloading. This is another part of the process that we’ll need to keep revisiting over time. And if anyone has ideas or suggestions, please let me know!

One big success we’ve had is collaborating with MassIT to gain access to their Archive-It account. Though MassIT manages the account, they’re capturing the material that we need– webpages with links to documents published by state agencies– so it makes perfect sense to work together to use Archive-It to its full capacity. I worked with MassIT to customize the metadata on the site, then I wrote some information to publish on our website about how to access and use Archive-It for the general public. We’re considering how best to incorporate Archive-It into our workflow. While DSpace will remain our central repository, where we can provide enhanced access to publications through metadata, Archive-It is capturing more material than we will be able to, which is a huge help to us. (Archive-It also allows us to print PDF reports to see all PDFs captured in their crawls, and we can use dTa to download them. We’re not currently using this, but it is an option for the State Library going forward.)

With each iteration of the workflow, I feel we are getting closer to solving some of the big questions of my project. We hold weekly staff meetings to check in about the current process. Hearing each staff member’s thoughts on challenges or potential areas of improvement has taught me much about how to continually bring fresh eyes to an ongoing process, and how to keep the big picture in mind while working through smaller details. Flexibility is key not only with this project, but with digital preservation as a whole, as processes, tools, software, and other factors continue to evolve.

I hope to leave the State Library with some options of how to take this project forward, even if not all of the questions have a definitive answer. We’re also now focusing our attention on addressing other issues in the project, such as outreach to state agencies and the cataloging workflow between their OPAC, Evergreen, and DSpace. There’s much to accomplish in the remaining weeks, and I look forward to updating you as we make progress on these goals.

Thank you!
Stefanie

NDSR Tour of the Massachusetts State Archives

This week, my fellow residents, our hosts, and members of the NDSR community visited the Massachusetts State Archives. Located on Columbia Point, the Archives house, preserve, and make accessible public records of the Massachusetts government. We talked with the Electronic Records Archivist Veronica Martzahl about digital preservation efforts and learned about the Archives’ amazing collections from Executive Director Michael Comeau. Thanks to you both and the Archives staff for having us!

Veronica shared what led to the creation of her role at the Archives and informed us about some digital preservation initiatives that are underway. When previous Massachusetts governor Mitt Romney left office, his hard drives were swept clean and no electronic records were transferred to the Archives. This alone would be an issue in terms of government transparency and the importance of leaving a historical record (and definitely not in line with best archival practice!), but it became even more critical when Romney ran for president. This provided the impetus for the Archives to develop a digital preservation program that would ensure better procedures moving forward.

For about two years now, Veronica has been working tirelessly to implement a new digital repository, which included testing, cost analysis, and training, and has had her hands in several other projects as well. In the end, the Archives chose the Preservica Standard Edition for their digital collections. The big takeaway from this is that the process was long and challenging. Factors such as IT constraints, budgeting, and the usual politics involved in government work presented some hurdles, but there was strong institutional commitment to the project, which is such an important factor in digital preservation. This taught us much about the reality of selecting systems for your institution– something I’m sure all of the residents will deal with sooner or later! We were all very impressed with the amount of work Veronica has achieved, and can see the long-term positive impact that this repository will have for the Archives.

As the resident at the State Library, I was particularly interested in what we can learn from another government agency working to preserve digital government information. Veronica was kind enough to spend some time with me last October discussing the current state of digital preservation at the State Archives, and I was excited to expand on that today and to hear updates since we last talked. One question I often get is, why don’t the library and archives collaborate on digital preservation? In a case of maddening bureaucracy, the Library reports to the Department of Administration and Finance, while the Archives report to the Secretary of the Commonwealth. This fracture often results in some confusion, but the staff at both institutions are very supportive (we often refer users to the Archives for research, and vice versa). I hope the Archives and Library staff can continue to find opportunities for collaboration, especially with regard to digital preservation.

After Veronica caught us up on the digital projects, Michael provided us with some interesting background information about the Archives, its vast collection, and some detail about their emergency preparedness plan. Columbia Point, where the Archives are located, is very close to the water and susceptible to some serious damage from natural disasters. Michael explained that this is partially why the building is designed to be so strong– it has to withstand some intense weather!

We were able to see the original versions of some founding documents in Massachusetts history and the Bill of Rights, on display in the Commonwealth Museum. As a history nerd, I was pretty jazzed to be in the same room as these materials. Hearing Michael discuss the process of designing a proper space to house these documents was equally interesting. They worked with a scientist at MIT to create a home for these materials, thus protecting their longevity. The encasements they designed have allowed these crucial pieces of history to be well preserved. Though our focus may be on digital preservation, it was a great chance for us to hear a case study around preservation of print materials and to consider how necessary preservation is, regardless of format.

This month we also get to hear about the Digital Commonwealth, get a demonstration of the FITS tool from Harvard, and attend an UnConference at the JFK Library. Looking forward to it!

Upcoming NDSR Events

Hello, and hope you all enjoyed the holidays! Winter has officially arrived in Boston, just in time for ALA Midwinter. Bundle up!

Speaking of ALA, we’ll be sharing the progress we’ve made on our projects at the Preservation Administrators Interest Group (PAIG) on Saturday, January 9th in the Boston Convention & Exhibit Center Room 160AB. The program runs from 8:30-11:30 am. We’re excited to present on what we’ve accomplished so far, and what we are hoping to do in the next five months of our residency. Here is the full agenda for the PAIG meeting.

We also have our NDSR mid-year event coming up on January 26th, from 3-4:30 pm at 90 Mt. Auburn Street, Room 021. This will be another chance to learn about our projects and to celebrate our achievements thus far.

We’re looking forward to sharing our work with you and to continuing to build a collaborative community around digital preservation. Thanks!

 

Keeping up with digital preservation

Hello readers!

Much has happened in the month since I posted last. My fellow residents and I have been busy attending conferences, participating in events both on-site and off, planning for future presentations, and working hard on our projects.

I feel that my project is in a good place with using web statistics to help us understand the scope of existing state publications and using that to define a collection policy. The team at the State Library has been wonderfully supportive and collaborative, and I feel very lucky to be working with them. I hope in a month or two, I will have a clear definition of what state publications are available through the State Library’s DSpace repository to share with you here.

The residents and I met this week to participate in a discussion with Nancy and Andrea, our program coordinators and the hosts at MIT and Harvard. We talked about preservation storage and protection, as well as what activities we are engaged in as part of our professional development. It was a reminder of how many opportunities exist to get involved in the field—a nice problem to have!

I started thinking about some of the ways in which I keep up with developments and news in the field of archives and digital preservation, and I thought I’d share some of those here. I’d also love to hear what kinds of ways you all engage with the profession; I’m sure there are so many I am leaving off my list.

Here are some of the methods I use to stay current:

Conferences

  • Some of the conferences my fellow residents and I have attended or are planning to attend include: NDSA-New England, iPres, ALA-Midwinter, Code4Lib, New England Code4Lib, regional conferences/annual meetings, Archiving 2016, DPLAfest, SAA

Webinars/Classes/Continuing Education

  • We’re lucky to participate in webinars as part of the residency, with a webinar led by Nancy McGovern and a group discussion facilitated by Nancy and Andrea Goethals. There are also great webinars offered by NISO, SAA, NEDCC, continuing education programs (e.g., Simmons here in Boston), and more!

Blogs

Listservs: I know there are some mixed feelings about listservs, but I really enjoy the digests. I like that there is a communal feel, and that there is a resource that allows you to engage with professionals outside of your institutions and region. I imagine it is especially useful for those lone arrangers out there.

  • I subscribe to the following listservs: SAA Electronic Records, SAA Preservation, SAA Web Archiving, SAA SNAP, SAA Women Archivists, Western Archivists, Archives & Archivists, ALA Digipres.

Twitter: I love Twitter as a means for engaging in brief, quick updates on what’s new and current in the field.

  • I follow a number of librarians, archivists, and institutions, who frequently post about their collections, projects, and programs. They also usually post URLs to articles, conferences, webinars, etc. that keep me up-to-date.
  • I also like the way that some roundtables/groups use Twitter to engage with their community, such as SNAP’s use of “SNAPchats”. Using a shared hashtag, SNAP facilitates discussions around relatable topics and creates an open forum for conversation through Twitter.
  • We also know that conferences now have hashtags they adopt so you can follow a conference from afar. This is a great way to get a sneak peek at what’s happening.

Other ideas that come to mind include volunteering, taking tours, participating on panels, or other social media platforms. Please share the other ways in which you keep up with what’s new and exciting—I’d love to add to this list.

Lastly, a very Happy Thanksgiving to you all! Give your family, friends, pets, and food a little extra love this year—I think we could all use it!

Stefanie

Best Practices Exchange 2015 Wrap-Up

Hello, NDSR community! I’m Stefanie, the resident at the State Library of Massachusetts. My project entails developing more efficient workflows for the acquisition and maintenance of digital publications created by Massachusetts state agencies. The challenge we are facing is that, while the State Library is mandated to collect all state publications for public access, state agencies now post their publications to their individual websites, which are not always stable or reliable.  The library currently seeks these publications out through various methods, but isn’t able to capture even the bulk of existing documents. I’m here to help examine what we can do to improve on this. The last month has been filled with training in their DSpace repository, learning how the library is currently acquiring state publications, conducting a workflow assessment of other state libraries, and familiarizing myself with all things digital preservation.

Over the last few days, I attended the Best Practices Exchange (BPE) annual conference, held at the State Museum in Harrisburg, PA. BPE gathers librarians, archivists, and information professionals to discuss issues around “acquiring, preserving and providing access to government information in the digital era.”  A conference essentially addressing my exact project? Held a month into the residency? In a nearby state? Yes, please! This was a great opportunity for me to engage with professionals who are tackling the same issues I am here at the State Library. I learned way too much to share with you here, but will focus on some highlights…

The most rewarding session for me was what BPE refers to as “Birds of a Feather” rooms. Using the “un-conference” model, an attendee suggests a topic, then interested individuals form a break-out group to discuss further. My supervisor, Alix Quan, led the charge and signed us up for a Birds of a Feather room centered around electronic documents workflows. I explained my project to 10 fellow State Librarians and we discussed how our institutions handle electronic state publications. The good and bad news is that it seems we’re all in the same boat. Though each library employs various tools and processes, nobody has figured out the magical formula that enables us to capture every single publication that agencies produce, all within their resources and means. I learned so much from listening to other librarians discuss their processes for acquisitions, their collection policies, the tools they use for access and preservation, and their outreach efforts. I will be touching on each of these facets throughout my project, and having this conversation early on in the residency gives me some ideas of where this project can go.

State House of PA

Moreover, my hope is that this session is the beginning of an ongoing dialogue between state librarians and archivists, so that we can continue working together to create better workflows that benefit our institutions and the public.

Other sessions I attended included: Digitization at Any Scale, Digital Preservation Training and Education, an Archive-It Meetup, and Building Digital Preservation Workflows. Almost every session provided a case study and examined how that institution dealt with its digital content. I very much appreciated having the chance to learn from those who have come before, whether the project was ultimately successful or not. I took something away from each session, especially when it came to learning about the diversity of resources available for digital archival preservation. I see a lot of researching these tools in my future.

I also very much enjoyed Penn State University Archivist Jackie Esposito’s speech, “Archiving Digital Content: Challenges and Solutions”, which offered a frank and direct perspective on approaching digital content. One key lesson I took away from her speech was that nobody knows what the future of digital preservation is. We cannot predict what software, hardware, computers, materials, or tools we will be dealing with in 20, 30, or 50 years (though fingers crossed we do finally get those hoverboards that Back to the Future promised us). All we can do now is look into the foreseeable future, and ask what we can do to make sure these materials are preserved for the time we can control. Additionally, it’s better to do something than nothing. Even if it’s not the “right answer” (is there a right answer, anyway?), we need to proactively engage in preserving content now, or that material will deteriorate. This was helpful for me as I question what the best thing I can do is for the State Library. I think the best answer I have for now is…something!

Looking ahead, I will continue my assessment of how other state libraries handle their content, engage State Librarians in the ongoing discussion and collaboration that began at BPE, and begin narrowing down some ideas of how we can go about collecting more state publications. I’m excited to find out where this takes us!

Thanks for checking in,
Stefanie