We’ve accomplished a big milestone here at the State Library—we have completed our review of the web statistics! One of the main objectives of my project was to perform a comprehensive assessment of Massachusetts state government publications and we chose to use web statistics as a way of accomplishing this goal. The web statistics, gathered by Mass.gov., showed us where on agency websites materials are posted and also, after a categorization process, tells us what kinds of content agencies are producing. Implementing a priority ranking system, we also see what kinds of documents are high priority or low priority (according to the collection policy statement we created at the beginning of this process).
We began working with the web stats as a means for identifying and selecting the content we want to preserve and provide access to through the DSpace repository. As the residents learned in our first few months of NDSR, the identification and selection of content are the first steps an institution should take in planning for current and future preservation needs. Reviewing the documents from the web statistics answered the questions of what content our producers create, what content are we required to keep, and what content do we feel is most valuable to the library. The answers to these questions will inform an inventory of the kinds of content that agencies produce and will help us update the collection policy statement that we began working on in the fall. The policy statement is meant to be a living document that is continually updated as priorities or types of content change.
Having a policy statement then guides the selection of content for long-term preservation and access. Referring to documentation of our practices allows the staff to make well-informed decisions about what kinds of content is most valuable for the library and its patrons, and helps us maximize resources. Rather than spending time and energy capturing things like ephemeral material, we can allocate time and resources towards capturing things like reports or meeting materials. Our policy is something we can use to select materials as well as justification for these decisions if a patron asks why we capture certain items and not others. Documenting these actions and procedures is an important step for the State Library in building their digital preservation practice.
So how many documents did we go through? All told, we reviewed and appraised over 75,000 documents, which is pretty incredible! Many of these documents are already in DSpace and many are low priority, so we do not need to catalog and ingest every single one. I’m currently compiling and analyzing the data we pulled from the statistics (which includes the total number per agency as well as the breakdown of monographic and serial documents). I’ll know more soon about how many high priority documents we need to handle, and then will be working on a plan for the low priority documents as well. All of this will be documented and included in my final report for the State Library. In addition to using the data collected from web statistics in the identification and selection process, the web statistics allow us to use quantitative data as justification for requesting additional resources. Knowing that we have only so many resources in place currently and seeing how much work needs to be done (with data to back that up), we can use this as proof of what resources we should add to handle the workload ahead.
This process was not always shiny or fancy, and at times it was an uphill climb (going through 10,000 documents from one agency was a particular low point for me!), but we continually fine-tuned the workflow until the whole staff got into a steady rhythm. This was a great lesson for me in designing and testing workflows over time, being flexible and open to new ideas, and keeping the big picture in mind. Some of the challenges included managing many, many spreadsheets at once, tracking progress over time (as each staff member was responsible for their own agencies, but I was in charge of the big picture so I needed to be kept up-to-date on everyone’s status without being overbearing), and ensuring we were capturing only the necessary data (which was part of the workflow evolution. We began tracking lots of data, then boiled it down to the most essential to save time). Every tweak or change in the workflow was done in service of getting a better understanding of the scope of state publications, and ultimately I feel we’ve achieved that.
I’m taking the team out for a lunch next week to thank them for all of their help reviewing these and to celebrate this accomplishment. Again, this step meets a major goal for us and will help inform the next steps for my project. With a month left, I’ll be documenting this whole process and including much of my data collection in a final report for the State Library. Thanks for checking in!