This week my fellow residents and I were fortunate to receive an introduction to the File Information Tool Set (FITS) from Andrea Goethals. Andrea is the Manager of Digital Preservation and Repository Services at Harvard Library, Director of the NDSR Boston program, and a developer of the FITS tool. Released in 2009, FITS is a digital preservation tool designed and developed at Harvard Library to identify and validate a wide assortment of file formats, determine technical characteristics, and extract embedded metadata. The technical metadata generated and collected by FITS can be exported in a variety of XML schemas and may be included in other files for digital preservation purposes; Harvard Library, for example, includes FITS output in the METS files in its preservation repository.
Digital preservation repositories accept into their care electronic files that are created and saved in a growing number of file formats. Proper identification of a file’s format and the extraction of embedded technical metadata are key aspects of preserving digital objects. Proper identification helps determine how digital objects will be managed, and extracting embedded technical metadata provides information that future repository staff or users need to render, transform, access, and use the digital objects.
There are several tools available that can identify and validate file formats and extract technical metadata. The great thing about FITS is that it bundles many of them together. The current version of FITS, 0.10.0, includes the following applications:
An explanation of each tool can be found on the FITS web site.
While these tools can be used individually, using them under the FITS umbrella is more efficient. FITS runs all the tools simultaneously, saving you time. FITS knows the strengths and weaknesses of the applications and which tools support which file formats. You benefit by installing and running a single application and receiving output from multiple applications that is appropriate to each file format.
Receiving output from multiple tools can help you confirm that information is accurate when the tools agree, or flag a concern when they don’t. It is also helpful that FITS consolidates and normalizes the output, providing a homogenized data set that is easier to interpret. Each tool’s output is converted to a common FITS XML schema, ensuring that labels and terminology are used consistently. The extracted metadata can then be exported to different technical metadata schemas such as MIX for images, TextMD for text, and DocumentMD for documents. Output in any of these schemas can then be inserted into other files like METS to provide repository documentation suitable for digital preservation.
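To make the agree-or-disagree idea concrete, here is a minimal Python sketch that reads a FITS XML document and lists which tools reported which format. It is not part of FITS. The sample XML is hand-written for illustration, not real tool output, and the namespace, element names, and attributes (identification, identity, tool, status, toolname) are assumptions based on my understanding of the FITS output schema, so check them against a file you have generated yourself.

```python
import xml.etree.ElementTree as ET

# Assumed namespace for FITS output; verify against your own FITS files.
FITS_NS = {"fits": "http://hul.harvard.edu/ois/xml/ns/fits/fits_output"}

# Hand-written sample modeled on FITS output. Not real tool output.
SAMPLE = """<fits xmlns="http://hul.harvard.edu/ois/xml/ns/fits/fits_output">
  <identification status="CONFLICT">
    <identity format="Plain text" mimetype="text/plain">
      <tool toolname="Jhove" toolversion="1.5"/>
      <tool toolname="file utility" toolversion="5.04"/>
    </identity>
    <identity format="Extensible Markup Language" mimetype="text/xml">
      <tool toolname="Droid" toolversion="3.0"/>
    </identity>
  </identification>
</fits>"""


def summarize_identification(xml_text):
    """Return the identification status and the tools behind each identity."""
    root = ET.fromstring(xml_text)
    ident = root.find("fits:identification", FITS_NS)
    if ident is None:
        return {"status": None, "identities": []}
    identities = []
    for identity in ident.findall("fits:identity", FITS_NS):
        tools = [t.get("toolname") for t in identity.findall("fits:tool", FITS_NS)]
        identities.append(
            {
                "format": identity.get("format"),
                "mimetype": identity.get("mimetype"),
                "tools": tools,
            }
        )
    return {"status": ident.get("status"), "identities": identities}


summary = summarize_identification(SAMPLE)
# More than one identity means the tools did not agree on the format.
tools_agree = len(summary["identities"]) == 1

for entry in summary["identities"]:
    print(entry["format"], "reported by", ", ".join(entry["tools"]))
print("Tools agree" if tools_agree else "Tools disagree; review this file")
```

In this sample two tools call the file plain text and one calls it XML, so the script flags the file for review, which is the kind of concern you would otherwise have to spot by comparing separate tool outputs by hand.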
FITS is an open-source, Java-based application that is freely available from GitHub or the FITS web site. Because it is Java-based, it runs on Windows, Mac, or Linux platforms from a command-line interface. It also provides an API and can be embedded in other applications; it is one of the microservices included in Archivematica. Using a command-line interface can sometimes be intimidating and confusing, but FITS employs a limited number of intuitive commands.
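If you would rather script FITS than type commands by hand, a small wrapper like the following Python sketch is one way to do it. The flags used here (-i for the input file, -o for the output file, -x to convert the output to a standard metadata schema) reflect my understanding of the FITS command-line options, and the script path and file names are placeholders, so confirm all of them against the documentation on the FITS web site before relying on this.

```python
import subprocess


def build_fits_command(fits_script, input_path, output_path, standard_schema=False):
    """Build the argument list for one FITS run.

    The -i, -o, and -x flags are assumed from the FITS documentation;
    verify them for the version you have installed.
    """
    cmd = [str(fits_script), "-i", str(input_path), "-o", str(output_path)]
    if standard_schema:
        # Ask for a standard schema (such as MIX for images) instead of FITS XML.
        cmd.append("-x")
    return cmd


def run_fits(fits_script, input_path, output_path, standard_schema=False):
    """Run FITS and raise CalledProcessError if it exits with an error."""
    cmd = build_fits_command(fits_script, input_path, output_path, standard_schema)
    return subprocess.run(cmd, check=True, capture_output=True, text=True)


# Placeholder paths: show the command that would be run without running it.
example = build_fits_command("./fits.sh", "photo.tif", "photo.fits.xml")
print(" ".join(example))
```

On Windows the script would be fits.bat rather than fits.sh. Building the command as a list, rather than one long string, avoids problems with spaces in file names.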
FITS configuration is managed with several XML files that are easily edited with a text editor. The main configuration file, fits.xml, allows you to prioritize tools, include or exclude certain file formats from processing, enable or disable additional features like generating checksums, and determine the various output options. Another positive for the digital preservation community is that FITS is actively maintained, so there is a procedure for addressing bugs and a schedule for releasing updates.
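Before editing fits.xml, it can help to see what it currently says. The Python sketch below prints the tools in the order they are listed, any file extensions each one skips, and whether checksums are switched on. The sample configuration is hand-written and much shorter than the real file. The element and attribute names (tool, class, exclude-exts, enable-checksum) and the idea that list order sets tool priority are my assumptions about the file's layout, so compare them with the fits.xml that ships with your copy of FITS.

```python
import xml.etree.ElementTree as ET

# Hand-written, simplified sample. The real fits.xml is longer, and its
# element and attribute names may differ from the ones assumed here.
SAMPLE_CONFIG = """<fits_configuration>
  <tools>
    <tool class="edu.harvard.hul.ois.fits.tools.jhove.Jhove" exclude-exts="dng,mbx"/>
    <tool class="edu.harvard.hul.ois.fits.tools.droid.Droid"/>
  </tools>
  <output>
    <enable-checksum>true</enable-checksum>
  </output>
</fits_configuration>"""


def describe_config(xml_text):
    """Summarize tool order, per-tool extension exclusions, and checksum setting."""
    root = ET.fromstring(xml_text)
    tools = []
    for tool in root.iter("tool"):
        # Use the last part of the class name as a short label.
        name = tool.get("class", "").rsplit(".", 1)[-1]
        excluded = [e for e in tool.get("exclude-exts", "").split(",") if e]
        tools.append({"name": name, "excluded": excluded})
    checksum = root.find(".//enable-checksum")
    checksum_on = checksum is not None and (checksum.text or "").strip().lower() == "true"
    return {"tools": tools, "checksum": checksum_on}


config = describe_config(SAMPLE_CONFIG)
for position, tool in enumerate(config["tools"], start=1):
    skipped = ", ".join(tool["excluded"]) or "none"
    print(f"{position}. {tool['name']} (skips: {skipped})")
print("Checksums enabled:", config["checksum"])
```

A summary like this makes a handy before-and-after check when you change the file, since a typo in an XML attribute is easy to miss in a text editor.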
The FITS web site (fitstool.org) is well organized and fully documents the installation, configuration, use, and output options.
I know my post pales in comparison to a live demo of the application. But if it piques your interest, take it for a test drive. You’ve got nothing to lose and you might add a new tool to your digital preservation tool box.
Thanks for reading, Jeff