Physical Description information in Archivists’ Toolkit, ArchivesSpace, ICA-AtoM, and EAD3

We’ve been taking a closer look at how we create and encode Physical Description information. Our deployment of the Archivists’ Toolkit does not employ the multiple <extent> plugin, so we’re a little limited in how we enter Physical Description information. We’ve been using the Toolkit “Container Summary” field for recording information about the containers (boxes, folders, etc.). The Toolkit exports this information as a simple <extent> note that appears immediately after the regular <extent> note (which provides information about the archival materials, not the containers).

We’re watching the developments with ArchivesSpace, the open-source platform that will supersede the Toolkit later this year.  The ArchivesSpace team is developing a migration tool to migrate data from the Toolkit MySQL database (as well as the database that powers the Archon platform) into the ArchivesSpace MySQL database so users don’t have to import/export EAD or CSV accession records. This got me wondering about the destination of the Container Summary field in ArchivesSpace (see this ArchivesSpace user group thread for more info). We want to be able to distinguish between the two <extent> notes when we’re working with exported EAD and it appears the only way to do that will be to tweak the export routine in the Ruby source code.
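
To make the distinction concrete, here is a rough Python sketch of how the two <extent> notes could be told apart in exported EAD. It assumes, as described above, that the container summary is the second <extent> inside <physdesc>. The sample XML, its values, and the function name split_extents are hypothetical, and the EAD namespace is left out for brevity.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: placeholder values, not real export output.
SAMPLE_PHYSDESC = """
<physdesc>
  <extent>34 cm of textual records</extent>
  <extent>3 boxes</extent>
</physdesc>
"""

def split_extents(physdesc_xml):
    """Return (materials_extent, container_summary) from a <physdesc> string.

    Assumes positional order: the first <extent> describes the materials,
    the second <extent> (if present) is the container summary.
    """
    physdesc = ET.fromstring(physdesc_xml.strip())
    extents = [(e.text or "").strip() for e in physdesc.findall("extent")]
    materials = extents[0] if extents else None
    containers = extents[1] if len(extents) > 1 else None
    return materials, containers

print(split_extents(SAMPLE_PHYSDESC))
```

Relying on position is fragile, which is why an attribute added at export time would be a more dependable way to tell the two notes apart.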

At the same time, as we export our finding aids for publishing in our ICA-AtoM database, we’re noticing that AtoM properly handles the multiple <extent> notes nested within the first <physdesc> element. But we’re also noticing issues with how AtoM handles multiple Physical Description elements. Specifically, it does not import repeating <physdesc> elements and it does not import the <physfacet> element (see this AtoM user group thread for more info). We’re currently looking at whether we should revise our XSLT to merge multiple <physdesc> notes into one note or tweak the AtoM import code so it accepts repeating <physdesc> elements.
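
The first option, merging repeated <physdesc> notes into one, can be sketched in Python rather than XSLT. This is only an illustration of the idea, not our actual stylesheet: the sample <did>, its values, and the function name merge_physdesc are hypothetical, the EAD namespace is omitted, and the sketch does nothing about the separate <physfacet> import issue.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample with two sibling <physdesc> elements.
SAMPLE_DID = """
<did>
  <unittitle>Placeholder fonds</unittitle>
  <physdesc><extent>34 cm of textual records</extent></physdesc>
  <physdesc><physfacet>placeholder physical details</physfacet></physdesc>
</did>
"""

def merge_physdesc(did):
    """Move the content of every later <physdesc> into the first one."""
    blocks = did.findall("physdesc")
    if len(blocks) < 2:
        return did
    first = blocks[0]
    for extra in blocks[1:]:
        text = (extra.text or "").strip()
        if text:
            # Keep bare text from the later note by appending it to what
            # has been merged so far.
            if len(first):
                first[-1].tail = (first[-1].tail or "").rstrip() + " " + text
            else:
                first.text = ((first.text or "").rstrip() + " " + text).strip()
        for child in list(extra):
            first.append(child)
        did.remove(extra)
    return did

merged = merge_physdesc(ET.fromstring(SAMPLE_DID.strip()))
print(ET.tostring(merged, encoding="unicode"))
```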

All of this prompted me to make a post to the Encoded Archival Description list to see how other folks are encoding Physical Description information (see thread #7 in the list archives). This generated some helpful discussion but also brought up the pending revisions to Encoded Archival Description (EAD) and its complete overhaul of the <physdesc> elements (see Mike Rush’s response for more info).

The changes will really help us handle the complex physical description information that’s required for audiovisual material, but it will be some time before ArchivesSpace or ICA-AtoM supports EAD3. This is probably a good thing, as it will allow us more time to think about how the revisions to EAD will affect the display of the descriptive data. For example, under our Rules for Archival Description (RAD), the explanatory text allowed under <physdescstructured><descriptivenote> would not typically be nested within the data included in <physdescstructured>.  One of RAD’s many idiosyncrasies is that it asks for descriptive notes relating to the physical description to be included in the “Notes Area,” which comes after the Physical Description Area and Archival Description Area (see Rule 1.8B9). I don’t think that rule is used all that often because, in practice, most people just include the note in the Physical Description Area, but it is there nonetheless. This will only become important if anyone attempts to design a stylesheet to display EAD3 data according to the structure prescribed in RAD; they would *technically* need to move the data contained in <physdescstructured><descriptivenote> and place it alongside other elements that appear outside of <did> (e.g., <originalsloc>, <otherfindaid>, etc.).

RAD clearly needs some revisions, but that is another story! The moving targets are making it a little difficult to develop procedures for creating and entering physical description information: do we use granular EAD elements or lump everything together into one <physdesc> note?  Our database holds a bit of both, so we will likely need to account for various EAD scenarios when we work on our XSLT and our AtoM database.

Merging extent types

As we march along with our migration from MS Access to the Archivists’ Toolkit, we’ve been noticing that our list of extent types has become unwieldy.  Our legacy finding aids contained extent types like:

  • cm of textual record; 4 maps
  • 10 centimeters
  • 34 centimetres
  • 2 metres; 12 blueprints

When we migrated our legacy EAD, these terms were imported to AT and added to the list of available extent types. Additional terms were added to the list during routine accessioning and processing. Over time, the list grew to include redundant or incorrect extent types. Vague terms like “boxes” and “item” began appearing. A repository profile report confirmed the authority control problem by showing the total volume of each measurement divided between the various incarnations of the extent type (e.g., cm, centimetre, centimetres, etc.).
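
As a rough illustration of why the split totals matter, the Python sketch below sums extents under one canonical unit per measurement. The mapping table, the sample figures, and the function name total_by_unit are all made up for the example; they are not taken from our repository profile report.

```python
from collections import defaultdict

# Hypothetical mapping of variant spellings to one canonical unit.
CANONICAL_UNITS = {
    "cm": "cm",
    "centimetre": "cm",
    "centimetres": "cm",
    "centimeters": "cm",
    "m": "m",
    "metre": "m",
    "metres": "m",
}

def total_by_unit(extents):
    """Sum (number, extent_type) pairs under canonical units.

    Unrecognized types keep their own label so vague terms stay visible.
    """
    totals = defaultdict(float)
    for number, extent_type in extents:
        key = extent_type.strip().lower()
        totals[CANONICAL_UNITS.get(key, key)] += number
    return dict(totals)

# Made-up figures for illustration only.
sample = [(10, "centimeters"), (34, "centimetres"), (2, "metres"), (5, "cm"), (3, "boxes")]
print(total_by_unit(sample))
```

Vague terms such as “boxes” are deliberately left unmapped here, since they cannot be converted to a measurement without looking at the material.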

At the same time, our project to migrate AT finding aids to ICA-AtoM has helped to clarify some of our description practices.  Seeing how physical description information is passed through our legacy finding aids into the AT and then into ICA-AtoM has done much to inform our approach to physical description.

So, I set out to fix the problem by creating a set of new, RAD-compliant extent types:

  • cm of textual records
  • cm of textual records and other material
  • cm of graphic material
  • cm of multiple media
  • m of textual records
  • m of textual records and other material
  • m of graphic material
  • m of multiple media

These extent types would address a related issue: how to enter the “specific material designation” in accordance with RAD (see Rules 1.5B1, 1.5B3, and 3.5B1).  All I had to do was merge the old, incorrect measurements into the new, correct measurements.  That is where things went terribly wrong.

The merger of the various “metre” extent types worked well, but the merger of the various “centimetre” extent types partly failed.  Some of them worked, but I found that I had both “cm of textual records” and “cm of textual record.”  I incorrectly thought the term should be “cm of textual record” (in fact, the phrase in Rule 3.5B1 is “textual records”), so I tried to merge the two terms and it did not work.  It would have updated a substantial number of records (ca. 5,000).  The merger finished, but I could still see “cm of textual records” in the main list view.

What was really troubling is that different terms would appear inside the open resource record.  Some records showed the correct term (see screenshot below), but others had blank extent types and others showed completely different terms (e.g., volumes, boxes, etc.).

Screenshot of AT resource record with incorrect extent type

So, there’s the problem.  It was happening everywhere: resource records, accession records, deaccession records, etc.  It was a little nerve-wracking!  Were incorrect terms overwriting the extent types that failed to merge?  Did we save the incorrect or blank extent types when we opened and saved resource and accession records?

The answer is no.  The incorrect and blank terms were strange, but they were not overwriting the extent types in records where the merger failed.  Phew!

Here is the solution provided by our systems developer, who helped troubleshoot and fix the problem:

Step 1:

Back up your database with mysqldump

Step 2:

Select all the accessionId values where the extentType is ‘cm of textual records’:

SELECT accessionId FROM Accessions WHERE extentType='cm of textual records';

Step 3:

Copy the IDs into a program that supports regular-expression search and replace, such as TextPad, and perform a regular-expression replacement on the IDs to create your update statements.

TextPad Example:

  1. Paste the IDs into TextPad, one ID per line
  2. Press F8 or go to Search->Replace
  3. The following values should be entered
    1. Find what: .*
    2. Replace with: UPDATE `ATK`.`Accessions` SET `extentType`='cm of textual record' WHERE `accessionId`='&';
    3. Conditions: Check the ‘Regular Expression’ checkbox
  4. Click Replace All
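
If TextPad isn’t available, the same find-and-replace can be approximated with a few lines of Python. This is a sketch rather than what our developer actually ran: the function name build_updates and its parameters are hypothetical, and the defaults simply mirror the schema, table, and column names in the TextPad example.

```python
def build_updates(ids_text, schema="ATK", table="Accessions",
                  id_column="accessionId", new_value="cm of textual record"):
    """Turn a block of IDs (one per line) into UPDATE statements."""
    # Double any single quotes so the value stays a valid SQL string literal.
    safe_value = new_value.replace("'", "''")
    statements = []
    for line in ids_text.splitlines():
        record_id = line.strip()
        if not record_id:
            continue  # skip blank lines left over from copy and paste
        if not record_id.isdigit():
            raise ValueError(f"not a numeric ID: {record_id!r}")
        statements.append(
            f"UPDATE `{schema}`.`{table}` SET `extentType`='{safe_value}' "
            f"WHERE `{id_column}`='{record_id}';"
        )
    return statements

for statement in build_updates("5\n11\n15\n20\n21\n"):
    print(statement)
```

Rejecting non-numeric lines is a small safeguard against stray text ending up in the statements.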

Step 4:

You should now have a list of update statements similar to…

UPDATE `ATK`.`Accessions` SET `extentType`='cm of textual record' WHERE `accessionId`='5';

UPDATE `ATK`.`Accessions` SET `extentType`='cm of textual record' WHERE `accessionId`='11';

UPDATE `ATK`.`Accessions` SET `extentType`='cm of textual record' WHERE `accessionId`='15';

UPDATE `ATK`.`Accessions` SET `extentType`='cm of textual record' WHERE `accessionId`='20';

UPDATE `ATK`.`Accessions` SET `extentType`='cm of textual record' WHERE `accessionId`='21';

Step 5:

Run your update statements in MySQL

Step 6:

Repeat Steps 2-5 for all affected tables.

In our case, we modified Accessions, Deaccessions, Resources, and ResourcesComponents.  Of course, we’ll have to change the extent type back to “cm of textual records,” but at least we’ve identified a potentially recurring problem and how to solve it.
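
One possible alternative to generating a statement per ID is a single UPDATE per table that filters on the old extent type. The Python sketch below shows the idea against an in-memory SQLite database standing in for the Toolkit’s MySQL database. It assumes each of the four tables has an extentType column; the table layout, the sample rows, and the function name retarget_extent_type are invented for the demonstration.

```python
import sqlite3

AFFECTED_TABLES = ("Accessions", "Deaccessions", "Resources", "ResourcesComponents")

def retarget_extent_type(conn, old_value, new_value, tables=AFFECTED_TABLES):
    """Run one UPDATE per table and return {table: rows_changed}."""
    changed = {}
    for table in tables:
        # Table names cannot be bound as parameters, so only known names pass.
        if table not in AFFECTED_TABLES:
            raise ValueError(f"unexpected table: {table}")
        cursor = conn.execute(
            f'UPDATE "{table}" SET extentType = ? WHERE extentType = ?',
            (new_value, old_value),
        )
        changed[table] = cursor.rowcount
    conn.commit()
    return changed

# Stand-in database with made-up rows (not the real Toolkit schema).
conn = sqlite3.connect(":memory:")
for table in AFFECTED_TABLES:
    conn.execute(f'CREATE TABLE "{table}" (id INTEGER PRIMARY KEY, extentType TEXT)')
    conn.executemany(
        f'INSERT INTO "{table}" (extentType) VALUES (?)',
        [("cm of textual records",), ("m of textual records",)],
    )

counts = retarget_extent_type(conn, "cm of textual records", "cm of textual record")
print(counts)
```

Either way, take the backup from Step 1 first, since a mistyped value would be written to every matching row.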

RAD/Archivists’ Toolkit/EAD Crosswalk

We’ve been busy working to get our Archivists’ Toolkit set up to produce RAD finding aids.  A key criterion for us was to have AT-exported EAD that more or less conformed to existing RAD/EAD matrices.  We wanted to know that EAD exported from the AT is EAD that can be manipulated into a RAD multi-level finding aid.  The Canadian Committee on Archival Description released this crosswalk in 2003.  Artefactual Systems, the developers of the ICA-AtoM software, have an excellent crosswalk between RAD, EAD, ISAD(G), Dublin Core, MARC, and MODS (also available here).

These crosswalks make it easy enough to see how each RAD rule should be mapped to an EAD element. What we didn’t know was how the various AT data entry fields exported to EAD.  I came across a couple of AT import and export maps, which were really helpful, but they didn’t link the EAD elements to RAD.  We still needed to determine what AT data entry field or note area should be used for a given RAD rule.

And to complicate things even further, AT is built using terminology derived from DACS, the US archival description standard.  Many types of information are the same as in RAD (e.g., a title is a title), but many of the note fields in the AT are by default labelled differently than you might find in a truly RAD-based archival information system.  For example, the Physical Characteristics and Technical Requirements note exports to the <phystech> element.  But the RAD/EAD matrices show that RAD Rule 1.8B9a for “Physical condition” should be mapped to the <phystech> element.  The Physical Facet note exports to the <physdesc><physfacet> element, the same element prescribed for RAD Rule 1.5C for “Other physical details.”

Once we got a handle on the terminology, we were easily able to edit the labels for the notes and a few other fields where it seemed like it would be helpful.  The note labels can be edited by using the Setup > Notes, Etc. box and the labels for other fields can be edited by using the Setup > Configure Application box and selecting the table that contains the fields you wish to edit.

The next step was to experiment with the EAD export.  I created a dummy resource record where all the fields were filled out.  It had dummy fonds-level notes, dummy file-level descriptions, dummy names and subjects, etc.  The data was just placeholder text so I could easily see in the HTML or PDF reports how the information was being displayed.

By comparing the EAD exported by the AT with the RAD/EAD crosswalks, we began to get a sense of where the exports were mapped properly and where there might be potential issues. Fonds-level descriptions go into our OPAC as MARC records, so we did the same with the MARCXML exports.  We needed a way to keep track of our findings, so I copied the Artefactual Systems crosswalk into Excel and added a few columns for pertinent AT information:

  1. Description of the RAD rule – Usually cobbled from RAD itself
  2. Archivists’ Toolkit Data Entry Form – Indicates what AT data entry field, note area, or method of encoding should be used for a given RAD rule
  3. AT Export EAD – Indicates how the EAD actually looks when it is exported from AT (helpful to see when AT adds attributes, refs, etc. and to see if it deviates from the crosswalk)
  4. AT MARCXML Export – Indicates how the MARCXML actually looks when it is exported from AT (again, helpful to see where the AT deviates from the crosswalk)
  5. MARCXML Export Issues – Indicates any potential issues with using the AT to export MARCXML records in a RAD context
  6. EAD Export Issues – Indicates any potential issues with using the AT to export EAD records in a RAD context

I also added some rows to describe the areas of description that the AT allows for but are not prescribed in RAD (e.g., abstracts, legal status, etc.).
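
For anyone who would rather keep this kind of tracking sheet as plain CSV than in Excel, here is a small Python sketch of the layout. The six added columns come from the list above; the “RAD rule” and “EAD element” columns are my guess at what the original crosswalk carries over, and the two rows only restate the <phystech> and <physfacet> examples mentioned earlier, with the issue columns left blank.

```python
import csv
import io

COLUMNS = [
    "RAD rule",      # assumed column from the original crosswalk
    "EAD element",   # assumed column from the original crosswalk
    "Description of the RAD rule",
    "Archivists' Toolkit Data Entry Form",
    "AT Export EAD",
    "AT MARCXML Export",
    "MARCXML Export Issues",
    "EAD Export Issues",
]

# Two illustrative rows; cells not listed are written as empty strings.
ROWS = [
    {
        "RAD rule": "1.8B9a",
        "EAD element": "<phystech>",
        "Description of the RAD rule": "Physical condition",
        "Archivists' Toolkit Data Entry Form":
            "Physical Characteristics and Technical Requirements note",
        "AT Export EAD": "<phystech>",
    },
    {
        "RAD rule": "1.5C",
        "EAD element": "<physdesc><physfacet>",
        "Description of the RAD rule": "Other physical details",
        "Archivists' Toolkit Data Entry Form": "Physical Facet note",
        "AT Export EAD": "<physdesc><physfacet>",
    },
]

def crosswalk_csv(rows):
    """Serialize crosswalk rows to CSV text with a fixed column order."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS, restval="")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()

print(crosswalk_csv(ROWS))
```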

Anyway, I posted back in June about some of our findings as we were getting started with that project, and I’m happy to say now that we’ve mostly finished updating Artefactual Systems’ RAD crosswalk with guidelines about how descriptive information should be added to the AT [1]. It is rough and will definitely be updated as we continue moving forward, but it is already helping us understand where we’ve been making mistakes and develop data entry procedures that avoid those mistakes.

So, in summary, here’s what we did to configure AT to generate EAD more in line with the Rules for Archival Description (lists of edited field labels are included in the RAD/AT/EAD crosswalk):

  1. Edit note labels to use language found in RAD
  2. Edit field labels in Resource table
  3. Construct a resource record where (a) every note is used at least once, (b) names and subjects are linked, and (c) every field in the finding aid data tab is filled out [2]
  4. Check AT exported EAD against existing RAD/EAD crosswalks and note errors, anomalies, etc.
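
Part of step 4 can be automated. The Python sketch below lists which expected elements never appear in an export; it is a first-pass check under simple assumptions, not a validation against the crosswalk. The sample export, the list of expected elements, and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

def local_name(tag):
    """Strip a '{namespace}' prefix, if any, from an ElementTree tag."""
    return tag.rsplit("}", 1)[-1]

def missing_elements(ead_xml, expected):
    """Return the expected element names that do not occur in the export."""
    root = ET.fromstring(ead_xml)
    present = {local_name(element.tag) for element in root.iter()}
    return sorted(set(expected) - present)

# Hypothetical dummy export, trimmed to a few elements.
SAMPLE_EXPORT = (
    '<ead><archdesc level="fonds"><did>'
    "<unittitle>Dummy fonds</unittitle>"
    "<physdesc><extent>34 cm of textual records</extent></physdesc>"
    "</did><custodhist><p>Dummy custodial history.</p></custodhist>"
    "</archdesc></ead>"
)

# Hypothetical checklist drawn from whichever crosswalk is in use.
EXPECTED = ["unittitle", "extent", "physfacet", "phystech", "custodhist"]

print(missing_elements(SAMPLE_EXPORT, EXPECTED))
```

A check like this only shows that an element is absent; whether the data landed in the wrong element still has to be confirmed by reading the export.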

[1] RAD/AT/EAD Crosswalk with lists of edited field labels and potential import/export errors for RAD users

[2] I can’t upload XML, ZIP, or TXT files but if anyone wants the dummy EAD record, I can send it by email.

Our next step is to customize the AT XSL sheets so the data is grouped according to RAD.  For example, AT groups the custodial history note under an “Administrative Information” section but we want to group it under an “Archival Description” section.  We’re making good progress with that, and will post the new stylesheets when they are finished.