Physical Description information in Archivists’ Toolkit, ArchivesSpace, ICA-AtoM, and EAD3

We’ve been taking a closer look at how we create and encode Physical Description information.  Our deployment of the Archivists’ Toolkit does not employ the multiple <extent> plugin so we’re a little limited as to how we enter Physical Description information.  We’ve been using the Toolkit “Container Summary” field for recording information about the containers (i.e., boxes, folders, etc.).  The Toolkit exports this information as a simple <extent> note that appears immediately after the regular <extent> note (which provides information about the archival materials, not the containers).

We’re watching the developments with ArchivesSpace, the open-source platform that will supersede the Toolkit later this year.  The ArchivesSpace team is developing a migration tool to migrate data from the Toolkit MySQL database (as well as the database that powers the Archon platform) into the ArchivesSpace MySQL database so users don’t have to import/export EAD or CSV accession records. This got me wondering about the destination of the Container Summary field in ArchivesSpace (see this ArchivesSpace user group thread for more info). We want to be able to distinguish between the two <extent> notes when we’re working with exported EAD and it appears the only way to do that will be to tweak the export routine in the Ruby source code.

At the same time, as we export our finding aids for publishing in our ICA-AtoM database, we’re noticing that AtoM properly handles the multiple <extent> notes nested within the first <physdesc> element. But we’re also noticing issues with how AtoM handles multiple Physical Description elements. Specifically, it does not import repeating <physdesc> elements and it does not import the <physfacet> element (see this AtoM user group thread for more info). We’re currently looking at whether we should revise our XSLT to merge multiple <physdesc> notes into one note or tweak the AtoM import code so it accepts repeating <physdesc> elements.
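To make the "merge" option concrete, here is a minimal sketch in Python rather than XSLT. It is not our production stylesheet: the function names are made up for illustration, and it assumes the EAD has no XML namespace (schema-based exports would need namespace-qualified tag names). It folds every <physdesc> after the first into the first one, keeping child elements such as <extent> and <physfacet> in their original order.

```python
import xml.etree.ElementTree as ET


def append_text(parent, text):
    """Append loose text to the end of parent's mixed content."""
    text = (text or "").strip()
    if not text:
        return
    if len(parent):
        # Parent already has children: loose text belongs in the last child's tail.
        last = parent[-1]
        last.tail = f"{(last.tail or '').rstrip()} {text}".strip()
    else:
        parent.text = f"{(parent.text or '').rstrip()} {text}".strip()


def merge_physdesc(did):
    """Fold every <physdesc> after the first into the first one (in place)."""
    blocks = did.findall("physdesc")
    if len(blocks) < 2:
        return did
    first = blocks[0]
    for extra in blocks[1:]:
        append_text(first, extra.text)
        for child in list(extra):
            first.append(child)   # child keeps its own text and tail
        did.remove(extra)
    return did


# Hypothetical sample data, not taken from a real finding aid.
SAMPLE_DID = (
    "<did><unittitle>Example fonds</unittitle>"
    "<physdesc><extent>2 m of textual records</extent>"
    "<extent>8 boxes</extent></physdesc>"
    "<physdesc><physfacet>16 mm film, colour</physfacet></physdesc>"
    "<physdesc>Some loose note</physdesc></did>"
)
```

Calling merge_physdesc(ET.fromstring(SAMPLE_DID)) leaves a single <physdesc> holding both <extent> elements, the <physfacet>, and the loose text. Whether AtoM then imports the nested <physfacet> is a separate question that this sketch does not address.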

All of this prompted me to make a post to the Encoded Archival Description list to see how other folks are encoding Physical Description information (see thread #7 in the list archives). This generated some helpful discussion but also brought up the pending revisions to Encoded Archival Description (EAD) and its complete overhaul of the <physdesc> elements (see Mike Rush’s response for more info).

The changes will really help us handle complex physical description information that’s required for audiovisual material but it will be some time before ArchivesSpace or ICA-AtoM support EAD3. This is probably a good thing as it will allow us more time to think about how the revisions to EAD will affect the display of the descriptive data. For example, with our Rules for Archival Description, the explanatory text allowed under <physdescstructured><descriptivenote> would not typically be nested within the data included in <physdescstructured>. One of RAD’s many idiosyncrasies is that it asks for descriptive notes relating to the physical description to be included in the “Notes Area,” which comes after the Physical Description Area and Archival Description Area (see Rule 1.8B9). I don’t think that rule is used all that often because, in practice, most people just include the note in the Physical Description Area, but it is there nonetheless. This will only become important if anyone attempts to design a stylesheet to display EAD3 data according to the structure prescribed in RAD; they would *technically* need to move the data contained in <physdescstructured><descriptivenote> and place it alongside other elements that appear outside of <did> (e.g., <originalsloc>, <otherfindaid>, etc.).

RAD clearly needs some revisions, but that is another story! The moving targets are making it a little difficult to develop procedures for creating and entering physical description information – do we use granular EAD elements or lump everything together into one <physdesc> note? Our database holds a bit of both, so we will likely need to account for various EAD scenarios when we work on our XSLT and our AtoM database.

physfacet notes become physdesc notes during import

We’re currently in the middle of a project to convert several legacy MS Access databases into EAD so the data can be imported into the Toolkit.  First, a historical side note:

Our old procedures for creating finding aids looked something like:

  • Copy an existing MS Access database used to create finding aids (these are simple databases with two tables: one for series-level descriptions and one for file-level descriptions)
  • Change available series titles (these are pulled from a drop-down list)
  • Enter data
  • Create queries to pull all file-level descriptions for each series (these queries could be copied from database to database and edited as needed)
  • Create reports that merge the query results into basic EAD tags (these reports could be copied from database to database and edited as needed)
  • Save each report as a text file
  • Merge each report into a fonds-level EAD template (the fonds-level description must be prepared and pasted into the template before merging the Access reports)
  • Convert EAD into HTML using XSL transformation in Oxygen or XMetal
  • Mount HTML on Archives website

This worked well enough for a number of years, but you can see all the problems (duplicate data, onerous time commitment, inconsistent encoding practices, etc.). We’ve already migrated all our legacy EAD into the Toolkit, but there were a handful of large databases that never had EAD exported.   The data just resides in the MS Access database.

Migrating the legacy EAD enlightened us to the many inconsistencies in encoding practices, including:

  • Various “container types” and “extent types”
  • Dates in the title field
  • Incorrect encoding of subject headings

Rather than try to dust off the old procedures (which caused numerous problems when the EAD was imported into the Toolkit), we decided to hire a casual data technician to clean the data the right way.  It has been going really well, but the MS database for the imX Communications fonds (with 10,000+ records) threw a few curveballs.

The database has numerous fields for “Extent” and “Other Physical Details” information (mostly used for film and other a/v material). We thought: since we have someone who can properly parse this data, why not encode the different fields in proper EAD elements? By this logic, information that falls under the RAD “Other Physical Details” notes should be wrapped in <physfacet> tags (see this post about crosswalks for more info).

Unfortunately, it’s not so simple.  The import maps for the Archivists’ Toolkit do not allow for this level of specificity.  Data wrapped in <physfacet> tags are imported, but they are stored as Physical Description Notes, not “Other Physical Details” notes.  This means they could be imported as <physfacet> but exported as <physdesc>.

We are now looking into whether it would work to import these “Other Physical Details” notes with some kind of prefix that could be used in a post-import MySQL query that would change the notesEtcTypeID and chop off the prefix.  Stay tuned…
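Here is a rough sketch of what that prefix idea could look like. It is untested against a real Toolkit database: it runs against an in-memory SQLite stand-in, and the table name (notes), the noteContent column, the "OPD::" prefix, and the two type IDs are all hypothetical. Only notesEtcTypeId comes from the Toolkit itself. The real IDs would have to be looked up first, and the database backed up before trying anything like this.

```python
import sqlite3

PREFIX = "OPD::"         # hypothetical marker typed in front of each note before import
PHYSDESC_TYPE_ID = 1     # hypothetical ID of the general Physical Description note type
PHYSFACET_TYPE_ID = 2    # hypothetical ID of the "Other Physical Details" note type


def demo_connection():
    """Build an in-memory stand-in for the notes table (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE notes ("
        "noteId INTEGER PRIMARY KEY, notesEtcTypeId INTEGER, noteContent TEXT)"
    )
    conn.executemany(
        "INSERT INTO notes (notesEtcTypeId, noteContent) VALUES (?, ?)",
        [
            (PHYSDESC_TYPE_ID, "OPD::16 mm, colour, sound"),
            (PHYSDESC_TYPE_ID, "Stored in cold vault"),
        ],
    )
    conn.commit()
    return conn


def retag_prefixed_notes(conn):
    """Switch prefixed notes to the physfacet type and strip the prefix."""
    cur = conn.execute(
        "UPDATE notes "
        "SET notesEtcTypeId = ?, noteContent = substr(noteContent, ?) "
        "WHERE notesEtcTypeId = ? AND noteContent LIKE ?",
        # substr() is 1-indexed, so the kept text starts at len(PREFIX) + 1.
        (PHYSFACET_TYPE_ID, len(PREFIX) + 1, PHYSDESC_TYPE_ID, PREFIX + "%"),
    )
    conn.commit()
    return cur.rowcount   # number of notes that were re-tagged
```

The prefix deliberately avoids % and _ because those are wildcards in a LIKE pattern and would make the match looser than intended.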


Be careful copying and pasting subject headings from LCSH

We’ve been making great progress with backend MySQL queries and manual fixes to our extent and container types (more on that later).  We also have developed some procedures and a checklist for reviewing  the finding aids currently in the Toolkit.  We’ll do a comprehensive assessment of each finding aid (at the fonds-level, at least) before they are exported for publishing online.

In preparing those procedures, I noticed some issues with our Subject Headings List (which is being added to by several database users). We’re typically checking Library of Congress Subject Headings (LCSH) and copying and pasting terms as needed. This has worked reasonably well, but it has led to an unintentional problem: some terms get entered into the database with unnecessary spaces:

Copying and Pasting terms from LCSH can unintentionally add spaces to the subject heading

It is easy enough to diagnose: the terms do not list in proper alphabetical order when there are spaces:

Subject headings don't sort properly when they are prefixed with spaces

But it is still something to watch out for when adding terms via LCSH.
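For anyone who would rather check a whole list at once, here is a small sketch. It assumes the headings have already been pulled out of the database into a plain Python list, and the function names are just for illustration. It trims stray leading and trailing whitespace, collapses internal runs of whitespace, and reports which terms would change.

```python
import re


def clean_heading(term):
    """Trim the ends and collapse internal whitespace runs to a single space."""
    return re.sub(r"\s+", " ", term).strip()


def find_dirty_headings(terms):
    """Return the terms that differ from their cleaned form."""
    return [term for term in terms if term != clean_heading(term)]


# Hypothetical sample list; the second and fourth entries carry pasted-in spaces.
HEADINGS = ["Autobiography", " Lumber", "Personal archives", "Communism--Canada  "]
```

With the sample list, find_dirty_headings(HEADINGS) flags " Lumber" and "Communism--Canada  ". It also shows why the sort order goes wrong: a leading space sorts before any letter, so " Lumber" jumps ahead of "Autobiography".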

Merging extent types

As we march along with our migration from MS Access to the Archivists’  Toolkit, we’ve been noticing that our list of extent types has become unwieldy.  Our legacy finding aids contained extent types like:

  • cm of textual record; 4 maps
  • 10 centimeters
  • 34 centimetres
  • 2 metres; 12 blueprints

When we migrated our legacy EAD, these terms were imported to AT and added to the list of available extent types. Additional terms were added to the list during routine accessioning and processing. Over time, the list grew to include redundant or incorrect extent types. Vague terms like “boxes” and “item” began appearing. A repository profile report confirmed the authority control problem by showing the total volume of each measurement divided between the various incarnations of the extent type (e.g., cm, centimetre, centimetres, etc.).
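To illustrate the problem the report exposed, here is a small sketch that groups extents by a normalized unit. It assumes legacy values shaped like the examples above (a number followed by a unit spelling); the alias table and function names are my own, and anything it cannot parse is simply skipped rather than guessed at.

```python
import re
from collections import defaultdict

# Spelling variants mapped to one canonical unit (illustrative, not exhaustive).
UNIT_ALIASES = {
    "cm": "cm", "centimetre": "cm", "centimetres": "cm",
    "centimeter": "cm", "centimeters": "cm",
    "m": "m", "metre": "m", "metres": "m", "meter": "m", "meters": "m",
}

LEADING_MEASURE = re.compile(r"^\s*(\d+(?:\.\d+)?)\s*([A-Za-z]+)")


def parse_extent(value):
    """Return (number, canonical unit) for a leading linear measure, else None."""
    match = LEADING_MEASURE.match(value)
    if not match:
        return None
    unit = UNIT_ALIASES.get(match.group(2).lower())
    if unit is None:
        return None
    return float(match.group(1)), unit


def totals_by_unit(values):
    """Sum the parsed measures per canonical unit, ignoring unparseable values."""
    totals = defaultdict(float)
    for value in values:
        parsed = parse_extent(value)
        if parsed:
            number, unit = parsed
            totals[unit] += number
    return dict(totals)
```

Only the first measure in a value is read, so "2 metres; 12 blueprints" counts as 2 m and the blueprints are left for a human to deal with.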

At the same time, our project to migrate AT finding aids to ICA-Atom has helped to clarify some of our description practices.  Seeing how physical description information is passed through our legacy finding aids into the AT and then into ICA-Atom has done much to inform our approach to physical description.

So, I set out to fix the problem by creating a set of new, RAD-compliant extent types:

  • cm of textual records
  • cm of textual records and other material
  • cm of graphic material
  • cm of multiple media
  • m of textual records
  • m of textual records and other material
  • m of graphic material
  • m of multiple media

These extent types would address a related issue: how to enter the “specific material designation” in accordance with RAD (see Rules 1.5B1, 1.5B3, and 3.5B1).  All I had to do was merge the old, incorrect measurements into the new, correct measurements.  That is where things went terribly wrong.

The merger of the various “metre” extent types worked well, but the merger of the various “centimetre” extent types failed somehow.  Well, some of them worked.  But I found that I had “cm of textual records” and “cm of textual record.”  I incorrectly thought the term should be “cm of textual record” (in fact, the phrase in Rule 3.5B1 is “textual records“), so I tried to merge the two terms and it did not work.  It would’ve been a substantial number of updated records (ca. 5,000).  The merger finished, but I could still see “cm of textual records” in the main list view.

What was really troubling is that different terms would appear inside the open resource record.  Some records showed the correct term (see screenshot below), but others had blank extent types and others showed completely different terms (e.g., volumes, boxes, etc.).

Screenshot of AT resource record with incorrect extent type

So, there’s the problem.  It was happening everywhere – resource records, accession records, deaccession records, etc.  It was a little nerve wracking!  Were incorrect terms overwriting the extent types that failed to merge?  Did we save the incorrect or blank extent types when we opened and saved resource and accession records?

The answer is no. The incorrect and blank terms were strange, but they were not overwriting the extent types in records where the merger failed. Phew!

Here is the solution provided by our systems developer, who helped troubleshoot and fix the problem:

Step 1:

Backup your database with a mysqldump

Step 2:

-- all the accessionId where the extentType is 'cm of textual records'

SELECT accessionId FROM Accessions WHERE extentType='cm of textual records';

Step 3:

Copy the IDs into a program that supports regular-expression find and replace, such as TextPad, and run a regular-expression replace on the IDs to create your update statements.

TextPad Example:

  1. Paste the IDs into TextPad, one ID per line
  2. Press F8 or go to Search->Replace
  3. The following values should be entered
    1. Find what: .*
    2. Replace with: UPDATE `ATK`.`Accessions` SET `extentType`='cm of textual record' WHERE `accessionId`='&';
    3. Conditions: Check the 'Regular Expression' checkbox
  4. Click Replace All

Step 4:

You should now have a list of update statements similar to…

UPDATE `ATK`.`Accessions` SET `extentType`='cm of textual record' WHERE `accessionId`='5';

UPDATE `ATK`.`Accessions` SET `extentType`='cm of textual record' WHERE `accessionId`='11';

UPDATE `ATK`.`Accessions` SET `extentType`='cm of textual record' WHERE `accessionId`='15';

UPDATE `ATK`.`Accessions` SET `extentType`='cm of textual record' WHERE `accessionId`='20';

UPDATE `ATK`.`Accessions` SET `extentType`='cm of textual record' WHERE `accessionId`='21';

Step 5:

Run your update statements in MySQL

Step 6:

Repeat Steps 2-5 for all affected tables.

In our case, we modified Accessions, Deaccessions, Resources, and ResourcesComponents. Of course, we’ll have to change the extent type back to “cm of textual records,” but at least we’ve identified a potential recurring problem and how to solve it.
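If you don't have TextPad handy, Steps 3 and 4 can be sketched in a few lines of Python. This is only an illustration of the same find-and-replace, not what our developer actually ran. The function name and defaults are mine, and I am assuming the ID column names for the other tables follow the same pattern as accessionId, which you would need to verify.

```python
def build_updates(ids, table="Accessions", id_column="accessionId",
                  new_value="cm of textual record", schema="ATK"):
    """Turn a list of record IDs into one UPDATE statement per ID."""
    escaped = new_value.replace("'", "''")   # double any single quotes for SQL
    statements = []
    for raw_id in ids:
        record_id = int(raw_id)              # rejects anything that is not a plain integer
        statements.append(
            f"UPDATE `{schema}`.`{table}` SET `extentType`='{escaped}' "
            f"WHERE `{id_column}`='{record_id}';"
        )
    return statements


# IDs as they might be pasted from the Step 2 query output, one per line.
PASTED_IDS = "5\n11\n15\n20\n21".splitlines()
```

print("\n".join(build_updates(PASTED_IDS))) reproduces the list shown in Step 4. A single statement along the lines of UPDATE ... SET extentType='cm of textual record' WHERE extentType='cm of textual records' should have the same effect per table, but the per-ID version leaves a record of exactly which rows were touched.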

General Material Designations are not genreform statements

Until now, our practice has been to add General Material Designation [GMD] statements to the Title field. We wrap them in brackets, as per Rule 1.1C, and then use the Archivists’ Toolkit “Wrap in Tag” function to wrap the GMD in a <genreform> tag. We indicate RAD as the rules and source, so you get something like this:

<unittitle>Allan Story fonds <genreform rules="RAD" source="RAD">[textual record]</genreform></unittitle>

This meant the title field had more than the title, but it worked nicely because the EAD exported correctly – the <genreform> tag was nested within the <unittitle> tag, which is permitted in EAD.  It is also the preferred mapping in RAD/EAD Crosswalks.  More importantly, it meant the GMD is displayed alongside the title proper, which is where we wanted it.

We’ve also been adding subject terms to our Resource Records.  The Archivists’ Toolkit allows for several “types” of subject terms and indicates the corresponding MARC fields:

  • Function (657)
  • Genre / Form (655)
  • Geographic Name (651)
  • Occupation (656)
  • Topical Term (650)
  • Uniform Title (630)

Most of our subject terms are topical terms, but we do have a few occupations and genre/form terms. The Toolkit wraps the terms with EAD tags:

<genreform source="lcsh">Autobiography</genreform>
<subject source="lcsh">Communism--Canada</subject>
<subject source="lcsh">Lumber</subject>
<genreform source="lcsh">Memoirs and biographies</genreform>
<subject source="lcsh">Personal archives</subject>

This practice has meant that our EAD has <genreform> terms inside <unittitle> and <controlaccess>.  That’s fine – both locations are permitted by the EAD Schema.

However, we have been experimenting with publishing Toolkit-generated EAD finding aids in ICA Atom and have identified a few problems:

  • ICA Atom displays <genreform> terms in the GMD area, but only when they are nested inside <controlaccess>.  <genreform> terms inside <unittitle> are displayed in the Title Proper area.
  • ICA Atom calls anything inside <controlaccess><genreform> a GMD when this may not be the case.
  • Some of our subject terms identified as Genre /Form (655) are not really genre / form statements.  We need to check our procedures for adding subject terms and how they are classified.
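A quick way to see how a given export is affected is to count where the <genreform> elements sit. The sketch below is illustrative only: it assumes EAD without an XML namespace, and the sample record is invented.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def genreform_parents(root):
    """Count <genreform> elements by the tag of their parent element."""
    counts = Counter()
    for parent in root.iter():
        for child in parent:
            if child.tag == "genreform":
                counts[parent.tag] += 1
    return counts


# Invented sample showing both placements discussed above.
SAMPLE_EAD = (
    "<archdesc><did><unittitle>Example fonds "
    '<genreform source="RAD">[textual record]</genreform></unittitle></did>'
    "<controlaccess>"
    '<genreform source="lcsh">Autobiography</genreform>'
    '<genreform source="lcsh">Memoirs and biographies</genreform>'
    "</controlaccess></archdesc>"
)
```

For the sample, genreform_parents(ET.fromstring(SAMPLE_EAD)) reports one term under unittitle and two under controlaccess, which is the split that ICA Atom displays differently.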

We have a few terms to correct, but most of this is prompted by the fact that RAD’s list of GMD terms is much narrower in scope than the MARC Genre / Form (655) field or the EAD <genreform> element. According to the RAD glossary, a GMD is “a term indicating the broad class of material to which the UNIT BEING DESCRIBED belongs, e.g., graphic material.” But RAD/EAD crosswalks map GMD statements to the <genreform> element, which is much broader and, as such, is handled quite differently by archives management software like the Toolkit and ICA Atom.

According to the EAD tag library, the <genreform> element is comparable to ISAD(G) data element 3.1.5 and MARC field 655, and, when used in conjunction with <extent>, to MARC field 300. MARC indicates that the 655 field is for:

Terms indicating the genre, form, and/or physical characteristics of the materials being described. A genre term designates the style or technique of the intellectual content of textual materials or, for graphic materials, aspects such as vantage point, intended purpose, characteristics of the creator, publication status, or method of representation. A form term designates historically and functionally specific kinds of materials distinguished by their physical character, the subject of their intellectual content, or the order of information within them. Physical characteristic terms designate historically and functionally specific kinds of materials as distinguished by an examination of their physical character, subject of their intellectual content, or the order of information with them.

The EAD tag library mirrors this language.  It indicates that <genreform> is:

A term that identifies the types of material being described, by naming the style or technique of their intellectual content (genre); order of information or object function (form); and physical characteristics. Examples include: account books, architectural drawings, portraits, short stories, sound recordings, and videotapes.

It seems that GMD is really more of a MARC 245$h “Medium” designator:

Medium designator used in the title statement. In records formulated according to ISBD principles, the medium designator appears in lowercase letters and is enclosed within brackets. It follows the title proper (subfields $a, $n, $p) and precedes the remainder of the title ($b), subsequent titles (in items lacking a collective title), and/or statement(s) of responsibility ($c).

There are some other issues with the RAD/EAD/MARC crosswalking (which were mentioned in our Crosswalk posted here), but it seems like this is more of a limitation of EAD. There needs to be some kind of <medium> element for encoding the broad categories of material outlined in GMD statements. GMD statements do not describe the intellectual content; they are really just statements of form. And in terms of software tools, there needs to be a way to distinguish between GMD statements and legitimate <genreform> information. We would like to add subject terms that meet the criteria outlined in MARC 655 or <genreform>, but we also want to include GMD statements in accordance with our descriptive standards.

Anyway, in the absence of such an element, we have taken a few steps to standardize our data entry practices and achieve the desired display in ICA Atom:

  • Added General Material Designation [GMD] terms from the Rules for Archival Description (Rule 1.1C) – We added the 10 GMD statements to the Toolkit as subject terms, indicating RAD as the “source.”  We decided to include the brackets rather than try to add them with XSL.  The Toolkit does not allow for additional Subject Term types to be added, so despite the above misgivings, we entered these terms under the Genre / Form (655) type.

  • We revised our data entry procedures to require the use of one GMD subject term for fonds-level descriptions.  Other genre/form terms can be added (and will be in many cases), but they are optional.
  • We eliminated the practice of including GMD statements in the Title Proper and wrapping them with <genreform> tags.

There are a few other things to figure out:

  • We need to edit the Toolkit stylesheets so <genreform> tags where “RAD” is used as a source attribute will be displayed alongside the Title Proper
  • Right now the default behaviour in ICA Atom is to sort <genreform> terms alphabetically.  Having the brackets in place will hopefully put our GMD terms first, but if not, we will need to customize ICA Atom so <genreform> tags where “RAD” is used as a source attribute are listed as the first GMD.
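Here is a sketch of the ordering we are after, written in Python purely as an illustration (ICA Atom itself is a PHP application, so any real customization would live there). It assumes un-namespaced EAD and the source="RAD" convention described above; the function name is made up.

```python
import xml.etree.ElementTree as ET


def gmd_first(controlaccess):
    """Return <genreform> elements with RAD-sourced terms first, then A to Z."""
    terms = controlaccess.findall("genreform")
    return sorted(
        terms,
        # False sorts before True, so source="RAD" terms lead the list.
        key=lambda el: (el.get("source") != "RAD", (el.text or "").casefold()),
    )


# Invented sample mirroring the kinds of terms discussed above.
SAMPLE_TERMS = ET.fromstring(
    "<controlaccess>"
    '<genreform source="lcsh">Memoirs and biographies</genreform>'
    '<genreform source="RAD">[textual record]</genreform>'
    '<genreform source="lcsh">Autobiography</genreform>'
    "</controlaccess>"
)
```

One caution on relying on the brackets alone: in a plain code-point sort, "[" comes after the uppercase letters, so "[textual record]" would land after "Autobiography". Whether the brackets float the GMD to the top in ICA Atom depends on the collation its database uses, which I have not checked.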

Basic User Manual

We finally have a basic user manual! The official Archivists’ Toolkit User Manual is chock-full of useful information about configuring and maintaining the database. But we needed something concise for our Reference Desk and non-processing staff, something tailored to our specific needs. Thus, the development of a 23-page Basic Archivists’ Toolkit User Manual!

Basic Archivists’ Toolkit Manual (PDF)


RAD/Archivists Toolkit/EAD Crosswalk

We’ve been busy working to get our Archivists’ Toolkit set up to produce RAD finding aids. A key criterion for us was to have AT-exported EAD that more or less conformed to existing RAD/EAD matrices. We wanted to know that EAD exported from the AT is EAD that can be manipulated into a RAD multi-level finding aid. The Canadian Committee on Archival Description released this crosswalk in 2003. Artefactual Systems, the developers of the ICA-AtoM software, have an excellent crosswalk between RAD, EAD, ISAD(G), Dublin Core, MARC, and MODS (also available here).

These crosswalks make it easy enough to see how each RAD rule should be mapped to an EAD element.   What we didn’t know was how the various AT data entry fields exported to EAD.  I came across a couple AT import and export maps, which were really helpful, but they didn’t link the EAD elements to RAD.  We still needed to determine what AT data entry field or note area should be used for a given RAD rule.

And to complicate things even further, AT is built using terminology derived from DACS, the US archival description standard.  Many types of information are the same (e.g., a title is a title) as RAD, but many of the note fields in the AT are by default labelled differently than you might find in a truly RAD-based archival information system.  For example, the Physical Characteristics and Technical requirements note exports to the <phystech> element.  But the RAD/EAD matrices show that RAD Rule 1.8B9a for “Physical condition” should be mapped to the <phystech> element.  The Physical Facet note exports to the <physdesc><physfacet> element – the same element prescribed for RAD Rule 1.5C for “Other physical details.”

Once we got a handle on the terminology, we were easily able to edit the labels for the notes and a few other fields where it seemed like it would be helpful.  The note labels can be edited by using the Setup > Notes, Etc. box and the labels for other fields can be edited by using the Setup > Configure Application box and selecting the table that contains the fields you wish to edit.

The next step was to experiment with the EAD export. I created a dummy resource record where all the fields were filled out. It had dummy fonds-level notes, dummy file-level descriptions, dummy names and subjects, etc. The data was just placeholder text so I could easily see in the HTML or PDF reports how the information was being displayed.

By comparing the EAD exported by the AT with the RAD/EAD crosswalks we began to get a sense of where the exports were mapped properly and where there might be potential issues. Fonds-level descriptions go into our OPAC as MARC records so we did the same with the MARCXML exports. We needed a way to keep track of our findings so I copied the Artefactual Systems crosswalk into Excel and added a few columns for pertinent AT information:

  1. Description of the RAD rule – Usually cobbled from RAD itself
  2. Archivists’ Toolkit Data Entry Form – Indicates what AT data entry field, note area, or method of encoding should be used for a given RAD rule
  3. AT Export EAD – Indicates how the EAD actually looks when it is exported from AT (helpful to see when AT adds attributes, refs, etc. and to see if it deviates from the crosswalk)
  4. AT MARCXML Export – Indicates how the MARCXML actually looks when it is exported from AT (again, helpful to see where the AT deviates from the crosswalk)
  5. MARCXML Export Issues – Indicates any potential issues with using the AT to export MARCXML records in a RAD context
  6. EAD Export Issues – Indicates any potential issues with using the AT to export EAD records in a RAD context

I also added some rows to describe the areas of description that the AT allows for but are not prescribed in RAD (e.g., abstracts, legal status, etc.).

Anyway, I posted back in June about some of our findings as we were getting started with that project and I’m happy to say now that we’ve mostly finished updating Artefactual Systems’ RAD crosswalk with guidelines about how descriptive information should be added to the AT [1]. It is rough and will definitely be updated as we continue moving forward, but it is already helping us understand where we’ve been making mistakes and to develop procedures for data entry that avoid those mistakes.

So, in summary, here’s what we did to configure AT to generate EAD more in line with the Rules for Archival Description (lists of edited field labels are included in the RAD/AT/EAD crosswalk):

  1. Edit note labels to use language found in RAD
  2. Edit field labels in Resource table
  3. Construct a resource record where (a) every note is used at least once, (b) names and subjects are linked, and (c) every field in the finding aid data tab is filled out [2]
  4. Check AT exported EAD against existing RAD/EAD crosswalks and note errors, anomalies, etc.

[1] RAD/AT/EAD Crosswalk with lists of edited field labels and potential import/export errors for RAD users

[2] I can’t upload XML, ZIP, or TXT files but if anyone wants the dummy EAD record, I can send it by email.

Our next step is to customize the AT XSL sheets so the data is grouped according to RAD.  For example, AT groups the custodial history note under an “Administrative Information” section but we want to group it under an “Archival Description” section.  We’re making good progress with that, and will post the new stylesheets when they are finished.

RAD Access Points

According to the RAD-EAD Matrix developed by the Canadian Council of Archives, the proper way to encode the provenance access point is through the <origination> element:

famname|name role="provenance"

The Archivists’ Toolkit (AT) creates an <origination> tag from the name link, specifically the name(s) where the link function is “creator”:

Name Link to a Resource Record in AT

So, all I would need to do to create a RAD access point would be to use the “role” attribute in the name element and indicate “provenance.”  The AT has a long list of roles, but “provenance” is not one of them.  And the program does not allow the list that these terms are pulled from to be edited.  You can pull the “Name link creator / subject role” list from the Lookup Lists, but you can’t add items or edit items:

Name link creator / subject role

Name link creator / subject role list

I’ve posted a query to ATUG-L, and will post an update if we find a solution.

Some observations on AT for RAD finding aids

A few more general observations about the AT based on our migration project:

  • AT does not allow for multiple titles, extent notes, and other kinds of information. It would be really nice if we could, for example, add two extent notes to one file-level description.  We have had to find clever ways to deal with files that have 2 cm of textual records and 3 photographs.  The same goes for dates and titles.  Brigham Young University has created a couple of plug-ins that provide this kind of functionality, but when I installed them, the new data entry fields covered over the existing extent and date information.  Apparently ArchivesSpace will be providing something to this effect.
  • AT is not suitable for RAD item-level description. It doesn’t have a space for statements of responsibility, edition statements, publisher/manufacturer information, and the publisher’s series area.   Some of this information could be merged into another data field, or provided in a general note, but it wouldn’t export proper EAD.
  • A lot of EAD fields don’t export to the proper MARCXML field. Many of the most important fields crosswalk correctly, but the GMD, parallel title, other title information, statement of responsibility, material specific details, and other key information export to an incorrect MARC field.  This may not matter – we’ve apparently been able to import an AT generated MARCXML file with no problem, but we’re only doing fonds-level MARC records so the more complicated data just isn’t there.

We’re working on expanding this RAD crosswalk to include AT data entry fields and the actual EAD and MARCXML exported from the program.  So far, it’s been a big help identifying places where the program is not suitable for RAD finding aids (e.g., statements of responsibility).  But we’re not worried about what we’ve found so far because most of the issues deal with file-level description or obscure data.  I’ll post the crosswalk when it’s finished.

Box-Folder Numbers Ending in Zero

Still working away at our migration from MS Access to Archivists’ Toolkit. We successfully added about 30 legacy EAD finding aids before we realized box-folder numbers ending in zero were not importing properly. For example, a box-folder number of 1.10 would import as 1.1.

I vaguely remember this coming up on the ATUG-L list or the AT forums, but I couldn’t find any solution. I posted a query and one list member responded saying she had been adding periods after the zero. That works, but it’s visually unattractive.

The next day, I just tried adding a space after the zero.  It worked! Simply typing a space after the zero seems to trick the database into leaving the zero.  We’ll still have to go back and manually correct the box-folder numbers that did not import properly, but it’s a nice tip to work into our data entry manual.
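If the box-folder numbers are being generated from a spreadsheet or script before import, the trailing space can be added automatically. The sketch below is a guess at a convenience helper, not something we have run against the Toolkit; the function name is invented. I also don't know why the zero gets dropped. One plausible but unconfirmed explanation is that the value is being read as a decimal number somewhere along the way, which would behave like str(float("1.10")) giving "1.1" in Python.

```python
import re

# A box-folder number such as 1.10 or 12.200: digits, a dot, digits ending in zero.
ENDS_IN_ZERO = re.compile(r"\d+\.\d*0")


def protect_trailing_zero(number):
    """Append a single space to box-folder numbers that end in zero."""
    number = number.strip()
    if ENDS_IN_ZERO.fullmatch(number):
        return number + " "
    return number
```

Values like "1.1" or "Box 3" pass through unchanged, and running the helper twice on the same value still leaves exactly one trailing space.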