physfacet notes become physdesc notes during import

We’re currently in the middle of a project to convert several legacy MS Access databases into EAD so the data can be imported into the Archivists’ Toolkit. First, a historical side note:

Our old procedures for creating finding aids looked something like:

  • Copy an existing MS Access database used to create finding aids (these are simple databases with two tables: one for series-level descriptions and one for file-level descriptions)
  • Change available series titles (these are pulled from a drop-down list)
  • Enter data
  • Create queries to pull all file-level descriptions for each series (these queries could be copied from database to database and edited as needed)
  • Create reports that merge the query results into basic EAD tags (these reports could be copied from database to database and edited as needed)
  • Save each report as a text file
  • Merge each report into a fonds-level EAD template (the fonds-level description must be prepared and pasted into the template before merging the Access reports)
  • Convert the EAD into HTML using an XSL transformation in oXygen or XMetaL
  • Mount the HTML on the Archives website

This worked well enough for a number of years, but the problems are easy to see (duplicate data, an onerous time commitment, inconsistent encoding practices, etc.). We’ve already migrated all our legacy EAD into the Toolkit, but there were a handful of large databases that never had EAD exported. That data resides only in the MS Access databases.

Migrating the legacy EAD made us aware of many inconsistencies in our encoding practices, including:

  • Various “container types” and “extent types”
  • Dates in the title field
  • Incorrect encoding of subject headings

Rather than try to dust off the old procedures (which caused numerous problems when the EAD was imported into the Toolkit), we decided to hire a casual data technician to clean the data the right way. It has been going really well, but the MS Access database for the imX Communications fonds (with 10,000+ records) threw us a few curveballs.

The database has numerous fields for “Extent” and “Other Physical Details” information (mostly used for film and other a/v material). We thought: since we have someone who can properly parse this data, why not encode the different fields in the proper EAD elements? By this logic, information that belongs in the RAD “Other Physical Details” element should be wrapped in <physfacet> tags (see this post about crosswalks for more info).
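As a rough illustration of what that parsing produces, the Python sketch below wraps an extent statement and a list of “Other Physical Details” values in EAD <physdesc>, <extent>, and <physfacet> elements using the standard library's xml.etree.ElementTree. The function name build_physdesc and the sample values are hypothetical; the actual Access field names and our crosswalk may differ.

```python
import xml.etree.ElementTree as ET


def build_physdesc(extent, physical_details):
    """Wrap an extent statement and any "Other Physical Details" values in EAD.

    extent: the extent statement, e.g. "1 film reel" (hypothetical sample)
    physical_details: list of strings; each non-blank one becomes a <physfacet>
    """
    physdesc = ET.Element("physdesc")
    ET.SubElement(physdesc, "extent").text = extent
    for detail in physical_details:
        detail = detail.strip()
        if detail:  # skip empty Access fields
            ET.SubElement(physdesc, "physfacet").text = detail
    return physdesc


# Hypothetical record for a single film reel
element = build_physdesc("1 film reel", ["sd., col."])
print(ET.tostring(element, encoding="unicode"))
# <physdesc><extent>1 film reel</extent><physfacet>sd., col.</physfacet></physdesc>
```

Each parsed detail gets its own <physfacet>, which is exactly the level of specificity the import turns out not to preserve.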

Unfortunately, it’s not so simple. The import maps for the Archivists’ Toolkit do not allow for this level of specificity. Data wrapped in <physfacet> tags are imported, but they are stored as Physical Description notes, not “Other Physical Details” notes. This means data imported as <physfacet> would be exported as <physdesc>.

We are now looking into whether we can import these “Other Physical Details” notes with a distinctive prefix, so that a post-import MySQL query could find them, change their notesEtcTypeID, and chop off the prefix. Stay tuned…
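A minimal sketch of that idea, in Python with an in-memory SQLite database standing in for MySQL. Apart from the notesEtcTypeID column, everything here is an assumption: the table name notes, the noteContent column, the prefix OPD::, and both type ID values are hypothetical placeholders, not the Toolkit's actual schema.

```python
import sqlite3

# All of these are hypothetical placeholders, not the Toolkit's real values.
PREFIX = "OPD::"
PHYSDESC_TYPE_ID = 1        # stands in for the Physical Description note type
OTHER_DETAILS_TYPE_ID = 2   # stands in for the "Other Physical Details" note type


def retype_prefixed_notes(conn, prefix=PREFIX,
                          from_type=PHYSDESC_TYPE_ID,
                          to_type=OTHER_DETAILS_TYPE_ID):
    """Change the note type of prefixed notes and strip the prefix.

    Only notes of from_type whose text starts with prefix are touched.
    Returns the number of notes updated.
    """
    cur = conn.execute(
        """
        UPDATE notes
           SET notesEtcTypeID = ?,
               noteContent = substr(noteContent, ?)
         WHERE notesEtcTypeID = ?
           AND substr(noteContent, 1, ?) = ?
        """,
        # substr() positions are 1-based, so the text after the prefix
        # starts at len(prefix) + 1
        (to_type, len(prefix) + 1, from_type, len(prefix), prefix),
    )
    conn.commit()
    return cur.rowcount


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE notes (noteId INTEGER PRIMARY KEY, "
                 "notesEtcTypeID INTEGER, noteContent TEXT)")
    conn.executemany(
        "INSERT INTO notes (notesEtcTypeID, noteContent) VALUES (?, ?)",
        [(PHYSDESC_TYPE_ID, "OPD::sd., col."), (PHYSDESC_TYPE_ID, "1 film reel")],
    )
    print(retype_prefixed_notes(conn))  # 1
    print(conn.execute("SELECT notesEtcTypeID, noteContent FROM notes "
                       "ORDER BY noteId").fetchall())
    # [(2, 'sd., col.'), (1, '1 film reel')]
```

The statement relies only on substr() and parameter placeholders, and MySQL also provides SUBSTR() with 1-based positions, so it should carry over with the MySQL driver's placeholder style once the real table and column names have been checked.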


Be careful copying and pasting subject headings from LCSH

We’ve been making great progress with backend MySQL queries and manual fixes to our extent and container types (more on that later). We have also developed some procedures and a checklist for reviewing the finding aids currently in the Toolkit. We’ll do a comprehensive assessment of each finding aid (at the fonds level, at least) before it is exported for publishing online.

In preparing those procedures, I noticed some issues with our Subject Headings list, to which several database users are adding terms. We typically check the Library of Congress Subject Headings (LCSH) and copy and paste terms as needed. This has worked reasonably well, but it has led to an unintended problem: some terms get entered into the database with unnecessary spaces:

[Screenshot: copying and pasting terms from LCSH can unintentionally add spaces to the subject heading]

It is easy enough to diagnose: terms with extra leading spaces do not sort in proper alphabetical order:

[Screenshot: subject headings don't sort properly when they are prefixed with spaces]

But it is still something to watch out for when adding terms via LCSH.
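For checking the whole list at once rather than by eye, here is a small Python sketch that flags headings with leading, trailing, or doubled whitespace and shows the cleaned form. It assumes the terms have already been exported to a list of strings; the function names and sample headings are made up. Python's str.split() also treats non-breaking spaces as whitespace, which may matter because text copied from web pages sometimes contains them.

```python
def normalize_heading(term):
    """Trim the ends and collapse internal runs of whitespace to one space.

    str.split() with no arguments splits on any Unicode whitespace,
    including non-breaking spaces (U+00A0).
    """
    return " ".join(term.split())


def find_spacing_problems(terms):
    """Return (original, cleaned) pairs for terms whose spacing would change."""
    return [(t, normalize_heading(t)) for t in terms if t != normalize_heading(t)]


# Made-up sample headings; two have stray whitespace
headings = [
    "Radio broadcasting",
    " Television broadcasting",
    "Advertising--Canada",
    "Motion pictures\u00a0",
]

# A leading space sorts before any letter, so that term jumps to the top
print(repr(sorted(headings)[0]))  # ' Television broadcasting'
for original, cleaned in find_spacing_problems(headings):
    print(repr(original), "->", repr(cleaned))
```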

Box-Folder Numbers Ending in Zero

Still working away at our migration from MS Access to the Archivists’ Toolkit. We had successfully added about 30 legacy EAD finding aids before we realized that box-folder numbers ending in zero were not importing properly. For example, box-folder 1.10 would import as 1.1.
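One plausible explanation, which we have not confirmed against the Toolkit's code, is that the box-folder value is treated as a decimal number somewhere in the import, so 1.10 and 1.1 become the same value. The Python sketch below shows that effect; parse_box_folder is a hypothetical alternative that keeps box and folder as separate integers, not something the Toolkit offers.

```python
def as_number(box_folder):
    """Round-trip a box-folder string through a float, as a numeric field would."""
    return str(float(box_folder))


def parse_box_folder(value):
    """Hypothetical alternative: keep box and folder as separate integers."""
    box, folder = value.strip().split(".")
    return int(box), int(folder)


print(as_number("1.10"))          # 1.1  (folder 10 collapses...)
print(as_number("1.1"))           # 1.1  (...onto folder 1)
print(parse_box_folder("1.10"))   # (1, 10)
print(parse_box_folder("1.1"))    # (1, 1)
```

Treated as a number, box 1 folder 10 is indistinguishable from box 1 folder 1, which is why the zero has to survive as text.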

I vaguely remember this coming up on the ATUG-L list or the AT forums, but I couldn’t find a solution. I posted a query, and one list member replied that she had been adding a period after the zero. That works, but it’s visually unattractive.

The next day, I tried adding a space after the zero instead. It worked! Simply typing a space after the zero seems to trick the database into keeping it. We’ll still have to go back and manually correct the box-folder numbers that did not import properly, but it’s a nice tip to work into our data entry manual.
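To find the records that need manual correction, one option is to compare the original Access values with what landed in the Toolkit. The sketch below assumes both have been exported into Python dicts keyed by a shared record ID; the function name find_truncated and the sample IDs are hypothetical.

```python
def find_truncated(source, imported):
    """List records whose box-folder number lost trailing zeros on import.

    source, imported: dicts mapping a record ID to its box-folder string.
    Returns sorted (record_id, original, imported) tuples.
    """
    problems = []
    for rec_id, original in source.items():
        got = imported.get(rec_id)
        if got is None:
            continue  # record missing from the import; a separate problem
        original, got = original.strip(), got.strip()
        # e.g. "1.10" imported as "1.1": identical once trailing zeros are dropped
        if "." in original and original != got and original.rstrip("0") == got:
            problems.append((rec_id, original, got))
    return sorted(problems)


# Hypothetical sample: "c" was entered with the trailing-space trick
source = {"a": "1.10", "b": "1.5", "c": "2.20 ", "d": "3.100"}
imported = {"a": "1.1", "b": "1.5", "c": "2.20 ", "d": "3.1"}
print(find_truncated(source, imported))
# [('a', '1.10', '1.1'), ('d', '3.100', '3.1')]
```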