Indexes and Faceted Taxonomies

I recently completed a project of creating an index for a book. I had done quite a bit of freelance back-of-the-book indexing 2005 – 2013 but had not indexed a book in over four years. Since I also do taxonomy work, whenever I do indexing, I draw comparison between index creation and taxonomy creation. This time I drew some new comparisons.

It is back-of-the-book indexing, rather than the kind of indexing of content items that is done with a taxonomy, that has some similarities with taxonomy creation. That is because they both involve creating taxonomy terms, naming them, coming up with variant names, and relating them to each other. I have written a detailed article “Creating Indexes and Thesauri: Similarities and Differences” published in the journal The Indexer.

During my most recent index project, I thought of comparisons not with thesauri, but with faceted taxonomies. Faceted taxonomies are increasingly common form of taxonomies or controlled vocabularies. Different aspects/dimension/refinements/filter types of a content item and of a query to find it are considered in creating a set of facets from which terms are used in combination. Facets can be for each of such things as named persons, places, person types, events, activities, things, etc. The set of facets, ideally around 4-7, is customized to the set of content. Each facet may contain just a few or hundreds of terms.

An index, of course, is quite unlike a faceted taxonomy, because a single index includes all kinds of terms: named persons, places, person types, events, activities, things, etc. Some books, however, have separate Name and Subject indexes, so that could be like having two facets. Whether it’s a single index or a set of two, however, the user is only looking up one term at a time, unlike a faceted taxonomy, which allows the user to select multiple terms from multiple facets and combine them to limit the search results.

What is significant is that a good index should include all the aspects/dimensions/types of terms. Thus, the intellectual activity of creating a good back-of-the-book index is similar to creating a good faceted taxonomy, because a full set of aspects needs to be considered and created.

The book I recently indexed was a biography of a jazz saxophonist. As I indexed, focusing on the content at the level of a paragraph or a couple of consecutive paragraphs, I found myself making sure I created index terms that covered the different aspects or term types. In this case they tended to be: named persons, named places, person types (different kinds of musicians, music producers, etc.), place types, activities, music groups, music genres, record label companies, names of songs or albums, and music-related topics.

Of course, it is rare that a single paragraph would have more than a couple of distinct index term concepts (not counting synonyms, what in indexes is called “double posts”); a full set of facets is not expected. Rather, though, as I was indexing, after I selected an initial, obvious index term for the paragraph(s), I would then pause to think if there was a different aspect that could also apply as an index term from among potential facet-like categories, as listed above. I felt that being “facet aware” I was able to create a very comprehensive index.

The resulting index is simply an alphabetical arrangement of terms, with the larger concepts further broken down with subentries. It does not appear faceted. However, all the potential facets are included. The variants or synonyms, as “double posts” in the index, help guide different users who think of different words for the same thing to find the text passage of the desired topic. Additionally, the terms of the different aspects, like facets, help guide different users in another way, by serving those who are thinking about different aspects of the book’s content and narrative.