Taxonomy Design Research

I recently wrote an article, “Taxonomies: Connecting Users to Content,” for Boxes and Arrows, an online publication on information architecture (IA) and user experience (UX). As I was working with the editors on the section on gathering information from users, I realized that IA and UX have very formalized researcher roles. There is a job title of “UX researcher,” with career guides and resources on what skills are needed, and many more jobs are posted on job boards for “UX researcher” than for “taxonomist.” Meanwhile, there is no such job as a “taxonomy researcher.” But designing and developing taxonomies, which are often part of information architecture or UX, does require research, including user research.

Taxonomy research is not as formalized as UX research and does not involve standard tools, but it is still important. There is not nearly as much published about taxonomy research as there is about UX research. However, I have found that certain research practices are common in the taxonomy consulting industry. It’s a matter of best practices. Even when taxonomies are designed internally, without an external taxonomy consultant’s assistance, research is still part of the process. The type of research may vary based on the background and experience of the person leading the effort. Taxonomy design research includes:

Interviewing sample users and other stakeholders
Gathering input from brainstorming sessions
Analyzing content to be tagged
Analyzing existing vocabularies of all kinds
Analyzing any search log reports
Taxonomy testing

While UX research is a form of user research, taxonomy research involves both user research and content research (or content analysis), because a taxonomy needs to consider both user needs and content suitability.

Interviewing stakeholders

The primary method of gaining user input on a taxonomy is through interviews and questionnaires, ideally in combination, where a conversation follows up on a list of questions sent in advance to the person being interviewed. It’s important to ask different kinds of questions tailored to the different kinds of users: questions about tagging for those who will tag content, and questions about retrieval for those who will search for it. The input gathered from users in these interviews and questionnaires can be used to better design the taxonomy and its user interface, to obtain use cases for later testing of the taxonomy, to identify possible facets for a faceted taxonomy, and to collect some concepts for the taxonomy.

Brainstorming sessions

Another method of obtaining input from users is through a brainstorming session. This method is particularly useful for internal enterprise taxonomies. Representative users from different departments contribute their ideas by suggesting sample terms, which are written down on a whiteboard, flipchart, or sticky notes. Then, working with a facilitator, the brainstorming group can remove outliers, bring together synonyms and similar terms, and come up with categories or facets to group the terms. PoolParty is the only taxonomy management software that has an integrated brainstorming module, called CardSorting.

Analyzing content

After determining the scope of content to include, content analysis should be performed on a representative sample of each of the different content types and subject areas that will be tagged and retrieved, to identify the topics and named entities relevant to the content. This form of content analysis is similar to indexing without a controlled vocabulary. The taxonomist assumes the role of an indexer or someone tagging the content and notes what index terms or tags would best describe the content.

Automatic term extraction involves using text analytics software (which may be incorporated into taxonomy management software, such as in PoolParty) to extract candidate taxonomy terms based on their frequency and relevancy within a body (corpus) of text content. The suggested terms need to be analyzed for the context of their usage before determining whether they should be added to the taxonomy.
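
As a rough illustration of the frequency side of term extraction, here is a minimal Python sketch. It is not how PoolParty or any other text analytics tool works internally. The function name candidate_terms, the tiny stopword list, and the sample documents are all made up for this example, and the approach (counting how many documents contain each one- or two-word phrase) is only one simple stand-in for the frequency and relevancy measures mentioned above.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption; real tools use much larger lists).
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "for", "is", "on", "with"}

def candidate_terms(documents, min_docs=2):
    """Return (term, document_count) pairs for one- and two-word phrases
    that appear in at least `min_docs` documents, most widespread first."""
    doc_counts = Counter()
    for text in documents:
        # Simplification: only unaccented letters a-z are treated as word characters.
        words = re.findall(r"[a-z]+", text.lower())
        seen = set()
        for i, word in enumerate(words):
            if word in STOPWORDS:
                continue
            seen.add(word)
            # Two-word phrase: the following word must not be a stopword either.
            if i + 1 < len(words) and words[i + 1] not in STOPWORDS:
                seen.add(f"{word} {words[i + 1]}")
        # Count each term once per document, however often it occurs there.
        doc_counts.update(seen)
    ranked = sorted(doc_counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(term, n) for term, n in ranked if n >= min_docs]

# Hypothetical sample content, standing in for a real corpus.
sample_docs = [
    "Renewal of the employee health insurance plan",
    "Health insurance claims for dental care",
    "Parking permits for employee vehicles",
]
candidates = candidate_terms(sample_docs)
for term, n in candidates:
    print(f"{term}: found in {n} documents")
```

With this sample, “health,” “insurance,” and “health insurance” all surface as candidates, which shows why the suggested terms still need a human review of context: only the taxonomist can decide that the two-word phrase is the concept worth adding.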

Analyzing existing vocabularies

If an organization already has some controlled vocabularies (taxonomies, thesauri, term lists, terminologies, glossaries, etc.), whether currently in use or not, these should be analyzed as sources of terms for incorporation into the new taxonomy. Assuming the project is to create a new taxonomy, any existing controlled vocabularies may have been created for a different purpose, so only some of their terms would be relevant. Glossaries tend to have too many detailed terms that are not needed for information retrieval, but these and any other vocabularies are good sources of synonyms/alternative labels.

Analyzing any search log reports

When creating or editing a taxonomy, it’s always useful to look at search logs, which indicate what users have been typing into the search box. Search log reports can be sorted by search string frequency, so that the most frequently used search strings are considered for inclusion in the taxonomy. The search strings should be edited to conform to taxonomy style and policy, but the exact search strings should be included as synonyms/alternative labels to support future searches.
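
If the search log is available only as a raw list of queries rather than as a sorted report, the frequency ranking described above can be approximated with a few lines of Python. This is a minimal sketch under stated assumptions: the function name rank_search_strings and the sample queries are hypothetical, and the only normalization applied is lowercasing and collapsing extra spaces.

```python
from collections import Counter, defaultdict

def rank_search_strings(queries):
    """Count search strings after light normalization (case and spacing),
    keeping the exact variants users typed for each normalized form."""
    counts = Counter()
    variants = defaultdict(set)
    for raw in queries:
        normalized = " ".join(raw.lower().split())
        if not normalized:
            continue  # skip empty searches
        counts[normalized] += 1
        variants[normalized].add(raw.strip())
    # Most frequent first; ties are broken alphabetically.
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(query, n, sorted(variants[query])) for query, n in ordered]

# Hypothetical search log entries.
search_log = [
    "PTO policy", "pto policy", "expense report",
    "pto policy", "401k", "Expense Report",
]
ranked = rank_search_strings(search_log)
for query, n, typed in ranked:
    print(f"{n} searches: {query} (typed as: {', '.join(typed)})")
```

The ranking only suggests candidates. The taxonomist still chooses the preferred label according to style and policy (for example, spelling out “Paid time off”) and then records the strings users actually typed as alternative labels.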

Taxonomy testing

Near the completion of a taxonomy project, some taxonomy testing should take place. Taxonomy use testing should test a taxonomy’s suitability for tagging content, by manually test-tagging sample documents and determining whether the desired terms are available in the taxonomy. It should also test the retrieval capabilities of the taxonomy, by having sample users attempt to retrieve pre-identified documents with the search terms of their choice.
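
The results of a test-tagging exercise can be tallied by hand, but a small script makes the gaps easier to see. The following Python sketch covers only the tagging half of use testing, not retrieval. It assumes the testers wrote down the tags they wanted for each document, and that the taxonomy can be exported as preferred labels with their alternative labels. The function name tagging_gaps and all of the sample data are hypothetical.

```python
def tagging_gaps(taxonomy, desired_tags_by_doc):
    """`taxonomy` maps each preferred label to a list of alternative labels.
    Returns the share of desired tags found in the taxonomy, plus the
    missing tags with the documents that needed them."""
    known = set()
    for pref, alts in taxonomy.items():
        for label in [pref, *alts]:
            known.add(label.lower())
    found = 0
    total = 0
    missing = {}
    for doc_id, tags in desired_tags_by_doc.items():
        for tag in tags:
            total += 1
            if tag.lower() in known:
                found += 1
            else:
                missing.setdefault(tag, []).append(doc_id)
    coverage = found / total if total else 0.0
    return coverage, missing

# Hypothetical taxonomy excerpt and test-tagging results.
taxonomy = {
    "Paid time off": ["PTO", "vacation"],
    "Health insurance": ["medical coverage"],
}
desired = {
    "doc-1": ["PTO", "Health insurance"],
    "doc-2": ["vacation", "Parental leave"],
}
coverage, missing = tagging_gaps(taxonomy, desired)
print(f"{coverage:.0%} of desired tags are available")
for tag, docs in missing.items():
    print(f"Missing: {tag} (wanted for {', '.join(docs)})")
```

A missing tag is not automatically a term to add. It may be out of scope, or it may be better handled as an alternative label for an existing term, which is again a judgment for the taxonomist.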

Other tests of taxonomies, such as card sorting and A/B testing, which are also used in UX navigation testing, may be used in taxonomy development to test preferences for the top two levels of a hierarchical taxonomy, but such tests are less suitable for multiple-level hierarchical taxonomies or for faceted taxonomies. More details are in my previous blog post on Testing Taxonomies.

Conclusions

Creating a taxonomy involves many research-related tasks, which can take as much time as, or more time than, actually creating the terms in the taxonomy. While there is a creative aspect to developing a taxonomy, a taxonomy also has to be based on research and analysis, with the emphasis on analysis. The research is more qualitative than quantitative, though.

Hedden Information Management
