Testing Taxonomies

As mentioned in my previous blogpost, “Evaluating Taxonomies,” taxonomy evaluation and taxonomy testing differ. While the evaluation of a taxonomy by a taxonomist is needed when a taxonomy is created by non-taxonomists (such as by subject-matter experts instead), testing of a taxonomy, on the other hand, is recommended in all cases, no matter who created the taxonomy. Following is an overview of the different kinds of testing that can or should be performed on a taxonomy prior to its implementation.


Card-sorting is probably the best known kind of testing, especially now that the prevalence of online card-sorting tools facilitates set-up and enables remote participation. It is not necessarily the best kind of testing for all situations, though. Card-sorting serves to test categorization schemes, so while it is suited for hierarchical taxonomies, it is not so appropriate for faceted taxonomies, especially with regard to how the facets are to interact with each other. It is possible, though, to card-sort test an individual facet, if that facet comprises an internal hierarchy of terms.

There are two kinds of card-sort tests, open and closed. In open card-sorts, the testers group concepts/topics together and then assign a broader category of their own; whereas in closed card sorts, the broad categories are already designated, and the testers merely categorize the specific concepts/topics within those pre-determined categories. Open card-sorting, if chosen, is therefore done earlier in the taxonomy design process, when broad categories are uncertain. A single taxonomy project may have either or both kinds of card-sorting depending on where the greatest need is for this additional input of information. Testers could be test end-users or they could be stakeholders, depending on the needs of the test.

Card-sorting is actually not really a kind of taxonomy testing but rather a form of taxonomy idea testing. Card-sorting is not performed on a completed taxonomy to test it but rather to test ideas of categories/hierarchies which later will be combined to create the taxonomy. Therefore, card-sorting is not an alternative to the other kinds of testing described below, which may subsequently be done.

Use Testing

Use-testing or use-case-testing is a necessary step after a draft taxonomy is built or nearly completed but before it is finally implemented, allowing for revisions to be made based on the test results. It is at this point that the taxonomy is put to the test to see if it will perform as hoped in search/retrieval and (if applicable) for manual tagging. This type of testing might also be called taxonomy validation.

A cross-section of different kinds of test users should be recruited to prepare several typical use cases and perhaps one especially challenging use case of content search scenarios. The user is then presented with the taxonomy (which can be in any format at this stage, whether on paper, as an Excel file or as test web page) and asked to browse the taxonomy to look for terms under which the content for the use search scenario might be found. The user performs the test, either browsing in the tester’s physical presence or via screensharing with verbal narration of what the user is doing and why.  The test administrator takes notes regarding any problems in finding taxonomy terms for the use case. These findability problems should be considered as requirements for additional terms, additional nonpreferred (variant) terms to point to existing terms, or perhaps more polyhierarchy or associative relationships to help guide the user to find the desired concepts.

If the taxonomy is to be used for manual tagging or indexing, then a second, different set of use testing is needed, whereby users who perform this function should test the taxonomy for indexing of typical and challenging documents that they tend to deal with. Rather than coming up with use “cases”, the test-user-indexers merely need to come up with actual documents. The documents should represent a good cross-section of the various document types indexed. This exercise is even more straightforward than the user testing for finding content, so it could even be performed offline without the test administrator present, as long as the test-user-indexer takes good notes.

A-B Testing

In A-B Testing, the test-users are presented with two different possible scenarios and asked which they prefer. When comparing two different taxonomies or parts of taxonomies, only one or two variations should exist between the two that are compared to make the test clear-cut. You may set up a series of A-B test pairs to compare multiple variations. This kind of test is comparable to what an optometrist does for vision: “Which is better, A or B?”  Since only one or two differences should be compared and tested at a time, A-B testing is most suitable to compare proposed top-level categories, rather than getting into the depths of a taxonomy, where it is not practical to conduct a detailed term-by-term comparison. Thus, A-B testing focuses on high-level structural design, navigation and browsing, and not the effectiveness of finding and retrieving content.

A-B Testing can be done at any time in the taxonomy design and build process. It is also very useful when considering a taxonomy redesign for comparing the existing taxonomy (A) to a proposed change (B). A-B Testing is usually done by presenting the test users with graphical or interactive web page mock-ups. I’ve created the B image to an existing online A image, by taking a screenshot of A and then edit it in Microsoft’s Paint accessory. Although each individual A-B test is simple, deciding what to compare and how many comparison tests to make needs to be determined, since each test takes time and resources.


Taxonomies should be tested, but it’s not true that any test is good. Different tests are for different purposes and fit into different stages of the taxonomy process. An inappropriate test or inappropriately timed test can be a waste of time and money.

1 thought on “Testing Taxonomies

  1. Thanks Heather, for a great listing. I'm usually looking at the results of automatic classication and these methods will really help get user feedback!

Comments are closed.