Taxonomies for Content Components

The primary purpose of taxonomies is to support consistent topical tagging (indexing) of content and full and accurate content retrieval based on the tagged taxonomy concepts that the end-user selects. The unit of content that is tagged makes a difference in the retrieval results and user experience. Users want to find specific content, such as a paragraph, a captioned image, a timestamp section within an audio or video file. This is not always possible. The traditional method of tagging is to tag the entire file, document, or web page, even if the specific topic with the desired information is only part of the larger file, such as a few sentences within a web page or document of multiple paragraphs. The user then spends time (or wastes time) trying to find the desired information in the larger file.

Content components

Fortunately, there are methods to tag and retrieve content at smaller units, such as a text section identified with a heading, within a longer document. These methods depend on having “structured” content, where sections are marked off using a markup language, most commonly Extensible Markup Language (XML). As XML is rather generic, there have emerged standards specifically for XML-based component-based content management, including DITA (Darwin Information Typing Architecture).

Structuring content was not originally developed for the purpose of detailed topical tagging/indexing and retrieval, though, but rather for the purpose of creating (authoring) and publishing content, especially to the web, more efficiently. Originally, the focus of structured content was on marking up the document style and supporting keyword tags for the entire document. The first content management systems (CMSs) were developed shortly after the web in the 1990s to facilitate the publishing of web pages, although later a distinction emerged be web content management systems and enterprise content management systems.

By the early 2000s, component content management systems (CCMSs) emerged, whereby content is managed in units (components) smaller and more specific than an entire document. CCMSs enable content publishing to be more modular and flexible, supporting content reuse, and making it easier to update content, by updating only the relevant components, instead of the entire document. CCMSs are especially used for creating technical documentation, but they are not limited to that use. Examples of CCMSs include Adobe FrameMaker, Documentum, Hereto,, Quark, Paligo, Sanity, and Tridion Docs. While more precise tagging was not the original goal of CCMSs, it is a beneficial outcome.

Taxonomies and component content management

CCMSs, along with all CMSs, have come to support taxonomies and tagging better over the years. This includes both support for more taxonomy features, such as hierarchies and synonym (alternative labels), and support for importing and exporting taxonomies in standard interoperable formats. With respect to CCMSs, taxonomies can be built out to a greater level of detail, with concepts specific to the component topics of CCMS. However, whoever is creating the taxonomy should remember not to create concepts that are so specific that a concept is applicable to only a single component topic. A single taxonomy concept should retrieve multiple results.

CCMSs, along with all CMSs, can also connect to or integrate with taxonomies managed in dedicated taxonomy management systems, such as PoolParty. Since organizations tend to have multiple CMSs, each for different kinds of content and purposes, they are likely to end up creating multiple, separate (siloed) taxonomies with similar or overlapping concepts. Therefore, the best strategy for enterprise taxonomy management is to manage taxonomies centrally, either as a single master taxonomy or with multiple taxonomies linked together in dedicated taxonomy management software, which can connect to CMSs with APIs (application programming interfaces) to push the taxonomy out to the CMSs, including CCMSs. Additionally, prebuilt integrations of taxonomy management systems and CCMSs, such as PoolParty and Tridion Docs, are becoming more common.

There is also a growing interest in taxonomies at conferences dealing with component content management. Last October I attended the LavaCon conference for content strategy for the first time, where my pre-conference workshop on taxonomies was well attended. Two weeks ago, I participated in the ConVEx conference, where there is more focus on component content management than at LavaCon. (ConVEx was formerly the DITA North America conference.) In contrast to LavaCon’s two presentations on taxonomies, ConVEx had a track with the “taxonomy” theme and five presentations focused on taxonomies and another three presentations with topics related to taxonomies.

Component content management enables more targeted topic tagging and opens up more possibilities for rich taxonomies. Thus, as a taxonomist, I look forward to learning more about CCMSs and how they taxonomies can best be applied in these systems.

Leave a Reply