
Hedden Information Management

Taxonomies, Thesauri,
and Controlled Vocabularies


Skills/Services
Definitions
Portfolio
Training Services
Resources


Skills and Services Offered

  • Taxonomy or controlled vocabulary design, creation and editing
  • Thesaurus construction
  • Ontology development
  • Metadata design
  • Synonym creation (synonymy)
  • Taxonomy web user interface design
  • Faceted navigation/search design
  • Term concept "mapping" (metadata "crosswalks")
  • Machine-aided indexing term rule writing and term weighting
  • Tagging guidelines and policy writing
  • Taxonomy/thesaurus software testing and evaluation
  • Training in taxonomy development


Definitions of Taxonomies

Controlled Vocabularies
A controlled vocabulary is a restricted list of words or terms used for labeling, indexing, or categorizing. It is controlled because only terms from the list may be used for the subject area it covers. It is also controlled because, if it is used by more than one person, there is control over who adds terms to the list, when, and how. The list can grow, but only under defined policies. Most controlled vocabularies also have some form of cross-reference pointing from one or more “non-preferred” terms to the designated “preferred” term. Only if a controlled vocabulary is very small and easily browsed, such as on a single page, might such synonyms be omitted.
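The two kinds of control described above (a fixed list of usable terms, and a policy on who may change it) plus the non-preferred-to-preferred cross-references can be sketched as a small data structure. This is a minimal illustration under stated assumptions, not a description of any particular vocabulary tool: the class `ControlledVocabulary`, its methods, and the editor name `editor_a` are all hypothetical, and term matching is exact (no case folding).

```python
# A minimal, hypothetical sketch of a controlled vocabulary:
# preferred terms, cross-references from non-preferred terms,
# and a simple editing policy (only listed editors may add entries).


class ControlledVocabulary:
    def __init__(self, editors):
        # Only these people may change the list (the "who" of the policy).
        self.editors = set(editors)
        # Terms that may be assigned when labeling, indexing, or categorizing.
        self.preferred = set()
        # Cross-references: non-preferred term -> designated preferred term.
        self.cross_refs = {}

    def _check_editor(self, editor):
        if editor not in self.editors:
            raise PermissionError(f"{editor!r} is not authorized to edit this vocabulary")

    def add_term(self, editor, term):
        self._check_editor(editor)
        self.preferred.add(term)

    def add_cross_reference(self, editor, non_preferred, preferred):
        self._check_editor(editor)
        if preferred not in self.preferred:
            raise ValueError(f"{preferred!r} is not a preferred term")
        self.cross_refs[non_preferred] = preferred

    def resolve(self, term):
        """Return the preferred term to use for `term`, or None if it is not in the vocabulary."""
        if term in self.preferred:
            return term
        return self.cross_refs.get(term)


if __name__ == "__main__":
    cv = ControlledVocabulary(editors={"editor_a"})
    cv.add_term("editor_a", "Automobiles")
    cv.add_cross_reference("editor_a", "Cars", "Automobiles")
    print(cv.resolve("Cars"))    # Automobiles
    print(cv.resolve("Trucks"))  # None: not in the list, so it cannot be assigned
```

Keeping cross-references in a separate mapping means a non-preferred term can never be assigned directly; it only ever leads the indexer or searcher to its preferred term.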

Thesauri
A literature retrieval thesaurus is a more structured kind of controlled vocabulary. It provides information about each term and its relationships to other terms within the same thesaurus. In addition to clearly specifying which terms are synonyms of a preferred term (called “used for” terms), a thesaurus also indicates which terms are more specific (narrower terms), which are broader (broader terms), and which are related (related terms). National and international standards have been developed to guide the creation of such thesauri, including ISO 2788, ISO 5964, and ANSI/NISO Z39.19. The standards explain in great detail the relationships, which fall into three types: hierarchical (Broader Term/Narrower Term), associative (Related Term), and equivalence (Use/Used For).
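Each of the three relationship types works in pairs: a Broader Term on one entry implies a Narrower Term on the other, a Related Term appears on both entries, and a Use reference implies a Used For reference back. A minimal sketch of that bookkeeping, assuming the common abbreviations BT, NT, RT, USE, and UF; the `Thesaurus` class and its methods are hypothetical and not taken from any thesaurus software.

```python
from collections import defaultdict

# Relationship codes and their reciprocals, covering the three types:
# hierarchical (BT/NT), associative (RT, which is its own reciprocal),
# and equivalence (USE/UF).
RECIPROCALS = {"BT": "NT", "NT": "BT", "RT": "RT", "USE": "UF", "UF": "USE"}


class Thesaurus:
    def __init__(self):
        # term -> relationship code -> set of related terms
        self.entries = defaultdict(lambda: defaultdict(set))

    def relate(self, term, rel, other):
        """Record `term rel other` together with its reciprocal on `other`'s entry."""
        if rel not in RECIPROCALS:
            raise ValueError(f"unknown relationship code {rel!r}")
        self.entries[term][rel].add(other)
        self.entries[other][RECIPROCALS[rel]].add(term)

    def entry(self, term):
        """Return a term's entry as sorted (relationship code, term) pairs."""
        rels = self.entries.get(term, {})  # .get does not create an empty entry
        return sorted((rel, other) for rel, others in rels.items() for other in others)


if __name__ == "__main__":
    t = Thesaurus()
    t.relate("Automobiles", "BT", "Motor vehicles")
    t.relate("Automobiles", "RT", "Driving")
    t.relate("Cars", "USE", "Automobiles")
    print(t.entry("Automobiles"))
    # [('BT', 'Motor vehicles'), ('RT', 'Driving'), ('UF', 'Cars')]
    print(t.entry("Motor vehicles"))
    # [('NT', 'Automobiles')]
```

Adding the reciprocal automatically is a design choice that keeps the two entries consistent; an editor never has to remember to add the NT after adding the BT.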

A literature retrieval thesaurus, like a dictionary-thesaurus (such as Roget's), lists similar terms at each term entry. The difference is that in a dictionary-thesaurus any of the associated terms might be used in place of the entry term, depending on the specific context, which the user needs to consider in each case; in certain contexts some of these terms are not appropriate. A literature retrieval thesaurus, on the other hand, is designed to be used in all contexts, regardless of a specific term usage or document. Its synonyms or near-synonyms must therefore be suitably equivalent in all circumstances.

Taxonomies
The word taxonomy means the science of classifying things, and traditionally the classification of plants and animals, as in the Linnaean classification system. It has now become a popular term for any hierarchical classification or categorization system. Thus, we no longer speak of “taxonomy” as a science but rather of “a taxonomy” (plural: taxonomies) as a kind of controlled vocabulary that has a hierarchy (broader terms/narrower terms), but not necessarily the related-term relationships and other requirements of a standard thesaurus. Unlike a thesaurus, where a given term may or may not have broader or narrower terms, in a taxonomy all terms belong to a single, large hierarchy that encompasses all concepts of a certain class, category, or aspect. The structure is sometimes referred to as a “tree” and the terms as “nodes” in the tree.
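The “tree” and “nodes” vocabulary maps directly onto a tree data structure: one top node, with each node's children being its narrower terms. A minimal sketch, assuming each term sits under exactly one broader term; the `Node` class and the `path_to` and `outline` helpers are hypothetical names used only for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """One term (node) in a taxonomy tree; its children are its narrower terms."""
    label: str
    children: list[Node] = field(default_factory=list)

    def add(self, label: str) -> Node:
        """Create a narrower term under this node and return it."""
        child = Node(label)
        self.children.append(child)
        return child


def path_to(node: Node, label: str, trail: tuple[str, ...] = ()) -> list[str] | None:
    """Depth-first search for `label`; return its path from the top term, or None."""
    trail = trail + (node.label,)
    if node.label == label:
        return list(trail)
    for child in node.children:
        found = path_to(child, label, trail)
        if found is not None:
            return found
    return None


def outline(node: Node, depth: int = 0):
    """Yield one indented line per node, each broader term before its narrower terms."""
    yield "  " * depth + node.label
    for child in node.children:
        yield from outline(child, depth + 1)


if __name__ == "__main__":
    root = Node("Animals")
    mammals = root.add("Mammals")
    mammals.add("Dogs")
    root.add("Birds")
    print("\n".join(outline(root)))
    print(path_to(root, "Dogs"))  # ['Animals', 'Mammals', 'Dogs']
```

The path returned by `path_to` is the chain of broader terms a user would click through in a browsable taxonomy.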

Recently the term taxonomy has also become popular as the term for any kind of controlled vocabulary, whether a structured thesaurus, a simple synonym ring, or anything in-between. This is especially the case in the corporate world, where one might speak of “enterprise taxonomies.” Thus, these taxonomies may or may not have the hierarchical structure that is associated with traditional taxonomies. It’s simpler to have a one-word term for the concept of controlled vocabularies, especially when speaking of the people involved, such as “taxonomists” instead of “controlled vocabulary creators/editors.”

Ontologies
An ontology is a set of concepts, with attributes and with relationships between the concepts that carry specific meanings, which together define a domain of knowledge and are expressed in a machine-readable format. Certain applications of ontologies, as in artificial intelligence or biomedical informatics, may define a domain of knowledge through terms and relationships as the end goal, rather than being used for any tagging. In the area of taxonomies and information science, however, an ontology can be seen as a more complex type of thesaurus in which, instead of having simply "related term" relationships, there are various customized relationship pairs with specific meanings, such as "owns" and its reciprocal "is owned by."
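The difference from a thesaurus's single "related term" relationship can be seen by storing facts as (subject, relationship, object) triples, where each customized relationship has its own named reciprocal. A minimal sketch: the "owns"/"is owned by" pair comes from the text, while "employs"/"is employed by", the concept names, and the `Ontology` class are invented for illustration. Attributes are omitted, and the sketch keeps facts in memory rather than in any standard machine-readable ontology format.

```python
# Hypothetical relationship pairs. "owns"/"is owned by" comes from the text;
# "employs"/"is employed by" is an invented second example.
PAIRS = [("owns", "is owned by"), ("employs", "is employed by")]

# Build a lookup that works in both directions.
INVERSES = {}
for forward, backward in PAIRS:
    INVERSES[forward] = backward
    INVERSES[backward] = forward


class Ontology:
    def __init__(self):
        # Facts stored as (subject, relationship, object) triples.
        self.triples = set()

    def add_fact(self, subject, relationship, obj):
        """Store a fact and its reciprocal, so it can be found from either concept."""
        if relationship not in INVERSES:
            raise ValueError(f"undefined relationship {relationship!r}")
        self.triples.add((subject, relationship, obj))
        self.triples.add((obj, INVERSES[relationship], subject))

    def about(self, concept):
        """Return every fact whose subject is `concept`, sorted for stable output."""
        return sorted(t for t in self.triples if t[0] == concept)


if __name__ == "__main__":
    onto = Ontology()
    onto.add_fact("Acme Corp", "owns", "Acme Widgets")
    onto.add_fact("Acme Widgets", "employs", "J. Smith")
    for fact in onto.about("Acme Widgets"):
        print(fact)
```

Because only declared relationship pairs are accepted, the ontology's vocabulary of relationships is itself controlled, just as a thesaurus limits itself to BT/NT, RT, and USE/UF.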

Past Taxonomy Projects

Viziant Corporation

  • Developed base taxonomies (totaling 1,464 node terms) in Geographies, Actions/Events, Occupations & Roles, Cultures & Languages, and Facilities & Infrastructure; and vertical market taxonomies (totaling 1,643 node terms) in Business & Finance, Politics & Government, Military & Defense, Terrorism, and Information Security; along with multiple synonyms/cross-references for each node term.
  • Selected and edited corpus of sample "training" documents associated with each taxonomy term for automated indexing.
  • Tested and made recommendations for improving the taxonomy maintenance user interface. (See interface screenshot)

Earley and Associates

  • Took interview notes and conducted content repository analysis, and created a "strawman" taxonomy for the web site of a biomedical research laboratory.
  • Created a "strawman" taxonomy of travel-related terms for the life goals section of a corporate intranet.
  • Conducted term extraction and content analysis of a web site of a major manufacturer of mobile communications products.
  • Conducted term extraction and content analysis of web sites of insurance companies.
  • Conducted term extraction and content analysis of an intranet of a large manufacturing company.

50Lessons (Truman Company)

  • Redesigned and integrated legacy web site taxonomies for content searching of the “50 Lessons” database of executive interview videos.
  • Specified metadata for search and retrieval of video lessons and of speakers.
  • Wrote tagging guidelines.

Bain & Company

  • Reviewed the multi-faceted hierarchical taxonomy for usability and conformance with best practices.
  • Analyzed retrieval statistics for taxonomy terms.
  • Wrote recommendations for how to improve the taxonomy.

Factiva (Dow Jones)

  • Mapped thousands of logged search phrases to the controlled vocabulary of a Web commercial products and services directory (yellow pages).

Banana Pages

  • Reviewed and edited hierarchical taxonomy for consumer products and services for Web-based yellow pages directory.
  • Created more narrower terms to expand the depth of select subject areas.

PlaceLinks

  • Edited controlled vocabulary of products and services for web-based yellow pages.

Tiny Engine Inc.

  • Developed a news classification taxonomy, creating a total of 1,142 terms in four levels.

Gale: Kids InfoBits

  • Developed the three-level hierarchy of over 2,500 topics and names. (See taxonomy excerpt linked through second level in all categories and third level in Animals only.)

Content Management Professionals Resource Library

  • Developed a two-level taxonomy of content management topics to organize resource documents on the membership organization's web site.

Open Directory Project (dmoz): Locality category of Carlisle, Massachusetts

  • Expanded from a listing of only 13 web sites and one subcategory to over 90 web sites and eight subcategories

German Saturday School - Boston's library

  • Audiocassette collection cataloging


Training in Taxonomy Creation

Heather Hedden teaches an online workshop, "Taxonomies and Controlled Vocabularies," through the Simmons College Graduate School of Library and Information Science Continuing Education Program.

Heather Hedden also offers a full-day workshop on creating taxonomies and controlled vocabularies. It is offered as an onsite workshop through the Simmons College Graduate School of Library and Information Science Continuing Education Program and, in other states and regions, through professional associations such as chapters of the American Society for Indexing.

Further reading on Taxonomies and Thesauri

Taxonomies & Controlled Vocabularies Special Interest Group

Taxonomy Community of Practice Wikispace

Taxonomy Warehouse
Directory of taxonomies and controlled vocabularies, along with other resources from Dow Jones.

Thesaurus principles and practice
Willpower Information

Managing taxonomies strategically
Montague Institute article

Content Classification
EncycloZine article

There is also a discussion group dedicated to taxonomies:
Taxonomy Community of Practice Yahoo Group