Hedden Information Management
Taxonomies, Thesauri, and Controlled Vocabularies
Skills and Services Offered
- Taxonomy or controlled vocabulary design, creation, and editing
- Thesaurus construction
- Ontology development
- Metadata design
- Synonym creation (synonymy)
- Taxonomy web user interface design
- Faceted navigation/search design
- Term/concept "mapping" (metadata "crosswalks")
- Machine-aided indexing term rule writing and term weighting
- Tagging guidelines and policy writing
- Taxonomy/thesaurus software testing and evaluation
- Training in taxonomy development
Definitions of Taxonomies
Controlled Vocabularies
A controlled vocabulary is a restricted list of words or terms used for
labeling, indexing or categorizing. It is controlled because only terms
from the list may be used for the subject area covered by the controlled
vocabulary. It is also controlled because, if it is used by more than one
person, there is control over who adds terms to the list, when, and how.
The list could grow, but only under defined policies. Most
controlled vocabularies also have some form of cross-references pointing
from one or more “non-preferred” terms to the designated “preferred”
term. Only if a controlled vocabulary is very small and easily browsed,
such as on a single page, might such synonyms be excluded.
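To make the two kinds of "control" described above concrete, here is a minimal Python sketch, assuming a vocabulary is simply a set of preferred terms plus a map of cross-references from non-preferred terms. The class `ControlledVocabulary`, its methods, and the editor-list policy are hypothetical illustrations, not features of any particular taxonomy tool.

```python
class ControlledVocabulary:
    """Hypothetical sketch: a restricted list of preferred terms, cross-references
    from non-preferred terms, and a simple policy on who may add terms."""

    def __init__(self, preferred_terms, editors):
        self.preferred = set(preferred_terms)  # the only terms that may be assigned
        self.use_refs = {}                     # non-preferred term -> preferred term
        self.editors = set(editors)            # assumed policy: named editors only

    def add_term(self, term, editor):
        # The list can grow, but only under a defined policy (here, an editor list).
        if editor not in self.editors:
            raise PermissionError(f"{editor!r} may not add terms")
        self.preferred.add(term)

    def add_cross_reference(self, non_preferred, preferred):
        # A cross-reference must point at an existing preferred term.
        if preferred not in self.preferred:
            raise ValueError(f"{preferred!r} is not a preferred term")
        self.use_refs[non_preferred] = preferred

    def resolve(self, term):
        """Return the preferred term to assign, or None if the term is not covered."""
        if term in self.preferred:
            return term
        return self.use_refs.get(term)


vocab = ControlledVocabulary(["Automobiles", "Bicycles"], editors=["taxonomist"])
vocab.add_cross_reference("Cars", "Automobiles")
print(vocab.resolve("Cars"))    # Automobiles
print(vocab.resolve("Trucks"))  # None: not (yet) in the vocabulary
vocab.add_term("Trucks", editor="taxonomist")
print(vocab.resolve("Trucks"))  # Trucks
```

An indexer who looks up "Cars" is redirected to "Automobiles", so only list terms ever end up assigned to content.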
Thesauri
A literature retrieval thesaurus is a more structured kind of controlled
vocabulary. It provides information about each term and its relationships
to other terms within the same thesaurus. In addition to clearly specifying
which terms are synonyms of a preferred term (called "used for" terms),
a thesaurus also indicates which terms are more specific (narrower terms),
which are broader, and which are related terms. National and international
standards have been developed to provide guidance on creating such thesauri,
including ISO 2788, ISO 5964, and ANSI/NISO Z39.19. The standards explain
in great detail the relationships between terms, which fall into three types:
hierarchical (Broader Term/Narrower Term), associative (Related Term),
and equivalence (Use/Used For).
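The three relationship types above are reciprocal: recording a Narrower Term on one entry implies a Broader Term on the other, Related Term works in both directions, and Use is paired with Used For. The following Python sketch assumes a simple in-memory store; the class and method names are hypothetical and do not reproduce any thesaurus software or the standards' own data model.

```python
from collections import defaultdict


class Thesaurus:
    """Hypothetical sketch: stores BT/NT, RT, and USE/UF and keeps reciprocals in sync."""

    def __init__(self):
        self.bt = defaultdict(set)  # term -> broader terms (hierarchical)
        self.nt = defaultdict(set)  # term -> narrower terms (hierarchical)
        self.rt = defaultdict(set)  # term -> related terms (associative)
        self.uf = defaultdict(set)  # preferred term -> non-preferred terms (Used For)
        self.use = {}               # non-preferred term -> preferred term (Use)

    def add_hierarchical(self, broader, narrower):
        # NT on the broader term implies BT on the narrower term.
        self.nt[broader].add(narrower)
        self.bt[narrower].add(broader)

    def add_associative(self, a, b):
        # RT is symmetric: each term lists the other.
        self.rt[a].add(b)
        self.rt[b].add(a)

    def add_equivalence(self, preferred, non_preferred):
        # USE points to the preferred term; UF is its reciprocal.
        self.use[non_preferred] = preferred
        self.uf[preferred].add(non_preferred)

    def entry(self, term):
        """Return a term's relationships as sorted lists, ready for display."""
        return {label: sorted(rel[term]) for label, rel in
                (("BT", self.bt), ("NT", self.nt), ("RT", self.rt), ("UF", self.uf))}


th = Thesaurus()
th.add_hierarchical("Vehicles", "Automobiles")
th.add_hierarchical("Automobiles", "Sedans")
th.add_associative("Automobiles", "Roads")
th.add_equivalence("Automobiles", "Cars")
print(th.entry("Automobiles"))
# {'BT': ['Vehicles'], 'NT': ['Sedans'], 'RT': ['Roads'], 'UF': ['Cars']}
print(th.use["Cars"])  # Automobiles
```

Storing both directions at write time means every term entry can be displayed without searching the rest of the thesaurus.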
A literature retrieval thesaurus, like a dictionary-thesaurus (such as
Roget's), lists similar terms at each controlled vocabulary term entry.
The difference is that in a dictionary-thesaurus, the associated terms
might be used in place of the entry term depending on the specific context,
which the user needs to consider in each case, because in certain contexts
some of these terms are not appropriate. A literature retrieval thesaurus,
on the other hand, is designed to apply in all contexts, regardless
of a specific term usage or document. Its synonyms or near-synonyms must
therefore be suitably equivalent in all circumstances.
Taxonomies
The word taxonomy means the science of classifying things, and traditionally
the classification of plants and animals, as in the Linnaean classification
system. It has now become a popular term for any hierarchical classification
or categorization system. Thus, we no longer speak of “taxonomy”
as a science but rather “a taxonomy” (plural: taxonomies)
as a kind of controlled vocabulary that has a hierarchy (broader terms/narrower
terms), but not necessarily the related-term relationships and other requirements
of a standard thesaurus. Unlike a thesaurus, where a given term may or
may not have broader or narrower terms, in a taxonomy all terms belong
to a single, large hierarchy that encompasses all concepts of a certain
class, category, or aspect. The structure is sometimes referred to as
a “tree” and the terms as “nodes” in the tree.
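The "tree" of nodes described above can be sketched in a few lines of Python. This sketch assumes the strict case in which every node except the root has exactly one parent, so all terms belong to a single hierarchy; the `Taxonomy` class and the sample animal terms are hypothetical.

```python
class Taxonomy:
    """Hypothetical sketch: a single hierarchy in which each node has one parent."""

    def __init__(self, root):
        self.root = root
        self.parent = {root: None}  # node -> its broader node
        self.children = {root: []}  # node -> its narrower nodes, in insertion order

    def add(self, term, parent):
        # Every new node must attach to an existing node, keeping one connected tree.
        if parent not in self.parent:
            raise KeyError(f"unknown parent node {parent!r}")
        if term in self.parent:
            raise ValueError(f"{term!r} is already in the tree")
        self.parent[term] = parent
        self.children[term] = []
        self.children[parent].append(term)

    def path(self, term):
        """Return the broader-to-narrower path from the root down to a node."""
        nodes = []
        while term is not None:
            nodes.append(term)
            term = self.parent[term]
        return list(reversed(nodes))

    def render(self, node=None, depth=0):
        """Return the tree as indented lines, one node per line."""
        node = self.root if node is None else node
        lines = ["  " * depth + node]
        for child in self.children[node]:
            lines.extend(self.render(child, depth + 1))
        return lines


tax = Taxonomy("Animals")
tax.add("Mammals", "Animals")
tax.add("Birds", "Animals")
tax.add("Dogs", "Mammals")
print(tax.path("Dogs"))  # ['Animals', 'Mammals', 'Dogs']
print("\n".join(tax.render()))
```

Because `add` refuses unknown parents, no term can exist outside the single hierarchy, which is the property that distinguishes this kind of taxonomy from a thesaurus.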
Recently the term taxonomy has also become popular as the term for any
kind of controlled vocabulary, whether a structured thesaurus, a simple
synonym ring, or anything in-between. This is especially the case in the
corporate world, where one might speak of “enterprise taxonomies.”
Thus, these taxonomies may or may not have the hierarchical structure
that is associated with traditional taxonomies. It’s simpler to
have a one-word term for the concept of controlled vocabularies, especially
when speaking of the people involved, such as “taxonomists”
instead of “controlled vocabulary creators/editors.”
Ontologies
An ontology is a set of concepts, with attributes and with relationships between
the concepts that carry various meanings, which together define a domain
of knowledge and are expressed in a machine-readable format. Certain
applications of ontologies, as used in artificial intelligence or biomedical
informatics, may define a domain of knowledge through terms and relationships
as the end goal, rather than being used for any tagging. In the area of
taxonomies and information science, however, an ontology can be seen as
a more complex type of thesaurus in which, instead of having only "related
term" relationships, there are various customized relationship pairs
that carry specific meanings, such as "owns" and its reciprocal,
"is owned by."
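The customized relationship pairs described above can be sketched as statements of the form (subject, relationship, object), where each relationship type is declared together with its reciprocal. This is a minimal in-memory Python illustration under that assumption; the `Ontology` class and the entity names are hypothetical, and it is not itself a machine-readable ontology format. Standards such as OWL express the same idea natively (for example, with `owl:inverseOf`).

```python
class Ontology:
    """Hypothetical sketch: concepts linked by named relationship types,
    each declared with its reciprocal (e.g. "owns" / "is owned by")."""

    def __init__(self):
        self.inverse = {}     # relationship name -> reciprocal relationship name
        self.triples = set()  # (subject, relationship, object) statements

    def define_relationship(self, name, reciprocal):
        self.inverse[name] = reciprocal
        self.inverse[reciprocal] = name

    def relate(self, subject, relationship, obj):
        if relationship not in self.inverse:
            raise KeyError(f"undefined relationship {relationship!r}")
        # Store the statement and its reciprocal so both directions can be queried.
        self.triples.add((subject, relationship, obj))
        self.triples.add((obj, self.inverse[relationship], subject))

    def query(self, subject, relationship):
        """Return all objects linked to a subject by a given relationship."""
        return sorted(o for s, r, o in self.triples
                      if s == subject and r == relationship)


onto = Ontology()
onto.define_relationship("owns", "is owned by")
onto.define_relationship("employs", "is employed by")
onto.relate("Acme Corp", "owns", "Widget Factory")
onto.relate("Widget Factory", "employs", "Jane Doe")
print(onto.query("Widget Factory", "is owned by"))  # ['Acme Corp']
print(onto.query("Jane Doe", "is employed by"))     # ['Widget Factory']
```

Compared with the generic "related term" of a thesaurus, each link here states what the relationship means and in which direction it runs.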
Past Taxonomy Projects
Viziant Corporation
- Developed base taxonomies (totaling 1,464 node terms) in Geographies,
Actions/Events, Occupations & Roles, Cultures & Languages, and
Facilities & Infrastructure; and vertical market taxonomies (totaling
1,643 node terms) in Business & Finance, Politics & Government,
Military & Defense, Terrorism, and Information Security; along with
multiple synonyms/cross-references for each node term.
- Selected and edited corpus of sample "training" documents
associated with each taxonomy term for automated indexing.
- Tested and made recommendations for improving the taxonomy maintenance
user interface. (See interface screenshot)
Earley and Associates
- Took interview notes, conducted content repository
analysis, and created a "strawman" taxonomy for the web site
of a biomedical research laboratory.
- Created a "strawman" taxonomy of travel-related
terms for the life goals section of a corporate intranet.
- Conducted term extraction and content analysis
of a web site of a major manufacturer of mobile communications products.
- Conducted term extraction and content analysis of web sites of insurance
companies.
- Conducted term extraction and content analysis
of an intranet of a large manufacturing company.
50Lessons (Truman Company)
- Redesigned and integrated legacy web site taxonomies for content searching
of the “50 Lessons” database of executive interview videos.
- Specified metadata for search and retrieval of video lessons and of
speakers
- Wrote tagging guidelines
Bain & Company
- Reviewed the multifaceted hierarchical taxonomy for usability and
conformance with best practices
- Analyzed retrieval statistics for taxonomy terms
- Wrote recommendations for how to improve the taxonomy.
Factiva (Dow Jones)
- Mapped thousands of logged search phrases to the controlled vocabulary
of a Web commercial products and services directory (yellow pages).
Banana Pages
- Reviewed and edited the hierarchical taxonomy of consumer products and
services for a Web-based yellow pages directory.
- Created additional narrower terms to expand the depth of selected subject
areas.
PlaceLinks
- Edited controlled vocabulary of products and services for web-based
yellow pages.
Tiny Engine Inc.
- Developed a news classification taxonomy, creating a total of 1,142
terms in a four-level hierarchy.
Gale: Kids InfoBits
- Developed the three-level hierarchy of over 2,500 topics and names.
(See taxonomy excerpt, linked through the second level in all categories
and the third level in Animals only.)
Content Management Professionals Resource Library
- Developed a two-level taxonomy of content management topics to organize
resource documents on the membership organization's web site.
Open Directory Project (dmoz): Locality category of Carlisle, Massachusetts
- Expanded from a listing of only 13 web sites and one subcategory
to over 90 web sites and eight subcategories
German Saturday School - Boston's library
- Audiocassette collection cataloging
Training in Taxonomy Creation
Heather Hedden teaches an online workshop, "Taxonomies and Controlled
Vocabularies," through the Simmons College Graduate School of Library
and Information Science Continuing Education Program. Course
information.
Heather Hedden also offers a full-day workshop on creating taxonomies
and controlled vocabularies. It is offered onsite through the
Simmons College Graduate School of Library and Information Science Continuing
Education Program and, in other states and regions, through professional
associations such as chapters of the American Society for Indexing. Workshop
description
Further reading on Taxonomies and Thesauri
Taxonomies & Controlled Vocabularies Special Interest Group
Taxonomy Community of Practice Wikispace
Taxonomy Warehouse: directory of taxonomies and controlled vocabularies, along with other resources, from Dow Jones
Thesaurus principles and practice (Willpower Information)
Managing taxonomies strategically (Montague Institute article)
Content Classification (EncycloZine article)
There is also a discussion group dedicated to taxonomies:
Taxonomy Community of Practice Yahoo Group