by Joseph Busch
Organizing electronic content using metadata fields with controlled vocabularies has at least a 50-year history. It’s the story of how we got from expensive, rarely used time-shared databases to the almost ubiquitous web, where anyone can “look it up” anywhere, anytime. The work of tagging content has always been done by an army of indexers, more geeks than librarians, working in more of a cottage industry than a factory. All were accidental information scientists with backgrounds in business, medicine, law, the humanities, and sometimes library science, but rarely computer science.
Some people may think that the content in Heather Hedden’s practical compendium is “old wine in a new bottle”, but somebody had to write this stuff down. True, librarians have been doing cataloging, classification and subject indexing for a long time, long before electronic content became a format to manage. But meaningfully adapting appropriate practices from library science and communicating them in a form that can be effectively used by a broad interdisciplinary audience is the major accomplishment of this book.
Taxonomies to support content indexing and finding could be tied to the history of database systems that included processable text information. At first these databases were electronic versions of abstracting and indexing services offered as very expensive time-share online services (e.g., Dialog), later as subscription CD-ROM databases, and most recently as various types of web-mediated services. In the early days, two disciplines dominated the online services—medicine and law. Medical informatics was heavily subsidized by governments (especially in the United States) after World War II, and legal information (e.g., LexisNexis) was valuable enough to be paid for by large corporations who were the clients of large law firms. Medical Subject Headings (MeSH) was introduced by the National Library of Medicine in 1960. Its precursor was the subject headings of Index Medicus, which date from 1940. Medical “subjects” are one of the taxonomy gold standards. They include taxonomies for the human body, taxonomies for conditions and treatments, taxonomies for medical practice settings, etc.
The iterations of digital environments over the past 50 years have had major impacts on what would be considered effective and efficient information organization strategies. In the era of expensive time-share online services, taxonomies needed to enable especially precise retrieval because every minute and every citation to an information source had a significant cost associated with it. End users such as business managers were typically not allowed to execute their own searches. This was an era of intermediated searching. The online searcher (often a librarian) was a highly trained gatekeeper, and often a subject matter expert him- or herself.
With CD-ROMs, the costs of online access were eliminated. But the content organization schemes had to be changed to work on these self-contained platforms. The web changed this again, at first replacing content organization with the power of web search engines (Google, Yahoo!, AltaVista, etc.), global taxonomies such as the DMOZ Open Directory Project, and, very importantly, online shopping. Search engines transformed us into a “look it up” culture. Shopping online has taught everyone how to do Boolean searching, these days referred to as search refining.
The current era of the semantic web is proving to be a further watershed, because its underpinnings are the identification of named entities – people, organizations, locations, events, products, topics, and the like – when they occur in content on the web. The first-generation web enabled the observation and boosting of content relevance based simply on its access and use. The semantic web is enabling the identification of relationships among all types of named entities and the presentation of information based on these relationships. Simply put, the semantic web is based on the organizing power of faceted taxonomy.
Inside the organization, the relatively new expectation is that information should be as findable and linkable as on the public web. Enterprise applications are increasingly becoming web services that happen to be within the organizational firewall. Employees expect there to be
As taxonomy becomes a ubiquitous part of the organizational information ecosystem, there is more and more demand from organizations for people who have the skills to integrate taxonomies into enterprise applications. But what exactly does creating and maintaining taxonomies entail, and where are you going to find the appropriate expertise to competently undertake these tasks? While this is a great time to be a taxonomy consultant, one measure of the success of one of our engagements is whether a taxonomy editor has been identified or hired to be the central point of contact for taxonomy maintenance. Hence, you may find yourself becoming an “accidental taxonomist”.
This book is an excellent primer for the novice who finds him- or herself assigned (or volunteering for) the task of creating and maintaining a taxonomy. The book should also serve as a “bible” for the expert (I have a copy on my shelf). It answers the key questions which I am frequently asked:
This edition is a comprehensive revision, notably updating the screenshots of websites and the section on taxonomy software, and adding information about two important new taxonomy standards: ISO 25964 (Thesauri and Interoperability with Other Vocabularies) and SKOS (Simple Knowledge Organization System), a W3C recommendation. As a consultant I am a proponent of keeping things as simple as possible. The Accidental Taxonomist is a very useful tool for me to share with my clients and prospects. It is full of information about the various considerations related to content organization and is one of the best sources for guidance on best practices for addressing them.
Joseph Busch is an authority in the field of information science and a frequent speaker at conferences on metadata and taxonomy. Prior to founding Taxonomy Strategies, a consulting firm that guides organizations in improving information capture, preservation, search, retrieval, and governance, he was vice president for Infoware at Metacode Technologies and the Getty Trust’s program manager for Standards and Research Databases. He is a past president of the Association for Information Science and Technology (ASIS&T) and a member of the Dublin Core Metadata Initiative Executive Committee.