Most of us first became familiar with the term taxonomy in high school biology when the concept was used in reference to the classification and naming of plants and animals. If you did not pursue a career in biology, you probably did not give the concept any further thought for quite some time after that. Although the term is also used to refer to nomenclature and classification of concepts in other academic disciplines, only since the late 1990s has it been understood to mean information organization in general. Taxonomy in this sense includes controlled vocabularies for document indexing and retrieval, subject categories in content management systems, navigation labels and categories in website information architecture, and standardized terminology within a corporate knowledge base. In some of these areas, such as websites, the application of taxonomy is relatively new, coinciding with the newer adoption of the term taxonomy. Other areas, such as controlled vocabularies and thesauri used in periodical indexing and literature retrieval, have been around for decades. Their publishers may continue to refer to a “controlled vocabulary,” an “authority file,” or a “thesaurus,” even though the term taxonomy, in its newer usage, now covers these as well.
Today there are many meanings of the word taxonomy, which can complicate any research into the term. Although the original meaning, the study of classification, is rarely used, the term taxonomy continues to be used to designate classification systems of things. Originally used for the classification of things in nature, the term spread from the sciences to the social sciences and thus came to be used also for the classification of concepts. (One of the better known such taxonomies is the Taxonomy of Educational Objectives, also known as Bloom’s Taxonomy.) Despite the recent popularity of the term taxonomy for generic knowledge organization, the majority of books and scholarly articles on taxonomies in print today are still about highly specific classification systems in the sciences or social sciences. Their taxonomists are experts in their academic disciplines rather than librarians or information architects.
Even as a generic system of knowledge organization, the term taxonomy presently has two different common usages. One meaning of taxonomy, reflecting the earlier usage for the classification of living organisms, is a hierarchical classification of things or concepts in what may be considered a tree structure. Terms within the taxonomy each have a “parent,” or broader term, and a “child,” or narrower term, unless the terms are at the very top or bottom levels of the taxonomy. Another, even more recent, usage of the term taxonomy is to refer to any controlled vocabulary of terms for a subject area domain or a specific purpose. The terms may or may not be arranged in a hierarchy, and they may or may not have even more complex relationships with one another. Thus the term taxonomy has taken on a broader meaning that encompasses all of the following: specific-subject glossaries, controlled vocabularies, information thesauri, and ontologies. Each of these will be explained in further detail in Chapter 1. For the purposes of this book, this second, broader definition of taxonomy is used. It is the simplest term, and it corresponds to the word taxonomist.
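To make the two usages concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions: the class name Term, its fields broader and narrower, the variable flat_vocabulary, and the sample terms are all hypothetical and are not drawn from any standard or taxonomy management tool.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


# eq=False keeps comparison by identity, which avoids recursive
# comparisons between parents and children.
@dataclass(eq=False)
class Term:
    """One term in a hierarchical taxonomy (hypothetical structure)."""

    label: str
    broader: Optional[Term] = None                     # the "parent" term
    narrower: list[Term] = field(default_factory=list)  # the "child" terms

    def add_narrower(self, label: str) -> Term:
        """Create a child term and link it in both directions."""
        child = Term(label, broader=self)
        self.narrower.append(child)
        return child

    def path(self) -> list[str]:
        """Return the labels from the top term down to this one."""
        labels = []
        node: Optional[Term] = self
        while node is not None:
            labels.append(node.label)
            node = node.broader
        return labels[::-1]


# First usage: a tree. The top term has no broader term,
# and the bottom term has no narrower terms.
animals = Term("Animals")
mammals = animals.add_narrower("Mammals")
dogs = mammals.add_narrower("Dogs")
print(dogs.path())  # ['Animals', 'Mammals', 'Dogs']

# Second usage: a controlled vocabulary with no hierarchy at all,
# here just preferred terms mapped to hypothetical variant terms.
flat_vocabulary = {
    "Dogs": ["Canines"],
    "Cats": ["Felines"],
}
```

Under the broader definition used in this book, both the tree and the flat list would count as taxonomies; the hierarchy is optional.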
As the word taxonomy has different meanings, so does the designation of a taxonomist. It can still refer to a biologist who specializes in the field of naming and classifying organisms. The majority of people with the title of taxonomist today, however, are information specialists, librarians, or information architects and are not likely to be subject matter experts. They deal with taxonomies in the broader definition of knowledge organization systems (not limited to hierarchical trees of terms). They may be creators of controlled vocabularies, thesauri, metadata schemes, or website categorization systems. “Taxonomist” is a more practical and catchy job title than “controlled vocabulary editor,” “thesaurus creator,” or “nomenclature manager.”
Yet for the scope of this book, taxonomists are not limited to people who have the word taxonomy or taxonomist within their job title. There are other job titles for essentially the same tasks, such as vocabulary developer, technical categorization analyst, and information classification specialist. There are many people who work on taxonomies as only one of several job responsibilities, whether as corporate librarians, information architects, or knowledge managers. Finally, there are those who serve in the role of taxonomist temporarily on a project, returning to other duties after completing the taxonomy.
In sum, a taxonomist is someone who creates taxonomies, either singly or as part of a team of taxonomists, and taxonomies are defined as any knowledge organization system (controlled vocabulary, synonym ring, thesaurus, hierarchical term tree, or ontology) used to support information/content findability, discovery, and access. This taxonomy work may be an ongoing job responsibility or a temporary project, and it may be a primary job responsibility or a secondary responsibility. These people, and those who are interested in getting into such work, are the primary audience of this book.
There is no undergraduate major or graduate degree in taxonomy and no department, program, concentration, or certificate in the field. Thus, people do not choose to be taxonomists when they decide what they want to study. Furthermore, the majority of graduate schools and programs of information science, or library and information science, do not have even a single course devoted to creating taxonomies (although it is often a topic within a course).(2) Therefore, even people with an education in information science are probably not thinking of working as a taxonomist. For this reason, too, we can say that many taxonomists become so by chance or by “accident.”
Unlike working as a reference librarian or corporate librarian, working as a taxonomist does not usually require a degree in library and information science (although it is often preferred). For this reason, too, people with varied educational backgrounds may accidentally find themselves working as taxonomists. In fact, according to the results of an online survey of taxonomists in May 2015, just about half had a master of library science (MLS) or master of library and information science (MLIS) degree. (The full survey questions and answers are reproduced in Appendix A of this book.)
Information taxonomies are relatively new and growing in terms of their applications. New interactive web technologies make taxonomies more usable and user friendly, and the exponential growth of electronic data increasingly calls for new means of organizing and accessing information. Since information taxonomies have been getting attention only since the late 1990s or around 2000, any experienced professional who is getting into taxonomies is doing so somewhat accidentally. As for entry-level taxonomy positions for the new MLIS or MIS graduate, I have yet to see such a position posted.
As for my story, although I came to developing corporate taxonomies via work on controlled vocabularies for periodical database indexing, I did come to the field of controlled vocabularies quite accidentally. I had started my career in writing and editing and responded to a job notice for an abstractor at the computer magazine publisher Ziff Communications, not realizing that Ziff, at the time, owned a large periodical indexing division called Information Access Company. It turned out that the abstractors did the indexing and other metadata application as well, so after intensive employee training on indexing, I got my first exposure to controlled vocabularies.
After indexing for a couple of years, I decided to move onward and upward into the controlled vocabulary management group and soon forgot about abstracting. But I never completely gave up writing, as the production of this book will attest. When my position was eliminated in early 2004 and I had to look for new work, I had difficulty finding a job in a profession that I didn’t know what to call. My previous title had been controlled vocabulary editor, but, alas, I found nothing by that name on the job board sites. Although publishers of aggregate periodical indexes are few and far between, it turned out that similar skills were in demand by large companies to organize and retrieve their internal documents. I then discovered taxonomy and taxonomists and realized that I could call what I had been doing for the previous 10 years “taxonomy work.” With my prior taxonomist position, I soon landed new taxonomy contract work and then, with that additional experience, a series of full-time taxonomist positions in addition to periods of independent consulting.
While taxonomy may no longer be the latest, hottest topic, as it was around 2000, it has moved beyond being a buzzword to become a topic of more stable interest. The following illustrate the sustained interest in taxonomies:
Although there are numerous articles and conference presentations on information taxonomies, books dedicated to the subject are rare. There have been several good books on thesaurus construction published in recent decades. While these might serve as useful guides for the practicing taxonomist, thesaurus construction books do not sufficiently cover other kinds of taxonomies, such as enterprise and website taxonomies and issues of automated indexing and search. The more recent books on taxonomies, on the other hand, are focused on enterprise taxonomies or take a more project management perspective on taxonomy creation. These may be good books for the manager or executive who is considering a taxonomy project, but they lack sufficient depth to instruct the practicing taxonomist, who needs advice on how to handle various situations in working with the taxonomy terms themselves.
What was missing, in my view, was a practical book for the person actually creating and editing the terms within a taxonomy—a resource for practicing taxonomists designed to go beyond the introductory level. Introductory information on taxonomy creation abounds in articles, conference workshops, Taxonomy Boot Camp, and a few graduate school or continuing education courses. I teach such a continuing education course myself and have been asked by prospective students about offering an intermediate or advanced course, as nothing exists. Rather than teach a second course—an ongoing commitment—I decided to write this book.
That is not to say that The Accidental Taxonomist is purely at an advanced level. It is still appropriate for beginning taxonomists and includes all the content of my introductory course on creating taxonomies and controlled vocabularies. The currently practicing taxonomist will also find useful information, as additional content has been included based on various presentations and articles I have written over the past years and on some more recent research.
Because there are many different kinds of taxonomies—for human and automated indexing, for literature retrieval and website information categorization, for consumers and internal enterprises—a taxonomist’s experience in creating one kind of taxonomy is not necessarily sufficient preparation for working on a different kind of taxonomy. Thus, this book also serves the purpose of cross-training existing taxonomists for different kinds of taxonomy projects. If we want to carry the label of taxonomist and move from one job to another, then a broader understanding of the types of work and issues involved is needed.
This book aims to explain what you need to know to be a good taxonomist rather than explain how to create a taxonomy, step by step. Therefore, the chapters are arranged in order of importance in terms of what you need to know, rather than in the project sequence for building a taxonomy. Chapters 1 and 2 provide background on taxonomies and taxonomists. Chapters 3 and 4 present the basics of term and relationship creation in accordance with the ANSI/NISO Z39.19 and ISO 25964 standards, which may serve as a review for experienced taxonomists but is fundamental for the new taxonomist. Chapter 5 provides practical information on the various taxonomy management software options available. While some software tools have come and gone, others have been around for a long time and have staying power.
The following chapters move beyond the basics to focus on particular issues for different types of taxonomies. Chapter 6 deals with creating taxonomies or thesauri used by human indexers, whereas Chapter 7 discusses the issues involved with creating taxonomies used in automated indexing, auto-categorization, or search. Chapter 8 examines various taxonomy structures, and Chapter 9 presents various display options.
Chapter 10 turns to broader issues of taxonomy planning and design, which often involve the taxonomist, and Chapter 11 deals with ongoing taxonomy work, such as the maintenance, merging, and translating of taxonomies. Finally, Chapter 12 returns the focus to the taxonomist: the nature of the work, what kind of work exists, and training and resources available.
As an aside, the quotations at the start of each chapter were proposed mottos for the Taxonomy Community of Practice discussion group, suggested by its various members in January 2009. (The quotation for Chapter 1 was the winning motto.)
I hope the book will prove not just informative but practical and useful as well. While it covers most of what you need to know to create taxonomies, it does not address every detail. For additional specific instructions, I highly recommend consulting the ANSI/NISO Z39.19 standard, Guidelines for the Construction, Format, and Management of Monolingual Controlled Vocabularies, which is available free of charge. It offers a wealth of information but is really too much for the newcomer to taxonomies to digest. That is where this book comes in. This book also discusses additional types of taxonomies and taxonomy features not addressed in the standard.