Faceted taxonomies (taxonomies with attributes, dimensions, filters, etc. to limit search results based on the combination of selected criteria) are becoming increasingly popular with the support of web database technology. Unlike traditional hierarchical taxonomies, designing a faceted taxonomy first requires a decision on how many facets to create. There are various factors to take into consideration.
The nature of the content is always the most important factor. It may seem ironic, but content that is more limited in scope can support more facets than content that is broad in scope. For example, an ecommerce site selling just computers could have a relatively large number of facets by which to limit laptop computers: brand, price range, hard drive, screen size, operating system, processor brand, processor type, webcam inclusion, and online/in-store availability (9 facets). On the other hand, if a content repository comprises all kinds of articles, then there is not much beyond “subject” and article type to classify them by (2 facets). (Other metadata fields, such as author, title, and date, may also be used to limit results, but these do not involve taxonomy terms.)
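To make the laptop example concrete, here is a minimal Python sketch of how facet selections might combine to narrow results. The product records, the facet names, and the `filter_by_facets` helper are all hypothetical and invented for illustration; a real ecommerce site would typically run this logic in a database or search engine.

```python
from typing import Dict, List, Set

# Hypothetical product records; every key other than "name" is a facet.
LAPTOPS: List[Dict[str, str]] = [
    {"name": "Laptop A", "brand": "Acme", "screen_size": "13 in", "operating_system": "Windows"},
    {"name": "Laptop B", "brand": "Acme", "screen_size": "15 in", "operating_system": "Linux"},
    {"name": "Laptop C", "brand": "Globex", "screen_size": "15 in", "operating_system": "Windows"},
]


def filter_by_facets(
    items: List[Dict[str, str]], selections: Dict[str, Set[str]]
) -> List[Dict[str, str]]:
    """Keep only the items that match every selected facet.

    Values chosen within one facet are OR-ed together; different facets
    are AND-ed. A facet with no selected values does not restrict results.
    """
    results = []
    for item in items:
        if all(
            item.get(facet) in values
            for facet, values in selections.items()
            if values
        ):
            results.append(item)
    return results


# The user ticks "Acme" under Brand and "15 in" under Screen size.
selected = {"brand": {"Acme"}, "screen_size": {"15 in"}}
matches = filter_by_facets(LAPTOPS, selected)
print([m["name"] for m in matches])
```

Each additional facet gives the user another independent way to narrow the set, which is why narrowly scoped content with many shared attributes can support so many facets.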
More facets can be included if they are stacked vertically, such as in a left margin, than if they are displayed horizontally across the width of the screen. This is because users dislike horizontal scrolling, so it is avoided in content design, whereas limited vertical scrolling is acceptable.
What the tagging process supports
For manual tagging, you have to consider who is doing the tagging, what their knowledge and experience are, what level of training is practical, how much time and effort can practically be devoted to tagging, and what the tagging user interface looks like. As with the end-user UI, the tagging interface also needs to display all facets and facet values in an easy-to-use manner. Usually, people who tag content for internal content management are not dedicated indexers. To simplify tagging and ensure that it is done correctly, and done at all, there should not be too many facets for internal tagging (around 3, for example).
In automated tagging, it’s not so much a matter of how many facets there are, but rather how distinct the facets are and how easy they are to tag automatically. There are different technologies out there, but, in general, named entities/proper nouns are easier to distinguish than topical subjects. So, facets for author, location, department, product name, etc., are easy to classify automatically. Language and a document type that is based on file format are also straightforward for auto-classification. Subject or Topic could be a catch-all for high-ranked keywords. If you want to create facets for different kinds of topics, though, such as Purpose, Activity, Significance, Origin, etc., the distinctions will likely be too challenging for an auto-classification tool.
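As a rough illustration of why entity-style facets and a file-format-based document type are the easy cases, here is a minimal rule-based sketch in Python. It is not how any particular auto-classification product works; real tools are considerably more sophisticated. The term lists, the extension mapping, and the `auto_tag` function are hypothetical names invented for this example.

```python
import os
import re
from typing import Dict, List

# Hypothetical controlled lists of proper nouns for entity-style facets.
GAZETTEERS: Dict[str, List[str]] = {
    "location": ["Boston", "London", "Tokyo"],
    "department": ["Human Resources", "Finance", "Engineering"],
}

# Hypothetical mapping from file extension to a document-type facet value.
DOC_TYPES: Dict[str, str] = {
    ".pdf": "PDF document",
    ".pptx": "Presentation",
    ".xlsx": "Spreadsheet",
}


def auto_tag(filename: str, text: str) -> Dict[str, List[str]]:
    """Assign facet values using simple, unambiguous rules."""
    tags: Dict[str, List[str]] = {}

    # Document type: read directly from the file extension.
    ext = os.path.splitext(filename)[1].lower()
    if ext in DOC_TYPES:
        tags["document_type"] = [DOC_TYPES[ext]]

    # Entity facets: whole-word, case-insensitive lookup of known names.
    for facet, names in GAZETTEERS.items():
        found = [
            name
            for name in names
            if re.search(r"\b" + re.escape(name) + r"\b", text, flags=re.IGNORECASE)
        ]
        if found:
            tags[facet] = found
    return tags


tags = auto_tag("q3-review.PPTX", "The Finance team in London presented the results.")
print(tags)
```

Nothing comparably simple exists for a facet such as Purpose or Significance: there is no list of names to match and no file property to read, so the tool would have to interpret meaning, which is where automated tagging becomes unreliable.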