Taxonomy Boot Camp 2018 Report: AI and Taxonomies

Artificial intelligence (AI) is not new, but it is becoming increasingly widespread, and its applications are growing within related specializations of information management, knowledge management, and content management, including taxonomies. Hence the theme for this year’s Taxonomy Boot Camp conference (November 5-6, 2018, Washington, DC) was “Bridging Human Thinking and Machine Learning.”

This was the 14th Taxonomy Boot Camp and its 9th year in Washington, DC. Along with the newer Taxonomy Boot Camp London, it is the only conference dedicated to taxonomies. As usual, it was held alongside several other Information Today Inc. conferences, whose dates overlap or run consecutively. The format was the same as in past years: an opening keynote followed by two tracks of sessions on the first day (one more basic, one more advanced), then on the second day a joint keynote with the KMWorld conference and a single track for the rest of the day. By a show of hands, it appeared that about 75% of the Taxonomy Boot Camp attendees were first-timers, an even higher share than in previous years. There were 235 attendees, including speakers and sponsors.

Presentations on machine learning and AI appeared in both the basic and the advanced track on the first day. These included “Taxonomy & Machine Learning at the Knot,” “Sandwiches, Categories, Ethics & Machine Learning,” “Taxonomy Skills in the World of AI” (a panel), “Semantic AI: Fusing Machine Learning with Knowledge Graphs,” “Semantic Search Enrichment,” “Taxonomies and AI Chat Boxes,” “Taxonomy in the Age of Amazon Echo,” and “Applying Taxonomy Skills to Cognitive Computing” (a project involving a Thomson Reuters data privacy research product built on IBM Watson).

In “Semantic AI: Fusing Machine Learning with Knowledge Graphs,” presenter Andreas Blumauer of the Semantic Web Company said that companies are increasingly adopting knowledge graphs as their IT infrastructure, and that leading players are trying to fuse knowledge graphs with machine learning. A knowledge graph has to be stored in a graph database, and there are two types of graph database models: property graphs and RDF graphs. In his view, RDF graphs are the more important of the two for knowledge graphs.
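To make the distinction between the two models concrete, here is a minimal, hypothetical Python sketch of how the same small taxonomy fragment might look in each. It is not the API of any graph database. The example.org namespace and the node and edge names are invented for illustration; the SKOS namespace and its prefLabel and broader properties are real W3C vocabulary. In the RDF model every statement is a subject-predicate-object triple identified by IRIs. In the property graph model, nodes and edges carry their own key-value properties.

```python
from dataclasses import dataclass, field

# --- RDF model: every statement is a subject-predicate-object triple, identified by IRIs.
EX = "http://example.org/"                     # hypothetical namespace
SKOS = "http://www.w3.org/2004/02/skos/core#"  # W3C SKOS vocabulary

rdf_triples = {
    (EX + "knowledge-graph", SKOS + "prefLabel", "Knowledge graph"),
    (EX + "knowledge-graph", SKOS + "broader", EX + "graph"),
    (EX + "graph", SKOS + "prefLabel", "Graph"),
}

def objects(triples, subject, predicate):
    """Return every object linked to `subject` by `predicate`."""
    return {o for (s, p, o) in triples if s == subject and p == predicate}

# --- Property graph model: nodes and edges each carry their own key-value properties.
@dataclass
class Node:
    node_id: str
    props: dict = field(default_factory=dict)

@dataclass
class Edge:
    source: str
    target: str
    label: str
    props: dict = field(default_factory=dict)

nodes = {
    "kg": Node("kg", {"label": "Knowledge graph"}),
    "g": Node("g", {"label": "Graph"}),
}
# The edge itself can hold properties, e.g. who added the relationship (hypothetical).
edges = [Edge("kg", "g", "BROADER", {"added_by": "taxonomist"})]

print(objects(rdf_triples, EX + "knowledge-graph", SKOS + "broader"))  # {'http://example.org/graph'}
```

Note the added_by property on the edge. A property graph stores it directly on the relationship, whereas plain RDF needs extra machinery, such as reification, to make a statement about a triple.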

Semantic AI core principles include the following.

  • It’s about things, not strings.
  • It’s more than metadata: it describes the meaning of metadata as an additional, semantic layer.
  • The knowledge graph establishes the semantic layer.
  • Knowledge graphs can be seen as an input for machine learning.
  • AI isn’t always good at understanding questions, so a taxonomy/ontology is needed to support it.
  • AI should be built upon data quality, data as a service, no black boxes, and a hybrid approach in which structured data meets text, aiming toward self-optimizing machines (a vision, as we are not there yet).

One use case of knowledge graphs is a recommendation engine: the knowledge graph is the basis on which the engine recommends content while taking the users into consideration.
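The talk did not describe an implementation, but the basic idea of a knowledge-graph-driven recommender can be sketched under simple assumptions. Content items are tagged with taxonomy concepts, concepts are linked to broader concepts, and a user is represented by the concepts of the items already viewed. All concept and item names below are hypothetical. Scoring by shared (graph-expanded) concepts is one illustrative choice, not the Semantic Web Company’s method.

```python
from collections import Counter

# Hypothetical knowledge graph fragment: concept -> broader concept.
BROADER = {
    "machine-learning": "artificial-intelligence",
    "chatbots": "artificial-intelligence",
    "rdf": "semantic-web",
}

# Hypothetical content items, each tagged with taxonomy concepts.
ITEMS = {
    "article-1": {"machine-learning"},
    "article-2": {"chatbots"},
    "article-3": {"rdf"},
}

def expand(concepts: set[str]) -> set[str]:
    """Add each concept's broader concept, so items can match through the graph."""
    return set(concepts) | {BROADER[c] for c in concepts if c in BROADER}

def recommend(viewed: set[str], top_n: int = 2) -> list[str]:
    """Rank unseen items by how strongly their expanded concepts overlap the user's profile."""
    profile = Counter()
    for item in viewed:
        profile.update(expand(ITEMS[item]))
    scores = {}
    for item, concepts in ITEMS.items():
        if item in viewed:
            continue
        score = sum(profile[c] for c in expand(concepts))
        if score > 0:
            scores[item] = score
    # Highest score first; ties broken alphabetically for a stable result.
    return sorted(scores, key=lambda i: (-scores[i], i))[:top_n]

print(recommend({"article-1"}))  # ['article-2']
```

A reader of article-1 (machine learning) is offered article-2 (chatbots) because both concepts share the broader concept artificial-intelligence, even though the two items have no tag in common. In this sense the graph, rather than string matching, drives the recommendation.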

In “Taxonomy & Machine Learning at the Knot,” the presenters from the web media company XO Group began with a good introduction to machine learning. They explained the problems it can solve (predicting behavior, automating tedious steps, and classifying) and its two types, supervised and unsupervised. Common applications include clustering, recommendations, and classification, and each of these can involve taxonomies. The presenters also gave specific implementation examples.

As last year, auto-categorization (automated or machine-aided indexing) came up in many sessions. Three were dedicated to the subject: “Driving Discovery: Combining Taxonomy & Textual AI at Sage” (a case study using Expert System auto-categorization), “Testing for Auto-tagging Success,” and “Classification Relevance at Associated Press.”

AP has an automated, rules-based classification system for Subjects, Geography, and Organizations. AP chose rules-based auto-classification over the statistical method for several reasons. It offers transparency and control. It can handle breaking news and low-frequency terms, since no existing training set is needed. It can better scope and disambiguate between terms, such as incident-type terms (Violent crime) versus issue terms (Domestic violence). Finally, semantic rules ensure that a term is assigned only when the topic is more than a passing mention. Entity extraction with disambiguation rules is used for person names and publicly traded companies.
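AP’s actual rules were not shown, so the following is only a hypothetical Python sketch of how rules-based classification can address the points above. Include patterns assign a term, exclude patterns scope or disambiguate it, and a minimum hit count keeps a passing mention from triggering the term. The rule contents, the two-hit threshold, and the Rule and classify helpers are assumptions made for illustration.

```python
import re
from dataclasses import dataclass

@dataclass
class Rule:
    """A hypothetical classification rule for one taxonomy term."""
    term: str                      # taxonomy term to assign
    include: tuple[str, ...]       # regex patterns that signal the term
    exclude: tuple[str, ...] = ()  # patterns that rule the term out (scoping/disambiguation)
    min_hits: int = 2              # require repeated evidence, not a passing mention

    def matches(self, text: str) -> bool:
        lowered = text.lower()
        # Disambiguation: any exclude pattern vetoes the term.
        if any(re.search(p, lowered) for p in self.exclude):
            return False
        # Count every include-pattern match in the text.
        hits = sum(1 for p in self.include for _ in re.finditer(p, lowered))
        return hits >= self.min_hits

# Hypothetical rules: an incident-type term vs. an issue-type term.
RULES = [
    Rule(
        term="Violent crime",
        include=(r"\b(shooting|stabbing|assault|homicide)s?\b",),
        exclude=(r"\b(policy|legislation|awareness)\b",),  # issue coverage, not an incident
    ),
    Rule(
        term="Domestic violence",
        include=(r"\bdomestic (violence|abuse)\b",),
    ),
]

def classify(text: str) -> list[str]:
    """Return the taxonomy terms whose rules match the text, in rule order."""
    return [rule.term for rule in RULES if rule.matches(text)]

print(classify("Police said the shooting occurred late Monday. "
               "A second shooting was reported nearby."))  # ['Violent crime']
```

Every decision traces back to a readable rule, so an editor can see why a term was or was not assigned and adjust it, which is the transparency and control argument. Likewise, a rule for a breaking-news term can be written without waiting for training data.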

Knowledge graphs are getting more attention, both here and at Taxonomy Boot Camp London. They were, of course, the main topic of Andreas Blumauer’s talk “Semantic AI: Fusing Machine Learning with Knowledge Graphs.” In addition, Mike Doane, introducing his talk “Taxonomy in the Age of Amazon Echo,” said that the information industry analysis firm Gartner reports that knowledge graphs are on the rise and are discussed more than taxonomies, and that Gartner is tracking knowledge graphs instead of taxonomies and ontologies.

The opening keynote did not focus on AI or machine learning, but it was given by a computational linguist, Deborah McGuinness, a professor of Computer, Cognitive, and Web Sciences at Rensselaer Polytechnic Institute. Among other things, she spoke about the data life cycle, in which a computer-understandable specification of meaning (semantics) extends the lifespan and increases the impact of data. She went on to present two specific ontology case examples.

Nearly all session slides, except those of the keynotes, are available to download without login credentials at: