Semantic Identity for Library Organizations

By Kenning Arlitsch

Kenning Arlitsch, dean of the library at Montana State University, coauthored this blog with Patrick O'Brien, semantic web research director at Montana State University.

Library organizations are poorly represented in Semantic Web applications such as Knowledge Cards, which now display to the right of many Google search results. Knowledge Cards provide users with brief information about organizations or people and may include such items as location, description, hours, logos, photographs, and reviews (see example, below). They are a product of the larger and invisible Knowledge Graph, which Google introduced in 2012 to help it gather semantically rich data for more accurate and enhanced search results. More importantly, Knowledge Cards are a manifestation of what we call “semantic identity,” which authoritatively establishes the existence of an organization or person. The concept is not dissimilar to the name authority files that librarians have long maintained, but this authority is for the Semantic Web.




We recently surveyed the 125 members of the Association of Research Libraries (ARL) and found that no Knowledge Card was displayed at all for 43 of the libraries. Of the 82 libraries that did show some kind of Knowledge Card, 10 were inaccurately matched to the library in question, while 29 offered little or no useful information. Only 5 libraries showed Knowledge Cards that contained rich and useful information for users. The absence of robust Knowledge Cards means that Google is having difficulty finding authoritative information about most library organizations. This in turn means that libraries are not accurately represented on the Semantic Web.
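To put those survey counts in proportion, here is a minimal Python sketch that tallies them as shares of the 125 ARL members. The counts are taken directly from the paragraph above; the variable and function names are our own, and the remainder it computes is simply the set of displayed cards that the three categories above do not describe.

```python
# Survey counts as reported in the text.
TOTAL_ARL_MEMBERS = 125
NO_CARD = 43
SOME_CARD = 82

# Categories the text describes among the libraries that showed a card.
CARD_BREAKDOWN = {
    "inaccurately matched": 10,
    "little or no useful information": 29,
    "rich and useful information": 5,
}


def share(count, total=TOTAL_ARL_MEMBERS):
    """Return count as a percentage of total, rounded to one decimal place."""
    return round(100 * count / total, 1)


# The two top-level groups account for every surveyed library.
assert NO_CARD + SOME_CARD == TOTAL_ARL_MEMBERS

# Cards that were displayed but fall outside the three described categories.
uncharacterized = SOME_CARD - sum(CARD_BREAKDOWN.values())

print(f"No Knowledge Card: {NO_CARD} ({share(NO_CARD)}%)")
print(f"Some Knowledge Card: {SOME_CARD} ({share(SOME_CARD)}%)")
for label, count in CARD_BREAKDOWN.items():
    print(f"  {label}: {count} ({share(count)}% of all members)")
print(f"  not characterized in the text: {uncharacterized}")
```

Seen this way, roughly a third of ARL libraries had no card at all, and only a small fraction had one that was genuinely useful.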

We’ve all heard the Semantic Web mantra of “strings to things,” which describes the changing nature of entity identification. Shifting the machine interpretation of library organizations from “strings” of text to established entities, or “things,” is crucial to accurate and robust representation in Knowledge Graph products like Knowledge Cards. Machines have trouble understanding the concept of “things” and their relationships without considerable help. Google and other search engines are most successful in gathering accurate information about libraries when those libraries are defined and verified as entities in data sources the search engines trust. These sources comprise the Linked Open Data universe, and at the center of that universe lies DBpedia, whose structured data records are generated from Wikipedia entries. No entry about your library organization in Wikipedia? Then there will be no structured data record in DBpedia, and in all likelihood there will be no Knowledge Card, because Google will have trouble verifying the existence of the organization.
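Because DBpedia records are generated from Wikipedia entries, the DBpedia identifier for an organization follows from the title of its Wikipedia article. The short Python sketch below illustrates that relationship. The function name is ours, the article title is assumed for illustration, and the code only constructs the identifier; it does not check whether a record actually exists, and DBpedia's exact encoding rules for unusual characters are not reproduced here.

```python
from urllib.parse import quote

# Base of DBpedia resource identifiers.
DBPEDIA_RESOURCE_BASE = "http://dbpedia.org/resource/"


def dbpedia_uri_for(wikipedia_title: str) -> str:
    """Derive the DBpedia resource URI that would correspond to a Wikipedia title.

    Assumption: spaces become underscores, as in Wikipedia page URLs, and any
    remaining special characters are percent-encoded conservatively.
    """
    slug = wikipedia_title.strip().replace(" ", "_")
    return DBPEDIA_RESOURCE_BASE + quote(slug)


# Assumed article title, used here only as an illustration.
print(dbpedia_uri_for("Montana State University Library"))
```

This URI is the “thing” a machine can point to; without the Wikipedia article, there is nothing for it to resolve.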

The Knowledge Card displayed for the Montana State University Library in 2012 was for a branch campus in Billings, MT, rather than the flagship campus in Bozeman. Investigation revealed a number of deficiencies in the semantic identity of the Montana State University Library. No article about the library existed in Wikipedia and therefore no structured data record existed in DBpedia. There was no Freebase entry, the Google+ page was unofficial, and the property in Google Places was unverified. Google had interpreted the meager string-based information available, and had mistakenly concluded that the “Montana State University Library” was located in Billings.

Initial work by Montana State University and OCLC Research has demonstrated that creating and enriching semantic data in LOD sources, as well as applying Schema.org markup to library websites, can produce a more accurate semantic identity.
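For readers unfamiliar with Schema.org markup, the following Python sketch shows what a description of a library might look like when expressed as JSON-LD, one of the syntaxes used for Schema.org data. It is an illustration only, not the markup deployed on the MSU Library website. The helper names are ours, and both URLs are placeholders rather than real addresses.

```python
import json


def build_library_jsonld(name, alternate_names, url, same_as, locality, region):
    """Return a Schema.org 'Library' description as a JSON-LD-ready dict."""
    return {
        "@context": "https://schema.org",
        "@type": "Library",
        "name": name,
        "alternateName": alternate_names,
        "url": url,
        # sameAs links this description to records for the same entity elsewhere.
        "sameAs": same_as,
        "address": {
            "@type": "PostalAddress",
            "addressLocality": locality,
            "addressRegion": region,
        },
    }


def as_script_tag(data):
    """Wrap the JSON-LD in the script element used to embed it in a web page."""
    body = json.dumps(data, indent=2, ensure_ascii=False)
    return f'<script type="application/ld+json">\n{body}\n</script>'


markup = build_library_jsonld(
    name="Montana State University Library",
    alternate_names=["Roland R. Renne Library", "MSU Library"],
    url="https://library.example.edu/",  # placeholder, not the real site
    same_as=["https://en.wikipedia.org/wiki/Example_Library_Article"],  # placeholder
    locality="Bozeman",
    region="MT",
)
print(as_script_tag(markup))
```

The alternate names and the sameAs links do the disambiguation work: they tell a search engine that several strings refer to one organization in one place.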

Rectifying our data deficiencies has had a significant effect on the MSU Library’s representation on the Semantic Web. A rich page of structured data generated from a Wikipedia article can now be found in DBpedia. A Freebase[1] entry exists indicating that “Montana State University Library” is the “SameAs” the “Roland R. Renne Library” and the “MSU Library.”[2] The property has been claimed in Google Places/Google My Business, and an accurate page represents the organization in Google+. As a result, an accurate Knowledge Card is now displayed for the MSU Library, although it is still not robust and some inconsistencies remain. More research is required to determine if the semantic identity of library organizations can be consistently established or improved.

This is tricky territory and we do not advocate libraries charging into the Wikipedia environment without a significant understanding of the culture that sustains the world’s largest encyclopedia. Self-promotion is frowned upon and articles must be well cited and vetted by Wikipedia editors. One thing is certain, however. Engaging in building and improving the Linked Open Data sources that feed the Semantic Web will benefit the library community and its constituents.

Funding for this research was provided in a 2011-2014 National Leadership Grant from the Institute of Museum and Library Services.



[1] Just before this blog went to press, Google revealed that it is phasing out Freebase and will instead support the newer Wikidata effort of the Wikimedia Foundation. http://tinyurl.com/nofacf3

[2] There are other entities known as “MSU Library.” These include libraries at Michigan State University, Missouri State University, and Mississippi State University, among others.