Monday, May 02, 2011

Making Links and Open Linked Data at The Great Lakes THATCamp

Usually I wait a couple days after a conference before I try to write down and share what I’ve learned. But because it will only be two short days until I’m on my way to code4lib North and because I have two deadlines to bravely confront before then, I have to write down what I have learned from this weekend’s Great Lakes THATCamp *right now* for better or for worse.

I was going to work on the aforementioned deadlines on Friday night as I would be holed away alone in a hotel room in East Lansing, but the workshop that I had attended that morning - Jon Voss’s workshop, Intro to Linked Open Data in Libraries, Archives & Museums (#lodlam) - had triggered an epiphany...

About half-way through the morning's workshop, Jon had told a story of when he and some “computer people” had visited the folks at the Library of Congress. Jon and his friends had been collecting data by screen-scraping the Library of Congress website and the folks at LoC were surprised and asked them why they didn’t just collect the data through z39.50 or OAI-PMH. And, to paraphrase the response, they were all like wot’s dat?

I mention this because the same thing happened to me during Jon’s session. Because I was coming from a library context, I did not recognize the tools of those working with the semantic web. But once I was introduced to them, I recognized them. Let’s see if I can help you recognize them too. One caveat: what follows is what I understand about open, linked data and that’s not necessarily reality. If I am mistaken about something, feel free to school me, oh pedantic web.

Ok. Let’s start with z39.50. Generally a user will visit a library website to search a library catalogue, but there is a standard called z39.50 that allows a user (or a computer program) to search a library catalogue from outside of the website. This is the protocol that allows users to search library catalogues from within citation managers such as RefWorks and EndNote, and to search multiple library catalogues at once in such services as RACER.

Next, OAI-PMH. This stands for Open Archives Initiative Protocol for Metadata Harvesting. Instead of library catalogues, OAI-PMH is designed for Digital Collections, like institutional repositories. The idea is that if you make your individual repository open for “harvesting”, another service can gather information about your collection. You might know about one particular service that uses OAI to harvest metadata from repositories. It’s called Google Scholar.
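To make “harvesting” a little more concrete, here is a minimal sketch in Python of what a harvester does: it asks the repository to list its records, then reads the metadata out of the XML that comes back. The `ListRecords` verb and the `oai_dc` metadata prefix come from the OAI-PMH standard; the repository address, the function names and the sample response are made up for illustration, and nothing here actually goes out over the network.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Hypothetical repository address, for illustration only.
BASE_URL = "https://repository.example.edu/oai"

# XML namespaces used in OAI-PMH responses carrying Dublin Core metadata.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url, metadata_prefix="oai_dc"):
    """Build the URL a harvester would request to list a repository's records."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return base_url + "?" + urlencode(params)


def extract_titles(response_xml):
    """Pull every Dublin Core title out of a ListRecords response."""
    root = ET.fromstring(response_xml)
    return [el.text for el in root.iterfind(".//dc:title", NS)]


# A hand-written, heavily trimmed sample of what a repository might send back.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>An Example Thesis</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""

if __name__ == "__main__":
    print(list_records_url(BASE_URL))
    print(extract_titles(SAMPLE_RESPONSE))
```

A real harvester would fetch that URL, follow resumption tokens for large collections and handle errors; this sketch leaves all of that out.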

Now, we are entering the exciting new world of linked, open data. And it’s early days - like 1997 all over again. And people and organizations are starting to provide open, linked data... for programs that haven’t been written yet. 

Well, there are some programs that have been written... but I’ll get to them in just a moment. 

If you’ve read through Jon’s presentation on Linked Open Data, you know that linked, open data relies on descriptions that are meant to be read by machines. If you want to earn 5 stars of open, linked data you can create a FOAF (Friend of a Friend) file to introduce yourself to code. Here’s Ed Summers’s FOAF (Firefox will display the file but Chrome and Safari might attempt to download it -- so I’ve provided a screenshot below).


(You can create one yourself using it as a template, or you can find a FOAF generator to make one for you.)
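If you are curious what a FOAF generator does under the hood, here is a rough sketch in Python. The two namespace addresses are the real RDF and FOAF ones; the function name `make_foaf`, the person and the homepage are invented for the example, and a real FOAF file would usually say a good deal more than this.

```python
import xml.etree.ElementTree as ET

# The real RDF and FOAF namespace URIs.
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
FOAF = "http://xmlns.com/foaf/0.1/"

# Ask ElementTree to write the familiar rdf: and foaf: prefixes.
ET.register_namespace("rdf", RDF)
ET.register_namespace("foaf", FOAF)


def make_foaf(name, homepage, knows=()):
    """Return a minimal FOAF description of one person as RDF/XML text.

    make_foaf is a hypothetical helper written for this post,
    not part of any FOAF library.
    """
    root = ET.Element(f"{{{RDF}}}RDF")
    person = ET.SubElement(root, f"{{{FOAF}}}Person")
    ET.SubElement(person, f"{{{FOAF}}}name").text = name
    # The homepage is a link to a resource, not a piece of text.
    ET.SubElement(person, f"{{{FOAF}}}homepage", {f"{{{RDF}}}resource": homepage})
    for friend_name in knows:
        # Each foaf:knows wraps a small description of the other person.
        knows_el = ET.SubElement(person, f"{{{FOAF}}}knows")
        friend = ET.SubElement(knows_el, f"{{{FOAF}}}Person")
        ET.SubElement(friend, f"{{{FOAF}}}name").text = friend_name
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Made-up person and address, for illustration only.
    print(make_foaf("Jane Example", "https://example.org/~jane", knows=["Sam Sample"]))
```

The `foaf:knows` part is where the “linked” in linked data starts to show: it points from one person’s description to another’s.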

Now, imagine that you are responsible for creating a staff directory for your institution of higher education. You *could* create individual FOAFs for your staff. If you go this route, you would of course first check to see if there is an established way to semantically describe departments and programs in higher education, so you don’t re-invent the wheel and so you will make it easier for future machines to make connections between your staff and the staff of other institutions. Here’s one for UK institutions that I found via Patrick Murray-John’s post, Thoughts Toward a Giant EduGraph. Or, you could download an application that creates a directory of researchers, their research interests and their works, and that automagically produces semantic descriptions for you. Like VIVO from Cornell University Library.

During the workshop, I asked Jon the question, “If you download VIVO onto your own server, will your information be available to be searched like the other VIVO instances?” and he replied that he thought so, as VIVO would presumably have a SPARQL endpoint. And I replied, wot’s dat?

And that’s when I learned SPARQL is the query language and protocol that is like SQL but for semantic information in RDF. In other words, it’s like the z39.50 that connects and queries library catalogues and the OAI-PMH that connects and harvests digital repositories.
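Here is a small sketch in Python of what asking a SPARQL endpoint a question might look like. The query asks for ten people and their names, assuming the data describes people with FOAF. Sending the query in a `query` parameter is how the SPARQL protocol handles GET requests; the endpoint address and the function name are hypothetical, and the code only builds the request URL without sending it.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Hypothetical endpoint address; a real VIVO install may expose its own path.
ENDPOINT = "https://vivo.example.edu/sparql"

# Reads a lot like SQL: pick some variables, describe the pattern to match.
QUERY = """
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?person ?name
WHERE {
  ?person a foaf:Person ;
          foaf:name ?name .
}
LIMIT 10
"""


def sparql_get_url(endpoint, query):
    """Build the GET URL that carries a SPARQL query to an endpoint."""
    return endpoint + "?" + urlencode({"query": query.strip()})


if __name__ == "__main__":
    url = sparql_get_url(ENDPOINT, QUERY)
    print(url)
    # Decoding the URL again shows the query survived the trip intact.
    print(parse_qs(urlsplit(url).query)["query"][0])
```

Where a SQL query names tables and columns, this one names a pattern of subject, property and value, which is why the same query could be sent to any endpoint that describes its people with FOAF.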

Now, I know that it’s more complicated than that. I suspect that semantic data can be harvested without SPARQL. I think this because on Friday night I was fiercely determined to use my improved understanding of open linked data to find out how Drupal 7 incorporates RDF. And my persistence paid off because I found this illuminating screencast, The story of RDF in Drupal 7 and what it means for the Web at large. And by watching this one-hour presentation, I learned of a tool called Sindice Inspector that makes visible the semantic descriptions of a given webpage. And using this tool, I found RDF descriptions in Drupal 7 that I had already been creating without even knowing it.

So the Next Action is now clear. I need to read up on semantic Drupal and to muck about to find out how I can create sound semantic data using Drupal 7. I suspect that I will be writing up these adventures in my workplace blog about our website development.

I hope I will be able to write a little bit more about the other wonderful things that I learned at the Great Lakes THATCamp. The semantic web runs on love. And so does THATCamp.  

Only connect.

5 comments:

Amanda French said...

Wow, I'm learning a lot more about linked data, too, just from this post. Thanks, Mita! (Small Pedantic Web correction: it's OAI-PMH, not OAI-PHM.)

Lynne Goldstein said...

Great post, Mita. I think you've been very clear and accurate about the different approaches. At least, what I learned matches what you've described. Very helpful!

Mita said...

Thanks for the heads-up! I will correct it now...

Anonymous said...

Nice post Mita. One thing you might be interested to know is that Google themselves say "OAI-PMH Wots Dat?" now :-) http://googlewebmastercentral.blogspot.com/2008/04/retiring-support-for-oai-pmh-in.html

Marcel said...

Fun post. Good info in there.