Tuesday, March 04, 2008

Ontario Scholars Portal – Yours to a Discovery Layer

(This is a little something I wrote to support this work)

What is a “Discovery Layer”?

To me, a Discovery Layer allows a user to search across a library catalogue (or several), an ebook platform (or several), and a source of articles (or several). 

A Discovery Layer could make use of one or more combinations of the following:
  • Federated Searching
    • a query is distributed to multiple sources, and responses are compiled, de-duped, and returned
    • e.g. Sirsi Single Search
  • Metasearching
    • a single, regularly compiled index is created from the collection of metadata from multiple sources
    • e.g. Endeca, Google
  • Single host environment
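
To make the difference between the first two approaches concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions: the sources are in-memory stand-ins rather than real catalogues or vendor platforms, and every name in it (make_source, dedupe_key, federated_search, build_index, search_index) is hypothetical. Federated searching does its work at query time; metasearching, as defined above, does most of its work ahead of time when the index is compiled.

```python
import re
from typing import Callable, Dict, List, Set, Tuple

Record = Dict[str, str]
SearchFn = Callable[[str], List[Record]]


def tokens(text: str) -> List[str]:
    """Lowercase a string and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def dedupe_key(record: Record) -> Tuple[str, str]:
    """Crude match key: normalized title plus year (an assumption, not a standard)."""
    return (" ".join(tokens(record["title"])), record.get("year", ""))


def federated_search(query: str, sources: Dict[str, SearchFn]) -> List[Record]:
    """Send the query to every source at search time, then merge and de-dupe."""
    merged: List[Record] = []
    seen: Set[Tuple[str, str]] = set()
    for name, search in sources.items():
        for record in search(query):
            key = dedupe_key(record)
            if key not in seen:
                seen.add(key)
                merged.append({**record, "source": name})
    return merged


def build_index(harvested: List[Record]) -> Dict[str, Set[int]]:
    """Compile one word-to-record index ahead of time from harvested metadata."""
    index: Dict[str, Set[int]] = {}
    for position, record in enumerate(harvested):
        for word in tokens(record["title"]):
            index.setdefault(word, set()).add(position)
    return index


def search_index(query: str, index: Dict[str, Set[int]],
                 harvested: List[Record]) -> List[Record]:
    """Answer a query from the prebuilt index; every query word must match."""
    words = tokens(query)
    if not words:
        return []
    hits = set.intersection(*(index.get(word, set()) for word in words))
    return [harvested[i] for i in sorted(hits)]


def make_source(records: List[Record]) -> SearchFn:
    """Hypothetical stand-in for a remote source: substring match on title."""
    def search(query: str) -> List[Record]:
        return [r for r in records if query.lower() in r["title"].lower()]
    return search


# Made-up sample records, for illustration only.
catalogue = [{"title": "Digital Libraries", "year": "2005"}]
ebooks = [
    {"title": "Digital libraries!", "year": "2005"},
    {"title": "Digital Preservation", "year": "2007"},
]
sources = {"catalogue": make_source(catalogue), "ebooks": make_source(ebooks)}

federated = federated_search("digital", sources)
index = build_index(catalogue + ebooks)
indexed = search_index("digital preservation", index, catalogue + ebooks)
```

In this toy data the catalogue and the ebook platform both hold the same title, so the federated search returns it only once. A real de-duplication step would need a far more careful match key than title plus year.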
Why a Discovery Layer?
The pursuit of a Discovery Layer seems to be driven by the need to present one strong, stable user interface over many disparate sources of information. Some benefits of a discovery layer include:
  • users only have to learn one interface, instead of many
  • users don’t have to choose from lists of dozens of indexes
  • users don’t have to repeat searches depending on format (one search for books, then one for dissertations, then one for articles…)
  • users expect simple, effective search tools like Google
What’s the problem?
The challenges that face the construction of a discovery layer include:

  • many of our research tools are very difficult to extract data from as they make use of a multitude of non-standard formats and protocols
  • most of our research tools (especially the library catalogue) generate search results with poor relevance ranking
  • some sources will be rich in text and metadata (articles, ebooks) while other sources will only be represented by metadata (print books)
How much can an improved interface improve things?
At the present time, I would say that there are 3 archetypes of Discovery Layer Interfaces.
How much can an improved interface improve relevant results?
Coming up with what a user might deem relevant from 2 or 3 keywords is challenging in a regular search environment. Producing consistently relevant results in a federated or metasearch environment is extremely difficult.
Relevance might be improved through one or more of the following:
  • by taking into account the user’s previous searching behaviour
  • by weighing results by the number of times an item has been bookmarked, printed, or saved
  • by using citation information to determine ‘likeness’ (e.g. based on a percentage of shared citations in item’s bibliography)
  • by using user-created lists of articles to generate similar items of possible interest
  • by knowing what courses a user is currently taking/teaching and emphasizing relevant resources accordingly
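
Two of the ideas in the list above are easy to sketch in Python: weighing results by usage, and measuring 'likeness' by shared citations. This is a rough illustration, not a tested ranking method. The field names (text_score, bookmarks, prints, saves), the 0.1 weight, and the choice of Jaccard overlap as the "percentage of shared citations" are all assumptions made for the example.

```python
import math
from typing import Dict, Iterable, List


def shared_citation_ratio(refs_a: Iterable[str], refs_b: Iterable[str]) -> float:
    """Share of cited works two items have in common (Jaccard overlap)."""
    a, b = set(refs_a), set(refs_b)
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def usage_boost(item: Dict) -> float:
    """Dampened count of bookmarks, prints and saves, so heavy use helps
    an item without letting it swamp the text match."""
    uses = item.get("bookmarks", 0) + item.get("prints", 0) + item.get("saves", 0)
    return math.log1p(uses)


def rerank(results: List[Dict], weight: float = 0.1) -> List[Dict]:
    """Re-order results by text score plus a weighted usage boost."""
    return sorted(
        results,
        key=lambda item: item["text_score"] + weight * usage_boost(item),
        reverse=True,
    )


# Made-up sample data: item "b" matches the query slightly less well
# than item "a" but has been bookmarked, printed and saved many times.
results = [
    {"id": "a", "text_score": 1.0, "bookmarks": 0, "prints": 0, "saves": 0},
    {"id": "b", "text_score": 0.9, "bookmarks": 20, "prints": 5, "saves": 5},
]
reranked_ids = [item["id"] for item in rerank(results)]

likeness = shared_citation_ratio(["r1", "r2", "r3"], ["r2", "r3", "r4"])
```

With these numbers the heavily used item moves ahead of the slightly better text match, and the two bibliographies share half of their combined citations. How much weight usage deserves is exactly the kind of thing that would need tuning against real searches.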
What is a Good Enough Discovery Layer?
Is it realistic to expect one Discovery Layer to serve both the novice researcher and the expert, across a variety of formats and a multitude of disciplines? Can one size fit all? Should we develop several Discovery Layers, one for each discipline (Arts, Social Sciences, Medicine)? Should we develop one interface for undergraduates and another for faculty and graduate students?

How will we know we have reached the Promised Land?
Most discovery layers are still in the earliest stages of their development and, by appearances, they seem more alike than unalike. How should we choose what is an acceptable product? One suggestion is to measure the success of a Discovery Layer by comparing its search results to Google's.
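
One simple way to put a number on such a comparison is to check how many of the top results two systems have in common for the same query. The sketch below assumes both result lists have already been collected and reduced to comparable identifiers; overlap_at_k is a hypothetical helper name, and the identifiers are made up.

```python
from typing import List


def overlap_at_k(ours: List[str], reference: List[str], k: int = 10) -> float:
    """Fraction of the reference system's top-k results that also appear
    in our top-k results, ignoring order."""
    top_ours = set(ours[:k])
    top_reference = set(reference[:k])
    if not top_reference:
        return 0.0
    return len(top_ours & top_reference) / len(top_reference)


# Made-up identifiers for one query run against both systems.
discovery_layer_results = ["a", "b", "c", "d"]
reference_results = ["b", "a", "x", "y"]
score = overlap_at_k(discovery_layer_results, reference_results, k=4)
```

Here two of the reference system's top four results also appear in ours, giving a score of 0.5. Overlap alone says nothing about which system's results are better, so it would have to sit alongside judgments from actual users.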


Unknown said...

Great summary, Mita. A few thoughts come to mind. Eric Lease Morgan posted a link to the "Many More than a Million: Building the digital environment for the age of abundance" report from CLIR on the list today, and reading the report gives a sense of the challenges and possibilities of mass digitization. While the majority of print books are going to be represented by metadata for the short term, there is a lot that may change here. Leaving aside the many issues of OCRing such material, one peculiarity of digitized analog objects is that they can capture context better than completely digital renditions. I have seen this with the OurOntario newspaper project: reading the positive spins put on the Dieppe raid beside the ads for victory bonds and keeping telephone lines free for war communications really gives some insight into the mindset of the media of the time, in a way that a standalone representation would not. So I don't know how this fits, but capturing context and reflecting mass digitization probably factors into the discovery layer somehow.

The other aspect that I think is really important to retain comes from the Scholr 2.0 white paper, and that is providing a bi-directional space for information interaction, where users could upload and augment their own content. Michael Jensen's quote in the white paper about scholarly output locked behind firewalls and hidden on hard drives represents a somewhat scary, but possibly incredibly advantageous, opportunity for a discovery layer. This could range from the sheer act of archiving this content in at least one other place through to constructing a sort of bibliographic-based DNA for tweaking relevancy.

And that leads to my last point. I don't think we should underestimate how much research has been done on relevancy weighting, or how important it is to be able to define and provide variations on indexing strategies. This is where the comparison to Google is so brilliant: Google was able to trump AltaVista by figuring out PageRank, and your suggestions for improving relevancy point to a possibly similar trajectory for scholarly content.

Mita said...

Thanks, Art, for your thoughts. I'm going to have to give more thought to the 'context' of scholarly resources and objects.

One concept from Scholr 2.0 that I didn't bring into this summary was the notion of making the data available and mashable and giving the users the opportunity and ability to develop their own interface that works best for them and their particular interests.