Testing web-scale discovery services: how well do they work?

My library is currently evaluating web-scale discovery services. We are considering Ex Libris’ Primo, ProQuest’s Summon, EBSCO’s EDS, and OCLC’s WorldCat Local. (If you want to learn more about web-scale discovery, I recommend Athena Hoeppner’s good overview.)

As part of this process I’ve been looking at how libraries have implemented these four services. For this post I did an informal test of 10 academic library implementations, plus Google and Google Scholar. I selected both small and large libraries with fairly standard implementations – those that used the native, out-of-the-box interface and combined book and article results. I compared them using some typical undergraduate searches.

The searches

1)  The republic  (known book, user wants physical book and availability info)

2)  Cyberbulling in middle schools  (note spelling mistake)

3)  Library hours  (whether services can be incorporated into the discovery tool)

4)  do women’s magazines negatively impact body image in young girls  (natural language searching)

5)  Book Published in 1640 Sets a Record at Auction (known article search – a recent NY times article)

6)  civil war primary sources (using format as keyword)

7)  nature (journal title)

8)  JSTOR (database title)

9)  gun control agenda is a call to duty for scientists (known article from Science)

The results

There was no clear winner among the vendor products – and there were a few problems.  The best results came from using a combination of Google and Google Scholar: these provided the most relevant results for known article searches, primary source searching, and database by title. Google/Google Scholar also did better than vendor products at correcting spelling mistakes, keyword searches, and natural language searches.  Google even found “library hours” when the institution name was included in the search. There was only one test the Google services failed: known book searches.

So I’m left with the following question:  do libraries really need to invest in discovery services? Or do users prefer reliable, streamlined access to library resources from the search engine of their choice?

Detailed results

1)  The republic  (known book, user wants physical book and availability info)

Summon, Primo, and WorldCat Local passed this test.  

EDS failed this test. The first result in both EDS implementations was a completely irrelevant journal article with a remarkably long title that happens to include the word “republic” many times. It takes a long scroll to get to relevant book results. Both Google and Google Scholar also failed, because they could not provide availability information.


“The republic” search in EDS

2)  Cyberbulling in middle schools  (note spelling mistake)

All vendors caught the spelling mistake. Summon and WorldCat Local responded with “did you mean cyberbullying in middle schools?” while the Primo and EDS implementations automatically returned results for cyberbullying in middle schools. Google and Google Scholar showed results for cyberbullying but offered an option to switch back to the exact spelling.

3)  Library hours  (are services incorporated into the discovery tool?)

This test clearly relies on whether or not the library has opted to add library services data into the discovery product – as well as whether or not this customization is possible.  Only two of the 10 libraries I looked at had done this:  University of Miami with Summon, and MIT with EDS.

I also tested Google with library hours, adding the institution name to the query, e.g., “Brandeis University library hours.” Nine of the 10 libraries had Googleable library hours (within the top three search results). Colby College and Northeastern University were the most Googleable: they brought back a feature box with today’s hours:


Google search results for Northeastern University library hours

4)  do women’s magazines negatively impact body image in young girls?  (natural language searching)

The winner here was definitely Google Scholar, with the most relevant results on the first page. The larger libraries (Penn State, MIT, Northeastern, Brandeis, and Maryland) also brought back relevant results on the first page.  There was no noticeable difference in relevancy ranking among the four vendors.

5)  Book Published in 1640 Sets a Record at Auction (known article search – a recent NY times article)

All vendor products failed this search. This article, published in the New York Times a few days ago, is available through the LexisNexis Academic database. All of the libraries I tested subscribe to Lexis, so I can only assume that Lexis content is not included in the discovery services, does not work well in them, or has not yet been picked up by them.

In contrast: this article is the first result returned in a Google search.

6)  civil war primary sources (format as keyword)

Google did best with this search, bringing back sites that feature primary source material on the American Civil War. All four discovery services brought back items about primary source material, as well as items about the wrong war – e.g., the Russian Civil War.

EDS did slightly better than the other three: it has a subject term for “primary sources” as well as a “primary sources” facet. A couple of relevant results were returned, even without faceting.

7)  nature (journal title)

All four vendor services passed this test, bringing back the journal title within the first three results. Google also passed (first result), while Google Scholar failed.

MIT’s implementation of EDS does a nice job displaying Nature as a featured result and linking to both online and print access:


MIT result for Nature

8)  JSTOR (database title)

This test relies on whether or not libraries chose to add databases by title to their discovery service. The JSTOR database was in the first three results for 7 out of 10 libraries.  For three Summon libraries, JSTOR came up as a “recommended resource” (a feature of Summon 2.0).

For some libraries, the first result was a review of JSTOR, rather than the database itself.  In Google, JSTOR is the first result.

9)  gun control agenda is a call to duty for scientists (known article from Science)

All the libraries I tested subscribe to the journal Science.  But for Babson College (Summon) and Brandeis (Primo) this article was not returned.  For all other libraries, the correct article was the first result.

This article was the first result in both Google and Google Scholar.

The libraries

Babson College – Summon (newspapers included in default search)

Penn State – Summon (newspapers included in default search)

Colby College – Summon (newspapers not included in default search)

University of Miami – Summon (newspapers not included in default search)

Connecticut College – EDS

MIT – EDS

Brandeis University – Primo

Northeastern University – Primo

Portland Community College – WorldCat Local

University of Maryland – WorldCat Local

5 thoughts on “Testing web-scale discovery services: how well do they work?”

  1. Interesting results. I find the “civil war primary sources” search particularly interesting. I know Primo Central is starting to harvest some institutional (and other) repositories, which means the number of primary source materials will increase, but I wonder if they would be found with this search? The issue in this instance is probably not so much the discovery layer itself as the underlying metadata used by libraries. We have primary materials in our discovery layer, but we don’t use that term. Google’s advantage probably lies in the pages that link to resources and contain terms like “primary sources.”

    I am going to have to see if my discovery layer search logs have “library hours” or similar in them. If they do, we might want to consider adding that. With our discovery layer it would be trivial to add a record with a URL to our hours (or other local web pages), but we decided not to index our Web site in the discovery layer. I wonder how many libraries do?

    1. Thanks for your insightful comment. I think adding library services to discovery search is slowly starting to catch on. I’ve found in usability testing that if users are presented with a search box, they will type just about anything into it. Stanford’s Blacklight discovery layer does a great job of integrating their website – and I like how they included “renew books” in their search examples: http://library.stanford.edu/

      Also, Lorcan Dempsey has had some interesting posts recently around what he has called “full library discovery:” http://orweblog.oclc.org/archives/002214.html

  2. I’ve been struggling with such issues since 2011, and I think things are a lot better now. A lot of the tests you tried that discovery services now pass used to fail. One thing that frustrates me a lot is that in Summon, if you enter an author/title combination you get book reviews and articles mentioning the book rather than the book itself. This is particularly bad if you are looking for a classic work. The correct way to find a known item in Summon is just the title, though the irony is that users who try to “help” the system by adding the author get worse results.

    1. I agree, Aaron, there has been much improvement! But as you said, library systems still don’t respond well to searches that work in Google (author/title combination). And until they do, we will continue to have confused, frustrated users.
