Monday, July 14, 2014

Why Nested Boolean search statements may not work as well as they did

At library school, I was taught the concept of nested Boolean searching. In particular, I was taught a search strategy that goes like this.

  • Think of a research topic
  • Break it up into major concepts - typically 3 or more - e.g. A, B, C
  • Identify synonyms for each concept (A1, A2, A3; B1, B2, B3; C1, C2, C3)
  • Combine them in the following manner

(A1 OR A2 OR A3) AND (B1 OR B2 OR B3) AND (C1 OR C2 OR C3)
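
The pattern above is mechanical enough to sketch in a few lines of Python. This is only an illustration: the function name and the example synonyms are my own placeholders, not taken from any database's documentation.

```python
def nested_boolean(concepts):
    """Join synonyms with OR inside parentheses, then join the groups with AND."""
    groups = []
    for synonyms in concepts:
        # Quote multi-word phrases so they are searched as a unit.
        terms = [f'"{t}"' if " " in t else t for t in synonyms]
        groups.append("(" + " OR ".join(terms) + ")")
    return " AND ".join(groups)


# Hypothetical topic broken into three concepts, each with three synonyms.
concepts = [
    ["teen*", "youth", "adolescent"],            # concept A
    ["depression", "anxiety", "mental health"],  # concept B
    ["treatment", "therapy", "intervention"],    # concept C
]
query = nested_boolean(concepts)
print(query)
```

The printed string has exactly the (A1 OR A2 OR A3) AND (B1 OR B2 OR B3) AND (C1 OR C2 OR C3) shape, ready to paste into a search box.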

We, like many libraries, have created videos on it as well.

If you are an academic librarian who has taught even a bit of information literacy, I am sure this is something you show in classes. You probably jazzed it up by including wildcards (such as teen*) as well.

Databases also encourage this search pattern

I am not sure how old this technique is, but around 2000 or so databases also started to encourage this type of structured search.

Above we see the EBSCOhost platform; in my institution this "Advanced search" is set as the default. You can see a similar UI (whether as default or advanced search) in JSTOR, Engineering Village, the ProQuest platforms etc.

A lecturer when I was in library school even claimed credit (perhaps jokingly) for encouraging databases into this type of interface.

Recently I noticed a slight variant on this theme where the default search would show only one search box (because "users like the Google one box" according to a webinar I attended), but if you clicked on "add field" or similar you would see a similar interface. Below shows Scopus.

After clicking Add search field, you get the familiar structured/guided search pattern.

You see a similar idea in the latest refresh of Web of Science: a default single search box, but with an option to expand it to a structured search pattern. Below we see Web of Science with "Add another field" selected twice.

Lastly, even Summon 2.0, which generally has a philosophy of keeping things simple, got into the act and, from what I understand under pressure from librarians, finally came up with an advanced search that brought tears of joy to power users.

But are such search patterns really necessary or useful?

In the first few years of my librarianship career, I taught such searches in classes without thinking much of it. 

It feels so logical, so elegant, it had to be a good thing right? Then I began studying and working on web scale discovery services, and the first doubts began to appear. I also started noticing when I did my own research I rarely even did such structured searches.

I also admit to being influenced by Dave Pattern's tweets and blog posts, but I doubt I will ever be as strongly in the anti-Boolean camp.

But I am going to throw caution to the wind and try to be controversial here: I believe that, increasingly, such a search pattern of stringing together synonyms of concepts generally does not improve the search results and can even hurt them.

There is of course value in the exercise of thinking through the concepts and figuring out the correct language used by scholars in your discipline. But most of the time doing so does not improve the search results much, especially if you are simply adding common variants of words (e.g. different variants of PREVENT or ECONOMIC), which is what I see many searches do.

That's because many of the search systems we commonly use are increasingly no longer well adapted to such searches, even though they used to be in the past.

Our search tools in the past

Think back to the days of the dawn of the library databases. They were characterized by the following

  1. Metadata (including subject terms) + abstract only - did not include full text
  2. Precise searching - a "what you enter is what you get" search
  3. Low levels of aggregation - a "large database" would maybe have 1 million items if you were lucky

In such conditions, most searches you ran had very few results. If you were unlucky you would have zero results.


Firstly, the search matched only over metadata + abstract and not full text. So if you searched for "Youth" and it just happened that the author had decided to use "Teenager" in the abstract and title, you were sunk.

Also this was compounded by the fact that in those days, searches were also very precise. There was no autostemming that automatically covered variants of words (including British vs American spelling), so you had to be careful to include all the variants such as plurals, and other related forms. 

Lastly, it is hard to imagine in the days of Google Scholar, with an estimated 100 million documents (and web scale discovery systems that could potentially match that), but in those days databases were much smaller and more fragmented, with much smaller indexes, so the most common result would be zero hits or at best a few dozen hits.

Summon full index showing over 100 million results

This is why the (A1 OR A2 OR A3) AND (B1 OR B2 OR B3) AND (C1 OR C2 OR C3) nested boolean technique was critical to ensure you expanded the extremely precise search to increase recall.
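
To make the recall argument concrete, here is a toy sketch of an "old school" system: exact word matching over short metadata records, with no stemming and no synonyms. The four records and the search function are invented purely for illustration.

```python
import re

# Invented title/abstract snippets standing in for metadata-only records.
records = {
    1: "Teenager attitudes towards smoking prevention programmes",
    2: "Youth smoking cessation: a survey",
    3: "Adolescents and tobacco use in rural schools",
    4: "Smoking among adults in urban areas",
}


def tokens(text):
    """Lowercase the text and split it into whole words."""
    return set(re.findall(r"[a-z]+", text.lower()))


def search(groups):
    """AND across groups, OR within each group, exact word match only."""
    hits = []
    for rec_id, text in records.items():
        words = tokens(text)
        if all(any(term in words for term in group) for group in groups):
            hits.append(rec_id)
    return hits


simple = search([["youth"], ["smoking"]])
nested = search([["youth", "teenager", "adolescents"], ["smoking", "tobacco"]])
print(simple)  # [2]
print(nested)  # [1, 2, 3]
```

With exact matching over so little text, the plain two-term search finds one record, while the nested version with synonyms finds all three relevant ones.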

Add the fact that search systems like Dialog charged per search or by time, so it was extremely important to craft a near-perfect search statement in one go to search efficiently.

I will also pause to note that relevancy ranking of results could be available but when you have so few results that you could reasonably look through say 100 or less, you would just scan all the results, so whether it was ranked by relevancy was moot really.

Today's search environment has changed

Fast forward to today.

Full-text databases are more common. In fact, to many of our users and younger librarians, "databases" would imply full-text databases, and they would look on in dismay when they realized they were using an abstracting and indexing database and wonder why in the world people would use something that might not give them the instant gratification of a full-text item. I fully understand that some old school librarians would consider this definition to be totally backwards, but...

Also, the fact that you are searching full text rather than just metadata changes a lot. If an article is about TEENAGERS, there are pretty good odds you will find TEENAGER, and probably YOUTH, ADOLESCENCE etc., in the full text of the book or article as well, so you probably do not need to add such synonyms to pick them up in the result set anyway.

Moreover, as I mentioned before, databases under the influence of Google are increasingly starting to be more "helpful" by autostemming by default and maybe even adding related synonyms, so there is no real need to add variants for, say, color vs colour or for plural forms anyway.

Even if you did a basic

A AND B AND C -  you would have a reasonable recall, thanks to autostemming, full text matching etc.

All this means you get a lot of results now even with a basic search.

Effect of full-text searching + relative size of index + related words

Don't believe this change in search tools makes a difference? Let's try EBSCO Discovery Service for a complicated Boolean search because, unlike Summon, it makes it easy to isolate the effect of each factor.


Let's try this search for finding studies for a systematic review

depression treatment placebo (Antidepressant OR "Monoamine Oxidase Inhibitors"  OR "Selective Serotonin Reuptake Inhibitors" OR "Tricyclic Drugs") ("general  practice" OR "primary care") (randomized OR randomised OR random OR trial)

Option 1 : Apply related words + Searched full text of articles - 51k results

Option 2 : Searched full text of articles ONLY -  50K results

Option 3 : Apply related words ONLY - 606 results

Option 4 : Both off - 594 results 

The effect of applying related words is slight in this search example, possibly because of the search terms used, but we can see full-text matches make a huge difference.
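
A quick back-of-the-envelope comparison of the four counts makes the gap explicit. Note that the two full-text figures are the rounded values reported above (51k and 50K), so the ratios are rough.

```python
# Result counts from the four options above (full-text figures are rounded).
both_on = 51_000
fulltext_only = 50_000
related_only = 606
both_off = 594

# How much each setting multiplies the "both off" baseline.
fulltext_factor = fulltext_only / both_off
related_factor = related_only / both_off

print(f"Full-text matching alone: about {fulltext_factor:.0f}x the baseline")
print(f"Related words alone: about {related_factor:.2f}x the baseline")
```

In other words, searching full text multiplies the result set by roughly 84, while related words add about 2%.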

Option 4 would be what you get from "old school databases". In fact, you would get fewer than 594 results in most databases, because EBSCO Discovery Service has a huge index, far larger than any such database.

To check, I did an equivalent search in one of the largest traditional abstracting and indexing databases, Scopus, and found 163 results (better than you would expect based on the relative sizes of Scopus vs EDS).

But 163 is still manageable if you want to scan all the results, so even if the relevancy ranking is poor it doesn't matter as much.

Web scale discovery services might give poor results with such searches 

I know many librarians will say that doing nested Boolean actually improves their search, and even if it doesn't, what's the harm?

First, I am not convinced that people who say nested Boolean improves the results of their search have actually done systematic, objective comparisons. It may simply be based on the impression that "I did something more complicated, so the results must be better". I could be wrong.

But we do know that many librarians and experienced users are saying the more they try to carry out complicated boolean searches the worse the results seem to be in discovery services such as Summon.

Matt Borg of Sheffield Hallam University wrote of his experience implementing Summon.

He found that his colleagues reported "their searches were producing odd and unexpected results."

"My colleagues and I had been using hyper stylised searches, throwing in all the boolean that we could muster. Once I began to move away from the expert approach and treated Summon as I thought our first year undergrads might use it, and spent more time refining my results, then the experience was much more meaningful." - Shoshin

I am going to bet that those "hyper stylised searches" were the nested boolean statements.

Notice that Summon, like Google Scholar, actually fits all 3 characteristics of a modern search I mentioned above that are least suited for such searches
  • Full text search
  • High levels of aggregation (typical libraries implementing Summon at mid-size universities would easily have 300 million entries)
  • Autostemming is on by default - quotes give a boost to results with exact matches.
All this combines to make complicated nested Boolean searches worse, I believe.

Poor choices of synonyms and overliberal use of wildcards can make things worse

I will be the first to say the proper use of keywords is the key to getting good results. So a list of drug names combined with OR, or a listing of philosophers, concepts etc. (an association of concepts) would possibly give good results.

The problem here is that most novice searchers don't know which keywords to list in the language of the field, so often, because they are told to list keywords, they may overstretch and add ones that make things worse.

Say you did

(A1 OR A2 OR A3) AND (B1 OR B2 OR B3) AND (C1 OR C2 OR C3)

Perhaps you added A3, B3, C3 though they aren't exactly what you are looking for but "just in case".

Or perhaps you decided it wouldn't hurt to be more liberal in the use of wildcards which led to matches of words you didn't intend. 

Or perhaps the keyword A3, B3, C3 might be used in a context that is less appropriate that you did not expect. Remember unlike typical databases, Summon is not discipline specific, so a keyword like "migration" could be used in different disciplines. 

Because web scale discovery searches through so much content, there is a high chance of getting A3 AND B3 AND C3 entries that are not really that relevant when used in combination.

Even if all the terms you chose were appropriate, the fact that they could be matched in full text could throw off the result.

If A2 AND B2 AND C2 all appeared in the full text in an "incidental" way, they would be a match as well. Hence creating even more noise.
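
One way to see how much room there is for incidental matches is to expand the nested statement into every AND combination it will accept. A minimal sketch using the placeholder terms from above:

```python
from itertools import product

groups = [
    ["A1", "A2", "A3"],
    ["B1", "B2", "B3"],
    ["C1", "C2", "C3"],
]

# Every way of picking one term per concept is a combination the query accepts.
combos = [" AND ".join(choice) for choice in product(*groups)]

print(len(combos))  # 27
print(combos[0])    # A1 AND B1 AND C1
print(combos[-1])   # A3 AND B3 AND C3
```

Three synonyms for each of three concepts already give 27 acceptable combinations, and adding one more "just in case" synonym per concept raises that to 4 x 4 x 4 = 64. Any one of them appearing anywhere in the full text counts as a match.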

And when you think about it, the problems I mention get even worse, as each of the keywords is autostemmed (which may lead to results you don't expect, depending on how aggressive the autostemming is), exploding the results.

My own personal experience with Summon 2.0 is that often the culprit is the match in full-text. Poorly chosen "synonyms" could often surface and even be pushed up.

The "explosion" issue is worsened by full text matches in books

In "Is Summon alone good enough for systematic reviews? Some thoughts", I looked at whether Summon could be used for systematic reviews. A very important paper pointed out that Google Scholar was a poor tool for systematic reviews because it lacks precision features: no wildcards, a limited query length, an inability to nest Boolean more than 1 level etc. I had speculated that Summon, lacking these issues, would be a better tool.

What happened when I actually tried was somewhat surprising to me.

When I ran the exact same search statement in both Google Scholar and Summon, the number of Summon results often exploded, showing more results than Google Scholar!

Please note that when I say "exact same search statement" I mean that precisely.

So for example, one of the searches done in Google Scholar to look for studies was

depression treatment placebo (Antidepressant OR "Monoamine Oxidase Inhibitors" 
OR "Selective Serotonin Reuptake Inhibitors" OR "Tricyclic Drugs") ("general 
practice" OR "primary care") (randomized OR randomised OR random OR trial)

Google Scholar found 17k results, while Summon (with add results beyond library collection to get the full index) shows 35K. 

Why does Summon have more than double the number of results?  

This was extremely unexpected, because we generally suspect Google Scholar has a larger index and is more liberal in interpreting search terms, as it may substitute terms with synonyms, while Summon at best includes variant forms of keywords (plurals, British/American spelling etc.).

But if you look at the content types of the 35k results, you get a clue.

A full 22k of the 35k results (62%) are books! If you remove those, then the number of results makes more sense.

Essentially, books that are indexed in full text have a high chance of being discovered since they contain many possible matches, and this gets worse the more ORs you pile on. Beyond a certain point they might overwhelm your results.

It is of course possible some of the 22k books matched can be very relevant, but it is likely a high percentage of them would be glancing hits and if you are unlucky, other factors might push them up high. 

I did not even attempt to use wildcards to "improve" the results, even though they could work in Summon. When I did that the number of results exploded even more.

As an aside, the HathiTrust people have an interesting series of posts on Practical Relevance Ranking for 11 Million Books, basically showing you can't rank books the same way you rank other materials due to the much greater length of a book.

The key to note is that you are no longer getting 50, 100 or even 200 results like in old traditional databases. You are getting thousands. So you can no longer look through all the results, you are totally at the mercy of the relevancy ranking...

The relevancy ranking is supposed to solve all this... and rank appropriately, but does it? Do you expect it to?

An extremely high recall but low precision (over all results), combined with a poor relevancy ranking, makes a broken search. Do you expect the relevancy ranking to handle the result sets that come from long strings of OR?

With so few users actually doing Boolean in web scale discovery (e.g. this library found that 0.07% of searches use OR), should you expect discovery vendors to actually tune for such searches?

Final thoughts

I am not going to say these types of searches are always useless in all situations, just that often they don't help, particularly in cases like Google, Google Scholar and web scale discovery.

Precise searching using Boolean operators has its place in the right database. Such databases would include PubMed, which is abstract only and allows power field searching, including a very precise MeSH system to exploit. The fact that medical searches, particularly systematic reviews, require comprehensiveness and control is another factor to consider.

I also think that if you want to do such searches, you should think really hard before adding one more OR or making liberal use of wildcards "just in case". With web scale discovery services searching full text and autostemming, a very poor choice will lead to an explosion of results, with combinations of keywords found that may not be what you expect.

A strategic use of keywords is the key here, though often for the novice searcher who doesn't know the area, he is as likely to come up with a keyword that might hurt as it might help initially. As such it is extremely important to stress the iterative nature of such searches, so as you figure out more of the subject headings etc you use them in your search.

Too often I find librarians like to give the impression they found the perfect search statement by magic on their first try, which intimidates users. 

I would also highly recommend doing field searches, or metadata only search options if available, if you try such searches and get weird results.

Systems like Ebsco discovery service give you the option to restrict searches to metadata only or not search in full text.

For Summon, if you expect a certain keyword to throw off the search a lot due to full-text matches, doing title/subject term/abstract etc only matches might overcome this.

Try for example


So what do you think? Do you agree that increasingly you find doing a basic search is enough? Or am I understating the value of a nested Boolean search? Are there studies showing they increase recall or precision?

Wednesday, June 11, 2014

8 surprising things I learnt about Google Scholar

Google Scholar is increasingly becoming a subject that an academic librarian cannot afford to be ignorant about.

Various surveys have shown usage of Google Scholar is rising among researchers, particularly beginning and intermediate level researchers.  Our own internal statistics such as link resolver statistics and views of Libguides on Google Scholar, tell a similar story. 

Of course, researchers including librarians have taken note of this and there is intense interest in details about Google Scholar.

I noticed for example in April....

More recently there was also the release of a Google Scholar Digest  that is well worth reading.

Sadly Google Scholar is something I've argued that libraries don't have any competitive advantage in, because we are not paying customers, so Google does not owe us any answers, so learning about it is mostly trial and error.

Recently, I've been fortunate to be able to encounter and study Google Scholar from different angles at work including

a) Work on discovery services - led me to study the differences and similarities of Google Scholar and Summon (also on systematic reviews). Also helping test and set up the link resolver for Google Scholar.

b) Work on the bibliometrics team - led me to study the strengths and weaknesses of Google Scholar and related services such as Google Citations and Google Scholar Metrics vs Web of Science/Scopus as a citation tool.

c) Most recently, I've been studying a little how entries in our institutional repositories are indexed and displayed in Google Scholar.

I would like to set out 8 points/features of Google Scholar that surprised me when I learnt about them. I hope they are things you find surprising or interesting as well.

1. Google does not include full text of articles but Google Scholar does

I always held the idea, without considering it too deeply, that Google had everything or mostly everything in Google Scholar but not vice versa.

In the broad sense this is correct, search any journal article by title and chances are you will see the same first entry going to the article on the publisher site in both Google and Google Scholar.

This also reflects the way we present Google Scholar to students. We imply that Google Scholar is a scope-limited version of Google, and we say that if you want to use a Google service, at least use Google Scholar, which, unlike Google, does not include non-scholarly entries like Wikipedia or blog posts.

All this is correct, except the main difference between Google Scholar and Google, is while both allow you to find articles if you search by title, only Google Scholar includes full-text in the index.

Why this happens is that the bots from Google Scholar are given the rights by publishers like Elsevier, Sage to index the full-text of paywalled articles on publisher owned platform domains, while Google bots can only get at whatever is public, basically title and abstracts. (I am unsure if the bots harvesting for Google and Google Scholar are actually different, but the final result is the same).

I suppose some of you will be thinking this is obvious, but it wasn't to me until librarians started to discuss a partnership Elsevier announced with Google in Nov 2013.

Initially I was confused: didn't Google already index scholarly articles? But reading it carefully, you see it talked about full text.

The FAQ states this explicitly.

A sidenote: this partnership apparently works such that if you opt in, your users using Google within your internal IP range will be able to do full-text article matches (within your subscription). We didn't turn it on here, so this is just speculation.

2. Google Scholar has a very comprehensive index, great recall but poor precision for systematic reviews.

I am not a medical librarian, but I have had some interest in systematic reviews because part of my portfolio includes Public Policy, which is starting to employ systematic reviews. Add my interest in discovery services, and I do have some interest in how discovery systems are similar to and different from Google.

In particular "Google Scholar as replacement for systematic literature searches: good relative recall and precision are not enough" was a very enlightening paper that drilled deep into the capabilities of Google Scholar.

Without going into great detail (you can also read "Is Summon alone good enough for systematic reviews? Some thoughts"), the paper points out that while the index of Google Scholar is generally good enough to include almost all the papers eventually found for systematic reviews (verified by searching for known titles), the lack of precision searching capabilities means one could never even find the papers in the first place when actually doing a systematic review.

This is further worsened by the fact that Google Scholar, like Google actually only shows a maximum of 1,000 results anyway so even if you were willing to spend hours on the task it would be a futile effort if the number of results shown are above 1,000.

Why lack of precision searching? See next point.

3. Google Scholar has 256 character limit, lacking truncation and nesting of search subexpressions for more than 1 level.

Again, "Google Scholar as replacement for systematic literature searches: good relative recall and precision are not enough" gets credit from me for the most detailed listing of the strengths and weaknesses of Google Scholar.

Some librarians seem to sell Google and Google Scholar short. Years ago, I heard of librarians who, in an effort to discourage Google use, told students Google doesn't do Boolean OR for example, which of course isn't the case.

Google and Google Scholar do an "implied AND", and of course you can always add "OR". As far as I can tell, the undocumented Around function doesn't work in Google Scholar though.

The main issue with Google Scholar that makes precision searching hard is

a) Lack of truncation
b) No way to turn off autostemming (Verbatim mode is available only in Google; I am not sure if the + operator works in Google Scholar, but it is deprecated in Google)

These are well known.

But I think it is less well known that there is a character limit of 256 for search queries in Google Scholar. Apparently, if you go beyond it, the extra terms are silently dropped without warning. Typical searches won't go beyond 256 characters, but ultra-precise systematic review queries might.
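
Since the extra terms are reportedly dropped without warning, it is worth checking the length yourself before pasting a long statement in. Here is a small sketch; the limit of 256 is the figure reported above rather than an officially documented constant, and modelling the truncation as a simple cut at that position is my own assumption. The final group of age-related terms is invented just to push the query over the limit.

```python
GS_LIMIT = 256  # reported limit, not an officially documented constant


def check_query_length(query, limit=GS_LIMIT):
    """Return (length, kept, overflow), assuming a plain cut at the limit."""
    return len(query), query[:limit], query[limit:]


query = (
    'depression treatment placebo (Antidepressant OR "Monoamine Oxidase Inhibitors" '
    'OR "Selective Serotonin Reuptake Inhibitors" OR "Tricyclic Drugs") '
    '("general practice" OR "primary care") '
    '(randomized OR randomised OR random OR trial)'
)
length, kept, overflow = check_query_length(query)
print(length, "characters; fits" if not overflow else "characters; too long")

# Hypothetical extra concept group that pushes the query past the limit.
long_query = query + ' (adult OR adults OR elderly OR "older people")'
long_length, long_kept, long_overflow = check_query_length(long_query)
print("Would be dropped:", repr(long_overflow))
```

The systematic review query used earlier in this blog fits, but only just: one more group of synonyms is enough to lose terms off the end.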

Another thing that is interesting to me is that the paper, I believe, states that nesting Boolean operators beyond one level will fail.
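
If you want to check a search statement against that restriction, counting how deeply the parentheses are nested is straightforward. A minimal sketch (the function is my own; it ignores parentheses that appear inside quoted phrases):

```python
def max_nesting_depth(query):
    """Return the deepest level of parentheses, ignoring quoted phrases."""
    depth = 0
    deepest = 0
    in_quotes = False
    for ch in query:
        if ch == '"':
            in_quotes = not in_quotes
        elif not in_quotes and ch == "(":
            depth += 1
            deepest = max(deepest, depth)
        elif not in_quotes and ch == ")":
            depth -= 1
            if depth < 0:
                raise ValueError("closing parenthesis without an opening one")
    if depth != 0:
        raise ValueError("unbalanced parentheses")
    return deepest


print(max_nesting_depth("(a OR b) AND (c OR d)"))  # 1
print(max_nesting_depth("((a OR b) AND c) OR d"))  # 2
```

The classic (A1 OR A2 OR A3) AND (B1 OR B2 OR B3) pattern has a depth of 1, so it stays within the one level the paper describes; anything returning 2 or more is where I would expect trouble.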

4. Google Scholar harvests at the article level, hence it is difficult for them to give coverage lists.

I think many people know that Google Scholar's index is constructed very differently from databases in that it crawls page by page, pulling in content it considers scholarly at the article level.

This meant that multiple versions of the same article could be pulled into Google Scholar and combined, so for example it could grab copies from

  • the main publisher site (eg Sage)
  • an aggregator site or citation only site
  • an institutional repository
  • even semi-legal copies on author homepage, Researchgate, etc
All these versions are auto-combined.

I knew this, but only fairly recently did it dawn on me that this is the reason why Google Scholar does not have a coverage list of publications with coverage dates.

Essentially they pull items at the article level, so there is no easy way to succinctly summarise their coverage at the journal title level.

Eg. Say there is a journal publisher that for whatever reason bars them from indexing, they could still have some articles with varying coverage and gaps by harvesting individual articles from institutional repositories that may have some of the articles from the journal.

Even if they had the rights to harvest all the content from say Wiley, the harvester might still miss out a few articles because of poor link structure etc.

So they would in theory have coverage that could be extremely spotty, with say an article or 2 in a certain issue, a full run for some years etc.

As a sidenote, I can't help but compare this to Summon's stance that they index at the journal title level rather than the database level, except Google Scholar indexes at an even lower level, the article level.

Of course, in theory Google Scholar could list the publisher domains that were indexed.

That said, I suspect based on some interviews by Anurag Acharya when asked this question, fundamentally Google doesn't even think the coverage data is useful to most searchers. I believe he notes, that even though databases with large indexes have sources listed, it still provides little guidance on what to use and most guides recommend just searching all of them anyway.

Other semi-interesting things include
  • Google Scholar tries to group different "manifestations" and all cites are to this group
  • Google Scholar uses automated parsers to try to figure out author and title, which may lead to issues of ghost authors, though this problem seems to be mostly resolved [paywall source]

5. You can't use Site:institutionalrepositoryurl in Google Scholar to estimate number of entries in your Institutional repository indexed in Google Scholar

Because of #4 , we knew it was unlikely everything in our institutional repository would be in Google Scholar.

I was naive and ignorant to think though one could estimate the amount indexed in our institutional repository in Google scholar by using the site operator.

I planned to do say Site: in Google Scholar and look at the number of results. That should work by returning all results from the site right?

Does Harvard's institutional repository only have 4,000+ results in Google Scholar?

Even leaving aside the weasel word "about", sadly it does not work as you might expect. The official help files state that this won't work and that the best way to see if there is an issue is to randomly sample entries from your institutional repository.

Why? Invisible institutional repositories: Addressing the low indexing ratios of IRs in Google Scholar has the answer.

First, we already know when there are multiple versions, Google Scholar will select a primary document to link to. That is usually the one at the publisher site. The remaining ones that are not primary will be under the "All X versions".

According to Invisible institutional repositories: Addressing the low indexing ratios of IRs in Google Scholar., the site operator will only show up articles where the copy in your institutional repository is the primary document (the one that the title links to in Google Scholar, rather than those under "All X versions")

Perhaps one of the reasons I was misled was that I was reading studies like this calculating and comparing Google Scholar indexing ratios.

These studies calculate a percentage based on the number of results found using the site: operator as a percentage of total entries in the institutional repository.
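
The calculation itself is a simple division. In the sketch below the two counts are hypothetical placeholders, not figures for any real repository.

```python
def indexing_ratio(site_hits, total_items):
    """Share of repository items that the site: operator returns."""
    if total_items <= 0:
        raise ValueError("total_items must be positive")
    return site_hits / total_items


# Hypothetical repository: 4,000 site: results out of 20,000 deposited items.
ratio = indexing_ratio(4_000, 20_000)
print(f"Indexing ratio: {ratio:.0%}")  # Indexing ratio: 20%
```

Keep in mind what the numerator actually counts when reading such a percentage.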

These studies are useful as a benchmark when studied across institutional repositories of course.

But I think, assuming site:institutionalrepository shows only primary documents, this also means the more unique content your institutional repository has (or if for some reason the main publisher copy isn't indexed or recognised as the main entry), the higher your Google Scholar indexing ratio will be.

Some institutional repositories contain tons of metadata without full text (obtained from Scopus etc.), and these will lower the Google Scholar indexing ratio, because they will typically be under "All X versions" and will be invisible to Site:institutionalrepositoryurl

Other interesting points/implications

  • Visibility of your institutional repository will be low if all you have is post/preprints of "normal" articles where publisher sites are indexed. 
  • If the item is not free on the main publisher site and you have the full text uploaded to your institutional repository, Google Scholar will show on the right a [PDF] from yourdomain

Seems to me this also implies most users from Google Scholar won't see your fancy Institutional repository features but will at best be sent to the full-text pdf directly, unless they bother to look under "All X versions"

  • If the item lacks an abstract Google Scholar can identify, it will tend to have a [Citation] tag. 

6. Google Scholar does not support OAI-PMH, and recommends Dublin Core tags (e.g., DC.title) only as a last resort.

"Google Scholar supports Highwire Press tags (e.g., citation_title), Eprints tags (e.g., eprints.title), BE Press tags (e.g., bepress_citation_title), and PRISM tags (e.g., prism.title). Use Dublin Core tags (e.g., DC.title) as a last resort - they work poorly for journal papers because Dublin Core doesn't have unambiguous fields for journal title, volume, issue, and page numbers." - right from horse's mouth

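To show what the preferred Highwire Press style looks like in a repository page, here is a small sketch that renders a record as meta tags. Only citation_title appears in the quote above; the other tag names are from my recollection of Google Scholar's inclusion guidelines and should be checked against them, and the record itself (title, authors, URL) is entirely made up.

```python
from html import escape


def highwire_meta_tags(fields):
    """Render (name, value) pairs as HTML meta tags, one per line."""
    lines = []
    for name, value in fields:
        # Escape the value so quotes or ampersands cannot break the attribute.
        lines.append(f'<meta name="{name}" content="{escape(value, quote=True)}">')
    return "\n".join(lines)


# Hypothetical record; repeat citation_author once per author.
record = [
    ("citation_title", "An example article title"),
    ("citation_author", "Doe, Jane"),
    ("citation_author", "Smith, John"),
    ("citation_publication_date", "2014/07/14"),
    ("citation_journal_title", "Journal of Examples"),
    ("citation_volume", "12"),
    ("citation_issue", "3"),
    ("citation_firstpage", "45"),
    ("citation_pdf_url", "https://repository.example.edu/1234/article.pdf"),
]
tags = highwire_meta_tags(record)
print(tags)
```

Note the separate tags for journal title, volume, issue and first page: these are exactly the fields the quote says Dublin Core cannot express unambiguously.
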
Also this

Other interesting points

  •  "New papers are normally added several times a week; however, updates of papers that are already included usually take 6-9 months. Updates of papers on very large websites may take several years, because to update a site, we need to recrawl it"
  • To be indexed you need to have the full text OR (bibliographic data AND abstract)
  • Files cannot be more than 5 MB, so books, dissertations should be uploaded to Google Books.
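
The last two rules in the list can be turned into a quick pre-check for repository items. This is only a sketch of the rules as stated above, not Google's actual logic; the function and parameter names are mine, and treating 5 MB as 5 x 1024 x 1024 bytes is an assumption.

```python
MAX_BYTES = 5 * 1024 * 1024  # assumes 1 MB = 1024 * 1024 bytes


def meets_basic_rules(has_full_text, has_metadata, has_abstract, file_size_bytes=None):
    """Apply 'full text OR (metadata AND abstract)' plus the 5 MB file limit."""
    content_ok = has_full_text or (has_metadata and has_abstract)
    # The size rule only applies when there is a file to check.
    size_ok = file_size_bytes is None or file_size_bytes <= MAX_BYTES
    return content_ok and size_ok


print(meets_basic_rules(True, True, True, 2_000_000))    # True
print(meets_basic_rules(False, True, False))             # False: no abstract
print(meets_basic_rules(True, True, True, 12_000_000))   # False: file too large
```

A metadata-only record without an abstract fails the first rule, and a scanned dissertation of 12 MB fails the second.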

7. Despite the meme going around that Google and especially Google Scholar (an even smaller, almost microscopic team within Google) do not respond to queries, they actually do respond, at least for certain types of queries.

We know the saying if you are not paying you are the product. Google and Google Scholar have a reputation for having poor customer service.

But here's the point I missed, when libraries put on their hats as institutional repository manager, their position with respect to Google is different and you can get responses. 

In particular, there is a person at Google, Darcy Dapra (Partner Manager, Google Scholar), who is tasked with outreach to library institutional repositories and publishers.

She has given talks to librarians managing institutional repositories as well as publishers in relation to indexing issues in Google Scholar.

In my admittedly limited experience, her responses to questions about the presence of institutional repository items in Google Scholar are amazingly fast.

8. Google Scholar Metrics - you can find H-index scores for items not ranked in the top 100 or 20.

Google Scholar Metrics, which ranks publications, is kinda comparable to the Journal Impact Factor or other journal-level metrics like SNIP, SJR, Eigenfactor etc.

The first time I looked at it, I saw that you could only pull out the top 100 ranked publications for each language other than English.

For English, at the main category and subcategories it will show the top 20 entries.

Top 20 ranked publications for Development Economics

I used to think that was all that was possible: if a publication was not in the top 20 of an English category or subcategory, you couldn't find a metric for it.

Apparently not, as you can search by journal title. Maybe obvious, but I totally missed this.

As far as I can tell, entries 7 & 8 above are not in the top 20 of any category or subcategory, yet each has an h5-index.
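
For readers unfamiliar with the metric: Google Scholar Metrics describes the h5-index as the h-index computed over articles a publication put out in the last five complete years. The sketch below shows the h-index calculation itself; the citation counts are made up for illustration and the function name is hypothetical.

```python
def h_index(citation_counts):
    """Largest h such that h articles have at least h citations each."""
    ranked = sorted(citation_counts, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank
        else:
            break
    return h


# Made-up citation counts for one journal's articles from the last five
# complete years; feeding only those articles in gives an h5-style figure.
recent_article_citations = [10, 8, 5, 4, 3]
print(h_index(recent_article_citations))
```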

How is this useful? Though we would frown on improper use of journal-level metrics, I have often encountered users who want some metric, any metric, for a certain journal (presumably one they published in), and they are not going to take "this isn't the way you use it for research assessment" for an answer.

When you exhaust all the usual suspects (JCR, SNIP, Scimago rank etc), you might want to give this a try.

 Other interesting points

  • There doesn't seem to be a fixed update schedule (e.g. yearly).
  • Some subject repositories like arXiv are ranked, though at more granular levels, e.g. arXiv Materials Science (cond-mat.mtrl-sci).
  • Suggested reading


Given this is Google Scholar, where people figure things out by trial and error and/or things are always in flux, it's almost certain I got some things wrong.

Do let me know any errors you spot or if you have additional points that might be enlightening. 

Thursday, May 22, 2014

Types of librarian expertise - are they getting easier to acquire for non-librarians?

I have been recently thinking of the types of expertise academic librarians have and how recent trends in academic librarianship have made things harder.

1. Domain (basically knowledge of the research area)
2. Systems (how to actually use the search interface)
3. Information seeking (more on the structure of information and how to construct searches, etc.)

The three types above are Christina's; she goes on to mention a 4th type, "interactional expertise". All of this is very interesting.

What I would like to consider is a) the difficulty of acquiring such expertise and b) the sources of such expertise: do academic librarians have a competitive advantage there?

To be precise, can someone who is not an academic librarian (here defined as someone who does not work in an academic library) easily acquire the same type of expertise academic librarians have, assuming time is no object?

Domain expertise

The importance of domain knowledge for academic librarians is a well-debated area (e.g. Should libraries hire PhDs rather than just MLIS-holding academic librarians?) and is, I suspect, the hardest expertise to acquire, as it is the most specific of the three.

Academic librarians definitely do not have any competitive advantage here and probably are not expected to. After all the typical academic librarian needs to have domain expertise in the area of library and information science as well as the discipline he is serving as a liaison, so it is not realistic to expect much.

Systems expertise

My understanding of "systems expertise" may differ from Christina's, but I am of the view that this type of expertise is probably the easiest to acquire and is getting easier with time.

Most databases (Scopus, Web of Science) come with extremely detailed help files. The typical academic librarian learns about them through trial and error and, as a last resort, by referring to the help files, same as any other user.

There really isn't any inherent difficulty for someone bright to sit down and read the help files on Boolean operators, proximity and field searching. The concepts may be a bit alien if you have never encountered them before, but once you have the basic concepts down it's really just a matter of "button hunting" on different platforms. (Or am I undervaluing this expertise?)

Arguably a lot of systems require "Systems expertise" simply because they are so poorly designed and not due to any inherent difficulty.  

Improving user interfaces make systems expertise easier to acquire

While user interfaces of typical library systems are still poor, they are improving all the time and as someone remarked on Twitter, Google (or search technology in general) is probably not going to get any worse, so I expect in time, it will be easier for someone who is not a librarian to acquire this expertise. 

One thing that occurs to me though is librarians generally do have a competitive advantage here because they typically have direct contact with the library vendors supplying databases, discovery services etc. A minority of academic librarians like myself have additional expertise in terms of troubleshooting discovery systems, link resolvers etc because we are the ones who decide on the settings that may affect results.

So for example, if I need information on a certain feature in Scopus that isn't covered in the documentation, my Elsevier rep is just an email away to answer my questions, as we are direct customers. (Note: that said, my experience is that you often get very obscure or, worse, outright wrong answers from the support staff of such companies, because the developers are usually shielded.)

This, it occurs to me, is where the typical competitive advantage enjoyed by librarians over our users lies. That and the fact that database vendors love to keep changing their interfaces and no-one but a librarian has the time to keep looking at them. :)

But our competitive advantage is diminishing 

The typical researcher doesn't know who to contact if they have a question on, say, text data mining. I would add, though, that in recent years this advantage has diminished, because there are signs that publishers and database vendors are trying to reach out and engage directly with users, not least by setting up social media accounts on Facebook and Twitter as well as offering other direct services to users.

An interesting question to consider is this: when JSTOR or some other popular database is down, do people complain on your library's social media channels or do they do it on, say, JSTOR's Facebook page?

Why should our users ask us when they have a specific question about JSTOR, when they can get their answers from the horse's mouth?

Users are increasingly using systems not under our direct control due to cloud services

Increasingly as libraries adopt cloud services such as LibGuides, Summon, some next generation platforms in the cloud, there is an interesting side effect that libraries are becoming even more of the middle man.

For example, take the library catalogue. In the past it would be something locally hosted, so if anything went wrong, only our library would be affected and only our library could fix it.

But now we use Summon as our main search, and when Summon is down, every Summon-using library is affected; this means ANU, Duke etc. Libraries that use Primo Central hosted in the cloud would be in the same boat.

In such cases, going directly to the Summon people would be far more effective than going through the library, because effectively we can't do anything anyway.

Users are increasingly using systems by parties we do not have privileged relationships with

Another thing to consider is that users are increasingly shifting to systems that are not provided by the library (even as an intermediary).

We are talking about Google, Google Scholar, Zotero, Mendeley (unless you get the institutional version maybe) etc.

Despite intense interest by researchers and librarians in Google Scholar (5 out of 10 of the hottest articles in the LIS field for April 2014 are on Google Scholar!), librarians just don't have any privileged access to Google.

Google is known for poor customer service, and  "if you are not paying, it means you are the product" also means libraries can't really demand answers, though I have found recently Google Scholar does respond to questions about entries from institutional repositories, where libraries serve as a source of information. In hindsight, this shouldn't be surprising as Google Search responds to webmasters as well but not users.

Academic librarians like myself who are tasked to be "knowledgeable" on Google Scholar, are reduced to reading up whatever literature exists by researchers who themselves spend time figuring things out by trial and error.

This may make you more knowledgeable than the typical person who has not read the literature, but anyone, even a non-librarian, who bothers to do that can achieve the same level of knowledge = you have no competitive advantage.

Information seeking expertise

Christina defines this as "more on the structure of information and how to construct searches, etc." It's a little vague and may blend in with systems expertise, but I presume this refers to general knowledge of the scholarly communication cycle and how it affects search.

This can be anything from knowing typical sources to knowing how fields and information are typically structured (e.g. controlled vocabulary).

In some areas, the typical researcher may actually have a competitive advantage over the librarian. After all, he actually does the research and knows what sources to use and what types of searches work.

It would be a very bold librarian indeed to suggest to a distinguished eminent professor that the librarian knows more about scholarly communication and information seeking in his area of expertise! 

It could be argued that academic librarians do have some advantage in that they are typically aware of new products first, though even this may not be as true anymore as mentioned above.

Other expertises

Of course searching is increasingly becoming a smaller part of the academic librarian's skillset. We are told we need to learn how to support all stages of the scholarly communication cycle such as
  • Supporting grant proposals
  • Research data management
  • Reference management
  • Bibliometrics 
to name just a few.

Perhaps this is a wise course of action, as searching is only going to get easier.

Of course, even if the typical librarian has no competitive advantage over the typical researcher, and the typical librarian expertise is becoming easier to acquire, it doesn't mean academic librarians are doomed.

I assumed that "time is no object", but of course it is!

First of all, beginning researchers would be much weaker in the 3 types of expertise mentioned, though I suspect most will eventually acquire them on their own even without librarian guidance.

Also, just because a distinguished eminent professor could devote time to learning the ins and outs of Google Scholar doesn't mean that he will.

They have their own fields of expertise after all.  

Add the synergistic effect of a bright academic librarian who has a unique blend of the 3 expertises... this is where our value lies....

Wednesday, May 7, 2014

Lazy Scholar - Interesting Chrome extension - a review and comparison with other library find full text options

Every librarian worth his salt knows that despite the rise of web scale discovery services, Google and Google Scholar are often the go-to tools of researchers.

In particular, while we prefer to direct our users to the official published version, we know that any free copy will often work in a pinch and Google Scholar in particular is the #1 tool out there to look for free copies floating on the web.

With the rise of Open access particularly Green Open access, with researchers depositing preprints into institutional repositories and discipline specific repositories (not to mention sites like ResearchGate, which may or may not be legal etc) such a strategy of searching Google Scholar for "free" copies is getting more and more important.

I recently discovered this Chrome plugin called Lazy Scholar that automates this step of searching for free copies via Google Scholar.

This plugin appears to have been created by a PhD student, Colby Vorland, and does not appear to originate from or be influenced by LibraryLand (thanks to Chris Bourg for drawing my attention to it via Twitter), so it is interesting to see how it stacks up against the Libx plugin, which "is a joint project of the University Libraries and the Department of Computer Science at Virginia Tech."

In libraryland, we have basically solved the issue of users searching Google Scholar and linking to full text via the Google Scholar Library Links program.

Google Scholar Library Links

But what happens if users just Google (or link via other means) and land up on the publisher page, or some other indexing service that has no full text like RePec, or PubMed?

Our solutions tend to be either adding the proxy (the proxy bookmarklet is the most popular way, though there are many others), or the more powerful Libx plugin, which among other things can leverage unique identifiers like DOI and PMID and use the library's link resolver to find the appropriate copy.
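
To make the "unique identifier plus link resolver" idea concrete, here is a rough sketch of turning a DOI into an OpenURL request. The resolver address is hypothetical (every library has its own base URL), the DOI is a placeholder, and the function name is mine; this only illustrates the general OpenURL 1.0 key/value pattern and is not a description of how Libx builds its links.

```python
from urllib.parse import urlencode

# Hypothetical link resolver base URL; substitute your own library's.
RESOLVER_BASE = "https://resolver.example.edu/openurl"


def openurl_for_doi(doi, resolver_base=RESOLVER_BASE):
    """Build an OpenURL 1.0 (Z39.88-2004) query that identifies an item by DOI."""
    params = {
        "url_ver": "Z39.88-2004",
        "rft_id": f"info:doi/{doi}",
    }
    return f"{resolver_base}?{urlencode(params)}"


# Placeholder DOI used purely for illustration.
print(openurl_for_doi("10.1000/xyz123"))
```

The point of going through the resolver rather than the publisher page is that the resolver decides which copy the institution actually has access to.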

Both these solutions focus on getting access to the official published copy, with searching Google Scholar for free copies as a secondary thought (though the link resolver might offer a link to search for a copy in Google Scholar as a fallback if it can't find a subscribed version).

Library's Link resolver provides a last ditch effort to search Google Scholar by article title, when there is no known full text via library sources

Lazy Scholar's approach is different. It attempts to locate the free copy first, though it does give you the option to add the proxy, similar to the proxy bookmarklet, as well as to leverage the link resolver by scraping library links in Google Scholar. Arguably, in many ways, we can see how this plugin reflects the mindset of a researcher. Why go through the library with complicated passwords when you can get the free copy first?

Basic functionality

By default, you need to click the Lazy Scholar button, and it will attempt to scan the page you are on for an article and it will display a notification if it detects an article where full text can be found (via Google Scholar).

What happens if Google Scholar can't locate free full text? Lazy Scholar will display up to 2 other options, described below.

You can always add your library proxy to the current page, but that basically duplicates stuff like the proxy bookmarklet.
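
For anyone who hasn't looked inside a proxy bookmarklet, "adding the proxy" usually just means prefixing the current address with the library's EZproxy login URL. A minimal sketch, assuming a hypothetical proxy hostname and a made-up article address (this is not Lazy Scholar's actual code):

```python
# Hypothetical EZproxy login prefix; each library has its own hostname.
PROXY_PREFIX = "https://ezproxy.example.edu/login?url="


def add_proxy(url, proxy_prefix=PROXY_PREFIX):
    """Prefix a page address with the proxy login URL, once only."""
    if url.startswith(proxy_prefix):
        return url  # already proxied, leave unchanged
    return proxy_prefix + url


print(add_proxy("https://www.example.org/article/1"))
```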

Of course, adding the proxy directly to the page often doesn't work, because either (1) the full text you have access to is hosted elsewhere rather than on the current page, or (2) the page has only the abstract and no full text itself (e.g. PubMed, RePEc, repositories that list metadata but no full text etc).

This is where the useful "I noticed you are signed on into institution login ....." link comes into play. Where does that link go to?

What happens is that Lazy Scholar checks the Google Scholar result not only for free full text but also to see if a library link to full text is available. This is the normal Findit@.... entry you find next to Google Scholar results if you have set up library links (see above).

This link will be scraped and added to the "I noticed you are signed on into institution login. Click here to go there" link.

Note: There is currently a known bug if you are using Google Scholar outside the United States. Typically when you use Google Scholar you will be redirected to a country-specific domain (in my case, one ending in .sg) rather than the main one. If you set up library links, they will be set on the country-specific version, and Lazy Scholar won't be able to detect them.

The workaround is to go directly to the main Google Scholar domain (notice the lack of the .sg) and set library links there as well.

You can also turn on auto-detect in the options (right click on LS icon and select options), though you have to set up permissions.

If these recommended settings are set, when you visit any page that it detects as an academic article, and if it detects free full text via Google Scholar, there will be a popup on the top right and clicking on it will send you there.

If no free paper is found on Google Scholar, you will see a different popup.

I could be wrong, but auto-detect doesn't seem to offer the "I noticed you are signed on into institution login. Click here to go there" link.

In any case while I haven't done a full test, Lazy Scholar seems capable of recognising titles from a wide variety of sites including but not limited to

  • Sciencedirect
  • RePec
  • Emerald
  • PubMed
  • ACM Digital Library
  • Taylor & Francis
  • Oxford Journals
  • Science AAAS

JSTOR doesn't seem to work at all and Wiley works only sometimes (there is a bug for some). I am unsure how detection works (Libx uses COinS).
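
For context on what COinS-based detection involves: a COinS record is an empty span with class "Z3988" whose title attribute carries OpenURL key/value pairs describing the item. The sketch below pulls those records out of a page. The class and function names and the sample HTML are hypothetical, and this says nothing about how Lazy Scholar itself detects articles.

```python
from html.parser import HTMLParser
from urllib.parse import parse_qs


class CoinsParser(HTMLParser):
    """Collect the OpenURL key/value pairs from COinS spans on a page."""

    def __init__(self):
        super().__init__()
        self.records = []

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        classes = (attr_map.get("class") or "").split()
        if tag == "span" and "Z3988" in classes and attr_map.get("title"):
            # The title attribute holds a URL-encoded query string.
            self.records.append(parse_qs(attr_map["title"]))


def extract_coins(html):
    parser = CoinsParser()
    parser.feed(html)
    return parser.records


# Made-up page fragment with one COinS record.
sample = (
    '<span class="Z3988" title="ctx_ver=Z39.88-2004'
    '&amp;rft_id=info%3Adoi%2F10.1000%2Fxyz123'
    '&amp;rft.atitle=A+sample+title"></span>'
)
for record in extract_coins(sample):
    print(record.get("rft.atitle"), record.get("rft_id"))
```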

In any case, Lazy Scholar's greatest benefit is when used on pages that do not host full-text themselves.

Just for fun, I tried it on our DSpace institutional repository, which currently consists mostly of metadata for articles published by our researchers. Lazy Scholar works beautifully (though where it links to is interesting).

My testing shows auto-detect can be a bit buggy: it may take a while to pop something up, and sometimes it is faster just to click on the button manually. Even manually clicking on the button will occasionally be slow.

Finding free full text via Google Scholar seems to be the main function, but there are other features.

Display times cited and altmetrics 

Lazy Scholar also tries to help you assess the quality of the paper.

It displays Google Scholar times cited and Web of Science times cited counts (the latter available if you are on campus at an institution that has Web of Science), scraped from Google Scholar.

Altmetric scores are also included. I am not sure how useful this is, but it was probably added because it was easy to do, though you do have to opt in via options.

This plugin is pretty cutting edge, with support for the new PubMed Commons, so you can see if there are any comments in this new system.

Citation/Reference Management

As the focus of Lazy Scholar is to make locating pdfs for download easy, it naturally has features like auto-renaming, ability to export references to EndNote, Google Scholar Library etc, or just copy and paste citations in APA, MLA etc.

Other ways to get papers....

What happens if the paper you want is not free and adding the ezproxy does not work?

Most researchers know that a workaround is simply to contact the author directly. Lazy Scholar makes this easy by trying to locate the author's email and displaying it for you.

Slightly more controversial, I think, from the librarian point of view is the support for #icanhazpdf.

In case you are unaware, this is the practice of tweeting requests for paywalled articles you can't access, using the hashtag #icanhazpdf (some analysis here), and of course there are debates on the morality and legality of this.

Some librarians would much prefer researchers come to us in the libraries (what about those who don't have access to academic libraries though?) so we can do document delivery or even consider subscription (if demand is high enough), but from the point of view of researchers this might be too slow.

Of course, this feature is optional, but it does reflect the mindset of researchers who just want access to what they need as fast as possible.

I guess if you are concerned about support for #icanhazpdf, the possible feature in testing mentioned in the blog post "PDF exchange: need beta testers" might concern you even more.


I find LazyScholar a useful plugin that I expect to use (though there are bugs such as slowness, and on one of my systems LazyScholar just refuses to run despite starting a brand new profile), but more than that I see its development as a fascinating insight into what researchers want.

LazyScholar definitely resonates with what researchers want: I retweeted this, and the next thing I knew, one of the researchers at our institution who follows me had picked it up and blogged about it.

Compared to the Libx plugin, a library-backed plugin, Lazy Scholar has less integration with library systems (besides inserting the proxy, which I suspect is independently discovered by researchers all the time without library input) and focuses on getting the full text by the fastest means available, not necessarily through the library.

Lazy Scholar searches for free copies first, and only on Google Scholar, a strategy that will pay off as time goes by. Their own statistics (though it is early days) show that about 1/5 of articles can be found that way. Only then does it try library access via EZproxy (not the link resolver).

Again this differs from I suspect most library specific efforts where searching Google Scholar for free copies is the last resort.

That said the scraping of library links from Google Scholar is outright brilliant.

Ideally, on any page with article metadata, you would somehow invoke the library link resolver and be sent to the appropriate copy or place where you have access.

Adding the proxy doesn't solve the appropriate copy problem so while it works most of the time, it is not the most accurate and may fail even if the library has access.

Both Lazy Scholar and Libx provide ways to activate the link resolver, but in different ways and to different degrees.

Libx supports COinS as well as autolinking of unique identifiers like DOI and ISSN, and basically leverages the link resolver, with no reliance on Google Scholar (the "magic button" function is the only thing that uses Google Scholar).

On the other hand, Lazy Scholar relies almost entirely on Google Scholar, pulling in free full text or going via library links (using the library link resolver).

This caters to users who Google (not Google Scholar) or otherwise managed to get to some page with article metadata and don't know where to get the full text. They can now use the link resolver (in a way) to get to full text.

The main weakness I can see is that if something isn't covered in the Google Scholar index, Lazy Scholar can't do anything beyond adding the proxy. This is of less concern than you might think, because Google Scholar has one of the largest, if not the largest, indexes of scholarly material, so almost everything you come across in other sources would probably be indexed in it.

It is probably the librarian in me talking, but LibX still feels better to me since it handles books etc. It will be interesting to see if Libx can incorporate this particular feature, though philosophically you can see the difference between the two.

That said, if you are not affiliated with any academic library, Lazy Scholar is by far superior as it automatically gives you free full text.

Monday, April 21, 2014

More heretical thoughts on library trends - Altmetrics, discovery and spaces

Almost 4 years ago in 2010, I posted A few heretical thoughts about library tech trends.

I was sceptical about QRcodes, Mobile websites and SMS reference. How did I do?

It's a bit difficult to gauge whether a given technology was absorbed into or rejected by the industry. Certainly you can't go by the number of presentations at conferences, since a lack of mentions could mean either that the idea died out or that the idea is so commonplace that it is not considered worth talking about.

Nevertheless, I will try to score myself.

I think I pretty much got QRcodes right - it seems successful only in a limited number of library use cases such as scavenger hunts and novelty video walls like the one in NCSU.

NCSU Listen to Wikipedia video wall with QRCode is heavily used

I feel the general concept is sound, linking physical location to online locations but the technology, the supporting infrastructure, alignment of the stars? is still lacking.

With regard to mobile library websites, I was mostly wrong. Though mobile still accounts for a relatively small amount of traffic at my institution, it is undoubtedly important, particularly at public libraries, though the current trend seems to be shifting towards "mobile responsive" rather than pure mobile sites.

With regard to SMS reference, I am unsure how it stands. The launch of Springshare's LibAnswers platform with SMS reference has probably made it easier for libraries to provide this service, and usage has probably increased, but whether it has caught on to become a staple the way chat reference has in US academic libraries is unclear to me. I give myself a half-right score.

In all, for 3 predictions, I score myself between 33% and 50%.

More predictions

So let me go on and try to gaze into the crystal ball and predict the fates of 3 more library trends...

Focus on Discovery vs Delivery - Delivery will win 

I was always a bit conflicted when it came to web scale discovery services, which are basically attempts by academic libraries to compete with Google and Google Scholar by creating "one-search" systems that mimic the functioning of commercial web search engines.

Even back in 2012, just before the final launch of the web scale discovery service at my institution, I blogged "Playing devil's advocate. Why you shouldn't implement a web scale discovery service.", though I ultimately came down on the side of implementing web scale discovery.

The essential argument here was first made strongly by Utrecht University in Thinking the Unthinkable: A Library without a Catalogue.

The argument goes as follows: Google has won the discovery wars, so libraries should not try to fight a war that has long been lost and should instead focus on making delivery of content easier from Google, Google Scholar or wherever our users are searching from.

More poetically I wrote in "The day library discovery died - 2035"

"While most academic libraries in the 2010s bought into the discovery meme, a few others saw the writing on the wall sooner and their rallying cry was "delivery not discovery" and decided to opt out of the discovery wars which they saw (correctly as history will record) libraries had already lost and ceded discovery mechanics to outside the library but focused on delivering content once discovery occurred elsewhere.

The majority of commenters branded such moves as "Defeatist" , deriding such moves as turning the library back to a warehouse (albeit a digital one with libraries focusing on digitizing their special collections and having in available everywhere).

In the end though the final seeds of the defeat of the library discovery movement was sowed by the librarians themselves in a completely different direction. By 2030, to their immense surprise, Open Access became the norm (the story of how this came to past is too long and complicated to detail here), hitting over 75% of all published literature on average and close to 100% in areas like life sciences.

This all but annihilated the reason discovery systems existed - aka the need to track which of the articles your institution had access to. "

While some commenters on this tongue-in-cheek blog post felt I was far too conservative in setting such a far-off date as 2035, the idea of eventually giving up on search is currently still a minority position.

So holding this position is, I believe, still a minority and hence heretical view, as can be seen from the reaction to the presentation given by Utrecht University last week at #UKSGLive.

Recently Lorcan Dempsey wrote about two opposing forces or trends in library strategy: one involves centralization of services and resources around a given central network presence, aka the library website; the other involves decentralization, typically citing the meme "embed the library into the researcher workflow".

He names them the centripetal and centrifugal trends respectively (Lorcan has a history of coining terms that eventually stick, though I'm not sure it works here).

Clearly this desire to set up "Full library discovery" is at odds with supporting delivery rather than discovery, and it may be unclear as yet which trend will win out.

He acknowledges that while the decentralization trend has attracted a lot of interest, it is still an "emergent interest". I agree, and would argue, based on my knowledge of typical institutional incentives, that in general the centralization trend will be stronger, all things being equal.

But all things are not equal, and I expect more and more libraries to eventually move towards the decentralization side of things.

Altmetrics (certain portions relating solely to social media shares) will mostly be forgotten

A while back a colleague was referring to altmetrics and remarked offhandedly, "This is an area where Aaron is the expert".

I was kinda mystified why this was said, since I didn't have any real expertise in this area (beyond the usual surface professional reading of areas affecting academic librarianship).

Then it dawned on me that most probably, the colleague was under the mistaken impression that just because one is familiar with social media this will automatically grant familiarity with Altmetrics.

Even if familiarity with social media granted an automatic understanding of the field of bibliometrics (trust me, it doesn't; I acquired some limited degree of understanding by actually working on it in my day-to-day work), the problem of course is that altmetrics, while commonly associated with social media mentions such as tweets, also include other indicators that have nothing much to do with social media and arguably are even more important.

Altmetrics is simply short for ALTERNATIVE metrics, so it can be pretty much anything except traditional bibliometric citations, though for some reason social media mentions, e.g. tweets and likes on Facebook, seem to be the ones most associated with it.

There are many ways to classify altmetrics. Plum Analytics, a platform which currently seems to have the largest number of metrics, classes them into

  • Usage - Downloads, views, book holdings, ILL, document delivery
  • Captures - Favorites, bookmarks, saves, readers, groups, watchers
  • Mentions - blog posts, news stories, Wikipedia articles, comments, reviews
  • Social media - Tweets, +1's, likes, shares, ratings
  • Citations - PubMed, Scopus, patents

Early research on altmetrics has focused mostly on seeing (1) whether any of these altmetrics correlate with future citations and (2) whether they are distinct types of metrics showing different types of impact (typically using statistical methods like principal component analysis).

I have read through quite a few papers on altmetrics and scanned through so many over the last 2 years that they all start to blur in my mind.

But the overwhelming impression I get is that social media type metrics like tweets and shares are currently pretty much useless for predicting citations. Worse yet, because the vast majority of articles are not tweeted or otherwise spread via social media, trying to use such metrics even to focus attention on what to read tends to be hit or miss. Even if tweets did a good job of directing your attention to new, fast-breaking research you should read (a shaky assumption), they would have low recall, and you would be just as likely to miss other important work.

Here's one sample conclusion. Essentially, not enough top 1% cited papers (fairly recent ones, after the rise of social media) have altmetrics, so altmetrics are outperformed on recall/precision by journal citation scores.

If that is the case, you might as well just glance at articles in top journals. 
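
To spell out what recall and precision mean here: treat "has a high altmetric score" as a filter for picking papers, and "ends up in the top 1% cited" as the target. The sketch below uses entirely made-up paper IDs to show the calculation; it does not reproduce any study's numbers.

```python
def precision_recall(flagged, relevant):
    """Precision: share of flagged papers that are relevant.
    Recall: share of relevant papers that were flagged."""
    flagged, relevant = set(flagged), set(relevant)
    hits = flagged & relevant
    precision = len(hits) / len(flagged) if flagged else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Made-up example: four papers turn out to be highly cited, but only one
# of the two papers flagged by tweets is among them.
highly_cited = {"p1", "p2", "p3", "p4"}
flagged_by_tweets = {"p1", "p9"}
precision, recall = precision_recall(flagged_by_tweets, highly_cited)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Low recall is the "most highly cited papers were never tweeted" problem; low precision is the "most tweeted papers are not highly cited" problem.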

That said, some research so far seems to suggest that other types of metrics, particularly ones relating to Mendeley readership figures, do correlate moderately with traditional bibliometric citations. This makes sense when you think about it.
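
As an aside on how such a correlation might be measured: a rank correlation such as Spearman's is one common choice for skewed count data like citations and readership, though that choice is an assumption here and not something taken from a specific study. A dependency-free sketch with made-up numbers:

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # average of 1-based positions i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    rx, ry = average_ranks(xs), average_ranks(ys)
    mean_x, mean_y = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one variable is constant")
    return cov / sqrt(var_x * var_y)


# Made-up readership and citation counts for six papers.
readers = [12, 40, 3, 25, 8, 60]
citations = [5, 10, 9, 30, 2, 22]
print(round(spearman(readers, citations), 2))
```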

Even so, I am not quite sure I am on the same side as "Why you should ignore altmetrics and other bibliometric nightmares".

It makes a pretty devastating case against altmetrics, though it probably makes the same points made by formal research, except with more anecdotal evidence.

Still, at times the author seems to be making a case against bibliometrics as a whole rather than merely altmetrics (though altmetrics irk him more than traditional metrics). He pretty much belongs to the "read and understand the paper to judge its worth" school of thought, so I guess any metric will be useless in his mind.

"Altmetrics are numbers generated by people who don’t understand research, for people who don’t understand research. People who read papers and understand research just don’t need them and should shun them."

I would add that altmetrics is a vast and evolving field (see Top 5 altmetrics trends to watch in 2014) with many confusing terms, such as altmetrics vs article-level metrics (ALM), so I don't pretend to understand everything about it.

It is also associated with what I feel are many good ideas, such as tracking research outputs or artifacts other than research papers, including datasets, code, videos and figures (figshare etc).

I wouldn't say all altmetrics will be pointless, but I am going to stick my neck out and say that those relating to social media mentions, particularly tweets, will be of little practical use (even setting aside predictions of future citation success). It is possible they may measure impact in the sense of what the public is interested in (read: tickled or amused by), but that is of limited use.

I expect that in, say, 3-5 years' time, most altmetrics platforms like Impactstory and Plum Analytics will drop measures like tweets or Google+ shares, or, if such measures are still included, they will be increasingly ignored.

Also, as noted in the Q&A, social media platforms like Twitter can have fleeting life spans. Are we so sure that 10, 20, or even 50 years from now we will all still be using Twitter or even Facebook?

While it's true that tweets might measure another type of impact, my suspicion is that in the long run (5-10 years) researchers will not care, as incentives do not reward interest in these areas.

3D printers, collaborative spaces and a truly heretical thought

Though 3D printers are starting to appear in the collaborative spaces, knowledge commons etc of academic libraries, my shot-in-the-dark prediction is that they will mostly be a fad.

In the long run, they will continue to be located in labs and not in the library. One argument is that 3D printers belong in a library to promote collaboration, but that seems to me to be a weak argument, since you can pretty much use it to justify including practically any piece of equipment you can find in a lab, or anywhere for that matter. Is this mission creep?

I am not the only one who thinks so. But maker spaces are really a minor issue.

The key point is that collaborative spaces and learning/digital commons are academic libraries' (and public libraries') attempt to hold on to space we no longer need now that more of our items are available online. Will we succeed?

Right now the paradigm model held out as the "academic library of the future" is NCSU's James B. Hunt Jr. Library. Others, like the University of Technology, Sydney (UTS) in Australia, have similar models. We are talking about large hi-tech libraries where the space allocated to print books is limited (or even non-existent), and users instead use a high-tech "library retrieval system (LRS)" to browse and select books that are automatically brought from an off-site repository to the user on demand.

Will this catch on, and will most university administrators be willing to sign off on such undoubtedly costly academic libraries?

Or will the libraries of the future be lean and mean? If we started an academic library today from scratch, with no preconceptions, would we insist on occupying so much space? Could it simply be that, because we started with so much space assigned to us for our print collection, we continue to find excuses to hold on to what we are used to now that we no longer need it (as much)?


I won't be surprised if all of the 3 thoughts above turn out to be totally wrong, though some are fairly modest.

What do you think? What current trends do you see that you suspect are mere fads or blind alleys in today's environment?

Creative Commons License
This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.