Friday, January 27, 2012

Open Access, Curation and Serendipity

The issue of open access resurfaced this past week in the chemistry blogosphere. Rich Apodaca, author of the Depth-First blog, caused a stir when he came out in support of the Research Works Act. As currently written, this legislation would prevent the NIH from requiring that publications of research supported by NIH grants be placed in PubMed Central, where they are freely accessible to anyone. (The bill, of course, is written more broadly than that, but that would be its most obvious and immediate impact.)

But all this is prologue.
  • Rich made one argument that I strongly disagree with: that the imprimatur a journal provides is what remains valuable and needed.
    "Any scientist who has been an active participant in scientific publication as an author, reviewer, and consumer recognizes that the only remaining value added by scientific publishers today is imprimatur. Imprimatur is the implied endorsement received by authors who publish in certain scientific journals, particularly in those that earned a high level of prestige during the pre-digital period of publication scarcity."
    I disagree. The value of a "prestigious" journal is not the prestige. It is something far more valuable, something we are all implicitly aware of, something that journal editors provide and that gives the journal its prestige: curation, deciding what is and isn't important. It is this step that provides the ultimate value of a journal. Without appropriate curation (lots more on that in a minute), the journal becomes a meaningless pile of data. Because it is peer-reviewed, that data is accurate (more or less), but it comes without guideposts.

    As I said, we are all aware of this already, just not explicitly. When a prestigious journal brags about how few papers it accepts, what it is really bragging about is how much it curates. Its editors are able to make great decisions about what their readers want to see and what they don't. Effective curation over time leads to imprimatur. Without effective curation, a journal has papers that may or may not be of great value, but a landmark article in the Upper Midwest Journal of Photochemical Interactions in Northern Blots will never give that journal any prestige.

    If I were running a journal that wanted to convince people to pay for it in this age when "information just wants to be free" is the mantra, I would sell curation hard, very hard. All the other services that journals used to supply are now easily duplicated by the disruptive technology available. But curation, and the ability to perform it well, is something valuable that still remains. It is what distinguishes Angewandte Chemie from Tetrahedron Letters. It needs to be recognized, and it is worth paying for.
  • Let me go back to the subject of curation in a broader context.

    As long as there has been mass media, there has been curation. It has never been possible for a mass media source to publish or broadcast everything, so choices were made as to what was and wasn't going to be put out for consumption. In the past, those choices were made by a small group of people. If you were unhappy with their choices, there was little you could do. With the internet, curation initially appeared to have died, as everyone could have access to everything. But even with all those choices, mainstream media sites remain popular destinations because of their ability to curate effectively. As with very selective journals, when the New York Times makes the statement "All the News That's Fit to Print", it is referring to its curation abilities.

    Curation will never disappear. I would suggest that there is a basic human need for it. If it is lacking, someone, somewhere will create it. It is also fairly obvious that it is needed now more than ever. The information that we have access (or potential access) to is greater than ever before, and it grows with each passing day. Some of the most visited websites (the Drudge Report is a terrific example) are nothing more than curation sites. Matt Drudge decides what to link to on his page and provides no explicit editorial comment other than the headline for each link, and yet his site gets millions of hits per day because he has a great sense of what people want to see. If he ever loses this sense, his page will lose its importance.

    Curation has also started appearing in our search engines. This is certainly a new concept, one without precedent in our history. Early search engines on the Internet returned a haystack of results with little effort to prioritize them; the searcher was expected to separate the good from the bad. Google became a dominant player because it was able to prioritize webpages, deciding what was more important. But it didn't stop there. It is becoming more widely known that Google will further alter search results based on information it has gathered about you from past searches. A recent example of this was seen in the engineering subreddit, where an engineer posted a screenshot of what Google image search showed for "pump": quite a few high-heeled shoes, more than he would have liked.

    Here's what an identical search for me produced.
    A lot fewer shoes. As you can see, Google uses all the technical searches that I perform at work to slant the search results toward what I am most likely interested in. (I imagine that the reddit engineer was searching on a computer often used by his girlfriend.)

    I've noticed over the years that my Google searches have become increasingly productive. I used to think that it was because I was becoming a better searcher, choosing better search terms, but now I suspect that Google is simply doing a better job of providing the results. Regardless, Google's effort to provide relevant search results produces a rather desirable outcome, in most situations. When I'm looking for specific information and don't have a lot of time, not having to scroll down the page at all is invaluable.
  • And yet, there are times when curation is the last thing that we need. With ever more targeted search results, it becomes increasingly difficult to stumble upon something new, something that you find interesting, something that you didn't know you were looking for. I'm old enough to remember when libraries had card catalogs: collections of cards in drawers that could be searched to find a book in the library. Searches were typically possible by title, author and subject. The unintended beauty of the catalog was that you could be assured that the first card you looked at was not the card you wanted, so you started flipping through more cards until you found the one you wanted. And sometimes during this search, you would find something that you wanted without knowing that you wanted it: serendipity! Further, once you found the call number for the book, you had to wander the stacks to find it, which could lead to further unexpected results, although that was less likely, since books were generally arranged by subject. (Ethan Zuckerman notes that Harvard is considering enforcing serendipity by reorganizing all the books by size!)

    Now when we find books via computer search (either from libraries or Amazon), serendipitous outcomes are pretty much impossible. Even if you misspell a word, the search algorithms will overcome that and give you the "intended" result. Can you have serendipitous search results with Google? A decade ago, when its algorithms weren't so refined, you could have. Now you can't. There is no "random search" option, no "show me something new" option, no "tickle my brain" option. And this is not to pick on Google alone. Twitter will only show you what you have indicated that you want to see. The same goes for Facebook and countless other sites.

    Serendipitous results can be found, but ironically, you have to work at it. Reddit, Digg, StumbleUpon and similar sites are places where people submit links that they think are interesting. Their main pages show the top results, so you will certainly see something new, although it will be whatever is most popular, a description that makes me wonder about humanity's fate at times.

    Wikipedia is another place where it is relatively easy to get off the beaten path if you start following the links in an article or the "see also" links. And serendipity is why I enjoy the journals Nature and Science - they have research from all fields of science, not just the polymers and rheology that I specialize in. (I do note the irony that these journals are also very well curated. More journals of such broad coverage in the future would be another route to enforce serendipity!)
Curation and serendipity are opposite sides of the same coin, but paradoxically, we need both. Curation seems to have the upper hand at present, and that trend will continue for the foreseeable future, while serendipity is being pushed to the side, something that we have to struggle to keep alive.

Finally, getting back to the issue of open access that this post started with: curation is still the last valuable service performed by journal publishers that cannot be replaced by technology, at present. Can algorithms be developed that can perform the curation for us? I wouldn't ever bet against that. In fact, I would expect to see it in my lifetime. But the real question is this: do we want that? (Sadly, I think the question might be better stated as: how can we prevent that?)


As a last comment, Ethan Zuckerman's blog entry, Desperately Seeking Serendipity, is a wonderful read about serendipity in cities and other physical spaces. It is a long article, but well worth the time. Mind-opening.