Data from CiteULike’s new article recommender
CiteULike’s automated recommender system, as described in Toine’s excellent post, has been running live for 6 weeks now. We have some early data to share.
When users look at their recommendations list, they have the option of accepting (and thereby copying the recommended article into their own library) or rejecting (and clearing the article to make way for a new recommendation).
Here is a breakdown of accepted recommendations:

and the rejected ones:

The totals are: 9930 rejected, 2323 accepted.
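For reference, the acceptance rate implied by those totals can be worked out in a few lines. This is just the arithmetic on the two figures above; the function name `acceptance_rate` is ours, for illustration.

```python
# Totals quoted in the post.
rejected = 9930
accepted = 2323


def acceptance_rate(accepted: int, rejected: int) -> float:
    """Fraction of acted-on recommendations that were accepted."""
    total = accepted + rejected
    if total == 0:
        raise ValueError("no recommendations were accepted or rejected")
    return accepted / total


rate = acceptance_rate(accepted, rejected)
print(f"{accepted + rejected} decisions, {rate:.1%} accepted")
# prints "12253 decisions, 19.0% accepted"
```

Note that this only counts recommendations users acted on; recommendations that were shown but neither accepted nor rejected are not in these totals.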
We are pretty excited by that percentage: roughly 19% of the recommendations that users acted on were accepted. It suggests that recommendations based on an algorithm working on item co-occurrence in CiteULike users’ libraries work pretty well in a real-world setting. It is clearly helping people discover articles they were unaware of.
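To make "item co-occurrence" concrete, here is a minimal sketch of the general idea: two articles are related if they often appear together in the same users' libraries, and a user is recommended the articles most related to what they already hold. This is a toy illustration under our own assumptions, not CiteULike's actual implementation; the function names, the scoring (a plain sum of co-occurrence counts) and the sample data are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def build_cooccurrence(libraries):
    """Count how often each pair of articles shares a library."""
    cooc = {}
    for articles in libraries.values():
        # sorted(set(...)) removes duplicates and gives a stable pair order
        for a, b in combinations(sorted(set(articles)), 2):
            cooc.setdefault(a, Counter())[b] += 1
            cooc.setdefault(b, Counter())[a] += 1
    return cooc


def recommend(user, libraries, cooc, top_n=5):
    """Rank articles the user lacks by summed co-occurrence with their own."""
    owned = set(libraries[user])
    scores = Counter()
    for article in owned:
        for other, count in cooc.get(article, {}).items():
            if other not in owned:  # never recommend what they already have
                scores[other] += count
    # Highest score first; ties broken alphabetically for determinism
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [article for article, _ in ranked[:top_n]]


# Hypothetical sample data: user -> list of article ids
libraries = {
    "alice": ["p1", "p2", "p3"],
    "bob": ["p1", "p2", "p4"],
    "carol": ["p2", "p4", "p5"],
    "dave": ["p1"],
}
cooc = build_cooccurrence(libraries)
print(recommend("dave", libraries, cooc))   # ['p2', 'p3', 'p4']
print(recommend("alice", libraries, cooc))  # ['p4', 'p5']
```

A real system would also need to normalise for very popular articles, which co-occur with almost everything, and to exclude articles the user has already rejected.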
There are a huge number of refinements we can make to this, and we are busy doing just that. For example, we have not begun to look at the 6m+ tags as part of the system yet.
More than anything, I believe that it shows the quality of the CiteULike dataset for a task like this. (We already make the whole CiteULike bookmarking dataset available for download; there are dozens of independent research projects using the data in institutions around the world right now.)
Of course, the automated recommender system is only a few weeks old. CiteULike’s social functions have been enabling users to share and discover articles from day one.
We count the end result of the social discovery by the number of articles users have copied from each other’s libraries. That happened 99,159 times in the last year, 8827 times in the last 30 days.
That is almost certainly an underestimate: it does not count the times people find an article, view it on the publisher’s site, and post from there.
It’s really gratifying to see the social discovery of science generated by the simple act of keeping your references public on a Web page.
The recommender looks nice. I haven’t yet sat down with it to see how many suggestions I would accept, but I’ve noticed at a glance that it is recommending some papers I already have.
For example, it’s using the Citeseer ref where I use the ACM ref. I prefer to use ACM over Citeseer, because it provides more accurate information, and I don’t want to have duplicate refs in my library. So it may be that I reject a number of suggestions for this or similar reasons.
I don’t see any suggestions that I’d reject just because they’re not interesting though. Currently, you can’t distinguish between the reasons for my rejections, can you? I’m wondering if the feedback you’re getting could be rather misleading due to problems like these.
It’s a nice idea, but as implemented it recommends _way_ too many articles (IME). I easily get 100 new every refresh, and keep maybe 1 (on a good day). A way to make it considerably more selective would be very nice.
And does it re-recommend rejected articles if they are re-citeulike’d? I’m sure I keep seeing some of the same articles again.
And an “undo” button would be nice — since I reject almost everything at quite a speed, sometimes I regret my decision.
And (more trivial, this one) please make the acc/rej buttons slightly taller, so that whether a recommendation wraps or not does not shift the buttons from under my mouse cursor after I click & the list refreshes.
@Paul
The recommendations are ordered by “most relevant” so just scan down as far as you’re bothered – we think better too many than too few.
In general, you shouldn’t see the same articles again once you’ve accepted or rejected them. Occasionally the same “article” appears twice as two different CiteULike articles but I think this is relatively rare.
@Sean
Thanks very much for your comments.
It shouldn’t recommend papers that you already have in your library. I’m guessing that what is happening is due to the fact that CiteULike does not always recognize that two papers from different sources are the same. It’s pretty good at this, but it’s not always possible.
In terms of the papers you reject, at the moment we are not using this information to influence the new recommendations you get.
I take your point about distinguishing between the different types of reasons for rejection; however, we decided not to do this right now.
Could you provide the source data that you are summarizing? Perhaps others could find and report interesting trends in the data.
Tim
Available datasets are at http://www.citeulike.org/faq/data.adp
Currently that does not include the recommendation acceptance/rejection data, but you get all the who posted (anonymized), what, when and tags.
is there a way to eliminate/merge duplicate articles? i am also getting CUL recommending an article that’s already in my library, simply because the system somehow treated the same article submitted by 2 different people as different articles.
some sort of “merge” or “identify with another article” functionality would be nice. for now i guess i’ll just add the particular one i’m looking at (“Polychronization” if you want to take a look) again and mark it as “duplicate”