Data from CiteULike’s new article recommender
CiteULike’s automated recommender system, as described in Toine’s excellent post, has been running live for 6 weeks now. We have some early data to share.
When users look at their recommendations list, they have the option of accepting (and thereby copying the recommended article into their own library) or rejecting (and clearing the article to make way for a new recommendation).
Here is a breakdown of accepted recommendations:
and the rejected ones:
The totals are: 9930 rejected, 2323 accepted.
We are pretty excited by that percentage: roughly 19% of the recommendations users acted on (2323 of 12253) were accepted. It suggests that recommending articles with an algorithm based on item co-occurrence in CiteULike users’ libraries works pretty well in a real-world setting. It is clearly helping people discover articles they were unaware of.
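To give a feel for what "item co-occurrence" means, here is a minimal sketch in Python. It is not CiteULike's actual implementation; the toy libraries, the function names (`build_cooccurrence`, `recommend`) and the simple count-based scoring are all assumptions made for illustration. The idea is just that two articles are related if they often sit in the same user's library, and a user is recommended the articles most related to what they already have.

```python
from collections import Counter
from itertools import combinations


def build_cooccurrence(libraries):
    """Count, for every article, how often each other article shares a library with it."""
    cooc = {}
    for articles in libraries.values():
        # Each unordered pair in one library counts once, in both directions.
        for a, b in combinations(sorted(set(articles)), 2):
            cooc.setdefault(a, Counter())[b] += 1
            cooc.setdefault(b, Counter())[a] += 1
    return cooc


def recommend(user, libraries, cooc, n=5):
    """Rank articles the user does not own by summed co-occurrence with owned ones."""
    owned = set(libraries[user])
    scores = Counter()
    for article in owned:
        for other, count in cooc.get(article, {}).items():
            if other not in owned:
                scores[other] += count
    return [article for article, _ in scores.most_common(n)]


# Hypothetical toy data: user name -> list of article ids in their library.
libraries = {
    "alice": ["a1", "a2", "a3"],
    "bob": ["a2", "a3", "a4"],
    "carol": ["a3", "a4", "a5"],
    "dave": ["a1", "a2"],
}

cooc = build_cooccurrence(libraries)
print(recommend("dave", libraries, cooc))  # -> ['a3', 'a4']
```

In this toy example dave is offered a3 first, because it shares a library with both of his articles more often than a4 does. A real system would presumably normalise for very popular articles and scale to far larger libraries, which this sketch ignores.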
There are a huge number of refinements we can make to this, and we are busy doing just that. For example, we have not begun to look at the 6m+ tags as part of the system yet.
More than anything, I believe that it shows the quality of the CiteULike dataset for a task like this. (We already make the whole CiteULike bookmarking dataset available for download; there are dozens of independent research projects using the data in institutions around the world right now.)
Of course, the automated recommender system is only a few weeks old. CiteULike’s social functions have been enabling users to share and discover articles from day one.
We measure the end result of social discovery by the number of articles users have copied from each other’s libraries. That happened 99,159 times in the last year, and 8827 times in the last 30 days.
That is almost certainly an underestimate: it does not count the times people find an article, view it on the publisher’s site, and post it from there.
It’s really gratifying to see the social discovery of science generated by the simple act of keeping your references public on a Web page.