Enhancing Statistics: Google Analytics and Visualization APIs


  • Graham Triggs




OR2010, Usage Statistics, Library and information sciences, DDC: 020


Usage statistics have been an important topic in the repository community for some time. From Minho's DSpace additions, through Interoperable Repository Statistics, to @mire's Solr-based contribution to DSpace 1.6, there have been many approaches to providing statistics. One technique that has been used in a few places is to set up a Google Analytics account. This has several advantages: it is free, independent of the repository (and its architecture), proven to scale, and comes with excellent tools for visualizing the data. But it has historically had its problems too: it doesn't understand the structure of the repository (for displaying totals or top views/downloads for an arbitrary grouping of the content), it doesn't track downloads without additional work (or downloads linked directly from search engines), and the reports are locked behind an authentication wall and can't be opened up to general repository users. With the [April 2009] release of an API to retrieve data from Google Analytics, that has changed. Data that has been calculated in Google Analytics can be pulled back into the repository, so that it can be viewed in context, and by anyone who can access the repository (or not, depending on implementation). This presentation shows how Google Analytics can be integrated with the repository, techniques for capturing data that wouldn't normally be available with Analytics, and ways of making the data comprehensible through visualizations. Whilst the implementation presented here was initially conceived using a DSpace repository, the techniques can be replicated in any repository software. Further, the visualization methods are independent of the analytics data itself, so they can be adapted for other sources of data.
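
As a rough illustration of viewing analytics data "within context", the following Python sketch rolls per-page view counts up into totals for a grouping of the content. It is not the implementation presented; everything in it is an assumption made for illustration: the sample rows stand in for (page path, pageviews) pairs already retrieved from an analytics report, item pages are assumed to live under `/handle/<prefix>/<id>`, and `ITEM_COLLECTION` is a hypothetical lookup that the repository itself would supply.

```python
from collections import defaultdict

# Hypothetical sample data: (page path, pageviews) pairs, as might be
# retrieved from an analytics report. Not real figures.
ROWS = [
    ("/handle/123456789/10", 40),
    ("/handle/123456789/11", 25),
    ("/handle/123456789/12?show=full", 5),
    ("/handle/123456789/12", 10),
    ("/browse?type=title", 99),
]

# Hypothetical item-to-collection lookup; in practice this knowledge
# lives in the repository, not in the analytics service.
ITEM_COLLECTION = {
    "123456789/10": "Theses",
    "123456789/11": "Theses",
    "123456789/12": "Articles",
}


def handle_from_path(path):
    """Return 'prefix/id' for an item page path, or None for other pages."""
    path = path.split("?", 1)[0]          # ignore any query string
    parts = path.strip("/").split("/")
    if len(parts) == 3 and parts[0] == "handle":
        return parts[1] + "/" + parts[2]
    return None


def views_by_collection(rows, item_collection):
    """Sum pageviews per collection, skipping pages that are not known items."""
    totals = defaultdict(int)
    for path, views in rows:
        collection = item_collection.get(handle_from_path(path))
        if collection is not None:
            totals[collection] += views
    return dict(totals)


def top_items(rows, n):
    """Return the n most viewed items as (handle, views), highest first."""
    per_item = defaultdict(int)
    for path, views in rows:
        handle = handle_from_path(path)
        if handle is not None:
            per_item[handle] += views
    return sorted(per_item.items(), key=lambda kv: (-kv[1], kv[0]))[:n]
```

The same roll-up could be keyed on any grouping the repository knows about (community, author, department), which is the kind of structure the analytics service alone cannot see.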