On Constructing Repository Infrastructures: The D-NET Software Toolkit


  • Paolo Manghi
  • Marko Mikulicic
  • Katerina Iatropoulou
  • Antonis Lebesis
  • Natalia Manola




OR2010, Repository Infrastructures, Library and information sciences, DDC: 020


Due to the wide diffusion of digital repositories, organizations responsible for large research communities, such as national or project consortia, research institutions and foundations, are increasingly inclined to set up so-called repository infrastructure systems (e.g., OAIster (http://www.oaister.org), BASE (http://www.base-search.net), DAREnet-NARCIS (http://www.narcis.info)). Such systems offer web portals, services and APIs for operating across the metadata records of publications (lately also of experimental data and compound objects) aggregated from a set of repositories. Generally, they consist of two connected tiers: an aggregation system, which populates an information space of metadata records by harvesting records from a set of OAI-PMH compatible data sources, typically repositories, and transforming them (e.g., cleaning, enriching); and a web portal, which provides end-users with advanced functionality over such information space (search, browsing, annotations, recommendations, collections, user profiling, etc.). Typically, information spaces also offer access to third-party applications through standard APIs (e.g., OAI-PMH, SRW, OAI-ORE). Repository infrastructure systems address similar architectural and functional issues across several disciplines and application domains. On the one hand, they all deal, with varying degrees of complexity, with the generic problem of harvesting metadata records of a given format, transforming them into records of a target format and delivering web portals that operate over these records. On the other hand, they have to cope with arbitrary numbers of repositories and hence with administering them: from the automatic scheduling of harvesting and transformation actions and the definition of the relevant transformation mappings, to the inherent scalability problems of coping with an ever-growing number of incoming records. Existing solutions tend to favor customized software, neglecting general-purpose approaches. 
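To make the harvesting tier more concrete, the following is a minimal sketch of how an aggregation system might page through an OAI-PMH data source. It relies only on the OAI-PMH 2.0 `ListRecords` verb, its `resumptionToken` flow control and the standard `oai_dc` (Dublin Core) format; the function names, the base URL `http://repo.example.org/oai` and the trimmed sample response are hypothetical illustrations, not part of D-NET or of any of the systems named above.

```python
import urllib.parse
import xml.etree.ElementTree as ET

# Namespaces defined by the OAI-PMH 2.0 and Dublin Core specifications.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url, metadata_prefix="oai_dc", resumption_token=None):
    """Build an OAI-PMH ListRecords request URL (hypothetical helper).

    In OAI-PMH, resumptionToken is an exclusive argument: when it is sent,
    metadataPrefix must be omitted.
    """
    if resumption_token:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return base_url + "?" + urllib.parse.urlencode(params)


def parse_list_records(xml_text):
    """Return (records, resumption_token) from one ListRecords response page."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iterfind("oai:ListRecords/oai:record", NS):
        identifier = rec.findtext("oai:header/oai:identifier", namespaces=NS)
        titles = [t.text for t in rec.iterfind("oai:metadata/oai_dc:dc/dc:title", NS)]
        records.append({"id": identifier, "titles": titles})
    token = root.findtext("oai:ListRecords/oai:resumptionToken", namespaces=NS)
    # An empty (or missing) resumptionToken marks the last page of the list.
    return records, (token or None)


# Hypothetical, heavily trimmed response page (real responses also carry
# responseDate and request elements).
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>A sample publication</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
    <resumptionToken>page2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

records, token = parse_list_records(SAMPLE)
next_url = list_records_url("http://repo.example.org/oai", resumption_token=token)
```

A harvester would keep requesting `next_url` until `parse_list_records` returns `None` as the token; scheduling such loops for many repositories is one of the administrative tasks mentioned above.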
Typically, for example, aggregation systems are designed to generate metadata records of a format X from records of a format Y, and are not parametric with respect to such formats. Similarly, the participation of a repository in an infrastructure is driven by fixed policies, and administrators often do not have the freedom to specify their own workflow by combining, as they prefer, logical steps such as harvesting, storing, transforming, indexing and validating. In summary, repository infrastructure systems typically provide advanced and effective solutions tailored to the one scenario of interest, but can hardly be applied to different scenarios where similar but distinct requirements apply. As a consequence, an organization willing to set up a repository infrastructure system with specific requirements faces the "expensive" problem of designing and developing new software from scratch. In this paper, we present a general-purpose and cost-efficient solution for the construction of customized repository infrastructures, based on the D-NET Software Toolkit (www.d-net.research-infrastructures.eu), developed in the context of the DRIVER and DRIVER-II projects (http://www.driver-community.eu). D-NET offers a service-oriented framework whose services can be combined by developers to easily construct customized aggregation systems and personalized web portals. D-NET services can be customized, extended and combined to match domain-specific scenarios, while the distribution, sharing and orchestration of services enable the construction of scalable and robust repository infrastructures. As we shall describe in the following, D-NET is currently the enabling software of a number of European projects and national initiatives.
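The contrast between hard-wired and parametric aggregation can be illustrated with a small sketch. Here the X-to-Y transformation is driven by a field mapping supplied as data, and an administrator composes a workflow from interchangeable steps (transform, validate, store). All names (`make_transformer`, `make_validator`, `make_store`, `run_workflow`) and the field names are hypothetical; this is a toy illustration of the idea under those assumptions, not D-NET's actual services or API.

```python
from typing import Callable, Dict, List

Record = Dict[str, object]
Step = Callable[[List[Record]], List[Record]]


def make_transformer(mapping: Dict[str, str]) -> Step:
    """Hypothetical step: rename source-format fields to target-format fields.

    The mapping (source field -> target field) is data, not code, so the same
    step serves any pair of formats; unmapped fields are dropped.
    """
    def transform(records: List[Record]) -> List[Record]:
        return [{mapping[k]: v for k, v in r.items() if k in mapping} for r in records]
    return transform


def make_validator(required: List[str]) -> Step:
    """Hypothetical step: keep only records with non-empty required fields."""
    def validate(records: List[Record]) -> List[Record]:
        return [r for r in records if all(r.get(f) for f in required)]
    return validate


def make_store(sink: List[Record]) -> Step:
    """Hypothetical step: copy records into an in-memory store and pass them on."""
    def store(records: List[Record]) -> List[Record]:
        sink.extend(records)
        return records
    return store


def run_workflow(steps: List[Step], records: List[Record]) -> List[Record]:
    """Apply the administrator-chosen steps in the order given."""
    for step in steps:
        records = step(records)
    return records


# Usage: records as they might look after harvesting (hypothetical data).
harvested = [
    {"dc:title": "A sample publication", "dc:creator": "Doe, J."},
    {"dc:creator": "Roe, R."},  # no title, so validation drops it
]
stored: List[Record] = []
workflow = [
    make_transformer({"dc:title": "title", "dc:creator": "author"}),
    make_validator(["title"]),
    make_store(stored),
]
result = run_workflow(workflow, harvested)
```

Because each step has the same signature, an administrator could reorder or replace steps (for instance, validating on `"dc:title"` before transforming) without changing any code, which is the kind of flexibility the paragraph above finds missing in format-specific systems.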