Harvesting Rioxx: optimising repositories for content aggregation

A previous blog post (over the summer of 2022) reported that a v3.0 Release Candidate of Rioxx had been published. Publication of this candidate has stimulated additional feedback from the community, and the Rioxx Governance Group (RGG) is working hard to address this feedback ahead of finalising Rioxx v3.0.

In the meantime, members of the RGG recently presented at The 3rd Workshop on Open Citations and Open Scholarly Metadata 2022 (WOOC2022) about Rioxx: Enhancing discovery and enriching the scholarly graph with the Research Outputs Metadata Schema (Rioxx). This contribution highlighted recent Rioxx developments and their potential to contribute to the burgeoning scholarly graph. It also highlighted the importance of Rioxx as a facilitator of aggregation, that is, how Rioxx can better support the operations of content harvesters seeking to aggregate repository content.

Aggregation services harvest content from thousands of repositories across the world, improving its discovery and enhancing its potential re-use. They also generate a rich data platform for delivering advanced text and data mining (TDM) features, capable of unlocking new knowledge and scientific discoveries. The CORE API, OpenAIRE Discover portfolio and BASE represent obvious examples. But the growth of aggregated data also enables innovation and tangible benefits of the 'network effect', such as the CORE Recommender and CORE Discovery.

Improving the efficacy of aggregation is therefore important. We have previously reported on how Rioxx supports aggregation efficacy by expressing repository metadata in a more harvester-friendly way. The RGG mission continues to focus on this, and the summer 2022 release candidate is a contribution to making Rioxx even better for aggregators. This matters because aggregation is a key component of the scholarly repository concept: thousands of open repositories can operate an independent, distributed service model while simultaneously contributing to the scholarly commons.

The value of aggregators, which provide data tools for TDM and the creation of new services, is already evident. And that value is being created every minute; with every object harvested from a repository, the value of the wider aggregation grows. However, with even better metadata about what repositories contain, we can improve the efficacy of harvesting and therefore the value of the aggregation. Better metadata can reduce the computational load borne by many aggregators -- a very relevant concern when we are all revisiting the energy footprint of our system infrastructures. But we can also enrich the quality of that data to make the wider aggregation even more useful. Think of the way this enrichment could benefit the 'data platform' ethos offered by the likes of CORE -- and then consider the impact this could have on the creation of new knowledge or data-based innovations.

Finalising Rioxx v3.0 has taken longer than expected. But its eventual introduction will add value to the scholarly commons and to the essential data platforms that underpin it.