News from 2022

Harvesting Rioxx: optimising repositories for content aggregation

A previous blog post (over the summer of 2022) reported that a v3.0 Release Candidate of Rioxx had been published. Publication of this candidate has stimulated additional feedback from the community, and the Rioxx Governance Group (RGG) is therefore working hard to address this feedback ahead of finalising the publication of Rioxx v3.0.

In the meantime, members of the RGG recently presented at The 3rd Workshop on Open Citations and Open Scholarly Metadata 2022 (WOOC2022) about Rioxx: Enhancing discovery and enriching the scholarly graph with the Research Outputs Metadata Schema (Rioxx). This contribution highlighted recent Rioxx developments and how it had the potential to contribute to the burgeoning scholarly graph; but it also highlighted the importance of Rioxx as a facilitator of aggregation. That is, how Rioxx can better support the operations of content harvesters which are seeking to aggregate repository content.

Aggregation services harvest content from thousands of repositories across the world, improving its discovery and enhancing its potential re-use. They also generate a rich data platform for delivering advanced text and data mining (TDM) features, capable of unlocking new knowledge and scientific discoveries. The CORE API, OpenAIRE Discover portfolio and BASE represent obvious examples. But the growth of aggregated data also enables innovation and tangible benefits of the 'network effect', such as the CORE Recommender and CORE Discovery.

Improving the efficacy of aggregation is therefore important. We have previously reported on how Rioxx supports aggregation efficacy by expressing repository metadata in a more harvester-friendly way. The RGG mission continues to focus on this, and the summer 2022 release candidate is a contribution to making Rioxx even better for aggregators. This matters because aggregation is a key component of the scholarly repository concept: that thousands of open repositories can operate an independent, distributed service model but simultaneously contribute to the scholarly commons.

The value of aggregators, which provide data tools for TDM and the creation of new services, is already upon us. And that value is being created every minute; with every object harvested from a repository, the value of the wider aggregation grows. However, with even better metadata about what repositories contain, we can improve the efficacy of harvesting and ergo the value of the aggregation. We can reduce the computational load endured by many aggregators by having better metadata -- a very relevant concern when we are all revisiting the energy footprint of our system infrastructures. But we can also enrich the quality of that data to make the wider aggregation even more useful. Think of the way this enrichment could benefit the 'data platform' ethos offered by the likes of CORE -- and then consider the impact this could have on the creation of new knowledge or data-based innovations.

Finalising Rioxx v3.0 has taken longer than expected. But its eventual introduction will add value to the quality of the scholarly commons and the essential data platforms that underpin it.

Rioxx v3.0 Release Candidate 1 published

The Rioxx Governance Group (RGG) is pleased to announce the publication of the v3.0 Release Candidate 1 of Rioxx: The Research Outputs Metadata Schema -- formerly known as the RIOXX Metadata Application Profile.

Since July 2021, when the beta release of v3.0 was made available, the RGG has been working towards incorporating feedback from the repository and scholarly communications communities, and making adjustments to Rioxx to improve the way in which metadata are modelled. The publication of the candidate release today marks the end of this period of consultation but by no means the end -- it will always remain possible for anyone to document issues for future consideration at the Rioxx GitHub repository.

I would draw to the attention of readers the announcement made when the v3.0 beta release was published, as it provides a useful summary of some of the principal changes between v2.0 and v3.0. Importantly, the specific changes listed in the beta release remain in the candidate release too, which included some significant changes to the use of <dc:identifier> and <dc:relation>. But in this candidate release we have made some further significant updates:

Separation of project and grant data

The conflation of project and grant data has been a common problem across a number of metadata schema. Prior to this candidate release, Rioxx was no exception. Research projects can, and do, operate independently of grant funding. Similarly, not all grants result in the creation of a project. Moreover, some projects can persist for many years and receive multiple grants from multiple different funders; yet, many schema conflate projects and grants and thereby fail to accurately model reality. Greater separation of project and grant data is therefore something we have introduced as part of this candidate release, with the creation of <rioxxterms:grant> and the re-scoping of <rioxxterms:project>.
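
To make the distinction concrete, the relationship described above can be sketched as a small data model. This is only an illustration of the idea that projects and grants are separate entities; the class names, field names and example values below are hypothetical and are not the attributes or syntax of <rioxxterms:grant> or <rioxxterms:project>.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Grant:
    """A funding award. Hypothetical fields, not Rioxx attributes."""
    funder_name: str
    grant_id: str


@dataclass
class Project:
    """A research project that may hold zero, one or many grants."""
    name: str
    grants: list[Grant] = field(default_factory=list)

    def funders(self) -> set[str]:
        # Collect the distinct funders across all grants held by the project.
        return {grant.funder_name for grant in self.grants}


# A project operating independently of any grant funding.
unfunded = Project(name="Example unfunded project")

# A long-running project receiving grants from two (made-up) funders.
long_running = Project(
    name="Example long-running project",
    grants=[
        Grant(funder_name="Funder A", grant_id="A-001"),
        Grant(funder_name="Funder B", grant_id="B-042"),
    ],
)

# A grant that did not result in a project can still be described by itself.
standalone_grant = Grant(funder_name="Funder C", grant_id="C-007")
```

A model that conflated the two would have to attach exactly one funder and grant identifier to every project, which cannot represent any of the three cases above.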

Wider use of persistent identifiers

A more inclusive approach to persistent identification and persistent identifiers (PIDs) across <rioxxterms:author>, <rioxxterms:contributor>, <rioxxterms:project> and <dc:publisher> has been introduced to reflect the growing maturity of the PID landscape. The v3.0 beta draft permitted a far narrower number of PID schemes. This has been addressed in the candidate release, with most extant schemes supported.

Rioxx name change

By now you will have realised that the name has changed. As there exist many different repository metadata application profiles, it was not uncommon for the consultation process to elicit feedback from community members stating that the label 'RIOXX' was confusing, or that no-one could remember what the acronym stood for, or that the name of the profile failed to adequately communicate its purpose. Some even suggested that the name should be jettisoned altogether! We decided to keep the name, but we can at least make its full title a little more self-explanatory. We have thus elected to refine the name a little and de-acronymize 'RIOXX', since the acronym itself is no longer meaningful:

  • Rioxx: The Research Outputs Metadata Schema

The name change helps to communicate the raison d'être of Rioxx: to facilitate the improved harvesting, aggregation and discovery of repository content. This change will be reflected in all future dissemination about Rioxx.

The future

But, of course, to truly facilitate improvements in harvesting, aggregation and discovery, Rioxx needs to be adopted by repositories. This is easier for some repositories to achieve than others, and easier for some organizational teams than others. Even teams with the technical nous may lack the team capacity to oil the wheels of adoption. The RGG is in discussions with a number of bodies and organizations about the possibility of supporting Rioxx v3.0. The shape of support remains uncertain at this stage but formal endorsement and/or technical support for repositories would be a welcome start. Further updates will be posted here in due course.

George Macgregor, RGG Chair - 2022-06-24
