What Is A Repository For?

If you haven’t heard, in 2024 Humanities Commons will be launching a completely reimagined open-access repository. It’s currently under heavy construction. So we’ve been asking ourselves: Why does the Commons have a repository in the first place? At our heart we are a social network, a hub for scholarly exchange. Most of us don’t think “repository” when we think about social networks like Mastodon or Instagram or Facebook. So what exactly is a repository? And why will the new repository be so vital to the life of the Commons?

A Vault

A repository can be understood as a vault. This is how most academic repositories started out. A vault is a place where we put things to keep them safe, to make sure they aren’t damaged and don’t disappear. This function is crucial for academic research. It matters that journal articles, conference posters, data sets, and all the other products of academic research survive intact. This is why, for example, we’re building the new repository using InvenioRDM, a platform developed at CERN that provides the foundation for open repositories like Zenodo. One reason we chose InvenioRDM is that it preserves a series of versions for each work it contains. Even if the creator wants to expand or improve the material, the community can go into the “vault” and find the earlier versions that may have shaped our conversations. This is also why we use enterprise-level secure cloud infrastructure to store the works uploaded to the repository, and why we make sure the data is always backed up in a physically separate datacenter.

On the other hand, when something goes into a vault we need to be sure it won’t get lost among all the other things we’ve stored. This is why we use a Digital Object Identifier (DOI) system for works in the repository and register those records with DataCite. A DOI is a label that points unambiguously to one unique work among all of the documents held by the world’s research repositories and libraries. The InvenioRDM framework allows us to give each version of a document its own DOI. So anyone in the world can rely on being able to retrieve the right version of the right work, no matter how many records are added to the Commons.

Of course, the problem with a vault is that the things inside are hidden behind several inches of concrete and steel. When a repository is primarily a vault, the material inside isn’t very accessible. This is why so many researchers think of institutional repositories as a place research goes to die. If a repository is only focused on preservation, its files will rarely be opened. So a repository in today’s academic world needs to focus on more than just preservation.

A Broadcast Tower

We can also think of a repository as a broadcast tower. It can be a platform to boost a work’s reach, to get it heard. This is how people often think of commercial services like academia.edu. People upload their research so that it will show up in searches, so that a worldwide audience can find it regardless of what library they may (or may not) have access to. 

How will the new Commons repository broadcast researchers’ work? Reaching an audience is partly about open access. This is not just a matter of letting visitors view the works on the repository site free of charge. It is also about letting other open access services and sites “re-broadcast” works from the Commons collection. So we will offer free access to the Commons repository in the formats that other tools and aggregators can use: a REST API, OAI-PMH streams, and (later on) the COAR Notify protocol. And we will embed data about each work in its repository page so that it is catalogued by services like Google Scholar. This extends the audience for members’ work far beyond the circle of people who visit the Commons.
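For readers curious what “re-broadcasting” over OAI-PMH looks like in practice, here is a minimal sketch of how an aggregator might build a harvesting request. The verb `ListRecords` and the metadata prefix `oai_dc` come from the OAI-PMH standard itself. The base URL and the function name `list_records_url` are assumptions made up for illustration, since the address of the Commons endpoint is not given here.

```python
from urllib.parse import urlencode

# Hypothetical endpoint, used only for illustration. The real address of the
# Commons OAI-PMH service is not stated in this post.
BASE_URL = "https://repository.example.org/oai2d"


def list_records_url(base_url, metadata_prefix="oai_dc", from_date=None, set_spec=None):
    """Build an OAI-PMH ListRecords request URL (no network call is made)."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date is not None:
        # Optional: only harvest records changed on or after this date.
        params["from"] = from_date
    if set_spec is not None:
        # Optional: restrict the harvest to one named set of records.
        params["set"] = set_spec
    return f"{base_url}?{urlencode(params)}"


if __name__ == "__main__":
    print(list_records_url(BASE_URL, from_date="2024-01-01"))
```

An aggregator would fetch that URL and read the XML that comes back. The sketch stops at building the request so that it stays independent of any particular server.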

But effective broadcasts have to be received. The tower has to send out the right signals on the right frequencies. Otherwise they’ll be lost in the background static. Researchers depositing their work need to know that it will be found by the audiences that matter to them. So works in the new repository will not just be accessible, they will be discoverable. Our move to the InvenioRDM platform will allow us to offer robust and user-friendly search, so that Commons visitors can quickly find the works that most interest them. Further down the line we are planning a smart recommendation engine that will intelligently suggest works connected to your current interests and projects. For creators, this all means that their work gets noticed by people who will pay attention.

Why bother building a new broadcast tower if services like academia.edu already exist? Commercial services bring their own agenda. If they don’t charge for access (which limits a work’s reach again) they have to use the data you upload to make money. Like Facebook or Instagram, they often scrape data from deposits and sell it on to other companies who may or may not have the public good in mind. Commercial services may feed their users’ work and their search activity into AI models like ChatGPT, making their responses “smarter” and allowing the owners of the commercial service to charge higher and higher fees for their AI tools. 

The new Commons repository will be a broadcast tower too, but one run by academics, responsible to the research community, working without a profit motive. The Commons does not use your research to make money. We are built and run by a non-profit research group (Mesh Research) based at Michigan State University and supported by academic organizations like the Modern Language Association. We do not sell users’ data, and we do not use what you upload to train commercial AI tools. We are looking at the possible uses of AI to make the Commons work better for its users–for things like more intelligent search and recommendations. But we will begin training those tools on works that are licensed to allow open re-use, and moving forward we will ask you clearly whether you want your research to be used in training our internal AI tools. So you do not have to trade your data for your work’s extended reach. We will serve as a broadcast tower for your research without selling off information about you or your behaviour.

A Filter

Some other kinds of digital repositories are also filters. Institutional repositories will often collect only the work of scholars at a particular university or research institute. Many repositories allow only work published (or heading for publication) in traditional academic journals. Some will allow only the kinds of work that we have traditionally considered “academic”–articles, monographs, etc. In other words, repositories often heavily restrict the material they allow in, trying in one way or another to be a kind of quality-control filter. 

The Commons repository also acts like a filter, but in a different way. We don’t focus on keeping things out. Two of our core values at the Commons are the open exchange of ideas and creative experimentation. So we encourage contributions from beyond the boundaries of conventional academic institutions. Our repository already includes many peer reviewed papers by experts at the top of their fields. But we don’t try to decide who should be part of the conversation. We will also be allowing a very wide variety of media and digital objects as contributions to the Commons repository: peer reviewed articles and books, but also visual art, performance recordings, interview transcripts, data visualizations, poster presentations, slide shows, software applications…just about anything.    

How will we act as a filter if we let everything in? Not by keeping things out, like a coffee filter keeping out the grounds. Instead the Commons community will provide a filter through which visitors can view the works, like a visual filter applied to an image. Community members will be able to make comments on public works in the repository, give public feedback, and engage in ongoing dialogue. So the academic community of the Commons will together create a filter through which visitors view the works that are contributed–not by hiding anything, but by adding their evaluations. Visitors will see all the works that have been added along with the community’s response to each one. These assessments are fallible, of course, and the community may sometimes sharply disagree. But visitors can use the dialogue about a work to inform how they receive it. This public response and discussion isn’t part of InvenioRDM out-of-the-box. It’s something we’ll be building on top of the platform. So it will likely come as part of updates released after the initial public beta launch next year.

Right at launch, though, the new Commons will allow groups to create “overlay” publications like open-access journals entirely within the repository. A group of journal editors will be able to receive submissions, interact with authors, accept or reject submissions, and publish the final accepted articles, all inside the Commons repository. Most of this editorial flow is built into the InvenioRDM platform. Because we are committed to open scholarship, journals that build on the Commons infrastructure will also never be “locked in” to our platform. Not only will overlay publishing eliminate many of the costs of running a journal, but it will be another way in which the Commons repository can act as a filter. Groups within the Commons who have specialist expertise can identify collections of works that they agree deserve attention. Groups on Humanities Commons have always been able to create curated collections of CORE deposits, and the new repository will expand on this functionality, allowing Commons members to evaluate and filter repository works for one another.

A Workroom

The most innovative aspect of the new repository will be how it allows works to grow and get better over time. All of the metaphors I’ve used so far can describe static things: objects sitting in a vault, fixed messages to be broadcast, or images viewed through a filter. In the new Commons repository, though, a work can be a living, evolving project. The new repository will be a collaborative workroom where the fruits of research can grow and get better over time.

The versioning system in InvenioRDM already assumes that a work is not static. It allows us to present a series of versions and watch the work change and improve. InvenioRDM allows members to publish draft works and freely revise their files and data. When they decide they want to capture a snapshot of the work at a certain stage, they can establish a fixed “version.” Then a new draft can be created, and the work’s evolution can continue until the work’s owner wants to capture the next fixed snapshot. This ability to treat a work as an evolving, growing thing is another factor that led us to choose the InvenioRDM platform.
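The draft-and-snapshot cycle described above can be sketched in a few lines. This is a toy model only, assuming a work is just a set of named files. The classes `Work` and `Version` and their methods are hypothetical names invented for this illustration and are not the InvenioRDM API.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Version:
    """A fixed snapshot of a work's files at one point in time (hypothetical)."""
    number: int
    files: Dict[str, str]  # filename -> content


@dataclass
class Work:
    """A work with one editable draft and a list of fixed versions (hypothetical)."""
    title: str
    draft: Dict[str, str] = field(default_factory=dict)
    versions: List[Version] = field(default_factory=list)

    def revise(self, filename: str, content: str) -> None:
        # Drafts can be changed freely between snapshots.
        self.draft[filename] = content

    def snapshot(self) -> Version:
        # Copy the draft so that later revisions cannot alter this version.
        version = Version(number=len(self.versions) + 1, files=dict(self.draft))
        self.versions.append(version)
        return version


if __name__ == "__main__":
    work = Work("Working paper")
    work.revise("paper.txt", "first argument")
    work.snapshot()
    work.revise("paper.txt", "revised argument")
    work.snapshot()
    for v in work.versions:
        print(v.number, v.files["paper.txt"])
```

The point of copying the draft at snapshot time is that the earlier version stays exactly as it was, which is what lets the same work sit in the vault and in the workroom at once.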

When we combine this versioning of drafts with the ability to discuss a work, a whole new kind of academic publishing becomes possible. The printing press forced us as researchers to treat our work as a set of finished products to be consumed by readers in a one-way relationship. Feedback and evaluation from the academic community has been a long, slow process. It can take years for a journal article to be published, and several more years for other area specialists to digest it and frame responses in their own articles. In between there might be dialogue in conference discussions or on social media, but those forms of dialogue are divorced from the process of writing and publication.

Not only is this system slow, but it can lead to an unhealthy academic culture. Researchers are encouraged to “make a stand” on the positions they publish. They can feel the need to defend those positions and become more closed to new data or alternative proposals. Scholars also don’t get much recognition for helping to refine and improve one another’s work. All that can be “counted” is the publication line on the CV. This situation has begun to change with the advent of the “preprint” paper. The academic community is given a chance to read and review an article before it goes to publication, and the writers have a chance to make changes before they’re “committed” in print. But the very name “preprint” still puts all the emphasis on the static publication that results. Few of the platforms available include a forum for back-and-forth dialogue. Scholars still don’t have good ways of recording the contributions they have made to that refinement process, and the process is still divorced from the publication that results in the end.

The new Commons repository will facilitate what we believe is a healthier form of academic conversation. It will still be possible to upload fixed, static publications. But members can also choose to treat a work published on the Commons as a living, growing, evolving thing. The integrated forum for feedback and response will give members a record of their involvement, making it possible for them to document the critical task of helping to refine a work. Just as formal peer review is coming to be recognized as a form of scholarship in its own right, the new Commons repository will provide members with a way to document their involvement in a kind of ongoing open “review,” so that their activity can “count” as part of their scholarly output. The result, we believe, will be a more collaborative and fruitful kind of academic conversation.

Balance and Synergies

So the new Commons repository will be simultaneously a vault, a broadcast tower, a filter, and a workroom. There can be tension between these tasks. We cannot be the kind of filter that keeps out “inferior” works while also being an accessible broadcast tower and a workroom for evolving drafts. We have to perform the filtering function differently. Our decision to balance all of these roles also presents some challenges that we’ll have to overcome. We will need to structure the discussion around deposits in a way that discourages “trolls” and rewards constructive dialogue, and we have decisions to make about how to moderate contributions. 

But these are challenges we can meet creatively, with input from the Commons community. Some of the solutions have already been found. How can we be both a workroom and a vault? By versioning the works and allowing public drafts to develop in between versions. How do we balance being a broadcast tower with being a workroom? We will provide flexible access controls, allowing members and groups to have a limited conversation about a work until it is ready to join the broader conversation.

There are also unexpected synergies that emerge from this combination of functions in a single repository. Members’ comments about various works will eventually (some time after launch) be publishable via ActivityPub streams. The discussion about each work will be visible on its repository page. But beyond the boundaries of the Commons, other people will be able to subscribe to particular discussions or follow a particular member’s comments from, say, their hcommons.social or other Mastodon account. This can boost the visibility of a work, and it can also raise the profile of a scholar’s work as an open “reviewer.” The workroom-like functions allow for new broadcast tower-like possibilities.

When it launches in 2024, the new Commons repository will itself be a work-in-progress. We are committed to continual learning and experimentation. We believe, though, that what we are building–with InvenioRDM as a solid backbone–will be much more than just another “repository.” We may even decide that “repository” isn’t a big enough label for what we’re making. Whatever we call it, this platform for our members’ work will be another step toward a healthier kind of academy.

3 responses to “What Is A Repository For?”

  1. Thank you for the article. Looking forward to the new “repository”

  2. This is a wonderful vision for a new kind of repository that could change academic culture. Meanwhile, however, many academics must still publish in other journals, many of which are not open access. Yet HCommons communities may still want to include them in their curated list of resources, or otherwise make them findable through HCommons. Can the new repository act as a referatory as well as a repository?

    1. Yes! The new repository will let you create “metadata only” records that point (via links) to files deposited elsewhere. You’ll get all the benefits of discoverability and discussion on the Commons, even though the files can’t be included directly. You can even create records pointing to publications that are behind a paywall.

      There will also be flexibility about how you make files available when you upload them on the Commons. You’ll be able to restrict access to the files themselves, either indefinitely or for an embargo period.

      You’re right that we have to work in the world we’ve got, even while we’re building a different one. Hopefully the new Commons repository will let members bridge the two. And if there are any other features like this that you would like to see, please let us know!