tl;dr: We’ve had some downtime due to AI bots hammering us. We’re working on it. We had another bug where profile fields disappeared. We fixed it (duplicate metadata in the wp_bp_xprofile_field table). We’re doing lots of cool new development work.
When big corporate players have problems or downtime, they tend to hide them and see them as a sign of their failure. As a community initiative with a responsibility to our supporting members and everybody who uses the platform, we want to be transparent and open when we have things that don’t go quite to plan.
A good example to follow is that of Geoffrey Bilder when he used to work at Crossref. When there were downtime incidents on the DOI system, Geoffrey provided full technical details of what went wrong and how they went about fixing it, which fostered a level of trustworthiness in the community that you do not get from hiding problems.
Knowledge Commons has had two significant technical problems in the last couple of months that I want to outline here. I also want to give an indication of the steps we are taking/have taken to address these problems. Then, finally, I will end with a more positive roundup of what we’re doing in the technical team at the moment.
Downtime
The first problem is a case of periodic downtime. We are seeing unusual traffic that we believe is related to AI scraping. This is common to many scholarly platforms at this time and it’s extremely frustrating. The traffic volumes that are being sent are simply too much for our servers and are overloading our systems.
We are taking several steps to try to address this. The first, beyond upping our server resources and ensuring we have efficient configuration options in place, is that we are implementing a Content Delivery Network (CDN) that will offload much of our hosting to edge servers on AWS. This will go some way to alleviating the heavy load that is generated on our EC2 systems at present when multiple bots hit the site at the same time. Second, we are also investigating systems that try to detect when such bot floods are incoming. This could be the open source Anubis, with which I gather the Janeway project has had great success recently. Alternatively, we could opt for a commercial option such as Cloudflare, which has far more resources than are otherwise available to us and which also has huge experience in blocking bot attacks.
We apologize for the inconvenience these brief periodic downtime instances may have caused our users. However, we believe that these attacks, which everyone is experiencing and which might only get worse now that many more people can write scrapers, signal the urgent need for a protocol for bot harvesting of material. It is extremely frustrating that discourteous request practices from AI scrapers have caused these incidents and forced our developers to spend time maintaining the platform’s uptime instead of working on new features.
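As a rough illustration of what "detecting an incoming flood" can mean at its very simplest, the sketch below counts requests per client over a sliding time window and flags any client that goes over a threshold. This is not a description of how Anubis or Cloudflare work, nor of anything running on our servers. The class name `FloodDetector`, the thresholds, and the idea of a single string identifying a client are all assumptions made for the example.

```python
from collections import defaultdict, deque


class FloodDetector:
    """Flag clients that send more than max_requests within window_seconds."""

    def __init__(self, max_requests: int, window_seconds: float) -> None:
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        # One queue of request timestamps per client identifier. The identifier
        # is hypothetical: it could be an IP address, a network range, or a
        # user-agent string, depending on what is being measured.
        self._hits = defaultdict(deque)

    def record(self, client: str, now: float) -> bool:
        """Record one request at time `now`; return True if the client is flooding."""
        hits = self._hits[client]
        hits.append(now)
        # Discard timestamps that have fallen out of the sliding window.
        cutoff = now - self.window_seconds
        while hits and hits[0] <= cutoff:
            hits.popleft()
        return len(hits) > self.max_requests
```

A caller would create one detector, for instance `FloodDetector(max_requests=3, window_seconds=10.0)`, and call `record(client, now)` for every incoming request. Real bot floods are usually spread over many addresses, which is one reason why dedicated tools exist for this job.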
Profiles Data Bug
The second incident was more challenging to diagnose and fix, but appeared to be incredibly serious. Yesterday evening, a user reported that his ORCID had disappeared from his profile page, along with two other profile field boxes that he had filled in that contained vast quantities of data on his publications and upcoming talks. When the user went to edit these fields, the input text boxes were empty.
We needed to find the cause of this. Quickly. At present, Knowledge Commons uses BuddyPress for its profile system, embedded within a large multi-site, multi-network WordPress installation. The first thing that we checked was that this was not a caching problem. It certainly looked like one. BuddyPress caches various aspects of its output, and we figured that the data might just be stuck in a bad cache hit where it hadn’t come through the first time. However, after flushing the cache several times and even after disabling the cache altogether, we could see the problem was still there.
The next thing that our developers did was to dive into the database to see whether or not these data had been somehow lost. That was our greatest fear, that actually, somehow these data had been overwritten and it was going to be a case of selective restoration from backups, which would be extremely challenging. However, when we looked in the database at the user who reported this problem, the data that weren’t showing on the form were clearly still in the database itself. This was both a relief and baffling. We could not understand why these data were not being pulled through. However, we had not lost any data.
(It’s worth making just a quick digression at this point to say that we have solid backups of the Knowledge Commons database and we’re able to restore this extremely quickly to a new database instance. The challenge with this kind of problem was that it looked like we might have selective data loss, and we didn’t want a restore to wipe out all of the data put in by other users who weren’t affected by this bug. So while we have backups, a selective restoration from them could be quite painful. By the time we’d seen that the data were still in the database, though, we felt confident that we weren’t going to need to go down this path.)
So, there are two database tables that are crucial to the BuddyPress profile system: wp_bp_xprofile_field and wp_bp_xprofile_data. Our fields table (truncated) looks like this:
| id | group_id | parent_id | type | name |
|---|---|---|---|---|
| 20 | 1 | 0 | textarea | Education |
| 1 | 1 | 0 | textbox | Name |
| 14 | 1 | 0 | textbox | Institutional or Other Affiliation |
| 15 | 1 | 0 | textbox | Title |
| 16 | 1 | 0 | textbox | Site |
| 17 | 1 | 0 | textbox | <em>Twitter</em> handle |
| 18 | 1 | 0 | textbox | <em>ORCID</em> iD |
The idea here is that the data table contains the actual values that people have entered. So there might be, for example, a row in the data table that holds somebody’s ORCID. That row would reference ID 18 to make this connection, which we can find in the fields table itself, above, and which points to “<em>ORCID</em> iD”. What this means is that we don’t repeatedly store the text “<em>ORCID</em> iD” for each entry in the data table. We can also, if we like, update how that is displayed. So if we wanted to get rid of the formatting around “<em>ORCID</em> iD”, we might remove the HTML emphasis tags (“<em>”) and all the data points would be unaffected, but would have a different label.
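To make the relationship between the two tables concrete, here is a minimal sketch using an in-memory SQLite database as a stand-in for our real one. The table names and the field columns match those shown above. The data table's `field_id`, `user_id` and `value` columns follow the usual BuddyPress layout, while user 42, the placeholder values, and the helper names `build_demo_db`, `profile_for` and `relabel_field` are invented for the example.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory stand-in for the two BuddyPress profile tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE wp_bp_xprofile_field (
            id INTEGER PRIMARY KEY, group_id INTEGER, parent_id INTEGER,
            type TEXT, name TEXT
        );
        CREATE TABLE wp_bp_xprofile_data (
            id INTEGER PRIMARY KEY, field_id INTEGER, user_id INTEGER, value TEXT
        );
        INSERT INTO wp_bp_xprofile_field VALUES
            (1, 1, 0, 'textbox', 'Name'),
            (18, 1, 0, 'textbox', '<em>ORCID</em> iD');
        -- Invented user 42 with placeholder values.
        INSERT INTO wp_bp_xprofile_data VALUES
            (1, 1, 42, 'Example Scholar'),
            (2, 18, 42, '0000-0000-0000-0000');
        """
    )
    return conn


def profile_for(conn: sqlite3.Connection, user_id: int) -> list[tuple[str, str]]:
    """Return (label, value) pairs by joining each data row to its field."""
    return conn.execute(
        """
        SELECT f.name, d.value
        FROM wp_bp_xprofile_data AS d
        JOIN wp_bp_xprofile_field AS f ON f.id = d.field_id
        WHERE d.user_id = ?
        ORDER BY f.id
        """,
        (user_id,),
    ).fetchall()


def relabel_field(conn: sqlite3.Connection, field_id: int, new_name: str) -> None:
    """Change a field's label in one place; no data rows are touched."""
    conn.execute(
        "UPDATE wp_bp_xprofile_field SET name = ? WHERE id = ?",
        (new_name, field_id),
    )
```

Calling `relabel_field(conn, 18, "ORCID iD")` changes the label that `profile_for(conn, 42)` returns for that row, while the stored value stays exactly as it was.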
For reasons that we don’t quite understand, though, BuddyPress had been generating sets of fields that were duplicates of existing entries. So we had three entries, all reading “<em>ORCID</em> iD”, we had several for “Publications”, and some for “Upcoming talks and conferences”. This meant that when BuddyPress went to look up a field, it wasn’t sure which entry it was supposed to be using. Should it just take the first entry of “<em>ORCID</em> iD”? Or should the more recent strangely generated field, also called “<em>ORCID</em> iD”, have precedence?
In short, the multiple entries in this table of fields were causing us severe headaches and making these fields’ data disappear from people’s profiles. There was also a huge risk. If people went to edit their profiles and had blank boxes loading because BuddyPress was loading the wrong data from the wrong field, it was possible that they could press save and overwrite the good data that we still had in the database with blank entries. So it was a time-critical fix for us to get this done before we had any data loss. Indeed, Plan B was to shut off editing of profiles temporarily in order to make sure this didn’t happen. However, we were able to discover the cause of the bug quickly enough that this was not necessary and we managed to fix the bug on the night itself.
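A duplicate of this kind can be found with a simple grouping query. The following is a minimal sketch against an in-memory SQLite stand-in, not a record of the exact queries involved. The duplicate IDs 101 and 102 are invented for the example, as are the helper names `find_duplicate_fields` and `build_demo_db`.

```python
import sqlite3


def find_duplicate_fields(conn: sqlite3.Connection) -> dict[str, list[int]]:
    """Map every field name that occurs more than once to its ids, lowest first."""
    rows = conn.execute(
        """
        SELECT name, id
        FROM wp_bp_xprofile_field
        WHERE name IN (
            SELECT name FROM wp_bp_xprofile_field
            GROUP BY name HAVING COUNT(*) > 1
        )
        ORDER BY name, id
        """
    ).fetchall()
    duplicates: dict[str, list[int]] = {}
    for name, field_id in rows:
        duplicates.setdefault(name, []).append(field_id)
    return duplicates


def build_demo_db() -> sqlite3.Connection:
    """In-memory fields table with one clean field and one duplicated field."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE wp_bp_xprofile_field ("
        "id INTEGER PRIMARY KEY, group_id INTEGER, parent_id INTEGER, "
        "type TEXT, name TEXT)"
    )
    conn.executemany(
        "INSERT INTO wp_bp_xprofile_field VALUES (?, ?, ?, ?, ?)",
        [
            (15, 1, 0, "textbox", "Title"),
            (18, 1, 0, "textbox", "<em>ORCID</em> iD"),
            (101, 1, 0, "textbox", "<em>ORCID</em> iD"),  # invented duplicate
            (102, 1, 0, "textbox", "<em>ORCID</em> iD"),  # invented duplicate
        ],
    )
    return conn
```

Running `find_duplicate_fields(build_demo_db())` reports the one duplicated label together with its three ids, and leaves "Title" out because it appears only once.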
The solution was actually very simple. We simply needed to remove the duplicate entries and leave the entry in place that had the lowest ID field. We did a careful count on each of the higher ID duplicates to make sure that there weren’t a huge number of people who had put data into them and who might lose their profile information because they used the higher ID versions. We then deleted the duplicates.
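The count-then-delete approach described above can be sketched as follows. This is an illustration under stated assumptions rather than the script that was run: it uses an in-memory SQLite stand-in, the duplicate IDs 101 and 102 and the users 42 and 7 are invented, and `plan_cleanup` and `delete_duplicates` are hypothetical helper names.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """In-memory stand-in: one field duplicated twice, plus two data rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE wp_bp_xprofile_field (
            id INTEGER PRIMARY KEY, group_id INTEGER, parent_id INTEGER,
            type TEXT, name TEXT
        );
        CREATE TABLE wp_bp_xprofile_data (
            id INTEGER PRIMARY KEY, field_id INTEGER, user_id INTEGER, value TEXT
        );
        INSERT INTO wp_bp_xprofile_field VALUES
            (18, 1, 0, 'textbox', '<em>ORCID</em> iD'),
            (101, 1, 0, 'textbox', '<em>ORCID</em> iD'),
            (102, 1, 0, 'textbox', '<em>ORCID</em> iD');
        INSERT INTO wp_bp_xprofile_data VALUES
            (1, 18, 42, 'placeholder-a'),
            (2, 101, 7, 'placeholder-b');
        """
    )
    return conn


def plan_cleanup(conn: sqlite3.Connection) -> list[dict]:
    """For each duplicated name, keep the lowest id and count the data rows
    that point at every higher-id duplicate."""
    plan = []
    groups = conn.execute(
        "SELECT name, MIN(id) FROM wp_bp_xprofile_field "
        "GROUP BY name HAVING COUNT(*) > 1 ORDER BY name"
    ).fetchall()
    for name, keep_id in groups:
        extras = conn.execute(
            "SELECT f.id, COUNT(d.id) FROM wp_bp_xprofile_field AS f "
            "LEFT JOIN wp_bp_xprofile_data AS d ON d.field_id = f.id "
            "WHERE f.name = ? AND f.id <> ? GROUP BY f.id ORDER BY f.id",
            (name, keep_id),
        ).fetchall()
        for drop_id, data_rows in extras:
            plan.append(
                {"name": name, "keep": keep_id, "drop": drop_id, "data_rows": data_rows}
            )
    return plan


def delete_duplicates(conn: sqlite3.Connection, plan: list[dict]) -> int:
    """Delete the higher-id field rows in the plan; data rows are left alone."""
    with conn:  # commits on success, rolls back if an error is raised
        conn.executemany(
            "DELETE FROM wp_bp_xprofile_field WHERE id = ?",
            [(step["drop"],) for step in plan],
        )
    return len(plan)
```

Separating the plan from the deletion means the `data_rows` counts can be reviewed by a human before anything is removed. Note that the deletion only touches the fields table, so any data rows that pointed at a removed duplicate remain in place.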
What’s the takeaway lesson from this bug? I’m not entirely sure that there really is one. This was just one of those circumstances where one of the many plugins that we support acted in a way that we didn’t quite understand and we had to spend some time tracing through the code to work out precisely what was going on. On the other hand, every incident like this is an opportunity to evaluate our observability practices. Knowledge Commons is an incredibly complex piece of software. To do what it does and act as a scholarly communications bridge and networking platform, there are a huge number of customizations to WordPress. And then, on top of that, we are running WordPress core at a scale that most platforms can’t even touch. As a result, we are now working on more logging, greater insight into what might be running at any particular time period, and of course observability of our infrastructure status. This becomes challenging when third-party plugins such as BuddyPress do not have a level of logging that allows us to locate a bug quickly. The only option to improve this observability would be to fork all of these many third-party plugins, which is something we’re not keen to do.
In any case this should now be resolved. If anybody is still missing any profile data that they think they should have, please let us know. We are confident that no data have been lost, but it is possible that we may need to realign a small number of data fields with the, now unique, lower ID field entries.
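For the curious, realigning a data row simply means pointing it at the surviving lower ID. The sketch below shows one cautious way this might be done, assuming an in-memory SQLite stand-in, invented IDs and users, and a hypothetical helper `realign_data`. The SQL is written for SQLite and would likely need adapting for the MySQL-family database that WordPress uses.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """In-memory data table: user 7 only has a row on the invented duplicate
    field 101, while user 42 has rows on both field 18 and field 101."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE wp_bp_xprofile_data (
            id INTEGER PRIMARY KEY, field_id INTEGER, user_id INTEGER, value TEXT
        );
        INSERT INTO wp_bp_xprofile_data VALUES
            (1, 18, 42, 'placeholder-a'),
            (2, 101, 7, 'placeholder-b'),
            (3, 101, 42, 'placeholder-c');
        """
    )
    return conn


def realign_data(conn: sqlite3.Connection, keep_id: int, drop_id: int) -> int:
    """Move data rows from drop_id to keep_id, skipping any user who already
    has a row for keep_id so that nothing is silently overwritten.
    Returns the number of rows moved."""
    with conn:  # commits on success, rolls back if an error is raised
        cursor = conn.execute(
            """
            UPDATE wp_bp_xprofile_data
            SET field_id = ?
            WHERE field_id = ?
              AND user_id NOT IN (
                  SELECT user_id FROM wp_bp_xprofile_data WHERE field_id = ?
              )
            """,
            (keep_id, drop_id, keep_id),
        )
    return cursor.rowcount
```

In the demo data, `realign_data(conn, keep_id=18, drop_id=101)` moves user 7's row and leaves user 42's conflicting row alone, since deciding which of two competing values is correct is a job for a person rather than a query.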
Technical Development Work Updates
Meanwhile, technical development continues apace and we hope to have some exciting new features in the not-too-distant future. The main piece of code on which I am working is rewriting our login and identity management stack. The new IDMS will allow anybody with an institutional account at a university to use that to log into their Knowledge Commons account. It will also allow independent scholars to use ORCID, Google, Microsoft and other third-party login providers. So we have thought carefully in this about what we can do to give the greatest value to the community, making sure that this includes those without an institutional affiliation. Independent scholars are extremely important to us.
There are also crucial updates happening to KC Works (our repository system), including a stats dashboard on which Ian Scott has been working. This is a major improvement to the reporting of statistics that is in core InvenioRDM and we are hoping that it will be merged back into the core, so that we can improve the projects on which we’re building and give this functionality to others. We always like to be able to contribute changes that we’ve made back upstream rather than hoarding them. This seems more in line with the ethos of what we do.
Another member of our team, Grant Eben, has been working on a set of plugins for WordPress that allow users to display a bibliography of the works they have in either ORCID or KC Works and present that in a formatted way according to particular academic style sheets such as APA, MLA, MHRA, etc. This has applications in teaching, in the presentation of a research profile, and in other research contexts. The work on the ORCID side of this was funded by ORCID themselves, and they seem pleased with the results, which is great for our team. Once again, this is a case of us developing something in the open and giving it back to the community for the benefit of everyone. The plugins are called Linked Open Profiles (for ORCID) and KC Works (for KC Works and possibly other InvenioRDM-based systems).
Meanwhile, our head of infrastructure, Dimitris Tzouris, has been working like a trooper to try to battle the bot spam that we mentioned earlier, but also to keep the platform’s infrastructure in good shape. There is a false assumption that one can simply set up a platform on some infrastructure and then leave it running and all will be well. This is not the case.
Often we find ourselves running just to stand still, solely to maintain the platform’s infrastructure. There are constant needs for upgrades to operating systems and servers. There are needs for upgrades to hardware so that we are running on the most efficient and cost-effective solutions for our hosting. And these changes can affect how our software runs.
Yes, of course, we use containerization to get around the dependency problem by controlling the containerized environment. However, sometimes there are unavoidable conflicts between our software and the un-containerized environment that we just can’t dodge, and we don’t know when an upgrade might cause that. Hence, we have a test environment where these changes are staged first, and we then use automated and manual testing to see whether or not everything is working as it should. But this is no small job.
Finally, behind the scenes, Mike Thicke, our resident elder wizard of the platform, has worked on numerous smaller bug fixes, worked with Dimitris on infrastructure reconfigurations, and generally been a source of wisdom for us when we have to battle problems.
In any case, I hope that gives you an idea of what we’re working on at present. We’ve got some other exciting updates including fresh grant-funded projects that we’ll be writing about in the coming days.
