Small warning- this is a long read (~2400 words). This post will cover some notable sting operations and what they intended to achieve. In another post, I’ll talk about people’s responses, and about the potential downsides of creating stings.
What’s an unexpected connection between Seinfeld, chocolate, Mad Libs and Star Wars? All have inspired scientific sting operations – missions designed to expose flaws in how scientific research is published and publicised.
Scientific stings are interesting to me because they create a rare opportunity to talk to people about scientific publishing. Everyone loves a takedown story, and the familiar setup and resolution of a sting can bring the unfamiliar world of creating scientific research closer to non-scientists.
Although there have been thousands of different sting operations, some have made more waves than others. Many target individual bad actors such as corrupt journals or conferences which focus on profit rather than knowledge, while others have taken aim at larger enemies including news media and even Google. Today I’ll be talking about how some of the most well-known sting operations were developed.
The “Sokal Affair”
Testing: Whether editors would accept nonsense if it used the “right” buzzwords. (They would).
In the 1990s, many Western academics fought with each other about how science worked. Postmodernists argued that science was not built solely from facts; theories and scientists were influenced by political and social factors like class, gender, and race. Their opponents, scientific realists, claimed that postmodernism rejected objectivity and the scientific method.
Social Text, a journal of cultural studies, was on the postmodernists’ side: realists joked that it would publish anything which used the “right” postmodern buzzwords. Physicist Alan Sokal tested this theory by submitting “Transgressing the Boundaries: Towards a Transformative Hermeneutics of Quantum Gravity”, a jargon-filled parody of postmodern thought. His article used the language of cultural studies to disguise nonsensical ideas, including that gravity and reality itself were ideas invented solely by society’s beliefs.
Social Text published Transgressing… in early 1996, without realising it was a hoax until Sokal explained his experiment in another journal. The editors claimed they published Transgressing… because Sokal was an academic authority; to them it wasn’t a parody but a physicist’s flawed attempt to connect with postmodern philosophy. Social Text also called Sokal unethical for deceiving them.
…we engaged in speculation about his intentions, and concluded that this article was the earnest attempt of a professional scientist to seek some kind of affirmation from postmodern philosophy for developments in his field. His adventures in PostmodernLand were not really our cup of tea.
For Sokal, their response proved him correct. Social Text had published Transgressing… without asking anyone who specialised in physics to check it; they included the article based on its “correct” appearance and Sokal’s status, rather than on its (nonexistent) merits. Ultimately, Sokal had the best response:
anyone who believes that the laws of physics are mere social conventions is invited to try transgressing those conventions from the windows of my apartment. (I live on the twenty-first floor.)
In this case, no harm was done. Social Text retracted the article, and a few news stories were published before attention faded. Since the “Sokal Affair”, other researchers have conducted similar stings. Some repeated Sokal’s approach and submitted jargon-filled papers with nonsense conclusions; others removed the writer altogether.
Testing: Whether certain academic conferences performed quality checking on submitted articles. (They didn’t).
In 2005, three MIT computer science students grew suspicious of their inboxes. They (like many others) had received floods of emails inviting them to submit their research to academic conferences. However, many of these conferences seemed suspect: invitees wondered whether they existed to profit from people’s entry fees rather than to promote real research.
The students- Jeremy Stribling, Dan Aguayo, and Max Krohn- developed a program called SCIgen which would automatically generate grammatically correct but meaningless sentences, like a computer science version of the fill-in-the-blanks game Mad Libs. SCIgen’s first paper, Rooter: A Methodology for the Typical Unification of Access Points and Redundancy, was accepted by the suspect conference, proving that the conference did not check submitted papers.
We consider an algorithm consisting of n semaphores. Any unproven synthesis of introspective methodologies will clearly require that the well-known reliable algorithm for the investigation of randomized algorithms by Zheng is in Co-NP; our application is no different.
Meaningless text from Rooter.
SCIgen is freely available online and still used to create stings, mostly against lower-quality journals which are suspected of not reviewing or proof-reading submissions. In 2014, more than 120 different SCIgen-written papers were removed from the catalogues of publishing giants Springer and IEEE. Their finder, Cyril Labbé, also developed a SCIgen detection website where anyone can upload suspect papers and compare them to SCIgen’s vocabulary.
Computer-generated papers also have a spiritual successor: papers written by autocorrect. Christoph Bartneck, a specialist in human-computer interaction with no knowledge of physics, was invited to submit an article for the 2016 International Conference on Atomic and Nuclear Physics. Bartneck, unable to write about physics, used his iPhone’s next-word-prediction and autocorrect functions to do the work. Three hours after Bartneck submitted his article, under the alias Iris Pear, it was accepted.
Another way researchers and skeptics have tested suspect journals is by using articles that human readers would instantly recognise as pop-culture references.
Recently, Star Wars has done the honours. Pseudonymous blogger Neuroskeptic developed a paper about “midi-chlorians”, the sentient microscopic creatures which live inside cells, connecting living beings to the Force. Using the names “Lucas McGeorge” and “Annette Kin”, Neuroskeptic submitted the paper to nine suspect journals. Three journals published it, while another requested a $360 publication fee. (The journals have since deleted the paper, but it is still available on Scribd). One journal also offered Lucas McGeorge a job.
A spoof medical case study based on uromycitisis, the fictional condition featured in a Seinfeld episode, has also previously caught out a suspect journal.
For entertainment value, and news coverage, these papers are interesting. However, these are functionally identical to the SCIgen/autocorrect papers, so I’ll move on to some larger-scale stings.
Who’s Afraid of Peer Review?
Testing: Whether suspect journals would notice a disastrously flawed experiment. (Most of them didn’t).
John Bohannon, editor at Science magazine, carried out a sting on over 300 journals to see if they actually reviewed submitted papers. He did this by designing an experiment so glaringly flawed that it couldn’t produce meaningful results. In his words:
Any reviewer with more than a high-school knowledge of chemistry and the ability to understand a basic data plot should have spotted the paper’s short-comings immediately.
Bohannon created a computer program which wrote 304 papers in the same format: “molecule a, taken from lichen b, stops cancer cell c growing”. The papers were identical apart from their words for a, b, and c, which were taken from databases of molecules, lichen, and cancer cells. Each paper was sent to a different journal under an individual pseudonym which was randomly generated from a database of common African names. (Bohannon based all fake authors in universities from Global South countries, so their lack of web presence would not alert curious editors who attempted to search for them).
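For the curious, the fill-in-the-blanks idea can be sketched in a few lines of Python. This is only an illustration of the general technique, not Bohannon’s actual program: the sentence template, the function name and the placeholder lists below are all made up for the example, and stand in for the real databases of molecules, lichen and cancer cells.

```python
from itertools import product

# Hypothetical template; the real papers were full manuscripts, not one sentence.
TEMPLATE = "{molecule}, extracted from {lichen}, inhibits the growth of {cell_line} cancer cells."

# Placeholder entries standing in for the real databases (assumed, not from the sting).
molecules = ["molecule-1", "molecule-2"]
lichens = ["lichen-A", "lichen-B"]
cell_lines = ["cell-line-X", "cell-line-Y"]


def generate_claims(molecules, lichens, cell_lines):
    """Fill the template once for every (molecule, lichen, cell line) combination."""
    return [
        TEMPLATE.format(molecule=m, lichen=l, cell_line=c)
        for m, l, c in product(molecules, lichens, cell_lines)
    ]


claims = generate_claims(molecules, lichens, cell_lines)
print(len(claims))  # 2 x 2 x 2 = 8 combinations
print(claims[0])
```

Even with tiny lists the number of distinct "papers" multiplies quickly, which is roughly why one template could plausibly cover hundreds of submissions.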
255 journals responded to Bohannon. 60% of them accepted or rejected their paper without reviewing it at all – i.e. without performing the fundamental role of a journal. Although 40% (106 journals) made some attempt to review their paper, about 70 of those journals accepted the paper despite its fatal flaws. This included journals owned by major publishers such as Sage and Elsevier.
Malcolm Lader, Editor-in-Chief of the Sage journal which fell for Bohannon’s paper, apologised for the journal’s performance. He also criticised Bohannon’s entire sting, saying:
“An element of trust must necessarily exist in research including that carried out in disadvantaged countries. Your activities here detract from that trust.”
The Fake Scientist
Testing: How easily the numbers used on Google Scholar to measure the impact of researchers’ work can be distorted. (Very).
Before developing tools to detect computer-generated papers, Cyril Labbé had previously used them to carry out a sting. However, Labbé wasn’t attempting to expose journals this time; he was instead targeting Google Scholar. Google Scholar is a search engine which links to over 160 million academic publications, book chapters, and dissertations, as well as legal cases and patents. It’s also a way for researchers to keep an eye on their all-important H-index.
A researcher’s H-index number represents a combination of quality and quantity, in theory. Explaining it sounds like solving an algebra problem. “This researcher has published h papers in the last year, and each of those has been cited h times. Find the value of h for this researcher.” If researcher A published 10 papers in a year, and each was only cited once, their H-index would be 1. If researcher B published 5 papers, and all of them were cited at least 5 times, their H-index would be 5.
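If the algebra framing doesn’t click, here is the same idea as a short Python sketch. It uses the usual definition (the largest h such that at least h papers have h or more citations each); the function name and the citation counts for researcher B are my own choices for illustration.

```python
def h_index(citations):
    """Return the largest h such that at least h papers have h or more citations."""
    ranked = sorted(citations, reverse=True)  # most-cited paper first
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank  # the top `rank` papers all have at least `rank` citations
        else:
            break
    return h


researcher_a = [1] * 10          # 10 papers, each cited once
researcher_b = [5, 6, 7, 8, 9]   # 5 papers, each cited at least 5 times

print(h_index(researcher_a))  # 1
print(h_index(researcher_b))  # 5
```

Notice that researcher A’s extra papers do nothing for the score, and researcher B’s most-cited paper counts no more than the least-cited one.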
Although an H-index can be useful, it’s similar to a baseball batting average or a Kill/Death ratio in Call of Duty; people can falsely assume that one number represents everything you need to know. However, representing an individual’s career through just one number opens up plenty of opportunities for that number to be gamed.
Labbé created a scientist, Ike Antare, and used SCIgen to “write” a set of 102 different computer science papers under Antare’s name. Each of these papers referenced the entire set, plus one real paper, creating a network of self-referencing articles. When the papers were waved through low-quality journals, the connected web of citations swelled Antare’s H-Index. Antare held the 21st highest H-Index rank in the world; in that system, he was more famous than Einstein (36th place).
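To see why a closed loop of citations is so effective, here is a rough Python sketch. It is an idealised model, not a reconstruction of Labbé’s work: it assumes every one of the 102 papers gets indexed and every citation is counted, and the function names are mine. The score Antare actually received would depend on what Google Scholar picked up.

```python
def h_index(citations):
    """Largest h such that at least h papers have h or more citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank
        else:
            break
    return h


def citation_ring(n_papers, count_self_citation=False):
    """Citation counts when every paper in a set cites every paper in that set.

    Assumption: a paper citing itself is ignored unless count_self_citation is True.
    """
    per_paper = n_papers if count_self_citation else n_papers - 1
    return [per_paper] * n_papers


ring = citation_ring(102)
print(h_index(ring))  # 101 in this idealised model
```

In other words, a ring of n papers can push an H-index to roughly n without a single outside reader ever citing the work.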
Chocolate Weight-Loss Study
Testing: Whether science journalists and news media would critique or question reports about scientific studies. (They didn’t).
A familiar face returns here, as this sting was carried out by John Bohannon. For a documentary about bad science in the diet industry, Bohannon (under the pseudonym Johannes) helped carry out a deliberately terrible study to see whether people would uncritically accept its results.
To be clear, the study was not fake science- they recruited real participants, and collected and analysed real data. The study was instead mediocre science, designed to make statistical changes -“right” answers- far too easy to obtain, and also meaningless. The researchers used a tiny sample of 16 people, split into three groups. This is already an alarm bell- any change found by comparing groups of five people is far more likely to be chance than meaningful. Even though the study was based on comparing diets, the control group was not asked to record what they ate; this meant there was no way to tell how different the experimental group’s diets were from the control group, so no way to be certain that eating chocolate caused any effects.
The study measured eighteen variables, from logical choices such as weight and cholesterol levels, to more niche choices like sleep quality and happiness. Measuring so many variables across so few people was faulty science, because it increased the likelihood of finding something apparently meaningful solely by chance. It meant the team were almost guaranteed to find some kind of variation between the groups: they did the research equivalent of shooting holes randomly across a wall then drawing a target around some of them.
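A back-of-the-envelope calculation shows how much measuring eighteen things stacks the deck. The sketch below assumes each variable is tested independently at the conventional 5% significance threshold; both of those are my assumptions for illustration (real measurements like weight and cholesterol are correlated), so treat the result as a rough guide rather than the study’s exact odds.

```python
def chance_of_false_positive(n_tests, alpha=0.05):
    """Probability that at least one of n independent tests looks 'significant'
    purely by chance, when each has a false-positive rate of alpha."""
    return 1 - (1 - alpha) ** n_tests


one_variable = chance_of_false_positive(1)
eighteen_variables = chance_of_false_positive(18)

print(f"{one_variable:.0%}")        # 5%
print(f"{eighteen_variables:.0%}")  # 60%
```

Under these assumptions, a study with nothing real to find still has better-than-even odds of producing at least one headline-ready "result".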
Bohannon submitted the study to a few low-quality journals. Once they accepted the study, he created a press release to get media attention for it. News stories started rolling out. According to Bohannon, few news sites asked him any questions about the article; sites that did ask questions were not critical or skeptical. Most news sites echoed the press release and uncritically proclaimed that chocolate was now a miracle weight loss tool, without getting any outside experts or fact-checkers to read the paper.
Testing: Whether suspect journals would accept fake job applications, and list fake staff members. (They would).
Previous stings established that suspect journals often accept articles without carrying out any checks. But are they more stringent about their staff?
In 2015, researchers from the University of Wroclaw in Poland created a CV for Anna O. Szust, complete with degrees, book chapters and academic social media accounts. All of these were fake, as was Anna herself. For Polish speakers, her name was an unsubtle clue- Oszust means “a fraud”.
The researchers submitted Anna’s CV, with an application for an editor role, to 360 open-access journals. One-third of these journals were known to be suspect, while two-thirds were from two whitelists of presumed-legitimate journals. One whitelist, the JCR, passed the sting successfully- all JCR journals either rejected or ignored Anna O. Szust’s application. The other whitelist, the DOAJ, fared less well- 8 of the 120 journals made her an editor.
For the already-suspect journals, it was a different story. One-third of them appointed Anna as an editor. Some instantly accepted her application, while others held her role behind a paywall, asking for financial donations to secure her appointment. A few journals even offered Anna opportunities to start her own spin-off journal, and share in their profits.
Looking at these examples, it’s easy to feel demoralised about how easily stings have been carried out and how many organisations fell for them. However, successful stings aren’t the whole story, and every sting has faced critics. Carrying out stings has its own ethical problems; although they are entertaining news fodder, they may not be the best way to understand or solve problems within science. Critics argue that the distrust they create may do more harm than good.
In the next post I’ll discuss some of the criticisms levelled at sting operations, as well as the responses given by scientists and journals. I’ll also try to look at the deeper question of what these stings can really tell us about how science currently works.