Star Wars and Dr Fraud: A history of scientific sting operations.


Small warning: this is a long read (about 2,400 words). This first post will cover the basics of some notable sting operations, such as who designed them and why. Later, I’ll talk more about people’s responses, and about the potential downsides of creating stings.

What’s an unexpected connection between Seinfeld, chocolate, Mad Libs and Star Wars? All have inspired scientific sting operations: missions designed to expose flaws in how scientific research is published and publicised.

Scientific stings are interesting to me because they create a rare opportunity for conversations about scientific publishing to take place. Everyone loves a takedown story, and the familiar setup and resolution of a sting can bring the unfamiliar world of creating scientific research closer to non-scientists.

While there have been thousands of different sting operations, some have made more waves than others. Many have targeted bad actors within academia, such as corrupt journals and conferences which focus on profit rather than knowledge, while others have taken aim at news media and even Google. Today I’ll be talking about the stings which have received the most media attention and made the most impact.

The “Sokal Affair”

Testing: Whether editors would accept nonsense if it used the “right” buzzwords. (They would).

In the 1990s, many (mostly American) academics were caught up in arguments about the nature of science. Postmodernists argued that science was not built solely from facts but influenced by political and social factors like class, gender, and race. Their opponents, scientific realists, claimed that pure scientific facts existed and that postmodernism devalued objectivity and the scientific method.

Social Text, a journal of cultural studies, was on the side of postmodernism: realists claimed it would publish anything which used the “right” postmodern buzzwords. Physicist Alan Sokal tested this theory by submitting “Transgressing the Boundaries: Towards a Transformative Hermeneutics of Quantum Gravity”, a jargon-filled parody of postmodern thought. His article used the language of cultural studies to disguise nonsensical ideas, including that quantum gravity and reality itself were social constructions: ideas invented solely by society’s beliefs.

Social Text published Transgressing… in early 1996, not realising it was a hoax until Sokal revealed his experiment in another journal. The editors claimed they published Transgressing… because Sokal was an academic authority; to them it was not a parody but a physicist’s genuine, if poor, attempt to connect with postmodern philosophy. Social Text also argued that Sokal acted unethically by deceiving them.

…we engaged in some speculation about his intentions, and concluded that this article was the earnest attempt of a professional scientist to seek some kind of affirmation from postmodern philosophy for developments in his field. His adventures in PostmodernLand were not really our cup of tea.

For Sokal, their response proved him correct. Social Text had published Transgressing… without asking anyone who specialised in physics to read it; the editors included the article based on its “correct” appearance and Sokal’s academic status, rather than its (nonexistent) merits. Ultimately, Sokal himself had the best response:

anyone who believes that the laws of physics are mere social conventions is invited to try transgressing those conventions from the windows of my apartment. (I live on the twenty-first floor.)

In this case, no harm was done. Social Text retracted the article, and a few news stories were published before attention faded away. However, since the “Sokal Affair”, other researchers have conducted similar sting operations against academic journals. Some of these repeated Sokal’s approach and submitted jargon-filled papers with nonsense conclusions. Others removed the writer altogether.

Computer-Generated Texts

Testing: Whether some academic conferences did any quality checking on submitted articles. (They didn’t).

In 2005, three MIT computer science students grew suspicious of their inboxes. They (and many others) had received floods of emails inviting them to submit their research to academic conferences. However, many of these conferences were suspect: researchers questioned whether they were run to profit from entry fees rather than to promote real research.

The students, Jeremy Stribling, Dan Aguayo, and Max Krohn, developed a program called SCIgen which would automatically generate grammatically correct yet meaningless sentences, like a computer science version of the fill-in-the-blanks party game Mad Libs. SCIgen’s first paper, Rooter: A Methodology for the Typical Unification of Access Points and Redundancy, was accepted by one of the suspect conferences, proving that the doubts over its credentials were justified. This sting was widely reported in both science and mainstream news.
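
SCIgen builds its text from a hand-written context-free grammar. As a rough illustration of the idea, here is a toy grammar of my own invention (not SCIgen’s actual rules): a few nested substitutions and a random choice at each step are enough to produce sentences that parse but mean nothing.

```python
import random

# A toy context-free grammar in the spirit of SCIgen / Mad Libs.
# These rules are invented for illustration; SCIgen's real grammar
# is far larger and tuned to computer science jargon.
GRAMMAR = {
    "SENTENCE": [["We consider", "ADJ", "METHOD", "for the study of", "TOPIC"]],
    "ADJ": [["a scalable"], ["an extensible"], ["a probabilistic"]],
    "METHOD": [["heuristic"], ["framework"], ["methodology"]],
    "TOPIC": [["randomized algorithms"], ["write-back caches"], ["the lookaside buffer"]],
}

def expand(symbol):
    """Recursively replace non-terminals with a randomly chosen expansion."""
    if symbol not in GRAMMAR:
        return symbol  # terminal text: return as-is
    return " ".join(expand(part) for part in random.choice(GRAMMAR[symbol]))

print(expand("SENTENCE") + ".")
# e.g. "We consider a probabilistic framework for the study of write-back caches."
```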

We consider an algorithm consisting of n semaphores. Any unproven synthesis of introspective methodologies will clearly require that the well-known reliable algorithm for the investigation of randomized algorithms by Zheng is in Co-NP; our application is no different.

Meaningless text from Rooter.

SCIgen is freely available online and still used to create stings, mostly against lower-quality journals which are suspected of not reviewing or proof-reading submissions. In 2014, over 120 SCIgen-written papers were removed from conference proceedings published by Springer and the IEEE. (The finder, Cyril Labbé, also developed a SCIgen detection website, where anyone could upload suspect papers and compare them to SCIgen’s vocabulary).
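
The detection side is conceptually simple. The sketch below is not Labbé’s actual detector, just a minimal illustration of comparing a paper’s vocabulary against words SCIgen tends to produce; the word list here is invented for this post.

```python
import re

# Hypothetical shortlist of SCIgen-flavoured words, for illustration only.
SCIGEN_VOCAB = {"semaphores", "methodologies", "introspective", "lookaside",
                "redundancy", "epistemologies", "superblocks", "flip-flop"}

def scigen_overlap(text):
    """Fraction of a paper's distinct words that appear in the SCIgen word list."""
    words = set(re.findall(r"[a-z\-]+", text.lower()))
    return len(words & SCIGEN_VOCAB) / max(len(words), 1)

sample = "Any unproven synthesis of introspective methodologies will require semaphores."
print(f"{scigen_overlap(sample):.0%}")  # a high overlap flags the text as suspect
```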

Computer-generated papers also have a spiritual successor: papers written by autocorrect. Christoph Bartneck, a specialist in human-computer interaction with no knowledge of physics, was invited to submit an article to the 2016 International Conference on Atomic and Nuclear Physics. Bartneck, unable to write about physics, recruited his iPhone’s next-word-prediction and autocorrect functions to do the work instead. Three hours after Bartneck submitted his article, under the alias Iris Pear, it was accepted.

Pop-Culture Papers

Another way researchers and skeptics have tested suspect journals is through articles which human readers would instantly recognise as being pop-culture references.

Recently, Star Wars has done the honours. Pseudonymous blogger Neuroskeptic developed a paper about “midi-chlorians” (the sentient microscopic creatures which live inside cells, connecting living beings to the Force), written under the author names “Lucas McGeorge” and “Annette Kin”. Neuroskeptic submitted the paper to nine different suspect journals; three published it, while another requested a $360 publication fee first. (All of the journals have since deleted the paper, but it is available on Scribd.) One of the journals also offered Lucas McGeorge a job.

A spoof medical case study based on uromycitisis, the fictional condition featured in a Seinfeld episode, has also previously caught out a suspect journal.

These papers are entertaining and generate news coverage, but functionally they do nothing different from the SCIgen and autocorrect papers, so I’m going to move on to some larger-scale stings.

Who’s Afraid of Peer Review?

Testing: Whether suspect journals would notice a disastrously flawed experiment. (Most of them didn’t).

John Bohannon, a journalist at Science magazine, carried out a sting on over 300 journals to see if they actually reviewed submitted papers. He did this by designing an experiment so glaringly flawed that it couldn’t produce any meaningful results. In his words:

Any reviewer with more than a high-school knowledge of chemistry and the ability to understand a basic data plot should have spotted the paper’s shortcomings immediately.

Bohannon created a computer program which wrote 304 papers in the same format: “molecule a, taken from lichen b, stops cancer cell c growing”. All were identical apart from their words for a, b, and c, which were taken from databases of molecules, lichens, and cancer cells. The papers were each sent to a different journal using individual pseudonyms, all randomly generated from a database of common African names and surnames. (Bohannon placed all the fake authors at universities in Global South countries, so their lack of web presence would not alert any curious editors who attempted to search for them.)
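
The generation step works like a larger-scale Mad Libs. Here is a minimal sketch of that kind of fill-in-the-blanks assembly; the molecule, lichen, and cell-line lists below are stand-ins I have picked for illustration, not Bohannon’s actual databases.

```python
import itertools
import random

# Stand-in lists for illustration; Bohannon drew from real databases.
molecules = ["usnic acid", "atranorin", "parietin"]
lichens = ["Cladonia rangiferina", "Xanthoria parietina"]
cell_lines = ["HeLa", "MCF-7", "A549"]

TEMPLATE = ("We report that {molecule}, extracted from the lichen {lichen}, "
            "inhibits the growth of {cells} cancer cells.")

# Every combination of a, b and c yields another "unique" central claim.
claims = [TEMPLATE.format(molecule=m, lichen=l, cells=c)
          for m, l, c in itertools.product(molecules, lichens, cell_lines)]

print(len(claims))           # 3 * 2 * 3 = 18 variant papers from these tiny lists
print(random.choice(claims))
```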

255 of the journals responded to Bohannon. 60% of them accepted or rejected the paper without reviewing it at all, i.e. without performing the fundamental role of a journal. Although the remaining 106 journals (about 40%) made some attempt at review, around 70 of them still accepted the paper despite its fatal flaws. This included journals owned by major publishers such as Sage and Elsevier.

Malcolm Lader, Editor-in-Chief of the Sage journal which fell for Bohannon’s paper, apologised for the journal’s performance but also criticised Bohannon’s entire operation, saying:

“An element of trust must necessarily exist in research including that carried out in disadvantaged countries. Your activities here detract from that trust.”

The Fake Scientist

Testing: How easily the numbers used on Google Scholar, which measure the impact of researchers’ work, can be distorted. (Very).

Before developing tools to detect computer-generated papers, Cyril Labbé had used SCIgen himself to carry out a sting. This time, however, Labbé wasn’t attempting to expose journals; he was targeting Google Scholar. Google Scholar is a search engine which links to over 160 million academic publications, book chapters, and dissertations, as well as legal cases and patents. It’s also a way for researchers to keep an eye on their all-important H-index.

A researcher’s H-index represents a combination of quality and quantity, in theory. Explaining it sounds like solving an algebra problem: “This researcher has published h papers which have each been cited at least h times. Find the largest value of h for this researcher.” If researcher A published 10 papers, but each was only cited once, then their H-index would be 1. If researcher B published 5 papers, and all of them were cited at least 5 times, their H-index would be 5.
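
For the curious, here is a minimal sketch of that calculation (the two example citation lists match researchers A and B above):

```python
def h_index(citations):
    """Largest h such that at least h papers have at least h citations each."""
    h = 0
    for rank, cites in enumerate(sorted(citations, reverse=True), start=1):
        if cites >= rank:
            h = rank
        else:
            break
    return h

print(h_index([1] * 10))  # Researcher A: 10 papers, 1 citation each -> 1
print(h_index([5] * 5))   # Researcher B: 5 papers, 5 citations each -> 5
```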

While an H-index can be useful, it’s similar to a baseball batting average or a Kill/Death ratio in Call of Duty; people falsely assume that one number can tell you everything you need to know. As Labbé’s experiment showed, representing an individual’s career through just one number opens up plenty of opportunities for that number to be gamed.

Labbé created a scientist, Ike Antkare, and used SCIgen to “write” a set of 102 different computer science papers under Antkare’s name. Each of these papers referenced the entire set (plus one real paper), creating a network of self-referencing articles. Once the papers were posted online and picked up by Google Scholar, the connected web of citations swelled Antkare’s H-index. Antkare held the 21st-highest H-index in the world; by that measure, he was more famous than Einstein (36th place).
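
A toy model shows why this works so well. If every one of 102 fake papers cites the other 101, then every paper ends up with 101 citations, and the H-index defined above rockets accordingly. (This ignores how Google Scholar actually counted things, which gave a somewhat lower but still spectacular figure.)

```python
# Toy model of a self-citing cluster: 102 fake papers, each citing
# the other 101 (real indexing details differed, but the idea holds).
n_fake = 102
citations_per_paper = [n_fake - 1] * n_fake  # every paper collects 101 citations

def h_index(citations):
    """Largest h such that at least h papers have at least h citations each.
    (Redefined here so the snippet stands alone.)"""
    ranked = sorted(citations, reverse=True)
    return max((rank for rank, c in enumerate(ranked, 1) if c >= rank), default=0)

print(h_index(citations_per_paper))  # 101 -- instant "world-class" researcher
```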

Chocolate Weight-Loss Study

Testing: Whether science journalists and news media would critique or question reports about scientific studies. (They didn’t).

A familiar face returns here, as this sting was also carried out by John Bohannon. As part of a documentary about bad science in the diet industry, Bohannon (under the pseudonym “Johannes Bohannon”) helped carry out a deliberately terrible study to see how easily people would uncritically accept its results.

To be clear, the study carried out was not fake science: they recruited real participants, and collected and analysed real data. The study was instead mediocre science, designed to make statistically significant changes (the “right” answers) far too easy to obtain, and also meaningless. The researchers used a tiny sample of 16 people, split into three groups. This is already an alarm bell: any change found by comparing groups of around five people is far more likely to be chance than meaningful. And even though the study was based on comparing diets, the control group was not asked to record what they ate; this means there was no way to tell how different the experimental group’s diet was from the control group’s, so no way to be sure that eating chocolate caused any effect.

They also measured a large number of variables, from logical choices such as weight and cholesterol levels to more niche choices like sleep quality and happiness. Measuring so many variables (18) across so few people (16) is faulty science, because it increases the risk of finding something apparently meaningful solely by chance. It meant the team were almost guaranteed to find some kind of variation between the groups: they did the research equivalent of shooting holes randomly across a wall and then drawing a target around some of them.
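
A rough back-of-the-envelope calculation shows the size of the problem. Treating the 18 measurements as independent tests at the conventional p < 0.05 threshold (a simplification, since real measurements are correlated), the chance of at least one spurious “significant” finding is already around 60%:

```python
# Chance of at least one false positive across 18 independent tests
# at the usual p < 0.05 threshold (a simplification of the real study).
alpha = 0.05
n_tests = 18
p_at_least_one = 1 - (1 - alpha) ** n_tests
print(f"{p_at_least_one:.0%}")  # about 60%
```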

Bohannon submitted the study to a few low-quality journals. Once it was accepted, he created a press release to get media attention for the story. News stories started rolling out. According to Bohannon, few news sites asked him any questions about the article, and those which did cover it did not request any details about the methods or study setup. Most news sites echoed the press release, uncritically proclaiming that chocolate was now a miracle weight-loss tool without getting any outside experts or fact-checkers to read the paper.

“Dr Fraud”

Testing: Whether suspect journals would accept fake job applications, and list fake staff members. (They would).

Previous stings established that suspect journals often accept articles without carrying out any checks. But would they be more stringent about their staff?

In 2015, researchers from the University of Wroclaw in Poland created a CV for Anna O. Szust, complete with degrees, book chapters and academic social media accounts. All of these were fake, as was Anna herself. For Polish speakers, her name was an unsubtle clue: “Oszust” means “a fraud” in Polish.

They submitted Anna’s CV, with an application for an editor role, to 360 different open-access journals. A third of these journals were known to be suspect, while the other two-thirds came from two different whitelists of presumed-legitimate journals. One whitelist, the JCR (Journal Citation Reports), passed the sting successfully: all of the JCR journals either rejected or ignored Anna O. Szust’s application. The other whitelist, the DOAJ (Directory of Open Access Journals), fared less well, as 8 of its 120 journals made her an editor.

For the already-suspect journals, it was a different story. One-third of the suspect journals appointed Anna as an editor. Some instantly accepted her application, while others placed her role behind a paywall, asking for financial donations to secure her appointment. A few journals even offered Anna the opportunity to start her own spin-off journal and share in their profits.

What’s Next?

Looking at these examples, it’s easy to feel demoralised about how easily stings have been carried out and how many organisations fell for them. However, successful stings aren’t the whole story, and every sting has faced critics. Carrying out stings has its own ethical problems; while they are entertaining news fodder, they might not be the best way to understand or solve problems within science.

In the next post I’ll look at some of the criticisms levelled at sting operations, as well as the responses given by scientists and journals. I’ll also try to look at the deeper question of what these stings can really tell us about how science currently works.

