A machine learning model that can detect paper mill publications should be incorporated into medical journal manuscript screening processes, researchers say.
Almost 10% of cancer papers in medical journals are flagged as coming from paper mills, with China the most common source, a study has found.
Published in the BMJ, the study used a machine learning model to distinguish paper mill publications from genuine cancer research articles, and to screen the cancer research literature to assess the prevalence of papers that have textual similarities to paper mill papers.
When applied to the cancer research literature, the model flagged 261,245 of 2.7 million papers (9.87%).
More than 170,000 cancer papers affiliated with Chinese institutions were flagged, accounting for 36% of Chinese cancer research articles. Flagged papers were overrepresented in fundamental research and in gastric, bone, and liver cancer.
Flagged papers were seen across most medical journal publishers, including in the top 10% of journals by impact factor. However, a small group of medical publishers had higher proportions of flagged papers, with companies such as Verduci Editore, International Scientific Literature, Frontiers and Baishideng having more than 10% of papers flagged.
Flagged papers increased sharply from 1999 before reaching a plateau in 2022.
The study authors said the higher proportion of flagged papers in gastric and liver cancer research may partly be explained by the high prevalence of these cancers in China.
“Their marked overrepresentation among misidentified cell lines—25% and 15% of all such lines, respectively—is striking,” they wrote.
“Given that some misidentified cell lines, such as BGC-823 and BEL-7402, appear almost exclusively in publications from Chinese institutions, this pattern may also reflect vulnerabilities exploited by paper mills when popular research topics are targeted. Furthermore, this pattern could result from inertia because early templates were reused and adapted repeatedly in these domains.”
They said the rise in the percentage of flagged papers in high impact journals suggested that paper mill papers were not just a problem for low impact journals.
“The concurrent increase in impact factors and the spread of flagged papers suggest that both phenomena may stem from the pressures of the publish-or-perish culture. The increase in flagged papers in high impact factor journals highlights an important limitation of using impact factors as proxies for research quality,” they wrote.
The authors said their model could be deployed by medical publishers to screen submitted cancer-related manuscripts for paper mill involvement. The model had already been integrated into the online submission systems of three journals from a major publisher, they noted.
However, they warned that paper mill publishers would be quick to react and innovate as detection methods such as this model threatened their income. The release of ChatGPT and the rise of generative AI might further blur the boundaries between genuine and fabricated texts, rendering future automated detection of fraudulent features more challenging, they predicted.
“While efforts to combat paper mills may evolve into an arms race, this problem has reached an unacceptable scale. Inaction risks allowing paper mills to spread further, potentially compromising entire journals and publishers—as already seen in the case of Hindawi,” they concluded.
