Research Integrity Risk Index
The Research Integrity Risk Index (RI2): A Composite Metric for Detecting Risk Profiles
The RI2 is the first metric explicitly designed to profile research integrity risks using empirically grounded, transparent indicators. Unlike conventional rankings that reward research volume and citation visibility, RI2 shifts focus toward integrity-sensitive metrics that are resistant to manipulation and bibliometric inflation. In its current form, RI2 comprises two primary components:
- Retraction Risk: Measures the extent to which a university’s research portfolio includes retracted articles, particularly those retracted for data fabrication, plagiarism, ethical violations, authorship or peer review manipulation, or serious methodological errors (Fang et al., 2012; Ioannidis et al., 2025). It is calculated as the number of retractions per 1,000 articles over the most recent two full calendar years before the last (e.g., 2022-2023 for an analysis conducted in 2025), normalizing for research output and time-lag effects. Elevated retraction rates may reflect weaknesses in research oversight and institutional culture. The analysis used retraction data from three databases to evaluate institutional vulnerability to research misconduct or oversight failures. To do so, the author extracted and uploaded into SciVal all available original DOIs and PubMed IDs (PMIDs) associated with retracted articles in Retraction Watch, Medline, and Web of Science. As of June 18, 2025, the Retraction Watch Database listed 43,482 entries marked as “Retraction” and classified under the following document types: case reports, clinical studies, guidelines, meta-analyses, research articles, retracted articles, review articles, letters when associated with research articles, and revisions when associated with review articles. These types were selected because, after cross-referencing, they corresponded to articles and reviews in Medline and Web of Science. Following the exclusion criteria used by Ioannidis et al. (2025), 2,238 records were removed because they were retracted for non-author-related reasons (e.g., “Retract and Replace,” “Error by Journal/Publisher”). Of the remaining Retraction Watch entries, 38,316 were successfully matched to SciVal records using DOIs and PMIDs; the other 5,166 could not be matched, either because they were published in journals not indexed by SciVal or because they lacked identifiable DOIs or PMIDs in the database. To supplement the Retraction Watch dataset, an additional 4,416 unique retracted publications were identified in Medline and Web of Science (2,737 from Medline and 2,850 from Web of Science) that were classified as “Retracted” or “Retracted Publication” and tagged as articles or reviews. In total, 42,732 unique retracted articles were matched to SciVal and included in the analysis. Scopus was excluded due to inconsistent classification practices: its “Retracted” label encompasses a broad range of document types, including letters and editorials, making it unsuitable for isolating retracted research articles and reviews. To account for the time lag between publication and retraction, the analysis focused on articles published in 2022 and 2023, rather than 2023-2024, balancing recency against the time retractions need to accumulate (Candal-Pedreira et al., 2024; Feng et al., 2024; Fang et al., 2012; Gedik et al., 2024). By June 18, 2025, the number of retracted articles stood at 10,579 for 2022, 2,897 for 2023, and 1,601 for 2024. Worldwide, as of June 18, 2025, the retraction rate for 2022-2023 averaged 2.2 per 1,000 articles, with the highest rates observed in mathematics (9.3) and computer science (7.6) and the lowest in arts and humanities (0.2).
According to Retraction Watch, the 15 most frequently cited reasons for retraction are (categories are not mutually exclusive, so the percentages sum to more than 100%): Investigation by Journal/Publisher (48%), Unreliable Results and/or Conclusions (42%), Investigation by Third Party (34%), Concerns/Issues About Data (30%), Concerns/Issues about Referencing/Attributions (26%), Paper Mill (25%), Concerns/Issues with Peer Review (23%), Concerns/Issues about Results and/or Conclusions (19%), Fake Peer Review (19%), Computer-Aided Content or Computer-Generated Content (18%), Duplication of/in Image (10%), Duplication of/in Article (8%), Euphemisms for Plagiarism (6%), Investigation by Company/Institution (6%), and Lack of IRB/IACUC Approval and/or Compliance (6%).
- Delisted Journal Risk: Quantifies the proportion of an institution’s publications that appear in journals removed from Scopus or Web of Science due to violations of publishing, editorial, or peer review standards (Cortegiani et al., 2020). This is measured over the most recent two full calendar years (e.g., 2023-2024 for an analysis conducted in 2025) and reflects structural vulnerabilities in quality control and publishing practices. Such publications continue to influence bibliometric metrics even after delisting, potentially distorting evaluative benchmarks. This component includes all articles published in journals delisted by Scopus, as well as articles in journals delisted by Web of Science that remain actively indexed in Scopus. Scopus discontinues or delists journals through an ongoing title re-evaluation program. Journals may be flagged for re-evaluation due to (1) underperformance on key bibliometric benchmarks (citation rate, self-citation rate, and CiteScore); (2) formal complaints about publication practices; (3) outlier publishing behaviors detected algorithmically (e.g., sudden spikes in output or geographic concentration); and (4) continuous curation feedback from the Content Selection and Advisory Board. Journals that fail re-evaluation are delisted, with indexing discontinued prospectively but prior content retained. Web of Science delists journals following an in-depth editorial re-evaluation conducted by its independent in-house editors, who assess journals against 24 quality criteria and four impact criteria. Titles are re-evaluated when flagged by internal monitoring, community feedback, or observed shifts in editorial quality. If a journal no longer meets quality standards, such as lacking editorial rigor, publishing ethically questionable content, or deviating from peer review norms, it is removed from coverage, and future content is no longer indexed. In serious cases, previously indexed articles may also be withdrawn. Between 2009 and June 2025, a total of 974 unique journals were delisted: 855 by Scopus and 169 by Web of Science. Of these, 553 were indexed in Scopus during 2018-2019 and accounted for 193,369 articles, and 206 were indexed in 2023-2024 with 124,945 articles. Institutional affiliations for these articles were tracked globally to evaluate exposure to low-integrity publication channels.
Data for both components serve as proxies for broader research integrity concerns, such as paper mills (businesses that sell authorship), citation cartels (reciprocal citation networks used to inflate impact), citation farms (organizations or networks that generate or sell citations), fraudulent authorship practices, and other forms of metric gaming (Abalkina, 2023; Maisonneuve, 2025; Candal-Pedreira et al., 2024; Feng et al., 2024; Ioannidis & Maniadis, 2024; Lancho Barrantes et al., 2023; Smagulov & Teixeira da Silva, 2025; Teixeira da Silva & Nazarovets, 2023; Wright, 2024). Importantly, both components reflect verifiable outcomes rather than inferred behaviors, making them robust indicators of institutional-level risk.
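Computationally, both components reduce to simple per-institution ratios. The sketch below shows how the two raw rates could be derived once matched counts are available for the relevant two-year windows; the data structure, field names, and example figures are illustrative assumptions, not part of the RI2 data pipeline.

```python
from dataclasses import dataclass

@dataclass
class InstitutionCounts:
    """Hypothetical per-institution counts for a two-year publication window."""
    articles: int               # articles and reviews published in the window
    retracted: int              # of those, how many were later retracted
    in_delisted_journals: int   # of those, how many appeared in delisted journals

def retraction_rate_per_1000(c: InstitutionCounts) -> float:
    """Raw Retraction Risk indicator: retractions per 1,000 articles."""
    return 1000.0 * c.retracted / c.articles if c.articles else 0.0

def delisted_share_percent(c: InstitutionCounts) -> float:
    """Raw Delisted Journal Risk indicator: % of output in delisted journals."""
    return 100.0 * c.in_delisted_journals / c.articles if c.articles else 0.0

# Illustrative institution: 12 retractions and 180 delisted-journal articles
# out of 6,000 papers published in the window.
example = InstitutionCounts(articles=6000, retracted=12, in_delisted_journals=180)
print(retraction_rate_per_1000(example))  # 2.0 retractions per 1,000 articles
print(delisted_share_percent(example))    # 3.0% of output in delisted journals
```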
To ensure consistency across diverse studies, whether analyzing 50 universities in one country or 5,000 globally, RI2 applies a fixed reference group: the 1,000 universities with the largest publication output worldwide. This global baseline provides balanced disciplinary and geographic coverage, ensuring thresholds are not skewed by outliers. It functions analogously to clinical reference ranges: just as hypertension is diagnosed against globally standardized thresholds, RI2 classifications rely on a universal benchmark to detect structural anomalies. Key advantages of this approach include:
- Universal comparability: RI2 scores are always interpreted against the global baseline, not rescaled to local or sample-specific norms. This ensures consistency across geographic and temporal contexts.
- Stable thresholds: Risk tiers are fixed based on the empirical distribution of the reference group. For example, if the highest observed retraction rate in the reference group is 3 per 1,000 articles, an institution with 1.5 retractions per 1,000 articles receives a normalized score of 0.5, regardless of sample size (see the sketch below).
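The short sketch below illustrates the point with the toy numbers from the example above: normalizing against the fixed global maximum yields the same score whatever the sample at hand, whereas rescaling to a local sample would not. Values and function names are illustrative only.

```python
def normalize_fixed(value: float, global_max: float, global_min: float = 0.0) -> float:
    """Min-max normalization against a fixed reference range."""
    return (value - global_min) / (global_max - global_min)

# 1.5 retractions per 1,000 articles against a global reference maximum of 3.0
# always scores 0.5 ...
print(normalize_fixed(1.5, global_max=3.0))  # 0.5

# ... whereas rescaling to a small local sample (maximum 1.5) would inflate the
# same institution to 1.0 -- exactly what the fixed global baseline avoids.
local_sample = [0.2, 0.8, 1.5]
print(normalize_fixed(1.5, global_max=max(local_sample)))  # 1.0
```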
Normalization, Composite Scoring, and Tier Classification
Each indicator is scaled to a 0–1 range using Min-Max normalization relative to the global reference group. The composite RI2 score is the simple average of the two:
RI2 = (Normalized Retraction Rate + Normalized Delisted Rate) / 2
For the June 2025 edition, the retraction rate ranged from 0.00 to 26.82 retractions per 1,000 articles, and the share of articles in delisted journals ranged from 0.00% to 15.35%. These values define the normalization scale and remain fixed across all samples, ensuring stable cross-institutional comparisons regardless of geographic or disciplinary representation. Institutions are then classified into one of five risk tiers based on their RI2 score as follows:
| Tier | Percentile Range | Interpretation | Score Range (June 2025 edition) |
|---|---|---|---|
| Red Flag | ≥ 95th | Extreme anomalies; systemic integrity risk | RI2 ≥ 0.251 |
| High Risk | ≥ 90th and < 95th | Significant deviation from global norms | 0.176 ≤ RI2 < 0.251 |
| Watch List | ≥ 75th and < 90th | Moderately elevated risk; emerging concerns | 0.099 ≤ RI2 < 0.176 |
| Normal Variation | ≥ 50th and < 75th | Within expected global variance | 0.049 ≤ RI2 < 0.099 |
| Low Risk | < 50th | Strong adherence to publishing integrity norms | RI2 < 0.049 |
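Putting the pieces together, the following sketch normalizes the two raw indicators against the June 2025 reference ranges quoted above, averages them into the composite RI2 score, and maps the result onto the tiers in the table. It is an illustrative reimplementation of the published formula and cutoffs, not the author's own code.

```python
# June 2025 reference ranges for the global baseline (from the text above)
RETRACTION_MIN, RETRACTION_MAX = 0.00, 26.82   # retractions per 1,000 articles
DELISTED_MIN, DELISTED_MAX = 0.00, 15.35       # % of articles in delisted journals

# June 2025 score cutoffs corresponding to the percentile-based tiers above
TIERS = [
    (0.251, "Red Flag"),
    (0.176, "High Risk"),
    (0.099, "Watch List"),
    (0.049, "Normal Variation"),
    (0.000, "Low Risk"),
]

def min_max(value: float, lo: float, hi: float) -> float:
    """Scale a raw indicator to 0-1 against the fixed reference bounds."""
    return (value - lo) / (hi - lo)

def ri2_score(retraction_rate: float, delisted_share: float) -> float:
    """Composite RI2: simple average of the two normalized components."""
    norm_retraction = min_max(retraction_rate, RETRACTION_MIN, RETRACTION_MAX)
    norm_delisted = min_max(delisted_share, DELISTED_MIN, DELISTED_MAX)
    return (norm_retraction + norm_delisted) / 2

def classify(ri2: float) -> str:
    """Map an RI2 score to its risk tier (June 2025 edition cutoffs)."""
    for cutoff, tier in TIERS:
        if ri2 >= cutoff:
            return tier
    return "Low Risk"

# Example: 2.0 retractions per 1,000 articles and 3.0% delisted-journal output
score = ri2_score(2.0, 3.0)
print(round(score, 3), classify(score))  # ~0.135 -> Watch List
print(classify(0.03))                    # Low Risk
```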
Key Features of the RI2 Methodology
- Global Benchmarking: Scoring against a fixed worldwide reference group yields representative and statistically reliable classification.
- Fixed Thresholds for Consistency: The same cutoffs apply to large and small datasets, preserving interpretability.
- Transparency and Resistance to Gaming: Built on normalized, verifiable outcomes rather than subjective assessments.
References
- Abalkina, A. (2023). Publication and collaboration anomalies in academic papers originating from a paper mill: Evidence from a Russia-based paper mill. Learned Publishing, 36(4), 689-702. https://doi.org/10.1002/leap.1574
- Candal-Pedreira, C., Guerra-Tort, C., Ruano-Ravina, A., Freijedo-Farinas, F., Rey-Brandariz, J., Ross, J. S., & Pérez-Ríos, M. (2024). Retracted papers originating from paper mills: a cross-sectional analysis of references and citations. Journal of Clinical Epidemiology, 172, Article 111397. https://doi.org/10.1016/j.jclinepi.2024.111397
- Cortegiani, A., Ippolito, M., Ingoglia, G., Manca, A., Cugusi, L., Severin, A., Strinzel, M., Panzarella, V., Campisi, G., Manoj, L., Gregoretti, C., Einav, S., Moher, D., & Giarratano, A. (2020). Citations and metrics of journals discontinued from Scopus for publication concerns: The GhoS(t)copus Project. F1000Research, 9, Article 415. https://doi.org/10.12688/f1000research.23847.2
- Fang, F. C., Steen, R. G., & Casadevall, A. (2012). Misconduct accounts for the majority of retracted scientific publications. Proceedings of the National Academy of Sciences of the United States of America, 109(42), 17028-17033. https://doi.org/10.1073/pnas.1212247109
- Feng, S., Feng, L., Han, F., Zhang, Y., Ren, Y., Wang, L., & Yuan, J. (2024). Citation network analysis of retractions in molecular biology field. Scientometrics, 129(8), 4795-4817. https://doi.org/10.1007/s11192-024-05101-4
- Ioannidis, J. P. A., & Maniadis, Z. (2024). Quantitative research assessment: using metrics against gamed metrics. Internal and Emergency Medicine, 19(1), 39-47. https://doi.org/10.1007/s11739-023-03447-w
- Ioannidis, J. P. A., Pezzullo, A. M., Cristiano, A., Boccia, S., & Baas, J. (2025). Linking citation and retraction data reveals the demographics of scientific retractions among highly cited authors. PLoS Biology, 23(1), Article e3002999. https://doi.org/10.1371/journal.pbio.3002999
- Lancho Barrantes, B. S., Dalton, S., & Andre, D. (2023). Bibliometrics methods in detecting citations to questionable journals. Journal of Academic Librarianship, 49(4), Article 102749. https://doi.org/10.1016/j.acalib.2023.102749
- Maisonneuve, H. (2025). Predatory journals and paper mills jeopardise knowledge management. Bulletin du Cancer, 112(1), 100-110. https://doi.org/10.1016/j.bulcan.2024.12.002
- Smagulov, K., & Teixeira da Silva, J. A. (2025). Scientific productivity and retracted literature of authors with Kazakhstani affiliations during 2013-2023. Journal of Academic Ethics. https://doi.org/10.1007/s10805-025-09624-0
- Teixeira da Silva, J. A., & Nazarovets, S. (2023). Assessment of retracted papers, and their retraction notices, from a cancer journal associated with “paper mills”. Journal of Data and Information Science, 8(2), 118-125. https://doi.org/10.2478/jdis-2023-0009
- Wright, D. E. (2024). Five problems plaguing publishing in the life sciences—and one common cause. FEBS Letters, 598(18), 2227-2239. https://doi.org/10.1002/1873-3468.15018