RI²: A Composite Metric for Detecting Risk Profiles
The Research Integrity Risk Index (RI²) is the first empirically grounded composite metric designed to assess research integrity risk at the institutional level. Unlike conventional rankings that emphasize research volume and citation-based visibility, RI² focuses on integrity-sensitive indicators resistant to bibliometric manipulation and inflation. Currently, RI² is based on three components: the rate of articles in delisted journals (D-Rate), the retraction rate (R-Rate), and the institutional self-citation rate (S-Rate).
Delisted Journal Risk (D-Rate): Quantifies institutional reliance on questionable publication venues by measuring the share of an institution’s output published in journals removed from Scopus or Web of Science for violations of editorial, publishing, or peer-review standards. Journals delisted by Scopus and Web of Science often exhibit clear warning signs long before their removal, including abnormally short review times, geographic or institutional clustering, poor editorial transparency, and aggressive solicitation practices (Cortegiani et al., 2020; Krauskopf, 2018; Wilches-Visbal et al., 2024). Both Scopus and WoS regularly re-evaluate indexed journals, and delisting is typically applied prospectively: new content is excluded while previously indexed content is retained. The continued presence of content from delisted journals in citation databases has significant implications for research evaluation. According to Scopus, articles from delisted journals represented only 2% of global output in 2023-2024, but their distribution is highly concentrated. For example, India, Indonesia, Iraq, Malaysia, and Saudi Arabia together produce slightly over 10% of the world’s research, yet account for 30% of articles in delisted journals. This imbalance can distort outcomes in evaluation systems, including Clarivate’s Highly Cited Researchers, QS’s citations-per-faculty indicator, and several volume- and citation-based measures in THE.
Between January 2023 and August 2025, a total of 323 journals were delisted: 139 by Scopus and 184 by WoS, yielding 291 unique delisted journals after removing overlap. Of the 139 journals delisted by Scopus, 25 remained actively indexed in WoS. Of the 184 journals delisted by WoS, 68 remained actively indexed in Scopus, and 21 others had their Scopus coverage discontinued without being formally tagged as delisted. In total, these 228 journals (139 + 68 + 21) produced 123,130 articles in 2023-2024. A key finding is that a Q1-Q2 ranking is an unreliable indicator of journal integrity: among these journals, over 40% held a Q1 or Q2 CiteScore quartile at the time of their delisting. This pattern is reinforced by the 2025 Chinese Early Warning Journal List: as of August 2025, three of its five listed journals remained active in Scopus with a Q1 CiteScore ranking. This disconnect confirms that quartile status should not be conflated with research quality or adherence to ethical publishing standards. D-Rate is calculated as follows:
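$$\text{D-Rate} = \frac{\text{articles in delisted journals, 2023-2024}}{\text{total articles, 2023-2024}} \times 100$$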
Retraction Risk (R-Rate): Measures the proportion of an institution’s research output that has been retracted, serving as a critical marker of research integrity (Fang et al., 2012; Ioannidis et al., 2025). The rate is expressed as the number of retracted articles per 1,000 articles over a specified period. According to the Retraction Watch Database (RWDB), the most common reasons for the retraction of articles published in 2023-2024 were (categories are not mutually exclusive): investigation by journal/publisher (86%), fake peer review (61%), unreliable results and/or conclusions (60%), concerns/issues about referencing/attributions (41%), investigation by third party (36%), paper mill (22%), concerns/issues with peer review (21%), computer-aided or computer-generated content (21%), objections by author(s) (17%), rogue editor (16%), and concerns/issues about third-party involvement (16%). As with articles in delisted journals, elevated retraction rates may signal structural vulnerabilities in institutional research governance, quality control mechanisms, and academic culture.
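Expressed per 1,000 articles, consistent with the definition above:

$$\text{R-Rate} = \frac{\text{retracted articles, 2023-2024}}{\text{total articles, 2023-2024}} \times 1000$$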
To maximize coverage, this study draws retraction data from four sources (RWDB, MEDLINE, Scopus, and Web of Science), with RWDB serving as the primary source. To ensure accuracy and mitigate well-documented inconsistencies in retraction tagging and document-type classification across databases, a publication is included in the retraction count only if it is (1) confirmed as retracted by at least one of the four databases and (2) verified to be a journal article or review via document-type classification in SciVal, MEDLINE, and Web of Science. The figure below summarizes the data collection and validation process, and the manuscript cited below provides a detailed description.
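A minimal sketch of this two-condition inclusion rule, assuming records have already been merged across sources into per-database flags (the field names and input structure here are hypothetical, and condition (2) is read as verification in at least one of the three document-type sources):

```python
# Sketch of the retraction-count inclusion rule described above.
# Field names are hypothetical; real extraction from RWDB, MEDLINE,
# Scopus, and Web of Science requires source-specific parsing and matching.

RETRACTION_SOURCES = ("rwdb", "medline", "scopus", "wos")
DOCTYPE_SOURCES = ("scival", "medline", "wos")
VALID_DOCTYPES = {"article", "review"}

def counts_as_retraction(record: dict) -> bool:
    """Return True if a record enters the R-Rate numerator."""
    # (1) Retraction confirmed by at least one of the four databases.
    confirmed = any(record.get(f"retracted_{s}", False) for s in RETRACTION_SOURCES)
    # (2) Document type verified as article or review in SciVal, MEDLINE, or WoS.
    verified = any(record.get(f"doctype_{s}") in VALID_DOCTYPES for s in DOCTYPE_SOURCES)
    return confirmed and verified
```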
As of August 26, 2025, the global average retraction rate for articles published in 2023-2024 was 0.8 per 1,000. The highest rates were observed in mathematics (3.3), computer science (2.5), decision sciences (1.6), and engineering (1.4), while the lowest rates were in arts and humanities, agricultural and biological sciences, and social sciences (each < 0.2). These rates are expected to increase over time as additional retractions accrue.
Self-Citation Rate (S-Rate): Measures the proportion of citations to an institution’s articles that originate from the same institution. While self-citation is a legitimate scholarly practice, particularly in sequential research building directly on prior work, an anomalously high institutional self-citation rate can artificially inflate citation-based metrics used in rankings and evaluations. Elevated rates may thus indicate strategic citation practices aimed at gaming institutional metrics rather than reflecting genuine scholarly influence. The S-Rate in RI² is not designed to penalize self-citation per se. Instead, it functions as a field-normalized, percentile-based risk indicator to identify statistical outliers relative to global peer institutions within the same broad discipline. This ensures the metric flags only those institutions whose citation patterns significantly deviate from established norms. Self-citation data are sourced from InCites. As with other RI² components, articles with more than 100 co-authors are excluded to reduce distortions caused by large-scale collaborations.
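Per the component summary below, the underlying share is:

$$\text{S-Rate} = \frac{\text{citations originating from the same institution}}{\text{total citations to the institution's 2023-2024 articles}} \times 100$$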
Summary of components and their data sources (for the 2025 edition of RI²):
- Articles = Articles and reviews published during 2023-2024 and indexed in Scopus as of the date of publication of RI² (September 2025). Only records with a SciVal match are included. Excludes items with more than 100 co-authors and Articles in Press.
- D-Rate = Percentage of 2023-2024 articles and reviews that appeared in journals delisted by Scopus or Web of Science between January 2023 and the date of publication of RI² (September 2025).
- R-Rate = Retraction rate per 1,000: retractions among 2023-2024 articles and reviews, identified, as of September 2025, via the Retraction Watch Database, MEDLINE, Scopus, and Web of Science, and restricted to items with a SciVal match.
- S-Rate = Institutional self-citation share (%), as reported by InCites as of the date of publication of RI² (September 2025): percentage of citations to an institution’s 2023-2024 articles and reviews that originate from the same institution.
Research Integrity Risks Addressed: Collectively, the three indicators serve as robust proxies for a spectrum of research integrity concerns, including paper mills (businesses selling authorship), citation cartels (reciprocal citation networks), citation farms (organizations selling citations), fraudulent authorship practices, and other forms of metric manipulation (Abalkina, 2023; Candal-Pedreira et al., 2024; Feng et al., 2024; Ioannidis & Maniadis, 2024; Lancho Barrantes et al., 2023; Maisonneuve, 2025; Smagulov & Teixeira da Silva, 2025; Teixeira da Silva & Nazarovets, 2023; Wright, 2024). A key strength of this approach is that all indicators reflect verifiable outcomes (articles in delisted journals, retractions, observed citation patterns) rather than inferred behaviors, enhancing their objectivity and robustness for institutional-level risk assessment.
Temporal Scope and Data Collection Period: To balance recency with data reliability, each component of RI² is calculated using data on journal articles and reviews (hereafter, articles) published during the two most recent complete calendar years. Data extraction occurs near the end of the RI² publication year to maximize coverage and accommodate inherent lags in bibliographic databases. For instance, the 2025 edition of RI² uses publication, citation, and retraction data collected toward the end of 2025 for articles published in 2023 and 2024. This two-year window and deferred extraction strategy account for delays in journal delisting, citation accumulation, and retraction indexing (Candal-Pedreira et al., 2024; Fang et al., 2012; Feng et al., 2024; Gedik et al., 2024).
Field Normalization: To ensure fair comparisons, RI² field-normalizes each component based on institutional research strength using the OECD Fields of Science and Technology taxonomy (Frascati Manual), as implemented in InCites. InCites maps 254 Web of Science (WoS) subject categories into six OECD broad fields: Natural Sciences, Engineering and Technology, Medical and Health Sciences, Agricultural and Veterinary Sciences, Social Sciences, and Humanities and the Arts. For RI², these are grouped into three broader categories: STEM (including natural sciences, engineering and technology, and agricultural and veterinary sciences), Medical and Health Sciences (corresponding to the OECD Medical and Health Sciences field), and Multidisciplinary (institutions without a dominant STEM or Medical and Health Sciences profile). Institutions with strengths in Social Sciences and Humanities and the Arts are subsumed under Multidisciplinary.
- A university is classified as STEM if its STEM output exceeds by more than threefold both (a) its Medical and Health Sciences output and (b) its combined Social Sciences and Humanities and the Arts output.
- A university is classified as Medical and Health Sciences if its output in this field exceeds by more than threefold both (a) its STEM output and (b) its combined Social Sciences and Humanities and the Arts output.
Per InCites, among the world’s 2,000 most publishing universities, only the London School of Economics and Political Science and Italy’s Bocconi University have combined Social Sciences and Humanities and the Arts output that exceeds their output in both STEM and Medical and Health Sciences by more than threefold, and only eleven other institutions do so at more than twofold.
The threefold (3×) dominance rule ensures that the assigned field label reflects a truly dominant research portfolio. Sensitivity analyses confirmed that a lower threshold would misclassify broad-portfolio universities with strong medical schools (e.g., Harvard, Johns Hopkins) as Medical and Health Sciences institutions, and technically focused universities with significant other output (e.g., MIT) as STEM, when both belong in Multidisciplinary, thereby undermining the construct validity of the categories. The output-based approach (rather than the presence of a specific school or college, such as engineering or medicine) ensures a data-driven, comparable categorization that adapts automatically to shifts in institutional research focus. Applied to the world’s 2,000 most publishing universities, this methodology yielded 1,043 Multidisciplinary, 814 STEM, and 143 Medical and Health Sciences institutions.
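As a minimal sketch, the dominance rule reduces to a pair of comparisons; output counts per OECD grouping are assumed to come from InCites, and the function name is illustrative:

```python
def classify_field(stem: float, medical: float, ssh: float) -> str:
    """Assign an RI² field category from broad-field output counts.

    stem    -- Natural Sciences + Engineering and Technology
               + Agricultural and Veterinary Sciences
    medical -- Medical and Health Sciences
    ssh     -- Social Sciences + Humanities and the Arts
    """
    if stem > 3 * medical and stem > 3 * ssh:
        return "STEM"
    if medical > 3 * stem and medical > 3 * ssh:
        return "Medical and Health Sciences"
    return "Multidisciplinary"  # no dominant profile
```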
Composite Scoring, Normalization, and Benchmarking: The construction of the final RI² composite score follows a structured process to ensure robust, transparent, and comparable institutional benchmarking.
- Field-Specific Winsorization and Normalization: To enable fair comparisons across diverse institutional types, each of the three component rates (D-Rate, R-Rate, S-Rate) is first processed within its respective RI² field category (Medical and Health Sciences, Multidisciplinary, STEM). To minimize the distorting effect of extreme outliers, raw values are winsorized at the 99th percentile. Each winsorized component is then scaled to a 0-1 range using min-max normalization relative to a fixed global reference group (see the sketch following this list).
- Composite Score Calculation: The composite RI² score is calculated as the arithmetic mean of the three normalized component scores. Equal weighting was deliberately adopted to maximize transparency, interpretability, and resistance to arbitrary weighting schemes, particularly in the absence of robust empirical evidence justifying a differential weighting across components (Bellantuono et al., 2022; Fauzi et al., 2020). Future iterations may explore alternative weighting strategies informed by new evidence on risk severity or predictive value.
- The Global Reference Group and Benchmarking Rationale: A critical feature of RI² is its use of a fixed global reference group–the 1,000 most publishing universities worldwide–to ensure consistent benchmarking. This approach offers two key advantages: (a) Universal Comparability: RI² scores are always interpreted against this stable global baseline, not rescaled to local or sample-specific norms. This ensures the metric’s meaning remains consistent across geographic and temporal contexts; and (b) Stable Thresholds: Risk tiers are defined by the empirical distribution of this fixed reference group, guaranteeing consistent application.
- Interpretive Framework and Risk Tiers: While RI² assigns each institution a composite score, this metric is best interpreted as an indicator of relative structural vulnerability, not as a proxy for institutional prestige. To facilitate clear interpretation, institutions are classified into five fixed risk tiers based on their percentile position within the global reference distribution. Key features of this framework include: (a) Global Benchmarking: Ensures representative and statistically reliable classification; (b) Fixed Thresholds: Guarantees consistent application and interpretability across analyses; and (c) Transparency and Robustness: Built on verifiable, normalized metrics rather than subjective assessment.
By maintaining equal weights and by reporting both the composite score and its disaggregated components, RI² enables stakeholders to cross-examine each risk dimension independently while benefiting from a parsimonious overall index.
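A minimal sketch of the scoring pipeline described above, applied within one field category (NumPy is used for brevity; the published pipeline may differ in implementation detail):

```python
import numpy as np

def ri2_composite(d_rate, r_rate, s_rate):
    """Winsorize, min-max normalize, and average the three component rates.

    Each argument is an array of raw rates for the institutions in one
    RI² field category, benchmarked against the fixed global reference group.
    """
    normalized = []
    for raw in (d_rate, r_rate, s_rate):
        raw = np.asarray(raw, dtype=float)
        capped = np.minimum(raw, np.percentile(raw, 99))  # winsorize at the 99th percentile
        lo, hi = capped.min(), capped.max()
        scaled = (capped - lo) / (hi - lo) if hi > lo else np.zeros_like(capped)
        normalized.append(scaled)                         # min-max scaling to [0, 1]
    return np.mean(normalized, axis=0)                    # equal-weight arithmetic mean
```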
RI² 2025 Risk Tiers Framework (based on the 1,000 most publishing universities, 2023-2024), as of August 2025:

| Tier | Percentile Range | Interpretation | Medical & Health Sciences | Multidisciplinary | STEM |
| --- | --- | --- | --- | --- | --- |
| Red Flag | ≥ 95th | Extreme anomalies; systemic integrity risk | RI² ≥ 0.588 | RI² ≥ 0.488 | RI² ≥ 0.532 |
| High Risk | ≥ 90th and < 95th | Significant deviation from global norms | 0.540 ≤ RI² < 0.588 | 0.380 ≤ RI² < 0.488 | 0.372 ≤ RI² < 0.532 |
| Watch List | ≥ 75th and < 90th | Moderately elevated risk; emerging concerns | 0.413 ≤ RI² < 0.540 | 0.269 ≤ RI² < 0.380 | 0.268 ≤ RI² < 0.372 |
| Normal Variation | ≥ 50th and < 75th | Within expected global variance | 0.285 ≤ RI² < 0.413 | 0.197 ≤ RI² < 0.269 | 0.206 ≤ RI² < 0.268 |
| Low Risk | < 50th | Strong adherence to publishing integrity norms | RI² < 0.285 | RI² < 0.197 | RI² < 0.206 |
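Tier assignment itself reduces to a percentile lookup against the reference distribution, as in this sketch:

```python
def risk_tier(percentile: float) -> str:
    """Map an institution's RI² percentile (within the reference group) to a tier."""
    if percentile >= 95:
        return "Red Flag"
    if percentile >= 90:
        return "High Risk"
    if percentile >= 75:
        return "Watch List"
    if percentile >= 50:
        return "Normal Variation"
    return "Low Risk"
```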
Indicator Correlation and Composite Rationale: To evaluate whether the three RI² components capture distinct dimensions of research integrity risk, Pearson correlation coefficients were calculated for the global reference group of 1,000 universities. The analysis revealed a strong correlation between D-Rate and R-Rate in STEM, moderate correlations between these two indicators in Medical & Health Sciences and Multidisciplinary categories, and consistently weak or near-zero correlations between self-citation and the other two components across all fields. These results suggest that while some overlap exists between D-Rate and R-Rate, self-citation practices remain largely independent, confirming that each component measures a distinct aspect of integrity risk. This supports the rationale for combining them into a composite RI² score, which offers a multidimensional profile of institutional risk that no single indicator can capture. At the same time, the relative independence of the components ensures that the composite does not merely replicate one underlying signal. Presenting both the composite score and disaggregated component values enhances interpretability by allowing users to identify whether elevated risk stems primarily from retractions, delisted publishing, self-citation, or a combination of these factors.
Pearson correlations among RI² components by field for the 1,000 most publishing universities:

| Field | D-Rate × R-Rate | D-Rate × S-Rate | R-Rate × S-Rate |
| --- | --- | --- | --- |
| Medical and Health Sciences | 0.498 | -0.058 | -0.151 |
| Multidisciplinary | 0.594 | 0.296 | 0.017 |
| STEM | 0.793 | -0.005 | -0.050 |
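For reference, the pairwise coefficients above can be computed directly from the component arrays, as in this sketch:

```python
import numpy as np

def component_correlations(d_rate, r_rate, s_rate) -> dict:
    """Pearson correlations among the three RI² components for one field category."""
    m = np.corrcoef(np.vstack([d_rate, r_rate, s_rate]))
    return {
        "D-Rate x R-Rate": m[0, 1],
        "D-Rate x S-Rate": m[0, 2],
        "R-Rate x S-Rate": m[1, 2],
    }
```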
References
- Abalkina, A. (2023). Publication and collaboration anomalies in academic papers originating from a paper mill: Evidence from a Russia-based paper mill. Learned Publishing, 36(4), 689-702. https://doi.org/10.1002/leap.1574
- Candal-Pedreira, C., Guerra-Tort, C., Ruano-Ravina, A., Freijedo-Farinas, F., Rey-Brandariz, J., Ross, J. S., & Pérez-Ríos, M. (2024). Retracted papers originating from paper mills: a cross-sectional analysis of references and citations. Journal of Clinical Epidemiology, 172, Article 111397. https://doi.org/10.1016/j.jclinepi.2024.111397
- Cortegiani, A., Ippolito, M., Ingoglia, G., Manca, A., Cugusi, L., Severin, A., Strinzel, M., Panzarella, V., Campisi, G., Manoj, L., Gregoretti, C., Einav, S., Moher, D., & Giarratano, A. (2020). Citations and metrics of journals discontinued from Scopus for publication concerns: The GhoS(t)copus Project. F1000Research, 9, Article 415. https://doi.org/10.12688/f1000research.23847.2
- Fang, F. C., Steen, R. G., & Casadevall, A. (2012). Misconduct accounts for the majority of retracted scientific publications. Proceedings of the National Academy of Sciences of the United States of America, 109(42), 17028-17033. https://doi.org/10.1073/pnas.1212247109
- Feng, S., Feng, L., Han, F., Zhang, Y., Ren, Y., Wang, L., & Yuan, J. (2024). Citation network analysis of retractions in molecular biology field. Scientometrics, 129(8), 4795-4817. https://doi.org/10.1007/s11192-024-05101-4
- Ioannidis, J. P. A., & Maniadis, Z. (2024). Quantitative research assessment: using metrics against gamed metrics. Internal and Emergency Medicine, 19(1), 39-47. https://doi.org/10.1007/s11739-023-03447-w
- Ioannidis, J. P. A., Pezzullo, A. M., Cristiano, A., Boccia, S., & Baas, J. (2025). Linking citation and retraction data reveals the demographics of scientific retractions among highly cited authors. PLoS Biology, 23(1), Article e3002999. https://doi.org/10.1371/journal.pbio.3002999
- Lancho Barrantes, B. S., Dalton, S., & Andre, D. (2023). Bibliometrics methods in detecting citations to questionable journals. Journal of Academic Librarianship, 49(4), Article 102749. https://doi.org/10.1016/j.acalib.2023.102749
- Maisonneuve, H. (2025). Predatory journals and paper mills jeopardise knowledge management. Bulletin du Cancer, 112(1), 100-110. https://doi.org/10.1016/j.bulcan.2024.12.002
- Smagulov, K., & Teixeira da Silva, J. A. (2025). Scientific productivity and retracted literature of authors with Kazakhstani affiliations during 2013-2023. Journal of Academic Ethics. https://doi.org/10.1007/s10805-025-09624-0
- Teixeira da Silva, J. A., & Nazarovets, S. (2023). Assessment of retracted papers, and their retraction notices, from a cancer journal associated with “paper mills”. Journal of Data and Information Science, 8(2), 118-125. https://doi.org/10.2478/jdis-2023-0009
- Wright, D. E. (2024). Five problems plaguing publishing in the life sciences—and one common cause. FEBS Letters, 598(18), 2227-2239. https://doi.org/10.1002/1873-3468.15018