RI²: A Diagnostic Framework for Assessing Exposure to Research Integrity Risks
The Research Integrity Risk Index (RI²) is a composite diagnostic framework designed to assess institutional exposure to research-integrity risks using transparent, reproducible bibliometric indicators. RI² is not a ranking and does not assess research quality, intent, or individual misconduct. Instead, it focuses on structural vulnerabilities within scholarly publishing systems that may undermine institutional credibility, trust, and long-term reputation. RI² integrates three complementary components, each capturing a distinct, observable mechanism through which integrity-related risks manifest in scholarly publishing: Delisted Journal Risk (D-Rate), Retraction Risk (R-Rate), and Self-Citation Rate (S-Rate). These components were selected deliberately and conservatively, based on verifiable outcomes and formal corrective actions taken by independent third parties rather than on subjective judgments or allegations. RI² is the first institutional-level framework to systematically operationalize these three integrity-risk dimensions (publishing in discontinued or delisted journals, article retractions, and unusually high self-citation) as core diagnostic signals of structural exposure. More recently, the same signals have been incorporated as adjustment factors in a new Institution-Level Percentile Ranking proposed by John P. A. Ioannidis (developer of the Stanford–Elsevier World's Top 2% Scientists List), Jeroen Baas, Roy Boverhof, and Cyril Voyant. Their independent adoption of these indicators reflects growing recognition that volume- and impact-based institutional assessments are incomplete unless interpreted alongside evidence of potential metric distortion and formally corrected literature.
D-Rate and R-Rate capture integrity failures that have already triggered formal corrective actions, while S-Rate provides a descriptive indicator of institutional self-citation behavior, which, when unusually high relative to comparable peers, may signal metric distortion rather than genuine scholarly influence. All components are field-normalized and expressed as rates, enabling meaningful comparison across disciplines and institutional profiles. RI² indicators are interpreted as signals of structural exposure, not judgments, sanctions, or accusations. The framework is intended to support research analysis, policy development, and institutional self-assessment, offering a conservative, integrity-aware complement to performance-oriented metrics.
D-Rate (Delisted Journal Risk): Measures the share of an institution’s research output published in journals that have been delisted from Scopus and Web of Science for violations of editorial, publishing, or peer-review standards. Such journals typically exhibit warning signs well before removal, including abnormally short review times, limited editorial transparency, geographic or institutional clustering, and aggressive solicitation practices. Both Scopus and Web of Science conduct ongoing journal re-evaluations and apply delisting prospectively, excluding new content while retaining previously indexed articles. As a result, articles published in delisted journals remain visible and citable in major databases, with direct implications for research evaluation and benchmarking. For articles published in 2023-2024, the retraction rate among delisted journals was, as of December 2025, approximately five times higher than that of journals that remained actively indexed. Moreover, although articles published in delisted journals accounted for only about 3% of global research output during 2023-2024, their distribution was highly concentrated. A small group of countries accounted for a disproportionately large share of this output, a pattern that can distort comparative evaluation outcomes when integrity risks are ignored. Between January 2023 and December 2025, a total of 336 journals were delisted after accounting for overlap: 176 by Scopus and 197 by Web of Science. Of the journals delisted by Scopus, 30 remained indexed in Web of Science; of those delisted by Web of Science, 67 remained indexed in Scopus. Collectively, the delisted journals published approximately 215,000 articles in 2023-2024 (163,000 indexed in Scopus and included in D-Rate; 133,000 in Web of Science). 
Among the journals indexed in Scopus, 40% held a Q1 or Q2 CiteScore ranking at the time of delisting, confirming that journal quartiles should not be conflated with research quality or adherence to ethical publishing standards. D-Rate, which is based on records matched in SciVal, is calculated as follows:
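The formula itself is not reproduced in this excerpt. Consistent with the definition above (a reconstruction from the stated definition; the percentage scaling is assumed), D-Rate can be written as:

```latex
\text{D-Rate} = \frac{\text{institution's articles (2023--2024) in delisted journals}}{\text{institution's total articles (2023--2024)}} \times 100\%
```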
R-Rate (Retraction Risk): R-Rate measures the proportion of an institution’s research output that has been retracted, expressed as the number of retracted articles per 1,000 published articles. Elevated retraction rates may signal structural vulnerabilities in research governance, quality-control mechanisms, and academic culture rather than isolated misconduct. According to the Retraction Watch Database (RWDB), the most common reasons for the retraction of articles published in 2023-2024 (categories are non-mutually exclusive) include: investigation by journal/publisher (86%), fake peer review (61%), unreliable results and/or conclusions (60%), concerns/issues about referencing/attributions (41%), investigation by third party (36%), paper mill (22%), concerns/issues with peer review (21%), computer-aided content or computer-generated content (21%), objections by author(s) (17%), and rogue editor (16%). Retraction data are compiled from four independent sources: the Retraction Watch Database (RWDB), MEDLINE, Scopus, and Web of Science. For institutional-level analysis, only records with a verified institutional match in SciVal are included, ensuring consistency between the numerator and denominator across D-Rate and R-Rate. Because retraction tagging, document-type classification, and publication-year assignment vary across databases, RI² applies a multi-layered inclusion and validation protocol designed to preserve comprehensive coverage while limiting classification error. After validation and exclusion of 817 ineligible records, 6,040 journal articles published in 2023-2024 were classified as genuine retractions worldwide. As of December 31, 2025, the global average retraction rate for articles published in 2023-2024 was 1.0 per 1,000 articles, with substantial variation across fields.
Four hundred records were excluded because they did not meet the inclusion criteria, and another 417 were excluded because they were duplicates (identified via DOIs, PMIDs, and article titles), were not attributable to author errors (as identified through RWDB), or were incorrectly classified as published in 2023-2024 (identified through cross-checking data across databases). The following figure summarizes the data collection and validation process, and the cited manuscript provides additional methodological detail.
R-Rate is calculated as follows:
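The formula is not reproduced in this excerpt; consistent with the stated per-1,000 expression, it can be written as:

```latex
\text{R-Rate} = \frac{\text{institution's retracted articles (2023--2024)}}{\text{institution's total articles (2023--2024)}} \times 1{,}000
```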
S-Rate (Self-Citation Rate): Sourced from InCites, S-Rate measures the proportion of citations to an institution’s articles that originate from the same institution. While moderate self-citation is legitimate in cumulative research, unusually high levels compared to peers can indicate metric manipulation rather than genuine scholarly influence. S-Rate is calculated as follows:
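The formula is not reproduced in this excerpt; consistent with the definition above (percentage scaling assumed), it can be written as:

```latex
\text{S-Rate} = \frac{\text{citations to the institution's articles originating from the same institution}}{\text{total citations to the institution's articles}} \times 100\%
```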
Across all three components, articles with more than 100 co-authors are excluded from the analysis to reduce potential distortions from large-scale collaborations.
Temporal Scope and Data Collection: To balance recency with data reliability, each RI² component is calculated from journal articles published during the two most recent calendar years. Data extraction is conducted near the end of the RI² publication year to maximize coverage and account for inherent lags in bibliographic databases. For instance, the 2025 edition of RI² utilizes publication and citation data collected in December 2025 for articles published in 2023 and 2024. This two-year window and deferred extraction strategy help account for delays in journal delisting, citation accumulation, and the retraction indexing process. An earlier publication window (e.g., 2022-2023) was not used because it would no longer reflect the most current state of institutional research activity. A broader publication window (e.g., 2022-2024) was likewise avoided to maintain sensitivity to recent shifts in institutional policy and practice.
Field Classification of Universities: To ensure equitable comparisons across diverse institutional profiles, each institution is classified into a field based on its research portfolio. The classification uses the OECD (Organisation for Economic Co-operation and Development) Fields of Science and Technology taxonomy (Frascati Manual), as implemented in InCites, which maps 254 non-mutually exclusive Web of Science subject categories into six broad fields. For RI², these six fields are consolidated into three broader categories: STEM (encompassing “Natural Sciences,” “Engineering and Technology,” and “Agricultural and Veterinary Sciences”), Medical and Health Sciences (corresponding directly to the OECD category), and Multidisciplinary (institutions without a dominant specialization in either STEM or Medical and Health Sciences). Institutions with strengths in “Social Sciences” and “Humanities and the Arts” are subsumed under this category. Institutions are algorithmically classified by publication output, not organizational structure, as follows:
- An institution is classified as STEM if its output in STEM fields exceeds by more than threefold both (a) its output in Medical and Health Sciences and (b) its combined output in Social Sciences and Humanities and the Arts (e.g., a publication share of 66% STEM, 21% Medical and Health Sciences, and 20% Social Sciences and Humanities and the Arts; totals may exceed 100% due to category overlap).
- An institution is classified as Medical and Health Sciences if its output in this field exceeds by more than threefold both (a) its STEM output and (b) its combined output in Social Sciences and Humanities and the Arts.
- All other institutions are classified as Multidisciplinary.
The threefold (3×) dominance rule ensures that each assigned field label reflects a genuinely dominant research portfolio. Sensitivity analyses confirmed that lower thresholds would misclassify universities, thereby compromising the construct validity of the categories.
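The 3× dominance rule above can be sketched in code. This is a minimal illustration, not the RI² codebase; the function name is hypothetical, and the arguments are publication shares by field group, which may sum to more than 100% because OECD categories overlap.

```python
def classify_institution(stem: float, medical: float, social_humanities: float) -> str:
    """Classify an institution by the 3x dominance rule (illustrative sketch).

    stem, medical, social_humanities: publication output (shares or counts)
    in the three consolidated field groups. Shares may overlap across groups.
    """
    # STEM must exceed BOTH other groups by more than threefold.
    if stem > 3 * medical and stem > 3 * social_humanities:
        return "STEM"
    # Likewise for Medical and Health Sciences.
    if medical > 3 * stem and medical > 3 * social_humanities:
        return "Medical and Health Sciences"
    # Everything else, including Social Sciences / Humanities strengths.
    return "Multidisciplinary"
```

For the worked example in the text, a 66% / 21% / 20% split yields STEM, since 66 exceeds both 3 × 21 = 63 and 3 × 20 = 60.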
RI² Score Field Normalization, Composite Scoring, and Benchmarking: Each component is field-normalized using Min-Max scaling, with winsorization at the 99th percentile to mitigate the effects of outliers. The normalized values are then averaged under an equal-weighting scheme to produce the composite RI² score. Equal weighting maximizes transparency, interpretability, and resistance to arbitrary assumptions about component importance, especially in the absence of robust empirical evidence justifying differential weights. Future versions may refine the weighting based on evidence of relative risk severity or predictive value. Meanwhile, reporting both the composite score and its disaggregated components enables stakeholders to cross-examine each risk dimension independently while benefiting from a streamlined overall index. A defining feature of RI² is its use of a fixed global reference group, the 1,000 most-publishing universities worldwide, which ensures consistent benchmarking across size, time, and geography. This design allows universal comparability: RI² scores are interpreted against a stable global baseline rather than rescaled to local or sample-specific norms. Each indicator is normalized to a 0-1 range using the empirical distribution of the global baseline. This distribution remains fixed across analyses to preserve stable scaling and longitudinal comparability, even as score distributions evolve over time or across fields.
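The winsorize-then-scale step and the equal-weight composite can be sketched as follows. The function names are illustrative, not part of RI²; in the actual pipeline, normalization is performed within each field category against the fixed 1,000-university baseline distribution.

```python
import numpy as np

def normalize_component(values, baseline):
    """Winsorize at the baseline's 99th percentile, then Min-Max scale to [0, 1].

    `baseline` is the fixed global reference distribution for this component
    (illustrative; RI2 applies this per field category).
    """
    cap = np.percentile(baseline, 99)                      # winsorization cutoff
    base = np.minimum(np.asarray(baseline, float), cap)    # winsorized baseline
    vals = np.minimum(np.asarray(values, float), cap)      # winsorized inputs
    lo, hi = base.min(), base.max()
    # Scale against the (fixed) baseline range, clamping values outside it.
    return (np.clip(vals, lo, hi) - lo) / (hi - lo)

def ri2_score(d_norm, r_norm, s_norm):
    """Equal-weight composite of the three normalized components."""
    return (d_norm + r_norm + s_norm) / 3.0
```

Because the baseline (not the analysis sample) defines the scaling range, scores remain comparable across years and subsamples, as the text requires.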
Indicator Correlation and Composite Rationale: To assess whether the three RI² components capture distinct dimensions of research-integrity risk, Pearson correlation coefficients were calculated for the 1,000-university reference group, separately by field. The results show that D-Rate and R-Rate are strongly correlated in STEM (r = 0.78) and moderately correlated in Medical and Health Sciences (r = 0.54) and Multidisciplinary fields (r = 0.56), indicating partial overlap between venue-level and article-level enforcement outcomes. In contrast, correlations involving S-Rate are near zero across all fields, confirming that institutional self-citation behavior reflects a distinct citation-level mechanism rather than downstream consequences of journal delisting or article retraction. To further assess potential redundancy, Variance Inflation Factors (VIF) were computed for all three components within each field category. All VIF values were well below the conventional threshold of 5, confirming the absence of problematic multicollinearity. The slightly higher, but still acceptable, VIFs observed for D-Rate and R-Rate in STEM (≈ 2.8) reflect their conceptual proximity as enforcement-related indicators rather than statistical redundancy. In contrast, S-Rate consistently exhibited near-unit VIF values (≈ 1.0) across all fields, underscoring its independence from the other two components. These results demonstrate that while the three indicators are empirically related, they are not redundant. Their integration therefore enhances interpretive breadth and robustness, enabling RI² to capture institutional exposure to research-integrity risks as a multidimensional construct, rather than as a collection of isolated or overlapping metrics.
Pearson correlations among RI² components by field for the 1,000 most-publishing universities:

| Component pair | Medical & Health Sciences | Multidisciplinary | STEM |
| --- | --- | --- | --- |
| D-Rate and R-Rate | 0.54 | 0.56 | 0.78 |
| D-Rate and S-Rate | -0.09 | 0.29 | -0.05 |
| R-Rate and S-Rate | -0.16 | 0.07 | -0.07 |
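For readers who wish to reproduce the redundancy checks, both statistics are standard. The sketch below (illustrative, not the RI² codebase) computes a Pearson correlation and obtains each component's VIF by regressing it on the other components, since VIF = 1 / (1 − R²) of that regression.

```python
import numpy as np

def pearson(x, y):
    """Pearson correlation coefficient between two samples."""
    return float(np.corrcoef(np.asarray(x, float), np.asarray(y, float))[0, 1])

def vif(X):
    """Variance Inflation Factor for each column of a (n x k) data matrix X."""
    X = np.asarray(X, float)
    n, k = X.shape
    factors = []
    for j in range(k):
        y = X[:, j]
        others = np.delete(X, j, axis=1)
        A = np.column_stack([np.ones(n), others])      # intercept + other columns
        beta, *_ = np.linalg.lstsq(A, y, rcond=None)   # OLS fit
        resid = y - A @ beta
        r2 = 1.0 - (resid @ resid) / (((y - y.mean()) ** 2).sum())
        factors.append(1.0 / (1.0 - r2))               # VIF = 1 / (1 - R^2)
    return factors
```

A component that is uncorrelated with the others (as S-Rate is reported to be) yields a VIF near 1.0, while the D-Rate/R-Rate pairing in STEM would push their VIFs toward the reported ≈ 2.8.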
Interpretive Framework: Risk Tiers, Not Rankings: RI² is designed as a diagnostic framework rather than a competitive ranking. Its primary purpose is to identify patterns of elevated research-integrity risk rather than to assign reputational value. Accordingly, each institution’s composite RI² score serves as a relative indicator of structural vulnerability, highlighting where bibliometric behaviors may warrant further scrutiny, governance review, or corrective policy action. To ensure interpretive clarity and stability, institutions are classified into five percentile-based tiers derived from the fixed global reference distribution of the 1,000 most-publishing universities worldwide. This percentile approach anchors thresholds in the empirical properties of a constant reference group, maintaining stability and comparability across years, samples, and disciplinary compositions. The specific percentiles and their interpretations are as follows:
| Tier | Percentile Range | Interpretation |
| --- | --- | --- |
| Red Flag | ≥ 95th | Extreme anomalies; systemic integrity risk |
| High Risk | ≥ 90th and < 95th | Significant deviation from global norms |
| Watch List | ≥ 75th and < 90th | Moderately elevated risk; emerging concerns |
| Normal Variation | ≥ 50th and < 75th | Within expected global variance |
| Low Risk | < 50th | Strong adherence to publishing integrity norms |
The five tiers, ranging from Low Risk to Red Flag, are intended as qualitative signals of exposure, not ordinal measures of prestige or performance. Movement between tiers reflects shifts in institutional risk profiles, not improvement or decline in academic excellence. A university in the Red Flag tier is not “worse ranked” than one in the Low-Risk tier; instead, it exhibits bibliometric patterns consistent with greater exposure to integrity-related vulnerabilities, such as elevated retraction rates, greater reliance on delisted journals, or abnormal self-citation dynamics. This interpretive framework serves three critical purposes:
- Conceptual clarity: Prevents misreading RI² as another leaderboard and reinforces its diagnostic intent.
- Policy utility: Enables regulators, funders, and ranking agencies to identify systemic weaknesses.
- Institutional accountability: Provides university leaders with an evidence-based tool for internal review of research practices, incentive systems, and governance mechanisms.
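The tier assignment itself reduces to a percentile lookup against the fixed baseline. The sketch below uses hypothetical helper names and mirrors the thresholds in the tier table; the empirical-percentile convention (share of baseline scores strictly below a given score) is an assumption.

```python
def score_percentile(score, baseline_scores):
    """Empirical percentile of a composite RI2 score within the fixed
    1,000-university baseline (convention assumed for illustration)."""
    baseline = sorted(baseline_scores)
    below = sum(1 for b in baseline if b < score)
    return 100.0 * below / len(baseline)

def assign_tier(percentile):
    """Map a percentile to one of the five RI2 risk tiers."""
    if percentile >= 95:
        return "Red Flag"
    if percentile >= 90:
        return "High Risk"
    if percentile >= 75:
        return "Watch List"
    if percentile >= 50:
        return "Normal Variation"
    return "Low Risk"
```

Because the thresholds are percentiles of a constant reference group, a university's tier changes only when its risk profile moves relative to that fixed baseline, not when the analysis sample changes.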
The Research Integrity Risk Index (RI²) is an independent, composite metric developed to highlight potential research integrity risks. It does not assert misconduct or wrongdoing by any institution or individual. RI² and its author are not affiliated with, endorsed by, or acting on behalf of Elsevier, Clarivate, or any other data provider. All data, classifications, and rankings are subject to periodic updates and refinements as new information becomes available.