RI²: A Composite Metric for Detecting Risk Profiles

The Research Integrity Risk Index (RI²) is the first empirically grounded composite metric designed to assess research integrity risk at the institutional level. Unlike conventional rankings that emphasize research volume and citation-based visibility, RI² focuses on integrity-sensitive indicators that resist bibliometric manipulation and inflation.

The RI² comprises three components: the rate of articles in delisted journals (D-Rate), the retraction rate (R-Rate), and the institutional self-citation rate (S-Rate). Each is calculated using data on journal articles and reviews (“articles” hereafter) from the two most recent complete calendar years. Data are extracted and analyzed near the end of the RI² publication year to maximize coverage; for example, the 2025 edition uses data collected in late 2025 on 2023-2024 articles. This two-year window balances recency with reliability, while late-year extraction accommodates lags in journal delisting, citation accumulation, and retraction indexing (Candal-Pedreira et al., 2024; Fang et al., 2012; Feng et al., 2024; Gedik et al., 2024).

To ensure fair comparisons, RI² field-normalizes each component based on institutional research strength using the OECD Fields of Science and Technology taxonomy (Frascati Manual), as implemented in InCites. InCites maps 254 Web of Science (WoS) subject categories into six OECD broad fields: Natural Sciences, Engineering and Technology, Medical and Health Sciences, Agricultural and Veterinary Sciences, Social Sciences, and Humanities and the Arts. For RI², these are grouped into three broader categories: STEM (including natural sciences, engineering and technology, and agricultural and veterinary sciences), Medical and Health Sciences (corresponding to the OECD Medical and Health Sciences field), and Multidisciplinary (institutions without a dominant STEM or Medical and Health Sciences profile). Institutions with strengths in Social Sciences and Humanities and the Arts are subsumed under Multidisciplinary (see below).

  • A university is classified as STEM if its STEM output exceeds by more than threefold both (a) its Medical and Health Sciences output and (b) its combined Social Sciences and Humanities and the Arts output.

  • A university is classified as Medical and Health Sciences if its output in this field exceeds by more than threefold both (a) its STEM output and (b) its combined Social Sciences and Humanities and the Arts output.

Per InCites data, only the London School of Economics and Political Science and Italy’s Bocconi University have combined Social Sciences and Humanities and the Arts output exceeding both their STEM and their Medical and Health Sciences output by more than threefold, and only 11 other institutions clear a twofold threshold; accordingly, all 13 are included in the Multidisciplinary category.
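To make the dominance rule concrete, the following is a minimal Python sketch of the 3× threshold applied to field-level output counts; the function and argument names are illustrative (not part of the published methodology), and the counts would come from InCites field-level output.

```python
def classify_institution(stem: float, medical: float, soc_hum: float,
                         ratio: float = 3.0) -> str:
    """Assign an RI² research category from field-level article counts.

    stem    -- combined Natural Sciences; Engineering and Technology;
               and Agricultural and Veterinary Sciences output
    medical -- Medical and Health Sciences output
    soc_hum -- combined Social Sciences and Humanities and the Arts output
    """
    if stem > ratio * medical and stem > ratio * soc_hum:
        return "STEM"
    if medical > ratio * stem and medical > ratio * soc_hum:
        return "Medical and Health Sciences"
    # Everything else, including SSH-dominant portfolios, is grouped here.
    return "Multidisciplinary"

print(classify_institution(4000, 1200, 900))  # "STEM" (4000 > 3*1200 and > 3*900)
```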

A threefold (3×) dominance rule is adopted to ensure the assigned field truly dominates an institution’s portfolio, reserving labels such as STEM or Medical and Health Sciences for unambiguous cases. Sensitivity analyses showed that thresholds below 3× would reclassify several broad-portfolio universities with strong medical schools (e.g., Duke, Harvard, Johns Hopkins, the University of Pennsylvania, the University of Pittsburgh, Vanderbilt, and Washington University in St. Louis) as Medical and Health Sciences, undermining construct validity and inflating the Medical and Health Sciences category. Similarly, thresholds below 3× would pull additional broad-portfolio institutions into the STEM category (e.g., MIT, the University of Maryland-College Park, and UC Santa Barbara), inflating STEM and reducing the size of the Multidisciplinary group of universities.

The presence or absence of a specific school (e.g., engineering, medicine) is not decisive in determining an institution’s RI² research category; the publication mix determines the classification. The RI² classification relies on observed publication patterns rather than institutional structures, because universities vary widely in organizational design (e.g., some lack formal schools yet publish in medical fields). Using the publication mix ensures comparable, data-driven categorization, and an output-based classification automatically adjusts to shifts in institutional research focus, avoiding rigid structural assumptions. The classification rule is applied consistently to all institutions. Applying these criteria to the world’s 2,000 most-published universities yielded 809 STEM, 138 Medical and Health Sciences, and 1,053 Multidisciplinary institutions.

Each RI² component is field-normalized within the three broad categories and capped at the 99th percentile to reduce the impact of extreme outliers. The normalized values are then averaged to produce the composite RI² score, ensuring that no single component (whether D-Rate, R-Rate, or S-Rate) dominates the overall result. The remainder of this section presents a detailed account of the three RI² components, covering their scope, rationale, and calculation procedures.
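A minimal sketch of this capping-and-averaging step, assuming the field-normalized component values for one RI² category have already been computed (all names are illustrative, not the production pipeline):

```python
import numpy as np

def composite_ri2(d_rate, r_rate, s_rate):
    """Cap each component at its 99th percentile, then average equally.

    Each argument is a 1-D array of field-normalized component values for
    all institutions within one RI² category (STEM, Medical and Health
    Sciences, or Multidisciplinary).
    """
    capped = []
    for component in (d_rate, r_rate, s_rate):
        values = np.asarray(component, dtype=float)
        p99 = np.percentile(values, 99)
        capped.append(np.minimum(values, p99))  # winsorize the upper tail
    return np.mean(capped, axis=0)              # equal-weight composite
```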

Delisted Journal Risk (D-Rate): Measures the share of an institution’s output published in journals removed from Scopus or WoS due to violations of editorial, publishing, or peer-review standards (Cortegiani et al., 2020). Many delisted journals show warning signs long before removal (e.g., incoherent scope, abnormally short review times, excessive self-citation, geographic or institutional clustering, poor editorial transparency, aggressive solicitations). Continued publication in such venues, especially when flagged in one database but still active in another, can artificially inflate metrics and is treated by RI² as a measurable, foreseeable risk. D-Rate includes: (1) articles in journals delisted by Scopus; and (2) articles in journals delisted by WoS with records in Scopus. Both Scopus and WoS regularly re-evaluate indexed journals; removal is typically prospective (new content excluded, past content retained), with rare retroactive removals in WoS.

Between January 2023 and August 2025, Scopus delisted 132 journals and WoS delisted 184, yielding 286 unique delisted journals after accounting for the 30 delisted by both. Of the 132 journals delisted by Scopus, 22 remained actively indexed in WoS. Of the 184 journals delisted by WoS, 71 remained actively indexed in Scopus, and 9 others had their Scopus coverage discontinued without being formally tagged as delisted. In total, these 212 journals (132 + 71 + 9) produced 123,130 articles in 2023-2024. Implications include:

  • Quartile labels are not safeguards. Over 40% of the 212 journals held Q1 or Q2 quartile rankings in Scopus (CiteScore) and/or WoS at the time of delisting, evidence that favorable quartiles do not inoculate against delisting.
  • Single-source journal vetting is risky. Over 53% of the 123,130 articles originated from the 71 WoS-delisted journals still actively indexed in Scopus, showing how database asymmetries and lags can materially affect institutional indicators when a university (or evaluator) relies on a single database.

When calculated for the 2025 edition of RI², the formula for each institution is:

$$\text{D-Rate}_{i,2025} = 100 \times \frac{A_i^{\text{delisted}}(2023\text{-}2024)}{A_i^{\text{total}}(2023\text{-}2024)}$$

where $A_i^{\text{delisted}}$ counts articles in (a) Scopus-delisted journals and (b) WoS-delisted journals with Scopus records; articles from WoS-delisted journals without Scopus records in 2023-2024 are excluded to maintain a coherent denominator. The resulting D-Rates are then field-normalized (by RI² category) and winsorized at the 99th percentile before aggregation into the composite RI² score.
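As an illustration of this definition, the following hedged sketch assumes an article-level table with hypothetical institution and status columns; it is one way to implement the formula above, not the published pipeline:

```python
import pandas as pd

def d_rate(articles: pd.DataFrame) -> pd.Series:
    """Per-institution D-Rate from 2023-2024 article-level records.

    Assumes hypothetical columns:
      institution -- affiliation identifier
      status      -- 'scopus_delisted', 'wos_delisted_in_scopus',
                     'wos_delisted_no_scopus', or 'indexed'
    """
    # Exclude WoS-delisted articles without Scopus records so that the
    # numerator and denominator are drawn from the same (Scopus) universe.
    df = articles[articles["status"] != "wos_delisted_no_scopus"]
    delisted = df["status"].isin(["scopus_delisted", "wos_delisted_in_scopus"])
    return 100 * delisted.groupby(df["institution"]).mean()
```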

Retraction Risk (R-Rate): Measures the proportion of an institution’s research output that has been retracted, particularly for misconduct such as fabrication, plagiarism, ethical breaches, authorship or peer-review manipulation, or serious methodological flaws (Fang et al., 2012; Ioannidis et al., 2025). It is expressed as the number of retractions per 1,000 articles over the most recent two complete calendar years. According to the Retraction Watch Database (RWDB), the ten most common reasons for article retractions are: Investigation by Journal/Publisher (48%), Unreliable Results and/or Conclusions (42%), Investigation by Third Party (34%), Concerns/Issues About Data (30%), Concerns/Issues About Referencing/Attributions (26%), Paper Mill (25%), Concerns/Issues with Peer Review (23%), Concerns/Issues About Results and/or Conclusions (19%), Fake Peer Review (19%), and Computer-Aided Content or Computer-Generated Content (18%); the percentages sum to more than 100 because a single retraction can cite multiple reasons.

RI² draws retraction data from four databases: RWDB, Medline, Scopus, and WoS. RWDB is the world’s largest and most authoritative repository of retraction records, while Medline, Scopus, and WoS are widely used for publication, citation, and research assessment purposes, making them valuable complements for verification and enhanced coverage. All retraction records from these sources are consolidated, cross-referenced, and standardized using Elsevier’s SciVal platform to ensure consistency and comparability across institutions and data sources.

As of August 8, 2025, RWDB contained 65,990 records. Similar to Ioannidis et al. (2025), before matching with SciVal we excluded 4,936 non-retractions (“correction,” “expression of concern,” “reinstatement”) and 2,517 records unrelated to author misconduct (“retract and replace” and, under certain conditions, “error by journal/publisher,” “duplication of content through error by journal/publisher,” and “withdrawn as out of date”). We also excluded 13,984 non-article formats (e.g., book chapters, commentaries/editorials, letters, conference abstracts/papers). This yielded 44,553 retractions attributable to authors’ conduct, from which we removed 2,090 records lacking DOIs or PMIDs. Of the remaining 42,463 articles, 3,242 did not match to SciVal, leaving 39,221 RWDB retractions for analysis.

In parallel, we retrieved retracted “articles” and “reviews” from Medline (n = 27,018) and WoS (n = 37,845), and “retracted” records from Scopus (n = 25,466), matching 25,686, 36,721, and 25,466 of these, respectively, to SciVal via DOIs and PMIDs. We then merged the four matched datasets (RWDB, Medline, Scopus, WoS) in SciVal (n = 47,806) and identified 2,315 records whose document-type classifications were inconsistent across databases (e.g., records classified as retracted articles in one database but as retraction notices, conference papers, or editorials in another). Because such inconsistencies could distort retraction rates by misaligning the numerator (retracted articles) and denominator (all articles), these 2,315 records were excluded to maintain consistent document-type definitions. Cross-referencing also removed 739 records labeled “retracted” in Scopus but classified as non-articles in WoS.
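The consolidation logic might be sketched as follows, assuming each source provides hypothetical doi, pmid, and doc_type columns; the matching keys and the consistency filter mirror the steps described above but are illustrative only:

```python
import pandas as pd

def merge_retractions(sources: dict) -> pd.DataFrame:
    """Consolidate retraction records from RWDB, Medline, Scopus, and WoS.

    Each frame is assumed to carry hypothetical columns 'doi', 'pmid',
    and 'doc_type'. Records are keyed on DOI, falling back to PMID, and
    any record whose document type disagrees across databases is dropped.
    """
    frames = []
    for name, df in sources.items():
        df = df.copy()
        df["source"] = name
        df["key"] = df["doi"].fillna(df["pmid"])  # prefer DOI, else PMID
        frames.append(df.dropna(subset=["key"]))
    merged = pd.concat(frames, ignore_index=True)
    # Keep only keys whose document type is identical in every source.
    consistent = merged.groupby("key")["doc_type"].transform("nunique") == 1
    return merged[consistent].drop_duplicates("key")
```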

The final dataset consisted of 44,752 retractions matched in SciVal between 1996 and July 2025, including 4,972 from 2023-2024. After removing 18 retraction notices identified through manual inspection, the 2025 RI² R-Rate was calculated using 4,954 retracted articles, of which 3,331 appear in RWDB (Fig. 1).

Fig. 1. Flowchart for identifying retracted articles.

When calculated for the 2025 edition of RI², using data extracted in late 2025, the formula is:

$$\text{R-Rate}_{i,2025} = 1000 \times \frac{A_i^{\text{retracted}}(2023\text{-}2024)}{A_i^{\text{total}}(2023\text{-}2024)}$$

where $A_i^{\text{retracted}}$ counts retracted articles per RWDB, Medline, Scopus, and WoS with matching SciVal records, and $A_i^{\text{total}}$ counts all articles published by the institution, including retracted ones, according to SciVal. The resulting R-Rates are then field-normalized (by RI² category) and winsorized at the 99th percentile before aggregation into the composite RI² score.

Self-Citation Rate (S-Rate): Measures the proportion of citations to an institution’s articles that originate from the same institution. Self-citation is not inherently problematic, for instance when sequential research builds on prior work, but unusually high institutional self-citation rates can artificially inflate citation-based indicators used in rankings, research assessment exercises, and faculty evaluation, and may indicate strategic citation practices aimed at boosting institutional metrics rather than reflecting genuine scholarly influence. The S-Rate is therefore not designed to treat every self-citation as a negative signal. Instead, it functions as a field-normalized, percentile-based risk indicator that identifies statistical outliers relative to peer institutions in the same discipline, flagging only institutions whose citation patterns deviate markedly from disciplinary norms rather than penalizing legitimate scholarly practice.

Self-citation data are obtained from InCites for articles published in the latest two full calendar years, to align with the other components of RI². Articles with more than 100 co-authors are excluded to reduce distortions caused by large-scale collaborations. InCites defines an institutional self-citation as a citation originating from any publication with at least one author affiliated with the same institution as the cited publication. When calculated for the 2025 edition of RI², using data extracted in November 2025, the formula for each institution is:

$$\text{S-Rate}_{i,2025} = 100 \times \frac{C_i^{\text{self}}(2023\text{-}2024)}{C_i^{\text{total}}(2023\text{-}2024)}$$

where $C_i^{\text{self}}$ counts citations to the institution’s articles that originate from the same institution, and $C_i^{\text{total}}$ counts all citations to the institution’s articles, according to InCites. The resulting S-Rates are then field-normalized (by RI² category) and winsorized at the 99th percentile before aggregation into the composite RI² score.
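A hedged sketch of this computation, assuming citation-level records with hypothetical columns for the cited institution, a self-citation flag, and the author count of the cited article:

```python
import pandas as pd

def s_rate(citations: pd.DataFrame) -> pd.Series:
    """Per-institution S-Rate from citation-level records (2023-2024).

    Assumes hypothetical columns:
      cited_institution -- affiliation of the cited article
      is_self           -- True if any citing author shares that affiliation
      n_authors         -- author count of the cited article
    """
    # Drop hyper-authored papers (>100 co-authors) to limit distortions
    # from large-scale collaborations, per the exclusion described above.
    df = citations[citations["n_authors"] <= 100]
    return 100 * df.groupby("cited_institution")["is_self"].mean()
```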

Exploratory Watchlist of Problematic Journals

In addition to the three core RI² components (D-Rate, R-Rate, and S-Rate), the author is piloting an exploratory watchlist of journals that raise integrity concerns but are not yet part of the official RI² score. These journals were identified based on:

  • Excessive field-normalized self-citation rates

  • Abnormally high retraction concentrations (relative to field norms)

  • Inclusion on the Chinese Early Warning List (EWL)

This exploratory list is provided as a transparency measure and as an early warning tool for institutions, researchers, and policymakers. It does not contribute to institutional RI² scores in the 2025 edition but may inform future methodological developments.

Reference Group and Global Benchmarking Rationale
Collectively, the three components serve as empirically measurable proxies for broader research integrity concerns that extend beyond isolated misconduct cases. These include paper mills (businesses that sell authorship), citation cartels (reciprocal citation networks used to inflate impact), citation farms (organizations or networks that generate or sell citations), fraudulent authorship practices, and other forms of metric gaming, as well as systemic and institutional-level responsibilities toward safeguarding research quality and ethics. Such responsibilities encompass ensuring transparent authorship attribution, enforcing rigorous editorial and peer-review standards, promoting responsible citation behavior, implementing due diligence in journal selection, and fostering institutional cultures that prioritize integrity over metric-driven incentives. RI² also indirectly reflects institutional weaknesses such as insufficient training in research ethics, inadequate awareness-raising programs, weak accountability structures, and the absence of effective monitoring and early-warning systems. By focusing on verifiable, outcome-based indicators, RI² captures both the symptomatic manifestations of compromised research practices and the underlying structural vulnerabilities that institutions must actively address (Abalkina, 2023; Maisonneuve, 2025; Candal-Pedreira et al., 2024; Feng et al., 2024; Ioannidis & Maniadis, 2024; Lancho Barrantes et al., 2023; Smagulov & Teixeira da Silva, 2025; Teixeira da Silva & Nazarovets, 2023; Wright, 2024).

To ensure consistency across diverse studies, whether analyzing 50 universities in one country or 2,000 globally, RI² applies a fixed reference group: the 1,000 most-published universities worldwide. This global baseline provides balanced disciplinary and geographic coverage, ensuring thresholds are not skewed by outliers. It functions analogously to clinical reference ranges: just as hypertension is diagnosed using globally standardized thresholds, RI² classifications rely on a universal benchmark to detect structural anomalies. Key advantages of this approach include:

  1. Universal comparability: RI² scores are always interpreted against the global baseline, not rescaled to local or sample-specific norms. This ensures consistency across geographic and temporal contexts.
  2. Stable thresholds: Risk tiers are fixed based on the empirical distribution of the reference group. For example, if the highest observed retraction rate is 3 per 1,000 articles, an institution with 1.5 retractions per 1,000 articles receives a normalized score of 0.5, regardless of sample size (see the sketch after this list).
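As a sketch of this fixed-bound scaling, the helper below reproduces the worked example in item 2 and applies the June 2025 retraction-rate range reported in the next subsection; it is illustrative, not the production code:

```python
def min_max(value: float, lo: float, hi: float) -> float:
    """Scale a raw component value to [0, 1] against fixed global bounds."""
    return max(0.0, min(1.0, (value - lo) / (hi - lo)))

# Worked example from item 2: against a hypothetical maximum of 3 retractions
# per 1,000 articles, an institution at 1.5 scores 0.5.
print(min_max(1.5, 0.0, 3.0))    # 0.5

# Against the fixed June 2025 bounds (0.00-26.82 retractions per 1,000):
print(min_max(5.0, 0.0, 26.82))  # ~0.186
```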

Normalization, Composite Scoring, and Tier Classification
Each indicator is scaled to a 0-1 range using Min-Max normalization relative to the global reference group. The composite RI² score is the simple average of the three:

$$\text{RI}^2_{i,2025} = \frac{D_i + R_i + S_i}{3}$$

For the June 2025 edition, the retraction rate ranged from 0.00 to 26.82 retractions per 1,000 articles, and the share of articles in delisted journals ranged from 0.00% to 15.35%. These values define the normalization scale and remain fixed across all samples, ensuring stable cross-institutional comparisons regardless of geographic or disciplinary representation. Institutions are then classified into one of five risk tiers based on their RI² score as follows:

| Tier | Percentile Range | Interpretation | Score Range (2-component, June 2025 ed.) | Score Range (3-component, field-normalized, August 2025 beta ed.) |
|------|------------------|----------------|------------------------------------------|--------------------------------------------------------------------|
| Red Flag | ≥ 95th | Extreme anomalies; systemic integrity risk | RI² ≥ 0.251 | RI² ≥ 0.531 |
| High Risk | ≥ 90th and < 95th | Significant deviation from global norms | 0.176 ≤ RI² < 0.251 | 0.396 ≤ RI² < 0.531 |
| Watch List | ≥ 75th and < 90th | Moderately elevated risk; emerging concerns | 0.099 ≤ RI² < 0.176 | 0.270 ≤ RI² < 0.396 |
| Normal Variation | ≥ 50th and < 75th | Within expected global variance | 0.049 ≤ RI² < 0.099 | 0.194 ≤ RI² < 0.270 |
| Low Risk | < 50th | Strong adherence to publishing integrity norms | RI² < 0.049 | RI² < 0.194 |
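The tier cut-offs in the table can be encoded directly; this minimal sketch uses the 3-component August 2025 beta thresholds and is illustrative only:

```python
def risk_tier(ri2: float) -> str:
    """Map a 3-component RI² score to its risk tier
    (August 2025 beta thresholds from the table above)."""
    if ri2 >= 0.531:
        return "Red Flag"
    if ri2 >= 0.396:
        return "High Risk"
    if ri2 >= 0.270:
        return "Watch List"
    if ri2 >= 0.194:
        return "Normal Variation"
    return "Low Risk"

print(risk_tier(0.42))  # "High Risk"
```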


To assess whether the RI² components capture distinct or overlapping dimensions of research risk, Pearson correlation coefficients between the two components of the June 2025 edition (D-Rate and R-Rate) were calculated across five segments of the 1,500 most-published universities. The correlation was r = 0.709 among the top 500 institutions, r = 0.600 for those ranked 501–1,000, r = 0.409 for those ranked 1,001–1,500, r = 0.635 for the top 1,000 universities, and r = 0.535 for the top 1,500 universities overall. These results indicate a moderate-to-strong association in the upper tiers, with the strength of the relationship diminishing among lower-output institutions. This gradient suggests that while the two indicators are related, they increasingly reflect distinct dimensions of research integrity risk as institutional publishing volume decreases, reinforcing the rationale for combining them in a composite index while maintaining their interpretive independence.
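A hedged sketch of how such segment-level correlations could be reproduced, assuming two component arrays sorted by institutional publication output (all names illustrative):

```python
import numpy as np

def segment_correlations(x, y):
    """Pearson r between two component arrays over the segments used above.

    Institutions are assumed to be sorted by publication output, so that
    positional slices correspond to output-based rank segments.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    segments = {
        "top 500":   slice(0, 500),
        "501-1000":  slice(500, 1000),
        "1001-1500": slice(1000, 1500),
        "top 1000":  slice(0, 1000),
        "top 1500":  slice(0, 1500),
    }
    return {name: float(np.corrcoef(x[s], y[s])[0, 1])
            for name, s in segments.items()}
```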

Equal weighting was deliberately adopted to ensure transparency, interpretability, and resistance to arbitrary weighting schemes, especially in the absence of robust empirical evidence favoring one indicator over the others. Future iterations of RI² may explore alternative weighting strategies informed by empirical risk severity, disciplinary norms, or differential predictive value. For now, the equal-weighted model offers a balanced approach to capturing integrity-related anomalies without overfitting to specific contexts or assumptions.

Key Features of the RI² Methodology

  1. Global Benchmarking: Ensures representative and statistically reliable classification.
  2. Fixed Thresholds for Consistency: Applicable across large or small datasets, preserving interpretability.
  3. Transparency and Resistance to Gaming: Built on normalized, verifiable metrics rather than subjective assessments.
  4. Strong correlation with other integrity-related indicators: With correlations of 0.96 to 0.99 against known red flags such as self-citation rate, RI² aligns closely with the broader literature-integrity landscape.

References

  • Abalkina, A. (2023). Publication and collaboration anomalies in academic papers originating from a paper mill: Evidence from a Russia-based paper mill. Learned Publishing, 36(4), 689-702. https://doi.org/10.1002/leap.1574
  • Candal-Pedreira, C., Guerra-Tort, C., Ruano-Ravina, A., Freijedo-Farinas, F., Rey-Brandariz, J., Ross, J. S., & Pérez-Ríos, M. (2024). Retracted papers originating from paper mills: a cross-sectional analysis of references and citations. Journal of Clinical Epidemiology, 172, Article 111397. https://doi.org/10.1016/j.jclinepi.2024.111397
  • Cortegiani, A., Ippolito, M., Ingoglia, G., Manca, A., Cugusi, L., Severin, A., Strinzel, M., Panzarella, V., Campisi, G., Manoj, L., Gregoretti, C., Einav, S., Moher, D., & Giarratano, A. (2020). Citations and metrics of journals discontinued from Scopus for publication concerns: The GhoS(t)copus Project. F1000Research, 9, Article 415. https://doi.org/10.12688/f1000research.23847.2
  • Fang, F. C., Steen, R. G., & Casadevall, A. (2012). Misconduct accounts for the majority of retracted scientific publications. Proceedings of the National Academy of Sciences of the United States of America, 109(42), 17028-17033. https://doi.org/10.1073/pnas.1212247109
  • Feng, S., Feng, L., Han, F., Zhang, Y., Ren, Y., Wang, L., & Yuan, J. (2024). Citation network analysis of retractions in molecular biology field. Scientometrics, 129(8), 4795-4817. https://doi.org/10.1007/s11192-024-05101-4
  • Ioannidis, J. P. A., & Maniadis, Z. (2024). Quantitative research assessment: using metrics against gamed metrics. Internal and Emergency Medicine, 19(1), 39-47. https://doi.org/10.1007/s11739-023-03447-w
  • Ioannidis, J. P. A., Pezzullo, A. M., Cristiano, A., Boccia, S., & Baas, J. (2025). Linking citation and retraction data reveals the demographics of scientific retractions among highly cited authors. PLoS Biology, 23(1), Article e3002999. https://doi.org/10.1371/journal.pbio.3002999
  • Lancho Barrantes, B. S., Dalton, S., & Andre, D. (2023). Bibliometrics methods in detecting citations to questionable journals. Journal of Academic Librarianship, 49(4), Article 102749. https://doi.org/10.1016/j.acalib.2023.102749
  • Maisonneuve, H. (2025). Predatory journals and paper mills jeopardise knowledge management. Bulletin du Cancer, 112(1), 100-110. https://doi.org/10.1016/j.bulcan.2024.12.002
  • Smagulov, K., & Teixeira da Silva, J. A. (2025). Scientific productivity and retracted literature of authors with Kazakhstani affiliations during 2013-2023. Journal of Academic Ethics. https://doi.org/10.1007/s10805-025-09624-0
  • Teixeira da Silva, J. A., & Nazarovets, S. (2023). Assessment of retracted papers, and their retraction notices, from a cancer journal associated with “paper mills”. Journal of Data and Information Science, 8(2), 118-125. https://doi.org/10.2478/jdis-2023-0009
  • Wright, D. E. (2024). Five problems plaguing publishing in the life sciences—and one common cause. FEBS Letters, 598(18), 2227-2239. https://doi.org/10.1002/1873-3468.15018
RI² is a composite metric intended to highlight potential research integrity risks using publicly available data. It does not assert misconduct or wrongdoing by any institution or individual. All data and rankings are subject to updates and refinements as new information becomes available.