RI²: A Composite Metric for Detecting Risk Profiles

The Research Integrity Risk Index (RI²) is the first empirically grounded composite metric designed to assess research integrity risk at the institutional level. Unlike conventional rankings that emphasize research volume and citation-based visibility, RI² focuses on integrity-sensitive indicators that resist bibliometric manipulation and inflation.

The RI² comprises three components: the rate of articles in delisted journals (D-Rate), the retraction rate (R-Rate), and the institutional self-citation rate (S-Rate). Each is calculated using data on journal articles and reviews (“articles” hereafter) from the two most recent complete calendar years. Data are extracted and analyzed near the end of the RI² publication year to maximize coverage; for example, the 2025 edition uses data collected in late 2025 on 2023-2024 articles. This two-year window balances recency with reliability, while late-year extraction accommodates lags in journal delisting, citation accumulation, and retraction indexing (Candal-Pedreira et al., 2024; Fang et al., 2012; Feng et al., 2024; Gedik et al., 2024).

To ensure fair comparisons, RI² field-normalizes each component based on institutional research strength using the OECD Fields of Science and Technology taxonomy (Frascati Manual), as implemented in InCites. InCites maps 254 Web of Science (WoS) subject categories into six OECD broad fields: Natural Sciences, Engineering and Technology, Medical and Health Sciences, Agricultural and Veterinary Sciences, Social Sciences, and Humanities and the Arts. For RI², these are grouped into three broader categories: STEM (including natural sciences, engineering and technology, and agricultural and veterinary sciences), Medical and Health Sciences (corresponding to the OECD Medical and Health Sciences field), and Multidisciplinary (institutions without a dominant STEM or Medical and Health Sciences profile). Institutions with strengths in Social Sciences and Humanities and the Arts are subsumed under Multidisciplinary (see below).

  • A university is classified as STEM if its STEM output exceeds by more than threefold both (a) its Medical and Health Sciences output and (b) its combined Social Sciences and Humanities and the Arts output.

  • A university is classified as Medical and Health Sciences if its output in this field exceeds by more than threefold both (a) its STEM output and (b) its combined Social Sciences and Humanities and the Arts output.

Per InCites data, only the London School of Economics and Political Science and Italy’s Bocconi University have combined Social Sciences and Humanities and the Arts output exceeding both their STEM and their Medical and Health Sciences output by more than threefold, and only 11 other institutions clear a twofold threshold; accordingly, all 13 are included in the Multidisciplinary category.
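To make the dominance rule concrete, the following is a minimal Python sketch of the 3× threshold applied to field-level output counts; the function and argument names are illustrative (not part of the published methodology), and the counts would come from InCites field-level output.

```python
def classify_institution(stem: float, medical: float, soc_hum: float,
                         ratio: float = 3.0) -> str:
    """Assign an RI² research category from field-level article counts.

    stem    -- combined Natural Sciences; Engineering and Technology;
               and Agricultural and Veterinary Sciences output
    medical -- Medical and Health Sciences output
    soc_hum -- combined Social Sciences and Humanities and the Arts output
    """
    if stem > ratio * medical and stem > ratio * soc_hum:
        return "STEM"
    if medical > ratio * stem and medical > ratio * soc_hum:
        return "Medical and Health Sciences"
    # Everything else, including SSH-dominant portfolios, is grouped here.
    return "Multidisciplinary"

print(classify_institution(4000, 1200, 900))  # "STEM" (4000 > 3*1200 and > 3*900)
```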

A threefold (3×) dominance rule is adopted to ensure the assigned field truly dominates an institution’s portfolio, reserving labels such as STEM or Medical and Health Sciences for unambiguous cases. Sensitivity analyses showed that thresholds below 3× would reclassify several broad-portfolio universities with strong medical schools (e.g., Duke, Harvard, Johns Hopkins, the University of Pennsylvania, the University of Pittsburgh, Vanderbilt, and Washington University in St. Louis) as Medical and Health Sciences, undermining construct validity and inflating the Medical and Health Sciences category. Similarly, thresholds below 3× would pull additional broad-portfolio institutions into the STEM category (e.g., MIT, the University of Maryland-College Park, and UC Santa Barbara), inflating STEM and reducing the size of the Multidisciplinary group of universities.

The presence or absence of a specific school (e.g., engineering, medicine) is not decisive in determining an institution’s RI² research category; the publication mix determines the classification. The RI² classification relies on observed publication patterns rather than institutional structures, because universities vary widely in organizational design (e.g., some lack formal schools yet publish in medical fields). Using the publication mix ensures comparable, data-driven categorization, and an output-based classification automatically adjusts to shifts in institutional research focus, avoiding rigid structural assumptions. The classification rule is applied consistently to all institutions. Applying these criteria to the world’s 2,000 most-published universities yielded 809 STEM, 138 Medical and Health Sciences, and 1,053 Multidisciplinary institutions.

Each RI² component is field-normalized within the three broad categories and capped at the 99th percentile to reduce the impact of extreme outliers. The normalized values are then averaged to produce the composite RI² score, ensuring that no single component (whether D-Rate, R-Rate, or S-Rate) dominates the overall result. The remainder of this section presents a detailed account of the three RI² components, covering their scope, rationale, and calculation procedures.
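A minimal sketch of this capping-and-averaging step, assuming the field-normalized component values for one RI² category have already been computed (all names are illustrative, not the production pipeline):

```python
import numpy as np

def composite_ri2(d_rate, r_rate, s_rate):
    """Cap each component at its 99th percentile, then average equally.

    Each argument is a 1-D array of field-normalized component values for
    all institutions within one RI² category (STEM, Medical and Health
    Sciences, or Multidisciplinary).
    """
    capped = []
    for component in (d_rate, r_rate, s_rate):
        values = np.asarray(component, dtype=float)
        p99 = np.percentile(values, 99)
        capped.append(np.minimum(values, p99))  # winsorize the upper tail
    return np.mean(capped, axis=0)              # equal-weight composite
```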

Delisted Journal Risk (D-Rate): Measures the share of an institution’s output published in journals removed from Scopus or WoS due to violations of editorial, publishing, or peer-review standards (Cortegiani et al., 2020). Many delisted journals show warning signs long before removal (e.g., incoherent scope, abnormally short review times, excessive self-citation, geographic or institutional clustering, poor editorial transparency, aggressive solicitations). Continued publication in such venues, especially when flagged in one database but still active in another, can artificially inflate metrics and is treated by RI² as a measurable, foreseeable risk. D-Rate includes: (1) articles in journals delisted by Scopus; and (2) articles in journals delisted by WoS with records in Scopus. Both Scopus and WoS regularly re-evaluate indexed journals; removal is typically prospective (new content excluded, past content retained), with rare retroactive removals in WoS.

Between January 2023 and August 2025, Scopus delisted 132 journals and WoS delisted 184, yielding 286 unique delisted journals after accounting for the 30 delisted by both. Of the 132 journals delisted by Scopus, 22 remained actively indexed in WoS. Of the 184 journals delisted by WoS, 71 remained actively indexed in Scopus, and 9 others had their Scopus coverage discontinued without being formally tagged as delisted. In total, these 212 journals (132 + 71 + 9) produced 123,130 articles in 2023-2024. Implications include:

  • Quartile labels are not safeguards. Over 40% of the 212 journals held Q1 or Q2 quartile rankings in Scopus (CiteScore) and/or WoS at the time of delisting, evidence that favorable quartiles do not inoculate against delisting.
  • Single-source journal vetting is risky. Over 53% of the 123,130 articles originated from the 71 WoS-delisted journals still actively indexed in Scopus, showing how database asymmetries and lags can materially affect institutional indicators when a university (or evaluator) relies on a single database.

When calculated for the 2025 edition of RI², the formula for each institution is:

$$\text{D-Rate}_{i,2025} = 100 \times \frac{A_i^{\text{delisted}}(2023\text{-}2024)}{A_i^{\text{total}}(2023\text{-}2024)}$$

where $A_i^{\text{delisted}}$ counts articles in (a) Scopus-delisted journals and (b) WoS-delisted journals with Scopus records; articles from WoS-delisted journals without Scopus records in 2023-2024 are excluded to maintain a coherent denominator. The resulting D-Rates are then field-normalized (by RI² category) and winsorized at the 99th percentile before aggregation into the composite RI² score.
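As an illustration of this definition, the following hedged sketch assumes an article-level table with hypothetical institution and status columns; it is one way to implement the formula above, not the published pipeline:

```python
import pandas as pd

def d_rate(articles: pd.DataFrame) -> pd.Series:
    """Per-institution D-Rate from 2023-2024 article-level records.

    Assumes hypothetical columns:
      institution -- affiliation identifier
      status      -- 'scopus_delisted', 'wos_delisted_in_scopus',
                     'wos_delisted_no_scopus', or 'indexed'
    """
    # Exclude WoS-delisted articles without Scopus records so that the
    # numerator and denominator are drawn from the same (Scopus) universe.
    df = articles[articles["status"] != "wos_delisted_no_scopus"]
    delisted = df["status"].isin(["scopus_delisted", "wos_delisted_in_scopus"])
    return 100 * delisted.groupby(df["institution"]).mean()
```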

Retraction Risk (R-Rate): Measures the proportion of an institution’s research output that has been retracted, particularly for misconduct such as fabrication, plagiarism, ethical breaches, authorship or peer-review manipulation, or serious methodological flaws (Fang et al., 2012; Ioannidis et al., 2025). It is expressed as the number of retractions per 1,000 articles over the most recent two complete calendar years. According to the Retraction Watch Database (RWDB), the ten most common reasons for article retractions are: Investigation by Journal/Publisher (48%), Unreliable Results and/or Conclusions (42%), Investigation by Third Party (34%), Concerns/Issues About Data (30%), Concerns/Issues About Referencing/Attributions (26%), Paper Mill (25%), Concerns/Issues with Peer Review (23%), Concerns/Issues About Results and/or Conclusions (19%), Fake Peer Review (19%), and Computer-Aided Content or Computer-Generated Content (18%); the percentages sum to more than 100 because a single retraction can cite multiple reasons.

RI² draws retraction data from four databases: RWDB, Medline, Scopus, and WoS. RWDB is the world’s largest and most authoritative repository of retraction records, while Medline, Scopus, and WoS are widely used for publication, citation, and research assessment purposes, making them valuable complements for verification and enhanced coverage. All retraction records from these sources are consolidated, cross-referenced, and standardized using Elsevier’s SciVal platform to ensure consistency and comparability across institutions and data sources.

As of August 8, 2025, RWDB contained 65,990 records. Similar to Ioannidis et al. (2025), before matching with SciVal we excluded 4,936 non-retractions (“correction,” “expression of concern,” “reinstatement”) and 2,517 records unrelated to author misconduct (“retract and replace” and, under certain conditions, “error by journal/publisher,” “duplication of content through error by journal/publisher,” and “withdrawn as out of date”). We also excluded 13,984 non-article formats (e.g., book chapters, commentaries/editorials, letters, conference abstracts/papers). This yielded 44,553 retractions attributable to authors’ conduct, from which we removed 2,090 records lacking DOIs or PMIDs. Of the remaining 42,463 articles, 3,242 did not match to SciVal, leaving 39,221 RWDB retractions for analysis.

In parallel, we retrieved retracted “articles” and “reviews” from Medline (n = 27,018) and WoS (n = 37,845), and “retracted” records from Scopus (n = 25,466), matching 25,686, 36,721, and 25,466 of these, respectively, to SciVal via DOIs and PMIDs. We then merged the four matched datasets (RWDB, Medline, Scopus, WoS) in SciVal (n = 47,806) and identified 2,315 records whose document-type classifications were inconsistent across databases (e.g., records classified as retracted articles in one database but as retraction notices, conference papers, or editorials in another). Because such inconsistencies could distort retraction rates by misaligning the numerator (retracted articles) and denominator (all articles), these 2,315 records were excluded to maintain consistent document-type definitions. Cross-referencing also removed 739 records labeled “retracted” in Scopus but classified as non-articles in WoS.
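The consolidation logic might be sketched as follows, assuming each source provides hypothetical doi, pmid, and doc_type columns; the matching keys and the consistency filter mirror the steps described above but are illustrative only:

```python
import pandas as pd

def merge_retractions(sources: dict) -> pd.DataFrame:
    """Consolidate retraction records from RWDB, Medline, Scopus, and WoS.

    Each frame is assumed to carry hypothetical columns 'doi', 'pmid',
    and 'doc_type'. Records are keyed on DOI, falling back to PMID, and
    any record whose document type disagrees across databases is dropped.
    """
    frames = []
    for name, df in sources.items():
        df = df.copy()
        df["source"] = name
        df["key"] = df["doi"].fillna(df["pmid"])  # prefer DOI, else PMID
        frames.append(df.dropna(subset=["key"]))
    merged = pd.concat(frames, ignore_index=True)
    # Keep only keys whose document type is identical in every source.
    consistent = merged.groupby("key")["doc_type"].transform("nunique") == 1
    return merged[consistent].drop_duplicates("key")
```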

The final dataset consisted of 44,752 retractions matched in SciVal between 1996 and July 2025, including 4,972 from 2023-2024. After removing 18 retraction notices identified through manual inspection, the 2025 RI² R-Rate was calculated using 4,954 retracted articles, of which 3,331 appear in RWDB (Fig. 1).

Fig. 1. Flowchart for identifying retracted articles.

When calculated for the 2025 edition of RI², using data extracted in late 2025, the formula is:

$$\text{R-Rate}_{i,2025} = 1000 \times \frac{A_i^{\text{retracted}}(2023\text{-}2024)}{A_i^{\text{total}}(2023\text{-}2024)}$$

where $A_i^{\text{retracted}}$ counts retracted articles per RWDB, Medline, Scopus, and WoS with matching SciVal records, and $A_i^{\text{total}}$ counts all articles published by the institution, including retracted ones, according to SciVal. The resulting R-Rates are then field-normalized (by RI² category) and winsorized at the 99th percentile before aggregation into the composite RI² score.

Self-Citation Rate (S-Rate): Measures the proportion of citations to an institution’s articles that originate from the same institution. Self-citation is not inherently problematic, for instance when sequential research builds on prior work, but unusually high institutional self-citation rates can artificially inflate citation-based indicators used in rankings, research assessment exercises, and faculty evaluation, and may indicate strategic citation practices aimed at boosting institutional metrics rather than reflecting genuine scholarly influence. The S-Rate is therefore not designed to treat every self-citation as a negative signal. Instead, it functions as a field-normalized, percentile-based risk indicator that identifies statistical outliers relative to peer institutions in the same discipline, flagging only institutions whose citation patterns deviate markedly from disciplinary norms rather than penalizing legitimate scholarly practice.

Self-citation data are obtained from InCites for articles published in the latest two full calendar years, to align with the other components of RI². Articles with more than 100 co-authors are excluded to reduce distortions caused by large-scale collaborations. InCites defines an institutional self-citation as a citation originating from any publication with at least one author affiliated with the same institution as the cited publication. When calculated for the 2025 edition of RI², using data extracted in November 2025, the formula for each institution is:

$$\text{S-Rate}_{i,2025} = 100 \times \frac{C_i^{\text{self}}(2023\text{-}2024)}{C_i^{\text{total}}(2023\text{-}2024)}$$

where $C_i^{\text{self}}$ counts citations to the institution’s articles that originate from the same institution, and $C_i^{\text{total}}$ counts all citations to the institution’s articles, according to InCites. The resulting S-Rates are then field-normalized (by RI² category) and winsorized at the 99th percentile before aggregation into the composite RI² score.
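A hedged sketch of this computation, assuming citation-level records with hypothetical columns for the cited institution, a self-citation flag, and the author count of the cited article:

```python
import pandas as pd

def s_rate(citations: pd.DataFrame) -> pd.Series:
    """Per-institution S-Rate from citation-level records (2023-2024).

    Assumes hypothetical columns:
      cited_institution -- affiliation of the cited article
      is_self           -- True if any citing author shares that affiliation
      n_authors         -- author count of the cited article
    """
    # Drop hyper-authored papers (>100 co-authors) to limit distortions
    # from large-scale collaborations, per the exclusion described above.
    df = citations[citations["n_authors"] <= 100]
    return 100 * df.groupby("cited_institution")["is_self"].mean()
```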

Exploratory Watchlist of Problematic Journals

In addition to the three core RI² components (D-Rate, R-Rate, and S-Rate), the author is piloting an exploratory watchlist of journals that raise integrity concerns but are not yet part of the official RI² score. These journals were identified based on:

  • Excessive field-normalized self-citation rates

  • Abnormally high retraction concentrations (relative to field norms)

  • Inclusion on the Chinese Early Warning List (EWL)

This exploratory list is provided as a transparency measure and as an early warning tool for institutions, researchers, and policymakers. It does not contribute to institutional RI² scores in the 2025 edition but may inform future methodological developments.

Reference Group and Global Benchmarking Rationale
Collectively, the three components serve as empirically measurable proxies for broader research integrity concerns that extend beyond isolated misconduct cases. These include paper mills (businesses that sell authorship), citation cartels (reciprocal citation networks used to inflate impact), citation farms (organizations or networks that generate or sell citations), fraudulent authorship practices, and other forms of metric gaming, as well as systemic and institutional-level responsibilities toward safeguarding research quality and ethics. Such responsibilities encompass ensuring transparent authorship attribution, enforcing rigorous editorial and peer-review standards, promoting responsible citation behavior, implementing due diligence in journal selection, and fostering institutional cultures that prioritize integrity over metric-driven incentives. RI² also indirectly reflects institutional weaknesses such as insufficient training in research ethics, inadequate awareness-raising programs, weak accountability structures, and the absence of effective monitoring and early-warning systems. By focusing on verifiable, outcome-based indicators, RI² captures both the symptomatic manifestations of compromised research practices and the underlying structural vulnerabilities that institutions must actively address (Abalkina, 2023; Maisonneuve, 2025; Candal-Pedreira et al., 2024; Feng et al., 2024; Ioannidis & Maniadis, 2024; Lancho Barrantes et al., 2023; Smagulov & Teixeira da Silva, 2025; Teixeira da Silva & Nazarovets, 2023; Wright, 2024).

To ensure consistency across diverse studies, whether analyzing 50 universities in one country or 2,000 globally, RI² applies a fixed reference group: the 1,000 most-published universities worldwide. This global baseline provides balanced disciplinary and geographic coverage, ensuring thresholds are not skewed by outliers. It functions analogously to clinical reference ranges: just as hypertension is diagnosed using globally standardized thresholds, RI² classifications rely on a universal benchmark to detect structural anomalies. Key advantages of this approach include:

  1. Universal comparability: RI² scores are always interpreted against the global baseline, not rescaled to local or sample-specific norms. This ensures consistency across geographic and temporal contexts.
  2. Stable thresholds: Risk tiers are fixed based on the empirical distribution of the reference group. For example, if the highest observed retraction rate is 3 per 1,000 articles, an institution with 1.5 retractions per 1,000 articles receives a normalized score of 0.5, regardless of sample size (see the sketch after this list).
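As a sketch of this fixed-bound scaling, the helper below reproduces the worked example in item 2 and applies the June 2025 retraction-rate range reported in the next subsection; it is illustrative, not the production code:

```python
def min_max(value: float, lo: float, hi: float) -> float:
    """Scale a raw component value to [0, 1] against fixed global bounds."""
    return max(0.0, min(1.0, (value - lo) / (hi - lo)))

# Worked example from item 2: against a hypothetical maximum of 3 retractions
# per 1,000 articles, an institution at 1.5 scores 0.5.
print(min_max(1.5, 0.0, 3.0))    # 0.5

# Against the fixed June 2025 bounds (0.00-26.82 retractions per 1,000):
print(min_max(5.0, 0.0, 26.82))  # ~0.186
```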

Normalization, Composite Scoring, and Tier Classification
Each indicator is scaled to a 0-1 range using Min-Max normalization relative to the global reference group. The composite RI² score is the simple average of the three:

$$\text{RI}^2_{i,2025} = \frac{D_i + R_i + S_i}{3}$$

For the June 2025 edition, the retraction rate ranged from 0.00 to 26.82 retractions per 1,000 articles, and the share of articles in delisted journals ranged from 0.00% to 15.35%. These values define the normalization scale and remain fixed across all samples, ensuring stable cross-institutional comparisons regardless of geographic or disciplinary representation. Institutions are then classified into one of five risk tiers based on their RI² score as follows:

| Tier | Percentile Range | Interpretation | Score Range (2-component, June 2025 ed.) | Score Range (3-component, field-normalized, August 2025 beta ed.) |
|------|------------------|----------------|------------------------------------------|--------------------------------------------------------------------|
| Red Flag | ≥ 95th | Extreme anomalies; systemic integrity risk | RI² ≥ 0.251 | RI² ≥ 0.531 |
| High Risk | ≥ 90th and < 95th | Significant deviation from global norms | 0.176 ≤ RI² < 0.251 | 0.396 ≤ RI² < 0.531 |
| Watch List | ≥ 75th and < 90th | Moderately elevated risk; emerging concerns | 0.099 ≤ RI² < 0.176 | 0.270 ≤ RI² < 0.396 |
| Normal Variation | ≥ 50th and < 75th | Within expected global variance | 0.049 ≤ RI² < 0.099 | 0.194 ≤ RI² < 0.270 |
| Low Risk | < 50th | Strong adherence to publishing integrity norms | RI² < 0.049 | RI² < 0.194 |
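The tier cut-offs in the table can be encoded directly; this minimal sketch uses the 3-component August 2025 beta thresholds and is illustrative only:

```python
def risk_tier(ri2: float) -> str:
    """Map a 3-component RI² score to its risk tier
    (August 2025 beta thresholds from the table above)."""
    if ri2 >= 0.531:
        return "Red Flag"
    if ri2 >= 0.396:
        return "High Risk"
    if ri2 >= 0.270:
        return "Watch List"
    if ri2 >= 0.194:
        return "Normal Variation"
    return "Low Risk"

print(risk_tier(0.42))  # "High Risk"
```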


To assess whether the RI² components capture distinct or overlapping dimensions of research risk, Pearson correlation coefficients between the two components of the June 2025 edition (D-Rate and R-Rate) were calculated across five segments of the 1,500 most-published universities. The correlation was r = 0.709 among the top 500 institutions, r = 0.600 for those ranked 501–1,000, r = 0.409 for those ranked 1,001–1,500, r = 0.635 for the top 1,000 universities, and r = 0.535 for the top 1,500 universities overall. These results indicate a moderate-to-strong association in the upper tiers, with the strength of the relationship diminishing among lower-output institutions. This gradient suggests that while the two indicators are related, they increasingly reflect distinct dimensions of research integrity risk as institutional publishing volume decreases, reinforcing the rationale for combining them in a composite index while maintaining their interpretive independence.
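A hedged sketch of how such segment-level correlations could be reproduced, assuming two component arrays sorted by institutional publication output (all names illustrative):

```python
import numpy as np

def segment_correlations(x, y):
    """Pearson r between two component arrays over the segments used above.

    Institutions are assumed to be sorted by publication output, so that
    positional slices correspond to output-based rank segments.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    segments = {
        "top 500":   slice(0, 500),
        "501-1000":  slice(500, 1000),
        "1001-1500": slice(1000, 1500),
        "top 1000":  slice(0, 1000),
        "top 1500":  slice(0, 1500),
    }
    return {name: float(np.corrcoef(x[s], y[s])[0, 1])
            for name, s in segments.items()}
```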

Equal weighting was deliberately adopted to ensure transparency, interpretability, and resistance to arbitrary weighting schemes, especially in the absence of robust empirical evidence favoring one indicator over the others. Future iterations of RI² may explore alternative weighting strategies informed by empirical risk severity, disciplinary norms, or differential predictive value. For now, the equal-weighted model offers a balanced approach to capturing integrity-related anomalies without overfitting to specific contexts or assumptions.

Key Features of the RI² Methodology

  1. Global Benchmarking: Ensures representative and statistically reliable classification.
  2. Fixed Thresholds for Consistency: Applicable across large or small datasets, preserving interpretability.
  3. Transparency and Resistance to Gaming: Built on normalized, verifiable metrics rather than subjective assessments.
  4. Strong correlation with other integrity-related indicators: With correlations of 0.96 to 0.99 against known red flags such as self-citation rate, RI² aligns closely with the broader literature-integrity landscape.

References

  • Abalkina, A. (2023). Publication and collaboration anomalies in academic papers originating from a paper mill: Evidence from a Russia-based paper mill. Learned Publishing, 36(4), 689-702. https://doi.org/10.1002/leap.1574
  • Candal-Pedreira, C., Guerra-Tort, C., Ruano-Ravina, A., Freijedo-Farinas, F., Rey-Brandariz, J., Ross, J. S., & Pérez-Ríos, M. (2024). Retracted papers originating from paper mills: a cross-sectional analysis of references and citations. Journal of Clinical Epidemiology, 172, Article 111397. https://doi.org/10.1016/j.jclinepi.2024.111397
  • Cortegiani, A., Ippolito, M., Ingoglia, G., Manca, A., Cugusi, L., Severin, A., Strinzel, M., Panzarella, V., Campisi, G., Manoj, L., Gregoretti, C., Einav, S., Moher, D., & Giarratano, A. (2020). Citations and metrics of journals discontinued from Scopus for publication concerns: The GhoS(t)copus Project. F1000Research, 9, Article 415. https://doi.org/10.12688/f1000research.23847.2
  • Fang, F. C., Steen, R. G., & Casadevall, A. (2012). Misconduct accounts for the majority of retracted scientific publications. Proceedings of the National Academy of Sciences of the United States of America, 109(42), 17028-17033. https://doi.org/10.1073/pnas.1212247109
  • Feng, S., Feng, L., Han, F., Zhang, Y., Ren, Y., Wang, L., & Yuan, J. (2024). Citation network analysis of retractions in molecular biology field. Scientometrics, 129(8), 4795-4817. https://doi.org/10.1007/s11192-024-05101-4
  • Ioannidis, J. P. A., & Maniadis, Z. (2024). Quantitative research assessment: using metrics against gamed metrics. Internal and Emergency Medicine, 19(1), 39-47. https://doi.org/10.1007/s11739-023-03447-w
  • Ioannidis, J. P. A., Pezzullo, A. M., Cristiano, A., Boccia, S., & Baas, J. (2025). Linking citation and retraction data reveals the demographics of scientific retractions among highly cited authors. PLoS Biology, 23(1), Article e3002999. https://doi.org/10.1371/journal.pbio.3002999
  • Lancho Barrantes, B. S., Dalton, S., & Andre, D. (2023). Bibliometrics methods in detecting citations to questionable journals. Journal of Academic Librarianship, 49(4), Article 102749. https://doi.org/10.1016/j.acalib.2023.102749
  • Maisonneuve, H. (2025). Predatory journals and paper mills jeopardise knowledge management. Bulletin du Cancer, 112(1), 100-110. https://doi.org/10.1016/j.bulcan.2024.12.002
  • Smagulov, K., & Teixeira da Silva, J. A. (2025). Scientific productivity and retracted literature of authors with Kazakhstani affiliations during 2013-2023. Journal of Academic Ethics. https://doi.org/10.1007/s10805-025-09624-0
  • Teixeira da Silva, J. A., & Nazarovets, S. (2023). Assessment of retracted papers, and their retraction notices, from a cancer journal associated with “paper mills”. Journal of Data and Information Science, 8(2), 118-125. https://doi.org/10.2478/jdis-2023-0009
  • Wright, D. E. (2024). Five problems plaguing publishing in the life sciences—and one common cause. FEBS Letters, 598(18), 2227-2239. https://doi.org/10.1002/1873-3468.15018
RI² is a composite metric intended to highlight potential research integrity risks using publicly available data. It does not assert misconduct or wrongdoing by any institution or individual. All data and rankings are subject to updates and refinements as new information becomes available.