The bibliometric analysis was conducted on a final dataset comprising 2,096 unique journal articles focused on the intersection of Artificial Intelligence (AI) and wastewater treatment, published up to the end of 2024. The data was retrieved from Scopus and Web of Science Core Collection databases, following the curation process detailed in the Methodology section and illustrated in Fig. 1. This section presents the key findings derived from this dataset.
A summary of the main bibliometric characteristics of the dataset is presented in Table 1. The analyzed literature spans 1987 to 2024, with 1987 marking the earliest identified work at this intersection of AI and wastewater treatment. The 2,096 articles were published across 616 different scientific sources (journals). The field has been highly dynamic, as evidenced by an annual growth rate of 17.14% over the period in which publications appeared consistently.
The research involves a substantial community, with 7,411 authors contributing to the publications. Collaboration appears to be a dominant feature of this research area. This is highlighted by the low number of single-authored documents (only 47) and a high average number of co-authors per document (5.64). Furthermore, international collaboration is prominent, with 25.1% of the documents involving co-authorship between researchers from different countries.
The dataset encompasses 5,970 unique author keywords (DE), reflecting the thematic diversity within the field. The average age of the documents in the dataset is relatively low at 4.01 years (calculated relative to the search date), suggesting that the literature is, on average, quite recent. Finally, the publications in this dataset have received an average of 17.63 citations per document, indicating a notable level of scientific impact and visibility within the academic community.
Table 1
Main Information about the Bibliometric Dataset
| Metric | Value |
|---|---|
| Timespan | 1987–2024 |
| Sources | 616 |
| Documents | 2096 |
| Annual Growth Rate | 17.14% |
| Authors | 7411 |
| Authors of Single-Authored Documents | 47 |
| International Co-Authorship | 25.1% |
| Co-Authors per Document | 5.64 |
| Author’s Keywords (DE) | 5970 |
| Document Average Age (years) | 4.01 |
| Average Citations per Doc | 17.63 |
3.1 Annual Scientific Production and Citation Trends
To understand the temporal evolution and impact of research in this field, the annual scientific production and corresponding citation counts were analyzed. Figure 2 presents the distribution of publications (blue bars, left axis) and the total citations received by those publications per year (red line, right axis) from 1987 to 2024.
The analysis reveals a distinct growth pattern. The field emerged sparsely, with the first publications appearing in 1987 (n = 2). Activity remained very low and sporadic for nearly two decades, with typically fewer than 10 articles published annually until 2008. A phase of slow but steady growth began around 2008, with publication numbers consistently reaching double digits.
Publication output accelerated notably from around 2016 (n = 33), and growth became particularly strong, near-exponential, from 2019 onward. Output more than doubled between 2020 (n = 98) and 2021 (n = 209), crossing the 200 mark for the first time, then grew to nearly 300 (n = 299) in 2022 and surpassed 400 (n = 416) in 2023. The year 2024 recorded the largest single-year increase in the dataset, reaching 697 publications. This recent acceleration indicates growing interest in applying AI methods to wastewater treatment challenges.
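The annual growth rate reported in Table 1 can be checked against the first-year and last-year counts given above. The sketch below assumes the rate is a compound annual growth rate between the 1987 count (n = 2) and the 2024 count (n = 697); the function name is illustrative and not taken from the analysis software.

```python
def annual_growth_rate(first_count: int, last_count: int,
                       first_year: int, last_year: int) -> float:
    """Compound annual growth rate, in percent, between two yearly counts."""
    periods = last_year - first_year  # number of yearly steps
    return ((last_count / first_count) ** (1 / periods) - 1) * 100


# Counts reported in the text: 2 publications in 1987, 697 in 2024.
rate = annual_growth_rate(2, 697, 1987, 2024)
print(f"{rate:.2f}%")  # prints 17.14%
```

Under this assumption the result agrees with the 17.14% reported in Table 1.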
The trend of annual citations generally mirrors the publication trend, albeit with an expected time lag inherent to the citation process. Citations remained low during the early years but started to climb significantly after 2017, corresponding to the increased publication volume and growing recognition of the field. The peak in annual citations occurred for papers published in 2021 (6,399 citations). The subsequent decrease in total citations for publications from 2022, 2023, and 2024 is a typical bibliometric artifact, as these more recent articles have had less time to accumulate citations compared to older ones. Nonetheless, the substantial number of citations accrued even by these recent papers highlights the ongoing impact and relevance of the research being published.
3.2 Most Relevant Sources
The dissemination channels for research on AI in wastewater treatment were identified by analyzing the journals that published the articles in the dataset. Table 2 lists the top 10 most productive journals based on the number of publications (TP), along with metrics reflecting their impact both within this specific dataset (TC, TC/TP, h-index) and more broadly (Impact Factor 2023, CiteScore 2023) and their primary subject categories.
Table 2
Top 10 Most Relevant Sources Publishing Research on AI in Wastewater Treatment (1987–2024)
| Rank | Journal | TP¹ | TC² | TC/TP³ | h-index (Dataset)⁴ | IF 2023⁵ | CS 2023⁶ | Categories/Positions (Exemplary)⁷ | ISSN | Initial Year |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | SCIENCE OF THE TOTAL ENVIRONMENT | 95 | 1743 | 18.35 | 23 | 8.2 | 17.6 | Env Sci, Env Eng (Q1/9/197); Env Sci, Pollut (Q1/9/167); Env Sci, Waste Mgt (Q1/10/134) | 0048-9697 | 1972 |
| 2 | JOURNAL OF ENVIRONMENTAL MANAGEMENT | 85 | 1530 | 18.00 | 22 | 8.0 | 13.7 | Env Sci, Mgt/Policy (Q1/17/399); Env Sci, Env Eng (Q1/13/197); Env Sci, Waste Mgt (Q1/15/134) | 0301-4797 | 1973 |
| 3 | JOURNAL OF WATER PROCESS ENGINEERING | 70 | 645 | 9.21 | 13 | 6.3 | 10.7 | Chem Eng, Process Tech (Q1/12/73); Env Sci, Waste Mgt (Q1/24/134); Biotech (Q1/42/311) | 2214-7144 | 2014 |
| 4 | WATER RESEARCH | 65 | 1455 | 22.38 | 23 | 11.5 | 20.8 | Env Sci, Water Sci/Tech (Q1/2/261); Env Sci, Env Eng (Q1/5/197); Env Sci, Waste Mgt (Q1/6/134) | 0043-1354 | 1967 |
| 5 | CHEMOSPHERE | 63 | 1451 | 23.03 | 23 | 8.1 | 15.8 | Env Sci, Env Chem (Q1/14/147); Env Sci, Env Eng (Q1/11/197); Env Sci, Pollut (Q1/12/167) | 0045-6535 | 1972 |
| 6 | WATER SCIENCE AND TECHNOLOGY | 59 | 795 | 13.47 | 15 | 2.5 | 4.9 | Env Sci, Water Sci/Tech (Q2/86/261); Env Sci, Env Eng (Q2/77/197) | 0273-1223 | 1969 |
| 7 | JOURNAL OF CLEANER PRODUCTION | 49 | 1683 | 34.35 | 25 | 9.8 | 20.4 | Env Sci, Gen Env Sci (Q1/5/233); Energy, Renew/Sustain/Env (Q1/17/270); Eng, Indust/Manuf (Q1/10/384) | 0959-6526 | 1993 |
| 8 | BIORESOURCE TECHNOLOGY | 43 | 812 | 18.88 | 15 | 9.7 | 20.8 | Env Sci, Env Eng (Q1/6/197); Env Sci, Waste Mgt (Q1/7/134); Energy, Renew/Sustain/Env (Q1/16/270) | 0960-8524 | 1991 |
| 9 | WATER | 38 | 514 | 13.53 | 12 | 3.0 | 5.8 | Env Sci, Water Sci/Tech (Q1/63/261); Agri/Bio Sci, Aquatic Sci (Q1/40/247) | 2073-4441 | 2009 |
| 10 | ENVIRONMENTAL SCIENCE AND POLLUTION RESEARCH | 31 | 411 | 13.26 | 13 | N/A | 8.7 | Env Sci, Pollut (Q1/31/167); Env Sci, Env Chem (Q1/33/147); Env Sci, Health/Tox/Mutagen (Q1/25/148) | 0944-1344 | 1994 |

¹ TP: Total Publications in dataset.
² TC: Total Citations from dataset.
³ TC/TP: Average Citations per Publication.
⁴ h-index (Dataset): h-index based on TP and TC within the dataset.
⁵ Journal Impact Factor for the year 2023. N/A indicates the value was not available from the source checked.
⁶ CiteScore for the year 2023.
⁷ Categories/Positions (Exemplary): Shows selected subject categories relevant to the study's topic, indicating journal quartile, rank within category, and total journals in category (Format: Qx/Rank/Total). "(Exemplary)" signifies that this is a selection, not necessarily an exhaustive list of all categories for the journal. Category Abbreviations: Agri/Bio Sci = Agricultural and Biological Sciences; Biotech = Biotechnology; Chem Eng = Chemical Engineering; Env Chem = Environmental Chemistry; Env Eng = Environmental Engineering; Env Sci = Environmental Science; Gen Env Sci = General Environmental Science; Geog/Plan/Dev = Geography, Planning and Development; Health/Tox/Mutagen = Health, Toxicology and Mutagenesis; Indust/Manuf = Industrial and Manufacturing Engineering; Mgt/Policy = Management, Monitoring, Policy and Law; Pollut = Pollution; Process Tech = Process Chemistry and Technology; Renew/Sustain/Env = Renewable Energy, Sustainability and the Environment; Waste Mgt = Waste Management and Disposal; Water Sci/Tech = Water Science and Technology.
Science of the Total Environment emerged as the leading journal, publishing the highest number of articles (TP = 95) in this field. It is closely followed by the Journal of Environmental Management (TP = 85) and the Journal of Water Process Engineering (TP = 70). These three journals collectively account for a substantial portion (250 articles, approx. 11.9%) of the total publications, indicating their central role in disseminating research at the intersection of AI and wastewater treatment.
Several other journals also serve as key outlets, including Water Research (TP = 65) and Chemosphere (TP = 63). While ranking slightly lower in publication volume within the top 10, some journals demonstrate high impact. Notably, the Journal of Cleaner Production exhibits the highest average citations per paper (TC/TP = 34.35) and the highest dataset-specific h-index (h = 25), suggesting its published articles in this area are particularly influential, despite having fewer publications (TP = 49) than the top-ranked journals. Water Research (TC/TP = 22.38, h = 23) and Chemosphere (TC/TP = 23.03, h = 23) also show high citation impact per paper specific to this topic. The high overall IF (11.5) and CS (20.8) for Water Research, and the high CS for Bioresource Technology (20.8) and Journal of Cleaner Production (20.4), further underscore their prominence.
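The two dataset-specific impact indicators used in Table 2 can be illustrated with a short sketch. The h-index below follows the standard definition (the largest h such that h papers have at least h citations each); the per-paper citation list is hypothetical, since only journal-level totals are reported here, while the TC and TP values for the Journal of Cleaner Production are taken from Table 2.

```python
def h_index(citations):
    """Largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank
        else:
            break
    return h


def citations_per_paper(total_citations, total_papers):
    """Average citations per publication (TC/TP)."""
    return total_citations / total_papers


# Hypothetical per-paper citation counts for one journal (illustration only).
toy_citations = [25, 12, 9, 4, 4, 1, 0]
print(h_index(toy_citations))  # prints 4

# TC and TP for the Journal of Cleaner Production, as listed in Table 2.
print(round(citations_per_paper(1683, 49), 2))  # prints 34.35
```

Because the h-index depends on how citations are distributed across papers, a journal with fewer publications can still reach a higher value than a more productive one.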
An analysis of the journal categories indicates that most of the research is published in journals covering Environmental Science, Environmental Engineering, Water Science and Technology, Waste Management, and Pollution. Science of the Total Environment, Journal of Environmental Management, Water Research, and Chemosphere are core outlets for environmental and water research. The Journal of Water Process Engineering reflects the role of chemical and process engineering, while the Journal of Cleaner Production and Bioresource Technology extend the scope to sustainability, resource management, and bioengineering. Water Science and Technology (primarily Q2) shows that relevant research also appears in reputable journals ranked below the Q1 titles that dominate the list. Most of the top 10 journals are ranked Q1 in their most relevant category, indicating that research on AI for wastewater treatment is frequently published in high-impact outlets in environmental science and engineering.
3.3 Geographic Distribution and Performance of Countries
To understand the global landscape of research on AI in wastewater treatment, the geographic distribution of publications and the performance of contributing countries were analyzed. The analysis considered total publication output, citation impact, and patterns of national and international collaboration.
3.3.1 Introduction & Overall Productivity
The overall geographic distribution of publications is visualized in Fig. 3, which maps the total number of publications (TP) attributed to authors' affiliations across different countries. Countries are colored based on publication volume according to the legend.
The map reveals a wide international interest in the field, with contributions originating from numerous countries across all continents. However, research productivity is highly concentrated in certain regions. Asia emerges as the most prolific continent, largely driven by China, which stands out with a significantly higher publication volume (TP = 2,072) than any other country. North America, represented primarily by the USA (TP = 686), is another major center of research. Significant contributions also originate from other Asian countries like India (TP = 530), Saudi Arabia (TP = 400), Iran (TP = 290), and South Korea (TP = 253), as well as several European nations like Spain (TP = 191) and the UK (TP = 180), and Australia (TP = 154) in Oceania. Many countries in Europe, South America, and Africa show moderate to low levels of activity, indicating varying degrees of engagement with this research topic globally.
For a deeper quantitative analysis of country performance, Table 3 lists the top 10 most productive countries ranked by Total Publications (TP). It also reports citation impact (Total Citations, TC; Average Citations per Publication, TC/TP) and collaboration indicators (corresponding-author papers, Corr. Auth.; Single Country Publications, SCP; Multiple Country Publications, MCP; and the MCP rate as a percentage). These aspects are analyzed in the following subsections.
Table 3
Performance and Collaboration Metrics for the Top 10 Countries Contributing to AI in Wastewater Treatment Research (1987–2024)
| Rank | Country | TP | TC | TC/TP | Corr. Auth.⁸ | SCP⁹ | MCP¹⁰ | MCP Rate (%)¹¹ |
|---|---|---|---|---|---|---|---|---|
| 1 | CHINA | 2072 | 9340 | 4.51 | 601 | 488 | 113 | 18.8% |
| 2 | USA | 686 | 4485 | 6.54 | 174 | 142 | 32 | 18.4% |
| 3 | INDIA | 530 | 1818 | 3.43 | 158 | 132 | 26 | 16.5% |
| 4 | SAUDI ARABIA | 400 | 933 | 2.33 | 70 | 30 | 40 | 57.1% |
| 5 | IRAN | 290 | 1910 | 6.59 | 97 | 65 | 32 | 33.0% |
| 6 | SOUTH KOREA | 253 | 1591 | 6.29 | 74 | 53 | 21 | 28.4% |
| 7 | SPAIN | 191 | 1999 | 10.47 | 81 | 69 | 12 | 14.8% |
| 8 | UK | 180 | 695 | 3.86 | 44 | 30 | 14 | 31.8% |
| 9 | CANADA | 176 | 719 | 4.09 | 56 | 42 | 14 | 25.0% |
| 10 | AUSTRALIA | 154 | 682 | 4.43 | 43 | 23 | 20 | 46.5% |

⁸ Number of publications where the corresponding author is affiliated with the country (measure of research leadership).
⁹ Single Country Publications (number of Corr. Auth. papers involving authors only from that country).
¹⁰ Multiple Country Publications (number of Corr. Auth. papers involving authors from at least one other country).
¹¹ Percentage of corresponding author papers involving international collaboration ((MCP / Corr. Auth.) * 100).
As indicated on the map and detailed in Table 3, China is the clear leader in publication volume (TP = 2,072), followed by the USA (TP = 686) and India (TP = 530). These top three countries alone account for a substantial share of the total research output in this domain.
3.3.2 Citation Impact
Examining the citation impact metrics in Table 3 provides further insights into the influence of contributions from different countries. In terms of Total Citations (TC), the ranking largely mirrors the productivity ranking for the very top positions. China leads significantly with 9,340 citations, followed by the USA with 4,485 citations. However, some variations emerge further down the list. Notably, Spain (ranked 7th in TP) achieves the 3rd highest total citations (TC = 1,999), surpassing India, Saudi Arabia, Iran, and South Korea despite having lower publication output. Similarly, Iran (5th in TP) also garnered substantial citations (TC = 1,910), placing it 4th by this metric.
A more nuanced view of citation impact, normalized by publication volume, is offered by the Average Citations per Publication (TC/TP). Here, Spain stands out prominently with the highest average impact (TC/TP = 10.47), suggesting that its publications in this field are highly cited on average. Iran (TC/TP = 6.59), the USA (TC/TP = 6.54), and South Korea (TC/TP = 6.29) also demonstrate strong citation impact relative to their output, achieving notably higher average citations per paper compared to several other leading nations in this list. Conversely, countries like India (TC/TP = 3.43) and Saudi Arabia (TC/TP = 2.33), despite their high productivity (ranked 3rd and 4th in TP respectively), show lower average citation impact per paper compared to other leading nations within this top 10 list. This indicates that while contributing significantly in volume, the average citation influence of their papers in this specific dataset is comparatively lower than that of publications from countries like Spain or Iran. China's TC/TP (4.51) is moderate relative to the other top performers in this group, suggesting its leading position in total citations is primarily driven by its vast publication volume.
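The contrast between the two citation rankings can be reproduced directly from the TP and TC columns of Table 3. The sketch below is only a recomputation of the published values; the variable names are illustrative.

```python
# (TP, TC) per country, copied from Table 3.
table3 = {
    "China": (2072, 9340),
    "USA": (686, 4485),
    "India": (530, 1818),
    "Saudi Arabia": (400, 933),
    "Iran": (290, 1910),
    "South Korea": (253, 1591),
    "Spain": (191, 1999),
    "UK": (180, 695),
    "Canada": (176, 719),
    "Australia": (154, 682),
}

# Average citations per publication (TC/TP).
tc_per_tp = {country: tc / tp for country, (tp, tc) in table3.items()}

# Rank once by total citations and once by average citations.
by_total = sorted(table3, key=lambda c: table3[c][1], reverse=True)
by_average = sorted(tc_per_tp, key=tc_per_tp.get, reverse=True)

print(by_total[:4])    # ['China', 'USA', 'Spain', 'Iran']
print(by_average[:4])  # ['Spain', 'Iran', 'USA', 'South Korea']
for country in by_average:
    print(f"{country:<13} {tc_per_tp[country]:.2f}")
```

Ranking by TC rewards volume, whereas ranking by TC/TP moves Spain to first place and China to the middle of the group.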
3.3.3 Leadership & Collaboration Intensity
Beyond overall productivity and citation impact, the analysis explored patterns of research leadership and international collaboration. Figure 4 presents the distribution of publications based on the country affiliation of the corresponding author, differentiating between Single Country Publications (SCP), involving authors only from the corresponding author's country, and Multiple Country Publications (MCP), which include authors from at least one other country. Table 3 provides the specific counts (Corr. Auth., SCP, MCP) and the calculated MCP Rate (%), indicating the proportion of internationally collaborative papers led by authors from that country.
The ranking based on corresponding authorship (Table 3, Corr. Auth. column) generally aligns with the overall productivity ranking, with China (601 papers), the USA (174 papers), and India (158 papers) leading in the number of publications where they hold the corresponding author role. This suggests these countries are not only high producers but also frequently lead the research projects reported.
Figure 4 and the MCP Rate (%) in Table 3 reveal diverse patterns of international collaboration leadership among the top countries. Some nations demonstrate a high propensity for leading internationally co-authored studies. Saudi Arabia exhibits the highest rate, with 57.1% of its corresponding-authored papers involving international partners (MCP). Australia (46.5%), Iran (33.0%), and the UK (31.8%) also show substantial rates of leading collaborative international publications. This indicates a strong outward-looking collaborative strategy or integration into international networks for researchers in these countries when they are leading projects in this field.
Conversely, several other highly productive countries show a stronger tendency towards domestic leadership or collaboration. Spain has the lowest MCP rate in the top 10 (14.8%), suggesting that, although its work is impactful (as seen from its high TC/TP), its research leadership in this field lies predominantly in nationally contained projects. Similarly, major players such as China (18.8%), the USA (18.4%), and India (16.5%) have considerably lower MCP rates than Saudi Arabia or Australia, indicating that a large proportion of the research led from these nations primarily involves domestic collaborators.
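The SCP/MCP classification and the MCP rate can be sketched as follows, assuming each paper is reduced to the corresponding author's country plus the list of all authors' countries. The record layout, field names, and example records are hypothetical; the final two checks use the SCP and MCP counts from Table 3.

```python
from collections import defaultdict


def collaboration_counts(records):
    """Count SCP and MCP papers per corresponding-author country."""
    counts = defaultdict(lambda: {"SCP": 0, "MCP": 0})
    for rec in records:
        lead = rec["corresponding_country"]
        # MCP if at least one author is affiliated with another country.
        is_mcp = any(c != lead for c in rec["author_countries"])
        counts[lead]["MCP" if is_mcp else "SCP"] += 1
    return dict(counts)


def mcp_rate(scp, mcp):
    """Share of corresponding-author papers with international co-authors (%)."""
    return 100 * mcp / (scp + mcp)


# Hypothetical records, for illustration only.
records = [
    {"corresponding_country": "Saudi Arabia",
     "author_countries": ["Saudi Arabia", "Malaysia"]},
    {"corresponding_country": "Saudi Arabia",
     "author_countries": ["Saudi Arabia"]},
    {"corresponding_country": "Spain",
     "author_countries": ["Spain", "Spain"]},
]
counts = collaboration_counts(records)
print(counts)

# Values from Table 3: Saudi Arabia (SCP 30, MCP 40), Spain (SCP 69, MCP 12).
print(round(mcp_rate(30, 40), 1))  # prints 57.1
print(round(mcp_rate(69, 12), 1))  # prints 14.8
```

Note that the rate is defined relative to corresponding-author papers only, so it describes collaboration in projects a country leads rather than all papers it contributes to.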
3.3.4 Collaboration Network Structure
To further elucidate the structure of international cooperation, a co-authorship network map based on country collaborations was generated, as shown in Fig. 5. In this network, nodes represent countries, and the links between them indicate co-authorship on publications within the dataset. The size of the nodes corresponds to the number of publications or collaborative links, while the thickness of the links reflects the frequency of collaboration between two countries. Colors are used to delineate clusters of countries that collaborate more frequently with each other than with countries outside their cluster. Clusters were identified using the Walktrap algorithm.
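A minimal sketch of how the link weights in such a network can be derived, assuming each paper is reduced to the list of its authors' countries. The paper lists and the function name are hypothetical. Community detection (for example, Walktrap) would then be run on the resulting weighted graph with a graph library; that step is not shown here.

```python
from collections import Counter
from itertools import combinations


def collaboration_edges(papers):
    """Count, for each pair of countries, the papers they co-authored."""
    edges = Counter()
    for countries in papers:
        # Each distinct country pair on a paper contributes one co-authorship.
        for a, b in combinations(sorted(set(countries)), 2):
            edges[(a, b)] += 1
    return edges


# Hypothetical per-paper country lists, for illustration only.
papers = [
    ["China", "USA"],
    ["China", "USA", "Canada"],
    ["Saudi Arabia", "Malaysia"],
    ["Spain"],  # single-country paper: adds no link
]
edges = collaboration_edges(papers)
for (a, b), weight in edges.most_common():
    print(f"{a} - {b}: {weight}")
```

The edge weight corresponds to the link thickness in the map, and single-country papers contribute to node size but not to links.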
The collaboration network visually reveals a distinct core-periphery structure. China appears prominently positioned near the center of the map with the largest node size and numerous connections, suggesting a central role in the network. The USA also occupies a visually central position with a large node size and many links. Saudi Arabia is noticeable for its relatively large node size and strong connections, particularly bridging towards a distinct cluster of countries. Other countries like the UK, Australia, Canada, and India are also visibly well-connected within the main network structure.
The visualization highlights several distinct collaborative clusters based on proximity and link color:
- The largest cluster (Red) appears centered around China and the USA, encompassing major contributors from North America (Canada), Europe (UK, Germany, Sweden, Netherlands), Asia (India, Korea, Japan, Vietnam, Singapore), and Africa (South Africa, Nigeria). This visually represents the main global research nexus.
- A prominent secondary cluster (Green) visibly connects Saudi Arabia with Malaysia, Egypt, Pakistan, Iraq, UAE, Algeria, Indonesia, and Russia, suggesting strong regional collaborations.
- A Blue cluster links European nations like Spain, Portugal, and Austria.
- An Orange cluster connects Italy, Poland, and Brazil.
- Other smaller groups and more isolated nodes (e.g., Israel, Morocco) indicate countries with fewer observed collaborations within this network visualization.
Visually strong collaboration pathways (indicated by thicker links) appear between China and the USA, China and Saudi Arabia, Saudi Arabia and Malaysia, and within European and North American subgroups. Overall, the visual structure of the network underscores the prominent role of a few key countries, particularly China and the USA, in international collaboration, alongside evidence of distinct regional collaborative groups.
3.3.5 Temporal Dynamics
To understand the historical context leading to the current landscape, the evolution of publication output over time for the five most productive countries is presented in Fig. 6.
Figure 6 highlights distinct development trajectories among the leading nations. The USA was among the earliest and most consistent contributors during the initial decades of research in this field. In contrast, China's involvement began later but experienced near-exponential growth, particularly visible from approximately 2018 onwards. This rapid acceleration resulted in China surpassing all other nations to become the most prolific publisher in recent years. Significant growth in publication output, especially since 2017–2018, is also evident for other leading Asian countries shown, namely India, Iran, and Saudi Arabia, highlighting a recent intensification of research activity in these nations alongside the established players.
3.4 Performance of Research Institutions
To identify the key institutional players and their collaborative structures in the field of AI for wastewater treatment, an analysis of institutional productivity and co-authorship networks was conducted.
3.4.1 Leading Research Institutions
The productivity of research institutions was quantified by their total number of publications (TP) within the collected dataset. Table 4 lists the top 10 most productive institutions in this field.
Table 4
Top 10 Most Productive Research Institutions in AI for Wastewater Treatment Research (1987–2024)
| Rank | Institution | TP | Country |
|---|---|---|---|
| 1 | King Fahd University of Petroleum and Minerals | 73 | Saudi Arabia |
| 2 | Tsinghua University | 51 | China |
| 3 | Tongji University | 48 | China |
| 4 | King Khalid University | 43 | Saudi Arabia |
| 5 | Duy Tan University | 40 | Vietnam |
| 6 | Central South University | 38 | China |
| 7 | University of Tehran | 38 | Iran |
| 8 | Prince Sattam Bin Abdulaziz University | 36 | Saudi Arabia |
| 9 | South China University of Technology | 34 | China |
| 10 | King Abdulaziz University | 33 | Saudi Arabia |
As detailed in Table 4, King Fahd University of Petroleum and Minerals (Saudi Arabia) stands out as the most prolific institution, contributing 73 publications. Chinese universities also demonstrate significant output, with Tsinghua University (TP = 51) and Tongji University (TP = 48) ranking second and third, respectively. The strong presence of Saudi Arabian institutions is further evidenced by King Khalid University (TP = 43), Prince Sattam Bin Abdulaziz University (TP = 36), and King Abdulaziz University (TP = 33) all featuring in the top 10. This underscores a considerable concentration of research activity in this field within both Saudi Arabia and China, each nation being home to four of the top 10 institutions.
Duy Tan University from Vietnam holds the fifth position with 40 publications, highlighting its notable research involvement. The University of Tehran (Iran), with 38 publications, also shows substantial productivity. The overall distribution indicates that a select group of institutions leads research output, significantly shaping the knowledge landscape in this domain.
3.4.2 Institutional Collaboration Network
To explore the collaborative ties between institutions, a co-authorship network was constructed using Bibliometrix, with network parameters detailed as follows: the visualization displays the top 50 institutions, applying an automatic layout, association normalization for link strength, and the Walktrap algorithm for community detection (clustering). Isolated nodes were removed, and a minimum of one edge was required for inclusion. The resulting network is presented in Fig. 7.
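The effect of association normalization can be illustrated with a short sketch. It assumes the commonly used association strength definition, in which the raw co-authorship count between two institutions is divided by the product of their publication counts; whether the software applies exactly this form is an assumption, and the institution labels and counts below are hypothetical.

```python
def association_strength(cooccurrence, occurrences):
    """Normalize raw link counts: c_ij / (s_i * s_j)."""
    return {
        (a, b): weight / (occurrences[a] * occurrences[b])
        for (a, b), weight in cooccurrence.items()
    }


# Hypothetical publication counts and joint papers, for illustration only.
occurrences = {"Inst A": 20, "Inst B": 10, "Inst C": 4}
cooccurrence = {("Inst A", "Inst B"): 6, ("Inst A", "Inst C"): 2}

normalized = association_strength(cooccurrence, occurrences)
for pair, value in normalized.items():
    print(pair, round(value, 4))
```

In this toy case the A-C link has one third of the raw weight of the A-B link but a comparable normalized strength (0.025 versus 0.03), because Inst C has far fewer publications. Normalization therefore keeps highly productive institutions from dominating the link structure purely through volume.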
The institutional collaboration network (Fig. 7) reveals distinct patterns of research partnerships. Several prominent clusters are evident, often centered on highly productive institutions. For instance, a significant purple-colored cluster features King Khalid University (Saudi Arabia) as a central node, indicating strong collaborative links with other Saudi Arabian institutions and select international partners. Similarly, an orange-colored cluster highlights the dense collaborations among Chinese institutions, with Tsinghua University and the University of Chinese Academy of Sciences appearing as key players; Duy Tan University (Vietnam) is notably integrated into this cluster, suggesting its research output is significantly bolstered by these international partnerships.
King Fahd University of Petroleum and Minerals (Saudi Arabia), the leading institution by publication volume, anchors another distinct (green-colored) collaboration group. Overall, the network indicates that leading research institutions also serve as hubs within broader collaborative networks that may be national, international, or both. Furthermore, the prominence of Tsinghua University, King Khalid University, and King Fahd University of Petroleum and Minerals within the network matches their high publication volume, again suggesting that high research productivity typically goes hand in hand with extensive collaborative engagement.
3.5 Most Relevant Authors
Following the analysis of journals, countries, and institutions, this section focuses on the individual researchers who are key contributors to the field of AI in wastewater treatment. The analysis identifies the most productive and impactful authors, examines their publication trajectory over time, and maps their collaborative networks.
3.5.1 Leading Authors by Productivity and Impact
Author performance was measured by total publications (TP), total citations (TC) within the dataset, average citations per publication (TC/TP), and h-index. Table 5 presents the top 10 authors ranked by TP, together with these indicators.
Table 5
Top 10 Most Relevant Authors in AI for Wastewater Treatment Research (1987–2024)
| Rank | Author | Institution | TP | TC | TC/TP | h-index |
|---|---|---|---|---|---|---|
| 1 | WANG Y | HUANG JINGANG | 51 | 736 | 14.43 | 15 |
| 2 | LI J | YANG JIAQIAN | 40 | 1419 | 35.48 | 17 |
| 3 | WANG J | WANG JINGTING | 39 | 258 | 6.62 | 8 |
| 4 | LI Y | FU SONGZHE | 38 | 723 | 19.03 | 14 |
| 5 | ZHANG Y | WANG JINGTING | 38 | 646 | 17.00 | 15 |
| 6 | LIU Y | MA SHUAIYIN | 37 | 696 | 18.81 | 12 |
| 7 | WANG X | ZHANG KAI | 36 | 1085 | 30.14 | 14 |
| 8 | WANG Z | YUAN SHIDENG | 35 | 473 | 13.51 | 12 |
| 9 | LI X | LI XINGYANG | 34 | 672 | 19.76 | 13 |
| 10 | WANG H | HUANG JINGANG | 31 | 414 | 13.35 | 12 |
As presented in Table 5, Wang Y emerges as the most productive author with 51 publications, followed by Li J with 40 and Wang J with 39. While productivity is a key indicator, citation metrics reveal further nuances regarding impact. Notably, Li J, despite being second in TP, exhibits the highest total citations (TC = 1,419), the highest average citations per publication (TC/TP = 35.48), and the highest h-index (h = 17) among the top 10 authors, underscoring the significant influence of this author's work. Similarly, Wang X demonstrates considerable impact, with a high TC/TP of 30.14 and an h-index of 14. Conversely, some highly productive authors have a more moderate citation impact per paper (for example, Wang J, with TC/TP = 6.62), illustrating the diverse contribution profiles within the leading research cohort. The prevalence of common surnames such as Wang, Li, and Zhang among the top authors also points to a name disambiguation issue: an abbreviated entry such as Wang Y may aggregate the output of several distinct researchers, so author-level counts should be interpreted with caution.
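The name disambiguation concern can be made concrete with a small sketch. It assumes author names are collapsed to a surname plus given-name initials, as in the labels used in Table 5; the full names below are hypothetical and do not refer to authors in the dataset.

```python
from collections import defaultdict


def short_key(full_name):
    """Collapse 'Surname, Given Names' to 'SURNAME INITIALS'."""
    surname, given = [part.strip() for part in full_name.split(",", 1)]
    initials = "".join(word[0] for word in given.replace("-", " ").split())
    return f"{surname.upper()} {initials.upper()}"


# Hypothetical full names, for illustration only.
names = ["Wang, Yan", "Wang, Yong", "Li, Jing", "Li, Jun"]

groups = defaultdict(set)
for name in names:
    groups[short_key(name)].add(name)

for key, members in groups.items():
    print(key, sorted(members))
```

Four distinct names collapse into two keys here, which is why counts attached to abbreviated labels can overstate an individual's output unless author identifiers or affiliations are used to separate them.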
3.5.2 Temporal Production Trends of Top Authors
To understand the research life cycle and publication dynamics of top authors, their yearly scientific output is shown in Fig. 8. The bubble sizes indicate the number of articles the author published in that year, while the color brightness of the bubbles represents the total citations received (TC per Year) by the articles published by that author in that year.
Figure 8 reveals diverse publication trajectories among the top authors. For instance, Li X appears as an early contributor to the field, with publications dating back to the mid-1990s, followed by more consistent output in recent years. However, the majority of the top 10 authors, including Wang Y, Li J, and Wang J, began publishing significantly in this domain more recently, typically from around 2014–2018 onwards, yet have achieved high productivity in a relatively short period. This aligns with the overall accelerated growth of the research field itself. The sustained or increasing publication output from these authors in recent years highlights their ongoing active engagement.
3.5.3 Author Collaboration Network
The collaborative relationships among authors were analyzed to map the social structure of research in this field. Figure 9 presents the co-authorship network, displaying the top 50 authors based on their collaborative activities. The network was generated using Bibliometrix, employing an automatic layout, association normalization for link strength, and the Walktrap algorithm for community detection. Node size is proportional to the author's total publications (TP), and link thickness represents the frequency of co-authorship.
The co-authorship network (Fig. 9) shows a large, densely connected main component alongside a few smaller, detached groups. Many of the top-ranking authors from Table 5, such as Wang Y and Li J, are prominently positioned within this main component, often appearing as central nodes within distinct colored clusters. This indicates that these highly productive individuals are also deeply embedded in collaborative research groups. For example, the red cluster appears to connect authors like Li J and Wang X, while the blue cluster is centered around Wang Y and Zhang Y. The presence of several distinct clusters within the main component suggests the existence of multiple active research teams or communities. The smaller, isolated groups (e.g., Comas J and Poch M; Cho K) represent authors or small teams working with fewer connections to the broader network shown. Overall, the network underscores a high degree of collaboration among the leading researchers in the field of AI for wastewater treatment.
3.6 Most Cited Documents
To identify the most impactful research contributions at the intersection of AI and wastewater treatment, the top 10 most globally cited documents within the dataset were analyzed. These publications often represent significant advancements, foundational reviews, or widely adopted methodologies that have substantially influenced the direction and development of the field. Table 6 lists these highly cited documents along with their authors, publication year, source journal, and total citation count (TC) within the current dataset.
Table 6
Top 10 Most Cited Documents on AI in Wastewater Treatment Research (1987–2024)
| Rank | Title | Authors | Year | Source | TC |
|---|---|---|---|---|---|
| 1 | STATE OF THE ART FOR GENETIC ALGORITHMS AND BEYOND IN WATER RESOURCES PLANNING AND MANAGEMENT | NICKLOW J, REED P, SAVIC D, DESSALEGNE T, HARRELL L, CHAN-HILTON A, KARAMOUZ M, MINSKER B, OSTFELD A, SINGH A, ZECHMAN E | 2010 | JOURNAL OF WATER RESOURCES PLANNING AND MANAGEMENT | 532 |
| 2 | DEEPARG: A DEEP LEARNING APPROACH FOR PREDICTING ANTIBIOTIC RESISTANCE GENES FROM METAGENOMIC DATA | ARANGO-ARGOTY G, GARNER E, PRUDEN A, HEATH L, VIKESLAND P, ZHANG L | 2018 | MICROBIOME | 457 |
| 3 | ACTIVATED SLUDGE WASTEWATER TREATMENT PLANT MODELLING AND SIMULATION: STATE OF THE ART | GERNAEY K, VAN L M, HENZE M, LIND M, JORGENSEN S | 2004 | ENVIRONMENTAL MODELLING & SOFTWARE | 394 |
| 4 | BACTERIAL COMMUNITY STRUCTURES ARE UNIQUE AND RESILIENT IN FULL-SCALE BIOENERGY SYSTEMS | WERNER J, KNIGHTS D, GARCIA M, SCALFONE N, SMITH S, YARASHESKI K, CUMMINGS T, BEERS A, KNIGHT R, ANGENENT L | 2011 | PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA | 375 |
| 5 | THE APPLICATION OF MACHINE LEARNING METHODS FOR PREDICTION OF METAL SORPTION ONTO BIOCHARS | ZHU X, WANG X, OK Y | 2019 | JOURNAL OF HAZARDOUS MATERIALS | 244 |
| 6 | AN INSIGHT INTO MACHINE LEARNING MODELS ERA IN SIMULATING SOIL, WATER BODIES AND ADSORPTION HEAVY METALS: REVIEW, CHALLENGES AND SOLUTIONS | YASEEN Z | 2021 | CHEMOSPHERE | 239 |
| 7 | AUTOMATED DETECTION OF SEWER PIPE DEFECTS IN CLOSED-CIRCUIT TELEVISION IMAGES USING DEEP LEARNING TECHNIQUES | CHENG J, WANG M | 2018 | AUTOMATION IN CONSTRUCTION | 232 |
| 8 | A DEEP LEARNING CNN ARCHITECTURE APPLIED IN SMART NEAR-INFRARED ANALYSIS OF WATER POLLUTION FOR AGRICULTURAL IRRIGATION RESOURCES | CHEN H, CHEN A, XU L, XIE H, QIAO H, LIN Q, CAI K | 2020 | AGRICULTURAL WATER MANAGEMENT | 228 |
| 9 | APPLICATION OF THE ANALYTIC HIERARCHY PROCESS AND THE ANALYTIC NETWORK PROCESS FOR THE ASSESSMENT OF DIFFERENT WASTEWATER TREATMENT SYSTEMS | BOTTERO M, COMINO E, RIGGIO V | 2011 | ENVIRONMENTAL MODELLING & SOFTWARE | 199 |
| 10 | FUEL PROPERTIES OF HYDROCHAR AND PYROCHAR: PREDICTION AND EXPLORATION WITH MACHINE LEARNING | LI J, PAN L, SUVARNA M, TONG Y, WANG X | 2020 | APPLIED ENERGY | 195 |
The analysis of the most cited documents, as presented in Table 6, reveals a diverse set of influential works. The leading publication by Nicklow et al. (2010) [31], titled "State of the art for genetic algorithms and beyond in water resources planning and management," has garnered 532 citations, highlighting the enduring relevance of optimization techniques, including early AI approaches like genetic algorithms, in the broader water resources sector. Another highly cited earlier work is by Gernaey et al. (2004) [32], on activated sludge wastewater treatment plant modeling, which serves as a foundational piece for subsequent AI applications in process control and optimization.
Six of the ten top-cited articles were published from 2018 onwards, reflecting the rapid advancement of, and growing interest in, contemporary AI techniques. For instance, Arango-Argoty et al. (2018) [33] on deep learning for predicting antibiotic resistance genes (TC = 457), Yaseen (2021) [34] reviewing machine learning models for simulating heavy metals in soil, water bodies, and adsorption (TC = 239), and Cheng & Wang (2018) on deep learning for automated sewer pipe defect detection (TC = 232) showcase the impact of sophisticated AI models on specific wastewater-related challenges. Machine learning applications such as Li et al. (2020) [25], on predicting the fuel properties of hydrochar and pyrochar (TC = 195), also feature prominently, as does the multi-criteria decision analysis of Bottero et al. (2011) [35], which applied the analytic hierarchy and analytic network processes to assess wastewater treatment systems (TC = 199).
The thematic scope of these highly cited papers is broad, encompassing AI applications in water resources planning, prediction of emerging contaminants like antibiotic resistance genes, wastewater process modeling, analysis of microbial communities (Werner et al., 2011) [36], contaminant removal, infrastructure assessment, and resource recovery from waste. This diversity is also mirrored in the publication outlets, with influential articles appearing in specialized journals across water resources management (Journal of Water Resources Planning and Management), microbiology (Microbiome), environmental modeling (Environmental Modelling & Software), general high-impact science (PNAS), hazardous materials (Journal of Hazardous Materials), environmental chemistry (Chemosphere), automation (Automation in Construction), agricultural water management (Agricultural Water Management), and energy (Applied Energy).
3.7 Analysis of Keywords
A systematic examination of author keywords provides an overview of the research domain's central themes and intellectual foundation, revealing its leading research themes and how they have changed over time.
3.7.1 Most Frequent Keywords
A preliminary evaluation identified the most frequently used author keywords; the results are shown in Fig. 10. "Machine learning" was the leading term, used 562 times, confirming its status as the dominant methodology in this research area. The core subject "wastewater treatment" followed with 185 occurrences, and the generic concept "artificial intelligence" with 183. Other frequently used keywords include advanced methods such as "deep learning" (144) and "artificial neural network" (87), as well as key operational terms such as "optimization" (84) and "prediction" (64).
To illustrate the relative weight and hierarchical structure of these topics, a treemap was generated from the co-occurrence frequencies of keyword bigrams extracted from the abstracts of the research articles (Fig. 11). The treemap highlights the prevalence of "machine learning," which occupies the largest area, accounting for 21% of the total. Next in rank are the application area "wastewater treatment" (7%) and the method "deep learning" (5%). The treemap also distinguishes the major sub-topics within each broader category; for example, the "wastewater" category contains particular applications and media such as "adsorption" and "water quality," while the AI category contains methodologies such as "artificial neural network" and "prediction," revealing the hierarchy of research emphases in the area.
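The frequency counts and treemap shares reported above can be approximated with a short Python sketch. It assumes author keywords are stored as one semicolon-separated string per record (as in a typical DE export field) and that each treemap percentage is a term's count divided by the summed counts of the displayed top-n terms; both are assumptions made for illustration. The sketch counts whole author keywords rather than abstract bigrams, and the records are hypothetical toy data.

```python
from collections import Counter


def keyword_frequencies(records, sep=";"):
    """Count how many records mention each keyword (case-insensitive)."""
    counts = Counter()
    for field in records:
        # A set ensures a keyword repeated inside one record is counted once.
        terms = {t.strip().lower() for t in field.split(sep) if t.strip()}
        counts.update(terms)
    return counts


def top_shares(counts, n):
    """Return (term, count, percent of the top-n total) for the n most frequent terms."""
    top = counts.most_common(n)
    total = sum(c for _, c in top)
    return [(term, c, round(100 * c / total, 1)) for term, c in top]


# Hypothetical toy keyword fields (not taken from the dataset).
toy_records = [
    "Machine Learning; Wastewater Treatment",
    "machine learning; deep learning",
    "Machine learning; optimization; wastewater treatment",
    "artificial intelligence; machine learning",
]
freq = keyword_frequencies(toy_records)
shares = top_shares(freq, 3)
```

Because the share is computed against the displayed terms only, the percentages change with the choice of n, which is worth keeping in mind when comparing treemaps built with different cut-offs.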
3.7.2 Keyword Co-occurrence Network
To examine the relationships and intellectual associations between different concepts, a keyword co-occurrence network was created, as shown in Fig. 12. Keywords are represented as nodes in this network, and the edges between them indicate their simultaneous occurrences in the literature, revealing the structural basis of the research. Network analysis produced three distinct, strongly connected clusters.
The first cluster, shown in green, centers on "machine learning" and its associated algorithms and models, such as "support vector," "random forest," and "decision tree," and is therefore methodological in nature. The second cluster, shown in red, centers on "wastewater treatment" and related concepts such as "activated sludge," "water quality," and "oxygen demand," and represents the application domain. The third cluster, shown in blue, comprises "neural network" and "artificial intelligence," which act as connectors between the methodology-centered and application-centered clusters. The high number of links between the central nodes of the three clusters ("machine learning," "wastewater treatment," and "neural network") reveals the fundamentally interdisciplinary nature of this research area.
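How such a network is assembled can be sketched in Python under simple assumptions: each document contributes one link for every pair of keywords it contains, weak links are dropped below a threshold, and a node's weighted degree serves as a rough indicator of how central it is. The `min_weight` parameter and the toy documents are hypothetical, and the clustering step that produces the colored groups is not reproduced.

```python
from collections import Counter, defaultdict
from itertools import combinations


def cooccurrence_edges(keyword_lists, min_weight=1):
    """Count keyword pairs appearing in the same document; keep pairs at or above min_weight."""
    weights = Counter()
    for keywords in keyword_lists:
        weights.update(combinations(sorted(set(keywords)), 2))
    return {pair: w for pair, w in weights.items() if w >= min_weight}


def weighted_degree(edges):
    """Sum the weights of all retained links attached to each keyword."""
    degree = defaultdict(int)
    for (a, b), w in edges.items():
        degree[a] += w
        degree[b] += w
    return dict(degree)


# Hypothetical toy documents: one keyword list per document.
toy_docs = [
    ["machine learning", "random forest", "water quality"],
    ["machine learning", "neural network", "activated sludge"],
    ["machine learning", "random forest"],
    ["neural network", "activated sludge", "water quality"],
]
edges = cooccurrence_edges(toy_docs, min_weight=2)
hubs = weighted_degree(edges)
```

With the threshold set to 2, only the two keyword pairs that recur across documents survive, which illustrates how thresholding separates stable associations from incidental ones.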
3.8 Thematic Map
Thematic mapping was employed to position research themes along two dimensions: centrality (relevance to the whole field) and density (how developed the theme is within the field), as shown in Fig. 13. The four quadrants of the thematic map divide the research landscape and provide an overview of the status and role of each theme.
- Motor Themes (Upper-Right Quadrant): These themes are both highly developed and important to the research field, such as "neural network", "artificial intelligence", and "support vector". They represent established themes that drive the advancement of the discipline.
- Basic Themes (Lower-Right Quadrant): These are core themes that are important to the field but still have considerable room for development. "Machine learning", "wastewater treatment", and "treatment plant" are located here, combining high centrality with scope for future research.
- Niche Themes (Upper-Left Quadrant): These are highly developed but specialized themes with less influence on the field as a whole. Themes such as "artificial neural network ann" and "removal efficiency" appear here, marking them as mature subjects of specific research streams.
- Emerging or Declining Themes (Lower-Left Quadrant): Themes in this quadrant have both low development and low centrality. Themes such as "random forest", "gradient boosting", and "decision support system" are located here. They may represent emerging niche methods that have not yet become central, or older paradigms that are being absorbed into the more comprehensive "machine learning" paradigm.
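The quadrant logic described in the list above can be expressed as a small Python sketch. It assumes each theme already has a centrality and a density score and that the axes cross at the median of each score; the cut point is an assumption for illustration, and the tool used to draw Fig. 13 may place the axes differently. Theme names and scores are hypothetical and are not read from the figure.

```python
from statistics import median


def classify_themes(themes):
    """Assign each theme to a quadrant given (centrality, density) scores.

    The axes are split at the median of each score, which is an
    illustrative choice rather than the rule used for Fig. 13.
    """
    centrality_cut = median(c for c, _ in themes.values())
    density_cut = median(d for _, d in themes.values())
    labels = {}
    for name, (centrality, density) in themes.items():
        if centrality >= centrality_cut and density >= density_cut:
            labels[name] = "motor"
        elif centrality >= centrality_cut:
            labels[name] = "basic"
        elif density >= density_cut:
            labels[name] = "niche"
        else:
            labels[name] = "emerging or declining"
    return labels


# Hypothetical scores as (centrality, density); not values from the study.
toy_themes = {
    "theme_a": (0.9, 0.8),
    "theme_b": (0.8, 0.2),
    "theme_c": (0.2, 0.9),
    "theme_d": (0.1, 0.1),
}
quadrants = classify_themes(toy_themes)
```

Because the classification is relative to the other themes in the map, the same theme can change quadrant when the set of themes or the time window changes.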
3.9 Thematic Evolution
To evaluate shifts in thematic emphasis over time, a thematic evolution analysis was performed across two periods, 1987–2015 and 2016–2024. Figure 14 presents a Sankey diagram illustrating how research themes changed from one period to the next.
In the first period (1987–2015), research focused on fundamental themes such as "wastewater treatment," "artificial neural network," "optimization," and "decision support systems." These themes formed the foundation of the initial work applying computational intelligence to wastewater treatment.
In the second period (2016–2024), research themes were substantially restructured and consolidated. The Sankey diagram indicates that many of the earlier themes, such as "artificial neural network" and "optimization," have been absorbed into the broader and growing theme of "machine learning," which now serves as the encompassing framework for the field. In addition, "deep learning" has emerged as a significant independent theme that evolved from "artificial neural network." Overall, the diagram traces a trajectory from discrete computational intelligence models, through the encompassing framework of machine learning, to the rise of complex deep learning models.
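The flows in a thematic evolution diagram are typically weighted by how much two themes from consecutive periods overlap. The Python sketch below uses the inclusion index (shared keywords divided by the size of the smaller theme) as that weight; treating this as the weighting behind Fig. 14 is an assumption. The keyword sets attached to each theme are hypothetical and chosen only to mirror the narrative above.

```python
def inclusion_index(keywords_a, keywords_b):
    """Shared keywords divided by the size of the smaller keyword set."""
    a, b = set(keywords_a), set(keywords_b)
    if not a or not b:
        return 0.0
    return len(a & b) / min(len(a), len(b))


def theme_flows(period_1, period_2, threshold=0.0):
    """List (earlier theme, later theme, weight) for pairs whose overlap exceeds threshold."""
    flows = []
    for name_1, keywords_1 in period_1.items():
        for name_2, keywords_2 in period_2.items():
            weight = inclusion_index(keywords_1, keywords_2)
            if weight > threshold:
                flows.append((name_1, name_2, round(weight, 2)))
    return flows


# Hypothetical keyword memberships per theme (not taken from the dataset).
early = {
    "artificial neural network": {"ann", "backpropagation", "prediction"},
    "optimization": {"genetic algorithm", "cost"},
}
late = {
    "machine learning": {"ann", "prediction", "random forest", "genetic algorithm"},
    "deep learning": {"cnn", "lstm", "backpropagation"},
}
flows = theme_flows(early, late)
```

In this toy example the earlier "artificial neural network" theme feeds both later themes, with the stronger flow going to "machine learning", while "optimization" flows only into "machine learning".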