Online water quality monitoring typically relies on sensors deployed at multiple sites to collect various indicators—such as pH, dissolved oxygen, nitrogen, phosphorus, permanganate index, and chlorophyll—over the long term and at high frequency. However, raw data often suffer from significant gaps and noise interference, leading to poor data quality. Sensors may halt measurements due to equipment failure, routine maintenance, calibration, or communication interruptions, resulting in extensive gaps in data sequences. Additionally, they are susceptible to biofouling, extreme weather, and human interference, which can generate anomalous data or random noise that significantly deviates from true values. These issues severely constrain the accuracy of data analysis and pose major challenges for water quality assessment, pollution source tracking, and prediction and early warning systems. Therefore, developing high-precision data restoration technology is particularly crucial.
Water quality data inherently span three dimensions (time, space, and indicator) and can therefore be represented as a three-dimensional tensor. Traditional statistical imputation methods struggle to exploit the correlations across these dimensions. Tensor factorization models, in contrast, decompose the tensor into low-rank factor matrices, one for each dimension, which capture temporal variation patterns, spatial distribution patterns, and correlations among indicators, respectively. Missing values can then be estimated from the low-rank reconstruction.
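As a rough illustration of this idea, the following sketch fits a CP-style low-rank factorization to the observed entries of a small synthetic time × site × indicator tensor and uses it to estimate the hidden entries. It is a generic stochastic-gradient sketch in plain Python, not the institute's models; the function name `cp_impute`, the synthetic data, and all hyperparameter values are assumptions chosen for illustration.

```python
import math
import random

def cp_impute(entries, shape, rank=1, lr=0.02, reg=0.01, epochs=300, seed=0):
    """Fit a rank-`rank` CP model to observed entries; return a predictor.

    entries: dict mapping (time, site, indicator) -> observed value
    shape:   (n_times, n_sites, n_indicators)
    """
    rng = random.Random(seed)
    # One factor matrix per dimension: time, site, indicator (positive init).
    T, S, K = ([[rng.uniform(0.1, 1.0) for _ in range(rank)] for _ in range(n)]
               for n in shape)

    def predict(t, s, k):
        return sum(T[t][r] * S[s][r] * K[k][r] for r in range(rank))

    keys = list(entries)
    for _ in range(epochs):
        rng.shuffle(keys)
        for t, s, k in keys:  # only observed entries contribute to the fit
            err = entries[(t, s, k)] - predict(t, s, k)
            for r in range(rank):
                a, b, c = T[t][r], S[s][r], K[k][r]
                # Gradient step on squared error with L2 regularization.
                T[t][r] += lr * (err * b * c - reg * a)
                S[s][r] += lr * (err * a * c - reg * b)
                K[k][r] += lr * (err * a * b - reg * c)
    return predict

# Hypothetical synthetic data: 24 time steps x 4 sites x 3 indicators, rank 1.
time_f = [1.0 + 0.5 * math.sin(2 * math.pi * t / 12) for t in range(24)]
site_f = [0.8, 1.0, 1.2, 1.5]
ind_f = [1.0, 2.0, 0.5]
truth = {(t, s, k): time_f[t] * site_f[s] * ind_f[k]
         for t in range(24) for s in range(4) for k in range(3)}

# Hide 30% of the entries at random and fit on the rest.
rng = random.Random(42)
hidden = set(rng.sample(sorted(truth), round(0.3 * len(truth))))
observed = {key: v for key, v in truth.items() if key not in hidden}
predict = cp_impute(observed, shape=(24, 4, 3))

# Compare against a naive baseline that fills gaps with the observed mean.
mean_obs = sum(observed.values()) / len(observed)
heldout_mae = sum(abs(truth[key] - predict(*key)) for key in hidden) / len(hidden)
baseline_mae = sum(abs(truth[key] - mean_obs) for key in hidden) / len(hidden)
print(f"held-out MAE: model {heldout_mae:.3f}, mean fill {baseline_mae:.3f}")
```

Because each hidden entry shares its time, site, and indicator factors with observed entries, the model can estimate it from patterns learned elsewhere in the tensor, which a per-series mean fill cannot do.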
To capture the intrinsic temporal fluctuations of different water quality indicators more accurately, the Chongqing Institute of Green and Intelligent Technology combined tensor factorization with bias correction and intelligent optimization algorithms, proposing two models: a diverse biases-integrated adaptive latent-factorization-of-tensors model (DBAL) and a diversified biased nonnegative tensor factorization ensemble model (DBNE). Both models were applied to and validated on online water quality monitoring data from Lake Dianchi, a plateau lake in Yunnan Province.
The models introduce several methodological advances. First, non-negativity constraints on the indicators ensure that the restored water quality parameters remain physically meaningful. Second, multiple bias mechanisms, including a single linear bias, a preprocessing bias, and a time-varying perception bias, capture both the long-term seasonal patterns and the short-term fluctuations of the indicators. Third, a differential evolution algorithm adaptively optimizes the model hyperparameters, significantly improving tuning efficiency. Experimental results show that under a range of scenarios, including random missingness (missing rate: 20%–80%) and continuous missingness (missing duration: 1–4 weeks), the models achieve high overall imputation accuracy for multiple water quality indicators: the Nash-Sutcliffe efficiency coefficient (NSE) exceeds 0.90, and the root mean square error (RMSE) and mean absolute error (MAE) are significantly better than those of existing state-of-the-art models. The models are also efficient, processing the entire dataset in under 5 minutes, which meets the requirements of real-time water quality data restoration in practice.
Fig. Application of the enhanced tensor decomposition model to online water quality monitoring of Lake Dianchi, a plateau lake in Yunnan Province
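The evaluation setup described above can be made concrete with a short sketch. The block below gives the standard definitions of NSE, RMSE, and MAE, plus two hypothetical helpers that choose which time steps to hide under random or continuous missingness. The helper names, the toy numbers, and the single-series layout are assumptions for illustration; the experimental protocol used in the papers may differ.

```python
import math
import random

def rmse(obs, sim):
    """Root mean square error between observed and imputed values."""
    return math.sqrt(sum((o - s) ** 2 for o, s in zip(obs, sim)) / len(obs))

def mae(obs, sim):
    """Mean absolute error between observed and imputed values."""
    return sum(abs(o - s) for o, s in zip(obs, sim)) / len(obs)

def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 minus the ratio of the squared error
    to the variance of the observations around their mean."""
    mean_obs = sum(obs) / len(obs)
    num = sum((o - s) ** 2 for o, s in zip(obs, sim))
    den = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - num / den

def random_missing(n, rate, seed=0):
    """Indices hidden under random missingness at the given rate."""
    rng = random.Random(seed)
    return sorted(rng.sample(range(n), round(n * rate)))

def continuous_missing(n, start, length):
    """Indices hidden by one contiguous gap, such as a multi-week outage."""
    return list(range(start, min(start + length, n)))

# Toy values (hypothetical): true readings and imputed estimates.
obs = [2.0, 4.0, 6.0, 8.0]
sim = [2.5, 3.5, 6.5, 7.5]
print(nse(obs, sim), rmse(obs, sim), mae(obs, sim))

# Hide 20% of 100 time steps at random, or one gap of 14 consecutive steps.
print(len(random_missing(100, 0.2)), continuous_missing(100, 30, 14)[:3])
```

An NSE of 1 indicates a perfect match, while an NSE of 0 means the estimates are no better than filling every gap with the observed mean.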
The "tensor factorization – multiple bias correction" framework proposed by the institute is general and transferable. Beyond restoring water quality time series, it can be applied in fields such as hydrology and water resources, air pollution, soil environment, and ecological quality assessment to reconstruct missing data for a wide range of complex environmental factors.
The findings have been published in Environmental Modelling & Software and Ecological Informatics, leading journals in the field of ecological and environmental modeling. The first author of both papers is Wu Xuke, a doctoral candidate jointly trained by the Chongqing Institute of Green and Intelligent Technology and Chongqing University of Posts and Telecommunications. The corresponding author is Professor Shan Kun. The research was supported by funding programs including the National Natural Science Foundation of China and the Yunnan Provincial-Municipal Integration Project.
Links to related papers:
1. Xuke Wu#, Kun Shan*, Friedrich Recknagel, Lan Wang, Mingsheng Shang. Enhanced tensor factorization for spatiotemporal imputation of high-frequency water quality monitoring data. Environmental Modelling & Software, 2025, 193, 106667.
https://doi.org/10.1016/j.envsoft.2025.106667
2. Xuke Wu#, Kun Shan*, Lan Wang, Jingkai Wang, Mingsheng Shang. Spatiotemporal water quality data reconstruction: A tensor factorization framework. Ecological Informatics, 2025, 90, 103283.
https://doi.org/10.1016/j.ecolinf.2025.103283