Extreme gradient boosting machine learning for total suspended matter (TSM) retrieval from Sentinel-2 imagery
Niroumand-Jadidi, Milad; Bovolo, Francesca
2022-01-01
Abstract
Regression-based models are widely used for retrieving water quality parameters from optical imagery. However, developing robust and accurate models for inland and nearshore coastal waters remains challenging, particularly when the models are transferred in space or time. This study builds on a machine learning regression model, extreme gradient boosting (XGBoost), to retrieve total suspended matter (TSM) concentration in optically complex waters. XGBoost is an ensemble of decision trees that uses boosting: trees are added sequentially, each one compensating for the prediction errors of those before it. We apply XGBoost to TSM retrieval for the first time. The dark spectrum fitting (DSF) atmospheric correction method is first applied to multitemporal Sentinel-2 imagery of our study area, San Francisco Bay. The XGBoost-based model is then trained on samples distributed in space and time. The training inputs are the atmospherically corrected Sentinel-2 bands in the visible and near-infrared portions of the spectrum, paired with collocated in-situ TSM measurements. We examine the temporal transferability of the proposed model by retrieving TSM from images acquired after the training period. The results are assessed against independent in-situ matchups (70 samples). We also compare the TSM estimates with those of a standard optimal band ratio analysis (OBRA) model. The in-situ matchup analysis indicates that XGBoost has high potential for temporally robust TSM retrieval (R² ≈ 0.77; RMSE ≈ 6 g/m³ for estimates up to 70 g/m³). In contrast, OBRA performs poorly when the model is transferred in time. XGBoost also proved robust to sun-glint effects.
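The boosting mechanism mentioned in the abstract, in which each added tree compensates for the errors left by the previous ones, can be illustrated with a minimal sketch. This is not the XGBoost library and not the authors' model: it uses depth-1 trees (stumps) on a single feature, plain squared-error residuals, and no regularisation. The data are synthetic, and the names `fit_stump`, `boost`, and `learning_rate`, as well as the reflectance-to-TSM relation, are assumptions made purely for illustration.

```python
import math

def rmse(ys, preds):
    """Root-mean-square error between targets and predictions."""
    return math.sqrt(sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys))

def fit_stump(xs, residuals):
    """Fit a depth-1 tree: the single threshold that minimises squared error."""
    best = None
    # Exclude the largest value so the right-hand leaf is never empty.
    for t in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        lm = sum(left) / len(left)
        rm = sum(right) / len(right)
        sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    _, t, lm, rm = best
    return t, lm, rm

def predict_stump(stump, x):
    t, lm, rm = stump
    return lm if x <= t else rm

def boost(xs, ys, n_trees, learning_rate=0.3):
    """Add stumps one at a time, each fitted to the current residuals."""
    base = sum(ys) / len(ys)          # initial prediction: the mean target
    preds = [base] * len(ys)
    stumps, history = [], []
    for _ in range(n_trees):
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * predict_stump(stump, x)
                 for p, x in zip(preds, xs)]
        history.append(rmse(ys, preds))  # training error after this tree
    return base, stumps, history

# Synthetic stand-in for one reflectance band and TSM (hypothetical relation).
xs = [0.01 * i for i in range(1, 21)]
ys = [5.0 + 300.0 * x ** 2 for x in xs]

base, stumps, history = boost(xs, ys, n_trees=20)
print(f"training RMSE after 1 tree:   {history[0]:.3f}")
print(f"training RMSE after 20 trees: {history[-1]:.3f}")
```

The training error recorded in `history` shrinks as trees are added, which is the behaviour the abstract describes. A real retrieval would use several spectral bands, deeper trees, and held-out matchups rather than training error.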