Extreme gradient boosting machine learning for total suspended matter (TSM) retrieval from Sentinel-2 imagery

Niroumand-Jadidi, Milad; Bovolo, Francesca
2022-01-01

Abstract

Regression-based models are widely used to retrieve water quality parameters from optical imagery. However, developing robust and accurate models for inland and nearshore coastal waters remains challenging, particularly when the models are transferred in space or time. This study builds on a machine learning regression model, extreme gradient boosting (XGBoost), to retrieve total suspended matter (TSM) concentration in optically complex waters. XGBoost is an ensemble of decision trees that uses boosting: trees are added sequentially, each one compensating for the prediction errors of those before it. We employ XGBoost for TSM retrieval for the first time. The dark spectrum fitting (DSF) atmospheric correction is first applied to multitemporal Sentinel-2 imagery of our study area, San Francisco Bay. The XGBoost-based model is then trained on samples distributed in space and time, using the atmospherically corrected Sentinel-2 bands in the visible and near-infrared portions of the spectrum together with collocated in-situ TSM measurements. We examine the temporal transferability of the proposed model by retrieving TSM from images acquired after the training period. The results are assessed against independent in-situ matchups (70 samples) and compared with a standard optimal band ratio analysis (OBRA) model. The matchup analysis indicates a high potential of XGBoost for temporally robust TSM retrieval (R² ≈ 0.77; RMSE ≈ 6 g/m³ for estimates up to 70 g/m³). In contrast, OBRA performs poorly when the model is transferred in time. Moreover, XGBoost proved robust to sun-glint effects.
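The boosting idea and the temporal-transfer evaluation described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation and it does not use the XGBoost library: it is plain gradient boosting for squared error with one-split trees (stumps), without the regularization and second-order terms that XGBoost adds. The three "bands", the synthetic TSM relationship, the sample counts, and all function names are assumptions made up for illustration.

```python
import math
import random


def fit_stump(X, residuals):
    """Return (feature, threshold, left_mean, right_mean) minimizing squared error."""
    best = None
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2
            left = [r for row, r in zip(X, residuals) if row[j] <= thr]
            right = [r for row, r in zip(X, residuals) if row[j] > thr]
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
            if best is None or sse < best[0]:
                best = (sse, j, thr, lm, rm)
    return best[1:]


def fit_boosting(X, y, n_rounds=30, lr=0.3):
    """Each round fits a stump to the current residuals and adds a shrunken copy."""
    base = sum(y) / len(y)
    preds = [base] * len(y)
    stumps = []
    for _ in range(n_rounds):
        residuals = [t - p for t, p in zip(y, preds)]
        j, thr, lm, rm = fit_stump(X, residuals)
        stumps.append((j, thr, lm, rm))
        preds = [p + lr * (lm if row[j] <= thr else rm) for row, p in zip(X, preds)]
    return base, lr, stumps


def predict(model, row):
    base, lr, stumps = model
    return base + sum(lr * (lm if row[j] <= thr else rm) for j, thr, lm, rm in stumps)


def rmse(y_true, y_pred):
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r2(y_true, y_pred):
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Synthetic stand-in for matchups (NOT real data): three reflectance-like
# features and a TSM value driven mainly by the second ("red") feature.
rng = random.Random(0)
X, y = [], []
for _ in range(160):
    red = rng.uniform(0.01, 0.10)
    green = red * 0.8 + rng.gauss(0, 0.01)
    nir = red * 0.3 + rng.gauss(0, 0.005)
    X.append([green, red, nir])
    y.append(700 * red + rng.gauss(0, 2))  # roughly 7 to 70 g/m3

# Temporal split: samples are treated as ordered by acquisition date, so the
# model is trained on the earlier part and assessed on the later part.
X_train, y_train, X_test, y_test = X[:120], y[:120], X[120:], y[120:]

model_1 = fit_boosting(X_train, y_train, n_rounds=1)
model_30 = fit_boosting(X_train, y_train, n_rounds=30)

rmse_train_1 = rmse(y_train, [predict(model_1, r) for r in X_train])
rmse_train_30 = rmse(y_train, [predict(model_30, r) for r in X_train])
test_pred = [predict(model_30, r) for r in X_test]
rmse_test, r2_test = rmse(y_test, test_pred), r2(y_test, test_pred)
print(f"train RMSE: 1 round {rmse_train_1:.2f}, 30 rounds {rmse_train_30:.2f}")
print(f"held-out RMSE {rmse_test:.2f}, R2 {r2_test:.2f}")
```

The held-out block plays the role of the images acquired after the training period. In practice a library implementation such as XGBoost would replace `fit_boosting`, and real atmospherically corrected reflectances and in-situ TSM would replace the synthetic samples.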
2022
ISBN: 9781510655294, 9781510655300
Use this identifier to cite or link to this document: https://hdl.handle.net/11582/335408