Machine Learning Models Cannot Replace Screening Colonoscopy for the Prediction of Advanced Colorectal Adenoma

Mamandipoor, Behrooz; Osmani, Venet
2021-01-01

Abstract

Screening for colorectal cancer (CRC) still relies on colonoscopy and/or fecal occult blood testing because other, non-invasive risk-stratification systems have not yet been incorporated into European guidelines. In this study, we evaluate the potential of machine learning (ML) methods to predict advanced adenomas (AAs) in 5862 individuals participating in a screening program for colorectal cancer. Adenomas were diagnosed histologically; an AA was defined as an adenoma ≥ 1 cm in size or one showing high-grade dysplasia or villous features. Logistic regression (LR) and extreme gradient boosting (XGBoost) algorithms were evaluated for AA prediction. The mean age was 58.7 ± 9.7 years and 2811 participants (48.0%) were male; 1404 (24.0%) had obesity (BMI ≥ 30 kg/m²), 871 (14.9%) had diabetes, and 2095 (39.1%) had metabolic syndrome. An adenoma was detected in 1884 individuals (32.1%) and an AA in 437 (7.5%). Models built on 36 laboratory parameters, eight clinical parameters, and data on eight food types/dietary patterns achieved only moderate accuracy in predicting AAs with both XGBoost and LR (AUC-ROC of 0.65–0.68). Limiting variables to established risk factors for AAs did not significantly improve performance. Moreover, subgroup analyses restricted to subjects without genetic predispositions, to individuals aged 45–80 years, or to a single gender showed similar results. In conclusion, ML based on point-prevalence laboratory and clinical information does not accurately predict AAs.
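For readers unfamiliar with the AUC-ROC figures quoted in the abstract, the following is a minimal sketch of how that metric can be computed from predicted risk scores. It is not the authors' code: the function name `auc_roc` and the toy labels and scores are invented for illustration and are not study data. The sketch uses the pairwise interpretation of AUC-ROC, namely the probability that a randomly chosen positive case (here, an AA) receives a higher score than a randomly chosen negative case, with ties counted as one half.

```python
def auc_roc(labels, scores):
    """AUC-ROC as the fraction of (positive, negative) pairs ranked correctly.

    labels: iterable of 0/1 outcomes (1 = advanced adenoma in this illustration)
    scores: iterable of predicted risk scores, same length as labels
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive case ranked above negative case
            elif p == n:
                wins += 0.5   # ties count as half a correct ranking
    return wins / (len(pos) * len(neg))


# Hypothetical toy data (NOT from the study): 3 positives, 4 negatives.
labels = [1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.4, 0.2, 0.5, 0.3, 0.1, 0.2]
toy_auc = auc_roc(labels, scores)
print(f"toy AUC-ROC = {toy_auc:.3f}")
```

Under this reading, a value of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, which is why values of 0.65–0.68 are described as only moderate accuracy.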
Files in this item:
File: Machine_Learning_Models_Cannot_Replace_Screening_Colonoscopy_for_the_Prediction_of_Advanced_Colorectal_Adenoma.pdf
Access: open access
Type: Post-print document
License: Creative Commons
Size: 2.43 MB
Format: Adobe PDF


Use this identifier to cite or link to this document: https://hdl.handle.net/11582/328386