An ensemble learning approach for enhanced classification of patients with hepatitis and cirrhosis

Jurman, Giuseppe
2021-01-01

Abstract

Hepatitis C is an infectious disease that affects more than 70 million people worldwide and kills about 400,000 of them annually. To better understand this disease and its prognosis, medical doctors can take advantage of patients' electronic health records (EHRs). Computational approaches based on statistics and computational intelligence can process these data to reveal discoveries and trends that physicians might otherwise miss. In this study, we analyze the EHRs of 540 healthy controls and 75 patients diagnosed with hepatitis C, and use machine learning classifiers to predict their diagnosis. We employ the top-performing classifier (Random Forests) to detect the variables most predictive of hepatitis C, which turn out to be aspartate aminotransferase (AST) and alanine aminotransferase (ALT). Physicians also combine these two enzyme levels in the AST/ALT ratio, a traditional measure commonly used in gastroenterology and hepatology. We apply the same approach to a validation dataset of 123 patients with hepatitis C and cirrhosis, and the same two variables emerge as the most relevant. We therefore compare our approach with the AST/ALT ratio, and find that our two-feature ensemble learning model outperforms the traditional AST/ALT ratio on both datasets. Our results confirm the usefulness of ensemble machine learning for predicting hepatitis C and cirrhosis diagnoses. Moreover, our findings could have an impact on clinical practice, helping physicians predict the diagnoses of patients at risk of hepatitis C and cirrhosis more precisely.
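The abstract does not specify how the AST/ALT ratio is turned into a prediction or which metric is used for the comparison. The following is a minimal sketch under two assumptions: the ratio is used directly as a ranking score, and performance is measured as the area under the ROC curve (ROC AUC), computed here with the pairwise Mann-Whitney formulation. The helper names `ast_alt_ratio` and `roc_auc` are hypothetical. The records are illustrative toy values, not data from the study.

```python
from typing import List, Sequence, Tuple


def ast_alt_ratio(ast: float, alt: float) -> float:
    """AST/ALT (De Ritis) ratio; both enzymes must be in the same units, e.g. U/L."""
    if alt <= 0:
        raise ValueError("ALT must be positive")
    return ast / alt


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """ROC AUC via the Mann-Whitney formulation: the fraction of
    (positive, negative) pairs in which the positive case has the higher
    score, counting ties as half a win."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy records (AST, ALT, label); NOT data from the study.
# label 1 = patient, 0 = healthy control.
records: List[Tuple[float, float, int]] = [
    (20.0, 25.0, 0),
    (22.0, 20.0, 0),
    (18.0, 24.0, 0),
    (60.0, 40.0, 1),
    (45.0, 50.0, 1),
]

# Assumption: a higher ratio is treated as more indicative of disease.
scores = [ast_alt_ratio(ast, alt) for ast, alt, _ in records]
labels = [y for _, _, y in records]
print(round(roc_auc(scores, labels), 3))  # prints 0.833
```

A two-feature classifier could be evaluated on the same footing by passing its predicted probabilities to `roc_auc` in place of the ratio. An example would be Random Forests trained on AST and ALT. That model is not reproduced here. An AUC below 0.5 for the ratio would indicate that the assumed score direction should be reversed.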

Use this identifier to cite or link to this document: https://hdl.handle.net/11582/324586