MST Fitness Index and implicit data narratives: A comparative test on alternative unsupervised algorithms

P. Sacco
2016

Abstract

In this paper, we introduce MST Fitness, a new methodology based on the notion of the Minimum Spanning Tree (MST). It evaluates how well alternative algorithms capture the deep statistical structure of datasets of different types and nature. We test this methodology on six different databases: some are artificial and widely used in similar experiments, while others relate to real-world phenomena. Our test set consists of eight different algorithms, including some widely known and used ones, such as Principal Component Analysis, Linear Correlation, and Euclidean Distance. We also consider more sophisticated algorithms based on Artificial Neural Networks, such as the Self-Organizing Map (SOM) and a relatively new algorithm called the Auto-Contractive Map (AutoCM). We find that, on our benchmark datasets, AutoCM consistently outperforms all other algorithms on every dataset, and that its global performance exceeds that of the others by several orders of magnitude. Future research should check whether AutoCM can be considered a truly general-purpose algorithm for the analysis of heterogeneous categories of datasets.
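The MST Fitness index itself is defined in the full paper and is not reproduced here. As a hedged illustration of its building block only, the sketch builds a Minimum Spanning Tree over a set of variables from their pairwise Euclidean distances (one of the baseline measures named in the abstract), using Prim's algorithm. The function names and toy data are hypothetical, and the paper's own MST construction and distance measures (e.g. those derived from AutoCM or SOM) may differ.

```python
import math


def euclidean_distance_matrix(columns):
    """Pairwise Euclidean distances between variables.

    Each element of `columns` is one variable, given as a list of its values
    across the same records (hypothetical input layout for this sketch).
    """
    n = len(columns)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = math.sqrt(sum((a - b) ** 2 for a, b in zip(columns[i], columns[j])))
            dist[i][j] = dist[j][i] = d
    return dist


def minimum_spanning_tree(dist):
    """Prim's algorithm on a dense, symmetric distance matrix.

    Returns the MST as a list of (u, v, weight) edges: n - 1 edges for n nodes.
    """
    n = len(dist)
    if n == 0:
        return []
    in_tree = [False] * n
    best = [math.inf] * n  # cheapest known edge weight linking each node to the tree
    parent = [-1] * n      # tree node at the other end of that cheapest edge
    best[0] = 0.0          # start growing the tree from node 0
    edges = []
    for _ in range(n):
        # Add the cheapest node that is not yet in the tree.
        u = min((k for k in range(n) if not in_tree[k]), key=lambda k: best[k])
        in_tree[u] = True
        if parent[u] != -1:
            edges.append((parent[u], u, dist[parent[u]][u]))
        # Relax the edges from the newly added node.
        for v in range(n):
            if not in_tree[v] and dist[u][v] < best[v]:
                best[v] = dist[u][v]
                parent[v] = u
    return edges


if __name__ == "__main__":
    # Three toy variables observed on two records each.
    variables = [[0, 0], [1, 0], [5, 0]]
    tree = minimum_spanning_tree(euclidean_distance_matrix(variables))
    print(tree)  # [(0, 1, 1.0), (1, 2, 4.0)]
```

Here the direct link between variables 0 and 2 (distance 5) is dropped because they are already connected more cheaply through variable 1, which is what makes the MST a compact summary of the strongest pairwise relations.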
File: Physica_A_2016.pdf (post-print; restricted access, not public; Adobe PDF, 4.44 MB)

Use this identifier to cite or link to this document: https://hdl.handle.net/11582/313497