MST Fitness Index and implicit data narratives: A comparative test on alternative unsupervised algorithms
P. Sacco
2016-01-01
Abstract
In this paper, we introduce MST Fitness, a new methodology, based on the notion of Minimum Spanning Tree (MST), for evaluating how well alternative algorithms capture the deep statistical structure of datasets of different types and nature. We test this methodology on six different datasets, some of which are artificial and widely used in similar experiments, and some of which relate to real-world phenomena. Our test set consists of eight different algorithms, including some that are widely known and used, such as Principal Component Analysis, Linear Correlation, and Euclidean Distance. We also consider more sophisticated Artificial Neural Network based algorithms, such as the Self-Organizing Map (SOM) and a relatively new algorithm called Auto-Contractive Map (AutoCM). We find that, for our benchmark, AutoCM performs consistently better than all other algorithms on all of the datasets, and that its global performance is superior to that of the others by several orders of magnitude. Whether AutoCM can be considered a truly general-purpose algorithm for the analysis of heterogeneous categories of datasets remains to be checked in future research.

File | Type | License | Size | Format
---|---|---|---|---
Physica_A_2016.pdf (not available) | Post-print document | Not public (private/restricted access) | 4.44 MB | Adobe PDF
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.