MST Fitness Index and implicit data narratives: A comparative test on alternative unsupervised algorithms

M. Buscema; P. Sacco

2016-01-01

Abstract

In this paper, we introduce a new methodology, called MST Fitness and based on the notion of the Minimum Spanning Tree (MST), for evaluating how well alternative algorithms capture the deep statistical structure of datasets of different types and nature. We test this methodology on six different databases, some of which are artificial and widely used in similar experiments, and some of which relate to real-world phenomena. Our test set consists of eight different algorithms, including some that are widely known and used, such as Principal Component Analysis, Linear Correlation, and Euclidean Distance. We also consider more sophisticated algorithms based on Artificial Neural Networks, such as the Self-Organizing Map (SOM) and a relatively new algorithm called the Auto-Contractive Map (AutoCM). We find that, for our benchmark, AutoCM performs consistently better than all other algorithms on every dataset, and that its global performance is superior to that of the others by several orders of magnitude. Future research should check whether AutoCM can be considered a truly general-purpose algorithm for the analysis of heterogeneous categories of datasets.
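The abstract does not define the MST Fitness index itself, so the following is only a minimal sketch of the underlying building block: extracting a Minimum Spanning Tree from a pairwise Euclidean distance matrix, one of the eight measures named above. The function names, the use of Prim's algorithm, and the toy data are assumptions made for illustration and are not taken from the paper.

```python
import math


def euclidean_distance_matrix(points):
    """Pairwise Euclidean distances between the rows of `points`."""
    return [[math.dist(p, q) for q in points] for p in points]


def minimum_spanning_tree(dist):
    """Prim's algorithm on a dense, symmetric distance matrix.

    Returns a list of (i, j, weight) edges; n nodes yield n - 1 edges.
    """
    n = len(dist)
    if n == 0:
        return []
    in_tree = [False] * n
    best = [math.inf] * n   # cheapest known link from the tree to each node
    parent = [-1] * n       # tree node that provides that cheapest link
    best[0] = 0.0           # start growing the tree from node 0
    edges = []
    for _ in range(n):
        # Pick the node outside the tree that is cheapest to attach.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((parent[u], u, dist[parent[u]][u]))
        # Relax the links from the newly added node to the remaining nodes.
        for v in range(n):
            if not in_tree[v] and dist[u][v] < best[v]:
                best[v] = dist[u][v]
                parent[v] = u
    return edges


# Toy data (hypothetical, not from the paper): four points in the plane.
points = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
mst = minimum_spanning_tree(euclidean_distance_matrix(points))
total_weight = sum(w for _, _, w in mst)
```

Any other dissimilarity matrix (for example, one derived from linear correlations) could be passed to `minimum_spanning_tree` in the same way; how the resulting trees are then scored and compared is specified in the paper, not in this sketch.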

Year

2016

Appears in types:

1.1 Journal article

Files in this item:

File	Size	Format
Physica_A_2016.pdf (not available; type: post-print document; license: non-public, private/restricted access)	4.44 MB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11582/313497
