On Pruning the Search Space for Clustering Ensemble Problems
Vega Pons, Sandro; Avesani, Paolo
2015-01-01
Abstract
Clustering ensemble has become a very popular technique in the past few years due to its potential for improving clustering results. Roughly speaking, it consists of combining different partitions of the same set of objects in order to obtain a consensus partition. A common way of defining the consensus partition is as the solution of the median partition problem, which is a complex optimization problem. In this paper, we study possible prunings of the search space for this optimization problem. In particular, we introduce a new pruning that allows a dramatic reduction of the search space. We also characterize the family of dissimilarity measures that can take advantage of this pruning, and we present two measures that belong to this family. We carry out an experimental study on synthetic data, comparing, under different circumstances, the size of the original search space with its size after the proposed prunings. Large reductions are obtained, which can be beneficial for the development of clustering ensemble algorithms. We also compare, on real data, the behavior of a simulated annealing-based ensemble algorithm in the original partition space and in the two proposed pruned spaces. In all cases, the proposed prunings allow the algorithm to find solutions closer to the theoretical optimum.
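
To make the median partition formulation concrete, the following is a minimal Python sketch that finds a consensus partition by exhaustive search over all partitions of a tiny object set. It is only an illustration under stated assumptions: the dissimilarity used here (the number of object pairs on which two partitions disagree) is a common pair-counting choice and is not necessarily one of the two measures presented in the paper, and the function names and toy ensemble are hypothetical. No pruning is applied; the sketch only shows the unpruned search space that the paper sets out to reduce.

```python
from itertools import combinations


def set_partitions(items):
    """Yield every partition of `items` as a list of blocks (lists)."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for partial in set_partitions(rest):
        # Place `first` into each existing block in turn...
        for i in range(len(partial)):
            yield partial[:i] + [[first] + partial[i]] + partial[i + 1:]
        # ...or into a new block of its own.
        yield [[first]] + partial


def labels(partition):
    """Map each object to the index of the block that contains it."""
    return {obj: idx for idx, block in enumerate(partition) for obj in block}


def pair_disagreements(p, q, items):
    """Assumed dissimilarity: number of object pairs that are together
    in one partition and separated in the other."""
    lp, lq = labels(p), labels(q)
    return sum(
        (lp[a] == lp[b]) != (lq[a] == lq[b])
        for a, b in combinations(items, 2)
    )


def median_partition(ensemble, items):
    """Brute-force search: return the partition minimizing the summed
    dissimilarity to the ensemble, and the size of the search space."""
    candidates = list(set_partitions(items))
    best = min(
        candidates,
        key=lambda c: sum(pair_disagreements(c, p, items) for p in ensemble),
    )
    return best, len(candidates)


# Hypothetical toy ensemble: three partitions of four objects.
items = [0, 1, 2, 3]
ensemble = [
    [[0, 1], [2, 3]],
    [[0, 1], [2], [3]],
    [[0, 1, 2], [3]],
]
consensus, space_size = median_partition(ensemble, items)
print(sorted(sorted(block) for block in consensus), space_size)
```

The number of candidates is the Bell number of the object count (15 for four objects), which grows too quickly for exhaustive search on realistic data; this is why reducing the search space matters for heuristic methods such as the simulated annealing algorithm mentioned in the abstract.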