A local search approach for generating directional data
Turchi, Marco
2016-01-01
Abstract
In application fields such as linguistics and computer vision, there is an increasing need for reference data for the empirical analysis of new methods and the assessment of different algorithms. Current evaluations rely on a few real-life collections or on artificial data generators built on models that are too simplistic to cover real scenarios or to let researchers identify crucial limitations of their algorithms. We propose a flexible approach to generating high-dimensional vectors whose directional properties are controlled by the distribution of their pairwise cosine distances. The generation method is formulated as a non-linear continuous optimization problem, which is solved with a computationally efficient local search algorithm. We show in an empirical study that our approach can create large, high-dimensional data collections with the desired properties in reasonable time.
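The abstract does not specify the objective function or the neighbourhood moves, so the following is only a minimal sketch of the general idea under our own assumptions, not the method of the paper. We assume the target is given as a sample of n(n-1)/2 cosine distances, measure the fit by comparing sorted observed and sorted target distances (an empirical 1-Wasserstein distance), and use plain hill climbing in which a move perturbs one unit vector with Gaussian noise. The function names `mismatch` and `local_search` and the parameters `steps`, `step_size` and `seed` are hypothetical.

```python
import math
import random


def normalize(v):
    """Scale v to unit length: directional data lives on the unit sphere."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cosine_distance(u, v):
    """1 - cos(angle); for unit vectors the cosine is simply the dot product."""
    return 1.0 - sum(a * b for a, b in zip(u, v))


def pairwise_distances(vectors):
    """All n*(n-1)/2 cosine distances between distinct vectors."""
    n = len(vectors)
    return [cosine_distance(vectors[i], vectors[j])
            for i in range(n) for j in range(i + 1, n)]


def mismatch(vectors, target_sorted):
    """Mean absolute gap between sorted observed and sorted target distances
    (an empirical 1-Wasserstein distance between the two distributions)."""
    observed = sorted(pairwise_distances(vectors))
    return sum(abs(o - t) for o, t in zip(observed, target_sorted)) / len(observed)


def local_search(n, dim, target, steps=2000, step_size=0.1, seed=0):
    """Hypothetical hill climber: perturb one vector at a time and keep the
    move only if the mismatch with the target distribution does not increase.
    Returns (vectors, initial_mismatch, final_mismatch)."""
    if n < 2 or len(target) != n * (n - 1) // 2:
        raise ValueError("need n >= 2 and one target distance per unordered pair")
    rng = random.Random(seed)
    target_sorted = sorted(target)
    vectors = [normalize([rng.gauss(0, 1) for _ in range(dim)]) for _ in range(n)]
    initial = best = mismatch(vectors, target_sorted)
    for _ in range(steps):
        i = rng.randrange(n)
        old = vectors[i]
        vectors[i] = normalize([x + rng.gauss(0, step_size) for x in old])
        score = mismatch(vectors, target_sorted)
        if score <= best:
            best = score          # accept: no worse than the current solution
        else:
            vectors[i] = old      # reject: restore the previous vector
    return vectors, initial, best
```

For example, `local_search(n=8, dim=16, target=[random.uniform(0.2, 0.4) for _ in range(28)], steps=300)` pulls eight random 16-dimensional unit vectors towards a configuration whose pairwise cosine distances lie mostly between 0.2 and 0.4. Recomputing every distance after each move keeps the sketch short; a more efficient local search would update only the n - 1 distances that involve the perturbed vector.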