NII Technical Report (NII-2009-018E)

Title Can Shared-Neighbor Distances Defeat the Curse of Dimensionality?
Authors Michael E. Houle, Hans-Peter Kriegel, Peer Kroeger, Erich Schubert and Arthur Zimek
Abstract Similarity measures based on distances are usually more or less sensitive to variations within a data distribution, or the dimensionality of a data space. The effects of the notorious 'curse of dimensionality' have been studied in data generated by one single mechanism. In this paper, we study the effects of this phenomenon on different similarity measures in the presence of several data distributions as a setting relevant to many data mining, indexing, or similarity search applications. In particular, we assess the performance of shared-neighbor similarity measures, which are secondary similarity measures based on the rankings of data objects induced by some primary distance measure. Our findings are that the use of rank-based similarity measures can result in more stable performance than their associated primary distance measures.
Language English
Published Dec 24, 2009
Pages 28p
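
Illustrative sketch (not taken from the report): the abstract describes shared-neighbor similarity as a secondary measure derived from the neighbor rankings that a primary distance induces. One common way to realize this idea is to count how many of the k nearest neighbors two objects share and divide by k. The Python below is a minimal sketch under that assumption. The function names (knn_sets, snn_similarity), the choice of Euclidean distance as the primary measure, and the decision to count each point among its own neighbors are all hypothetical choices made for illustration; the report itself may define its measures differently.

```python
import math


def knn_sets(points, k, dist=math.dist):
    """For each point, return the set of indices of its k nearest neighbors.

    Assumption: a point counts as its own nearest neighbor (distance 0).
    Ties in distance are broken by index so the ranking is deterministic.
    """
    result = []
    for p in points:
        # Rank all indices by primary distance to p (the "primary" measure).
        ranked = sorted(range(len(points)), key=lambda j: (dist(p, points[j]), j))
        result.append(set(ranked[:k]))
    return result


def snn_similarity(neighbor_sets, i, j, k):
    """Secondary similarity: fraction of the k nearest neighbors shared by i and j."""
    return len(neighbor_sets[i] & neighbor_sets[j]) / k


# Two well-separated groups of three points each (made-up toy data).
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
neighbors = knn_sets(points, k=3)
same_group = snn_similarity(neighbors, 0, 1, k=3)
different_group = snn_similarity(neighbors, 0, 3, k=3)
```

In this toy data, points in the same group share their whole neighborhood while points in different groups share none of it. The score depends only on the ranking induced by the primary distance, not on the distance values themselves, which is the property the abstract attributes to rank-based measures.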

NII Technical Reports
National Institute of Informatics