NII Technical Report (NII-2009-018E)

Title	Can Shared-Neighbor Distances Defeat the Curse of Dimensionality?
Authors	Michael E. Houle, Hans-Peter Kriegel, Peer Kroeger, Erich Schubert and Arthur Zimek
Abstract	Similarity measures based on distances are usually sensitive, to varying degrees, to variations within a data distribution or to the dimensionality of the data space. The effects of the notorious 'curse of dimensionality' have been studied in data generated by a single mechanism. In this paper, we study the effects of this phenomenon on different similarity measures in the presence of several data distributions, a setting relevant to many data mining, indexing, and similarity search applications. In particular, we assess the performance of shared-neighbor similarity measures, which are secondary similarity measures based on the rankings of data objects induced by some primary distance measure. We find that rank-based similarity measures can yield more stable performance than their associated primary distance measures.
Language	English
Published	Dec 24, 2009
Pages	28p