NII Technical Report (NII-2012-004E)

Title Rank-Based Similarity Search: Reducing the Dimensional Dependence
Authors Michael E. Houle and Michael Nett
Abstract This paper introduces a probabilistic data structure for k-NN search, the rank cover tree (RCT), that entirely avoids the use of constraints involving similarity values. All internal selections are made according to the ranks of the objects with respect to the query, allowing much tighter control on the overall execution costs, an important consideration for data mining applications. A formal theoretical analysis shows that with very high probability, the RCT returns a correct query result in time that depends very competitively on a measure of the intrinsic dimensionality of the data set. The experimental results for the RCT show that purely rank-based methods for similarity search can be practical even when the representational dimension of the data is extremely high. They also show that the RCT is capable of meeting or exceeding the level of performance of state-of-the-art methods that make use of explicit similarity constraints.
Language English
Published Jun 18, 2012
Pages 15p
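
Illustrative note on the abstract: the sketch below contrasts a selection made by rank with respect to the query against one made by an explicit similarity (distance) constraint. It is not the rank cover tree and is not taken from the report. The function names, the Euclidean distance, and the sample points are all assumptions chosen for illustration. It shows only why a rank-based selection gives direct control over how many objects are retained, whereas the size of a threshold-based selection depends on the distance values in the data.

```python
import heapq
import math


def euclidean(a, b):
    # Assumed distance measure; any dissimilarity function could be used.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def select_by_rank(candidates, query, k, dist=euclidean):
    # Hypothetical helper: keep the k candidates ranked closest to the query.
    # The result never has more than k items, whatever the distance scale.
    return heapq.nsmallest(k, candidates, key=lambda c: dist(c, query))


def select_by_threshold(candidates, query, radius, dist=euclidean):
    # Hypothetical helper: keep every candidate within a fixed distance.
    # The result size depends on how the data are distributed and scaled.
    return [c for c in candidates if dist(c, query) <= radius]


points = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (3.0, 3.0), (5.0, 1.0)]
query = (0.0, 0.0)

by_rank = select_by_rank(points, query, k=2)
by_radius = select_by_threshold(points, query, radius=2.5)

# Rescaling the data leaves the ranks unchanged but changes which
# points satisfy a fixed distance threshold.
scaled = [(10 * x, 10 * y) for x, y in points]
by_rank_scaled = select_by_rank(scaled, query, k=2)
by_radius_scaled = select_by_threshold(scaled, query, radius=2.5)
```

In this toy setting the rank-based selection returns exactly two points before and after rescaling, while the threshold-based selection shrinks once the distances grow.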

NII Technical Reports
National Institute of Informatics