
NII Seminar Series on Dimensionality and Scalability VII

This series of seminars - launched at an NII Shonan Meeting held in May 2013 - explores the issues of dimensionality and scalability in the context of such application areas as databases, data mining, machine learning, and multimedia. All are welcome to attend.

Date:

March 13 (Tuesday)

Time:

10:30-12:30
14:00-17:00

Place:

National Institute of Informatics
Room 2009-2010, 20th floor

Schedule:
  • 10:30-11:10
    Local Intrinsic Dimensionality I: An Extreme-Value-Theoretic Foundation for Similarity Applications
    Michael E. Houle, NII, Japan
  • 11:10-11:50
    Local Intrinsic Dimensionality II: Multivariate Analysis and Distributional Support
    Michael E. Houle, NII, Japan
  • 11:50-12:30
    Good and Bad Neighborhood Approximations for Outlier Detection Ensembles
    Arthur Zimek, University of Southern Denmark (SDU), Denmark
  • Lunch Break
  • 14:00-14:40
    Improving k-NN Graph Accuracy Using Local Intrinsic Dimensionality
    Vincent Oria, New Jersey Institute of Technology (NJIT), USA
  • 14:40-15:20
    NN-Descent on High-Dimensional Data
    Milos Radovanovic, University of Novi Sad, Serbia
  • Break
  • 15:40-16:20
    Dimensionality and Adversarial Subspaces
    Daniel Ma, The University of Melbourne, Australia
  • 16:20-17:00
    Adversarial Generation of Real-Time Feedback with Neural Networks for Simulation-Based Training
    James Bailey, The University of Melbourne, Australia
List of Abstracts:
Presenter:

Michael E. Houle
Visiting Professor
National Institute of Informatics, Japan

Title:

Local Intrinsic Dimensionality I: An Extreme-Value-Theoretic Foundation for Similarity Applications

Abstract:

Researchers have long considered the analysis of similarity applications in terms of the intrinsic dimensionality (ID) of the data. This theoretical presentation is concerned with a generalization of a discrete measure of ID, the expansion dimension, to the case of smooth functions in general, and distance distributions in particular. A local model of the ID of smooth functions is first proposed and then explained within the well-established statistical framework of extreme value theory (EVT).

Moreover, it is shown that under appropriate smoothness conditions, the cumulative distribution function of a distance distribution can be completely characterized by an equivalent notion of data discriminability. As the local ID model makes no assumptions on the nature of the function (or distribution) other than continuous differentiability, its extreme generality makes it ideally suited for the non-parametric or unsupervised learning tasks that often arise in similarity applications. An extension of the local ID model is also provided that allows the local assessment of the rate of change of function growth, which is then shown to have potential implications for the detection of inliers and outliers.
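As background for how this model is typically put to use (a minimal sketch, not material from the talk itself): the EVT formulation leads to a well-known maximum-likelihood estimate of LID from the distances of a query point to its k nearest neighbors, often written as LID ≈ -((1/k) Σ_{i=1..k} ln(r_i / r_k))^{-1}. The function name below is illustrative only.

```python
import math

def lid_mle(distances):
    """Maximum-likelihood estimate of local intrinsic dimensionality,
    given the sorted distances from a query point to its k nearest
    neighbors (largest distance last). The i = k term of the sum is
    zero, so only the first k-1 distances contribute."""
    k = len(distances)
    r_max = distances[-1]
    return -k / sum(math.log(r / r_max) for r in distances[:-1])

# Example: neighbor distances around a query in a region that is
# locally close to 2-dimensional.
dists = sorted([0.10, 0.14, 0.17, 0.20, 0.22, 0.25])
print(lid_mle(dists))
```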

Presenter:

Michael E. Houle
Visiting Professor
National Institute of Informatics, Japan

Title:

Local Intrinsic Dimensionality II: Multivariate Analysis and Distributional Support

Abstract:

Distance-based expansion models of intrinsic dimensionality have had recent application in the analysis of complexity of similarity applications, and in the design of efficient heuristics. This theoretical presentation extends one such model, the local intrinsic dimension (LID), to a multivariate form that can account for the contributions of different distributional components towards the intrinsic dimensionality of the entire feature set, or equivalently towards the discriminability of distance measures defined in terms of these feature combinations. Formulas are established for the effect on LID under summation, product, composition, and convolution operations on smooth functions in general, and cumulative distribution functions in particular. For some of these operations, the dimensional or discriminability characteristics of the result are also shown to depend on a form of distributional support. As an example, an analysis is provided that quantifies the impact of introduced random Gaussian noise on the intrinsic dimension of data. Finally, a theoretical relationship is established between the LID model and the classical correlation dimension.

Presenter:

Arthur Zimek
Associate Professor
Department of Mathematics and Computer Science (IMADA)
University of Southern Denmark (SDU), Denmark

Title:

Good and Bad Neighborhood Approximations for Outlier Detection Ensembles

Abstract:

Outlier detection methods have used approximate neighborhoods in filter-refinement approaches. Outlier detection ensembles have used artificially obfuscated neighborhoods to achieve diverse ensemble members.

Here we argue that outlier detection models could be based on approximate neighborhoods in the first place, thus gaining in both efficiency and effectiveness. It depends, however, on the type of approximation, as only some seem beneficial for the task of outlier detection, while no (large) benefit can be seen for others. In particular, we argue that space-filling curves are beneficial approximations, as they have a stronger tendency to underestimate the density in sparse regions than in dense regions.

In comparison, LSH and NN-Descent do not have such a tendency and do not seem to be beneficial for the construction of outlier detection ensembles.
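To make the space-filling-curve idea concrete (a rough sketch under assumed details, not code from the talk): points can be ordered by a Morton (Z-order) key, and each point's window of predecessors and successors along the curve taken as its approximate neighborhood. All function names here are invented for illustration; practical methods typically combine several shifted or rotated curves.

```python
def morton_code(x, y, bits=16):
    """Interleave the bits of two non-negative integer coordinates
    into a single Z-order key."""
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (2 * i)
        code |= ((y >> i) & 1) << (2 * i + 1)
    return code

def approx_neighborhoods(points, k=2):
    """Approximate neighborhoods from one Z-order traversal: each
    point's neighborhood is its k predecessors and k successors
    along the curve."""
    order = sorted(range(len(points)),
                   key=lambda i: morton_code(*points[i]))
    pos = {idx: p for p, idx in enumerate(order)}
    nbrs = {}
    for idx in range(len(points)):
        p = pos[idx]
        nbrs[idx] = order[max(0, p - k):p] + order[p + 1:p + 1 + k]
    return nbrs

pts = [(1, 1), (2, 2), (40, 40), (41, 42), (3, 1)]
print(approx_neighborhoods(pts))
```

Points that are close in space tend to be close on the curve, so the curve window is a cheap stand-in for a true k-NN query; the approximation degrades where the curve jumps, which is exactly the kind of systematic bias the abstract argues can matter for outlier detection.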

Presenter:

Vincent Oria
Professor
Department of Computer Science
New Jersey Institute of Technology (NJIT), USA

Title:

Improving k-NN Graph Accuracy Using Local Intrinsic Dimensionality

Abstract:

The k-nearest neighbor (k-NN) graph is an important data structure for many data mining and machine learning applications. The accuracy of k-NN graphs depends on the object feature vectors, which are usually represented in high-dimensional spaces. Selecting the most important features is essential for providing compact object representations and for improving the graph accuracy. Having a compact feature vector can reduce the storage space and the computational complexity of search and learning tasks. In this presentation, we propose NNWID-Descent, a similarity graph construction method that utilizes the NNF-Descent framework while integrating a new feature selection criterion, Support Weighted Intrinsic Dimensionality, that estimates the contribution of each feature to the overall intrinsic dimensionality.

Through extensive experiments on various datasets, we show that NNWID-Descent allows a significant amount of sparsification of local feature vectors while still preserving a reasonable level of graph accuracy.

Presenter:

Milos Radovanovic
Associate Professor
Department of Mathematics and Informatics, Faculty of Science
University of Novi Sad, Serbia

Title:

NN-Descent on High-Dimensional Data

Abstract:

K-nearest neighbor graphs (K-NNGs) are used in many data-mining and machine-learning algorithms. Naive construction of K-NNGs has a complexity of O(n^2), which could be a problem for large-scale data sets. In order to achieve higher efficiency, many exact and approximate algorithms have been developed, including the NN-Descent algorithm of Dong, Charikar and Li.

Empirical evidence suggests that the practical complexity of this algorithm is O(n^{1.14}), a significant improvement over brute-force construction. However, NN-Descent has a major drawback: it produces good results only on data of low intrinsic dimensionality. In this talk we present an experimental analysis of this behavior and investigate possible solutions. We link the performance of NN-Descent to the phenomenon of hubness, defined as the tendency of intrinsically high-dimensional data to contain hubs, that is, points with high in-degrees in the K-NNG. We propose approaches to alleviate the observed negative influence of hubs on NN-Descent performance.
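For readers unfamiliar with NN-Descent, its core local-join idea can be sketched as follows (a simplified illustration, not the presenter's code): start from random k-NN lists and repeatedly let each point test its neighbors' neighbors, keeping the k closest candidates found so far.

```python
import random

def nn_descent(data, k, dist, iters=10):
    """Simplified NN-Descent sketch: initialize each point's k-NN list
    at random, then repeatedly compare each point against its
    neighbors' neighbors, keeping the k closest candidates found."""
    n = len(data)
    # knn[i] is a sorted list of (distance, j) pairs of length k.
    knn = []
    for i in range(n):
        sample = random.sample([j for j in range(n) if j != i], k)
        knn.append(sorted((dist(data[i], data[j]), j) for j in sample))
    for _ in range(iters):
        updated = False
        for i in range(n):
            current = {j for _, j in knn[i]}
            # Candidates: neighbors of neighbors, excluding i itself
            # and points already in i's list.
            cand = {j2 for _, j in knn[i] for _, j2 in knn[j]
                    if j2 != i and j2 not in current}
            for j2 in cand:
                d = dist(data[i], data[j2])
                if d < knn[i][-1][0]:
                    knn[i][-1] = (d, j2)  # replace the worst neighbor
                    knn[i].sort()
                    updated = True
        if not updated:
            break
    return [[j for _, j in pairs] for pairs in knn]

# Toy usage: six points on a line, two clusters, Euclidean distance.
random.seed(0)
graph = nn_descent([0.0, 0.1, 0.2, 5.0, 5.1, 5.2], k=2,
                   dist=lambda a, b: abs(a - b))
print(graph)
```

The "neighbors' neighbors" candidate set is what breaks down in intrinsically high-dimensional data: hubs dominate the lists, so the candidates explored for most points stop improving.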

Presenter:

James Bailey
Professor
School of Computing and Information Systems,
The University of Melbourne, Australia

Title:

Adversarial Generation of Real-Time Feedback with Neural Networks for Simulation-Based Training

Abstract:

Simulation-based training (SBT) is gaining popularity as a low-cost and convenient training technique in a vast range of applications.

However, for an SBT platform to be fully utilized as an effective training tool, it is essential that feedback on performance be provided automatically in real time during training. The aim of this work is to develop an efficient and effective method for generating real-time feedback in SBT. Existing methods either have low effectiveness in improving novice skills or suffer from low efficiency, which prevents their use in real time. In this study, we propose a neural-network-based method that generates feedback using an adversarial technique. The proposed method uses a bounded adversarial update to minimize an L1-regularized loss via back-propagation. We show empirically that the proposed method can be used to generate simple yet effective feedback. It was also observed to have higher effectiveness and efficiency than existing methods, making it a promising option for real-time feedback generation in SBT.
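As a rough, hypothetical illustration of a bounded, L1-regularized update (the abstract's actual method back-propagates through a neural network; here gradients are approximated by finite differences on a toy loss, and all names and parameters are invented for the sketch):

```python
def bounded_l1_update(x, loss, lam=0.01, eps=0.5, lr=0.1, steps=100):
    """Find a small perturbation delta that reduces loss(x + delta)
    plus an L1 penalty lam * |delta|_1, with each component of delta
    clipped to [-eps, eps]. Gradients are estimated by forward
    finite differences; a real implementation would use backprop."""
    delta = [0.0] * len(x)
    h = 1e-5
    for _ in range(steps):
        for i in range(len(x)):
            xp = [xi + di for xi, di in zip(x, delta)]
            base = loss(xp)
            xp[i] += h
            grad = (loss(xp) - base) / h
            # Add the subgradient of the L1 penalty term.
            grad += lam * (1 if delta[i] > 0 else -1 if delta[i] < 0 else 0)
            delta[i] = max(-eps, min(eps, delta[i] - lr * grad))
    return delta

# Toy loss with minimum at (1, -0.2); the first component is
# clipped by the bound eps = 0.5.
loss = lambda v: (v[0] - 1.0) ** 2 + (v[1] + 0.2) ** 2
print(bounded_l1_update([0.0, 0.0], loss))
```

The bound keeps the suggested correction small enough to be actionable feedback, while the L1 penalty pushes most components of the perturbation to zero, yielding simple feedback that touches few features.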

Presenter:

Daniel Ma
PhD Candidate
School of Computing and Information Systems,
The University of Melbourne, Australia

Title:

Dimensionality and Adversarial Subspaces

Abstract:

Deep Neural Networks (DNNs) have recently been shown to be vulnerable against adversarial examples, which are carefully crafted instances that can mislead DNNs to make errors during prediction. To better understand such attacks, a characterization is needed of the properties of regions (the so-called 'adversarial subspaces') in which adversarial examples lie. In particular, effective measures are required to discriminate adversarial examples from normal examples in such regions. We tackle this challenge by characterizing the dimensional properties of adversarial regions, via the use of Local Intrinsic Dimensionality (LID). LID assesses the space-filling capability of the region surrounding a reference example, based on the distance distribution of the example to its neighbors. We provide explanations about how adversarial perturbation can affect the LID characteristic of adversarial regions, and then show empirically that LID characteristics can facilitate the detection of adversarial examples generated using the state-of-the-art attacks.

Contact:

Michael Houle <meh [at] nii.ac.jp>
