NII Seminar Series on Dimensionality and Scalability VI

This seminar series, launched at an NII Shonan Meeting held in May 2013, explores issues of dimensionality and scalability in application areas such as databases, data mining, machine learning, and multimedia. All are welcome to attend.

Outline

Date:: Thursday 25 February 2016
Time:: 10:30-12:30 and 14:30-16:30
Place:: National Institute of Informatics, Room 1208

Schedule

10:30-11:30: On the Evaluation of Unsupervised Outlier Detection: Measures, Datasets, and an Empirical Study
Arthur Zimek, LMU Munich, Germany
11:30-12:30: Clustering Evaluation in High-Dimensional Data
Milos Radovanovic, University of Novi Sad, Serbia
(Lunch Break)
14:30-15:30: Why is my Entity Typical or Special? Approaches for Inlying and Outlying Aspects Mining
James Bailey, The University of Melbourne, Australia
15:30-16:30: Advances in Feature Selection and Training Robust Deep Neural Networks
Vinh Nguyen, The University of Melbourne, Australia

List of Abstracts

-----------------

Presenter:: PD Dr. Arthur Zimek
Database and Information Systems, Institut für Informatik
Ludwig-Maximilians-Universität München (LMU Munich), Germany
Title:: On the Evaluation of Unsupervised Outlier Detection: Measures, Datasets, and an Empirical Study
Abstract:: The evaluation of unsupervised outlier detection algorithms is a constant challenge in data mining research. Little is known about the strengths and weaknesses of different standard outlier detection models, or about the impact of parameter choices for these algorithms. The scarcity of appropriate benchmark datasets with ground-truth annotation is a significant impediment to the evaluation of outlier methods. Even when labeled datasets are available, their suitability for the outlier detection task is typically unknown. Furthermore, the biases of commonly used evaluation measures are not fully understood. It is thus difficult to ascertain the extent to which newly proposed outlier detection methods improve over established methods.
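To make the idea of an "evaluation measure" for outlier detection concrete, the sketch below computes ROC AUC, one measure widely used for scoring an outlier ranking against ground-truth labels. The abstract does not say which measures the talk examines, so treat this as an illustration of the general setting only; the function name `roc_auc` and the example data are hypothetical. It uses the pairwise (Mann-Whitney) formulation for readability rather than speed.

```python
def roc_auc(scores, labels):
    """ROC AUC of outlier scores against binary ground truth (1 = outlier, 0 = inlier).

    Equals the probability that a randomly chosen outlier receives a higher
    score than a randomly chosen inlier; tied scores count as one half.
    This O(n_outliers * n_inliers) version favors clarity over speed.
    """
    outliers = [s for s, y in zip(scores, labels) if y == 1]
    inliers = [s for s, y in zip(scores, labels) if y == 0]
    if not outliers or not inliers:
        raise ValueError("need at least one outlier and one inlier")
    wins = 0.0
    for o in outliers:
        for i in inliers:
            if o > i:
                wins += 1.0   # outlier correctly ranked above this inlier
            elif o == i:
                wins += 0.5   # tie: count half
    return wins / (len(outliers) * len(inliers))


if __name__ == "__main__":
    # Hypothetical scores from some detector, with 1 marking true outliers.
    scores = [0.9, 0.8, 0.3, 0.2, 0.1]
    labels = [1, 0, 1, 0, 0]
    print(roc_auc(scores, labels))  # 5 of 6 outlier/inlier pairs ranked correctly
```

A score of 1.0 means every outlier is ranked above every inlier, and 0.5 corresponds to random ranking. Note that ROC AUC ignores how many outliers a dataset contains, which is one reason its behavior can differ from measures such as precision at n.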

-----------------

Presenter:: Milos Radovanovic
Assistant Professor
Department of Mathematics and Informatics, Faculty of Science, University of Novi Sad, Serbia
Title:: Clustering Evaluation in High-Dimensional Data
Abstract:: Clustering evaluation plays an important role in unsupervised learning systems, as it is often necessary to automatically quantify the quality of generated cluster configurations. This is especially useful for comparing the performance of different clustering algorithms, as well as for determining the optimal number of clusters in clustering algorithms that do not estimate it internally. Many clustering quality indexes have been proposed over the years, and different indexes are used in different contexts. There is no unifying protocol for clustering evaluation, so it is often unclear which quality index to use in which case.
In this talk, we review existing clustering quality measures and evaluate them in the challenging context of high-dimensional data clustering.
High-dimensional data is sparse, and distances tend to concentrate, possibly affecting the applicability of various clustering quality indexes.
We analyze the stability and discriminative power of a set of standard clustering quality measures with increasing data dimensionality.
Our evaluation shows that the curse of dimensionality affects different clustering quality indexes in different ways, and that some indexes are preferable to others for determining clustering quality in many dimensions.
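The "distance concentration" mentioned above can be observed with a few lines of code. The sketch below is an illustration under stated assumptions, not the method from the talk: it assumes points drawn uniformly from the unit hypercube and Euclidean distance, and the function name `relative_contrast` and its parameters are hypothetical. It measures the relative gap between the farthest and nearest neighbor of a random query point; as dimensionality grows, this gap shrinks, so "near" and "far" become harder to tell apart.

```python
import math
import random


def relative_contrast(dim, n_points=1000, seed=0):
    """Return (d_max - d_min) / d_min for Euclidean distances from one random
    query point to n_points points drawn uniformly from the unit hypercube.

    Assumption for illustration: uniform synthetic data, Euclidean distance.
    """
    rng = random.Random(seed)  # fixed seed so the demo is repeatable
    query = [rng.random() for _ in range(dim)]
    dists = []
    for _ in range(n_points):
        point = [rng.random() for _ in range(dim)]
        dists.append(math.dist(query, point))  # Euclidean distance (Python 3.8+)
    d_min, d_max = min(dists), max(dists)
    return (d_max - d_min) / d_min


if __name__ == "__main__":
    # The contrast shrinks sharply as the dimensionality increases.
    for dim in (2, 10, 100, 1000):
        print(f"dim={dim:5d}  relative contrast={relative_contrast(dim):.3f}")
```

Any clustering quality index that compares within-cluster and between-cluster distances relies on such contrasts, which gives some intuition for why index behavior may change with dimensionality.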

-----------------

Presenter:: James Bailey
Professor
School of Computing and Information Systems, The University of Melbourne, Australia
Title:: Why is my Entity Typical or Special? Approaches for Inlying and Outlying Aspects Mining
Abstract:: When investigating an individual entity, we may wish to identify aspects in which it is usual or unusual compared to other entities. We refer to this as the inlying/outlying aspects mining problem. It is important for comparative analysis and for answering questions such as "How is this entity special?" or "How does it coincide with or differ from other entities?"
Such information could be useful in a disease diagnosis setting (where the individual is a patient) or in an educational setting (where the individual is a student). We discuss possible algorithmic approaches to this task: an approach based on feature selection, an approach based on density estimation, and a hybrid framework. We also investigate the scalability and effectiveness of these different approaches.

-----------------

Presenter:: Vinh Nguyen
Research Fellow
School of Computing and Information Systems, The University of Melbourne, Australia
Title:: Advances in Feature Selection and Training Robust Deep Neural Networks
Abstract:: This talk will present an overview of my recent research carried out at the Data Mining and Knowledge Discovery group at the University of Melbourne. It focuses on two main areas:
* Feature selection: Feature selection is a fundamental task in data mining. I will present our recent progress in globally optimized, mutual-information-based feature selection. I will also cover applications of feature selection techniques to novel data mining problems, namely outlying aspects mining and contrast subspace mining.
* Training robust deep neural networks: Deep neural networks are currently the state of the art in many challenging pattern recognition tasks. They are, however, sensitive to adversarial noise and fooling samples. In this talk, I will present a novel technique for training robust neural networks using the random projection regularizer.