NII SEMINAR SERIES ON DIMENSIONALITY AND SCALABILITY VI
This series of seminars - launched at an NII Shonan Meeting held in May 2013 - explores the issues of dimensionality and scalability in the context of such application areas as databases, data mining, machine learning and multimedia. All are welcome to attend.
Thursday 25 February 2016
10:30-12:30 and 14:30-16:30
National Institute of Informatics, Room 1208
On the Evaluation of Unsupervised Outlier Detection: Measures, Datasets, and an Empirical Study - Arthur Zimek, LMU Munich, Germany
Clustering Evaluation in High-Dimensional Data - Milos Radovanovic, University of Novi Sad, Serbia
Why is my Entity Typical or Special? Approaches for Inlying and Outlying Aspects Mining - James Bailey, The University of Melbourne, Australia
Advances in Feature Selection and Training Robust Deep Neural Networks - Vinh Nguyen, The University of Melbourne, Australia
List of Abstracts
PD Dr. Arthur Zimek, Database and Information Systems, Institut für Informatik, Ludwig-Maximilians-Universität München (LMU Munich), Germany
On the Evaluation of Unsupervised Outlier Detection: Measures, Datasets, and an Empirical Study
The evaluation of unsupervised outlier detection algorithms is a constant challenge in data mining research. Little is known regarding the strengths and weaknesses of different standard outlier detection models, and the impact of parameter choices for these algorithms. The scarcity of appropriate benchmark datasets with ground truth annotation is a significant impediment to the evaluation of outlier methods. Even when labeled datasets are available, their suitability for the outlier detection task is typically unknown. Furthermore, the biases of commonly used evaluation measures are not fully understood. It is thus difficult to ascertain the extent to which newly proposed outlier detection methods improve over established methods.
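As background to the evaluation measures discussed in the abstract: when ground-truth labels are available, a standard measure for scoring outlier detectors is ROC AUC. The sketch below is illustrative only (not material from the talk), with invented scores and labels, using the rank (Mann-Whitney) formulation of AUC.

```python
# Illustrative sketch only: ROC AUC for outlier scores against
# ground-truth labels, via the rank (Mann-Whitney) formulation.
def roc_auc(scores, labels):
    """Probability that a randomly chosen labeled outlier (label 1)
    receives a higher score than a randomly chosen inlier (label 0);
    ties count half."""
    outlier_scores = [s for s, l in zip(scores, labels) if l == 1]
    inlier_scores = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum((o > i) + 0.5 * (o == i)
               for o in outlier_scores for i in inlier_scores)
    return wins / (len(outlier_scores) * len(inlier_scores))

# invented toy data: higher score means "more outlying"
scores = [0.9, 0.8, 0.3, 0.2, 0.1]
labels = [1, 0, 1, 0, 0]  # hypothetical ground truth
print(roc_auc(scores, labels))  # 5 of 6 outlier/inlier pairs ranked correctly
```

A perfect detector scores 1.0 and a random one about 0.5, which is one reason the measure is popular; its biases under varying outlier proportions are among the issues the talk examines.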
Milos Radovanovic, Assistant Professor, Department of Mathematics and Informatics, Faculty of Science, University of Novi Sad, Serbia
Clustering Evaluation in High-Dimensional Data
Clustering evaluation plays an important role in unsupervised learning systems, as it is often necessary to automatically quantify the quality of generated cluster configurations. This is especially useful for comparing the performance of different clustering algorithms, as well as for determining the optimal number of clusters in clustering algorithms that do not estimate it internally. Many clustering quality indexes have been proposed over the years, and different indexes are used in different contexts. There is no unifying protocol for clustering evaluation, so it is often unclear which quality index to use in which case. In this talk, we review existing clustering quality measures and evaluate them in the challenging context of high-dimensional data clustering. High-dimensional data is sparse and distances tend to concentrate, possibly affecting the applicability of various clustering quality indexes. We analyze the stability and discriminative power of a set of standard clustering quality measures with increasing data dimensionality. Our evaluation shows that the curse of dimensionality affects different clustering quality indexes in different ways, and that some are to be preferred when determining clustering quality in many dimensions.
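The distance-concentration effect mentioned in the abstract can be observed directly on random data. The following is a small illustrative sketch (invented for this announcement, not code from the talk), measuring the relative contrast between the farthest and nearest point from a query as dimensionality grows.

```python
# Illustrative sketch: distance concentration in high dimensions.
# For uniform random points, the relative contrast between the
# farthest and nearest neighbor of a query shrinks as dimension grows.
import math
import random

def relative_contrast(dim, n_points=200, seed=0):
    """(max distance - min distance) / min distance, for n_points
    uniform random points in [0,1]^dim against a random query."""
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    dists = [math.dist(query, [rng.random() for _ in range(dim)])
             for _ in range(n_points)]
    return (max(dists) - min(dists)) / min(dists)

for dim in (2, 20, 200, 2000):
    print(dim, round(relative_contrast(dim), 3))  # contrast shrinks with dim
```

When the contrast approaches zero, "nearest" and "farthest" become nearly indistinguishable, which is why distance-based quality indexes can lose discriminative power in many dimensions.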
James Bailey, Professor, School of Computing and Information Systems, The University of Melbourne, Australia
Why is my Entity Typical or Special? Approaches for Inlying and Outlying Aspects Mining
When investigating an individual entity, we may wish to identify aspects in which it is usual or unusual compared to other entities. We refer to this as the inlying/outlying aspects mining problem. It is important for comparative analysis and for answering questions such as "How is this entity special?" or "How does it coincide with or differ from other entities?" Such information could be useful in a disease diagnosis setting (where the individual is a patient) or in an educational setting (where the individual is a student). We discuss possible algorithmic approaches to this task - an approach based on feature selection, an approach based on density estimation, and a hybrid framework. We also investigate the scalability and effectiveness of these different approaches.
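In the spirit of the density-estimation approach mentioned in the abstract, one way to compare aspects is to score the entity's value in each attribute by a kernel density estimate over the other entities: a low density flags an outlying aspect. The sketch below is a toy illustration with invented data, not the speaker's algorithm.

```python
# Toy sketch of density-based outlying-aspect scoring (invented data).
import math

def kde_density(value, sample, bandwidth=0.5):
    """One-dimensional Gaussian kernel density estimate at `value`."""
    norm = len(sample) * bandwidth * math.sqrt(2 * math.pi)
    return sum(math.exp(-0.5 * ((value - s) / bandwidth) ** 2)
               for s in sample) / norm

# invented data: the entity is ordinary in attribute A
# but far from everyone else in attribute B
others = {"A": [1.1, 0.9, 1.2, 1.0], "B": [1.1, 0.9, 1.2, 1.0]}
entity = {"A": 1.0, "B": 5.0}

densities = {attr: kde_density(entity[attr], others[attr])
             for attr in entity}
outlying_aspect = min(densities, key=densities.get)
print(outlying_aspect)  # "B": the entity's value has lowest density there
```

Scaling this idea to subspaces of several attributes is where the scalability questions raised in the talk arise, since the number of candidate subspaces grows exponentially.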
Vinh Nguyen, Research Fellow, School of Computing and Information Systems, The University of Melbourne, Australia
Advances in Feature Selection and Training Robust Deep Neural Networks
This talk will present an overview of my recent research carried out at the Data Mining and Knowledge Discovery group at the University of Melbourne. It focuses on two main areas:
* Feature selection: Feature selection is a fundamental task in data mining. I will present our recent progress in globally-optimized mutual information based feature selection. I will also cover the applications of feature selection techniques to novel data mining problems, namely outlying aspects mining and contrast subspace mining.
* Training robust deep neural networks: Deep neural networks are currently the state of the art in many challenging pattern recognition tasks. They are, however, sensitive to adversarial noise and fooling samples. In this talk, I will present a novel technique for training robust neural networks, using the random projection regularizer.
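As background for the mutual-information-based feature selection mentioned above: for discrete data, the mutual information between a feature and the class label can be computed from empirical frequencies. The following is a minimal sketch with invented data, not the globally-optimized method from the talk.

```python
# Illustrative sketch: empirical mutual information between a discrete
# feature and a class label (invented toy data).
import math
from collections import Counter

def mutual_information(feature, labels):
    """Empirical mutual information (in nats) between two equal-length
    discrete sequences."""
    n = len(feature)
    p_x = Counter(feature)
    p_y = Counter(labels)
    p_xy = Counter(zip(feature, labels))
    mi = 0.0
    for (x, y), c in p_xy.items():
        # p(x,y) * log( p(x,y) / (p(x) * p(y)) ), with counts / n as probabilities
        mi += (c / n) * math.log(c * n / (p_x[x] * p_y[y]))
    return mi

# invented data: f1 determines the label, f2 is independent of it
labels = [0, 0, 1, 1]
f1 = [0, 0, 1, 1]
f2 = [0, 1, 0, 1]
print(mutual_information(f1, labels))  # log(2) ~ 0.693: fully informative
print(mutual_information(f2, labels))  # 0.0: carries no information
```

Greedy selection by such scores ignores feature interactions, which is one motivation for the globally-optimized formulation the talk presents.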
Michael Houle, Visiting Professor, meh[at]nii.ac.jp *Please replace [at] with @.