Multimodal Cognitive Load Measurement and its Applications
Fang Chen (Research Group Manager at NICTA, Australia)
Dr. Fang Chen holds a PhD in Communications and Electronic Systems and an MBA. She was appointed Dean of the Faculty of Electronic and Information Engineering at Beijing Jiaotong University in 1997. She then became a Team Leader at the Intel Research Centre and subsequently joined the Motorola Research Centre as the founding manager of the Speech and Language Generation Research Laboratory of MCRC. In 2004 she joined NICTA, the largest Australian government-funded research organisation dedicated to information and communication technologies, where she is now the Research Group Manager for the Making Sense of Data research theme at the ATP Laboratory. Dr. Chen is a conjoint professor at The University of New South Wales, an Honorary Associate at The University of Sydney and an Adjunct Professor at Beijing Jiaotong University. She has supervised 8 PhD students to completion and is currently supervising 7. Her main research interests are multimodal human-machine interaction, cognitive load modelling, and digital signal processing. She has researched and developed many aspects of multimodal human-machine interaction, including speech and natural language dialogue, manual and pen gesture recognition, emotion, usability and human factors. Over the past 6 years, she has led a team researching objective, non-intrusive cognitive load measurement and created the world's first real-time cognitive load modelling based on users' multimodal behaviour. She has more than 100 refereed publications and 26 patent filings in Australia, the US, Europe, Canada, China, Japan, Korea and Mexico.
Cognitive load refers to the capacity and resources used in working memory while learning, and more broadly, while completing tasks where novel information and novel processing are required. In complex and time-critical situations, users can experience extremely high cognitive load, which can interfere with successful task completion. An understanding of each user's cognitive load will enable us to alleviate these problems by implementing strategies to adjust the system's behaviour, provide resources according to each individual's cognitive burden, and help users complete the task with maximum efficiency. Cognitive load is difficult to measure, particularly in individual users. Our research focuses on using the multimodal interaction paradigm to detect fluctuations in cognitive load. The primary advantage of this approach is that cognitive load can be determined implicitly by monitoring pattern variations in specific multimodal communication features executed in day-to-day tasks. Such unobtrusive measures may help determine users' cognitive load in real time and achieve the ultimate goal of adapting information content selection and presentation (multimodal output generation) accordingly, in order to ensure optimal user performance. However, assessing a user's cognitive load through nuances in their multimodal interactive behaviour requires identifying a number of indices that can reliably reflect fluctuations. In this talk, following a short introduction to cognitive load theory, experiments designed to identify the relationships between combined speech, gesture and other inputs and users' cognitive load are described. The feasibility of using multimodal behaviour as an index of cognitive load is supported by the results of our studies. The data suggest that semantic multimodal behavioural features are sensitive to cognitive load variations, with the structure of multimodal production changing to reflect increases in load.
Similarly, data collected from physiological sensors (e.g. skin conductance) has been shown to correlate highly with reported levels of cognitive load.
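As an illustration of this kind of analysis, the sketch below computes a Pearson correlation between per-task mean skin-conductance values and self-reported load ratings. All numbers are invented for illustration; they are not data from the studies described here.

```python
import math
import statistics

def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical per-task mean skin conductance (microsiemens) and
# self-reported load ratings (e.g. on a 1-9 scale).
gsr = [2.1, 2.4, 3.0, 3.8, 4.5]
reported = [2, 3, 5, 7, 8]
r = pearson(gsr, reported)
```

A correlation close to 1 on such data would be consistent with the reported finding that the sensor signal tracks subjective load.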
It is about a century since the proposal of the Yerkes-Dodson law, which states that there is an optimum mental arousal for performing a task, below and above which performance will deteriorate. Despite this, few methods have been demonstrated to measure cognitive load in practice, and fewer still in real time. Speech-based methods are attractive because they are non-intrusive, inexpensive and can operate in real time. Variations in the speech signal as cognitive load changes are directly due to changes in the manner in which the voice articulation structures (muscles and joints) are employed. Signal-based features have the advantage that they can be extracted from collected speech data in a completely automated way, requiring very little manual intervention. Firstly, techniques from the literature for speech-based measurement of cognitive load will be discussed. Physical characteristics of the speech signal, such as energy, changes in pitch and fundamental frequency, have been explored as possible load indicators. Prosodic features such as pitch and intensity also provide extra information related to the emotion or intention of the user, have shown a potential relationship to cognitive load levels, and can be extracted and modelled automatically with statistical modelling approaches. However, in examining the speech signal, the difficulty lies in finding features related to cognitive load and extracting them from the raw speech data with a "signal-based approach". The system would need to solicit a sufficient amount of speech data from the user for training and evaluation purposes to make a valid assessment. Like other paralinguistic classification tasks, cognitive load measurement is a challenging problem, and one that must account for variability posed by linguistic, contextual and speaker-specific characteristics.
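To make the signal-based approach concrete, the minimal sketch below (not the system described in this talk) computes two classic prosodic candidates for a single synthetic voiced frame: short-time energy and an autocorrelation-based fundamental frequency estimate.

```python
import math

def frame_features(frame, fs, f0_min=80.0, f0_max=400.0):
    """Simple prosodic features for one speech frame: short-time
    energy and an autocorrelation-based pitch (F0) estimate."""
    mean = sum(frame) / len(frame)
    frame = [s - mean for s in frame]                 # remove DC offset
    energy = sum(s * s for s in frame) / len(frame)   # short-time energy

    # Find the autocorrelation peak within the lag range that
    # corresponds to plausible fundamental frequencies.
    def autocorr(lag):
        return sum(frame[i] * frame[i + lag] for i in range(len(frame) - lag))

    lags = range(int(fs / f0_max), int(fs / f0_min))
    best_lag = max(lags, key=autocorr)
    return energy, fs / best_lag

# Synthetic voiced frame: a 220 Hz tone sampled at 16 kHz.
fs = 16000
frame = [0.5 * math.sin(2 * math.pi * 220 * n / fs) for n in range(1024)]
energy, f0 = frame_features(frame, fs)
```

Real systems extract such features frame by frame over whole utterances and feed them to statistical models, but the fully automated character of the extraction is already visible here.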
Unlike some other paralinguistic classification tasks, cognitive load measurement requires classification along an ordinal scale, motivating the use of very specific machine learning techniques. Most notably, we have found that speech signal features undergo significant changes under high levels of load and are the best candidate indicators of load; full details will be given in the talk.
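One standard way to respect ordinal structure, not necessarily the technique used in this work, is to decompose a K-level problem into K-1 cumulative binary decisions ("is the load greater than level k?"), as in Frank and Hall's decomposition. The toy sketch below uses simple threshold "classifiers" on a single hypothetical feature purely for illustration.

```python
def fit_thresholds(xs, ys, n_levels):
    """Learn one threshold per cumulative binary split from 1-D data:
    the midpoint between the means of the 'at most k' and 'above k' groups."""
    thresholds = []
    for k in range(n_levels - 1):
        low = [x for x, y in zip(xs, ys) if y <= k]
        high = [x for x, y in zip(xs, ys) if y > k]
        thresholds.append((sum(low) / len(low) + sum(high) / len(high)) / 2)
    return thresholds

def predict(x, thresholds):
    """Predicted level = number of 'greater than level k' tests passed."""
    return sum(x > t for t in thresholds)

# Toy data: a feature that grows with load level (0 = low, 2 = high).
xs = [1.0, 1.2, 0.9, 3.0, 3.1, 2.9, 5.0, 5.2, 4.8]
ys = [0,   0,   0,   1,   1,   1,   2,   2,   2]
thresholds = fit_thresholds(xs, ys, n_levels=3)
preds = [predict(x, thresholds) for x in xs]
```

The point of the decomposition is that mistakes between adjacent levels are treated differently from mistakes between distant levels, which a flat multi-class classifier ignores.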
A critical part of any speech classification system is the choice of features. This part of the talk investigates that choice from several perspectives: (i) the psychophysiological effects of cognitive load on the speech production system, (ii) features known to correspond with emotional variation or expressive speech, (iii) pragmatic considerations for real-time systems developed from limited data, and (iv) feature diversity, in the context of systems that fuse information from multiple subsystems to improve accuracy. We review a range of predominantly acoustic feature extraction techniques and compare their efficacy in evaluations, devoting effort to identifying similarities or correlations between features, and differences or uniqueness among them. Recently proposed features that have shown promise will also form part of the presentation. Because it is virtually impossible to extract features from speech that are due entirely to emotion or cognitive load, without any phonetic or speaker identity information, feature normalization is an important topic. Two key approaches for dealing with feature variability due to phonetic content will be contrasted. Techniques for modelling emotion and cognitive load will be discussed, and comparisons will be made between human and machine classification of emotion and cognitive load from speech. Finally, some comments will be made concerning the ordinal nature of the cognitive load classification problem. The study of cognitive load also requires some specific approaches to database construction: an important consideration for any emotion or cognitive load recognition system is the design, availability and use of suitable databases. These issues are discussed, with particular reference to experimental design and procedures, the importance of natural speech data, and the optimal use of data during training, development and online operation.
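As one concrete normalization example (illustrative data and function names, not the specific approaches contrasted in the talk), per-speaker z-score normalization removes speaker-specific offsets, such as a naturally high baseline pitch, so that residual variation is more likely to reflect load rather than identity:

```python
import statistics

def speaker_zscore(features_by_speaker):
    """Per-speaker z-score normalization of a 1-D feature stream."""
    normalized = {}
    for speaker, values in features_by_speaker.items():
        mu = statistics.mean(values)
        sigma = statistics.pstdev(values) or 1.0  # guard against constant features
        normalized[speaker] = [(v - mu) / sigma for v in values]
    return normalized

# Two hypothetical speakers with different baseline pitch (Hz):
# after normalization, within-speaker variation remains comparable.
raw = {"spk_a": [180.0, 190.0, 230.0], "spk_b": [100.0, 105.0, 125.0]}
norm = speaker_zscore(raw)
```

Real systems typically normalize richer feature vectors (e.g. cepstral coefficients) per utterance, but the principle of removing nuisance variability before classification is the same.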
Furthermore, I will present the results of our speech content analysis, conducted on users' choice of linguistic and grammatical features, which suggest that these are also reliable indicators of cognitive load. Several novel linguistic features are proposed as potential indices of a user's experienced cognitive load. Linguistic and dialogue-related patterns (word- and phrase-level features) vary with cognitive load, presumably because of the strategies used under high load: responses may be structured or instantiated differently depending on the available resources. Specifically, we were interested in whether semantic word categories, such as cognition and perception words or positive and negative words, would change from low-load to high-load tasks. Similarly, we hypothesised that some grammatical features, such as the use of singular or plural pronouns, would also change with task complexity and hence cognitive load. Analyses show that users demonstrated different linguistic patterns under different cognitive load levels. This is illustrated with a real-life example, using several linguistic features extracted from a speech data set collected during the completion of highly time-critical and data-intense bushfire management tasks in regional locations around Australia. Detailed results will be given in this talk.
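A minimal sketch of the kind of word-category counting involved is shown below. The lexicons here are tiny, hypothetical stand-ins for the validated category word lists used in such analyses.

```python
# Hypothetical category lexicons; real studies use much larger,
# validated word lists. These tiny sets are for illustration only.
CATEGORIES = {
    "cognition": {"think", "know", "believe", "remember"},
    "perception": {"see", "hear", "look", "watch"},
    "plural_pronoun": {"we", "us", "our"},
    "singular_pronoun": {"i", "me", "my"},
}

def category_rates(utterance):
    """Relative frequency of each word category in one utterance."""
    words = utterance.lower().split()
    n = len(words)
    return {cat: sum(w in lex for w in words) / n
            for cat, lex in CATEGORIES.items()}

rates = category_rates("I think we see the fire front moving north")
```

Comparing such per-category rates between low-load and high-load task transcripts is what reveals the pattern shifts described above.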
In much the same way as some linguistic and prosodic features of speech can be indicative of high cognitive load, it seems plausible that the surface features of other modal inputs, such as pen gesture, may also be affected under high load. Pen-gesture trajectories are used in a variety of applications and devices, e.g. to select entities or to draw predefined symbolic shapes. These gesture trajectories, whether produced with the participant's hands, fingers, a stylus or another ink-style implement, can also betray symptoms of cognitive load as task complexity and mental demand increase. For example, geometric and temporal patterns in trajectories may differ from low-load to high-load tasks, as may the function and usage of the modality itself, i.e. the situations in which the pen is and isn't used. Finally, an analysis of what is being drawn or gestured can give us some insight into the cognitive load being experienced by the user.

In this lecture, we describe longitudinal user studies of one such cognitive training tool, equipped with an interactive pen interface and think-aloud protocols. The aim is to verify whether cognitive load can be inferred directly from changes in geometric and temporal features of the collected pen signal. We compare pen input trajectories between cognitive load levels and across overall Pre- and Post-training tests. The results show that trajectory durations decrease, lengths decrease and speeds increase, all changing significantly as cognitive load increases. These feature changes are attributed to mechanisms for dealing with high cognitive load in working memory, with minimal rehearsal. With growing expertise, trajectory duration decreases further and speed increases further.

These changes are attributed in part to cognitive skill acquisition and the development of schemas, in both extraneous (interface- and interaction-related) and intrinsic (domain-related) networks, in the time between Pre-test and Post-test.
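The geometric and temporal trajectory features discussed above can be computed straightforwardly from timestamped pen samples. This sketch (synthetic stroke, with hypothetical units of millimetres and seconds) derives duration, path length and mean speed:

```python
import math

def trajectory_features(points):
    """Geometric/temporal features of a pen trajectory given as
    (x, y, t) tuples: duration, path length and mean speed."""
    duration = points[-1][2] - points[0][2]
    length = sum(math.dist(p[:2], q[:2])
                 for p, q in zip(points, points[1:]))
    return {"duration": duration,
            "length": length,
            "speed": length / duration}

# Synthetic stroke: a straight 30 mm line drawn in 0.5 s.
stroke = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.2),
          (20.0, 0.0, 0.35), (30.0, 0.0, 0.5)]
feats = trajectory_features(stroke)
```

Comparing these values per stroke between load conditions, or between Pre- and Post-training sessions, is what exposes the duration, length and speed shifts reported above.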
As such, these pen trajectory features offer further insight into implicit communicative changes related to load fluctuations.
Physiological measurement of cognitive load has only relatively recently been proposed, although physiologists have long used these kinds of measures to evaluate stress, affective and arousal states. In contrast with subjective reports of the overall impression of a task, physiological sensors can uncover fluctuations that occur as the subject completes the task, indicating fluctuating levels of experienced cognitive load. The talk will also briefly discuss the use of other physiological signals for cognitive load measurement. Physiological sensors, on the other hand, seem ideal for providing real-time indications of experienced load, despite their intrusiveness. The choice of sensors, and their reliability and accuracy in reflecting cognitive load, are still open issues in the literature. High-complexity, safety-critical tasks could benefit from cognitive load assessment technology as part of routine performance monitoring in the first instance. These kinds of environments lend themselves to implicit (multi)modal cognitive load assessment because a variety of communicative behavioural resources is already available, from speech to manual input (mice, keyboards, digital ink and touch-screens), all of which can be used to form an assessment of the subject's load experience. Following on from performance monitoring applications, targeted training can also benefit from cognitive load assessment. The use of cognitive load indices in multimodal interface environments, possibly in conjunction with performance and other measures, could provide an individually targeted learning experience. Recent advances in the design of applications and user interfaces have promoted awareness of the user's context as well as user preferences.
The cognitive load of a user is an important factor to consider in adaptive human-computer interfaces, especially in scenarios with high-intensity work conditions and complex tasks. Multimodal interfaces are, in themselves, known to reduce the level of experienced cognitive load relative to tasks completed using unimodal interfaces. Systems equipped with methods for unobtrusive, real-time detection of cognitive load, and general cognitive load awareness, will be able to adapt content delivery more intelligently by sensing what the user can cope with at any given moment. Key insights, results and conclusions will be recapitulated, and discussion will be directed towards research problems that are either currently unresolved or still on the horizon of this research field. Participants will be exposed to likely future challenges, both during the presentation and during the ensuing discussion.