
Osaka University Multimodal Dialogue Corpus (Hazumi)

Dataset that NII received from Osaka University and provides to researchers.

update: 2021-09-22

Outline of the Data

This is a multimodal human-agent dialogue corpus collected at the Institute of Scientific and Industrial Research (SANKEN), Osaka University. The virtual agent was operated by a human operator located in a different room, using the Wizard-of-Oz method. The dialogues were chit-chat about several topics. Each participant's dialogue lasted about 15 to 20 minutes and was recorded on video and with Microsoft Kinect.

  • Hazumi1712 (data started being collected in December 2017)

    The participants were asked in advance whether or not they were interested in each of several topics (e.g., sports, dramas, celebrities, games, and trains), and then talked about three topics in which they were interested and three in which they were not. There were 29 participants (14 male and 15 female) between the ages of 20 and 50.

  • Hazumi1902 (data started being collected in February 2019)

    The Wizard adjusted topics so that the participants could enjoy the dialogue for a longer period of time. There were 30 participants (10 male and 20 female) between the ages of 20 and 70.

  • Hazumi1911 (data started being collected in November 2019)

    A wristband-type sensor was used to record the participants' physiological data (skin conductance, heart rate, etc.), while the dialogues were conducted in the same way as in Hazumi1902. There were 30 participants (15 male and 15 female), ranging in age from their 20s to their 70s.

IDR provides the video data and the data recorded by Microsoft Kinect (audio, depth images, and posture information as joint angle positions). The total data size is about 180 GB. (More data will be released when ready.)

In addition, ELAN-format files containing manual transcriptions of participants' utterances, various annotations, and system utterances with their dialogue acts are available, along with bio-signal data, dump files for experiments, and questionnaire results, from the following GitHub sites.
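As a rough illustration of how the ELAN-format annotation files might be read, the following Python sketch extracts time-aligned annotations from an ELAN `.eaf` file, which is XML. The element and attribute names (`TIME_SLOT`, `TIER`, `ALIGNABLE_ANNOTATION`, `ANNOTATION_VALUE`) follow the general ELAN file format; the function name `read_eaf`, the tier name `user_utterance`, and the inline sample are hypothetical and are not taken from the Hazumi files, whose actual tier names should be checked in the corpus documentation.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical miniature EAF document, for illustration only.
# The tier name and contents are NOT taken from the Hazumi corpus.
SAMPLE_EAF = """<ANNOTATION_DOCUMENT>
  <TIME_ORDER>
    <TIME_SLOT TIME_SLOT_ID="ts1" TIME_VALUE="1200"/>
    <TIME_SLOT TIME_SLOT_ID="ts2" TIME_VALUE="3450"/>
  </TIME_ORDER>
  <TIER TIER_ID="user_utterance" LINGUISTIC_TYPE_REF="default">
    <ANNOTATION>
      <ALIGNABLE_ANNOTATION ANNOTATION_ID="a1"
                            TIME_SLOT_REF1="ts1" TIME_SLOT_REF2="ts2">
        <ANNOTATION_VALUE>hello</ANNOTATION_VALUE>
      </ALIGNABLE_ANNOTATION>
    </ANNOTATION>
  </TIER>
</ANNOTATION_DOCUMENT>"""


def read_eaf(source):
    """Return {tier_id: [(start_ms, end_ms, text), ...]}.

    `source` is a file path or a file-like object. Only time-aligned
    annotations are read; reference annotations are ignored.
    """
    root = ET.parse(source).getroot()

    # Map each time-slot ID to its value in milliseconds.
    # Slots without a TIME_VALUE (unaligned) are skipped.
    slots = {
        ts.get("TIME_SLOT_ID"): int(ts.get("TIME_VALUE"))
        for ts in root.iter("TIME_SLOT")
        if ts.get("TIME_VALUE") is not None
    }

    tiers = {}
    for tier in root.iter("TIER"):
        rows = []
        for ann in tier.iter("ALIGNABLE_ANNOTATION"):
            start = slots.get(ann.get("TIME_SLOT_REF1"))
            end = slots.get(ann.get("TIME_SLOT_REF2"))
            text = ann.findtext("ANNOTATION_VALUE", default="")
            rows.append((start, end, text))
        tiers[tier.get("TIER_ID")] = rows
    return tiers


# Usage on the inline sample; pass a real .eaf path instead of StringIO.
for tier_id, rows in read_eaf(io.StringIO(SAMPLE_EAF)).items():
    for start, end, text in rows:
        print(tier_id, start, end, text)
```

Only the standard library is used here, so the sketch makes no assumption about third-party ELAN tooling. Times are returned in milliseconds, as stored in the file.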

A paper on the technical content will be published.

Kazunori Komatani, Shogo Okada:
Multimodal Human-Agent Dialogue Corpus with Annotations at Utterance and Dialogue Levels
International Conference on Affective Computing & Intelligent Interaction (ACII), (accepted), 2021.

Please also refer to the release from Osaka University.
https://resou.osaka-u.ac.jp/en/research/2020/20201020_3

For more details about the data, please refer to the following document (currently only in Japanese).
https://www.nii.ac.jp/dsc/idr/rdata/Hazumi/documents/HazumiOverview.pdf

Update Information

  • Overseas distribution began. (2021/09/22)

User Qualification

  • The purpose of using the corpus is limited to research (including fundamental technology development).
  • Researchers, mainly affiliated with universities and public research institutes, can apply for the use of the data. Depending on the research purpose, we may also accept applications from private companies, etc. For more information, please email the IDR office in the Contact section below.
  • As a general rule, the unit of provision is a laboratory (a group equivalent to a laboratory if it is not a university). The person representing the laboratory (or the group) should apply for the use.

Application

Please apply following the procedure shown below. The data is available free of charge. The required documents can be downloaded from the links in the Documents section below.

  1. Please read the Osaka University Multimodal Dialogue Corpus Hazumi Terms of Use and the Terms of IDR Dataset Service carefully and, confirming that they are acceptable to you (and your organization), fill out the Application Form following the items below:

    1. An application should be made for each user group such as a laboratory in a university, and the applicant should be a principal investigator in the group, e.g., a professor at a university or a head researcher at a research institution.

    2. The signer of the Consent Form should be a person authorized to sign the contract on behalf of your organization (typically, a dean of a school or higher for universities). Please consult your administrative section beforehand about who is qualified to sign, and enter the Signer's formal information in full, as it is to be printed in the Consent Form.

    3. "Research group members" are restricted to the researchers and students who belong to the above-mentioned user group and do research under the supervision of the applicant. If someone belonging to a different organization or a separate laboratory will use the data, even in joint research, a separate application should be made.

  2. Please send the application form as an email attachment to the IDR office shown in the Contact section below.

    1. The subject of the email should be "Application for the Hazumi Corpus (Xxxx University)." If the subject is not appropriate, the email may be discarded without its content being reviewed.

    2. When applying for other datasets at the same time, please send each application as a separate email.

    3. Please note that your application will be forwarded to Osaka University and will be used to judge qualification, prepare the Consent Form, and manage users.

  3. Your application will be reviewed at the IDR office and the availability of the data will be emailed to you. If you do not receive a reply email within a week, please contact the IDR office.

  4. Please submit a Consent Form to NII.

    1. The IDR office will send you a Consent Form by email.

    2. The Consent Form should be signed by the signer and the PI, and sent by post to the Contact (IDR Office) shown below.

  5. The IDR office will provide the data when your signed Consent Form has been received.

Data provision

The data will be provided as a download from the IDR web server.

Documents

Usage report, etc.

  • You are required to give Osaka University notice in advance of any press releases or media interviews regarding research results.
  • Please submit a report on publications at conferences and in journals every year in response to a request from the IDR office.
  • For information on how to cite this corpus in publications, please refer to the DSC Reference Portal.

Contact (IDR Office)

IDR Office, National Institute of Informatics

Email:
idr [at] nii.ac.jp
Address:
2-1-2 Hitotsubashi, Chiyoda-ku, Tokyo 101-8430, JAPAN

(Please use email for communicating with us if not otherwise specified.)