Osaka University Multimodal Dialogue Corpus (Hazumi)

A dataset that NII received from Osaka University and provides to researchers.

update: 2022-07-07

Outline of the Data

This is a multimodal human-agent dialogue corpus collected at the Institute of Scientific and Industrial Research (SANKEN), Osaka University. The virtual agent was operated by a human operator in a different room using the Wizard-of-Oz method. The dialogues were chit-chat on several topics. Each participant's dialogue lasted about 15 to 20 minutes.

  1. In-person recording version

    Dialogues recorded in the lab using video and Microsoft Kinect between 2017 and 2019.

    • Hazumi1712 (data collection started in December 2017)

      Participants were asked in advance whether they were interested in each of several topics (e.g., sports, dramas, celebrities, games, and trains) and then talked about three topics that interested them and three that did not. There were 29 participants (14 male and 15 female), ranging in age from their 20s to their 50s.

    • Hazumi1902 (data collection started in February 2019)

      The Wizard adjusted the topics so that participants could enjoy the dialogue for a longer period of time. There were 30 participants (10 male and 20 female), ranging in age from their 20s to their 70s.

    • Hazumi1911 (data collection started in November 2019)

      A wristband-type sensor was used to record each participant's physiological data (skin conductance, heart rate, etc.), while the dialogues were conducted in the same way as in Hazumi1902. There were 30 participants (15 male and 15 female), ranging in age from their 20s to their 70s.

    IDR provides the video data and the data recorded by Microsoft Kinect (audio, depth images, and posture information as joint angle positions). The total data size is about 180 GB.

    In addition, ELAN-format files containing manual transcriptions of participants' utterances, various annotations, and system utterances with their dialogue acts are available, along with bio-signal data, dump files for experiments, and questionnaire results, from the following GitHub sites.

    A paper on the technical content has been published.

    Kazunori Komatani, Shogo Okada:
    Multimodal Human-Agent Dialogue Corpus with Annotations at Utterance and Dialogue Levels,
    International Conference on Affective Computing & Intelligent Interaction (ACII), 2021.
    https://doi.org/10.1109/ACII52823.2021.9597447

    Please also refer to the release from Osaka University.
    https://resou.osaka-u.ac.jp/en/research/2020/20201020_3

    For more details about the data, please refer to the following document (currently only in Japanese).
    https://www.nii.ac.jp/dsc/idr/rdata/Hazumi/documents/HazumiOverviewInPerson.pdf

  2. Online recording version

    Dialogues recorded online using a web meeting system between 2020 and 2021.

    • Hazumi2010 (data collection started in October 2020)

      The Wizard adjusted the topics so that participants could enjoy the dialogue for a longer period of time. There were 33 participants (17 male and 16 female), ranging in age from their 20s to their 60s.

    • Hazumi2012 (data collection started in December 2020)

      The dialogues were conducted in the same way as in Hazumi2010. There were 63 participants (29 male and 34 female), ranging in age from their 20s to their 60s.

    • Hazumi2105 (data collection started in May 2021)

      Additional dialogues were recorded with participants from Hazumi2010 or Hazumi2012 who had been unaware that the system was being operated by a human operator. There were 29 participants (14 male and 15 female), ranging in age from their 20s to their 60s.

    IDR provides the video data recorded by the web meeting system. The total data size is about 9 GB.

    In addition, ELAN-format files containing manual transcriptions of participants' utterances, various annotations, and system utterances with their dialogue acts are available, along with dump files for experiments and questionnaire results, from the following GitHub sites.

    For more details about the data, please refer to the following document (currently only in Japanese).
    https://www.nii.ac.jp/dsc/idr/rdata/Hazumi/documents/HazumiOverviewOnline.pdf
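
ELAN-format annotation files (.eaf) are XML documents in which time-aligned annotations are grouped into tiers, so they can be inspected with a standard XML parser. The following is a minimal sketch in Python, not the corpus's official tooling. The tier names (`sys_utterance`, `user_utterance`) and the sample text are hypothetical and chosen only for illustration; the actual tier names used in Hazumi should be checked against the distributed files. The sketch reads only time-aligned annotations and ignores reference annotations that depend on other tiers.

```python
import xml.etree.ElementTree as ET

# Hypothetical miniature EAF document. The tier names and utterances are
# invented for illustration and are NOT taken from the Hazumi corpus.
SAMPLE_EAF = """<ANNOTATION_DOCUMENT>
  <TIME_ORDER>
    <TIME_SLOT TIME_SLOT_ID="ts1" TIME_VALUE="0"/>
    <TIME_SLOT TIME_SLOT_ID="ts2" TIME_VALUE="1500"/>
    <TIME_SLOT TIME_SLOT_ID="ts3" TIME_VALUE="2000"/>
    <TIME_SLOT TIME_SLOT_ID="ts4" TIME_VALUE="4200"/>
  </TIME_ORDER>
  <TIER TIER_ID="sys_utterance" LINGUISTIC_TYPE_REF="default">
    <ANNOTATION>
      <ALIGNABLE_ANNOTATION ANNOTATION_ID="a1"
                            TIME_SLOT_REF1="ts1" TIME_SLOT_REF2="ts2">
        <ANNOTATION_VALUE>Do you like sports?</ANNOTATION_VALUE>
      </ALIGNABLE_ANNOTATION>
    </ANNOTATION>
  </TIER>
  <TIER TIER_ID="user_utterance" LINGUISTIC_TYPE_REF="default">
    <ANNOTATION>
      <ALIGNABLE_ANNOTATION ANNOTATION_ID="a2"
                            TIME_SLOT_REF1="ts3" TIME_SLOT_REF2="ts4">
        <ANNOTATION_VALUE>Yes, I watch baseball.</ANNOTATION_VALUE>
      </ALIGNABLE_ANNOTATION>
    </ANNOTATION>
  </TIER>
</ANNOTATION_DOCUMENT>"""


def read_eaf_tiers(xml_text):
    """Return {tier_id: [(start_ms, end_ms, text), ...]} for aligned annotations."""
    root = ET.fromstring(xml_text)

    # Map each time slot ID to its time in milliseconds.
    times = {
        slot.get("TIME_SLOT_ID"): int(slot.get("TIME_VALUE"))
        for slot in root.iter("TIME_SLOT")
        if slot.get("TIME_VALUE") is not None
    }

    tiers = {}
    for tier in root.iter("TIER"):
        rows = []
        for ann in tier.iter("ALIGNABLE_ANNOTATION"):
            start = times.get(ann.get("TIME_SLOT_REF1"))
            end = times.get(ann.get("TIME_SLOT_REF2"))
            text = ann.findtext("ANNOTATION_VALUE", default="")
            rows.append((start, end, text))
        tiers[tier.get("TIER_ID")] = rows
    return tiers


if __name__ == "__main__":
    for tier_id, rows in read_eaf_tiers(SAMPLE_EAF).items():
        for start, end, text in rows:
            print(tier_id, start, end, text)
```

To read a downloaded file instead of the in-memory sample, pass the file's contents to `read_eaf_tiers`, for example by opening the `.eaf` file with UTF-8 encoding and calling `read()`.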

Update Information

  • The online recording version (Hazumi2010, Hazumi2012, Hazumi2105) was newly released. Users can download the data from the data distribution site. (2022/07/07)
  • Overseas distribution began. (2021/09/22)

User Qualification

  • The purpose of using the corpus is limited to research (including fundamental technology development).
  • Researchers, mainly those affiliated with universities and public research institutes, can apply to use the data. Depending on the research purpose, we may also accept applications from private companies, etc. For more information, please email the IDR office listed in the Contact section below.
  • As a general rule, the unit of provision is a laboratory (or an equivalent group, for organizations other than universities). The person representing the laboratory (or the group) should submit the application.

Application

Please apply by following the procedure shown below. The data are available free of charge. The required documents can be downloaded from the links in the Documents section below.

  1. Please read the Osaka University Multimodal Dialogue Corpus Hazumi Terms of Use and the Terms of IDR Dataset Service carefully and, confirming that they are acceptable to you (and your organization), fill out the Application Form following the items below:

    1. An application should be made for each user group such as a laboratory in a university, and the applicant should be a principal investigator in the group, e.g., a professor at a university or a head researcher at a research institution.

    2. The signer of the Consent Form should be a person authorized to sign the contract on behalf of your organization (typically, a dean of a school or higher for universities). Please consult your administrative section beforehand about who is qualified to sign, and enter the Signer's formal information in full, as it is to be printed on the Consent Form.

    3. "Research group members" are restricted to the researchers and students who belong to the abovementioned user group and do research under the supervision of the applicant. If someone belonging to a different organization or a separate laboratory will use the data, even in joint research, a separate application should be made.

  2. Please send the application form as an email attachment to the IDR office shown in the Contact section below.

    1. The subject of the email should be "Application for the Hazumi Corpus (Xxxx University)." If the subject is not appropriate, the email may be discarded without its content being reviewed.

    2. When applying for other datasets at the same time, please send each application as a separate email.

    3. Please note that your application will be forwarded to Osaka University and will be used to judge qualification, prepare the Consent Form, and manage users.

  3. Your application will be reviewed at the IDR office, and you will be notified by email whether the data can be provided. If you do not receive a reply within a week, please contact the IDR office.

  4. Please submit a Consent Form to NII.

    1. The IDR office will send you a Consent Form by email.

    2. The Consent Form should be signed by both the signer and the PI, and sent by post to the Contact (IDR Office) shown below.

  5. The IDR office will provide the data when your signed Consent Form has been received.

Data provision

The data will be provided for download from the IDR web server.

Documents

Usage report, etc.

  • You are required to give Osaka University notice in advance of any press releases or media interviews regarding research results.
  • Please submit a report on publications at conferences and in journals every year in response to a request from the IDR office.
  • For information on how to cite this corpus in publications, please refer to the DSC Reference Portal.

Contact (IDR Office)

IDR Office, National Institute of Informatics

Email:
idr [at] nii.ac.jp
Address:
2-1-2 Hitotsubashi, Chiyoda-ku, Tokyo 101-8430, JAPAN

(Please use email for communicating with us if not otherwise specified.)