
Speech Corpus

These are speech corpora that the Speech Resources Consortium, established at NII, has accepted from various institutions and groups. For the time being, they are distributed by the Speech Resources Consortium.

update: 2014-07-01

Corpora provided by the Speech Resources Consortium

For details of each corpus, please refer to the Speech Corpus List. For the ordering procedure, please refer to the License Agreement Form.

<Free Corpus>

  1. Priority Area Project on "Spoken Language" - Grant-in-Aid for Developmental Scientific Research on "Speech Database" Continuous Speech Corpus (PASL-DSR)
  2. University of Tsukuba Multilingual Speech Corpus (UT-ML)
  3. Tohoku University - Matsushita Isolated Word Database (TMW)
  4. GSR(A) "Regional Difference in Spoken Japanese Dialects" Spoken Japanese Dialect Corpus (GSR-JD)
  5. Real World Computing Project (RWCP) Speech Corpora
    1. RWCP Spoken Dialogue Corpus - 1996 edition (RWCP-SP96)
    2. RWCP Spoken Dialogue Corpus - 1997 edition (RWCP-SP97)
    3. RWCP News Speech Corpus (RWCP-SP99)
    4. RWCP Meeting Speech Corpus (RWCP-SP01)
  6. RWCP Real Environment Speech and Acoustic Database (RWCP-SSD)
  7. Priority Area "Spoken Dialogue" Spoken Dialogue Corpus (PASD)
  8. CIAIR Children Voice Speech Corpus (CIAIR-VCV)
  9. IPSJ SIG-SLP Corpora and Environments for Noisy Speech Recognition (CENSREC)
    1. Noisy Speech Recognition Evaluation Environment (CENSREC-1 ⟨AURORA-2J⟩)
    2. Noisy Speech Detection Evaluation Environment (CENSREC-1-C)
    3. Audio-Visual Speech Recognition Evaluation Environment (CENSREC-1-AV)
    4. In-car Connected Digit Data and Environment for Noisy Speech Recognition (CENSREC-2)
    5. In-car Isolated Word Data and Environment for Noisy Speech Recognition (CENSREC-3)
    6. Reverberant Speech Recognition Evaluation Environment (CENSREC-4)
  10. Priority Areas "Advanced Utilization of Multimedia to Promote Higher Education Reform" Speech Database (UME)
    1. English Speech Database Read by Japanese Students (UME-ERJ)
    2. Japanese Speech Database Read by Foreign Students (UME-JRF)
  11. RIKEN Spoken Dialogue Corpus (Word processing task, Japanese) (RIKEN-DLG)
  12. Japanese Map Task Dialogue Corpus (MapTask)
  13. Utsunomiya University Spoken Dialogue Database for Paralinguistic Information Studies (UUDB)
  14. Japanese Phonetically-balanced Word Speech Database (ETL-WD)
  15. Speech Database of the 1991-1992 Tsuruoka Survey (Tsuruoka91-92)
  16. X-ray Film database for speech research (X-Ray)
  17. Priority Areas "Prosody and Speech Processing" Japanese MULTEXT Prosodic Corpus (MULTEXT-J)
  18. Chinese MULTEXT Corpus (MULTEXT-C)
  19. Keio University Japanese Emotional Speech Database (Keio-ESD)
  20. Vowel Database: Five Japanese Vowels of Males, Females, and Children Along with Relevant Physical Data (JVPD)
  21. Tokyo Institute of Technology Multilingual Speech Corpus (TITML)
    1. Indonesian (TITML-IDN)
    2. Icelandic (TITML-ISL)
  22. AWA Long-Term Recording Speech Corpus (AWA-LTR)
  23. Speech database of Aragusuku Dialect (Aragusuku)
  24. Speech database of Oogami Dialect (Oogami)
  25. Online Gaming Voice Chat Corpus with Emotional Label (OGVC)
  26. Chiba Three-party Conversation Corpus (Chiba3Party)

<Fee-based Corpus>

  1. ASJ Japanese Newspaper Article Sentences Read Speech Corpus (JNAS)
  2. Japanese Newspaper Article Sentences Read Speech Corpus of the Aged (S-JNAS)
  3. ASJ Continuous Speech Corpus for Research (ASJ-JIPDEC)
  4. NTT - Tohoku University Familiarity-controlled Word Lists (FW03)
  5. NTT - Tohoku University Familiarity-controlled Word Lists 2007 (FW07)
  6. NTT Infant Speech Database (INFANT)


  1. JEIDA Japanese Common Speech Data Corpus (JEIDA-JCSD)
  2. JEIDA Noise Database (JEIDA-NOISE)