
Distribution of "Yahoo! Chiebukuro data (2nd edition)"

Datasets that NII has received from Yahoo Japan Corporation (web page in Japanese) and provides to researchers.

Notice
Applications for use of "Yahoo! Chiebukuro data (2nd edition)" closed on Oct. 31, 2018. Provision of "Yahoo! Chiebukuro data (3rd edition)" will be announced on this website as soon as it becomes available.

update: 2018-10-31

Outline of the Data

"Yahoo! Chiebukuro" is the largest knowledge retrieval service in Japan, and Yahoo Japan Corporation has provided it since April 2004. The service aims to connect people who want to ask questions with those who want to answer them, and to let participants share wisdom and knowledge.

Yahoo Japan Corporation extracted this data from the "Yahoo! Chiebukuro" database. It consists of questions and answers that were resolved in the following period.

  • Period: April 2004 -- April 2009
  • Number of questions: approx. 16 million
  • Number of answers: approx. 50 million

<Differences from the 1st edition>

The 2nd edition comprises data from April 2004 to April 2009, whereas the 1st edition covers only data up to October 2005, i.e., the period when the Chiebukuro service was in beta.

In addition to covering a longer period, the 2nd edition includes postings from mobile phones, which have been supported since May 2006.

Moreover, the 2nd edition provides additional data attached to the postings (number of evaluations, coins and Chie collections, mobile flag, etc.) that was introduced after the period covered by the 1st edition.

                      1st edition (closed)          2nd edition
period                April 2004 -- October 2005    April 2004 -- April 2009
number of questions   approx. 3 million             approx. 16 million
number of answers     approx. 13 million            approx. 50 million

NTCIR-8 CQA Test Collection

The "NTCIR-8 CQA" (Community QA) test collection, which is built on this data, is provided together with it.

This test collection can be used to evaluate the quality of answers on a CQA site.
It was built by the NTCIR Project, which is organized by NII, and is composed of the following data.

  • 1500 questions extracted from Yahoo Chiebukuro data version 1.0
  • Assessment results by four assessors
  • ID lists, best answer lists, and category information, etc.

For more details, please refer to the NTCIR-8 CQA page.

Application

Please see "Application Procedure".