Rakuten Dataset
NII provides the "Rakuten Dataset" to researchers according to the contract between NII and Rakuten Group, Inc.
update: 2022-03-10
Outline of the Data
- Rakuten Ichiba: All product data (Approx. 283 million items), item review data (Approx. 70 million reviews), shop review data (Approx. 22.5 million reviews)
- Rakuten Travel: Facility data (Approx. 29,000 facilities), review data (Approx. 6.56 million reviews)
- Rakuten GORA (Rakuten's golf service): Facility data (1,669 facilities), review data (320,000 reviews)
- Rakuten Recipe: Recipe data (Approx. 800,000 recipes), recipe images (Approx. 800,000 images), Pickup recipe (1,854 recipes), Daylicious news (362 articles)
- Annotated data:
  - Tsukuba sentiment-tagged corpus (TSUKUBA corpus): corpus with sentiment polarity information for each sentence of Rakuten Travel's review data, provided by the University of Tsukuba
  - Product images dataset with category label: image dataset of products belonging to Rakuten genres that correspond to some categories in the Caltech-256 dataset
  - Images with character area: images with the rectangle coordinates of each character area
  - Floor plan from Rakuten Real Estate and pixel-wise wall label: floor plan images (powered by LIFULL Co., Ltd., 500 images) and pixel-wise wall label annotations
  - Rakuten France: user review, products reviews interests: data used in "Challenge Data 2016-2017: Prediction of products reviews interests" (in French)
  - Rakuten France: book and author information: book item information, and annotated book item information with normalized author names
  - Rakuten France: Multi-modal Product Dataset: product information and images used in "SIGIR 2020 E-Commerce Workshop Data Challenge" (in French)
  - Rakuten Books bibliographic information for author name disambiguation test: book data for authors who share the same name, and annotated data to distinguish between different authors with the same name
  - Rakuten Travel Review: Aspects and Sentiment-tagged corpus: corpus with 7 review aspects and sentiment polarity information for each sentence of Rakuten Travel's review data
  - Chart Digitization Dataset: synthesized image data of vertical bar charts, horizontal bar charts, pie charts and line charts (1,000 images for each chart type) and JSON-format annotation data for the chart components (added on 2022-03-10)
Please see "Rakuten Data Release" for details.
The same data is also available from the Advanced Language Information Forum (ALAGIN).
Update Information
- The data of "Chart Digitization Dataset" (Annotated Data) was newly released. Users can download it from the data distribution site. (2022/03/10)
- Company name was changed from Rakuten, Inc. to Rakuten Group, Inc. (2021/04/01)
- The data of "Rakuten France: Multi-modal Product Dataset" (Annotated Data) was newly released. (2021/03/22)
- The data of "Rakuten Travel Review: Aspects and Sentiment-tagged corpus" (Annotated Data) was newly released. (2021/02/10)
- "Postal mail address for agreements (Rakuten, Inc.)" was updated. (2020/08/31)
- "Rakuten Ichiba" data was updated. (2020/02/25)
- "Rakuten Travel" data was updated. (2020/02/25)
- The data of "Rakuten Books bibliographic information for author name disambiguation test" (Annotated Data) was newly released. (2020/02/25)
- The data of "Rakuten France: book and author information" (Annotated Data) was newly released. (2019/07/05)
- Name of "PriceMinister: user review, products reviews interests" was changed to "Rakuten France: user review, products reviews interests", and it was moved to "Annotated Data". (2019/07/05)
- The data of "Floor Plan from Rakuten Real Estate (powered by LIFULL Co., Ltd.) and Pixel-wise Wall Label" (Annotated Data) was newly released. (2017/11/28)
- Distribution of "Rakuten Viki" data was terminated. (2017/11/28)
- "PriceMinister" data was newly released. (2017/04/03)
- Distribution of Rakuten Auction data was terminated, because the Rakuten Auction Service was closed. (2016/10/31)
- "Rakuten Viki" data was newly released and "Rakuten Travel" data and "Rakuten Recipe" data were updated. (2016/01/12)
- Three kinds of "Annotated data" were newly released and "Rakuten Travel" data was updated. (2014/09/30)
- "Rakuten Auction" data was newly released and "Rakuten Ichiba" data was updated. (2014/04/01)
- "Rakuten Recipe" data was newly released and "Rakuten Travel" data was updated. (2012/08/07)
- "Rakuten Ichiba" data was updated. (2011/08/23)
- Distribution of "Rakuten Dataset" was started. (2010/08/04)
User Qualification
Only researchers belonging to a university or a public research institution can apply to use the data. Applications from those belonging to a private company, etc. will not be accepted. For more details, please e-mail the IDR office listed in the "Contact" section below.
Application
Please submit an application following the procedure shown below. The data is available free of charge. The required documents can be downloaded from the links in the "Documents" section below.
- Please read the "Agreement on the usage of Rakuten Data (sample)" carefully and confirm that it is acceptable to you (and your organization), then fill out the "Application Form", observing the following points:
  - An application should be made for each user group, such as a laboratory in a university, and the applicant should be the principal investigator of the group, e.g., a professor at a university or a head researcher at a research institution.
  - The signer of the Agreement should be a person who is authorized to sign and seal the agreement on behalf of your organization and who holds an official seal (for a university, typically the dean of a school or someone of higher rank). Please consult your administrative section about the qualified signer beforehand, and enter the signer's formal information in full under "Signer", as it is to be printed in the Agreement.
  - "Research group members" are restricted to researchers and students who belong to the abovementioned user group and do research under the supervision of the applicant. If someone belonging to a different organization or a separate laboratory will use the data, even as part of joint research, a separate application should be made.
- Please e-mail the application form (MS Word file) as an attachment to the IDR office listed in the "Contact" section below.
  - The subject of the e-mail should be "Application for the Rakuten Dataset (Xxxx University)". If the subject is not appropriate, the e-mail may be discarded without its content being reviewed.
  - If you apply for other datasets at the same time, please send each application in a separate e-mail.
  - Please note that your application will be forwarded to Rakuten Group, Inc. and will be used for judging the qualification, preparing an agreement, and managing the users.
- Your application will be reviewed at the IDR office, and you will be informed by e-mail whether the data can be provided. If you do not receive a reply within a week, please contact the IDR office.
  *** Please understand that, since the user qualification conditions are set by Rakuten Group, Inc., there may be cases where we cannot provide the data to you.
- Your organization and Rakuten Group, Inc. conclude an agreement.
  - Rakuten Group, Inc. will send a PDF of the "Agreement" to you by e-mail.
  - Sign and seal it, and send it to Rakuten Group, Inc.
  - Rakuten Group, Inc. will return a copy of the sealed agreement to you. Please keep it safe.
  *** Please follow the instructions in the e-mail from Rakuten Group, Inc.
- The IDR office will provide the data when notified by Rakuten Group, Inc. If you do not receive the download instructions within a few days after you receive the signed and sealed Agreement, please contact the IDR office.
Data provision
The data is provided as a download from the IDR's web server. If you cannot download the data for some technical reason, please contact us.
Documents
- Application Form (MS Word file)
- Agreement on the usage of Rakuten Data (sample) (link to Rakuten Group, Inc.'s website)
Usage report
- When you make research results public, you are required to notify Rakuten Group, Inc. of the publication content, date, place, etc. at least 10 days before the submission or 30 days before the presentation.
- You are requested to submit, once a year, a report on the research carried out using the data.
- For information on how to cite this dataset in publications, please refer to the DSC Reference Portal.
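The notice periods above can be turned into a concrete date with a little date arithmetic. The following Python sketch is only an illustration: the function name `latest_notice_date` and the constant names are hypothetical, and taking the earlier of the two dates when both a submission and a presentation are planned is a conservative assumption, not a rule stated by NII or Rakuten Group, Inc. The Agreement itself remains authoritative.

```python
from datetime import date, timedelta

# Notice periods taken from the "Usage report" section.
SUBMISSION_NOTICE_DAYS = 10
PRESENTATION_NOTICE_DAYS = 30


def latest_notice_date(submission=None, presentation=None):
    """Return the last day on which notice can still be given.

    Assumption (not stated in the source): when both dates are known,
    the earlier of the two resulting deadlines is used.
    """
    candidates = []
    if submission is not None:
        candidates.append(submission - timedelta(days=SUBMISSION_NOTICE_DAYS))
    if presentation is not None:
        candidates.append(presentation - timedelta(days=PRESENTATION_NOTICE_DAYS))
    if not candidates:
        raise ValueError("give a submission date, a presentation date, or both")
    return min(candidates)


if __name__ == "__main__":
    # Hypothetical dates, for illustration only.
    print(latest_notice_date(submission=date(2024, 6, 20),
                             presentation=date(2024, 7, 1)))
```

With the hypothetical dates shown, the submission deadline would be 2024-06-10 and the presentation deadline 2024-06-01, so the sketch reports 2024-06-01.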
Contact (IDR Office)
IDR Office, National Institute of Informatics
- E-mail: idr [at] nii.ac.jp
- Address: 2-1-2 Hitotsubashi, Chiyoda-ku, Tokyo 101-8430, JAPAN
(Please use e-mail for communicating with us if not otherwise specified.)
Postal mail address for agreements (Rakuten Group, Inc.)
"Rakuten Data Release" Office, Rakuten Institute of Technology (R.I.T.)
- Address: Rakuten Crimson House, 1-14-1 Tamagawa, Setagaya-ku, Tokyo 158-0094, JAPAN
- E-mail: tech-rit-omo [at] mail.rakuten.com