Digital Content and Media Sciences Research Division, Associate Professor
Establishing a knowledge base for open science information
Widely sharing research results with society
The trend towards "open science" that aims to widely release and share not only research articles but also the supporting research data has spread internationally. The National Institute of Informatics (NII) is promoting this trend by collecting and managing research data from all academic fields and building searchable information infrastructure, and I am involved in constructing this system.
In Europe and the United States, it is taken for granted that the results of research paid for with taxpayers' money will be freely accessible to citizens. Having access to the data behind articles makes it possible to reproduce experiments and investigations, and this can accelerate research and reduce the number of fraudulent articles. If these data were available to anyone, including the industrial sector and the general public, they could give rise to novel research, and the results would in turn be returned to the general public. This kind of open science system is urgently required in Japan too.
My specialism is bioinformatics, and I have been researching systems for sharing genomics databases and medical research data. I intend to expand this research to all academic fields, including social and natural sciences, to construct a knowledge base of scientific information that is freely available to the public. NII operates numerous services, such as the Science Information NETwork (SINET, an academic network linking research institutions nationwide), GakuNin (Academic Access Management Federation), Japanese Institutional Repositories Online Cloud (JAIRO Cloud, an institutional repository cloud service), and CiNii (a search service for research articles in Japan), and the system that I am developing must connect successfully with these services.
Towards a massive database linking vast amounts of data
Currently, articles and researcher information registered via multiple routes are not integrated in the databases, and linking research data to this information is difficult in practice. It is also difficult to collect and archive data from projects carried out at institutions other than universities. The contributors involved in preparing data receive no credit for that work when research is evaluated. Valuable research data are scattered across institutions, and policies on the granularity and management of these data differ according to the field. If the data are left unmanaged, they will be lost, so open science must facilitate the reuse of research data. A technology called linked open data (LOD) is useful for this purpose.
People can read documents on the Internet (Web) as text and images, but computers cannot understand them in that form. By releasing data in a form that is easily readable by computers and connecting them together, it is possible to build a "data web" network that connects completely different data. Applying this LOD technology will make it possible to obtain the data one is looking for all at once, even if they are in different databases, because it will allow the entire Internet to be searched as one massive database. Connecting information from various fields can be expected to yield results such as narrowing down candidate compounds for medicinal purposes by combining biological data and chemical data. How should we accumulate and share research data? When should we make data open? What should be done to expand the network nationwide and coordinate internationally? There are many challenges, but ultimately I want to help realize a society in which not only experts but also high school, junior high school, and even elementary school students can access interesting research data, and in which software for solving problems is instantly available via cloud services.
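As a rough illustration of the linked data idea described above, the following Python sketch treats two separate "databases" as sets of (subject, predicate, object) statements and queries them as one graph once they share identifiers. It is only a toy under stated assumptions: every identifier, predicate name, and function here (the `ex:` names, `match`, `candidate_compounds`) is hypothetical and invented for the example, and it does not represent the NII system or any real LOD dataset.

```python
# Toy illustration of linked data. Each "database" is a set of
# (subject, predicate, object) triples. All identifiers are made up.

# Hypothetical biological database: which protein is associated with which disease.
BIO_DB = {
    ("ex:protein/P1", "ex:associatedWith", "ex:disease/D1"),
    ("ex:protein/P2", "ex:associatedWith", "ex:disease/D2"),
}

# Hypothetical chemical database: which compound binds to which protein.
CHEM_DB = {
    ("ex:compound/C1", "ex:bindsTo", "ex:protein/P1"),
    ("ex:compound/C2", "ex:bindsTo", "ex:protein/P1"),
    ("ex:compound/C3", "ex:bindsTo", "ex:protein/P2"),
}


def match(graph, subject=None, predicate=None, obj=None):
    """Return all triples matching the given pattern (None is a wildcard)."""
    return [
        (s, p, o)
        for (s, p, o) in graph
        if (subject is None or s == subject)
        and (predicate is None or p == predicate)
        and (obj is None or o == obj)
    ]


def candidate_compounds(graph, disease):
    """Find compounds that bind to any protein associated with the disease."""
    proteins = {
        s for (s, _, _) in match(graph, predicate="ex:associatedWith", obj=disease)
    }
    return sorted(
        s for (s, _, o) in match(graph, predicate="ex:bindsTo") if o in proteins
    )


if __name__ == "__main__":
    # Because both databases use the same protein identifiers,
    # their union can be searched as a single graph.
    merged = BIO_DB | CHEM_DB
    print(candidate_compounds(merged, "ex:disease/D1"))
    # prints ['ex:compound/C1', 'ex:compound/C2']
```

The point of the sketch is that neither database can answer the question alone; the answer only appears once the shared protein identifiers link the two sets of statements. Real LOD systems typically express such statements in RDF and query them with SPARQL rather than with hand-written Python.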