News Release

Release of New Japanese LLMs, "LLM-jp-4 8B" and "LLM-jp-4 32B-A3B", Trained on a High-Quality Corpus of Approximately 12 Trillion Tokens under an Open-Source License
Surpassing GPT-4o and Qwen3-8B on Several Standard Benchmarks

 The Research and Development Center for Large Language Models (LLMC) at the National Institute of Informatics (NII; Director-General: Sadao Kurohashi; Chiyoda-ku, Tokyo), an inter-university research institute within the Research Organization of Information and Systems, has been training large language models (LLMs) from scratch within the activities of LLM-jp, an open research community. We are pleased to announce the public release of our new models under an open-source license: "LLM-jp-4 8B," a dense model with approximately 8.6 billion parameters, and "LLM-jp-4 32B-A3B," a Mixture-of-Experts (MoE) model with approximately 32 billion total parameters.
 To train these models, we gave due consideration to the Open Source AI Definition (OSAID) and collected, selected, and constructed high-quality training corpora accessible to third parties. As a result, we used a training corpus of approximately 12 trillion tokens, consisting of publicly available Internet data, government and Diet documents, synthetic data, and other resources.
 The released models support a context length of up to approximately 65,000 tokens, and achieved performance surpassing that of powerful multilingual LLMs such as GPT-4o and Qwen3-8B in both Japanese-language understanding, as measured by Japanese MT-Bench, and English-language understanding, as measured by MT-Bench.
 LLMC will continue research and development aimed at ensuring the transparency and reliability of LLMs by leveraging "LLM-jp-4 8B" and "LLM-jp-4 32B-A3B." Larger-parameter models are currently under development and are scheduled for release during Japan's fiscal year 2026.

Overview of the Released LLMs

(1) Computational Resources

We used the AI Bridging Cloud Infrastructure (ABCI 3.0), operated by the National Institute of Advanced Industrial Science and Technology (AIST).

(2) Models
  • LLM-jp-4 8B Model
    • Model architecture: Llama 2 architecture
    • Number of parameters (*1): approximately 8.6 billion (8B)
  • LLM-jp-4 32B-A3B Model
    • Model architecture: Qwen3 MoE (*2) architecture
    • Total number of experts: 128
    • Number of active parameters: approximately 3.8 billion (3.8B)
    • Number of active experts: 8
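
The expert counts above (128 experts in total, 8 active) can be illustrated with a minimal sketch of generic top-k expert routing. This is not the released models' actual routing code: the function name `route_token`, the random router logits, and the softmax-then-renormalize gating are assumptions made for illustration only.

```python
import math
import random

NUM_EXPERTS = 128  # total experts (figure from this release)
TOP_K = 8          # active experts per token (figure from this release)


def route_token(router_logits, top_k=TOP_K):
    """Return {expert_index: weight} for the top_k highest-scoring experts.

    Generic top-k gating (an assumption, not the released implementation):
    softmax over all router logits, keep the top_k probabilities, then
    renormalize the kept probabilities so they sum to 1.
    """
    max_logit = max(router_logits)
    exps = [math.exp(x - max_logit) for x in router_logits]  # stable softmax
    total = sum(exps)
    probs = [e / total for e in exps]

    chosen = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]
    kept = sum(probs[i] for i in chosen)
    return {i: probs[i] / kept for i in chosen}


rng = random.Random(0)
# Hypothetical router output for a single token.
logits = [rng.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]
weights = route_token(logits)

print(len(weights))                   # experts actually evaluated for this token
print(f"{TOP_K / NUM_EXPERTS:.2%}")   # fraction of experts active per token
```

Under these assumptions only 8 of the 128 experts (6.25%) run for any given token, which is why the number of active parameters is far smaller than the total parameter count.
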
(3) Training Corpora

With due consideration to OSAID, we collected, selected, and constructed high-quality training corpora accessible to third parties, resulting in a training corpus (*3) approximately six times larger than that used for our previous model series, "LLM-jp-3.1."

Pre-training

For pre-training, we used a large-scale corpus consisting of publicly available Internet data, government and Diet documents, and other resources. The corpus totals approximately 19.5 trillion tokens, comprising approximately 700 billion Japanese tokens, 17.8 trillion English tokens, 850 billion tokens in other languages (Chinese and Korean), and 200 billion tokens of source code. Based on experiments to optimize the sampling weight of each sub-corpus, we finally used approximately 10.5 trillion tokens in total for pre-training.

Mid-training

Following pre-training, mid-training was conducted, using a training corpus totaling approximately 1.2 trillion tokens, consisting of the pre-training corpus together with LLM-generated synthetic data, including instruction pre-training data.
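
As a quick check of how the figures in this section relate to the headline "approximately 12 trillion tokens," the arithmetic can be sketched as below. All values are the approximate counts quoted in this release; the variable names are illustrative only.

```python
# Approximate token counts quoted in this release, in trillions of tokens.
available = {
    "Japanese": 0.7,
    "English": 17.8,
    "other languages (Chinese and Korean)": 0.85,
    "source code": 0.2,
}
pretraining_used = 10.5  # tokens used for pre-training after sampling
midtraining_used = 1.2   # tokens used for mid-training

available_total = sum(available.values())
training_total = pretraining_used + midtraining_used

print(f"collected pre-training corpus: {available_total:.2f}T tokens")
print(f"tokens used for training:      {training_total:.1f}T tokens")
```

The sub-corpora sum to about 19.55 trillion tokens, consistent with the stated total of approximately 19.5 trillion, and pre-training plus mid-training comes to about 11.7 trillion tokens, which rounds to the headline figure. Because the mid-training corpus reuses part of the pre-training corpus, this sum counts tokens processed during training, not unique tokens.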

(4) Fine-tuning

Fine-tuning was conducted using 22 types of Japanese and English instruction-tuning datasets. The training datasets included both datasets released under open-source licenses and datasets newly developed by LLM-jp. The datasets developed by LLM-jp will be released in stages.

(5) Evaluation

An LLM-as-a-Judge evaluation with GPT-5.4 was conducted using "llm-jp-judge," an evaluation framework developed by LLM-jp. In Japanese MT-Bench, which measures Japanese-language understanding capability, LLM-jp-4 8B scored 7.54 and LLM-jp-4 32B-A3B scored 7.82, surpassing GPT-4o (7.29), gpt-oss-20b (7.33), and Qwen3-8B (7.14). In MT-Bench, which measures English-language understanding capability, LLM-jp-4 8B scored 7.79 and LLM-jp-4 32B-A3B scored 7.86. Both models surpassed GPT-4o (7.69) and Qwen3-8B (7.69); LLM-jp-4 32B-A3B also slightly exceeded gpt-oss-20b (7.85), while LLM-jp-4 8B came within 0.06 points of it.
Evaluation using "llm-jp-eval v2.1.3," a framework developed by LLM-jp for evaluation based on 42 evaluation datasets derived from existing Japanese and English language resources, also confirmed that both LLM-jp-4 8B and LLM-jp-4 32B-A3B achieved Japanese-language performance comparable to gpt-oss-20b and Qwen3-8B.
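
The margins implied by the MT-Bench scores above can be tabulated with a short script. The scores are copied from this release; the data layout and the helper name `margins` are illustrative assumptions and are not part of llm-jp-judge.

```python
# MT-Bench scores quoted in this release (LLM-as-a-Judge, higher is better).
scores = {
    "Japanese MT-Bench": {
        "LLM-jp-4 8B": 7.54, "LLM-jp-4 32B-A3B": 7.82,
        "GPT-4o": 7.29, "gpt-oss-20b": 7.33, "Qwen3-8B": 7.14,
    },
    "MT-Bench": {
        "LLM-jp-4 8B": 7.79, "LLM-jp-4 32B-A3B": 7.86,
        "GPT-4o": 7.69, "gpt-oss-20b": 7.85, "Qwen3-8B": 7.69,
    },
}
OURS = ("LLM-jp-4 8B", "LLM-jp-4 32B-A3B")


def margins(benchmark):
    """Return {(model, baseline): score difference} for one benchmark."""
    table = scores[benchmark]
    return {
        (model, baseline): round(table[model] - table[baseline], 2)
        for model in OURS
        for baseline in table
        if baseline not in OURS
    }


for bench in scores:
    for (model, baseline), diff in margins(bench).items():
        print(f"{bench}: {model} vs {baseline}: {diff:+.2f}")
```

A positive difference means the LLM-jp-4 model scored higher than the baseline. The narrowest margins are on MT-Bench against gpt-oss-20b.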

[Figure 1: Category-wise evaluation of representative LLMs using llm-jp-eval]

URL for Released Models, Tools, and Corpora

https://llm-jp.nii.ac.jp/release

Future Plans

Based on the results of these models, we are developing larger-parameter LLMs: LLM-jp-4 32B and LLM-jp-4 332B-A31B. Furthermore, lightweight models will also be developed in parallel for ease of practical deployment. These models are scheduled for release during Japan's fiscal year 2026.

Reference 1: Overview of LLM-jp
  1. LLM-jp, organized by NII, consists of over 2,600 participants (as of March 31, 2026) from universities, companies, and research institutions, mainly researchers in natural language processing and computer systems. LLM-jp shares information on LLM research and development through hybrid meetings, online sessions, and Slack, while also conducting joint research on building LLMs. Specific activities include:
    • Promoting the development of open LLMs proficient in Japanese and related research.
    • Regular information exchange on model building expertise and latest research developments.
    • Fostering collaboration across institutions by sharing data and computing resources.
    • Publishing outcomes such as models, tools, and technical documentation.
  2. LLM-jp has established the following working groups (WGs), each led by the person named:
    • Corpus Construction WG: Professor Daisuke Kawahara of Waseda University
    • Model Building WG: Professor Jun Suzuki of Tohoku University
    • Evaluation and Tuning WG: Professor Yusuke Miyao of the University of Tokyo
    • Multimodal WG: Professor Naoaki Okazaki of the Institute of Science Tokyo
    • Real Environment Interaction WG: Professor Tetsuya Ogata of Waseda University
    • Dialogue WG: Professor Ryuichiro Higashinaka of Nagoya University
    • Academic Domain WG: Professor Akiko Aizawa of NII
    • Safety WG: Project Professor Satoshi Sekine of NII
    • Principle Elucidation WG: Associate Professor Yohei Oseki of the University of Tokyo
  3. Additional contributions come from many individuals, including Professor Kenjiro Taura of the University of Tokyo and Professor Rio Yokota of the Institute of Science Tokyo, particularly in areas such as parallel computing methods.
  4. For more details, visit the official website: https://llm-jp.nii.ac.jp/
Reference 2

This work was supported by the "R&D Hub Aimed at Ensuring Transparency and Reliability of Generative AI Models" project of the Ministry of Education, Culture, Sports, Science and Technology.

Acknowledgements

This work was supported by the "Development Acceleration Use" program of ABCI 3.0, provided by AIST and AIST Solutions.

In the development of these models, under a joint research agreement on "Research and Development of Language Resources for Generative AI Models," the National Institute for Japanese Language and Linguistics provided the NINJAL Japanese Web Corpus (whole-NWJC). In addition, under the "Agreement on Mutual Cooperation between the National Diet Library and the National Center for Science Information Systems," the National Diet Library provided a list of website URLs collected through its Web Archiving Project (WARP).

(*1) Number of Parameters: Large language models are neural networks trained on language data, and the number of parameters is one of the indicators of the network’s size. It is generally believed that more parameters lead to higher performance.
(*2) MoE (Mixture of Experts) model: A model that contains multiple experts within an LLM and enables efficient inference by dynamically selecting among them during inference.
(*3) Corpus: A database that stores large amounts of natural-language text in a structured manner.