Online ISSN:1349-8606
Progress in Informatics  
No.6 March 2009  
Page 15-25
doi:10.2201/NiiPi.2009.6.3
Utilization of external knowledge for personal name disambiguation
Quang Minh VU, Atsuhiro TAKASU and Jun ADACHI
National Institute of Informatics
(Received: September 1, 2008)
(Revised: December 1, 2008)
(Accepted: December 2, 2008)
Abstract:
The amount of information on the World Wide Web (WWW) is increasing at an explosive rate, and the role of computer systems in processing such a huge amount of data has become crucial. In this paper, we focus on the name disambiguation problem that arises when searching for people. Information about people is an important part of the web, so improvements in how personal information is processed may benefit many web users. Name ambiguity occurs frequently in people searches because a single name may be shared by several people. In this research, we use external knowledge to solve this problem, so that the information in web documents can be analyzed more easily. We collect web directories and apply the latent Dirichlet allocation method to extract latent topics from them. The extracted topics are used to modify the search result documents so that the important contexts that help to discriminate between people can be recognized more easily. We carried out experiments with real web documents and verified the advantages of our approach over other disambiguation approaches that use the vector space model and named entity recognition methods.
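The intuition behind comparing documents at the topic level rather than the word level can be illustrated with a minimal Python sketch. This is not the method of the paper. The topic table `TOPICS` is hypothetical and hand-written; it only stands in for topics that latent Dirichlet allocation would extract from web directories. The three documents and all function names are likewise invented for illustration.

```python
import math
from collections import Counter

# Hypothetical topic-word weights (an assumption for illustration only).
# In the approach described above, topics would come from running latent
# Dirichlet allocation on web directories, not from a hand-written table.
TOPICS = {
    "computing": {"database": 0.5, "retrieval": 0.3, "algorithm": 0.2},
    "music": {"guitar": 0.5, "album": 0.3, "concert": 0.2},
}


def term_vector(text):
    """Plain vector space model: raw term counts of a whitespace-tokenized text."""
    return Counter(text.lower().split())


def topic_vector(text):
    """Represent a document by the total weight of each topic's words it contains."""
    counts = term_vector(text)
    return {
        name: sum(weight * counts[term] for term, weight in words.items())
        for name, words in TOPICS.items()
    }


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(value * b.get(key, 0.0) for key, value in a.items())
    norm_a = math.sqrt(sum(value * value for value in a.values()))
    norm_b = math.sqrt(sum(value * value for value in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Three invented search results for the same ambiguous name.
doc_a = "john smith database retrieval research"
doc_b = "john smith algorithm paper"
doc_c = "john smith guitar album"

# Word-level similarity: doc_a shares only the name with doc_b and doc_c.
word_ab = cosine(term_vector(doc_a), term_vector(doc_b))
word_ac = cosine(term_vector(doc_a), term_vector(doc_c))

# Topic-level similarity: doc_a and doc_b fall under the same topic.
topic_ab = cosine(topic_vector(doc_a), topic_vector(doc_b))
topic_ac = cosine(topic_vector(doc_a), topic_vector(doc_c))
```

In this toy setting the word-level scores for the two pairs are identical, because the documents overlap only in the name itself, whereas the topic-level scores separate the pair that shares a topic from the pair that does not. How the paper actually modifies documents with the extracted topics is not specified in the abstract, so the topic projection used here is an assumption.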
Keywords:
Personal name disambiguation, knowledge base, latent Dirichlet allocation, latent topic extraction, document similarity