Seminar by Prof. Dr. Debora Weber-Wulff and Prof. Simone Teufel
August 5, 13:00-15:00
National Institute of Informatics
1901 Conference Room
- Plagiarism Detection and Documentation in Germany
- Prof. Dr. Debora Weber-Wulff (HTW Berlin)
- Plagiarism and plagiarism detection have been widely discussed in Germany since 2011, when the German minister of defense, Karl-Theodor zu Guttenberg, was forced to resign after extensive plagiarism was found in his doctoral dissertation.
There is a widespread belief that software must exist that detects plagiarism easily, and many companies sell such software. However, people are generally unaware of the large number of false positives and false negatives that such software produces, even when investigating simple text overlap.
Research on various other strategies of plagiarism detection is still highly experimental.
For a closed set of documents suspected to contain duplicates, some simple text mining methods can be used to identify potential duplicates. These methods have been applied to 50,000 dissertations in medicine and 900,000 open access journal articles, with startling results. Even at universities and journals that use plagiarism detection software, plagiarism, duplicate publication, and text recycling are widespread.
- Proposition-based Summarisation -- a first implementation
- Prof. Simone Teufel (University of Cambridge)
- I will discuss joint work with my student Yimai Fang. I will present an implementation of the text-understanding-based summarisation idea of Kintsch and van Dijk (1978), which assumes that summarisation can be simulated by an incremental text-understanding process operating within human memory limitations. The model is "deep", and current NLP technology cannot, of course, capture everything needed to fully instantiate it. The model is nevertheless interesting because it allows us to pinpoint the NLP areas where improvement can demonstrably result in better summaries -- namely coreference resolution and word sense disambiguation (WSD), as I will demonstrate. It is also interesting because it beats current models based on lexical semantics, centroid-based sentence centrality, and the random walk model in networks. I will end the talk with a demo of the system and a discussion of which genres lend themselves best to this type of research.
Prof. Akiko Aizawa