Latent Semantic Analysis Based Automatic Cross-Language Plagiarism Detector for Paragraph Written in Two Syntactically Distinct Languages

Ratna, Anak Agung Putri; Lomempow, Emily; Purnamasari, Prima Dewi; Yuwono, Untung; Adhi, Boma Anantasatya

Posted by James Alexander Gordon on 21st October 2015

Abstract

The number of scientific publications in Bahasa Indonesia is rising steadily. Because they write in an under-resourced language, Indonesian authors often consult documentation in other languages, especially English. This has made the need for an automated cross-language plagiarism checker pressing. Several methods exist for automated cross-language plagiarism detection, but most of them work well only on syntactically similar languages. Unfortunately, Bahasa Indonesia and English belong to very different language families and therefore have completely different syntax. This paper investigates extending Latent Semantic Analysis (LSA) to automated cross-language plagiarism checking between two syntactically distinct languages. LSA's bag-of-words representation is exploited, which removes the need for a grammatically correct automatic translator. Several modifications to the LSA algorithm are also proposed to improve its performance. The proposed proof-of-concept algorithm can find similarities between a paragraph and its exact translation in the other language. Across all test cases, the exact translation of a paragraph is identified with an accuracy of 81.82% to 90.91%.
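To illustrate why a bag-of-words model sidesteps syntax, here is a minimal sketch of generic LSA similarity scoring in Python. It is not the authors' modified algorithm. It assumes, purely for illustration, that Indonesian words are first mapped word by word to English through a small hypothetical dictionary (`TOY_DICT`). Since each paragraph is reduced to word counts, the differing word order of the two languages has no effect. The term-document matrix is factorized with a truncated SVD, and candidate paragraphs are ranked by cosine similarity in the reduced space. The helper names and the choice of `k` are hypothetical.

```python
from collections import Counter

import numpy as np

# Hypothetical toy Indonesian -> English word dictionary (illustrative assumption only).
TOY_DICT = {
    "kucing": "cat", "makan": "eat", "ikan": "fish",
    "anjing": "dog", "mengejar": "chase", "bola": "ball",
}


def tokens(text, dictionary=None):
    """Lowercase and split on whitespace, optionally mapping each word through a dictionary.
    The result is later treated as a bag of words, so the word-by-word output need not be grammatical."""
    words = text.lower().split()
    if dictionary:
        words = [dictionary.get(w, w) for w in words]
    return words


def term_document_matrix(docs):
    """Build a raw-count term-document matrix (rows = terms, columns = documents)."""
    vocab = sorted({w for d in docs for w in d})
    index = {w: i for i, w in enumerate(vocab)}
    A = np.zeros((len(vocab), len(docs)))
    for j, d in enumerate(docs):
        for w, c in Counter(d).items():
            A[index[w], j] = c
    return A, vocab


def lsa_doc_vectors(A, k):
    """Truncated SVD A ~ U_k S_k V_k^T; each row of V_k S_k represents one document."""
    _, s, Vt = np.linalg.svd(A, full_matrices=False)
    k = min(k, len(s))
    return Vt[:k].T * s[:k]


def cosine(u, v):
    """Cosine similarity, returning 0.0 if either vector is all zeros."""
    nu, nv = np.linalg.norm(u), np.linalg.norm(v)
    return float(u @ v / (nu * nv)) if nu and nv else 0.0


def best_match(query, candidates, k=2):
    """Score each candidate against the query in the LSA space and return (best index, scores)."""
    A, _ = term_document_matrix([query] + candidates)
    vecs = lsa_doc_vectors(A, k)
    scores = [cosine(vecs[0], vecs[j]) for j in range(1, len(candidates) + 1)]
    return int(np.argmax(scores)), scores


if __name__ == "__main__":
    query = tokens("kucing makan ikan", TOY_DICT)  # "the cat eats fish"
    candidates = [tokens("the dog chase the ball"), tokens("the cat eat the fish")]
    idx, scores = best_match(query, candidates, k=2)
    print(idx, scores)  # candidate 1 (the cat sentence) should rank highest
```

Shuffling the words of a candidate leaves its column of the term-document matrix, and therefore its score, unchanged. That property is what makes a bag-of-words model indifferent to the syntactic differences between the two languages. A practical detector would also need preprocessing such as stemming and stop-word handling, and a latent space built from a larger corpus. The paper's own modifications to LSA are not reproduced here.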

Author Information
Anak Agung Putri Ratna, Universitas Indonesia, Indonesia
Emily Lomempow, Universitas Indonesia, Indonesia
Prima Dewi Purnamasari, Universitas Indonesia, Indonesia
Untung Yuwono, Universitas Indonesia, Indonesia
Boma Anantasatya Adhi, Universitas Indonesia, Indonesia

Category: Education and Technology: Teaching, Learning, Technology and Education Support
