PLAGIARISM DETECTION IN ENGLISH-LANGUAGE NOVELS USING AUTHORSHIP ATTRIBUTION BASED ON STYLOMETRY AND SUPPORT VECTOR MACHINE (SVM)

Mey Rini Rz., Badieah

Abstract


Plagiarism in English-language novels is not limited to direct copying; it also includes imitation of an author's writing style (paraphrase plagiarism). This study develops a detection system based on authorship attribution that combines stylometry, a Support Vector Machine (SVM), and Sentence-BERT (SBERT). The data, 15 novels by five classic authors, are preprocessed and split into chunks of 1000, 5000, and 10000 words. In testing, the SVM reaches an accuracy of 84.38% (1000 words), 82.50% (5000 words), and a peak of 90.48% (10000 words). Jane Austen is consistently easy to recognize, with an F1-score of 0.90, while Mary Shelley improves markedly on long texts (recall 1.00). The SBERT analysis yields semantic similarity scores of 0.55–0.63, with the highest value again for Austen (0.63). Integrating SVM and SBERT proves complementary: stylometry is effective at recognizing style, whereas SBERT captures similarity of meaning. The system can therefore detect plagiarism more accurately and comprehensively.

Keywords: Plagiarism, Stylometry, Authorship Attribution, SVM, SBERT
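
The chunking and stylometric feature extraction described in the abstract can be illustrated with a minimal sketch. The abstract does not list the study's actual preprocessing rules or feature set, so everything here is an assumption made for illustration: the function names `chunk_words` and `stylometric_features`, the tokenization rule, the short function-word list, and the three feature types are all hypothetical.

```python
import re
from collections import Counter

# Hypothetical function-word list, chosen only for illustration;
# the study's real feature set is not given in the abstract.
FUNCTION_WORDS = ("the", "and", "of", "to", "a", "in", "that", "it")


def chunk_words(text, size):
    """Split text into consecutive chunks of exactly `size` words.

    Tokenization (lowercased runs of letters and apostrophes) is an
    assumption. A trailing remainder shorter than `size` is dropped.
    """
    words = re.findall(r"[A-Za-z']+", text.lower())
    return [words[i:i + size] for i in range(0, len(words) - size + 1, size)]


def stylometric_features(words):
    """Compute a few simple style features for one non-empty chunk."""
    counts = Counter(words)
    n = len(words)
    features = {
        # Mean number of characters per word.
        "avg_word_len": sum(len(w) for w in words) / n,
        # Distinct words divided by total words (vocabulary richness).
        "type_token_ratio": len(counts) / n,
    }
    # Relative frequency of each function word in the chunk.
    for fw in FUNCTION_WORDS:
        features["freq_" + fw] = counts[fw] / n
    return features


if __name__ == "__main__":
    sample = "The cat and the dog sat down"
    for chunk in chunk_words(sample, 3):
        print(chunk, stylometric_features(chunk))
```

With chunk sizes such as 1000, 5000, or 10000 words, each chunk's feature dictionary could be turned into a numeric vector, labeled with its author, and passed to an SVM classifier (for example scikit-learn's `SVC`); that step is left out here to keep the sketch dependency-free.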



