PLAGIARISM DETECTION IN ENGLISH-LANGUAGE NOVELS USING AUTHORSHIP ATTRIBUTION BASED ON STYLOMETRY AND SUPPORT VECTOR MACHINE (SVM)
Abstract
Plagiarism in English-language novels is not limited to direct copying; it also includes imitation of an author's writing style (paraphrase plagiarism). This study develops a detection system based on authorship attribution that combines stylometry, a Support Vector Machine (SVM), and Sentence-BERT (SBERT). The data consist of 15 novels by five classic authors, which are preprocessed and split into chunks of 1000, 5000, and 10000 words. In testing, the SVM reached an accuracy of 84.38% (1000 words), 82.50% (5000 words), and a highest value of 90.48% (10000 words). Jane Austen was consistently easy to recognize, with an f1-score of 0.90, while results for Mary Shelley improved markedly on long texts (recall 1.00). The SBERT analysis produced semantic similarity scores of 0.55–0.63, with the highest value also for Austen (0.63). SVM and SBERT proved complementary: stylometry is effective at recognizing style, whereas SBERT captures similarity of meaning. Combining the two therefore lets the system detect plagiarism more accurately and comprehensively.
Keywords: Plagiarism, Stylometry, Authorship Attribution, SVM, SBERT
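The chunking and stylometric feature extraction described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the tokenizer, the short function-word list, the choice of features (mean word length, type-token ratio, function-word frequencies), and the decision to drop a trailing partial chunk are all assumptions made for illustration, since the abstract does not specify them.

```python
import re
from collections import Counter

# Hypothetical function-word list; the study's actual feature set is not given.
FUNCTION_WORDS = ("the", "and", "of", "to", "a", "in", "that", "it", "was", "i")


def tokenize(text):
    """Lowercase the text and return its word tokens (letters, optional apostrophe)."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def chunk_words(words, size):
    """Split a token list into consecutive chunks of exactly `size` tokens.

    Assumption: a trailing remainder shorter than `size` is dropped.
    """
    return [words[i:i + size] for i in range(0, len(words) - size + 1, size)]


def stylometric_features(chunk):
    """Return a small style vector for one non-empty chunk of tokens."""
    counts = Counter(chunk)
    n = len(chunk)
    features = [
        sum(len(w) for w in chunk) / n,  # mean word length
        len(counts) / n,                 # type-token ratio
    ]
    # Relative frequency of each function word in the chunk.
    features.extend(counts[w] / n for w in FUNCTION_WORDS)
    return features


if __name__ == "__main__":
    sample = "It was the best of times, it was the worst of times. " * 50
    chunks = chunk_words(tokenize(sample), 100)
    vectors = [stylometric_features(c) for c in chunks]
    print(len(chunks), "chunks,", len(vectors[0]), "features each")
```

In a full pipeline, one such vector per chunk, labeled with its author, would be passed to an SVM classifier, and the chunk size (1000, 5000, or 10000 words in the study) would be the `size` argument.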