Undergraduate Thesis (Skripsi)
LANGUAGE IDENTIFICATION IN TEXT USING THE LONG SHORT-TERM MEMORY (LSTM) METHOD
Language is the primary means of human communication, and the diversity of the world's languages reflects the diversity of cultures and identities. Language identification is therefore important for the development of communication technology and information processing. This research addresses language identification in text using the Long Short-Term Memory (LSTM) method, with Word2vec as the word embedding method. The main objective is to develop a system that can recognize and classify the language of a text with high accuracy. The dataset consists of 10,000 text samples across 10 language classes with 1,000 samples each: Arabic, Chinese, Dutch, English, French, Indonesian, Japanese, Korean, Russian, and Spanish. The dataset is split into 80% training data and 20% test data, and the hyperparameters are selected using random search. The best LSTM configuration found was a dropout of 0.3, a batch size of 32, 64 hidden units, a recurrent dropout of 0.2, and 15 epochs. Evaluated with a confusion matrix, the model achieved an average precision of 0.9859, recall of 0.9855, and F1-score of 0.9856, with an accuracy of 0.9856.
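The evaluation step described in the abstract can be illustrated with a minimal sketch that derives accuracy and averaged precision, recall, and F1-score from a confusion matrix. The abstract does not say which averaging scheme was used, so macro-averaging (an unweighted mean over classes) is an assumption here. The function name `macro_metrics` and the 3-class matrix are hypothetical and for illustration only; they are not the thesis code or its results.

```python
from typing import Dict, List


def macro_metrics(cm: List[List[int]]) -> Dict[str, float]:
    """Compute accuracy and macro-averaged precision, recall and F1.

    cm[i][j] is the number of samples whose true class is i and whose
    predicted class is j (rows = true labels, columns = predictions).
    Macro-averaging is an assumption; the source only says "average".
    """
    n = len(cm)
    total = sum(sum(row) for row in cm)
    correct = sum(cm[k][k] for k in range(n))

    precisions, recalls, f1s = [], [], []
    for k in range(n):
        tp = cm[k][k]
        predicted_k = sum(cm[i][k] for i in range(n))  # column sum
        actual_k = sum(cm[k])                          # row sum
        p = tp / predicted_k if predicted_k else 0.0
        r = tp / actual_k if actual_k else 0.0
        f = 2 * p * r / (p + r) if (p + r) else 0.0
        precisions.append(p)
        recalls.append(r)
        f1s.append(f)

    return {
        "accuracy": correct / total if total else 0.0,
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


# Hypothetical 3-class confusion matrix (illustrative numbers only).
toy_cm = [
    [9, 1, 0],
    [0, 10, 0],
    [1, 0, 9],
]
result = macro_metrics(toy_cm)
for name, value in result.items():
    print(f"{name}: {value:.4f}")
```

For the 10-class setting in the abstract, the same function would take a 10 x 10 matrix built from the held-out 20% test split.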
Inventory Code | Barcode | Call Number | Location | Status |
---|---|---|---|---|
2407000123 | T137345 | T1373452023 | Central Library (Reference) | Available but not for loan |
No other version available