Undergraduate Thesis (Skripsi)
SPAM CLASSIFICATION OF INDONESIAN-LANGUAGE EMAIL USING FASTTEXT AND BERNOULLI NAÏVE BAYES
Indonesia ranks sixth globally in the number of spam senders. Numerous studies have addressed spam detection and filtering, and Bayesian algorithms are among the most commonly used approaches. This study aims to classify Indonesian-language email messages as spam or non-spam. A secondary dataset of 2,604 messages was used, comprising 1,362 spam messages and 1,242 non-spam messages. Word representation was performed using FastText with an n-gram approach to capture sub-word information, while classification was carried out using the Bernoulli Naïve Bayes algorithm, which operates on binary feature values. The experiments compared the performance of Bernoulli Naïve Bayes with and without FastText. The data were split 70:30, and evaluation used accuracy, the confusion matrix, and the classification report. Both models, with and without FastText, achieved 95% accuracy. However, the model incorporating FastText demonstrated more balanced performance across classes and higher recall in detecting spam. In contrast, the model without FastText achieved perfect precision and recall for spam but showed decreased performance for non-spam. The use of FastText therefore contributes to improving the sensitivity and balance of spam email classification in the Indonesian language.
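The combination described in the abstract, sub-word n-grams feeding a classifier that works on binary values, can be illustrated with a minimal pure-Python sketch. It is not the thesis's pipeline: it does not use the FastText library or its learned embeddings, only the idea of character n-grams with `<` and `>` word-boundary markers. The class `TinyBernoulliNB`, the helper functions, the n-gram range (3 to 4), the Laplace smoothing value, and the four toy messages are all hypothetical choices made for illustration.

```python
import math

def char_ngrams(word, n_min=3, n_max=4):
    """Character n-grams of a word wrapped in '<' and '>' boundary markers."""
    wrapped = f"<{word}>"
    return {wrapped[i:i + n]
            for n in range(n_min, n_max + 1)
            for i in range(len(wrapped) - n + 1)}

def featurize(text):
    """Binary representation: the set of sub-word n-grams present in a message."""
    feats = set()
    for word in text.lower().split():
        feats |= char_ngrams(word)
    return feats

class TinyBernoulliNB:
    """Hypothetical minimal Bernoulli Naive Bayes over sets of present features."""

    def fit(self, docs, labels, alpha=1.0):
        self.classes = sorted(set(labels))
        self.vocab = set().union(*docs)
        self.log_prior = {}
        self.prob = {}
        for c in self.classes:
            class_docs = [d for d, y in zip(docs, labels) if y == c]
            self.log_prior[c] = math.log(len(class_docs) / len(docs))
            # Laplace-smoothed estimate of P(feature present | class).
            self.prob[c] = {
                f: (sum(f in d for d in class_docs) + alpha)
                   / (len(class_docs) + 2 * alpha)
                for f in self.vocab
            }
        return self

    def predict(self, doc):
        scores = {}
        for c in self.classes:
            score = self.log_prior[c]
            # Bernoulli model: absent features also contribute, via log(1 - p).
            for f in self.vocab:
                p = self.prob[c][f]
                score += math.log(p) if f in doc else math.log(1.0 - p)
            scores[c] = score
        return max(scores, key=scores.get)

# Toy training messages (invented for illustration, not from the thesis dataset).
train_texts = [
    "menang hadiah gratis klik sekarang",
    "promo gratis pulsa klik link",
    "rapat proyek besok pagi",
    "laporan proyek sudah dikirim",
]
train_labels = ["spam", "spam", "non-spam", "non-spam"]

model = TinyBernoulliNB().fit([featurize(t) for t in train_texts], train_labels)
pred_spam = model.predict(featurize("hadiah gratis untuk anda"))
pred_ham = model.predict(featurize("jadwal rapat proyek"))
print(pred_spam, pred_ham)
```

On this toy data the first test message is labeled spam and the second non-spam. Because features are sub-word n-grams, an unseen word can still share features with training words, which is the kind of sub-word information the abstract attributes to FastText.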
Inventory Code | Barcode | Call Number | Location | Status |
---|---|---|---|---|
2507005406 | T182739 | T1827392025 | Central Library (Reference) | Available, not for loan |