Undergraduate Thesis (Skripsi)
EXTRACTIVE QUESTION-ANSWERING SYSTEM FOR INDONESIAN-LANGUAGE TEXT BY FINE-TUNING INDOBERT
The abundance of digital information today makes extracting relevant information a major challenge, especially for Indonesian, a language with distinctive linguistic characteristics. To address this challenge, this study develops an extractive question-answering system for Indonesian text by fine-tuning the IndoBERT model, which enables the system to extract a span of a context paragraph as the answer to a given question. The dataset is the Indonesian translation of the Stanford Question Answering Dataset (SQuAD) 2.0, which contains more than 100,000 question-answer pairs drawn from Wikipedia articles. Fine-tuning was carried out in eight scenarios, covering every combination of dataset type (the full dataset, including unanswerable questions, and a modified dataset with all unanswerable questions removed), learning rate (2e-5 and 5e-5), and batch size (16 and 48). The model trained with a learning rate of 5e-5 and a batch size of 16 performed best. On the dataset with unanswerable questions, it achieved an exact match (EM) score of 60.57% and an F1 score of 70.84%; on the dataset without unanswerable questions, it achieved an EM score of 54.79% and an F1 score of 73.06%.
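The abstract describes the system as extracting a span of the context paragraph as the answer. The decoding step is not described here, so the following is only a rough sketch, assuming a standard BERT-style question-answering head that gives every token a score as a possible answer start and answer end, and assuming the common SQuAD 2.0 convention of treating the `[CLS]` token (index 0) as the "no answer" position. The function name `best_span`, the `max_answer_len` limit, and the toy tokens and logits are hypothetical illustrations, not taken from the thesis. Splitting long contexts into overlapping windows is also left out.

```python
from typing import List, Optional, Tuple


def best_span(
    start_logits: List[float],
    end_logits: List[float],
    context_start: int,
    context_end: int,
    max_answer_len: int = 30,
) -> Optional[Tuple[int, int]]:
    """Return the (start, end) token indices of the best-scoring answer span,
    or None when the "no answer" score at [CLS] (index 0) is higher.

    Only tokens in [context_start, context_end] (inclusive) may form an answer,
    so question tokens and special tokens are never selected.
    """
    # Score of predicting "no answer": start and end both at [CLS].
    null_score = start_logits[0] + end_logits[0]
    best_score = float("-inf")
    best: Optional[Tuple[int, int]] = None
    for s in range(context_start, context_end + 1):
        # The end must not precede the start, and the span is at most max_answer_len tokens.
        last_end = min(context_end, s + max_answer_len - 1)
        for e in range(s, last_end + 1):
            score = start_logits[s] + end_logits[e]
            if score > best_score:
                best_score = score
                best = (s, e)
    if best is None or null_score > best_score:
        return None
    return best


# Toy example (hypothetical): question "siapa penulis", context "buku ini ditulis oleh andi wijaya".
tokens = ["[CLS]", "siapa", "penulis", "[SEP]", "buku", "ini", "ditulis", "oleh", "andi", "wijaya", "[SEP]"]
start_logits = [0.0, -5.0, -5.0, -5.0, -1.0, -2.0, -1.0, 0.5, 4.0, 1.0, -5.0]
end_logits = [0.0, -5.0, -5.0, -5.0, -1.0, -2.0, -1.0, -1.0, 0.5, 3.5, -5.0]

span = best_span(start_logits, end_logits, context_start=4, context_end=9)
answer = "" if span is None else " ".join(tokens[span[0] : span[1] + 1])
print(span, repr(answer))  # (8, 9) 'andi wijaya'
```

Comparing the best span score directly against the `[CLS]` score is the simplest way to decide that a question is unanswerable; implementations often tune a threshold on the difference between the two instead.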
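The results in the abstract are reported as exact match (EM) and F1 scores. Below is a minimal sketch of these two metrics, modeled on the definitions in the SQuAD evaluation script: EM checks whether the normalized prediction equals a gold answer, and F1 measures token overlap between them, with the best score over a question's gold answers counted. An unanswerable question receives credit only for an empty prediction. The helper names and the toy predictions are hypothetical. Skipping the English article removal that the official script performs is an assumption for Indonesian text, and the thesis may normalize answers differently.

```python
import string
from collections import Counter
from typing import List, Tuple

_PUNCT = set(string.punctuation)


def normalize_answer(text: str) -> str:
    """Lowercase, drop ASCII punctuation, and collapse whitespace.

    The official SQuAD script also deletes the English articles 'a', 'an'
    and 'the'; that step is omitted here (an assumption for Indonesian).
    """
    text = "".join(ch for ch in text.lower() if ch not in _PUNCT)
    return " ".join(text.split())


def exact_match(prediction: str, gold: str) -> float:
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize_answer(prediction) == normalize_answer(gold))


def f1_score(prediction: str, gold: str) -> float:
    """Harmonic mean of token-level precision and recall."""
    pred_toks = normalize_answer(prediction).split()
    gold_toks = normalize_answer(gold).split()
    # Empty gold (unanswerable) or empty prediction: full credit only if both are empty.
    if not pred_toks or not gold_toks:
        return float(pred_toks == gold_toks)
    overlap = sum((Counter(pred_toks) & Counter(gold_toks)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_toks)
    recall = overlap / len(gold_toks)
    return 2 * precision * recall / (precision + recall)


def evaluate(predictions: List[str], gold_answers: List[List[str]]) -> Tuple[float, float]:
    """Return (EM %, F1 %) averaged over all questions.

    An empty gold list marks an unanswerable question and is scored against "".
    """
    em_total = f1_total = 0.0
    for pred, golds in zip(predictions, gold_answers):
        golds = golds or [""]
        em_total += max(exact_match(pred, g) for g in golds)
        f1_total += max(f1_score(pred, g) for g in golds)
    n = len(predictions)
    return 100.0 * em_total / n, 100.0 * f1_total / n


# Toy example (hypothetical): one exact answer, one partially correct answer, one unanswerable question.
predictions = ["Andi Wijaya", "penulis Andi Wijaya", ""]
gold_answers = [["Andi Wijaya"], ["Andi Wijaya"], []]
em, f1 = evaluate(predictions, gold_answers)
print(f"EM = {em:.2f}%, F1 = {f1:.2f}%")  # EM = 66.67%, F1 = 93.33%
```

The second prediction shows why the two metrics can diverge: it fails EM because of one extra token, but it still earns an F1 of 0.8 from the token overlap.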
Inventory Code | Barcode | Call Number | Location | Status |
---|---|---|---|---|
2507002880 | T173583 | T1735832025 | Central Library (REFERENCE) | Available (not for loan) |