Undergraduate Thesis (Skripsi)
TRANSFORMER-BASED CLINICAL NAMED ENTITY RECOGNITION MODEL FOR BIOMEDICAL DATA
Clinical Named Entity Recognition (CNER) is a critical task in natural language processing (NLP) aimed at extracting medical entities from complex biomedical texts. The main challenges in this task lie in the complexity of sentence structures and the highly variable medical terminology. This study focuses on the development and evaluation of CNER models based on the Transformer architecture, specifically BERT, to improve understanding and accuracy in recognizing medical entities in biomedical data. Two BERT-Base models were developed in this research: EMR-BERT and PubMed2M-BERT. EMR-BERT is a customized model with eight encoder layers trained directly through fine-tuning. In contrast, PubMed2M-BERT was produced by continued pre-training of BERT-Base Uncased on the ViPubMed biomedical corpus, using the Masked Language Modeling (MLM) objective without Next Sentence Prediction (NSP). Pre-training yielded a perplexity of 2.964 and a stable loss curve. In the fine-tuning phase, PubMed2M-BERT achieved the highest F1-score of 92% on the NCBI-disease dataset, outperforming EMR-BERT, which achieved 87%. These findings demonstrate that domain-specific pre-training can significantly enhance the performance of Transformer models on CNER tasks over biomedical data.
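For readers who want a concrete picture of the two models, a minimal sketch of how a reduced-depth BERT such as EMR-BERT could be instantiated with the HuggingFace transformers library follows; only the eight-layer depth comes from the abstract, while every other dimension is a BERT-Base default rather than a setting reported in the thesis.

```python
# Hypothetical reconstruction: an 8-encoder-layer BERT for token
# classification, fine-tuned directly (no continued pre-training).
from transformers import BertConfig, BertForTokenClassification

config = BertConfig(
    num_hidden_layers=8,    # 8 encoder layers, vs. 12 in BERT-Base (from the abstract)
    hidden_size=768,        # BERT-Base defaults from here on (assumed)
    num_attention_heads=12,
    num_labels=3,           # O / B-Disease / I-Disease for NCBI-disease
)
emr_bert = BertForTokenClassification(config)  # randomly initialized weights
print(emr_bert.config.num_hidden_layers)       # -> 8
```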
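The continued pre-training step for PubMed2M-BERT can be sketched as below, assuming the HuggingFace transformers and datasets stack; the corpus file names, batch size, and epoch count are illustrative placeholders, not the thesis's hyperparameters. Loading a masked-LM head only (no NSP head) matches the MLM-without-NSP objective, and perplexity is recovered as the exponential of the held-out cross-entropy loss, so the reported perplexity of 2.964 corresponds to a mean MLM loss of about ln(2.964) ≈ 1.09.

```python
# Sketch of continued MLM pre-training of BERT-Base Uncased (no NSP head).
import math
from datasets import load_dataset
from transformers import (
    AutoModelForMaskedLM, AutoTokenizer,
    DataCollatorForLanguageModeling, Trainer, TrainingArguments,
)

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")  # MLM head only

# Hypothetical local dump of the biomedical corpus, one text per line.
raw = load_dataset("text", data_files={"train": "vipubmed_train.txt",
                                       "eval": "vipubmed_eval.txt"})
tokenized = raw.map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=512),
    batched=True, remove_columns=["text"],
)

# Dynamic masking: 15% of tokens are masked anew each time a batch is drawn.
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="pubmed2m-bert",
                           per_device_train_batch_size=16,  # illustrative
                           num_train_epochs=1),             # illustrative
    data_collator=collator,
    train_dataset=tokenized["train"],
    eval_dataset=tokenized["eval"],
)
trainer.train()

# Perplexity = exp(mean cross-entropy loss) on the held-out split.
print(f"perplexity = {math.exp(trainer.evaluate()['eval_loss']):.3f}")
```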
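Downstream evaluation compares entity-level F1, which the seqeval library computes over whole BIO-tagged spans rather than individual tokens. The sketch below assumes the NCBI-disease BIO tag set and a fast (Rust-backed) tokenizer for subword-to-word alignment; the checkpoint path "pubmed2m-bert" refers to the hypothetical output of the previous sketch, and the example sentence and gold tags are invented for illustration.

```python
# Sketch of token-classification inference and entity-level F1 scoring.
import torch
from seqeval.metrics import f1_score
from transformers import AutoModelForTokenClassification, AutoTokenizer

labels = ["O", "B-Disease", "I-Disease"]          # NCBI-disease BIO tag set
id2label = dict(enumerate(labels))

tokenizer = AutoTokenizer.from_pretrained("pubmed2m-bert")   # hypothetical path
model = AutoModelForTokenClassification.from_pretrained(
    "pubmed2m-bert", num_labels=len(labels), id2label=id2label,
)  # classifier head is freshly initialized and learned during fine-tuning

words = ["Mutations", "cause", "cystic", "fibrosis", "."]    # illustrative input
enc = tokenizer(words, is_split_into_words=True, return_tensors="pt")
with torch.no_grad():
    pred_ids = model(**enc).logits.argmax(-1)[0].tolist()

# Keep the prediction of each word's first subword; skip special tokens.
pred_tags, seen = [], set()
for pos, wid in enumerate(enc.word_ids()):
    if wid is not None and wid not in seen:
        seen.add(wid)
        pred_tags.append(id2label[pred_ids[pos]])

gold_tags = ["O", "O", "B-Disease", "I-Disease", "O"]        # illustrative gold
print(f1_score([gold_tags], [pred_tags]))  # span-level F1, the metric reported above
```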
| Inventory Code | Barcode | Call Number | Location | Status |
|---|---|---|---|---|
| 2507003419 | T175807 | T1758072025 | Central Library (Reference) | Available (Not for Loan) |