EVALUATING THE EFFECTIVENESS OF LARGE LANGUAGE MODELS (LLMS) VERSUS MACHINE LEARNING (ML) IN IDENTIFYING AND DETECTING PHISHING EMAIL ATTEMPTS
Publisher
An-Najah National University
Abstract
Phishing emails remain a major and growing cybersecurity threat in online communication. Despite advanced email security procedures and protocols and specialized filters for detecting malicious content, problems with monitoring and false positives persist, resulting in incorrect detections. As phishing emails become more sophisticated, they often bypass conventional filters. This study investigates phishing detection based on both the textual content of emails and their embedded URLs, with the aim of comparing the effectiveness of traditional machine learning (ML) models and transformer-based large language models (LLMs). Several ML classifiers (Random Forest, Logistic Regression, Support Vector Machine, Naive Bayes, Gradient Boosting, Decision Tree, and K-Nearest Neighbors) were trained on both balanced and imbalanced datasets, and several transformer-based LLMs (DistilBERT, ALBERT, BERT-Tiny, ELECTRA, MiniLM, and RoBERTa) were fine-tuned for the same task. To make the evaluation more realistic, phishing emails were generated using ChatGPT-5.1.
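To illustrate one of the classical classifiers listed above, the following is a minimal multinomial Naive Bayes text classifier in plain Python. It is not the study's implementation. The tokenizer, the add-one (Laplace) smoothing, the names `tokenize` and `NaiveBayesSketch`, and the toy training emails are assumptions chosen for illustration; a practical pipeline would typically rely on an established library and richer features such as TF-IDF.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Assumption: lowercase and split on any non-alphanumeric character.
    return re.findall(r"[a-z0-9]+", text.lower())


class NaiveBayesSketch:
    """Hypothetical minimal multinomial Naive Bayes with add-one smoothing."""

    def fit(self, texts, labels):
        self.classes = sorted(set(labels))
        self.word_counts = {c: Counter() for c in self.classes}
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        # Vocabulary = every token seen in any class.
        self.vocab = set()
        for counts in self.word_counts.values():
            self.vocab.update(counts)
        class_totals = Counter(labels)
        n = len(labels)
        self.log_prior = {c: math.log(class_totals[c] / n) for c in self.classes}
        self.total_words = {c: sum(self.word_counts[c].values()) for c in self.classes}
        return self

    def predict(self, text):
        v = len(self.vocab)
        best_class, best_score = None, -math.inf
        for c in self.classes:
            score = self.log_prior[c]
            for tok in tokenize(text):
                # Add-one smoothing keeps unseen words from zeroing the likelihood.
                score += math.log((self.word_counts[c][tok] + 1) / (self.total_words[c] + v))
            if score > best_score:
                best_class, best_score = c, score
        return best_class


# Toy, made-up training data (not from the study's datasets).
train_texts = [
    "verify your account password now urgent",
    "click here to claim your prize",
    "meeting agenda for tomorrow attached",
    "lunch on friday with the team",
]
train_labels = ["phishing", "phishing", "legitimate", "legitimate"]
model = NaiveBayesSketch().fit(train_texts, train_labels)
print(model.predict("urgent verify your password"))
```

Because `tokenize` splits on non-alphanumeric characters, the same sketch could in principle be applied to URL strings as well, although URL classifiers commonly use dedicated features (length, domain parts, special characters) instead.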
The results show strong performance for both approaches. Random Forest achieved the highest ML accuracy, at 99.47% for email detection and 98.81% for URL prediction. DistilBERT and ALBERT performed similarly (98.44% for emails and 99.79% for URLs), while BERT-Tiny ranked last among the transformer models, although it still achieved an acceptable level of effectiveness. Among the ML models, Gradient Boosting had the lowest accuracy. Dataset balancing somewhat improved ML performance but slightly reduced the accuracy of the LLMs, indicating that the LLMs are sensitive to class distribution.
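The abstract does not state how the datasets were balanced. Assuming random oversampling of the minority class (one common option among several, alongside undersampling or synthetic methods such as SMOTE), a minimal sketch of balancing and of the accuracy metric used to report the results might look like the following. The function names `random_oversample` and `accuracy` and the toy labels are hypothetical. Balancing is normally applied to the training split only, so that duplicated samples do not leak into the evaluation data.

```python
import random
from collections import Counter


def random_oversample(samples, labels, seed=0):
    """Duplicate randomly chosen samples of each smaller class until every
    class has as many samples as the largest class."""
    rng = random.Random(seed)  # fixed seed for reproducibility
    counts = Counter(labels)
    target = max(counts.values())
    out_x, out_y = list(samples), list(labels)
    for cls, n in counts.items():
        pool = [x for x, y in zip(samples, labels) if y == cls]
        for _ in range(target - n):
            out_x.append(rng.choice(pool))
            out_y.append(cls)
    return out_x, out_y


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and equal in length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy imbalanced training split: four legitimate emails, one phishing email.
emails = ["e1", "e2", "e3", "e4", "e5"]
labels = ["legitimate"] * 4 + ["phishing"]
bal_x, bal_y = random_oversample(emails, labels)
print(Counter(bal_y))
print(accuracy(["phishing", "legitimate"], ["phishing", "phishing"]))
```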
In general, ML models remain computationally efficient, whereas LLMs are better at capturing complex linguistic patterns. Combining both strategies in a hybrid or ensemble approach may further improve the effectiveness of phishing detection.
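One possible form of the hybrid or ensemble approach suggested above is soft voting: each model outputs a phishing probability, and the final decision thresholds a weighted average of those probabilities. The sketch below assumes such probabilities are already available, for example from a Random Forest and a fine-tuned DistilBERT model. The function `soft_vote`, the equal weights, and the 0.5 threshold are illustrative assumptions, not a method evaluated in the study.

```python
from typing import Sequence, Tuple


def soft_vote(
    probs: Sequence[float],
    weights: Sequence[float],
    threshold: float = 0.5,
) -> Tuple[str, float]:
    """Hypothetical hybrid rule: weighted average of per-model phishing
    probabilities, followed by a threshold decision."""
    if not probs or len(probs) != len(weights):
        raise ValueError("probs and weights must be non-empty and equal in length")
    if any(w < 0 for w in weights) or sum(weights) <= 0:
        raise ValueError("weights must be non-negative with a positive sum")
    # Normalizing by the weight sum keeps the combined score in [0, 1].
    score = sum(p * w for p, w in zip(probs, weights)) / sum(weights)
    label = "phishing" if score >= threshold else "legitimate"
    return label, score


# Example: the ML model is fairly confident, the LLM is less so.
label, score = soft_vote([0.9, 0.4], [1.0, 1.0])
print(label, round(score, 2))
```

Weighting lets a cheaper ML model and a stronger but costlier LLM contribute unequally; in practice the weights and threshold would need to be tuned on validation data.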