EVALUATING THE EFFECTIVENESS OF LARGE LANGUAGE MODELS (LLMS) VERSUS MACHINE LEARNING (ML) IN IDENTIFYING AND DETECTING PHISHING EMAIL ATTEMPTS

Abbas, Linda

Date

2026-01-28

Authors

Abbas, Linda

Publisher

An-Najah National University

Abstract

Phishing emails remain a major and growing cybersecurity threat in online communication. Despite advanced email security procedures and protocols, and specialized filters for detecting malicious content, problems with monitoring and false positives persist and lead to incorrect detections. As phishing emails become more complex, they often bypass conventional filters. This study investigates phishing detection based on email textual content and embedded URLs, comparing the effectiveness of traditional machine learning (ML) models and transformer-based large language models (LLMs). Several ML classifiers, including Random Forest, Logistic Regression, Support Vector Machine, Naive Bayes, Gradient Boosting, Decision Tree, and K-Nearest Neighbors, were trained on both balanced and imbalanced datasets. Several transformer-based LLMs, including DistilBERT, ALBERT, BERT-Tiny, ELECTRA, MiniLM, and RoBERTa, were fine-tuned on the same task. To increase the realism of the evaluation, phishing emails were generated using ChatGPT-5.1. Both approaches performed robustly. Random Forest achieved the best ML accuracy, at 99.47% for email detection and 98.81% for URL prediction. DistilBERT and ALBERT performed similarly (98.44% for emails and 99.79% for URLs), while BERT-Tiny ranked last among the transformer models, although it still achieved an acceptable level of effectiveness. Among the ML models explored, Gradient Boosting showed the lowest accuracy. Dataset balancing improved ML performance somewhat but slightly reduced LLM accuracy, indicating that the LLMs are sensitive to class distribution. In general, the ML models remained computationally efficient, whereas the LLMs were better at capturing complex linguistic patterns. Combining both strategies in a hybrid or ensemble approach may further improve the effectiveness of phishing detection.

URI

https://hdl.handle.net/20.500.11888/20893

Collections

Natural Sciences
