EXPLORING THE CAPACITY AND PERFORMANCE OF SUPERVISED LEARNING METHODS FOR LABEL CLASSIFICATION IN CAUSAL INFERENCE: A COMPARATIVE STUDY
Date
2024-08-08
Authors
Abu Saqer, Ola
Publisher
An-Najah National University
Abstract
Machine learning is increasingly prevalent because of its predictive accuracy and its
ability to handle vast amounts of data. At the same time, many relationships in life are
causal, which motivates efforts to understand the cause-and-effect relationships among
variables. For instance, it is crucial to understand how much a particular medicine
affects an individual with an illness. While this might seem straightforward at first
glance, a deeper examination reveals the complexity inherent in using machine learning
for causal questions. Machine learning methods have made a valuable contribution to the
field of causal inference because, unlike traditional approaches, they offer greater
flexibility in estimating causal effects: they do not require modelling hypotheses. Yet
estimating causal effects when both treatment and outcome are binary variables remains
an open research area, because machine learning has proven its ability to predict, and
prediction does not imply causality. This is perhaps the central challenge for machine
learning in obtaining more accurate and less biased estimates of causal effects.
This study conducts a comparative analysis of supervised learning methods for label classification in causal inference. We evaluate the performance and capacity of four techniques, Causal Forest (CF), Support Vector Machine (SVM), Generalized Linear Models
(GLM), and Linear Probability Models (LPM), in estimating causal effects for a categorical response variable. A randomized controlled trial simulation and experiments on real data were performed to evaluate the methods' performance under varying conditions, by
changing the main characteristics of the data, including the sample size and the number
of explanatory variables.
We focus on these four methods because of their specific advantages: Causal
Forests are particularly well suited to causal inference; Support Vector Machines
are recognized for their effectiveness in binary classification tasks; Generalized
Linear Models are well established for modelling a binary response variable;
and Linear Probability Models are used for their ability to provide predictions as
probabilities.
The results provide valuable insights into the strengths and limitations of each method in
each scenario of the causal effects simulation study. Furthermore, the methods are able
to detect heterogeneity in the real data results, and it was expected that SVM, GLM and
LPM would detect more heterogeneity than Causal Forest. This thesis improves
our knowledge of machine learning techniques in causal inference and emphasizes the
importance of carefully evaluating their performance in real-world applications.