An-Najah National University 

Electrical and Telecommunications Engineering 

Department 

 
GRADUATION PROJECT 1 

 
"Your Whisper Key-Sound ID Unlock" 

 
Submitted in partial fulfillment of the requirements 

for the Bachelor's degree in Electrical and 

Telecommunications Engineering 

 
Supervisor: Dr. Falah Hasan. 

 
Students' Names: 

Jana Nazem Hanini (12011091) 

Noor AlDeen Moneer Mahameed (12010248) 

Academic year: 2024/2025 



 
❖ DEDICATION 

In the name of God, the Most Gracious, the Most Merciful 

"And that there is nothing for man except what he strives for, and that his effort will be seen." (Qur'an, An-Najm 53:39-40) 

To the homeland... 

To you we dedicate this effort. Yours is the first love and the eternal belonging. This is the fruit of knowledge and striving, which we place in your hands with all pride and loyalty. 

To our dear families... 

You were the first pillar and the sincere prayer at every moment. You are the root and the continuation; to you and from you is this achievement. 

To our friends and companions on the road... 

You were the light in the darkness of the path, and the hand that held us when we nearly stumbled. Thank you for the hearts that stayed with us. 

To our esteemed teachers... 

You carried the torch of knowledge and walked with it toward us with patience and nobility. You were the first letter in every achievement we write today. 

This work does not bear our names alone; it carries the fingerprints of those who loved us and those who believed in us. 

To all of you, our deepest gratitude and appreciation. 

Engineer Jana Nazem Hanini 

Engineer Noor AlDeen Moneer Mahameed 



 
❖ DISCLAIMER 
 

This report was written by students at the Electrical and 

Telecommunications Engineering Department, Faculty of 

Engineering, An-Najah National University.  It has not been altered 

or corrected, other than editorial corrections, as a result of 

assessment and it may contain language as well as content errors. 

The views expressed in it together with any outcomes and 

recommendations are solely those of the students. An-Najah 

National University accepts no responsibility or liability for the 

consequences of this report being used for a purpose other than the 

purpose for which it was commissioned. 
 


 
❖ ACKNOWLEDGMENT 

 
- First and foremost, we thank God for the strength, patience, and 

success that have brought us to this point and enabled us to complete 

this project to the fullest. 

 
- We extend our deepest gratitude to our dear families, who have 

supported us every step of the way. Without them, this 

achievement would not have been possible. 

 
- We express our gratitude to our supervisor, Dr. Falah Hassan, for 

his time, effort, and guidance. 

 
- We also thank all our esteemed teachers for answering our questions 

whenever we needed them. 

 
- Finally, to our dear friends, who have been our greatest support. 

Thank you for everything. 

 

 
❖ TABLE OF CONTENTS 

 
CHAPTER 1: INTRODUCTION
1.1   Statement of the problem
1.2   Objectives of the work
1.3   Scope of the work
1.4   Significance of the work
1.5   Organization of the report

CHAPTER 2: LITERATURE REVIEW

CHAPTER 3: METHODOLOGY
3.1   Intelligent Control Systems
3.1.1   Definition of intelligent control systems
3.1.2   Features of intelligent control systems
3.2   Biometric systems
3.2.1   Definition of Biometric systems
3.2.2   Types of Biometric Features
3.2.3   Comparison of Biometric Systems: Fingerprint, Face, and Voice Recognition
3.2.4   Why Is Voice Recognition the Best Option?
3.2.5   Speech Recognition vs Speaker Recognition
3.2.6   Applications of Speech Recognition
3.2.7   Applications of Speaker Recognition
3.2.8   Components of the Speaker Recognition System
3.3   Formants
3.3.1   Definition of the Formants
3.3.2   Basic Definition
3.3.3   Acoustic & Anatomical Basis
3.3.4   The number and positions of formants depend on
3.3.5   Key Formants (F1, F2, F3...)
3.4   Mel-Frequency Cepstral Coefficients (MFCCs)
3.4.1   Definition of MFCC
3.4.2   Concept Summary
3.4.3   MFCC Extraction Steps
3.4.4   MFCCs vs Formants
3.5   Speaker Recognition as a Biometric Identity System: Training and Testing Phases
3.6   Gaussian Mixture Model (GMM)
3.6.1   Definition of GMM
3.6.2   Advantages of GMM
3.6.3   Parameters
3.6.4   The GMM work in voice Recognition
3.7   The pitch period
3.7.1   Definition of the pitch period
3.7.2   Technical Definition
3.7.3   The Importance of Pitch in Voice Recognition
3.7.4   Approximate Pitch Period Values for Different Groups
3.7.5   Explanation
3.8   Noise Handling and Signal Processing Techniques
3.8.1   Band-Pass Filtering
3.8.2   Hamming Windowing
3.8.3   Signal Normalization
3.8.4   Summary Table of Noise Handling Techniques

CHAPTER 4: RESULTS AND ANALYSIS
4.1   The principle of code operation
4.2   Code Result for an 11-year-old child
4.3   Code Result for Female
4.4   Code Result for Male

CHAPTER 5: DISCUSSION

CHAPTER 6: CONCLUSIONS AND RECOMMENDATION
6.1   Conclusions
6.2   Recommendation
6.3   Future works

SWOT Analysis

References

 

 
❖ LIST OF FIGURES 

Figure 1-3: Speaker Recognition (Training and Testing Phases).
Figure 2-3: The GMM work in voice Recognition.
Figure 3-3: Speaker Recognition Process.
Figure 3-4: The child's Original Audio in the three Attempts.
Figure 4-4: The child's Normalized Audio in the three Attempts.
Figure 5-4: The child's MFCC comparison across the three Attempts.
Figure 6-4: The child's Pitch Period and Fundamental Frequency value in the three Attempts.
Figure 7-4: The child's Formant Analysis Result in the three Attempts.
Figure 8-4: Female Original Audio in the three Attempts.
Figure 9-4: Female Normalized Audio in the three Attempts.
Figure 10-4: Female MFCC comparison across the three Attempts.
Figure 11-4: Female Pitch Period and Fundamental Frequency value in the three Attempts.
Figure 12-4: Female Formant Analysis Results in the three Attempts.
Figure 13-4: Male Original Audio in the three Attempts.
Figure 14-4: Male Normalized Audio in the three Attempts.
Figure 15-4: Male MFCC comparison across the three Attempts.
Figure 16-4: Male Pitch Period and Fundamental Frequency value in the three Attempts.
Figure 17-4: Male Formant Analysis Results in the three Attempts.
 


 
❖ LIST OF TABLES 

 
Table 1-3: Comparison of Biometric Systems.
Table 2-3: Speech Recognition vs Speaker Recognition.
Table 3-3: Key Formants.
Table 4-3: MFCCs vs Formants.
Table 5-3: Approximate Pitch Period Values for Different Groups.
Table 6-3: Summary of Noise Handling Techniques.
 


 
❖ LIST OF ABBREVIATIONS 

 
MFCC      Mel-Frequency Cepstral Coefficient
CNN       Convolutional Neural Network
RNN       Recurrent Neural Network
SVM       Support Vector Machine
IoT       Internet of Things
PID       Proportional Integral Derivative
3D        Three-Dimensional
ASR       Automatic Speech Recognition
NLP       Natural Language Processing
FFT       Fast Fourier Transform
DCT       Discrete Cosine Transform
ID        Identification
GMM       Gaussian Mixture Model
EM        Expectation-Maximization
MAP       Maximum A Posteriori
RAP       Relative Average Perturbation
APQ       Amplitude Perturbation Quotient
NHR       Noise-to-Harmonics Ratio
SPI       Soft Phonation Index
SNR       Signal-to-Noise Ratio

 

 
❖ ABSTRACT 

This project presents an intelligent access control system based on 

voice recognition of specific individuals, enhancing security and 

facilitating access by relying on voice prints instead of traditional 

passwords. The system stores the unique voice characteristics of 

authorized individuals. A speaker's voice is then recorded in real 

time and compared with the stored voice prints, and the door opens 

immediately if a match is found. 

The system is designed to maintain recognition accuracy even in 

noisy environments. It relies on signal processing and statistical 

modeling algorithms to distinguish the authorized person's voice 

from a range of different voices, making it suitable for use in 

real-world situations requiring a high level of security. 

The project reflects the integration of voice control technologies and 

embedded systems, highlighting the significant potential of voice 

recognition technologies in building smart solutions and controlling 

devices. 

 

 
❖ CHAPTER 1: INTRODUCTION 

1.1   Statement of the problem. 

Traditional door security systems, such as physical keys, keypads, and access 

cards, are vulnerable to loss, theft, duplication, or unauthorized access. These 

methods also lack flexibility and may not provide adequate security in modern 

smart environments. Therefore, there is a need for a more intelligent and 

secure access control system that uses biometric features, specifically voice 

recognition, to ensure that only authorized individuals can unlock and access 

secured areas. This project aims to address this issue by designing a voice-

controlled door lock system that offers improved security, convenience, and 

reliability. 

 
1.2   Objectives of the work. 

The main objective of this project is to develop a secure and user-friendly 

voice-controlled door lock system that can identify and authenticate specific 

authorized users based on their voice. The detailed objectives include: 

• Designing and implementing a voice recognition module capable of 

identifying pre-registered voices. 

• Integrating the voice module with a microcontroller to control the door 

lock mechanism. 

• Ensuring system reliability and accuracy under various environmental 

conditions. 

• Enhancing user convenience by enabling hands-free, keyless access. 

1.3   Scope of the work. 

The scope of this project includes the design and implementation of a voice-

controlled door locking system that recognizes specific, pre-recorded voice 

inputs. The system will be built using a microcontroller, a voice recognition 

module, and an electronic locking mechanism. The project is restricted to 

a small number of authorized users, and the impact of environmental 

factors such as noise, echo, and overlapping voices will be tested within 

pre-defined limits. 



 
1.4   Significance of the work. 
 

The proposed voice lock system provides a modern, secure, and contactless 

alternative to traditional entry methods such as keys, cards, and passwords. 

The project's significance lies in its reliance on voice recognition, a unique 

biometric feature that is difficult to replicate, enhancing security. The project 

also aligns with the growing demand for smart and automated systems in 

homes and workplaces. Additionally, this system offers a practical and 

easy-to-use solution for people with disabilities who may encounter difficulty with 

traditional locks. By reducing physical interaction and improving access 

control, the system contributes to a safer and smarter living environment. 

 
1.5   Organization of the report. 

This report is organized into six main chapters to provide a clear and systematic 

presentation of the project.  

Chapter 1 introduces the project by outlining the problem statement, objectives, 

scope, significance, and the structure of the report. 

 Chapter 2 presents a comprehensive literature review, discussing previous work 

related to voice recognition, biometric systems, and the technologies used.  

Chapter 3 explains the methodology, including the system design, biometric 

techniques, feature extraction processes, and noise handling methods.  

Chapter 4 details the results and analysis, showcasing the system’s performance 

with different users and discussing the extracted features.  

Chapter 5 provides a discussion of the key findings, challenges faced, and the 

effectiveness of the proposed system.  

Chapter 6 concludes the report with a summary of the outcomes, recommendations 

for improvement, and suggestions for future work. Additionally, a SWOT analysis 

is included to evaluate the strengths, weaknesses, opportunities, and threats of the 

project. Finally, references are provided to acknowledge the sources used 

throughout the research. 



 
❖ CHAPTER 2: LITERATURE REVIEW 
 

Voice recognition technology has gained significant attention in recent years due 

to its growing applications in security, smart systems, and human-machine 

interaction. Researchers have continuously sought to improve the accuracy, 

reliability, and noise tolerance of voice-based systems, particularly for access 

control. Traditional security methods, such as physical keys, PIN codes, and 

fingerprint sensors, have shown limitations in terms of convenience, vulnerability 

to theft, and environmental constraints. As a result, biometric systems, especially 

those based on voice recognition, have emerged as a promising alternative due to 

their non-contact nature and user-friendliness. 

Previous studies have demonstrated that voice recognition can serve as a reliable 

biometric identifier when combined with appropriate signal processing 

techniques. For example, the use of Mel-Frequency Cepstral Coefficients 

(MFCCs) and Gaussian Mixture Models (GMMs) has been widely adopted to 

enhance the precision of speaker identification. However, early implementations 

struggled with noise interference and variations in voice due to environmental 

factors or user behavior. To address these issues, modern approaches integrate 

intelligent control systems and machine learning algorithms, such as 

Convolutional Neural Networks (CNNs) and Recurrent Neural Networks 

(RNNs), to improve system adaptability and performance. 

Moreover, the comparison between different biometric modalities such as 

fingerprint, facial, and voice recognition reveals that voice recognition offers 

unique advantages, particularly in accessibility for people with disabilities and 

its compatibility with hands-free smart environments. Researchers have also 

distinguished between speech recognition, which focuses on understanding 

spoken commands, and speaker recognition, which identifies the person 

speaking. The latter is the core focus of this project. 

This project builds upon these studies by designing a practical, real-time, and 

speaker-specific voice-controlled door lock system, incorporating advanced 

noise handling, signal processing, and speaker verification techniques to ensure 

secure and efficient access control in both personal and professional settings. 

 

 
❖ CHAPTER 3: METHODOLOGY 

  
3.1   Intelligent Control Systems 
 

3.1.1   Definition of intelligent control systems: 
 

An intelligent control system uses artificial intelligence and 

learning-based control techniques to operate without direct human 

intervention. Unlike traditional control systems (such as PID), which 

rely on explicit mathematical models, intelligent control systems learn 

from data and adapt to the surrounding environment, even when the 

data are uncertain or inaccurate. 

 
3.1.2   Features of intelligent control systems: 
 

✓ Adaptiveness: The ability to modify the system's behavior based on 

changes in the environment or inputs. 

✓ Flexibility: The ability to handle nonlinear and complex systems 

that are difficult to model mathematically. 

✓ Control under uncertainty: The ability to operate even in the 

presence of noise or incomplete data. 

✓ Self-learning: Improves performance over time by learning from 

past experiences. 

 

 
3.2   Biometric systems 
 

3.2.1   Definition of Biometric systems: 

Biometric systems are technologies that use physical or behavioral 

characteristics to identify people or verify their identity. They can be 

used for purposes such as access to protected areas, smart devices, and 

even system centers. 

3.2.2   Types of Biometric Features: 

✓ Physical: Fingerprint, Face, Iris, DNA. 

✓ Behavioral: Voiceprint, Signature, Gait, Keyboarding Style. 

3.2.3   Comparison of Biometric Systems: Fingerprint, Face, and 

Voice Recognition. 

| Criterion | Fingerprint Recognition | Face Recognition | Voice Recognition |
|---|---|---|---|
| Accuracy | Very high | High, but affected by lighting and angle | Good, improving with modern technologies |
| Reliability | Reliable, but affected by cuts or dirt | Moderate, can be affected by facial changes | Moderate to high with voice filtering |
| Ease of Use | Requires touching a sensor | Requires camera alignment | No touch or alignment needed; just speak |
| Privacy | Can be faked using printed fingerprints | Can be spoofed with 3D images | More secure using unique voice frequencies |
| Accessibility | Limited for certain physical disabilities | Limited for visually impaired | Very suitable for people with disabilities |
| System Integration | Needs dedicated sensor hardware | Needs high-quality camera | Can use standard microphones |
| Cost | Medium | Relatively high | Low; no special hardware required |
| Environmental Adaptability | Affected by sweat or dust | Affected by lighting or shadows | Works in dark and varied conditions |

Table 1-3: Comparison of Biometric Systems. 



 
3.2.4   Why Is Voice Recognition the Best Option? 

 
✓ No Specialized Hardware Required: Voice systems work 

with standard microphones in phones or computers. 

 
✓ User-Friendly: The user simply needs to speak – no physical 

contact or precise positioning needed. 

 
✓ Strong Privacy: Can use techniques like frequency matching 

or spoken passphrases to enhance security. 

 
✓ Highly Flexible: Easy to integrate into smart systems, such 

as smart locks, voice assistants, and more. 

 
✓ Ideal for Smart Environments: Perfect for hands-free 

access in smart homes, offices, and public systems. 

 
✓ Upgradable: Can be improved using machine learning and 

signal processing. 

 

 
3.2.5   Speech Recognition vs Speaker Recognition: 

| Feature | Speech Recognition | Speaker Recognition |
|---|---|---|
| Also Called | Automatic Speech Recognition (ASR) | Voice Biometrics or Speaker Identification |
| Purpose | Understand what is being said | Identify or verify who is speaking |
| Focus | Converts spoken words into text | Matches voice patterns to a specific person |
| Main Output | Text (e.g., "Open the door") | Identity (e.g., "This is Jana speaking") |
| Technology | Natural Language Processing (NLP), speech-to-text models | Machine learning on vocal features (tone, pitch, etc.) |
| Sensitive to | Language, accent, grammar | Voice characteristics, emotion, background noise |
| Used With | Commands, dictation, transcription | Authentication and security systems |

Table 2-3: Speech Recognition vs Speaker Recognition. 

 
3.2.6   Applications of Speech Recognition 

1. Virtual Assistants: Siri, Alexa, Google Assistant. 

2. Dictation & Transcription: Turning spoken lectures or medical notes into text. 

3. Voice Interfaces: Smart TVs, cars, or appliances responding to voice 

commands. 

4. Real-time Translation: Translating spoken language into another in real time. 

5. Voice-Controlled Apps: Hands-free texting or controlling apps while driving. 

 

 
3.2.7   Applications of Speaker Recognition: 

1. Security & Authentication 

• Voice-based login to banking apps or secure systems. 

• Smart locks that open only to specific people’s voices. 

2. Forensics & Law Enforcement 

• Identifying speakers in recorded phone calls or surveillance. 

3. Personalized Services 

• In smart homes: the assistant recognizes who is speaking and adjusts 

preferences (music, lighting, temperature). 

• In cars: auto-sets driving profiles (seat, mirrors) based on the driver’s voice. 

4. Call Centers & Customer Support 

• Voice ID to verify customers instead of security questions. 

• Fraud prevention in financial services. 

 
3.2.8   Components of the Speaker Recognition System: 

1. Front-end processing - the “signal processing” part, which converts the 

sampled speech signal into a set of feature vectors that characterize the properties 

of speech that can separate different speakers. Front-end processing is performed 

in both the training and testing phases. 

2. Speaker modeling - this part performs a reduction of feature data by modeling 

the distributions of the feature vectors. 

3. Speaker database - the speaker models are stored here. 

4. Decision logic - makes the final decision about the identity of the speaker by 

comparing unknown feature vectors to all models in the database and selecting the 

best matching model. 
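The four components can be illustrated with a minimal sketch. It is an assumption-laden simplification: each speaker is modeled by a single diagonal Gaussian (per-dimension mean and variance) rather than the full Gaussian Mixture Model used later in this report, the feature vectors are made-up two-dimensional values standing in for front-end output, and the names `train_model`, `log_likelihood`, `identify`, `speaker_a`, and `speaker_b` are hypothetical.

```python
import math

def train_model(feature_vectors):
    """Speaker modeling: reduce feature vectors to per-dimension mean and variance."""
    n = len(feature_vectors)
    dims = len(feature_vectors[0])
    means = [sum(v[d] for v in feature_vectors) / n for d in range(dims)]
    variances = [
        max(sum((v[d] - means[d]) ** 2 for v in feature_vectors) / n, 1e-6)
        for d in range(dims)
    ]
    return means, variances

def log_likelihood(feature_vectors, model):
    """Average log-likelihood of the vectors under a diagonal Gaussian model."""
    means, variances = model
    total = 0.0
    for v in feature_vectors:
        for x, m, var in zip(v, means, variances):
            total += -0.5 * (math.log(2 * math.pi * var) + (x - m) ** 2 / var)
    return total / len(feature_vectors)

def identify(feature_vectors, database):
    """Decision logic: pick the enrolled speaker whose model scores highest."""
    return max(database, key=lambda name: log_likelihood(feature_vectors, database[name]))

# Speaker database: one stored model per enrolled speaker (illustrative data).
database = {
    "speaker_a": train_model([[1.0, 2.0], [1.2, 1.8], [0.9, 2.1]]),
    "speaker_b": train_model([[4.0, 0.5], [4.2, 0.4], [3.9, 0.7]]),
}

# Feature vectors of an unknown utterance, as the front end would produce them.
unknown = [[1.1, 1.9], [1.0, 2.2]]
best = identify(unknown, database)
```

Here the unknown vectors lie close to the first speaker's training data, so the decision logic selects `speaker_a`. A real system would also need a rejection threshold so that voices outside the database are not forced onto the nearest model.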



 
3.3   Formants 

3.3.1   Definition of the Formants: 

Formants are the resonant frequencies of the human vocal tract. They appear as 

peaks in the speech spectrum and are critical in shaping the distinct sounds of 

vowels and some consonants. 

3.3.2   Basic Definition: 

A formant is a frequency band where acoustic energy is concentrated due to 

resonance in the vocal tract during speech production. 

 
 3.3.3   Acoustic & Anatomical Basis 

1. The lungs push air through the vocal folds, which may vibrate to produce a 

glottal sound. 

2. The sound then travels through the vocal tract (throat, mouth, nasal cavity). 

3. The shape of the vocal tract selectively amplifies certain frequencies—these 

are formants. 

 
3.3.4   The number and positions of formants depend on: 

• Vocal tract length and shape 

• Tongue and lip position 

• Jaw opening 

• Nasal cavity involvement 
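The dependence on vocal tract length can be made concrete with the textbook uniform-tube approximation, in which the tract is treated as a tube closed at the glottis and open at the lips, so its resonances fall at F_n = (2n - 1)c / (4L). The sketch below is only an idealization for a neutral vowel; the tract lengths (0.17 m for an adult, 0.12 m for a child) and the speed of sound (343 m/s) are illustrative assumptions, and `tube_formants` is a hypothetical helper name.

```python
def tube_formants(tract_length_m, count=3, speed_of_sound=343.0):
    """Resonances of a uniform tube closed at one end: F_n = (2n - 1) * c / (4 * L)."""
    return [
        (2 * n - 1) * speed_of_sound / (4.0 * tract_length_m)
        for n in range(1, count + 1)
    ]

adult = tube_formants(0.17)  # assumed adult tract length: about 504, 1513, 2522 Hz
child = tube_formants(0.12)  # assumed shorter tract: every formant moves upward
```

The adult values fall inside the typical F1 to F3 ranges, and the shorter tube shifts all formants to higher frequencies, which is one reason formant positions carry speaker-specific information.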

 

 
3.3.5   Key Formants (F1, F2, F3...) 

 
| Formant | Frequency Range (Typical Male Voice) | Acoustic Meaning |
|---|---|---|
| F1 | 300–900 Hz | Inversely related to tongue height. Low F1 = high tongue. |
| F2 | 850–2500 Hz | Related to tongue frontness. High F2 = front tongue. |
| F3 | 1800–3000 Hz | Influenced by lip rounding and tongue tip position. |
| F4 | 3000+ Hz | Less often used, but helpful in speaker identification. |

Table 3-3: Key Formants. 

 

 
 3.4   Mel-Frequency Cepstral Coefficients (MFCCs) 
 

3.4.1   Definition of MFCC:  

Human speech contains numerous discriminative features that can be used to 

identify speakers. Speech carries significant energy from zero frequency up to 

around 5 kHz. The objective of automatic speaker recognition is to extract, 

characterize, and recognize the information about speaker identity. The properties 

of the speech signal change markedly as a function of time, so a time-varying 

Fourier representation is used to study its spectral properties. However, temporal 

properties of the speech signal such as energy, zero-crossing rate, and correlation 

are assumed constant over a short period; that is, the signal is short-time 

stationary. Therefore, the speech signal is divided into a number of short blocks 

using a Hamming window so that the ordinary Fourier transform can be applied 

to each block. 
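The short-time framing described above can be sketched as follows. The frame length (25 ms), hop (10 ms), sampling rate (8 kHz), and the 200 Hz test tone are illustrative assumptions rather than values taken from this project, and `hamming` and `frame_signal` are hypothetical helper names.

```python
import math

def hamming(n_samples):
    """Hamming window: w[n] = 0.54 - 0.46 * cos(2*pi*n / (N - 1))."""
    return [
        0.54 - 0.46 * math.cos(2 * math.pi * n / (n_samples - 1))
        for n in range(n_samples)
    ]

def frame_signal(signal, frame_len, hop):
    """Split a signal into overlapping frames and taper each with a Hamming window."""
    window = hamming(frame_len)
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        chunk = signal[start:start + frame_len]
        frames.append([s * w for s, w in zip(chunk, window)])
    return frames

sample_rate = 8000                      # assumed sampling rate in Hz
frame_len = sample_rate * 25 // 1000    # 25 ms frame
hop = sample_rate * 10 // 1000          # 10 ms step between frame starts

# One second of a 200 Hz tone as a stand-in for recorded speech.
signal = [math.sin(2 * math.pi * 200 * t / sample_rate) for t in range(sample_rate)]
frames = frame_signal(signal, frame_len, hop)
```

Each frame is short enough for the stationarity assumption to hold approximately, and the window tapers the frame edges so that cutting the signal does not introduce artificial spectral leakage.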

 
3.4.2   Concept Summary: 

✓ Mel-frequency: Mimics how humans perceive pitch (nonlinear frequency 

scale). 

✓ Cepstral: Derived from the cepstrum, the transform of the log spectrum; the 

low-order terms capture the spectral envelope (the shape of the spectrum). 

✓ Coefficients: The actual numbers that describe the shape of the spectrum. 
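The nonlinearity of the Mel scale can be seen in a small sketch. It uses one widely used conversion formula, mel = 2595 log10(1 + f/700); other variants exist, and the report does not state which one its code uses, so treat the formula choice and the function names `hz_to_mel` and `mel_to_hz` as assumptions.

```python
import math

def hz_to_mel(freq_hz):
    """Convert frequency in Hz to Mel using mel = 2595 * log10(1 + f / 700)."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)

def mel_to_hz(mel):
    """Inverse conversion from Mel back to Hz."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

# The same 100 Hz step covers fewer Mels at high frequencies than at low ones.
low_step = hz_to_mel(1100) - hz_to_mel(1000)
high_step = hz_to_mel(4100) - hz_to_mel(4000)
```

With this formula 1000 Hz maps to roughly 1000 Mel, and `low_step` is larger than `high_step`, reflecting the finer pitch resolution of human hearing at low frequencies.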

 
3.4.3   MFCC Extraction Steps: 

✓ Convert each windowed frame of the audio signal to the frequency spectrum using FFT. 

✓ Map the frequency bands to the Mel scale (human auditory scale). 

✓ Apply the logarithm to compress large variations. 

✓ Use the Discrete Cosine Transform (DCT) to generate the final coefficients. 
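The four steps can be traced on a single frame with the sketch below. It is written with the standard library only so that each step is visible; it uses a naive DFT in place of an FFT (same values, slower), and the filter count (20), coefficient count (13), sampling rate (8 kHz), and 440 Hz test tone are assumptions for illustration. In practice a library such as Librosa would normally be used, and this is not the project's actual code.

```python
import cmath
import math

def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def power_spectrum(frame):
    """Step 1: power spectrum of one windowed frame (naive DFT for clarity)."""
    n = len(frame)
    return [
        abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))) ** 2 / n
        for k in range(n // 2 + 1)
    ]

def mel_filterbank(n_filters, n_fft, sample_rate):
    """Step 2: triangular filters whose centers are evenly spaced on the Mel scale."""
    top = hz_to_mel(sample_rate / 2)
    points = [mel_to_hz(top * i / (n_filters + 1)) for i in range(n_filters + 2)]
    bins = [int(math.floor((n_fft + 1) * p / sample_rate)) for p in points]
    bank = []
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        row = [0.0] * (n_fft // 2 + 1)
        for k in range(left, center):      # rising edge of the triangle
            row[k] = (k - left) / (center - left)
        for k in range(center, right):     # falling edge of the triangle
            row[k] = (right - k) / (right - center)
        bank.append(row)
    return bank

def mfcc(frame, sample_rate, n_filters=20, n_coeffs=13):
    spectrum = power_spectrum(frame)
    bank = mel_filterbank(n_filters, len(frame), sample_rate)
    # Step 3: logarithm of the energy collected by each Mel filter.
    log_energies = [
        math.log(sum(w * p for w, p in zip(row, spectrum)) + 1e-10) for row in bank
    ]
    # Step 4: DCT-II of the log energies; keep the first n_coeffs coefficients.
    return [
        sum(e * math.cos(math.pi * c * (m + 0.5) / n_filters)
            for m, e in enumerate(log_energies))
        for c in range(n_coeffs)
    ]

sample_rate = 8000   # assumed sampling rate in Hz
n = 256              # assumed frame length in samples
frame = [
    (0.54 - 0.46 * math.cos(2 * math.pi * t / (n - 1)))
    * math.sin(2 * math.pi * 440 * t / sample_rate)
    for t in range(n)
]
coeffs = mfcc(frame, sample_rate)
```

The result is a vector of 13 numbers for the frame. Repeating this for every frame of an utterance yields the sequence of feature vectors that the speaker model is trained on.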

 

 
3.4.4   MFCCs vs Formants 
 

| Feature | MFCCs (Mel-Frequency Cepstral Coefficients) | Formants (F1, F2, F3...) |
|---|---|---|
| What They Represent | Overall spectral envelope of the speech signal | Resonant frequencies of the vocal tract |
| Derived From | Log Mel-spectrum → Discrete Cosine Transform (DCT) | Peaks in the speech spectrum |
| Biological Basis | Indirect: reflects vocal tract via spectrum shape | Direct: tied to anatomical vocal tract properties |
| Interpretability | Not directly interpretable | Highly interpretable (linked to articulation) |
| Used In | ASR, speaker ID, emotion recognition | Forensics, phonetics, speech pathology |
| Robustness | More robust to noise, distortion, and channel effects | Sensitive to recording quality and noise |
| Language Dependency | Generally language-independent | More influenced by phoneme/language characteristics |
| Feature Dimensions | Typically 13–39 (with deltas and acceleration) | Usually 3–5 (F1–F5) |
| Extraction Tools | Easy: Librosa, Kaldi, Python, MATLAB | Requires spectral analysis, LPC, or Praat |
| Application Speed | Fast, low computational cost | Slower, sometimes requires manual verification |

Table 4-3: MFCCs vs Formants. 

 

 
3.5   Speaker Recognition as a Biometric Identity System: Training and Testing Phases

The anatomical structure of the vocal tract is unique to every person, so the voice information in the speech signal can be used to identify the speaker. Recognizing a person by his or her voice is known as speaker recognition. Since differences in anatomical structure are an intrinsic property of the speaker, voice falls under the category of biometric identity. Using voice for identity has several advantages, one of the most important being remote person authentication. Like other pattern recognition systems, speaker recognition systems involve two phases: training and testing. Training is the process of familiarizing the system with the voice characteristics of the speakers being registered. Testing is the actual recognition task. Block diagrams of both phases are shown in Figure 1-3. In the training phase, feature vectors representing the voice characteristics of the speaker are extracted from the training utterances and used to build the reference models. During testing, similar feature vectors are extracted from the test utterance, and their degree of match with the reference models is computed using a matching technique. The level of match is then used to reach the decision.

Figure 1-3: Speaker Recognition (Training and Testing Phases).

 

 
3.6   Gaussian Mixture Model (GMM)  

3.6.1   Definition of GMM: 

A Gaussian Mixture Model (GMM) is a parametric probability density function represented as a weighted sum of Gaussian component densities. GMMs are commonly used as a parametric model of the probability distribution of continuous measurements or features in a biometric system, such as vocal-tract-related spectral features in a speaker recognition system. GMM parameters are estimated from training data using the iterative Expectation-Maximization (EM) algorithm, or by Maximum a Posteriori (MAP) estimation from a well-trained prior model.
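The "weighted sum of Gaussian component densities" in this definition can be written out as follows. The notation is the conventional one and is an assumption here, since the report does not fix its own symbols: x is a D-dimensional feature vector (for example one MFCC frame), M is the number of components, and w_i, μ_i, Σ_i are the weight, mean vector, and covariance matrix of component i.

```latex
p(\mathbf{x} \mid \lambda) = \sum_{i=1}^{M} w_i \, g(\mathbf{x} \mid \boldsymbol{\mu}_i, \boldsymbol{\Sigma}_i),
\qquad \sum_{i=1}^{M} w_i = 1,
\qquad \lambda = \{ w_i, \boldsymbol{\mu}_i, \boldsymbol{\Sigma}_i \}_{i=1}^{M}

g(\mathbf{x} \mid \boldsymbol{\mu}_i, \boldsymbol{\Sigma}_i) =
\frac{1}{(2\pi)^{D/2} \, |\boldsymbol{\Sigma}_i|^{1/2}}
\exp\!\left( -\tfrac{1}{2} (\mathbf{x} - \boldsymbol{\mu}_i)^{\top}
\boldsymbol{\Sigma}_i^{-1} (\mathbf{x} - \boldsymbol{\mu}_i) \right)
```

Training a speaker model then amounts to choosing λ so that the enrolled speaker's feature vectors have high likelihood under p(x | λ).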

 
3.6.2   Advantages of GMM: 

✓ Simple and easy to implement. 

✓ Very effective for text-independent tasks, that is, tasks that do not depend on what is spoken.

✓ Does not require large amounts of training data or high computational power.

✓ Very suitable for embedded systems such as the Arduino or Raspberry Pi. 

 
3.6.3   Parameters: 

1- Jitter: the relative evaluation of the period-to-period (very short-term) variability of the pitch within the analyzed voice sample. Voice break areas are excluded.

2- RAP (Relative Average Perturbation): the relative evaluation of the period-to-period variability of the pitch within the analyzed voice sample, with a smoothing factor of 3 periods. Voice break areas are excluded.

3- Shimmer: Shimmer Percent (%) is the relative evaluation of the period-to-period (very short-term) variability of the peak-to-peak amplitude within the analyzed voice sample. Voice break areas are excluded.

4- APQ (Amplitude Perturbation Quotient, %): the relative evaluation of the period-to-period variability of the peak-to-peak amplitude within the analyzed voice sample, with a smoothing factor of 11 periods. Voice break areas are excluded.

5- NHR (Noise-to-Harmonic Ratio): the average ratio of the inharmonic spectral energy in the frequency range 1500–4500 Hz to the harmonic spectral energy in the frequency range 70–4500 Hz. This is a general evaluation of the noise present in the analyzed signal.

6- SPI (Soft Phonation Index): the average ratio of the lower-frequency harmonic energy in the range 70–1600 Hz to the higher-frequency harmonic energy in the range 1600–4500 Hz.
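As a rough illustration of the first four parameters, the sketch below computes them from a list of pitch periods and a list of peak-to-peak amplitudes. The formulas are the commonly used forms of these measures (mean absolute deviation divided by the overall mean, in percent) and are an assumption, since the report gives only verbal definitions; the function names and sample values are hypothetical, and voice-break detection is not shown.

```python
def _mean(values):
    return sum(values) / len(values)

def local_perturbation(values):
    """Jitter (pitch periods) or Shimmer (amplitudes), in percent.

    Mean absolute difference between consecutive values, divided by
    the mean value.
    """
    if len(values) < 2:
        raise ValueError("need at least two values")
    diffs = [abs(a - b) for a, b in zip(values, values[1:])]
    return 100.0 * _mean(diffs) / _mean(values)

def smoothed_perturbation(values, window):
    """RAP (window=3, periods) or APQ (window=11, amplitudes), in percent.

    Mean absolute deviation of each value from the average of the
    `window` values centred on it, divided by the mean value.
    `window` is assumed to be odd.
    """
    if len(values) < window:
        raise ValueError("need at least `window` values")
    half = window // 2
    deviations = []
    for i in range(half, len(values) - half):
        local_avg = _mean(values[i - half:i + half + 1])
        deviations.append(abs(local_avg - values[i]))
    return 100.0 * _mean(deviations) / _mean(values)

# Hypothetical pitch periods in milliseconds (alternating 10.0 / 10.2 ms).
periods_ms = [10.0, 10.2, 10.0, 10.2, 10.0]
jitter_percent = local_perturbation(periods_ms)
rap_percent = smoothed_perturbation(periods_ms, window=3)

# Hypothetical peak-to-peak amplitudes of 12 perfectly steady periods.
amplitudes = [0.5] * 12
shimmer_percent = local_perturbation(amplitudes)
apq_percent = smoothed_perturbation(amplitudes, window=11)
```

A perfectly steady voice gives zero for all four measures; larger values indicate more cycle-to-cycle irregularity.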

 

 
3.6.4   How GMM Works in Voice Recognition:

1. Training

• Collect voice samples from each person and extract their features (e.g., MFCCs).

• Build a GMM for each person, i.e., a statistical model that "learns" the characteristics of that person's voice from the extracted features.

2. Verification or Recognition (Testing)

• When a person requests access, a voice sample is recorded and its features are extracted (e.g., MFCCs).

• The likelihood of these features is calculated under each enrolled GMM.

• The person is accepted as the speaker whose GMM gives the highest likelihood, provided that this likelihood exceeds a specified threshold.

 
Figure 2-3: How GMM Works in Voice Recognition.
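The training and testing steps above can be sketched with a small diagonal-covariance GMM trained by EM. This is a minimal sketch under stated assumptions, not the project's implementation: the feature vectors are synthetic stand-ins for MFCC frames, the number of components (2), the iteration count, the speaker names, and the acceptance threshold (-30) are all hypothetical values chosen for the illustration.

```python
import numpy as np

def log_gauss_diag(x, means, variances):
    """Log density of every row of x under every diagonal Gaussian."""
    diff = x[:, None, :] - means[None, :, :]
    return -0.5 * (np.sum(diff ** 2 / variances[None], axis=2)
                   + np.sum(np.log(2.0 * np.pi * variances), axis=1)[None, :])

def logsumexp(a, axis):
    """Numerically stable log(sum(exp(a))) along one axis."""
    m = np.max(a, axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.sum(np.exp(a - m), axis=axis))

def fit_gmm(x, n_components, n_iter=50, seed=0):
    """Estimate weights, means, and diagonal variances with EM."""
    rng = np.random.default_rng(seed)
    n = x.shape[0]
    means = x[rng.choice(n, n_components, replace=False)].copy()
    variances = np.tile(x.var(axis=0) + 1e-6, (n_components, 1))
    weights = np.full(n_components, 1.0 / n_components)
    for _ in range(n_iter):
        # E-step: responsibility of each component for each frame.
        log_p = np.log(weights)[None, :] + log_gauss_diag(x, means, variances)
        resp = np.exp(log_p - logsumexp(log_p, axis=1)[:, None])
        # M-step: re-estimate the parameters from the responsibilities.
        nk = resp.sum(axis=0) + 1e-10
        weights = nk / nk.sum()
        means = (resp.T @ x) / nk[:, None]
        variances = np.maximum((resp.T @ x ** 2) / nk[:, None] - means ** 2, 1e-6)
    return weights, means, variances

def avg_log_likelihood(x, gmm):
    """Average per-frame log-likelihood of the features under one model."""
    weights, means, variances = gmm
    log_p = np.log(weights)[None, :] + log_gauss_diag(x, means, variances)
    return float(np.mean(logsumexp(log_p, axis=1)))

def identify(x, enrolled, threshold):
    """Return the best-scoring enrolled speaker, or None if below threshold."""
    scores = {name: avg_log_likelihood(x, gmm) for name, gmm in enrolled.items()}
    best = max(scores, key=scores.get)
    return (best if scores[best] >= threshold else None), scores

# Synthetic 13-dimensional "features" for two enrolled speakers.
rng = np.random.default_rng(1)
feats_a = rng.normal(loc=0.0, scale=1.0, size=(300, 13))
feats_b = rng.normal(loc=3.0, scale=1.0, size=(300, 13))
enrolled = {"speaker_a": fit_gmm(feats_a, 2), "speaker_b": fit_gmm(feats_b, 2)}

# Testing: a new utterance from speaker A, and one from an unknown speaker.
test_a = rng.normal(loc=0.0, scale=1.0, size=(50, 13))
impostor = rng.normal(loc=-3.0, scale=1.0, size=(50, 13))
accepted, scores = identify(test_a, enrolled, threshold=-30.0)
rejected, _ = identify(impostor, enrolled, threshold=-30.0)
```

The threshold is what turns identification (pick the best model) into verification (reject everyone who matches no enrolled model well enough); in a real system it would have to be tuned on recorded data.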



 
3.7   The pitch period 

3.7.1   Definition of the pitch period: 

The pitch period is the time interval between two consecutive repetitions of the sound wave produced by the vocal cords (typically when pronouncing a vowel or a sustained tone). It determines the perceived pitch of a speaker's voice.

 
3.7.2   Technical Definition: 

It is the time interval (in seconds or samples) between two consecutive peaks in a 

periodic audio signal produced by the vibration of the vocal cords. 
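One common way to measure this interval is to find the lag at which a voiced frame best matches a shifted copy of itself (autocorrelation). The sketch below illustrates that idea; it is not necessarily the method used in the project code, and the search range (60 to 400 Hz), the sample rate, and the synthetic 200 Hz test frame are assumptions.

```python
import numpy as np

def estimate_pitch_period(frame, sample_rate, f0_min=60.0, f0_max=400.0):
    """Return (period in samples, period in seconds, F0 in Hz) for a voiced frame."""
    frame = np.asarray(frame, dtype=float)
    frame = frame - frame.mean()
    # Autocorrelation for non-negative lags only.
    corr = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    # Restrict the search to lags that correspond to plausible pitch values.
    lag_min = int(sample_rate / f0_max)
    lag_max = min(int(sample_rate / f0_min), len(corr) - 1)
    lag = lag_min + int(np.argmax(corr[lag_min:lag_max + 1]))
    return lag, lag / sample_rate, sample_rate / lag

# Synthetic 40 ms "voiced" frame: a 200 Hz tone sampled at 16 kHz.
sample_rate = 16000
n = np.arange(640)
frame = np.sin(2 * np.pi * 200.0 * n / sample_rate)
period_samples, period_seconds, f0_hz = estimate_pitch_period(frame, sample_rate)
```

For this frame the strongest match is at a lag of 80 samples, i.e. a pitch period of 5 ms and a fundamental frequency of 200 Hz. Real speech is noisier, so practical detectors usually add a voicing check before trusting the estimate.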

 
3.7.3   The Importance of Pitch in Voice Recognition: 

➢ Speaker recognition: 

Each person has a distinct pitch. Men typically have a fundamental 

frequency between 85–180 Hz, and women between 165–255 Hz. 

 
➢ Feature extraction: 

Pitch period is used in some biometric systems as an additional feature with 

MFCC or LPC. 

 
➢ Signal classification:  

It helps distinguish between: 

• Voiced signals: have a clear pitch period (such as vowels). 

• Unvoiced signals: do not have a regular pitch period (such as the consonants /f/ and /s/).

 

 
3.7.4   Approximate Pitch Period Values for Different Groups: 

 
| Group | Typical Fundamental Frequency (F₀) | Pitch Period (T₀ = 1 / F₀) |
|---|---|---|
| Males | 85 – 180 Hz | 5.6 – 11.8 milliseconds |
| Females | 165 – 255 Hz | 3.9 – 6.1 milliseconds |
| Children | 250 – 400 Hz | 2.5 – 4 milliseconds |
| Elderly | 100 – 200 Hz | 5 – 10 milliseconds |

 
Table 5-3: Approximate Pitch Period Values for Different Groups. 
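As a worked check of the relation used in the table, the pitch period is the reciprocal of the fundamental frequency, so the lowest frequency of a range gives the longest period and the highest frequency gives the shortest. For the male range, for example:

```latex
T_0 = \frac{1}{F_0}, \qquad
F_0 = 85\ \text{Hz} \;\Rightarrow\; T_0 = \frac{1}{85}\ \text{s} \approx 11.76\ \text{ms}, \qquad
F_0 = 180\ \text{Hz} \;\Rightarrow\; T_0 = \frac{1}{180}\ \text{s} \approx 5.56\ \text{ms}
```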

 
3.7.5   Explanation: 

1. Males: 

• Have longer and thicker vocal cords → vibrate slower → lower 

frequency → longer pitch period. 

2. Females: 

• Shorter and thinner vocal cords → vibrate faster → higher frequency 

→ shorter pitch period. 

3. Children: 

• Very small vocal folds → very high frequency → shortest pitch 

period. 

• Their voices tend to be higher-pitched. 

4. Elderly: 

• Changes in muscles and nerves affect vocal folds. 

• Frequency often decreases in males and slightly increases in females, 

reducing the gap between genders. 

 

 
3.8   Noise Handling and Signal Processing Techniques 

In real-world environments, voice recognition systems are often exposed to various 

sources of signal distortion such as background noise (side conversations, device 

sounds, traffic, etc.), variations in voice intensity, and changes in microphone 

sensitivity. To minimize the impact of these factors and improve system performance, a set of noise handling and signal processing techniques was applied.

 
3.8.1   Band-Pass Filtering 

3.8.1.1   What is Band-Pass Filtering? 

Band-pass filtering is a technique that allows frequencies within a specific range to 

pass through while attenuating frequencies outside this range. 

3.8.1.2   Why is it Used? 

✓ The critical frequency range for human speech typically lies between 300 Hz 

and 3400 Hz. 

✓ Frequencies below 300 Hz often represent noise such as vibrations or air 

conditioning sounds. 

✓ Frequencies above 3400 Hz are typically sharp noises or electrical interference.

3.8.1.3   Benefits: 

✓ Isolates the relevant speech signals. 

✓ Improves the signal-to-noise ratio (SNR). 

✓ Makes the system more accurate and less sensitive to unwanted sounds. 
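The effect of a 300–3400 Hz band-pass filter can be illustrated with a simple FFT mask that zeroes every frequency bin outside the pass band. This "brick-wall" filter is a simplification chosen for clarity and is an assumption; the report does not state which filter design was used, and practical systems more often use an FIR or IIR (for example Butterworth) design. The test signal and its three component frequencies are hypothetical.

```python
import numpy as np

def bandpass_fft(signal, sample_rate, low_hz=300.0, high_hz=3400.0):
    """Zero all spectral content outside [low_hz, high_hz] and return the result."""
    signal = np.asarray(signal, dtype=float)
    spectrum = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
    spectrum[(freqs < low_hz) | (freqs > high_hz)] = 0.0
    return np.fft.irfft(spectrum, n=len(signal))

# One second of a speech-band tone plus low-frequency hum and high-frequency hiss.
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
tone = np.sin(2 * np.pi * 1000.0 * t)        # inside the pass band
hum = 0.5 * np.sin(2 * np.pi * 50.0 * t)     # below 300 Hz
hiss = 0.3 * np.sin(2 * np.pi * 6000.0 * t)  # above 3400 Hz
filtered = bandpass_fft(tone + hum + hiss, sample_rate)
```

After filtering, only the 1000 Hz component remains, which is the improvement in signal-to-noise ratio described in the benefits above.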

 

 
3.8.2   Hamming Windowing 

3.8.2.1   What is Hamming Windowing? 

Hamming windowing multiplies each short segment (frame) of the audio signal by a smooth tapering function to reduce distortion when the signal is converted to the frequency domain.

3.8.2.2   Why is it Used? 

✓ The speech signal changes rapidly and cannot be accurately analyzed as a 

long continuous signal. 

✓ The signal is divided into short-time frames (20 to 30 milliseconds), which 

can be considered stationary. 

✓ Without windowing, abrupt changes at the frame edges cause "spectral 

leakage" during Fourier Transform analysis. 

3.8.2.3   Benefits: 

✓ Minimizes edge discontinuities between frames. 

✓ Provides more accurate spectral analysis for feature extraction. 
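The framing and windowing described above can be sketched as follows, together with a small measurement showing the reduction in spectral leakage. The frame length (25 ms), hop (10 ms), sample rate, and the 1010 Hz test tone are assumptions made for the illustration, and the helper names are hypothetical.

```python
import numpy as np

def frame_signal(signal, sample_rate, frame_ms=25.0, hop_ms=10.0):
    """Split a signal into overlapping frames and apply a Hamming window to each."""
    signal = np.asarray(signal, dtype=float)
    frame_len = int(round(sample_rate * frame_ms / 1000.0))
    hop = int(round(sample_rate * hop_ms / 1000.0))
    if len(signal) < frame_len:
        raise ValueError("signal is shorter than one frame")
    n_frames = 1 + (len(signal) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return signal[idx] * np.hamming(frame_len)

def leakage_db(frame, n_skip=10):
    """Largest spectral magnitude more than n_skip bins from the peak, in dB re. the peak."""
    mag = np.abs(np.fft.rfft(frame))
    peak = int(np.argmax(mag))
    far = np.concatenate([mag[:max(peak - n_skip, 0)], mag[peak + n_skip + 1:]])
    return 20.0 * np.log10(far.max() / mag[peak])

sample_rate = 16000
frames = frame_signal(np.ones(sample_rate), sample_rate)  # rows are windowed frames

# A tone that does not fall exactly on an FFT bin, so its energy leaks into other bins.
raw = np.sin(2 * np.pi * 1010.0 * np.arange(400) / sample_rate)
leak_rect = leakage_db(raw)                      # no window (abrupt frame edges)
leak_hamming = leakage_db(raw * np.hamming(400)) # Hamming-windowed frame
```

The windowed frame shows a lower far-from-peak level than the unwindowed one, which is the leakage reduction the section refers to.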

 
3.8.3   Signal Normalization 

3.8.3.1   What is Signal Normalization? 

Signal normalization is the process of scaling the amplitude of each recording so that all recordings have a consistent energy level.

3.8.3.2   Why is it Used? 

✓ Speakers may talk closer or farther from the microphone. 

✓ Speakers may talk loudly or softly. 

3.8.3.3   Benefits: 

✓ Ensures fair comparison across all samples regardless of voice intensity. 

✓ Improves the performance of the recognition algorithm by focusing on the 

unique features of the voice rather than its volume. 
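Two simple ways to normalize a recording are sketched below: scaling to a target RMS (energy) level and scaling to a target peak value. Which of the two the project code uses is not stated, so both are shown as assumptions; the target values and the "near" and "far" test signals are hypothetical.

```python
import numpy as np

def rms(signal):
    """Root-mean-square level of a signal."""
    return float(np.sqrt(np.mean(np.square(signal))))

def rms_normalize(signal, target_rms=0.1, eps=1e-12):
    """Scale the signal so that its RMS level equals target_rms."""
    signal = np.asarray(signal, dtype=float)
    level = rms(signal)
    if level < eps:            # silent input: nothing to scale
        return signal.copy()
    return signal * (target_rms / level)

def peak_normalize(signal, peak=1.0, eps=1e-12):
    """Scale the signal so that its largest absolute sample equals peak."""
    signal = np.asarray(signal, dtype=float)
    max_abs = float(np.max(np.abs(signal)))
    if max_abs < eps:
        return signal.copy()
    return signal * (peak / max_abs)

# The same utterance recorded close to and far from the microphone.
t = np.arange(1600) / 16000.0
near = 0.8 * np.sin(2 * np.pi * 200.0 * t)
far = 0.05 * np.sin(2 * np.pi * 200.0 * t)
near_norm = rms_normalize(near)
far_norm = rms_normalize(far)
```

After normalization the two recordings are identical in level, so any remaining difference between speakers comes from the shape of the signal rather than from how loudly it was recorded.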



 
3.8.4   Summary Table of Noise Handling Techniques 
 

| Technique | Purpose | Benefit |
|---|---|---|
| Band-Pass Filtering | Isolate human speech frequencies. | Reduces external noise. |
| Hamming Windowing | Prevent spectral leakage. | Accurate frequency analysis. |
| Signal Normalization | Standardize signal amplitude. | Fair comparison across samples. |
| MFCC + GMM | Extract and classify voice features. | Improves recognition accuracy. |

 
Table 6-3: Summary of Noise Handling Techniques.

 
Figure 3-3: Speaker Recognition Process (Record → Noise Reduction → Feature Extraction → Training & Testing → Modeling → Decision).



 
❖ Chapter 4: Results and Analysis 
 

4.1   Principle of Code Operation

The code implements the feature extraction described earlier in this report. It records three attempts from the speaker, then reduces noise as far as possible by raising the amplitude of high-value sounds and lowering the amplitude of low-value sounds. It then extracts the features mentioned above (MFCC, PP, FF), prints their values, and plots curves representing them using ready-made libraries.

 
4.2   Code Result for an 11-year-old child 

 
Figure 3-4: The child's original audio in the three attempts.

Figure 4-4: The child's normalized audio in the three attempts.

Figure 5-4: The child's MFCC comparison across the three attempts.

Figure 6-4: The child's pitch period and fundamental frequency values in the three attempts.

Figure 7-4: The child's formant analysis results in the three attempts.

 
4.3   Code Result for Female 
 

Figure 8-4: Female original audio in the three attempts.

Figure 9-4: Female normalized audio in the three attempts.

Figure 10-4: Female MFCC comparison across the three attempts.

Figure 11-4: Female pitch period and fundamental frequency values in the three attempts.

Figure 12-4: Female formant analysis results in the three attempts.

 
4.4   Code Result for Male 

 
Figure 13-4: Male original audio in the three attempts.

Figure 14-4: Male normalized audio in the three attempts.

Figure 15-4: Male MFCC comparison across the three attempts.

Figure 16-4: Male pitch period and fundamental frequency values in the three attempts.

Figure 17-4: Male formant analysis results in the three attempts.

 
❖ Chapter 5: Discussion 
 

The results obtained from the system’s practical implementation and testing support the effectiveness of the proposed voice-controlled door lock system. Across the three users tested (a child, a female, and a male), the system extracted reliable voice features, namely Mel-Frequency Cepstral Coefficients (MFCC), Pitch Period (PP), and formants, which are critical for accurate speaker identification.

One key observation from the results is the consistency of the extracted features within the same speaker across multiple attempts. This indicates that the features are stable enough to recognize the correct user even when slight variations in voice tone, speed, or distance from the microphone occur. The figures presented in Chapter 4 show that the MFCC patterns, pitch periods, and formant distributions differ between users and remain stable across each user’s three attempts.

The system also showed strong capability in noise handling. Techniques such as 

band-pass filtering effectively isolated the human voice frequency range (300 Hz to 

3400 Hz), while Hamming windowing minimized spectral leakage, and signal 

normalization reduced the influence of varying voice amplitudes. These methods 

collectively improved the system’s robustness, as the results remained valid even 

when the recordings included moderate background noise. 

Another important point is the system’s ability to distinguish between the typical pitch periods and formants of different user categories. For instance, the pitch period for the child user was consistently shorter than that of the male and female users, which agrees with the theoretical pitch values explained in Chapter 3. This strengthens the reliability of the biometric identification process.

However, despite these positive outcomes, the system faced some limitations in very 

high noise environments where competing sounds could partially affect the 

recognition process. Also, the current version of the system is limited in its capacity 

to store a large number of authorized users, which could become a scalability issue 

in more complex applications. 

In summary, the system proved to be a practical and reliable voice-controlled access 

solution, with satisfactory performance across different genders and age groups. It 

demonstrated robustness in handling noise and voice variations. 



 
❖ Chapter 6: Conclusions and Recommendations
 

6.1   Conclusions 

This project successfully achieved its main objective of designing and 

implementing a voice-controlled door lock system capable of identifying and 

authenticating specific authorized users based on their unique voice features. By 

utilizing advanced techniques such as Mel-Frequency Cepstral Coefficients 

(MFCC), pitch period analysis, and formant extraction, the system demonstrated 

a high level of accuracy in recognizing different speakers and distinguishing them 

from unauthorized individuals. 

The system's ability to process and adapt to various environmental conditions, 

supported by noise-handling methods like band-pass filtering, Hamming 

windowing, and signal normalization, further enhanced its reliability in real-

world scenarios. The results confirmed that voice recognition can serve as a 

practical and secure alternative to traditional security methods such as physical 

keys and password-based systems. 

Additionally, the project highlights the advantages of using voice as a biometric 

feature, especially in terms of convenience, accessibility, and low-cost 

implementation, as no special hardware is required. The system is particularly 

beneficial for people with disabilities and in smart environments where touchless 

access is essential. 

In conclusion, the project provides a solid foundation for the development of 

intelligent, voice-controlled access systems and proves the viability of integrating 

biometric voice recognition in everyday security applications. 

 

 
6.2   Recommendations

It is recommended to improve the system by focusing on increasing its resistance 

to various environmental challenges, particularly high levels of background noise 

and echo, by applying more sophisticated filtering and noise suppression 

algorithms. Additionally, expanding the voice database to support a larger 

number of users while maintaining quick response time would make the system 

more scalable and practical for real-life applications. The integration of real-time 

processing units is also encouraged to ensure faster decision-making and 

smoother system performance. To further enhance security, it is advisable to 

develop a multi-factor authentication system that combines voice recognition 

with other biometric techniques. Developing an intuitive user training module 

would also help users provide clearer and more consistent voice samples during 

registration. Finally, it is recommended to design a mobile application that offers 

users the ability to remotely control and monitor the system, providing greater 

convenience and flexibility. 

 
6.3   Future Work
 

In the future, this system can be further developed by integrating it with platforms that enable remote control, real-time monitoring, and access management through mobile applications. Expanding the system to support multilingual voice commands and

adaptive learning would also enhance its flexibility and usability in diverse 

environments. Moreover, future improvements could include the development of 

a more sophisticated noise cancellation module to allow the system to perform 

reliably in highly crowded or industrial areas. Additionally, integrating artificial 

intelligence and deep learning techniques, such as Convolutional Neural 

Networks (CNNs) and Recurrent Neural Networks (RNNs), could improve the 

system’s ability to differentiate between similar voices and handle more complex 

voice variations. Ultimately, future versions of this system may target 

commercial, residential, and institutional applications, offering an even more 

secure and intelligent access control solution. 

 

 
❖ SWOT Analysis

Strengths

- A convenient and easy-to-use system for all user groups.
- Uses low-cost, readily available components.
- Scalable and can be integrated with smart home systems.

Weaknesses

- Sensitive to ambient noise; its performance may be affected in crowded environments.
- Vulnerable to voice imitation attacks and to the playback of audio recordings.
- Depends on the user's personal voice and may be affected by changes in the user's voice or tone.

Opportunities

- Integration with smartphone apps for remote control.
- Opportunity to integrate the system with other biometric systems for added security.
- Growing demand for smart security systems and contactless communication.

Threats

- User concerns regarding the privacy and security of biometric data.
- The risk of technical failures or of power or internet outages affecting system continuity.
- Poor performance in challenging conditions, such as high noise or long distances.

 

 