An-Najah National University  

Faculty of Graduate Studies 
 

Language Errors in Machine Translation 

of Scientific Biological Texts from English 

to Arabic: The Case of Google Translate 
 

By  

Hanan Jamal Alawneh 
 

Supervisor  

Dr. Abdel Karim Daragmeh 
 

This Thesis is Submitted in Partial Fulfillment of the Requirements for the Degree of Master of Applied Linguistics and Translation, Faculty of Graduate Studies, An-Najah National University, Nablus, Palestine.

2019 



Dedication 

To the jewel of Palestine, the eternal capital, Jerusalem  

To my first teachers, my dearest mom and dad, for their unconditional love and their insistence, since I was four years old, that I use my right hand instead of my left, believing that the right one is more blessed. Their vision turned out to be true, for I got a blessing from God to finish this thesis.

To my lovely sisters, Rafeef, Raniem, Wa'ed, and Haneen.

To my one and only brother, Ra'if.

To my true friend Zainab, who has been by my side since the 1st grade.

To my relatives and friends, who have always shown concern for my thesis with love and passion.

To the Google team, who have always been a source of inspiration in their good and bad results. This thesis is meant to assist you in clearing the clouds of low-quality results, so that Google Translate works better.

 

Acknowledgement 

I would like to express my gratitude to all my teachers who have 

nourished my intellectual life since kindergarten. 

Special thanks to my supervisor, Dr. Abdel Karim Daragmeh for his 

valuable comments from the first letter of this thesis till the last drop of ink.    

Many thanks are due to Dr. Mahmoud Eshreteh for his constructive 

comments concerning the linguistic part of the thesis. 

Very special thanks go to Dr. Hamed Abdelhaq for his feedback 

regarding the maps and computer science aspects included in this work. 

Last but not least, I shall never forget my dearest father and mother 

who have been a source of continuous support from the very beginning of 

this journey till the last breath.  

 

Table of Contents  

Subject Page 

Defense Committee Members ii 

Dedication iii 

Acknowledgments  iv 

Declaration  v 

Table of Contents vi 

List of Tables viii 

List of Figures ix 

List of Abbreviations xi 

Abstract xii 

Chapter One: Introduction 

1.1. Introduction 1 

1.2. Scientific Translation and Machine Translation 4 

1.3. Why the Biology Textbooks? 5

1.4. Problem Statement 7 

1.5. Purpose of the Study  10 

1.6. Significance of the Study  11 

1.7. Limitations of the Study 11 

1.8. Research Questions 12 

1.9. Thesis Chapters 13 

Chapter Two: Literature Review and Methodology 

2.1. Related Literature to Machine Translation  15 

2.2. Methodology 24 

2.3. Theoretical Framework 27 

Chapter Three: Linguistic Discordance in Google Translation 

3.1. Introduction 29 

3.2. Errors at Syntactic Level 37 

       3.2.1. Organization of Constituents at Phrase Level 37 

       3.2.2. Organization of Constituents at Sentence Level  42 

       3.2.3. Erroneous Shifts from Verbal to Nominal Sentences in Arabic 49

3.3. Errors at Morphological Level  51 

       3.3.1 Inappropriate Choice of Suffixes 51 

         3.3.1.1. Inflections Attached to Sub-headings 51 

         3.3.1.2. Inflections Attached to the Verb 53 

       3.3.2. Passive Constructions 53 


                  3.3.2.1. Failure to Distinguish Between the Simple Past and Passive Inflections 55

                  3.3.2.2. Passive inflections  58 

         3.3.3. Unnecessary Derivation for Certain Words 60 

         3.3.4. Pronouns Translation 62 

            3.3.4.1.  Relative Pronouns Referent/s 62 

            3.3.4.2.  Pronouns Refer to Gender Neutral Nouns 64 

            3.3.4.3. Phrasal Verbs Meanings along with their Gender Marked Subjects 68

3.4. Conclusion   71 

Chapter Four: Cohesive Markers at Paragraph Boundaries in Google Translation

4.1. Introduction 75

4.2. Areas of Eff./Def. of Lexical Markers in Description Paragraphs 80

4.2.1. Cohesion in Introducing Definitions  80 

4.2.2. Cohesion Achieved through Scientific Terms Chains 85 

4.3. Areas of Eff./Def. of Lexical Markers in Process Paragraphs  91 

4.3.1. Cohesion Achieved through Word-Choice 91 

4.3.2. Process-Sequence Verbs 96 

4.3.3. Process Signals within Non-past Constructions 99 

4.4. Areas of Eff./Def. of Lexical Markers in Causality Paragraphs 103

4.4.1. Word Collocations Deployed in Cause/Result Relations 103 

4.4.2. Causal Chains 108 

4.5. Conclusion 117 

Chapter Five: Conclusions and Recommendations 

5.1. Conclusions 120 

5.2. Recommendations 128 

References  129 

Appendixes 

Appendix (1): The selected texts and their translation as produced by Google Translate 136

الملخص ب

 

List of Tables  

Table No. Subject Page 

Table (1) Errors made at noun phrases level 38 

Table (2) Errors in ordering the noun phrase 40 

Table (3) Errors made at sentence level 43 

Table (4) Errors of turning a sentence to a noun phrase  49 

Table (5) Errors in treating the definite article  52 

Table (6) Errors in the selection of parts of speech  53 

Table (7) Errors in recognizing the passive construction 56 

Table (8) Errors at passive construction arrangement level  57 

Table (9) Errors at passive inflections level  58 

Table (10) Redundancy due to unnecessary repetition 61 

Table (11) Errors in  assigning the correct referent 62 

Table (12) Errors in using gender markers 65 

Table (13) Errors made at pronoun level    67 

Table (14) Errors in SV agreement 69 

Table (15) Errors in relative clauses translation 85 

 

List of Figures 

Figure No. Subject Page 

Figure (1) The process of treating the selected data 27 

Figure (2) Processing of noun phrases 42 

Figure (3) Mapping of nominal sentences to verbal sentences 47 

Figure (4) Mapping of nominal sentences 48 

Figure (5) Mapping of sentences with verb to be  51 

Figure (6) Processing of words with similar forms in plural and present tense 55

Figure (7) Processing of passive constructions 60 

Figure (8) The process of matching the relative pronoun with its referent 64

Figure (9) The process of assigning gender to both the sub. and the verb 66

Figure (10) The process of gender matching between the pronoun and its antecedent 68

Figure (11) The process of composing the meaning of the phrasal verb "keep" 70

Figure (12) Parallel structures used in introducing definitions 85 

Figure (13) GT failure in reproducing adjective clauses  85 

Figure (14) The lexical chain used in paragraph 2a  88 

Figure (15) Deficiency at terminological level in paragraph 2a  89 

Figure (16) Suggested translation for paragraph 2a 89 

Figure (17) The process of identifying descriptive paragraphs based on the lexical markers 91

Figure (18) GT inexact translation for the process in question 95 

Figure (19) GT failure in identifying the verb as a sequence transition 97

Figure (20) Suggested translation for paragraph 3a 99

Figure (21) Suggested translation for paragraph 4a 102

Figure (22) The process of identifying process paragraphs based on the lexical markers 103

Figure (23) Errors in employing collocations 105 

Figure (24) 'Because' along with its complement in Arabic  106 

Figure (25) Errors in pronoun-referent resolution 107 

Figure (26) GT failure in deploying "wa" 107 

Figure (27) Suggested translation for paragraph 5a 108 

Figure (28) GT failure in reproducing if-structures  111 

Figure (29) Suggested translation for paragraph 6a 111 

Figure (30) The process of identifying causality paragraphs based on the lexical markers 113

Figure (31) Suggested translation for paragraph 7a 116 

 

List of Abbreviations 

AJ : Adjective 

AT : Article 

Aux. : Auxiliary 

AV : Adverb 

C : Complement

C/R : Cause/Result 

CJ : Conjunction 

Def. : Definition 

Des. : Description 

DT : Determiner 

Eff./def. : Efficiency/deficiency 

FEMTI : Framework for the Evaluation of Machine Translation in the ISLE

GT : Google Translate 

ISLE : International Standards for Language Engineering 

L.M. : Lexical Marker 

MTE : Machine Translation Evaluation 

NLP : Natural Language Processing 

NN : Noun 

Obj. : Object 

PN   : Pronoun 

Pro. : Process 

SL/TL : Source Language/Target Language

Sub. : Subject 

TO : Infinitive Marker to 

VB : Verb 'BE' 

VM : Modal verb 

VV : Verb 

    

Language Errors in Machine Translation

of Scientific Biological Texts from English 

to Arabic: The Case of Google Translate 

By  

Hanan Jamal Alawneh 

Supervisor  

Dr. Abdel Karim Daragmeh 

Abstract 

Machine translation has planted its roots deeply in research domains since it has become the first aid for survival in this era of "globalization". Thus, the present research explores the areas of efficiency/deficiency in Google Translate (GT) performance in translating scientific biological texts from English to Arabic. More specifically, the research aims to test GT performance at two levels: sentence and paragraph. Catford's translation shifts (1965), Halliday and Hasan's model of cohesive devices (1976) and the types of paragraphs frequently used in scientific texts are the main tools used to judge GT output. Finally, the researcher attempts to propose solutions for the errors encountered in order to enhance GT performance in this particular text type and help GT produce translations with high accuracy rates.

 
Chapter One 

Introduction 

1.1. Introduction 

The 21st century can best be described as a competitive marketplace with two main competing forces. On the one hand, there are companies that work hard to put the best products in the hands of their consumers. On the other hand, there are clients who struggle to find an optimal product that both eases their lives and saves them time and effort. As a result, machines have become like shadows of human beings: if one wants to talk to someone who is far away, s/he has to use a machine, the cell phone, to communicate with that person; if one wants to move from one place to another, s/he has to use a car, which is also a package of machines. Similarly, if a student, a mother, a father, a tourist, or a beginner translator wants to learn to read a paragraph, to check the pronunciation or spelling of certain words, or to translate a word, a phrase, a short excerpt, or even a text of whatever kind from one language into another, s/he often uses a machine to perform such tasks. Such trends reflect the fact that machine translation has become a necessity for living in the modern world.

There are plenty of choices among machine translation software that users often benefit from, such as Bing Translator, which was introduced by Microsoft in 2012 and provides a multilingual translation service, and Babylon, which played an important role in machine translation from and to Arabic through developing dictionaries that contain acronyms and abbreviations. In addition, there is voice translation software, which provides customers with voice-to-text or text-to-voice translations by turning a certain message into a unit of translation and then producing written or oral translations for it according to the customers' needs. A clear example of voice translation software is Google Translate (GT). This software provides its users with voice translations: all they need to do is click on the "speak" button, and a written translation of their speech will appear on the screen. Moreover, GT provides translations among 103 languages and has more than 200 million daily users (Wikipedia, 2018). Therefore, GT has become the most fashionable and easily accessible machine for translation tasks nowadays.

However, this software can sometimes be misleading since it is a machine that depends heavily on word recognition and pattern matching between the components of the input and the likely equivalents for that input in its translation memory. This framework of translation action was explained by the GT team (2012), who stated that:

When Google Translate generates a translation, it looks for 

patterns in hundreds of millions of documents to help decide 

on the best translation. By detecting patterns in documents 

that have already been translated by human translators, 

Google Translate can make intelligent guesses as to what an 

appropriate translation should be. This process of seeking 


patterns in large amounts of text is called “statistical machine 

translation" (Retrieved from:   https://urlzs.com/xtgSo). 

In other words, the fact that GT is both fast and economical cannot be denied; however, when it comes to accuracy, the translation product can be inaccurate, incomprehensible and often misleading. It is quite evident that GT's comprehension ability is still inferior to that of human translators. This deficiency is due to the fact that GT has to deal with many languages with different linguistic systems. As Aiken and Balan (2011) stated: "Although Google Translate provides translations among a large number of languages, the accuracies vary greatly... translations between European languages are usually good, while those involving Asian languages are often relatively poor" (Retrieved from: https://urlzs.com/aPm1L).

This suggests that GT is still in its initial stage and that the door is still open to improve it; the evaluation of its performance is deemed a vital stage in improving the translation software. Therefore, this thesis aims at evaluating GT performance and pinpointing the problems that it may encounter while translating texts from English into Arabic, particularly scientific biological texts taken from the Biology 1 textbook, which is taught to 1st year students at the Faculty of Science at An-Najah National University; the machine translation users in this case are 18-19 years old. The texts are taken from chapters on The Chemical Context of Life, Water and the Fitness of the Environment, The Structure and Function of Large Biological Molecules, An Introduction to Metabolism, Cellular Respiration, The Cell Cycle, Mendel and the Gene Idea, and From Gene to Protein. Finally, the thesis attempts to recommend solutions for the errors encountered in the translation action to enable software developers to enhance their translation program. Such solutions help to reach high accuracy levels when translating this type of text since it is considered an example of a controlled area covered under the umbrella of the scientific genre.

1.2. Scientific Translation and Machine Translation 

A scientific text is considered one of the writing modes embedded within the general term 'scientific genre'. Hatim and Munday define the word 'genre' as "a conventionalized form of speaking or writing which we associate with particular 'communicative events'. Participants in these events tend to have set goals, with strict norms regulating what can or cannot be said within the confines of given genre settings" (2004, p.88). That is to say, the scientific genre is a well-established mode because it employs a set of agreed-upon standards and textual norms that regulate the use of both language and message-building within the texts that conform to such a genre. In other words, the scientific genre is tied to a language that is characterized by "impersonal style, simpler syntax, use of acronyms, and clarity" (Ilyas, 1989, p.109). Accordingly, when it comes to translation, scientific translation is considered to have an informative function. Byrne (2006) stated: "scientific translation primary goal is to deliver scientific information; it aims at presenting well expressed information, that may be used easily, properly and effectively" (as cited in Soualmia, 2009, p.21). Moreover, Soualmia stated: "Scientific translation is defined as the method employed to help organize thought, procedures and then come into clear, faithful and reliable results, free of subjectivity and personal involvements" (2009, p.19).

However, translators may face difficulty in translating scientific terms and constructions. As Zinaser states: "Every profession has its growing arsenal of jargon to fire at the lay man and hurls him back from its walls" (1976, p.15). Thus, translators may resort to different procedures while translating texts, such as transliteration, borrowing, or providing footnotes. With regard to GT, the situation may be much more challenging, for the machine in question may not have a sufficient level of recognition to decide which procedure to use. This may result in low-quality translations. This echoes the words of Parikh, who stated: "machine translation rarely reaches accuracy levels above 70%, while a human translation almost always produces accuracy levels above 95%" (2012. Retrieved from: https://urlzs.com/STgBk). Thus, the present research aims to test the extent to which GT adheres to the norms associated with scientific language while translating excerpts taken from scientific biological textbooks.

1.3. Why the Biology Textbooks?  

The reasons behind choosing the biology textbooks for students in their 1st year of the biology specialization are as follows. First of all, scientific texts are very challenging; they are "a good example of the most challenging text type … these texts often present information that is conceptually rich but also conceptually dense and abstract. They use terminology that is unfamiliar to many students … using language in ways that students do not encounter in their reading of fictional and narrative texts" (Palincsar, 2013. Retrieved from: https://urlzs.com/ZBv4f).

Secondly, scientific biological texts deal with terminologies and processes related to everyday life activities like sleeping, eating, etc., unlike other branches of science like chemistry and physics, which are basically about numbers and statistics that are close to a sign language, such as mathematical calculations (+ / - / *); those latter texts contain minimal text and can therefore be easily understood by looking at the symbols. In comparison, the biology text is basically about concepts, descriptions, and terms. Thus, it can be hypothesized that GT can work better with biology.

Finally, students at this stage (1st year in specialization) will take compulsory courses that usually contain introductory and basic concepts about biology in English. These courses will serve as a repertoire for them later on in their specialization. This makes it vital for students to make sure that they understand the ideas and get the accurate equivalents. Therefore, 1st year students who are majoring in biology may choose GT to translate certain texts and terms from English to Arabic to understand the information in their textbooks. In other words, students need to cope with the language of science, which, in turn, uses the English language to express new experiments/studies in biology. This tendency to use machine translation is consistent with the report issued by GT in 2016, which states that more than 500 million people use GT around the world and that the Arabic language is one of the most widely used languages in the application with more than 100 billion words a day.

1.4. Problem Statement 

This research is concerned with the mistranslations produced by GT while translating scientific biological texts from English to Arabic. In machine translation, issues related to features and functions may create a challenge for the machine in question, i.e., GT. Thus, this study will explore problems including those named by Wisniewski, Kubler and Yvon (2014), such as "lexical errors, morphological errors, syntax errors, semantic errors, format errors…" (Retrieved from: https://urlzs.com/CAQbk).

These types of errors may affect the quality of scientific texts since these texts may contain different types of paragraphs, including description paragraphs, which aim at describing concepts/objects; process paragraphs, which mark the sequence of certain biological processes; and causality paragraphs, which explain the cause/result of particular phenomena. In all cases, scientific texts must meet four standards: Syntax, Morphology, Terminology and Cohesion/Naturalness.


The first standard is 'Syntax'. In this kind of text, syntactic structures must guide the machine to one and only one meaning. This means that ambiguity, which may threaten the precision of the text, is not welcome in scientific translation, so there must be no room for structural ambiguity in the resulting translations made by GT.

The second standard is 'Morphology'. It examines certain morphemes attached to certain words that help the software get the exact meaning, such as those marking connectors, negation, tense and number. However, GT may not benefit from these morphemes to process and understand the stated facts directly since it depends on its intelligent guesses to connect the parts of the text together.

The third standard is 'Terminology', which refers to domain-specific

terms used heavily in scientific texts. Such terms create a challenge for 

both GT and students to understand because these technical terms "have 

one or many meanings in everyday language" while having a different, 

peculiar and precise meaning in scientific texts (Ali and Ismail, 2006. 

Retrieved from: https://urlzs.com/FnT8d).   

The fourth standard is 'Cohesion/Naturalness' of the resulting translation. Scientific texts vary in the cohesive devices they employ to connect ideas in a coherent way since they may be descriptive, persuasive, or informative. All these functions aim at putting the information in the hands of the readers without being redundant or consuming much time/effort to get the intended meaning. However, in the case of GT software, these standards may be demanding. This echoes the words of Al-Asali, who points out that "the real problem with today's MT systems … is that they do not achieve the appropriate interpretation of certain parts of the source text (ST), which may depend, in one way or another, on the appropriate comprehension of the devices controlling them" (2000:xix).

Therefore, the study will explore problems related to text type; the unique nature of scientific texts leads to machine translation problems when used by students. Thus, the research focus will be on the following. Firstly, the mistranslations made by GT in areas of describing particular biological processes or biological terms at sentence level, including phrasal constituents; e.g., the mistranslation of "laws of inheritance" in the sentence "Mendel used the scientific approach to identify two laws of inheritance" into "قانون الميراث/الارث". The English phrase refers to the process of gene movement from parents to their offspring and has nothing to do with the concept that refers to the possessions of a dead person. Secondly, the mistranslations of scientific texts at paragraph level according to the format of such type of texts from English into Arabic, such as translating the English text: "water is an excellent solvent for many substances because of its polar nature. Polar substances and ions dissolve in water because opposite charges are attracted to the appropriate ends of water. Strictly hydrophobic molecules, including most lipids, do not mix well with water." into Arabic as:

"الماء هو مذيب ممتاز للكثير من المواد بسبب طبيعته القطبية. تذوب المواد والأيونات القطبية في الماء لأن الرسوم المعاكسة تنجذب الى نهايات المياه المناسبة. لا تختلط الجزيئات الكارهة للماء بشدة ، بما في ذلك معظم الدهون ، بشكل جيد مع الماء."

The Arabic paragraph contains errors such as the short clause that renders the passive construction "are attracted". The correct translation of the English structure is "الأقطاب المتعاكسة تنجذب الى نهايات المياه المناسبة". In other words, these "opposite charges" do not move by themselves; instead, they are moved by an external force. This meaning is not expressed correctly in the Arabic text because the verb "تنجذب" is active, not passive. There are also mistranslations of certain terms, even though the context makes their meanings clear, such as "opposite charges", which is translated as "الرسوم المعاكسة" instead of "الأقطاب المتعاكسة (سالبة وموجبة)", and "hydrophobic molecules", which means "نافرة للماء", not "كارهة", since the latter is not a scientific term.

1.5. Purpose of the Study     

The present research aims at examining the translation problems GT encounters when translating scientific biological texts found in the Biology 1 textbook. It has been observed that GT makes errors in areas such as syntax, morphology, and terminology, among others (Hannouna, 2004, p.450). Thus, the research attempts to detect these errors at sentence and paragraph levels; the researcher will highlight the areas of eff./def. in GT performance to provide useful input about the quality of translation. Then, the researcher will propose solutions for the errors to enhance GT performance. Such outcomes can be useful for translation software developers and users alike. As Ulitkin (2011) stated: "despite their efficiency and outlooks, the translation software and electronic means cannot replace the human translator and guarantee high-quality translations". He believes that a good translation results from the combination of the translator's talents and experience on the one hand and electronic technologies on the other; therefore, users cannot depend only on machines in translation (Retrieved from: https://urlzs.com/3US1f).

1.6. Significance of the Study 

This research is of great importance, for it deals with the most widely used machine translation system, GT. It aims at identifying the challenges GT encounters in translating scientific biological texts; it highlights the areas (syntax, morphology, terminology, cohesion) which are best treated by GT and the ones rendered in low quality. It ends by suggesting recommendations to enhance GT performance, concerning the level at which users/software developers can best use/improve GT in translating this particular text type.

1.7. Limitations of the Study 

The present research is limited to GT language errors found in scientific biological texts translated from English to Arabic only. It does not tackle language errors committed by other machine translation programs. In addition, the present research is concerned with testing GT performance at both sentence and paragraph levels since evaluating GT at text level is beyond its scope. The researcher observed that GT commits errors at both sentence and paragraph levels; thus, it is hypothesized that GT may not perform well at text level since sentences and paragraphs serve as the basic building blocks of any text. In other words, communicative texts cannot function without strong blocks. Thus, it would be better to evaluate GT performance at smaller levels first to pave the way for GT evaluation at text level. Moreover, some figures and drawings might be inserted within the text to clarify the information being presented. Thus, users might resort to feeding GT only separate/short paragraphs in lieu of longer texts to avoid such visual representations. Finally, the research focus will be on the external characteristics of GT, in particular the eff./def. areas in GT performance, regardless of its internal characteristics, which include speed, storage, or cost.

1.8. Research Questions 

In attempting to evaluate GT performance and investigate the 

translation problems encountered in scientific biological texts translation, it 

is important to answer these questions:  

1.  What are the grammatical errors made by GT when translating 

biological texts at sentence level?  


2.  Which cohesive markers are mishandled/mistreated by GT and 

which of these are reproduced correctly when translating biological 

texts at paragraph level?  

3.  What are the possible explanations for making ill-formed 

translations/inadequate system performance? 

4.  What are the main recommendations for improving machine 

translation in this particular text type?  

1.9. Thesis Chapters: 

The present thesis contains five chapters; the sequence is 

summarized below. 

Chapter One is devoted to introductory information that describes the state of technology in the 21st century in general and machine translation in particular. The chapter also includes the problem statement, the purpose of the research, the significance of the research, the limitations of the research, the research questions, and, finally, the chapters of the thesis.

Chapter Two includes a literature review about machine translation and its development. In addition, it presents the methodology, data collection, and the framework within which the data will be treated in the present research.

Chapter Three will present the data by analyzing and comparing the source text/input with the Google translation/output based on Catford's translation shifts (1965). The researcher will identify the errors made by GT at sentence level and then recommend solutions for them.

Chapter Four will discuss the errors made at paragraph level by GT 

based on Halliday and Hasan's model of cohesive markers (1976), which

includes both grammatical and lexical devices, besides the types of

paragraphs frequently used in scientific texts. Finally, the chapter will 

present suggested solutions for the encountered challenges.  

Chapter Five presents the conclusions; it is expected that the thesis 

presents conclusions regarding the quality of GT output, the reasons for its

failure, and the effects of mismatches between the source text and GT 

output. The research also attempts to suggest recommendations for further 

research that could help in enhancing GT performance in scientific 

biological texts translation. 

  

Chapter Two 

Literature Review and Methodology 

2.1. Related Literature to Machine Translation 

The machine translation life cycle is best likened to a baby who starts taking his/her first steps by leaning on one couch and another to follow his/her parents' footprints. Yet, once that child balances his/her body and masters the walking skill, his/her parents can hardly catch and control his/her movement. Similarly, machine translation took its early steps after World War II, drawing on two main factors: first, the invention of the first computer in the 1950s and the desire to benefit from this invention in specific domains; second, the rising tensions between the two main forces at that time, the United States and the Soviet Union (Russia now), manifested in the Cold War. Accordingly, the American government developed the first version of machine translation to break into Russian communications and decode their military plans. Thus, machine translation in its early days had one function, namely the military one (Errens, 2019. Retrieved from: https://urlzs.com/95Af3).

However, this mono-function of machine translation started to fade 

with the need for global communication. In other words, machine

translation started to impose itself on civilian domains "because of 

globalization, the rising of international trade, the expansion of mass media 

and technology, the increase of migration, and the recognition of linguistic 



minorities" (Al-Khawalda and Al-Oliemat, 2014. Retrieved from:

https://urlzs.com/XKfcN). That is, machine translation shifted from

being restricted to military interests to serving a number of civilian functions

including translation of texts between different languages. Accordingly, the 

new discipline of "computational linguistics" came to light. This new

discipline is defined as:  

a subfield of linguistics and computer science that is 

concerned with computer processing of human language. It 

includes automatic machine translation (MT) of one language 

into another, the analysis of written texts and spoken 

discourse, the use of language for communication between 

people and computers, computer modeling of linguistic 

theories, and the role of human language in artificial 

intelligence (AI) (Hannouna, 2004, p.53).  

Consequently, researchers tend to reflect on this new branch of 

linguistics through forming linguistic models about machine translation 

including: the way the machine works, error-tracking or efficiency/deficiency

identification, accuracy levels of particular text types, etc. Such studies are 

carried out to see whether the machine could replace human translators, aid 

them, assist translation theorists who seek to test hypotheses using a

particular translation program, or assist software developers who want to promote

their translation programs and impress the end users to trust their product 

(Hatim and Munday, 2004, p.120). 



Thus, the life of machine translation is divided into two main stages: the

first generation and the second generation. The former generation, usually 

referred to as the direct approach, refers to the early days of machine 

translation where the machine was fed with only a limited number of 

linguistic rules of each language and a bi-lingual dictionary. Thus, this 

indicates that machine translation was merely word-for-word replacement 

at first. In other words, the translation action is done directly between the 

languages in question provided that the machine has both the necessary 

rules and vocabulary.  
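
The word-for-word replacement described above can be illustrated with a minimal sketch. The three-entry lexicon and the function name are hypothetical and serve only as an illustration; they do not reproduce any historical system. The sketch assumes a bilingual dictionary and no analysis of sentence structure.

```python
# Hypothetical toy English-Arabic lexicon (illustrative entries only).
TOY_LEXICON = {"the": "ال", "cell": "خلية", "divides": "تنقسم"}


def direct_translate(sentence, lexicon=TOY_LEXICON):
    """Replace each source word with its dictionary entry, keeping source word order."""
    output = []
    for word in sentence.lower().split():
        # Words missing from the lexicon pass through unchanged,
        # because the direct approach performs no further analysis.
        output.append(lexicon.get(word, word))
    return " ".join(output)


print(direct_translate("The cell divides"))
```

The output keeps the English word order and leaves the Arabic definite article detached from its noun, which hints at why this generation was criticized for ignoring the internal structure of the source text.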

However, this generation received criticism since translation is not 

just word-for-word substitution; rather, it is an art of crafting texts. This

echoes the words of Somers and Hutchins, who stated: "From a linguistic point

of view, what is missing is any analysis of the internal structure of the 

source text, particularly the grammatical relationships between the 

principal parts of the sentences" (1992. Retrieved from: 

https://urlzs.com/KpTMD). Moreover, the first generation input was 

limited to small units only, including words, phrases and sentences, and

users could not edit the output translation. In other words, this direct

approach derives its name from the fact that it does not allow the users to

interact with the machine, for the translation action is done only through

literal translation between the source text and the target text/output. This

echoes the words of Craciunescu, Gerding-Salas and Stringer-O'Keeffe, who state:

"The first versions of machine translation programs were based on detailed 



bilingual dictionaries that offered a number of equivalent words in the 

target language for each word listed in the source language, as well as a 

series of rules on words order" (2004. Retrieved from: 

https://urlzs.com/GDTS5). Moreover, Somers and Hutchins maintain that 

this approach results in "frequent mistranslations at the lexical level and 

largely inappropriate syntax structures" (1992. Retrieved from: 

https://urlzs.com/KpTMD). Such errors may affect the meaning and take 

the source text away from its intended meaning.  

Accordingly, the criticism thrown at the first generation of machine 

translation led to the evolution of the second generation. In other words, 

machine translation developed and started to view the translation

action as a process done along three dimensions: First, the machine 

decodes the meaning of the ST. Second, it re-encodes this meaning in the 

target language. In other words, decoding the meaning of the ST in its 

entirety requires that the machine interprets and analyzes all the elements 

of the text and transfers them into the target language. Thus, "this process 

requires in-depth knowledge of the grammar, semantics, syntax etc of the 

source language and the same in-depth knowledge is required for re-

encoding the meaning in the target language" (Dubey, 2013, p.18). 

Third, this indirect approach started to allow the end user to interact 

with the machine. In other words, the direct relationship which held

between the input and the output in the first generation was broken by the

interaction of the end user, which started to take place in the second



generation, since the machine started to "ask the user to supplement its

linguistic information, requesting confirmation of its decisions, or selection 

from among alternatives" (Somers and Hutchins, 1992. Retrieved from: 

https://urlzs.com/KpTMD). Errens asserts that "to meet that demand and 

clean up its data, Google Translate has an improvement function that led 

users enter suggestions for smoother translations" (2019. Retrieved from: 

https://urlzs.com/95Af3).  

 Such mutual procedures between the machine and the end user 

enhanced the machine performance, in particular, in areas where the tested 

text type/genre is limited to a set of norms. This echoes the words of

Austermühl, who states: "the simple but effective system depends on careful

pre-editing and the adoption of very controlled lexis and syntactic structures"

(2001:163-4 as cited in Hatim and Munday, 2004, p.117). Thus, success 

stories started to flourish including the well-known story of the Canadian 

METEO system which was accomplished at the University of Montreal; it 

translates the weather bulletins automatically from English to French and 

vice versa for the Meteorological Service of Canada. In other words, weather

forecasts have specific norms that the machine could easily recognize 

including: single words and fixed expressions such as: sunny, low 7, wind 

southwest 10km/h. Moreover, Fromkin and Rodman state: "the greater 

recognition of the role of syntax and the application of linguistic principles 

over the past forty years have made it possible to use computers to translate 

simple texts grammatically and accurately between well-studied languages" 



(1995, p.473). This proves that the machine could produce high accuracy 

rates in cases where the domain is specific enough.  

Moreover, this generation yields a number of fruitful concepts 

including: statistical machine translation and neural machine translation. 

Pestove states that the former "is based on the idea that if you feed a 

computer … enough data in the shape of parallel texts in two languages, it 

will be able to spot and recreate the statistical patterns between them",

while the latter means that "the source text is the set of specific features.

Basically, it means that you encode it, and let the other neural network 

decode it back to the text, but, in another language". It is a new discipline 

and it is limited to nine languages only. However, neural machine 

translations "are helpless when the word is not in their lexicon" (2018. 

Retrieved from: https://urlzs.com/enQrN).  

Consequently, the Google team launched the Google Translator

Toolkit in 2009. This Toolkit is a platform where translators

upload texts and submit them for translation. Thus, Google resorts to using

bilingual “parallel corpora”. A parallel corpus consists of a pair of texts, where

one text is a translation of the other. This interaction between GT and

human translators led the Google team to develop their “phrasebook” where

users can save their translations. Thus, they started to enjoy the freedom to 

access their favorite translations of certain phrases and texts. This 

framework of GT is explained as: 



To translate a text, Google Translate search different 

documentaries to find the best appropriate translation pattern 

between translated texts by human. This pattern searching is 

called SMT. Consequently, the quality of Google Translate 

depends on the number of human translated texts searched by 

Google Translate … SMT uses a bilingual text corpora which 

is a database of the sentences in both source language and 

target language. A large group of sentences translated from 

for example English to Persian will be provided for the 

machine to calculate the probability of the words. If for 

instance a word like X has probability 75% to be translated 

into Y, then it will choose Y as the translation of X (Karami, 

2014).  
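
The probability-based choice described in the quotation above can be sketched as follows. This is a simplified illustration under stated assumptions, not the actual Google Translate implementation: it assumes the parallel corpus has already been reduced to aligned word pairs, and the function names and the sample data (X, Y, Z) are hypothetical.

```python
from collections import Counter, defaultdict


def translation_probabilities(aligned_pairs):
    """Estimate P(target | source) by relative frequency over aligned word pairs."""
    counts = defaultdict(Counter)
    for source_word, target_word in aligned_pairs:
        counts[source_word][target_word] += 1
    probabilities = {}
    for source_word, target_counts in counts.items():
        total = sum(target_counts.values())
        probabilities[source_word] = {
            target: n / total for target, n in target_counts.items()
        }
    return probabilities


def best_translation(word, probabilities):
    """Return the most probable target word, or None if the word was never seen."""
    candidates = probabilities.get(word)
    if not candidates:
        return None
    return max(candidates, key=candidates.get)


# Hypothetical data: X is aligned with Y three times and with Z once.
pairs = [("X", "Y"), ("X", "Y"), ("X", "Y"), ("X", "Z")]
probs = translation_probabilities(pairs)
print(probs["X"]["Y"], best_translation("X", probs))
```

With this data, X is rendered as Y in three of four aligned pairs, so Y receives a probability of 0.75 and is selected, mirroring the 75% case in the quotation. A word that never occurs in the corpus has no candidates at all, which is one reason the quality of the output depends on the amount of human-translated text available.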

However, such new concepts do not indicate that the machine would 

replace human translators since a lot of research has been done on machine 

translation including GT. Yet, most of the attempts focused on the sentence level

and drew on randomly selected domains. For example, Key mentions

types of errors committed by machine translation including: "words with 

multiple meanings, sentences with multiple grammatical structures, 

uncertainty about what a pronoun refers to, and other problems of 

grammar" (1980/2003 as cited in Hatim and Munday, 2004, p.116). In 

addition, Al-Khawalda and Al-Oliemat (2014) tested GT in translating 

twelve sentences with different temporal references from English to 

Arabic. They conclude that GT is confusing for non-native English



speakers when it comes to temporal signals (Retrieved from:

https://urlzs.com/XKfcN).   

Moreover, Al Shehab (2013) tested GT in translating six legal 

sentences from English to Arabic. He noted that GT could achieve

partial equivalence, yet it commits errors in translating archaic English

terms, the passive voice and the modal "shall". Such studies end with no

suggested solutions for the errors being identified (Retrieved from: 

http://www.eajournals.org). In other words, there are errors that are still 

committed but systematic research on a specific genre may yield fruitful 

results which may enhance GT performance through developing lexicons 

containing only technical terms and constructions for the domain in 

question to reduce problems related to word choice. In addition, post-editing

processes could be reduced through minimizing keyboard press

rates since Craciunescu et al. (2004) state: "when translation tasks are 

repeated, … keyboard use can be reduced by as much as 70% with some 

texts" (Retrieved from: https://urlzs.com/GDTS5). 

In connection with machine translation at levels larger than the 

single sentence, only a handful of studies have attempted to test GT

competency in translating long stretches of language between English and 

Arabic. ElShiekh (2012) examined GT performance at text level. He 

selected three genres/disciplines which are: advertisement, Koranic and 

literary texts. Yet, ads contain single words and phrases for they aim to be 

short, persuasive and eye catching instead of longer paragraphs or texts. In 



addition, literary and Koranic texts are loaded with emotive words that may

pose a difficulty for GT (Retrieved from: http://dx.doi.org/10.5539/ells.v2n1p56).

Abdulhaq believes that "machine translation can handle the

parole part of language but it can never master the langue part" (2016, p.8).  

ElShiekh classified the errors made by GT at sentence level such as: 

transliteration and mismatches of polysemous words; however, he 

neglected paragraph and text levels. In other words, he did not explore the 

idea of cohesive markers at paragraph and text levels even though his study 

promised to focus on text-level (2012. Retrieved from:

http://dx.doi.org/10.5539/ells.v2n1p56). Al-Samawi notes the shortage of studies at

paragraph or text levels: "most of the previous studies that tried to use error 

analysis in machine translation research were at the level of the single word 

or phrase. Like a rare bird, research on errors of machine translation at the 

text level may not be easy to find, especially in Arabic English" (2014. 

Retrieved from: https://urlzs.com/74KMz).   

However, Hatim and Munday state: "at present, there is a limited 

possibility of concordancing the search results or of configuring the search 

to select the specific text types or genres that are of interests" (2004, 

p.120). Accordingly, this research will take a step forward in highlighting 

the norms frequently used in scientific biological texts as a branch of 

scientific translation to test GT performance in this text type and then identify

the areas of efficiency/deficiency in its output, for the features associated with scientific

texts may make the task promising. Indeed, Craciunescu et al.



(2004) state: "machine translation is most useful with texts possessing the 

following characteristics: First, "Terminological homogeneity" which 

means that the meaning of terms does not vary. Second, "Phraseological 

homogeneity". It means that the ideas or actions are expressed or described 

with the same words. Third, short, simple sentences: these increase the 

probability of repetition and reduce ambiguity" (Retrieved from: 

https://urlzs.com/GDTS5). 

Moreover, Errens states: "It follows that for now, MT delivers best 

results with scientific and technical writing, anything that adheres more 

strictly to formulas. Wherever the use of language deviates from standard, 

where it is more colloquial or artistic, MT falters" (2019. Retrieved from:  

https://urlzs.com/95Af3). Accordingly, such requirements are available in 

scientific biological texts. Thus, this will make it easy for the researcher to 

test the areas of deficiency in GT performance which prevent it from reaching

high accuracy levels and then suggest solutions for those defects to enable GT

to reach accuracy rates that are as high as possible.

2.2. Methodology 

This research will adopt the qualitative approach in analyzing the 

selected data; so to collect relevant data, the researcher uses three 

successive steps. The first step is the translation of the biological texts 

using GT. The researcher decides to use ten texts which are taken randomly 

from the biology textbooks, particularly the Biology 1 textbook that is 

currently used at the Faculty of Science for 1st year students. The researcher



selects ten texts to reach fair conclusions regarding GT performance and 

emphasize the fact that the errors committed are not just a coincidence;

instead, they serve as indicators that there are serious defects in the GT

translation program. The texts are technical and they cover topics like the 

micro/organisms, particularly The Chemical Context of Life (p.1-4), Water 

and the Fitness of the environment (p.5-8), The Structure and Function of 

Large Biological Molecules (p.16-31), An Introduction to Metabolism 

(p.59-67), Cellular Respiration (p.68-78),  The Cell Cycle (p.91-99), 

Mendel and the Gene Idea (p.108-117), From Gene to Protein (p.132-140). 

These topics contain various biological terms, descriptions and processes 

expressed in different syntactic structures such as: active and passive 

constructions, present tense forms, if- structures, etc.  

The second step is the examination of the resulting texts; this 

includes the classification of errors at two levels:  

Sentence Level: The research aims to start the evaluation with smaller 

units such as sentences, then moves gradually to longer stretches of 

language such as paragraphs. Thus, chapter three traces the recurrent errors 

made by GT at sentence level in the translation of the selected texts through 

analyzing them and categorizing the errors in order to pinpoint the semantic 

shifts that may result and alter the meaning of the scientific text. In other 

words, the research focuses on both the errors and the extent to which those 

errors affect or hinder the level of comprehension in each single sentence.  



Paragraph Level: The research, in chapter four, aims at testing GT 

competency in deploying cohesive devices in English to Arabic scientific 

texts translation at paragraph level. The researcher identifies the paragraph 

types frequently used in scientific texts, including: definition paragraphs,

process paragraphs and causality paragraphs. Thus, different types of 

paragraphs taken from the ten texts are inputted into GT to be converted to 

Arabic to carry out the evaluation. The paragraphs express biological 

information related to inheritance, water, cell division, enzymes and gene 

expression. 

 The researcher examines these levels according to the features of 

scientific texts as a 'normative genre'. This genre includes universal 

features of scientific texts such as: technicality which refers to domain 

specific terms and structural clarity at the sentence/text level which 

includes issues such as: active and passive constructions and pronoun 

reference. The researcher also considers the presence of all functional/ 

morphological items such as connectors that show time, cause, etc. in the

selected texts.  

In the final step, based on GT performance at the above mentioned 

levels, explanations are given for each type of error along with suggested

solutions to enhance GT performance in this particular text type. Figure (1) 

gives the areas where GT errors may occur in the translation action.  



Figure (1): The process of treating the selected data. 

2.3. Theoretical Framework 

The present research relies mainly on the model of Machine 

Translation Evaluation (MTE) based on the International Standards for 

Language Engineering (ISLE's framework of taxonomy 3). The model 

distinguishes between two types of evaluation. The first is the 'glass box' evaluation,

which considers GT a 'glass box', so the evaluator looks inside the

translation engine to see how the translation process is done. The

second type of evaluation is concerned with the relationship between the

input and output. In this type of evaluation, GT is treated as a 'black-box', 

which means that the evaluator has to look at the input and output without 

taking into account the mechanisms by which the GT engine works 

(FEMTI, 2003 as cited in Hannouna, 2004, p.115).  

Thus, it is the 'black box' evaluation that will be adopted in this

research since it helps in identifying areas of errors that may occur in the 

GT performance and which are deemed to be the main objectives of the 

present evaluation. In other words, the 'black box' evaluation focuses on the 



external quality characteristics of GT. These characteristics of the outcome 

can be traced by comparing between the input and the output without the 

need to explore the system functions since the aim of the research is to 

identify the errors that may result in the translation. However, when the

time comes to giving recommendations for the software developers on how 

to improve it, the research will shed light on the 'glass box' evaluation.  
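
A 'black box' evaluation of this kind can be recorded with a very small data structure. The sketch below is only an assumed way of organizing such an analysis: the class name, field names, error labels and placeholder strings are all hypothetical and are not taken from the FEMTI framework. Only the input, the output and the evaluator's labels are stored; nothing about the internal workings of the engine is consulted.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Annotation:
    """One evaluated item: the source sentence, the GT output, and the evaluator's labels."""
    source: str
    gt_output: str
    errors: list = field(default_factory=list)  # hypothetical labels, e.g. "tense"


def tally_errors(annotations):
    """Count how often each error label was assigned across all annotated items."""
    counts = Counter()
    for item in annotations:
        counts.update(item.errors)
    return counts


# Placeholder items; the labels are illustrative, not findings of this thesis.
sample = [
    Annotation("source sentence 1", "GT output 1", ["tense", "word order"]),
    Annotation("source sentence 2", "GT output 2", ["tense"]),
    Annotation("source sentence 3", "GT output 3"),
]
print(tally_errors(sample).most_common())
```

Keeping the error labels separate from the texts makes it straightforward to see which categories recur across the selected texts and which occur only once.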

Consequently, the research will draw on Catford's translation shifts

(1965) including both 'level shift' and 'category shift' to highlight the

grammatical errors that may result at sentence level. Thus, the researcher 

will compare the source text and GT output to identify the types of shifts 

that may take place. Yet, when the research shifts to paragraph level, it will 

examine four types of paragraphs frequently used in scientific texts, 

descriptive, process, causality and mixed paragraphs, according to Halliday 

and Hasan's model of cohesive markers (1976) employed to make the text

cohesive and coherent. This model includes both grammatical and lexical 

cohesive devices. Thus, the researcher will trace the cohesive markers 

deployed in the source text to test whether GT reproduces them correctly in 

the output based on that model or not. 

 

Chapter Three 

Linguistic Discordance in Google Translation  

3.1. Introduction 

Linguistics can be best described as a musical instrument that is 

capable of playing different melodies according to a group of tunes set 

together to form a musical scale that is usually put in front of musicians to 

follow in big concerts. Similarly, the linguistic system, in any language, 

performs the same function as that musical instrument since linguistics plays a

major role in producing stretches of language that sound harmonious and 

meaningful to both the ears and minds of all language users. In other 

words, linguistics forms a scientific model that helps language users to 

form and communicate coherent and innermost thoughts since it controls 

the way people use their language to express their human experience. 

However, this experience may be different from one group of language 

users to another because "language … gives structure to experience, and 

helps to determine our way of looking at things, so it requires some 

intellectual effort to see them in any other way than that which our 

language suggests to us" (Halliday, 1970, p.143). 

Therefore, the effort and time spent in understanding the differences 

between the linguistic systems of all languages worldwide is affected by 

the components of the linguistic system of the language/s in question. The 

linguistic system of any language could be seen as an umbrella that covers 



different areas, one of which is named 'grammar'. This area consists of two 

main parts which are: syntax and morphology. The former part deals with 

the arrangement of words in different language structures while the latter 

deals with the structure or build up of individual words. Notwithstanding, 

the division of this umbrella into two parts does not mean that they are 

different or unrelated to one another; instead, both syntax and morphology 

are much more interrelated than contradictory because the term

'grammar' has been used to refer to the two concepts of syntax and 

morphology by many researchers including Baker who maintains that 

"grammar is organized along two main dimensions: morphology and 

syntax. Morphology covers the structure of words", while "Syntax covers 

the grammatical structure of groups, clauses, and sentences" (1992, p.83). 

However, language users do not use the grammatical categories that 

exist in their languages in the same way since these grammatical elements 

are not identical across all languages. In other words, each language differs

from the other in the way it expresses the same message, for each has its 

own grammatical patterns that it imposes upon its users. Therefore, in the 

translation process, the variety among grammatical categories between 

source and target language poses a difficulty for human translators 

"because one cannot always match the content of a message in language A 

by an expression with exactly the same content in language B, because 

what can be expressed and what must be expressed is a property of a 



specific language in much the same way as how it can be expressed" 

(Winter, 1961, p.98). 

Thus, the possibility of achieving equivalence at grammatical level 

has been examined by many researchers who discussed this dilemma in 

relation to translation. For example, the clear-cut differences between 

English and Arabic in relation to grammatical categories led Baker to 

identify five problematic grammatical categories between English and 

Arabic which are: number, gender, person, tense and aspect, and voice. She 

maintains that the differences at this level constitute a source of difficulty 

for human translators because such differences are capable of changing the 

content of the message in the process of translation. This change may lead 

to the addition of information which is not found in the original or omitting 

information from the source text. She concludes that this may happen when 

"the target language has a grammatical category which the source language 

lacks", or "if the target language lacks a grammatical category which exists 

in the source language" (1992, p.83). 

Moreover, Catford discussed the process of translation between two 

languages with different linguistic systems. He maintains that there are 

some "translation shifts" that may occur in the process of translating a text 

from a source language to a target language. He states that such shifts

are of two main types: level shift and category shift. The former shift

"occurs when an SL item has a target language equivalence at a different

linguistic level from its own (grammatical, lexical, etc.)", while category



shift takes place at four levels which are: class, structure, unit and intra-

system shifts. First, "class shift" which involves changing the class of a 

word, e.g., from an adjective to a noun. Second, "structure shift" which 

refers to altering the grammatical structure of a sentence, e.g. from active to 

passive. Third, "unit shift" which refers to switching the rank, e.g. from a

clause to a phrase. Finally, "intra-system shift" "which occurs when 

translation involves selection of a non-corresponding term in the TL system 

…: e.g. an SL 'singular' becomes a TL 'plural'" (1965 as cited in Hatim, 

2001, p.16). Such shifts may take place when the translator cannot adhere 

to the linguistic forms that exist in the source text. 

Therefore, Nord states that "linguistic problems arise from 

differences of structure in the vocabulary and syntax of source language

(SL) and target language (TL)" (1991, p.88). In the same vein, Abbasi and

Karimnia assert that most students commit errors while doing translation 

tasks at syntactic and morphological levels or what they call "syntactic-

morphological errors" such as: errors in the use of the appropriate tense, 

errors in the use of articles and prepositions, and errors in the use of active 

and passive voice. They state that students, while doing certain translation

tasks, commit errors at different grammatical levels because they

transfer the grammatical rules of their own language into the target 

language (2011. Retrieved from: https://urlzs.com/wpw5J).  

Even so, these problems may be solved since Hannouna believes that 

human translators can work hard and focus their effort on understanding 



and mastering the grammars of the two languages involved in the 

translation task and they can "draw on general knowledge of the subject 

matter and the world to arrive at the intended meaning" (2004, p.54). Thus, 

the research hypothesizes that when it comes to machine translation, 

deficiencies related to grammatical categories would be more challenging

since machines may not have the ability to analyze all the grammatical

categories between the languages involved especially if they are distant

languages such as English and Arabic; such languages have more 

differences in their linguistic systems than similarities.     

On these grounds, linguistic errors may creep in while using

machine translation because "MT is often impeded by lexical and syntactic 

ambiguities, structural disparities between the two languages, 

morphological complexities and other cross-linguistic differences" 

(Hannouna, 2004, p.54). In other words, machine translation errors may be 

attributed to the framework adopted by all machine translation programs 

that is known as the "Transfer approach". This approach consists of three 

steps which are: First of all, the scanning and analysis of the ST syntactic 

structures into their basic building blocks. Secondly, the transfer of those 

syntactic structures into the TL structure. Finally, the synthesis and 

restructuring of the output based on that TL structure which may yield one 

or a number of proposed translations for the same structure. Thus, this 

approach indicates that the process in machine translation programs is 

sequential so each step has to pave the road for the next one to take place to 



produce optimal output that satisfies the users' desires (Somers, 1998,

p.145 as cited in Hatim and Munday, 2004, p.117). 
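
The three sequential steps of the "Transfer approach" can be sketched as a toy pipeline. The sketch assumes a hypothetical two-entry lexicon and handles only subject-verb sentences; the function names, the tables and the SVO-to-VSO reordering rule are illustrative assumptions and do not describe how GT works internally.

```python
# Hypothetical toy tables (illustrative entries only).
TOY_VERBS = {"divides": "تنقسم"}
TOY_NOUN_PHRASES = {"the cell": "الخلية"}


def analyse(sentence):
    """Step 1: split the source sentence into its subject phrase and its verb."""
    words = sentence.lower().rstrip(".").split()
    for index, word in enumerate(words):
        if word in TOY_VERBS:
            return {"subject": " ".join(words[:index]), "verb": word}
    raise ValueError("no known verb found")


def transfer(structure):
    """Step 2: map the English subject-verb order onto a verb-subject order."""
    return [("verb", structure["verb"]), ("subject", structure["subject"])]


def synthesise(ordered_parts):
    """Step 3: look up the target-language forms and join them into the output."""
    output = []
    for role, text in ordered_parts:
        table = TOY_VERBS if role == "verb" else TOY_NOUN_PHRASES
        output.append(table.get(text, text))
    return " ".join(output)


def translate(sentence):
    # Each step consumes the result of the previous one, so the process is sequential.
    return synthesise(transfer(analyse(sentence)))


print(translate("The cell divides."))
```

Because each function depends on the result of the one before it, a mistake in the analysis step (for instance, a verb that is missing from the table) carries through to the later steps, which illustrates why an error at one level can affect the whole output.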

However, in some cases, machine translation programs may commit 

errors at one or all of the above mentioned levels. This, in turn, leads to 

many errors in the translations made by the software in question. In other 

terms, there are shifts that may take place in the translation process which 

may result in semantic shifts that might change the meaning of the text/s in 

hand. Such errors may widen the gap between human translators and 

machine translation as two sides of the same coin since Brown defined

errors as "a noticeable deviation from the adult grammar of native speakers, 

reflecting the inter-language competence of the learner" (2004, p.216). By 

analogy with humans' competency, errors indicate that the preprocessing 

mechanisms of the software in question are not doing well, so they need to 

be enhanced and well-fed. This echoes the words of Al-Samawi, who states:

The question whether machine translation would replace 

human translation was and is still one of the primary 

concerns of research in machine translation. Researchers, in 

this regard, are between fear and confidence. Some look at it 

as a real threat to human translators; others are doubtful and 

base their doubt on the terrible errors committed by machine 

translation  (2014. Retrieved from: https://urlzs.com/74KMz). 

Consequently, many researchers have attempted to identify and 

classify the errors produced by machines in relation to their linguistic 



competency, in particular, grammatical categories at smaller levels such as 

word and sentence level. For example, Hannouna (2004) states that the 

machine commits errors in areas such as: category and word class, 

syntactic arrangement, tense, pronoun translation, suffixes among other 

areas. She evaluates the quality of three Arabic machine translation systems 

but her study focuses only on one single level of texts which is the 

sentence. In addition, Vilar, Xu, D'Haro and Ney (2006) identified five big

classes of errors which are: "missing words", "word order", "incorrect

words", "unknown words" and "punctuation errors". Their study also

focuses on smaller units (Retrieved from: https://urlzs.com/XtQ9w). 

In addition, Al-Samawi (2014) identified a number of errors made by 

GT at text level both at syntactic and morphological levels such as: 

"Violating subject-verb agreement (masculine and feminine; singular, dual, 

and plural; first, second, and third person)" , "Using a noun in place of a 

verb", "Using a verb in place of a noun" and "Omitting functional 

morphemes (i.e. prepositions, articles, conjunctions, pronouns, auxiliary 

verbs, deixis, etc.)." However, his study focuses on counting the number of 

errors in the first ten sentences in each text without explaining them or the

semantic shift that took place in the texts. Also, he uses 10 texts from 10 

different disciplines in his research; this in turn may not be objective or fair 

enough to make conclusions about GT performance in each field 

(Retrieved from: https://urlzs.com/74KMz).   



Therefore, using GT to do certain translation tasks may yield a 

number of grammatical errors in different areas because when it comes to a 

software tool, e.g. GT, and linguistics, the situation may be vague and 

confusing for:  

Psychologists have told us that individuals acting alone do 

not normally cause too much trouble; it is only when they 

form into crowds that they become unmanageable. Similarly, 

individual lexical items. . . , can only stage sporadic strikes; it 

is when they group into long syntactic stretches that they 

begin really to launch all-out assaults on the translator 

(Wong, 2006, p.130).  

If this is so for human translators, then the task would be at least as challenging, or even far more challenging, for GT since it lacks the judgment and intelligence that humans have. Accordingly, this chapter aims to detect the grammatical errors in the translations produced by GT, classify those errors under broad categories and sub-categories to demonstrate the effect of the grammatical shifts that take place, and then measure the semantic shifts and their effects on comprehension. Finally, this chapter will draw on the last step in the "Transfer approach", which has been further developed into what is nowadays called the "users' feedback button". This button enables end users to interact with the machine and contribute to enhancing the quality of the output.



In a nutshell, in the last step of the "Transfer approach", known as "synthesis", a number of machine translation systems, including GT, provide the end user with one or more suggested translations for the item in question. Accordingly, the end user is free to accept the translation proposed by GT, reject it outright, or come up with an acceptable translation on his/her own in cases where all the equivalents proposed by GT are fuzzy or imperfect.

Thus, this chapter aims to suggest acceptable translations for the items translated erroneously by GT, to be added to the list of options provided by GT in case those items are re-entered by a different user. In sum, this chapter seeks to give recommendations for solving each type of error in an attempt to improve the reputation of machine translation and provide end users with acceptable translations.

3.2. Errors at Syntactic Level 

3.2.1. Organization of Constituents at Phrase Level  

The Arabic noun phrase is formed when the noun precedes the attributive adjective, while in English the attributive adjective precedes the noun, according to the naturalness principle that controls the production of well-formed structures in both languages. However, the researcher observes that, in some cases, GT sticks to the structure of the source text, which results in forms that are alien to the target language since their constituents are incoherent. In other words, in phrases such as noun phrases, the position of the modifier in relation to the modified noun affects the message. For example, Arabic starts with the noun and then gives information about it in a direct manner, while English prepares the readers/listeners for the theme since it describes the object first and then reveals its identity. These different ways of presenting information about the same object in the two languages led GT to commit errors in this area, as shown in Table (1) below:

Table (1): Errors made at noun phrases level

Ex. 1
Source Text: Mendel chose the garden peas for his studies because: garden peas are available in many varieties.
Google Translation: اختار مندل البازلاء الحديقة لدراسته لأن: البازلاء حديقة متوفرة في العديد من الأصناف.

Ex. 2
Source Text: In a chemical reaction, all of the atoms in the reactants must be present in the products. The reactions must be balanced.
Google Translation: في التفاعل الكيميائي ، يجب أن تكون جميع الذرات الموجودة في المواد المتفاعلة موجودة في المنتجات. يجب أن تكون متوازنة ردود الفعل.

The noun phrase "garden peas" in Ex.1 of the English text is not translated correctly. In other words, a structure shift takes place because GT changes the phrase in the translation so that it appears as a noun phrase made up of two nouns, the "garden" and the "peas", in Arabic "البازيلاء الحديقة". This results from GT resorting to literal translation, which in turn drives it to treat the word "peas" in its current position in the English sentence as a noun and "garden", its modifier, as an adjective. However, "garden peas" in the underlying structure of the noun phrase serves as a single noun, which in Arabic means "حبات البازيلاء". The noun here is used to specify the type of seeds that Mendel selected for his experiments. In other words, the noun phrase is employed to make the idea more specific and precise.

Thus, this shift shows that GT fails to recognize this underlying structure and the way Arabic makes it manifest in its surface structure. This leads GT to translate the two words as two nouns, which results in a form that is not familiar in Arabic: two consecutive nouns, each with the definite article, "البازيلاء الحديقة". That is to say, GT fails to analyze the noun phrase "garden peas" as a phrase with one noun, in Arabic "حبات البازيلاء". Consequently, the translation produced by GT may force students to pause in order to rearrange the sentence and put each word in its appropriate position to get the message. This, in turn, may weaken the translation of scientific texts, since Ali and Ismail maintain that technical terms are challenging for students to understand because they "have one or many meanings in everyday language" but have a different, peculiar and precise meaning in a scientific text (2006. Retrieved from: https://urlzs.com/FnT8d).

Moreover, Ex.2 shows that GT does not, in some cases, stick to the order that exists in the source text. In other words, GT makes its own guesses when translating a sentence, regardless of how the words are combined in the source text. Thus, Ex.2 shows that GT fails to order the constituents of the sentence correctly: English places the attributive adjective before the noun, whereas Arabic starts with the noun, which in this case is "ردود الفعل", followed by its adjective, "متوازنة".

Another problem is that GT fails to order the constituents of the noun phrase correctly, as Ex.3 in Table (2) shows. The noun phrase "electron transport chain" in the second translation provided by GT is not translated correctly since GT fails to recognize its head, the noun "chain". In addition, the fact that GT provides two different translations of the same noun phrase, one right and one wrong, even though it is an established scientific term, indicates that GT is still unsure about the correct translation.

Table (2): Errors in ordering the noun phrase.

Ex. 3
Source Text: Electron transport chain accepts electrons from the breakdown products of the first two stages (most of them via NADH) and passes these electrons to an electron transport chain.
Google Translation (1): تقبل سلسلة نقل الإلكترونات الإلكترونات من منتجات التكسير في المرحلتين الأوليين (معظمها عبر NADH) وتمرير هذه الإلكترونات إلى سلسلة نقل الإلكترون.
Google Translation (2): الإلكترون سلسلة النقل تقبل الإلكترونات من المنتجات انهيار المرحلتين الأوليين (معظمهم عبر NADH) وتمر هذه الإلكترونات إلى سلسلة نقل الإلكترون.

Therefore, producing correct noun phrases requires that GT draw a map of the items in question in order to decide on the function of each element and then rearrange the elements without any loss or distortion that may threaten the quality of the output. For example, GT should have translated the above-mentioned phrases without any change in the order of the constituents since the original is clear and precise. It would therefore be safer for GT to analyze how phrasal slots are ordered in both English /(Art.)(Adj.)N./ and Arabic /(Art.)N.(Adj.)/ and then map onto them to produce correct structures.

Therefore, to handle this phrase-level translation anomaly, the researcher suggests the procedure described in Figure (2). First, the sentence provided by the user is split (tokenized) into tokens (words). Then, these tokens are passed to a Part-of-Speech (POS) tagger that finds the type of each token, i.e., whether a word is a verb, noun, adverb, etc. Using these token types, one can determine whether a sentence complies with the "Art. + Adj. + N." pattern. If it does, the noun part undergoes bigram and trigram extraction, where bigrams and trigrams are phrases consisting of 2 and 3 tokens, respectively. The translation of these noun phrases is then looked up in a specialized lexicon. For Ex.1, the direct translation of the phrase "garden peas" would be "البازيلاء الحديقة". The user can then detect this anomaly and give his/her feedback by suggesting a new translation, "حبات البازيلاء", which would be stored in the lexicon.



Figure (2): Processing of noun phrases. 
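The procedure for noun phrases can be sketched in a few lines of Python. This is only an illustrative sketch, not GT's actual pipeline: the toy tag table `TOY_POS`, the `LEXICON` dictionary and all function names are assumptions introduced here, and a real system would rely on a trained tagger rather than a hand-written table.

```python
import re

# Toy POS table standing in for a trained tagger (assumption for illustration).
TOY_POS = {
    "mendel": "N", "chose": "V", "the": "ART", "garden": "N",
    "peas": "N", "for": "P", "his": "PRON", "studies": "N",
}

# Hypothetical specialised lexicon filled from user feedback.
LEXICON = {}

def tokenize(sentence):
    """Split a sentence into lowercase word tokens."""
    return re.findall(r"[a-z]+", sentence.lower())

def tag(tokens):
    """Attach a POS tag to each token; unknown words get 'X'."""
    return [(t, TOY_POS.get(t, "X")) for t in tokens]

def noun_phrase_runs(tagged):
    """Collect maximal runs of adjectives/nouns that end in a noun."""
    runs, current = [], []
    for word, pos in tagged + [("", "END")]:
        if pos in ("ADJ", "N"):
            current.append((word, pos))
        else:
            if current and current[-1][1] == "N":
                runs.append([w for w, _ in current])
            current = []
    return runs

def candidate_terms(tagged):
    """Extract bigrams and trigrams from each noun-phrase run."""
    terms = []
    for run in noun_phrase_runs(tagged):
        for n in (2, 3):
            for i in range(len(run) - n + 1):
                terms.append(" ".join(run[i:i + n]))
    return terms

def record_feedback(term, translation):
    """Store a user-suggested translation so it can be reused later."""
    LEXICON[term] = translation

tagged = tag(tokenize("Mendel chose the garden peas for his studies"))
terms = candidate_terms(tagged)
record_feedback("garden peas", "حبات البازيلاء")
```

Once the user's suggestion is recorded, a later lookup of "garden peas" in `LEXICON` returns the stored term instead of the literal two-noun rendering.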

3.2.2. Organization of Constituents at Sentence level 

The simplest sentence in English consists of SVO/C (subject, verb 

and object/complement) and conveys a certain message. However, when it 

comes to GT, it is clear that it commits errors at this level, in particular, 

with the arrangement of the elements that make up the whole sentence. 

This is due to the nature of the two languages and the features associated 

with each of them. That is to say "English is basically an analytic language, 

i.e., it shows syntactic relationships by word order and function words. 

Arabic is basically synthetic, i.e., it shows syntactic relationships by its 

frequent and systematic use of inflected forms" (Hawkins, 1980 as cited in 

Saraireh 2014). This diversity in ordering the constituents of the sentence 

may hinder the process of understanding the message since it may drive GT 



to commit errors that cause structural ambiguity which in turn yields 

different interpretations of the same message as shown in Table (3) below: 

Table (3): Errors made at sentence level.

Ex. 4
Source Text: The number of protons determines the atomic number
Google Translation: يحدد عدد البروتونات العدد الذري.

Ex. 5
Source Text: Electron transport chain accepts electrons from the breakdown products of the first two stages (most of them via NADH) and passes these electrons to an electron transport chain.
Google Translation (1): تقبل سلسلة نقل الإلكترونات الإلكترونات من منتجات التكسير في المرحلتين الأوليين (معظمها عبر NADH) وتمرير هذه الإلكترونات إلى سلسلة نقل الإلكترون.
Google Translation (2): الإلكترون سلسلة النقل تقبل الإلكترونات من المنتجات انهيار المرحلتين الأوليين (معظمهم عبر NADH) وتمر هذه الإلكترونات إلى سلسلة نقل الإلكترون.

In English, there is only one basic type of sentence: it starts with the subject, followed by a verb along with its complement. In Arabic, by contrast, there are two types of sentences: equational and verbal. The former starts with a noun followed by a predicate, while the latter begins with a verb followed by a subject and a complement. In Ex.4 in Table (3), the English sentence starts with the subject "the number of protons" followed by the verb "determines" and its complement, following the unmarked SVO pattern of English. However, GT reverses this structure in the Arabic sentence, leading to a semantic shift that results in two readings of the Arabic sentence: either the atomic number decides the number of protons, or the number of protons decides the atomic number. In the Arabic sentence, both nouns, the number of protons and the atomic number, could stand in either the subject or the object position.

Otherwise stated, readers of this sentence will be confused about its correct meaning, especially those who are not well acquainted with the Arabic syntactic rule which states that if there are two consecutive nouns in a verbal sentence, and the sentence does not use case markers/inflections to distinguish between them, then the first noun is the subject and the second noun is the object. In other words, the two nouns are sorted according to the order in which they appear in the sentence. To resolve this ambiguity, GT needs to be improved by adding inflections to the Arabic sentence. The inflections (diacritics) to be used in this case are damma to indicate the subject position and fatha to indicate the object position. As GT currently translates the input, however, the Arabic sentence has two possible readings, which weakens the quality, precision and comprehensibility of the translated text.

Moreover, Ex.5 in Table (3) shows that reordering the elements in a sentence may produce redundant stretches of language, such as two identical nouns immediately following one another. This may remove one meaning from the sentence, which, in turn, may alter the meaning intended in the source text since repetition may lead to ambiguity. In Ex.5, the English sentence starts with the subject "electron transport chain", but GT inverted the order in the Arabic sentence, leading to two identical forms following one another. Accordingly, when these two nouns are sorted out and assigned the appropriate inflections (diacritics) in Arabic, the result is two nouns with the same inflection, kasrah: "الالكتروناتِ الالكتروناتِ". Students may perceive this redundancy as a typo and read the sentence as "تقبل سلسلة نقل الالكترونات". This means that they might omit the second word "الالكترونات" in the Arabic sentence "تقبل سلسلة نقل الإلكتروناتِ الإلكتروناتِ", since they might be misled by the wrong ordering produced by GT, which results in redundant words.

Therefore, when translating active sentences in which both the subject and the object contain similar nouns and the inflections do not help clarify the meaning, it would be better and safer for GT either to maintain the order of the source text, so that readers need not speculate about what the source text looked like in order to get the bulk of the message, or to separate the two constituents with a verb. Thus, it would be better for GT to produce nominal sentences that start with the subjects, "the number of protons" and "electron transport chain" in the examples, and then give information about their function.

These examples show that GT still commits errors in ordering the constituents of both the phrase and the sentence, due to differences in the rules that govern patterns such as NP and SVO in the two languages. Erroneous and random switches between such patterns lead to errors such as producing a sentence or a phrase with more than one interpretation, producing forms that do not exist in the target language, or ordering the parts of a scientific term wrongly. All these errors weaken comprehension and therefore the quality of the translation. In the current situation, GT users have to rely on their intuition to make the sentence coherent and cohesive.

Therefore, to solve this problem, GT should adopt the two-step procedure described in Figure (3). The first step is a "text preprocessing step" in which GT splits the input sentence into tokens in a "Tokenizer"; a "POS" tagger then marks those tokens with their grammatical categories. This helps identify the sentence pattern employed in the input based on the order of the elements in the sentence under study. For example, the sentence in Ex.4 conforms to the unmarked sentence pattern in English, /sub.+ v.+ obj./, for it starts with an NP and ends with an NP, "the number of protons" and "the atomic number", respectively.

Secondly, GT needs to decide whether the two nouns are inflected for case. If not, GT will carry out a second step to preserve the meaning of the ST by allocating both the sub. and the obj. to their correct positions in the sentence pattern. In other words, the proposed system should place the two nouns "the number of protons" and "the atomic number" in the sub. and obj. slots in Arabic, respectively. Thus, the system should be programmed to map the unmarked English pattern /sub.+ v.+ obj./ to the Arabic pattern /v.+ sub.+ obj./. Accordingly, users can suggest a precise translation by adding Arabic diacritics, such as fatha and damma, to the sentence to avoid any ambiguity that may weaken the quality of the output. Such diacritized forms will be added to a specialized lexicon and reused to resolve the confusion that may occur in identifying the subject and the object in the GT output, as shown in Figure (3):

 
Figure (3): Mapping of nominal sentences to verbal sentences. 
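As a rough illustration of this mapping, the sketch below reorders an already-translated subject, verb and object into the Arabic /v.+ sub.+ obj./ pattern and marks the head word of the subject with damma and the head word of the object with fatha. The function names are hypothetical, the Arabic glosses are supplied by hand, and marking only the first word of each phrase is a deliberate simplification of Arabic case marking.

```python
DAMMA = "\u064F"  # nominative marker, used here for the subject
FATHA = "\u064E"  # accusative marker, used here for the object

def mark_head(phrase, diacritic):
    """Attach a case diacritic to the first (head) word of a phrase."""
    head, _, rest = phrase.partition(" ")
    return head + diacritic + (" " + rest if rest else "")

def to_verbal_sentence(subject, verb, obj):
    """Map /sub.+ v.+ obj./ onto /v.+ sub.+ obj./ with case marks."""
    return " ".join([verb, mark_head(subject, DAMMA), mark_head(obj, FATHA)])

# Ex.4: "The number of protons determines the atomic number"
sentence = to_verbal_sentence("عدد البروتونات", "يحدد", "العدد الذري")
```

With the two diacritics in place, the reader no longer has to rely on word order alone to tell the subject from the object.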

 
However, in cases where the subject and the object contain similar words, which may lead to redundancy in the output, GT should be programmed to block those two nouns from following one another, using the procedure described in Figure (4). First, GT should carry out the same "text preprocessing step" explained previously to analyze the input sentence. Second, GT should decide whether the sub. and the obj. in the input contain similar words. If so, GT needs to map the input sentence onto a nominal sentence with the pattern /sub.+ v.+ obj./. This sentence pattern separates the subject from the object by the verb, which in turn reduces the redundancy in the output. At this stage, users can assist GT by providing it with the correct nominal pattern of the sentence under study. Finally, "a diacritics extraction step" takes place to mark both the subject and the object. Such procedures will allow users to enjoy translations of high quality and precision when it comes to translating active sentences.

Figure (4): Mapping of nominal sentences. 
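The redundancy check can be sketched as follows. The code assumes that the subject, verb and object have already been identified and translated; the function names are hypothetical, and "similar words" is approximated here by exact word overlap, which is a simplifying assumption.

```python
def shared_words(subject, obj):
    """Return the words that occur in both the subject and the object."""
    return set(subject.split()) & set(obj.split())

def order_constituents(subject, verb, obj):
    """Keep /sub.+ v.+ obj./ when the two noun phrases overlap,
    otherwise use the verbal pattern /v.+ sub.+ obj./."""
    if shared_words(subject, obj):
        return [subject, verb, obj]   # the verb separates the repeated noun
    return [verb, subject, obj]

ELECTRONS = "الإلكترونات"
parts = order_constituents("سلسلة نقل " + ELECTRONS, "تقبل", ELECTRONS)
sentence = " ".join(parts)
```

For Ex.5 the overlap is detected, so the verb is placed between the two occurrences of the repeated noun and they no longer follow one another.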

  

3.2.3. Erroneous Shifts from Verbal to Nominal Sentences in Arabic 

    The simplest sentence in any language is made up of different parts 

of speech such as nouns, verbs, adjectives and adverbs. However, these 

categories may be problematic to GT since languages differ in the way they 

derive such parts of speech and the way they combine those elements 

together to communicate a message as the sentence in Table (4) below 

shows: 

Table (4): Errors of turning a sentence to a noun phrase

Ex. 6
Source Text: Some isotopes are radioactive
Google Translation: بعض النظائر المشعة

GT fails to order the constituents of the sentence because it does not recognize the auxiliary "are" in Ex.6. GT neglects the auxiliary in the process of translation, which in turn leads it to translate the sentence into a noun phrase, "النظائر المشعة". This creates a problem in comprehending the sentence because GT replaces the adjective by a noun phrase. In such a case, the reader may search for a main verb after the noun but finds nothing, since GT drops the auxiliary "are" from the sentence. Thus, the verbal sentence is turned into a noun phrase. Errors of this kind, where a verb is not translated directly as a verb but is instead turned into a noun, are classified as structure shifts.

Therefore, the researcher suggests that GT be programmed to translate the verb to be (Aux.) and the adjective that follows it in an English verbal sentence into an adjective, which makes the sentence equational in Arabic, as explained in Figure (5). In other words, Arabic does not use pseudo-verbs of the type is, am, are, etc. to introduce adjectives. Therefore, GT should first carry out the preprocessing step to identify the aux. and then translate it, together with the adjective that follows it in English, into an equational sentence in Arabic, which consists of a subject and a predicate. Second, GT should carry out an extraction step: the Arabic predicate (adj.) has to be derived from the /aux./ and the /adj./ in the English sentence. However, in Ex.6, GT neglects the presence of such pseudo-verbs; this results in what are called "genitive structures" in English, "مضاف ومضاف اليه" in Arabic, such as "بعض النظائر المشعة". Such a form may not help readers to identify the topic of the sentence, which is called the theme, and the comment that tells the readers more about the theme, which is called the rheme. Therefore, users could add their suggested translation for the sentence, "نظائر مشعة", to be stored in the lexicon and reused in similar constructions.



Figure (5): Mapping of sentences with verb to be. 
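A minimal sketch of this mapping is given below. It detects the English pattern "noun phrase + be + adjective", drops the auxiliary, and joins hand-supplied Arabic glosses into a subject followed by a predicate. The function names, the tag labels and the small `glossary` are assumptions made for illustration; the glosses are not proposed as the definitive translation of Ex.6.

```python
BE_FORMS = {"is", "am", "are", "was", "were"}

def split_copular(tagged):
    """Return (subject_words, adjective) for 'NP + be + ADJ', else None."""
    if len(tagged) < 3:
        return None
    aux_word, _ = tagged[-2]
    adj_word, adj_tag = tagged[-1]
    if aux_word in BE_FORMS and adj_tag == "ADJ":
        return [w for w, _ in tagged[:-2]], adj_word
    return None

def to_equational(tagged, glossary):
    """Build an Arabic equational sentence: subject + predicate, no verb."""
    parts = split_copular(tagged)
    if parts is None:
        return None
    subject, adjective = parts
    return " ".join(glossary[w] for w in subject + [adjective])

tagged = [("some", "DET"), ("isotopes", "N"), ("are", "AUX"), ("radioactive", "ADJ")]
glossary = {"some": "بعض", "isotopes": "النظائر", "radioactive": "مشعة"}  # hand-made glosses
result = to_equational(tagged, glossary)
```

The predicate is left without the definite article, which is what distinguishes an equational sentence from the noun phrase that GT produced.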

3.3. Errors at Morphological Level 

3.3.1. Inappropriate Choice of Suffixes 

Affixes are of three types: prefixes, which are added to the front of the word; infixes, which are inserted in the middle of the word; and suffixes, which come at the end of the word. Each type has a function which helps in constructing a precise meaning.

3.3.1.1. Inflections Attached to Sub-headings      

In some cases, GT fails to add the definite article "the", "ال" in Arabic. This is because in English, when people want to refer to things in general, they use the plural form, while in Arabic the situation is different. In other words, Arabic employs "ال" to refer to things in general, while English uses "the" to specify the referent/topic.

However, in the example in Table (5), GT fails to add "ال" to the noun phrase "chemical reactions" in the Arabic text, making the noun phrase indefinite, since English and Arabic differ in how they assign the definite article to nouns according to the function of the sentence. In the English text below, the noun phrase that starts with a capital letter /C/ refers to chemical reactions in general, for it introduces the topic of the subsequent sentence. In other words, GT should be programmed to attach "ال" to the noun phrase "Chemical reactions" because it functions as a sub-heading. However, Ex.7 in Table (5) shows that GT still needs to be enhanced in this area, since the indefinite Arabic noun phrase suggests that the reactions are unknown, which in turn makes the topic vague and not specific enough. The absent definite article would have added familiarity and a smooth transition to the subsequent sentence, as the example in Table (5) shows:

Table (5): Errors in treating the definite article

Ex. 7
Source Text: Chemical reactions
In chemical reactions, chemical bonds are broken and reformed, leading to new arrangements of atoms.
Google Translation: تفاعلات كيميائية
في التفاعلات الكيميائية ، يتم تكسير الروابط الكيميائية وإصلاحها ، مما يؤدي إلى ترتيبات جديدة للذرات

 

3.3.1.2. Inflections Attached to the Verb  

A clear example of errors in affixation is the main verb in the English sentence below. GT fails to recognize that the /s/ in its current position marks a verb that is active, in the present tense, and has a singular subject. Instead, GT treats the /s/ as a marker of the plural form of the noun /function/, as shown in Table (6) below:

Table (6): Errors in the selection of parts of speech.

Ex. 8
Source Text: In a multi-cellular organism, cell division functions to repair and renew cells that die
Google Translation: في الكائنات متعددة الخلايا، وظائف الانقسام الخلوي لإصلاح وتجديد الخلايا التي تموت

This example shows that GT fails to distinguish between words that have the same form in the plural and in the simple present. In other words, a category shift took place that changed the category of the word "functions" from a verb in the English sentence to a noun in the Arabic text. The word /function/ can be used either as a verb, meaning to serve/work, or as a noun, meaning a job/task. Because GT mistranslates it as a plural noun, whereas in the source text it is intended to serve as a verb, the reader finds that the output sentence has no verb. The Arabic sentence therefore appears verb-less, which hinders getting the message of the Arabic text, since readers cannot be expected to identify the verb of a sentence by intuition, especially in scientific texts.

Thus, it is important that GT developers supply GT with the procedure described in Figure (6) to enable it to handle all words that have the same form in the plural and in the present tense with a 3rd person singular subject. First, GT will carry out the text preprocessing step to decide on the function of the word in question. That is, GT needs to process both the position of the word in the sentence and the surrounding elements that shape its identity. For example, the position of the word in question, "functions" in Ex.8, shows that it is used as the verb for the subject "cell division". The verbs "repair" and "renew" cannot be the verbs for the subject "cell division" because they are preceded by the particle "to". Thus, GT needs to answer the question of whether the sentence has a verb for the subject "cell division". If not, GT will extract a verb that agrees with the subject in person, number, etc. However, at this stage, GT cannot derive the appropriate form of the verb "function" in Arabic since it translates it as a noun, in Arabic "وظائف". Thus, at this stage, users can suggest a translation for the word "function" as a verb, which is "يعمل على". Accordingly, this translation would be stored in the lexicon and reused in similar circumstances.



Figure (6): Processing of words with similar forms in plural and present tense. 
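The question "does the sentence already have a verb for its subject?" can be sketched as below. Each token carries the set of tags it could take; a word tagged {"N", "V"} is ambiguous between a plural noun and a 3rd person singular verb. The tag names, the hand-tagged input and both function names are assumptions for illustration, and the rule that every ambiguous word becomes a verb when no main verb is found is a simplification.

```python
def has_main_verb(tagged):
    """True if an unambiguous verb occurs outside 'to'-infinitives
    and outside relative clauses."""
    in_infinitive = in_relative = False
    for _, tags in tagged:
        if tags == {"TO"}:
            in_infinitive = True
        elif tags == {"REL"}:
            in_relative = True
        elif tags == {"V"}:
            if not (in_infinitive or in_relative):
                return True
        elif tags != {"CONJ"}:
            in_infinitive = False   # an infinitive chain ends at the next non-verb
    return False

def resolve(tagged):
    """Read ambiguous N/V words as verbs only when no main verb exists."""
    main_verb = has_main_verb(tagged)
    resolved = []
    for word, tags in tagged:
        if tags == {"N", "V"}:
            resolved.append((word, "N" if main_verb else "V"))
        else:
            resolved.append((word, next(iter(tags))))
    return resolved

ex8 = [("cell", {"N"}), ("division", {"N"}), ("functions", {"N", "V"}),
       ("to", {"TO"}), ("repair", {"V"}), ("and", {"CONJ"}), ("renew", {"V"}),
       ("cells", {"N"}), ("that", {"REL"}), ("die", {"V"})]
other = [("cell", {"N"}), ("functions", {"N", "V"}), ("depend", {"V"}),
         ("on", {"P"}), ("enzymes", {"N"})]
```

In `ex8`, "repair" and "renew" follow "to" and "die" sits in a relative clause, so no main verb is found and "functions" is read as the verb; in `other`, "depend" is the main verb, so "functions" stays a noun.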

3.3.2. Passive Constructions 

Passive constructions are used heavily in scientific texts to achieve certain purposes. Swales states: "the passive can be used to give the necessary information in the best possible way; impersonally, concisely, objectively, and giving importance to the most important facts" (1971, p.41). However, GT makes a number of errors in translating certain passive sentences. These errors include:

3.3.2.1. Failure to Distinguish between the Simple Past and Passive 

Inflections 

In some cases, GT mistranslates sentences that contain a passive construction by using a simple past form in place of a passive. In other words, GT fails to distinguish between the simple past form and the participle form that comes after the auxiliary in passive constructions (the adjectival passive). This in turn may affect the truth value of the sentence, as Table (7) below shows:

Table (7): Errors in recognizing the passive construction

Ex. 9
Source Text: a disaccharide consists of two monosaccharides joined by a glycosidic linkage.
Google Translation (1): يتكون ديساكهاريد من اثنين من السكريات الأحادية التي انضمت إلى ربط glycoside
Google Translation (2): يتكون ديساكهارايد اثنين من السكريات الأحادية انضم اليهم الربط غليكوزيدية.

The sentence in Ex.9 states a fact about the components of "disaccharides", and the tense usually used to state facts in English is the simple present, not the simple past. Reading "joined" as a simple past may suggest that the components change or that the process of producing a disaccharide took place in the past and is now over. This is not acceptable in the language of science, since things have to be clear, exact and fixed to establish mutual trust between the readers and the text/s in hand. In other words, the verb "joined" is not a simple past but a passive construction that GT erroneously recognizes as a simple past. This shows that GT fails to make use of the key words present in the sentence, such as the preposition "by" and the verb "consists", to understand that the sentence describes a present situation, or something that takes place whenever a disaccharide is produced.



Basically, GT fails to recover the underlying structure of the sentence and translate it as a passive construction, so it goes with the surface structure, which looks like a simple past. This leads to errors in the translation of the passive construction: because of the simple past form of the verb, "انضم الى", the Arabic sentence may lead the reader to a conclusion about a process that happened in the past rather than a process that is repeated whenever disaccharides are formed. The underlying structure of the sentence is "are joined", "تنضم الى", not the past form "joined", "انضمت" in Arabic. Thus, GT should be programmed to benefit from the words in the textual context in its translation box, such as "by" and the verb "consists" in the present case. Such words should help GT recognize the verb in question as a passive form, not a past form.
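The use of contextual cues can be sketched as a simple rule. The sketch below treats an "-ed" form as a passive participle when it is preceded by a form of "be" or followed by "by", and as a simple past otherwise. The function name and the cue list are assumptions for illustration; the present-tense cue mentioned above ("consists") is not modelled here, and a real system would need a fuller parse.

```python
BE_FORMS = {"is", "are", "was", "were", "be", "been", "being"}

def classify_ed_form(tokens, i):
    """Label tokens[i] as 'passive participle' or 'simple past'
    using the neighbouring words as cues."""
    previous = tokens[i - 1] if i > 0 else ""
    following = tokens[i + 1] if i + 1 < len(tokens) else ""
    if previous in BE_FORMS or following == "by":
        return "passive participle"
    return "simple past"

ex9 = "a disaccharide consists of two monosaccharides joined by a glycosidic linkage".split()
label = classify_ed_form(ex9, ex9.index("joined"))

past = "the two sugars joined the chain".split()
past_label = classify_ed_form(past, past.index("joined"))
```

In Ex.9 the following "by" triggers the passive reading, which would correspond to a present passive form in Arabic rather than a past one.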

Another issue is that GT neglects the passive construction that is 

used to describe certain objects in given sentences leading to verb-less 

sentences that do not have an obvious meaning as the example in Table (8) 

below shows: 

Table (8): Errors at passive construction arrangement level

Ex. 10
Source Text: Substances dissolved in a solvent are called solutes.
Google Translation: تسمى المواد المذابة في مذيب.

In Ex.10, the passive construction is not identified by GT, which results in an incomplete sentence: the sentence suggests that there is a name for the substances that dissolve in a solvent, but this name is not given in the Arabic sentence because GT fails to put the sentence in the correct order and produce a correct passive construction. Thus, readers expect to find a term that refers to those substances that dissolve in a solvent, but they end up with an incomplete sentence.

This shows that GT fails to parse the relative clause that is used to describe the term "solutes". In other words, GT fails to retrieve the underlying structure of the relative clause, which states that substances that are dissolved in a solvent are called solutes. GT fails to supply a linking word that helps produce a meaningful sentence, which in this case could be the relative pronoun "التي". Thus, an acceptable Arabic translation to be inserted in the options list may be: "تسمى المواد التي تذوب في مذيب/محلول مواد مذابة".

 3.3.2.2. Passive Inflections     

GT fails to use the appropriate inflections that indicate that the sentence is passive, as in the verb "تنجذب", which does not have any inflections to indicate whether it is an active verb "تَنجذب" or a passive one "تُنجذب", as the sentence in Table (9) shows:

 Table (9): Errors at passive inflections level 

Ex. | Source Text | Google Translation
11. | Polar substances and ions dissolve in water because opposite charges are attracted to the appropriate ends of water. | تذوب المواد والأيونات القطبية في الماء لأن الرسوم المعاكسة تنجذب إلى نهايات المياه المناسبة.

The underlying structure of the Arabic sentence is that polar substances
move of their own will, while the English sentence states that they are
moved by an external force. In other words, they do not move of their own
accord; instead, they are attracted by an unmentioned force. This lack of
inflections leaves the constituent "are attracted" open to two readings.
However, in this structure as in others, two readings must give way to a
single one, particularly in scientific texts as a genre.

Therefore, the researcher suggests that software developers give passive
constructions particular attention, since scientific texts are loaded
with passive structures: the focus in such texts is on the scientific
facts rather than on those who established them. Thus, the researcher
suggests the procedure explained in Figure (7). First, the sentence
passes through a preprocessing step that analyzes all its elements. In
other words, GT should parse the sentence correctly by identifying the
grammatical subject and object. Second, GT should identify the pattern of
the sentence: is it active /sub. + v. + obj./ or passive /obj. + v. +
sub./? Next, if the sentence conforms to the pattern /obj. + v. + sub./,
a step that extracts the appropriate passive inflections should take
place. However, at present, GT cannot insert the appropriate passive
inflections.

Thus, users can suggest a translation for a passive sentence by assigning
the appropriate Arabic diacritics to make the sentence meaningful. The
unmarked diacritic used in Arabic to indicate the passive construction is
the damma, which is attached to both the verb and the grammatical subject
that follows it. Such suggested translations would be kept in the lexicon
to be reused by different users who enter the same input.

 
Figure (7): Processing of passive constructions. 
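
The procedure in Figure (7) can be illustrated with a minimal Python sketch. It is only an approximation under stated assumptions: every name in it (find_passive_verbs, mark_passive, SuggestionLexicon) is hypothetical, the passive test is a crude "be + past participle" heuristic that misses cases such as "are often called", and nothing here reflects how GT works internally. The inflection step only places the damma on the first letter of the verb and does not attempt full vocalization.

```python
import re

# All names below are hypothetical; this is not how Google Translate works internally.
BE_FORMS = {"is", "are", "was", "were", "be", "been", "being"}
# Small illustrative sample of irregular past participles (far from exhaustive).
IRREGULAR_PARTICIPLES = {"known", "given", "found", "made", "seen", "taken"}
DAMMA = "\u064F"  # Arabic diacritic damma


def tokenize(sentence):
    """Preprocessing step: lowercase the sentence and split it into word tokens."""
    return re.findall(r"[a-z]+", sentence.lower())


def looks_like_participle(word):
    """Crude test: a regular -ed ending or one of the listed irregular forms."""
    return (len(word) > 3 and word.endswith("ed")) or word in IRREGULAR_PARTICIPLES


def find_passive_verbs(sentence):
    """Pattern step: return (auxiliary, participle) pairs of the form be + participle."""
    tokens = tokenize(sentence)
    return [(aux, verb) for aux, verb in zip(tokens, tokens[1:])
            if aux in BE_FORMS and looks_like_participle(verb)]


def mark_passive(arabic_verb):
    """Inflection step: place a damma on the first letter of the Arabic verb."""
    return arabic_verb[0] + DAMMA + arabic_verb[1:]


class SuggestionLexicon:
    """Keeps user-suggested translations so that the same input can reuse them."""

    def __init__(self):
        self._entries = {}

    def suggest(self, source, translation):
        # Normalize the source so that case and punctuation do not matter.
        self._entries[" ".join(tokenize(source))] = translation

    def lookup(self, source):
        return self._entries.get(" ".join(tokenize(source)))


print(find_passive_verbs("Substances dissolved in a solvent are called solutes."))
# [('are', 'called')]
```

The lexicon is keyed on the normalized source sentence, so a suggestion entered by one user is returned to any later user who submits the same input, which is the reuse described above.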

3.3.3. Unnecessary Derivation for Certain Words  

GT randomly selects a word in the input sentence, derives new forms from
that word, and inserts those forms in the output. However, this
derivation is sometimes done at the expense of other function or content
words in the same sentence. This may lead to loss of meaning and to
redundancy in the output, such as "مشحونة الشحنة" and "كمفاعل متفاعل" in
the examples in Table (10). Such errors occur when GT fails to identify
and choose the part of speech that best completes the sentence.
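
One rough way to flag this kind of redundancy automatically is to check whether two adjacent output words reduce to the same consonantal skeleton. The sketch below is only an illustration: the function names, the affix lists, and the stemming rules are assumptions made here, not part of GT or of this study's method, and a real system would need a proper Arabic morphological analyzer.

```python
# Hypothetical light stemmer; the affix lists are illustrative assumptions only.
PROCLITICS = ("ك", "و", "ب", "ل")            # one-letter particles attached to a word
DERIVATIONAL_PREFIXES = ("مت", "م")          # participle prefixes, longest first
WEAK_LETTERS = ("ا", "و", "ي", "ى", "ة")     # long vowels and ta marbuta


def skeleton(word):
    """Reduce a word to a rough consonantal skeleton that approximates its root."""
    if word.startswith("ال"):
        word = word[2:]                      # drop the definite article
    elif word[0] in PROCLITICS and len(word) > 4:
        word = word[1:]                      # drop one attached particle
    for prefix in DERIVATIONAL_PREFIXES:
        if word.startswith(prefix) and len(word) - len(prefix) >= 3:
            word = word[len(prefix):]
            break
    return "".join(ch for ch in word if ch not in WEAK_LETTERS)


def repeated_root_pairs(sentence):
    """Return adjacent word pairs whose skeletons are identical (3+ letters)."""
    words = sentence.split()
    return [(a, b) for a, b in zip(words, words[1:])
            if len(skeleton(a)) >= 3 and skeleton(a) == skeleton(b)]


print(repeated_root_pairs("أيونات مشحونة الشحنة"))
print(repeated_root_pairs("أيونات متعاكسة الشحنة"))
```

Under these rules the first call flags the pair that repeats one root, while the second, which uses a word for "opposite" from a different root, returns an empty list.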

  

Table (10): Redundancy due to unnecessary repetition 

Ex. | Source Text | Google Translation
12. | Ionic bonds are electrical attractions between oppositely charged ions. | السندات الأيونية هي عوامل جذب كهربائية بين أيونات مشحونة الشحنة
13. | Aerobic respiration consumes oxygen as a reactant to complete the breakdown of a variety of organic molecules (aerobic is from the Greek aer, air, and bios, life). | يستهلك التنفس الهوائي الأوكسجين كمفاعل متفاعل لإكمال تحليل مجموعة متنوعة من الجزيئات العضوية (الهوائية هي من الهواء الجوي والهوائي والسير اليونانية، الحياة).

      In Ex. 12, the constituent "oppositely charged" is rendered
incorrectly as "مشحونة الشحنة", whereas it means "متعاكسة الشحنة". Thus,
the unnecessary derivations from the word "charged" took the place of the
word "oppositely". Accordingly,