An-Najah National University
Faculty of Graduate Studies

Language Errors in Machine Translation of Scientific Biological Texts from English to Arabic: The Case of Google Translate

By
Hanan Jamal Alawneh

Supervisor
Dr. Abdel Karim Daragmeh

This Thesis is Submitted in Partial Fulfillment of the Requirements for the Degree of Master of Applied Linguistics and Translation, Faculty of Graduate Studies, An-Najah National University, Nablus, Palestine.
2019

Dedication

To the jewel of Palestine, the eternal capital, Jerusalem.
To my first teachers, my dearest mom and dad, for their unconditional love and for insisting that I use my right hand instead of my left since I was four years old, believing that the right one is more blessed. Their vision turned out to be true, for I was blessed by God to finish this thesis.
To my lovely sisters, Rafeef, Raniem, Wa'ed, and Haneen.
To my one and only brother, Ra'if.
To my true friend Zainab, who has been by my side since the 1st grade.
To my relatives and friends who have always cared about my thesis with love and passion.
To the Google team, who have always been a source of inspiration in their good and bad results. This thesis is meant to assist you in clearing the clouds of low-quality results, so that Google Translate works better.

Acknowledgement

I would like to express my gratitude to all my teachers who have nourished my intellectual life since kindergarten. Special thanks go to my supervisor, Dr. Abdel Karim Daragmeh, for his valuable comments from the first letter of this thesis to the last drop of ink. Many thanks are due to Dr. Mahmoud Eshreteh for his constructive comments concerning the linguistic part of the thesis. Very special thanks go to Dr. Hamed Abdelhaq for his feedback regarding the maps and computer science aspects included in this work. Last but not least, I shall never forget my dearest father and mother, who have been a source of continuous support from the very beginning of this journey to its end.
Table of Contents

Defense Committee Members
Dedication
Acknowledgments
Declaration
Table of Contents
List of Tables
List of Figures
List of Abbreviations
Abstract
Chapter One: Introduction
1.1. Introduction
1.2. Scientific Translation and Machine Translation
1.3. Why the Biology Textbooks?
1.4. Problem Statement
1.5. Purpose of the Study
1.6. Significance of the Study
1.7. Limitations of the Study
1.8. Research Questions
1.9. Thesis Chapters
Chapter Two: Literature Review and Methodology
2.1. Related Literature to Machine Translation
2.2. Methodology
2.3. Theoretical Framework
Chapter Three: Linguistic Discordance in Google Translation
3.1. Introduction
3.2. Errors at Syntactic Level
3.2.1. Organization of Constituents at Phrase Level
3.2.2. Organization of Constituents at Sentence Level
3.2.3. Erroneous Shifts from Verbal to Nominal Sentences in Arabic
3.3. Errors at Morphological Level
3.3.1. Inappropriate Choice of Suffixes
3.3.1.1. Inflections Attached to Sub-headings
3.3.1.2. Inflections Attached to the Verb
3.3.2. Passive Constructions
3.3.2.1. Failure to Distinguish Between the Simple Past and Passive Inflections
3.3.2.2. Passive Inflections
3.3.3. Unnecessary Derivation for Certain Words
3.3.4. Pronouns Translation
3.3.4.1. Relative Pronouns Referent/s
3.3.4.2. Pronouns Refer to Gender Neutral Nouns
3.3.4.3. Phrasal Verbs Meanings along with their Gender Marked Subjects
3.4. Conclusion
Chapter Four: Cohesive Markers at Paragraph Boundaries in Google Translation
4.1. Introduction
4.2. Areas of Eff./Def. of Lexical Markers in Description Paragraphs
4.2.1. Cohesion in Introducing Definitions
4.2.2. Cohesion Achieved through Scientific Terms Chains
4.3. Areas of Eff./Def. of Lexical Markers in Process Paragraphs
4.3.1. Cohesion Achieved through Word-Choice
4.3.2. Process-Sequence Verbs
4.3.3. Process Signals within Non-past Constructions
4.4. Areas of Eff./Def. of Lexical Markers in Causality Paragraphs
4.4.1. Word Collocations Deployed in Cause/Result Relations
4.4.2. Causal Chains
4.5. Conclusion
Chapter Five: Conclusions and Recommendations
5.1. Conclusions
5.2. Recommendations
References
Appendixes
Appendix (1): The selected texts and their translation as produced by Google Translate
الملخص

List of Tables

Table (1) Errors made at noun phrases level
Table (2) Errors in ordering the noun phrase
Table (3) Errors made at sentence level
Table (4) Errors of turning a sentence to a noun phrase
Table (5) Errors in treating the definite article
Table (6) Errors in the selection of parts of speech
Table (7) Errors in recognizing the passive construction
Table (8) Errors at passive construction arrangement level
Table (9) Errors at passive inflections level
Table (10) Redundancy due to unnecessary repetition
Table (11) Errors in assigning the correct referent
Table (12) Errors in using gender markers
Table (13) Errors made at pronoun level
Table (14) Errors in SV agreement
Table (15) Errors in relative clauses translation

List of Figures

Figure (1) The process of treating the selected data
Figure (2) Processing of noun phrases
Figure (3) Mapping of nominal sentences to verbal sentences
Figure (4) Mapping of nominal sentences
Figure (5) Mapping of sentences with verb to be
Figure (6) Processing of words with similar forms in plural and present tense
Figure (7) Processing of passive constructions
Figure (8) The process of matching the relative pronoun with its referent
Figure (9) The process of assigning gender to both the sub. and the verb
Figure (10) The process of gender matching between the pronoun and its antecedent
Figure (11) The process of composing the meaning of the phrasal verb "keep"
Figure (12) Parallel structures used in introducing definitions
Figure (13) GT failure in reproducing adjective clauses
Figure (14) The lexical chain used in paragraph 2a
Figure (15) Deficiency at terminological level in paragraph 2a
Figure (16) Suggested translation for paragraph 2a
Figure (17) The process of identifying descriptive paragraphs based on the lexical markers
Figure (18) GT inexact translation for the process in question
Figure (19) GT failure in identifying the verb as a sequence transition
Figure (20) Suggested translation for paragraph 3a
Figure (21) Suggested translation for paragraph 4a
Figure (22) The process of identifying process paragraphs based on the lexical markers
Figure (23) Errors in employing collocations
Figure (24) 'Because' along with its complement in Arabic
Figure (25) Errors in pronoun-referent resolution
Figure (26) GT failure in deploying "wa"
Figure (27) Suggested translation for paragraph 5a
Figure (28) GT failure in reproducing if-structures
Figure (29) Suggested translation for paragraph 6a
Figure (30) The process of identifying causality paragraphs based on the lexical markers
Figure (31) Suggested translation for paragraph 7a

List of Abbreviations

AJ: Adjective
AT: Article
Aux.: Auxiliary
AV: Adverb
C: Complement
C/R: Cause/Result
CJ: Conjunction
Def.: Definition
Des.: Description
DT: Determiner
Eff./def.: Efficiency/deficiency
FEMTI: Framework for the Evaluation of Machine Translation in the ISLE
GT: Google Translate
ISLE: International Standards for Language Engineering
L.M.: Lexical Marker
MTE: Machine Translation Evaluation
NLP: Natural Language Processing
NN: Noun
Obj.: Object
PN: Pronoun
Pro.: Process
SL/TT: Source Language/Target Language
Sub.: Subject
TO: Infinitive Marker to
VB: Verb 'BE'
VM: Modal verb
VV: Verb

Abstract

Machine translation has planted its roots deeply in research domains since it has become the first aid for survival in this era of "globalization". Thus, the present research explores the areas of efficiency/deficiency in Google Translate performance in translating scientific biological texts from English to Arabic. More specifically, the research aims to test GT performance at two levels: the sentence level and the paragraph level. Catford's translation shifts (1965), Halliday and Hasan's model of cohesive devices (1976) and the types of paragraphs frequently used in scientific texts are the main tools used to judge GT output. Finally, the researcher proposes solutions for the errors encountered in order to help GT produce more accurate translations of this particular text type.

Chapter One
Introduction

1.1. Introduction

The 21st century can best be described as a competitive marketplace with two main competing forces. On the one hand, there are companies that work hard to put the best products in the hands of their consumers. On the other hand, there are clients who struggle to find an optimal product that both eases their lives and saves them time and effort. As a result, machines have become like shadows of human beings: if one wants to talk to someone who is far away, one has to use a machine, the cell phone, to communicate with that person; and if one wants to move from one place to another, one has to use a car, which is also a package of machines.
Similarly, if a student, a mother, a father, a tourist, or a beginner translator wants to learn to read a paragraph, to check the pronunciation or spelling of certain words, or to translate a word, a phrase, a short excerpt or even a text of whatever kind from one language into another, s/he often uses a machine to perform such tasks. Such trends reflect the fact that machine translation has become a necessity for living in the modern world.

Users can choose from plenty of machine translation software. Examples include Bing Translator, which was introduced by Microsoft in 2012 and provides a multilingual translation service, and Babylon, which played an important role in machine translation from and to Arabic by developing dictionaries that contain acronyms and abbreviations. In addition, there is voice translation software, which provides customers with voice-to-text or text-to-voice translations by turning a certain message into a unit of translation and then producing written or oral translations for it according to the customers' needs.

A clear example of voice translation software is Google Translate (GT). This software provides its users with voice translations: all they need to do is click on the "speak" button, and a written translation of their speech will appear on the screen. Moreover, GT provides translations among 103 languages, with more than 200 million users daily (Wikipedia, 2018). Therefore, GT has become the most fashionable, trendy and easily accessible machine for translation tasks nowadays. However, this software can sometimes be misleading since it is a machine that depends heavily on word recognition and pattern matching between the components of the input and the likely equivalents for that input in its translation memory.
This framework of translation action was explained by the GT team (2012), who stated:

When Google Translate generates a translation, it looks for patterns in hundreds of millions of documents to help decide on the best translation. By detecting patterns in documents that have already been translated by human translators, Google Translate can make intelligent guesses as to what an appropriate translation should be. This process of seeking patterns in large amounts of text is called "statistical machine translation" (Retrieved from: https://urlzs.com/xtgSo).

In other words, it cannot be denied that GT is both fast and economical; however, when it comes to accuracy, the translation product can be inaccurate, incomprehensible and often misleading. It is quite evident that GT's comprehension ability is still inferior to that of human translators. This deficiency is due to the fact that GT has to deal with many languages that have different linguistic systems. Aiken and Balan (2011) stated: "Although Google Translate provides translations among a large number of languages, the accuracies vary greatly... translations between European languages are usually good, while those involving Asian languages are often relatively poor" (Retrieved from: https://urlzs.com/aPm1L). This suggests that GT is still at an early stage and that the door is still open to improve it; evaluating its performance is therefore a vital step in improving the translation software.

Therefore, this thesis aims at evaluating GT performance and pinpointing the problems that it may encounter while translating texts from English into Arabic, particularly scientific biological texts taken from the Biology 1 textbook, which is taught to 1st year students at the Faculty of Science at An-Najah National University; the machine translation users in this case are 18-19 years old.
The texts are taken from the following chapters: The Chemical Context of Life; Water and the Fitness of the Environment; The Structure and Function of Large Biological Molecules; An Introduction to Metabolism; Cellular Respiration; The Cell Cycle; Mendel and the Gene Idea; and From Gene to Protein. Finally, the thesis recommends solutions for the errors encountered in the translation action to enable software developers to enhance their translation program. Such solutions help to reach high accuracy levels when translating this type of text, since it is an example of a controlled area that falls under the umbrella of the scientific genre.

1.2. Scientific Translation and Machine Translation

A scientific text is one of the writing modes embedded within the general term 'scientific genre'. Hatim and Munday define 'genre' as "a conventionalized form of speaking or writing which we associate with particular 'communicative events'. Participants in these events tend to have set goals, with strict norms regulating what can or cannot be said within the confines of given genre settings" (2004, p.88). That is to say, the scientific genre is a well-established mode because it employs a set of agreed-upon standards and textual norms that regulate both the use of language and message-building within the texts that conform to this genre. In other words, the scientific genre is tied to a language that is characterized by "impersonal style, simpler syntax, use of acronyms, and clarity" (Ilyas, 1989, p.109).

Accordingly, scientific translation is considered to have an informative function. Byrne (2006) stated: "scientific translation primary goal is to deliver scientific information; it aims at presenting well expressed information, that may be used easily, properly and effectively" (as cited in Soualmia, 2009, p.21).
Moreover, Soualmia stated: "Scientific translation is defined as the method employed to help organize thought, procedures and then come into clear, faithful and reliable results, free of subjectivity and personal involvements" (2009, p.19). However, translators may find it difficult to translate scientific terms and constructions; as Zinsser states, "Every profession has its growing arsenal of jargon to fire at the layman and hurl him back from its walls" (1976, p.15). Thus, translators may resort to different procedures while translating such texts, including transliteration, borrowing, or providing footnotes.

With regard to GT, the situation may be much more challenging, for the machine in question may not have a sufficient level of recognition to decide which procedure to use. This may result in low-quality translations. This echoes Parikh, who stated: "machine translation rarely reaches accuracy levels above 70%, while a human translation almost always produces accuracy levels above 95%" (2012. Retrieved from: https://urlzs.com/STgBk). Thus, the present research aims to test the extent to which GT adheres to the norms associated with scientific language while translating excerpts taken from scientific biological textbooks.

1.3. Why the Biology Textbooks?

There are three reasons for choosing the biology textbooks assigned to students in their 1st year of the biology specialization. First of all, scientific texts are very challenging; they are "a good example of the most challenging text type … these texts often present information that is conceptually rich but also conceptually dense and abstract. They use terminology that is unfamiliar to many students … using language in ways that students do not encounter in their reading of fictional and narrative texts" (Palincsar, 2013. Retrieved from: https://urlzs.com/ZBv4f).
Secondly, scientific biological texts deal with terminology and processes related to everyday life activities like sleeping and eating, unlike other branches of science such as chemistry and physics, which are basically about numbers and statistics and are close to a symbolic language such as mathematical notation (+, -, *); the latter texts contain minimal running text and can therefore be understood largely by looking at the symbols. In comparison, the biology text is basically about concepts, descriptions, and terms. Thus, it can be hypothesized that GT can work better with biology.

Finally, students at this stage (1st year in the specialization) take compulsory courses that usually contain introductory and basic concepts about biology in English. These courses will serve as a repertoire for them later on in their specialization. This makes it vital for students to make sure that they understand the ideas and get the accurate equivalents. Therefore, 1st year students who are majoring in biology may choose GT to translate certain texts and terms from English to Arabic in order to understand the information in their textbooks. In other words, students need to cope with the language of science, which, in turn, uses English to express new experiments and studies in biology. This tendency to use machine translation is consistent with the report issued by GT in 2016, which states that more than 500 million people use GT around the world and that Arabic is one of the most widely used languages in the application, with more than 100 billion words a day.

1.4. Problem Statement

This research is concerned with the mistranslations produced by GT while translating scientific biological texts from English to Arabic. In machine translation, issues related to the features and functions of a text type may create a challenge for the machine in question, i.e., GT.
Thus, this study will explore problems including those named by Wisniewski, Kubler and Yvon (2014), such as: "lexical errors, morphological errors, syntax errors, semantic errors, format errors… " (Retrieved from: https://urlzs.com/CAQbk). These types of errors may affect the quality of scientific texts, since these texts may contain different types of paragraphs: description paragraphs, which aim at describing concepts/objects; process paragraphs, which mark the sequence of certain biological processes; and causality paragraphs, which explain the cause/result of particular phenomena. In all cases, scientific texts must meet four standards: Syntax, Morphology, Terminology and Cohesion/Naturalness.

The first standard is 'Syntax'. In this kind of text, syntactic structures must guide the machine to one and only one meaning. Ambiguity, which may threaten the precision of the text, is not welcome in scientific translation, so there must be no room for structural ambiguity in the translations produced by GT.

The second standard is 'Morphology'. It concerns the morphemes attached to certain words, such as markers of connection, negation, tense and number, which help the software to get the exact meaning. However, GT may not benefit from these morphemes to process and understand the stated facts directly, since it depends on its intelligent guesses to connect the parts of the text together.

The third standard is 'Terminology', which refers to the domain-specific terms used heavily in scientific texts. Such terms are a challenge for both GT and students to understand because these technical terms "have one or many meanings in everyday language" while having a different, peculiar and precise meaning in scientific texts (Ali and Ismail, 2006. Retrieved from: https://urlzs.com/FnT8d).

The fourth standard is 'Cohesion/Naturalness' of the resulting translation.
In other words, scientific texts vary in the cohesive devices they employ to connect ideas in a coherent way, since they may be descriptive, persuasive, or informative. All these functions aim at putting the information in the hands of the readers without being redundant and without costing them much time or effort to get the intended meaning. However, in the case of GT software, these standards may be demanding. This echoes Al-Asali, who points out that "the real problem with today's MT systems … is that they do not achieve the appropriate interpretation of certain parts of the source text (ST), which may depend, in one way or another, on the appropriate comprehension of the devices controlling them" (2000, p. xix). Therefore, the study will explore problems related to text type; the unique nature of scientific texts leads to machine translation problems when GT is used by students.

Thus, the research focus will be on the following. Firstly, the mistranslations made by GT when describing particular biological processes or biological terms at sentence level, including phrasal constituents; e.g., the mistranslation of "laws of inheritance" in the sentence "Mendel used the scientific approach to identify two laws of inheritance" as "قانون الميراث/الارث". The English phrase refers to the passing of genes from parents to their offspring and has nothing to do with the possessions of a dead person, which is what the Arabic phrase denotes. Secondly, the mistranslations of scientific texts at paragraph level according to the format of this type of text, such as translating the English text: "water is an excellent solvent for many substances because of its polar nature. Polar substances and ions dissolve in water because opposite charges are attracted to the appropriate ends of water. Strictly hydrophobic molecules, including most lipids, do not mix well with water." into Arabic as:
"الماء هو مذيب ممتاز للكثير من المواد بسبب طبيعته القطبية. تذوب المواد والأيونات القطبية في الماء لأن الرسوم المعاكسة تنجذب الى نهايات المياه المناسبة. لا تختلط الجزيئات الكارهة للماء بشدة ، بما في ذلك معظم الدهون ، بشكل جيد مع الماء."

The Arabic paragraph contains errors, such as the short sentence that renders the passive construction "are attracted". The correct translation of the English structure is "الأقطاب المتعاكسة تنجذب الى نهايات المياه المناسبة". In other words, these "opposite charges" do not move by themselves; instead, they are moved by an external force. This meaning is not expressed correctly in the Arabic text because the verb "تنجذب" is active, not passive. There are also mistranslations of certain terms even though the context makes their meanings clear, such as "opposite charges", which is translated as "الرسوم المعاكسة" instead of "الأقطاب المتعاكسة (سالبة وموجبة)", and "hydrophobic molecules", where "hydrophobic" means "نافرة للماء", not "كارهة", since the latter is not a scientific term.

1.5. Purpose of the Study

The present research aims at examining the translation problems GT encounters when translating scientific biological texts found in the Biology 1 textbook. It has been observed that GT makes errors in areas such as syntax, morphology and terminology, among others (Hannouna, 2004, p.450). Thus, the research attempts to detect these errors at sentence and paragraph levels; the researcher will highlight the areas of eff./def. in GT performance to provide useful input about the quality of translation. Then, the researcher will propose solutions for the errors to enhance GT performance. Such outcomes can be useful for translation software developers and users alike, since Ulitkin (2011) stated that: "despite their efficiency and outlooks, the translation software and electronic means cannot replace the human translator and guarantee high-quality translations".
He believes that a good translation results from combining the translator's talents and experience on the one hand with electronic technologies on the other; therefore, users cannot depend on machines alone in translation (Retrieved from: https://urlzs.com/3US1f).

1.6. Significance of the Study

This research is important because it deals with the most widely used machine translation system, GT. It aims at identifying the challenges GT encounters in translating scientific biological texts; it highlights the areas (syntax, morphology, terminology, cohesion) that GT handles best and those in which its output is of low quality. It ends by suggesting recommendations on how users can best use GT, and how software developers can best improve it, for this particular text type.

1.7. Limitations of the Study

The present research is limited to GT language errors found in scientific biological texts translated from English to Arabic only. It does not tackle language errors committed by other machine translation programs. In addition, the present research is concerned with testing GT performance at both sentence and paragraph levels, since evaluating GT at text level is beyond its scope. The researcher observed that GT commits errors at both sentence and paragraph levels; thus, it is hypothesized that GT may not perform well at text level, since sentences and paragraphs serve as the basic building blocks of any text. In other words, communicative texts cannot function without strong blocks. Thus, it is better to evaluate GT performance at smaller levels first in order to pave the way for evaluating GT at text level. Moreover, some figures and drawings might be inserted within the text to clarify the information being presented.
Thus, users might resort to feeding GT only separate, short paragraphs in lieu of longer texts to avoid such visual representations. Finally, the research focus will be on the external characteristics of GT, in particular the eff./def. areas in GT performance, regardless of its internal characteristics, which include speed, storage, and cost.

1.8. Research Questions

In attempting to evaluate GT performance and investigate the translation problems encountered in translating scientific biological texts, it is important to answer these questions:

1. What are the grammatical errors made by GT when translating biological texts at sentence level?
2. Which cohesive markers are mishandled/mistreated by GT and which of these are reproduced correctly when translating biological texts at paragraph level?
3. What are the possible explanations for making ill-formed translations/inadequate system performance?
4. What are the main recommendations for improving machine translation in this particular text type?

1.9. Thesis Chapters

The present thesis contains five chapters; the sequence is summarized below.

Chapter One is devoted to introductory information that describes the state of technology in the 21st century in general and machine translation in particular. The chapter also includes the problem statement, the purpose of the research, the significance of the research, the limitations of the research, the research questions, and, finally, the chapters of the thesis.

Chapter Two includes a literature review about machine translation and its development. In addition, it presents the methodology, the data collection, and the framework within which the data will be treated in the present research.

Chapter Three will present the data by analyzing and comparing the source text/input with the Google translation/output based on Catford's translation shifts (1965). Thus, the researcher will identify the errors made by GT at sentence level and then recommend solutions for them.
Chapter Four will discuss the errors made by GT at paragraph level based on Halliday and Hasan's model of cohesive markers (1976), which includes both grammatical and lexical devices, as well as the types of paragraphs frequently used in scientific texts. Finally, the chapter will present suggested solutions for the challenges encountered.

Chapter Five presents the conclusions regarding the quality of GT output, the reasons for its failures, and the effects of mismatches between the source text and GT output. It also suggests recommendations for further research that could help in enhancing GT performance in translating scientific biological texts.

Chapter Two
Literature Review and Methodology

2.1. Related Literature to Machine Translation

The life cycle of machine translation is best likened to that of a baby who takes his/her first steps by leaning on one couch after another to follow in his/her parents' footsteps. Yet, once that child balances his/her body and masters the skill of walking, his/her parents can hardly catch up with and control his/her movement. Similarly, machine translation took its early steps after World War II, drawing on two main factors: first, the invention of the first computer in the 1950s and the desire to benefit from this invention in specific domains; second, the rising tensions between the two main forces at that time, the United States and the Soviet Union (Russia now), manifested in the Cold War. Accordingly, the American government developed the first version of machine translation to break into Russian communications and decode their military plans. Thus, in its early days machine translation had a single function, a military one (Errens, 2019. Retrieved from: https://urlzs.com/95Af3). However, this single function of machine translation started to fade with the need for global communication.
In other words, machine translation started to impose itself on civilian domains "because of globalization, the rising of international trade, the expansion of mass media and technology, the increase of migration, and the recognition of linguistic minorities" (Al-Khawalda and Al-Oliemat, 2014. Retrieved from: https://urlzs.com/XKfcN). That is, machine translation shifted from being restricted to military interests to serving a number of civilian functions, including the translation of texts between different languages. Accordingly, the new discipline of "computational linguistics" came to light. This new discipline is defined as:

a subfield of linguistics and computer science that is concerned with computer processing of human language. It includes automatic machine translation (MT) of one language into another, the analysis of written texts and spoken discourse, the use of language for communication between people and computers, computer modeling of linguistic theories, and the role of human language in artificial intelligence (AI) (Hannouna, 2004, p.53).

Consequently, researchers tend to reflect on this new branch of linguistics by forming linguistic models about machine translation, including the way the machine works, error tracking or eff./def. identification, accuracy levels for particular text types, etc. Such studies are carried out to see whether the machine could replace human translators, aid them, assist translation theorists who seek to test hypotheses using a particular translation program, or help software developers who want to promote their translation programs and convince end users to trust their product (Hatim and Munday, 2004, p.120).

Machine translation history is thus divided into two main stages: the first generation and the second generation.
The first generation, usually referred to as the direct approach, covers the early days of machine translation, when the machine was fed only a limited number of linguistic rules for each language and a bilingual dictionary. Machine translation was therefore merely word-for-word replacement at first. In other words, translation was carried out directly between the languages in question, provided that the machine had both the necessary rules and vocabulary. However, this generation received criticism, since translation is not just word-for-word substitution; it is an art of crafting texts. This echoes the words of Somers and Hutchins, who stated: "From a linguistic point of view, what is missing is any analysis of the internal structure of the source text, particularly the grammatical relationships between the principal parts of the sentences" (1992. Retrieved from: https://urlzs.com/KpTMD). Moreover, first-generation input was limited to small units only, namely words, phrases and sentences, and users could not edit the output translation. In other words, the direct approach derives its name from the fact that it does not allow users to interact with the machine, for translation is done only through literal translation between the source text and the target text/output. This echoes the words of Craciunescu, Gerding-Salas and Stringer-O'Keeffe, who state: "The first versions of machine translation programs were based on detailed bilingual dictionaries that offered a number of equivalent words in the target language for each word listed in the source language, as well as a series of rules on words order" (2004. Retrieved from: https://urlzs.com/GDTS5). Moreover, Somers and Hutchins maintain that this approach results in "frequent mistranslations at the lexical level and largely inappropriate syntax structures" (1992. Retrieved from: https://urlzs.com/KpTMD).
Such errors may affect the meaning and take the translation away from the intended meaning of the source text. Accordingly, the criticism directed at the first generation of machine translation led to the evolution of the second generation. In other words, machine translation developed and started to view translation as a process carried out along three dimensions. First, the machine decodes the meaning of the ST. Second, it re-encodes this meaning in the target language. Decoding the meaning of the ST in its entirety requires that the machine interpret and analyze all the elements of the text and transfer them into the target language. Thus, "this process requires in-depth knowledge of the grammar, semantics, syntax etc of the source language and the same in-depth knowledge is required for re-encoding the meaning in the target language" (Dubey, 2013, p.18). Third, this indirect approach started to allow the end user to interact with the machine. In other words, the direct relationship that holds between the input and the output in the first generation is broken in the second generation by the interaction of the end user, since the machine starts to "ask the user to supplement its linguistic information, requesting confirmation of its decisions, or selection from among alternatives" (Somers and Hutchins, 1992. Retrieved from: https://urlzs.com/KpTMD). Errens asserts that "to meet that demand and clean up its data, Google Translate has an improvement function that led users enter suggestions for smoother translations" (2019. Retrieved from: https://urlzs.com/95Af3). Such mutual procedures between the machine and the end user enhanced the machine's performance, particularly in areas where the tested text type/genre is limited to a set of norms.
This echoes the words of Austermühl, who states: "the simple but effective system depends on careful pre-editing and the adoption of very controlled lexis and syntactic structures" (2001:163-4 as cited in Hatim and Munday, 2004, p.117). Thus, success stories started to flourish, including the well-known Canadian METEO system, which was developed at the University of Montreal; it translates weather bulletins automatically from English to French and vice versa for the Meteorological Service of Canada. Weather forecasts have specific norms that the machine can easily recognize, including single words and fixed expressions such as: sunny, low 7, wind southwest 10km/h. Moreover, Fromkin and Rodman state: "the greater recognition of the role of syntax and the application of linguistic principles over the past forty years have made it possible to use computers to translate simple texts grammatically and accurately between well-studied languages" (1995, p.473). This indicates that the machine can achieve high accuracy rates when the domain is specific enough. Moreover, this generation has yielded a number of fruitful concepts, including statistical machine translation and neural machine translation. Pestove states that the former "is based on the idea that if you feed a computer … enough data in the shape of parallel texts in two languages, it will be able to spot and recreate the statistical patterns between them", while the latter means "the source text is the set of specific features. Basically, it means that you encode it, and let the other neural network decode it back to the text, but, in another language". Neural machine translation is a new discipline and is limited to nine languages only. However, neural machine translations "are helpless when the word is not in their lexicon" (2018. Retrieved from: https://urlzs.com/enQrN). Consequently, the Google team launched their Google Translator Toolkit in 2009.
This Toolkit serves as a platform where translators upload texts and submit them for translation. Thus, Google resorts to using bilingual "parallel corpora". A parallel corpus consists of a pair of texts, where one text is a translation of the other. This interaction between GT and human translators led the Google team to develop their "phrasebook", where users can save their translations and thus freely access their favorite translations of certain phrases and texts. This framework of GT is explained as:

To translate a text, Google Translate search different documentaries to find the best appropriate translation pattern between translated texts by human. This pattern searching is called SMT. Consequently, the quality of Google Translate depends on the number of human translated texts searched by Google Translate … SMT uses a bilingual text corpora which is a database of the sentences in both source language and target language. A large group of sentences translated from for example English to Persian will be provided for the machine to calculate the probability of the words. If for instance a word like X has probability 75% to be translated into Y, then it will choose Y as the translation of X (Karami, 2014).

However, such new concepts do not indicate that the machine will replace human translators. A lot of research has been done on machine translation, including GT, yet most attempts have focused on the sentence level and on randomly selected domains. For example, Kay mentions types of errors committed by machine translation including: "words with multiple meanings, sentences with multiple grammatical structures, uncertainty about what a pronoun refers to, and other problems of grammar" (1980/2003 as cited in Hatim and Munday, 2004, p.116). In addition, Al-Khawalda and Al-Oliemat (2014) tested GT in translating twelve sentences with different temporal references from English to Arabic.
They conclude that GT is confusing for non-native English speakers when it comes to temporal signals (Retrieved from: https://urlzs.com/XKfcN). Moreover, Al Shehab (2013) tested GT in translating six legal sentences from English to Arabic. He noted that GT could achieve partial equivalence, yet it commits errors in translating archaic English terms, the passive voice and the modal "shall". Such studies end with no suggested solutions for the errors identified (Retrieved from: http://www.eajournals.org). In other words, errors are still committed, but systematic research on a specific genre may yield fruitful results that enhance GT performance, for instance by developing lexicons containing only the technical terms and constructions of the domain in question, which would reduce problems related to word choice. In addition, post-editing effort could be reduced by minimizing keystrokes, since Craciunescu et al. (2004) state: "when translation tasks are repeated, … keyboard use can be reduced by as much as 70% with some texts" (Retrieved from: https://urlzs.com/GDTS5).

In connection with machine translation at levels larger than the single sentence, only a handful of studies have attempted to test GT competency in translating long stretches of language between English and Arabic. ElShiekh (2012) examined GT performance at text level. He selected three genres/disciplines: advertisements, Koranic texts and literary texts. Yet ads consist of single words and phrases rather than longer paragraphs or texts, for they aim to be short, persuasive and eye-catching. In addition, literary and Koranic texts are loaded with emotive words that may pose a difficulty for GT (Retrieved from: http://dx.doi.org/10.5539/ells.v2n1p56). Abdulhaq believes that "machine translation can handle the parole part of language but it can never master the langue part" (2016, p.8).
ElShiekh classified the errors made by GT at sentence level, such as transliteration and mismatches of polysemous words; however, he neglected the paragraph and text levels. In other words, he did not explore cohesive markers at paragraph and text levels, even though his study promised to focus on the text level (2012. Retrieved from: http://dx.doi.org/10.5539/ells.v2n1p56). Al-Samawi notes the shortage of studies at paragraph or text levels: "most of the previous studies that tried to use error analysis in machine translation research were at the level of the single word or phrase. Like a rare bird, research on errors of machine translation at the text level may not be easy to find, especially in Arabic English" (2014. Retrieved from: https://urlzs.com/74KMz). However, Hatim and Munday state: "at present, there is a limited possibility of concordancing the search results or of configuring the search to select the specific text types or genres that are of interests" (2004, p.120). Accordingly, this research takes a step forward by highlighting the norms frequently used in scientific biological texts, as a branch of scientific translation, in order to test GT performance in this text type and then identify the areas of eff./def. in its output, for the features associated with scientific texts may make the task promising. Craciunescu et al. (2004) state: "machine translation is most useful with texts possessing the following characteristics: First, "Terminological homogeneity" which means that the meaning of terms does not vary. Second, "Phraseological homogeneity". It means that the ideas or actions are expressed or described with the same words. Third, short, simple sentences: these increase the probability of repetition and reduce ambiguity" (Retrieved from: https://urlzs.com/GDTS5). Moreover, Errens states: "It follows that for now, MT delivers best results with scientific and technical writing, anything that adheres more strictly to formulas.
Wherever the use of language deviates from standard, where it is more colloquial or artistic, MT falters" (2019. Retrieved from: https://urlzs.com/95Af3). Such characteristics are found in scientific biological texts. This makes it easier for the researcher to test the areas of def. in GT performance that prevent it from reaching high accuracy levels, and then to suggest solutions for those defects so that GT can reach accuracy rates as high as possible.

2.2. Methodology

This research adopts the qualitative approach in analyzing the selected data. To collect relevant data, the researcher follows three successive steps. The first step is the translation of the biological texts using GT. The researcher uses ten texts taken randomly from biology textbooks, particularly the Biology 1 textbook currently used at the Faculty of Science for first-year students. Ten texts were selected in order to reach fair conclusions regarding GT performance and to emphasize that the errors committed are not mere coincidence; rather, they indicate serious defects in the GT translation program. The texts are technical and cover topics like the micro/organisms, particularly The Chemical Context of Life (p.1-4), Water and the Fitness of the Environment (p.5-8), The Structure and Function of Large Biological Molecules (p.16-31), An Introduction to Metabolism (p.59-67), Cellular Respiration (p.68-78), The Cell Cycle (p.91-99), Mendel and the Gene Idea (p.108-117), From Gene to Protein (p.132-140). These topics contain various biological terms, descriptions and processes expressed in different syntactic structures such as active and passive constructions, present tense forms, if-structures, etc.
The second step is the examination of the resulting texts; this includes the classification of errors at two levels:

Sentence Level: The research starts the evaluation with smaller units such as sentences, then moves gradually to longer stretches of language such as paragraphs. Thus, Chapter Three traces the recurrent errors made by GT at sentence level in the translation of the selected texts, analyzing and categorizing them in order to pinpoint the semantic shifts that may result and alter the meaning of the scientific text. In other words, the research focuses on both the errors and the extent to which those errors hinder comprehension of each sentence.

Paragraph Level: In Chapter Four, the research tests GT competency in deploying cohesive devices at paragraph level in English-to-Arabic translation of scientific texts. The researcher identifies the paragraph types frequently used in scientific texts, including definition paragraphs, process paragraphs and causality paragraphs. Thus, different types of paragraphs taken from the ten texts are entered into GT and converted to Arabic for evaluation. The paragraphs express biological information related to inheritance, water, cell division, enzymes and gene expression.

The researcher examines these levels according to the features of scientific texts as a 'normative genre'. This genre includes universal features of scientific texts such as technicality, which refers to domain-specific terms, and structural clarity at the sentence/text level, which includes issues such as active and passive constructions and pronoun reference. The researcher also considers the presence of all functional/morphological items, such as connectors that show time, cause, etc., in the selected texts.
In the final step, based on GT performance at the above-mentioned levels, explanations are given for each type of error along with suggested solutions to enhance GT performance in this particular text type. Figure (1) gives the areas where GT errors may occur in the translation action.

Figure (1): The process of treating the selected data.

2.3. Theoretical Framework

The present research relies mainly on the model of Machine Translation Evaluation (MTE) based on the International Standards for Language Engineering (ISLE's framework of taxonomy 3). The model distinguishes between two types of evaluation. The first, 'glass box' evaluation, treats GT as a 'glass box', so the evaluator looks inside the translation engine to see how the translation process is done. The second type is concerned with the relationship between the input and the output: GT is treated as a 'black box', which means that the evaluator looks at the input and output without taking into account the mechanisms by which the GT engine works (FEMTI, 2003 as cited in Hannouna, 2004, p.115). It is the 'black box' evaluation that is adopted in this research, since it helps in identifying areas of error in GT performance, which is the main objective of the present evaluation. In other words, the 'black box' evaluation focuses on the external quality characteristics of GT. These characteristics can be traced by comparing the input and the output without the need to explore the system's functions, since the aim of the research is to identify the errors that may result in the translation. However, when it comes to giving recommendations to software developers on how to improve GT, the research will shed light on the 'glass box' evaluation.
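As a rough illustration of the 'black box' idea described above, the following Python sketch treats a translation engine as an opaque function and looks only at its input and output. Everything named here (`Sample`, `Finding`, `black_box_evaluate`, `error_rate`, `toy_engine`, and the two sample pairs) is hypothetical and introduced for illustration only; it is neither GT's interface nor the procedure followed in this research, and the exact-match comparison is a deliberately naive stand-in for the manual error analysis.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Sample:
    source: str     # English input sentence or phrase
    reference: str  # Arabic rendering a human evaluator accepts


@dataclass
class Finding:
    source: str
    output: str
    reference: str
    matches: bool   # True when the output equals the reference


def black_box_evaluate(translate: Callable[[str], str],
                       samples: List[Sample]) -> List[Finding]:
    """Compare input and output only; never inspect the engine itself."""
    findings = []
    for sample in samples:
        output = translate(sample.source)
        matches = output.strip() == sample.reference.strip()
        findings.append(Finding(sample.source, output, sample.reference, matches))
    return findings


def error_rate(findings: List[Finding]) -> float:
    """Share of samples whose output differs from the reference."""
    if not findings:
        return 0.0
    return sum(not f.matches for f in findings) / len(findings)


def toy_engine(text: str) -> str:
    """Hypothetical stand-in for an MT engine (an assumption, not GT)."""
    canned = {
        "water": "ماء",
        # Word order copied from English: adjective placed before the noun.
        "chemical reaction": "كيميائي تفاعل",
    }
    return canned.get(text, text)


samples = [
    Sample("water", "ماء"),
    Sample("chemical reaction", "تفاعل كيميائي"),
]
findings = black_box_evaluate(toy_engine, samples)
for f in findings:
    print(f.source, "->", f.output, "| match:", f.matches)
print("error rate:", error_rate(findings))
```

The evaluator never opens `toy_engine`; any callable from string to string could be passed in its place. A 'glass box' evaluation would instead inspect how the engine arrives at its output.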
Consequently, the research draws on Catford's translation shifts (1965), including both 'level shift' and 'category shift', to highlight the grammatical errors that may result at sentence level. The researcher compares the source text and GT output to identify the types of shifts that may take place. When the research moves to paragraph level, it examines four types of paragraphs frequently used in scientific texts (descriptive, process, causality and mixed paragraphs) according to Halliday and Hasan's model of cohesive markers (1976), which are employed to make the text cohesive and coherent. This model includes both grammatical and lexical cohesive devices. Thus, the researcher traces the cohesive markers deployed in the source text to test whether or not GT reproduces them correctly in the output.

Chapter Three
Linguistic Discordance in Google Translation

3.1. Introduction

Linguistics can best be described as a musical instrument that is capable of playing different melodies according to a group of tunes set together to form a musical scale, which is usually put in front of musicians to follow in big concerts. Similarly, the linguistic system of any language performs the same function as that musical instrument, since linguistics plays a major role in producing stretches of language that sound harmonious and meaningful to both the ears and the minds of all language users. In other words, linguistics forms a scientific model that helps language users to form and communicate coherent and innermost thoughts, since it controls the way people use their language to express their human experience. However, this experience may differ from one group of language users to another because "language … gives structure to experience, and helps to determine our way of looking at things, so it requires some intellectual effort to see them in any other way than that which our language suggests to us" (Halliday, 1970, p.143).
Therefore, the effort and time spent in understanding the differences between the linguistic systems of languages worldwide are affected by the components of the linguistic system of the language(s) in question. The linguistic system of any language can be seen as an umbrella that covers different areas, one of which is 'grammar'. This area consists of two main parts: syntax and morphology. The former deals with the arrangement of words in different language structures, while the latter deals with the structure or build-up of individual words. Nevertheless, this division does not mean that the two parts are unrelated; syntax and morphology are far more interrelated than opposed, and the term 'grammar' has been used to cover both by many researchers, including Baker, who maintains that "grammar is organized along two main dimensions: morphology and syntax. Morphology covers the structure of words", while "Syntax covers the grammatical structure of groups, clauses, and sentences" (1992, p.83). However, language users do not use the grammatical categories of their languages in the same way, since these grammatical elements are not identical across languages. In other words, each language expresses the same message differently, for each has its own grammatical patterns that it imposes upon its users. Therefore, in the translation process, the variation in grammatical categories between source and target language poses a difficulty for human translators "because one cannot always match the content of a message in language A by an expression with exactly the same content in language B, because what can be expressed and what must be expressed is a property of a specific language in much the same way as how it can be expressed" (Winter, 1961, p.98).
Thus, the possibility of achieving equivalence at the grammatical level has been examined by many researchers who discussed this dilemma in relation to translation. For example, the clear-cut differences between English and Arabic in grammatical categories led Baker to identify five problematic grammatical categories: number, gender, person, tense and aspect, and voice. She maintains that the differences at this level constitute a source of difficulty for human translators because they are capable of changing the content of the message in the process of translation. This change may lead to adding information that is not found in the original or omitting information from the source text. She concludes that this may happen when "the target language has a grammatical category which the source language lacks", or "if the target language lacks a grammatical category which exists in the source language" (1992, p.83). Moreover, Catford discussed the process of translation between two languages with different linguistic systems. He maintains that some "translation shifts" may occur in the process of translating a text from a source language to a target language. He states that such shifts are of two main types: level shifts and category shifts. A level shift "occurs when an SL item has a target language equivalence at a different linguistic level from its own (grammatical, lexical, etc.)", while a category shift takes one of four forms: class, structure, unit and intra-system shifts. First, "class shift" involves changing the class of a word, e.g., from an adjective to a noun. Second, "structure shift" refers to altering the grammatical structure of a sentence, e.g., from active to passive. Third, "unit shift" refers to switching rank, e.g., from a clause to a phrase.
Finally, "intra-system shift", which "occurs when translation involves selection of a non-corresponding term in the TL system …: e.g. an SL 'singular' becomes a TL 'plural'" (1965 as cited in Hatim, 2001, p.16). Such shifts may take place when the translator cannot adhere to the linguistic forms that exist in the source text. Accordingly, Nord states that "linguistic problems arise from differences of structure in the vocabulary and syntax of second language (SL) and target language TL" (1991, p.88). In the same vein, Abbasi and Karimnia assert that most students commit errors at syntactic and morphological levels while doing translation tasks, or what they call "syntactic-morphological errors", such as errors in the use of the appropriate tense, errors in the use of articles and prepositions, and errors in the use of active and passive voice. They state that students commit such errors at different grammatical levels because they transfer the grammatical rules of their own language into the target language (2011. Retrieved from: https://urlzs.com/wpw5J). Even so, these problems may be solved, since Hannouna believes that human translators can work hard and focus their effort on understanding and mastering the grammars of the two languages involved in the translation task, and they can "draw on general knowledge of the subject matter and the world to arrive at the intended meaning" (2004, p.54). Thus, the research hypothesizes that when it comes to machine translation, deficiencies related to grammatical categories will be more acute, since machines may not be able to analyze all the grammatical categories of the languages involved, especially if they are distant languages such as English and Arabic, whose linguistic systems have more differences than similarities.
On these grounds, linguistic errors may arise when using machine translation because "MT is often impeded by lexical and syntactic ambiguities, structural disparities between the two languages, morphological complexities and other cross-linguistic differences" (Hannouna, 2004, p.54). In other words, machine translation errors may be attributed to the framework adopted by all machine translation programs, known as the "Transfer approach". This approach consists of three steps: first, the scanning and analysis of the ST syntactic structures into their basic building blocks; second, the transfer of those syntactic structures into the TL structure; finally, the synthesis and restructuring of the output based on that TL structure, which may yield one or several proposed translations for the same structure. This approach indicates that the process in machine translation programs is sequential, so each step has to pave the way for the next in order to produce optimal output that satisfies the users' desires (Somers, 1998, p.145 as cited in Hatim and Munday, 2004, p.117). However, in some cases, machine translation programs may commit errors at one or all of the above-mentioned steps. This, in turn, leads to many errors in the translations made by the software in question. That is, shifts may take place in the translation process and result in semantic shifts that change the meaning of the text(s) in hand. Such errors may widen the gap between human translators and machine translation as two sides of the same coin, since Brown defined errors as "a noticeable deviation from the adult grammar of native speakers, reflecting the inter-language competence of the learner" (2004, p.216). By analogy with human competence, errors indicate that the preprocessing mechanisms of the software in question are not performing well, so they need to be enhanced and better fed.
This echoes the words of Al-Samawi, who states:

The question whether machine translation would replace human translation was and is still one of the primary concerns of research in machine translation. Researchers, in this regard, are between fear and confidence. Some look at it as a real threat to human translators; others are doubtful and base their doubt on the terrible errors committed by machine translation (2014. Retrieved from: https://urlzs.com/74KMz).

Consequently, many researchers have attempted to identify and classify the errors produced by machines in relation to their linguistic competency, in particular grammatical categories at smaller levels such as word and sentence level. For example, Hannouna (2004) states that the machine commits errors in areas such as category and word class, syntactic arrangement, tense, pronoun translation and suffixes, among other areas. She evaluates the quality of three Arabic machine translation systems, but her study focuses on only one level of text, the sentence. In addition, Vilar, Xu, D'Haro and Ney (2006) identified five big classes of errors: "missing words", "word order", "incorrect words", "unknown words" and "punctuation errors". Their study also focuses on smaller units (Retrieved from: https://urlzs.com/XtQ9w). In addition, Al-Samawi (2014) identified a number of errors made by GT at text level, at both syntactic and morphological levels, such as: "Violating subject-verb agreement (masculine and feminine; singular, dual, and plural; first, second, and third person)", "Using a noun in place of a verb", "Using a verb in place of a noun" and "Omitting functional morphemes (i.e. prepositions, articles, conjunctions, pronouns, auxiliary verbs, deixis, etc.)." However, his study focuses on counting the number of errors in the first ten sentences of each text without explaining them or the semantic shifts that took place in the texts.
Also, he uses 10 texts from 10 different disciplines in his research; this, in turn, may not be objective or fair enough to draw conclusions about GT performance in each field (Retrieved from: https://urlzs.com/74KMz). Therefore, using GT for certain translation tasks may yield a number of grammatical errors in different areas, because when it comes to a software tool such as GT and linguistics, the situation may be vague and confusing, for:

Psychologists have told us that individuals acting alone do not normally cause too much trouble; it is only when they form into crowds that they become unmanageable. Similarly, individual lexical items. . . , can only stage sporadic strikes; it is when they group into long syntactic stretches that they begin really to launch all-out assaults on the translator (Wong, 2006, p.130).

If this is so for human translators, then it would be at least as challenging, or even far more so, for GT, since it does not have the sense of judgment or the intelligence that humans do. Accordingly, this chapter aims at detecting the grammatical errors in the translations produced by GT, then classifying those errors under broad categories and sub-categories to demonstrate the effect of the grammatical shifts that take place, and then measuring the semantic shifts and their effects on comprehension. Finally, this chapter will draw on the last step in the "Transfer approach", which has been further developed through what is nowadays called the "users' feedback button". This button enables end users to interact with the machine and contribute to enhancing the quality of the output. In a nutshell, in the last step of the "Transfer approach", known as "synthesis", a number of machine translation programs, including GT, provide the end user with one or several suggested translations for the item in question.
Accordingly, the end user is free to accept the translation proposed by GT, reject it outright, or come up with an acceptable translation of his/her own in cases where all the equivalents proposed by GT are fuzzy or imperfect. Thus, this chapter aims at suggesting acceptable translations for the items translated erroneously by GT, to be added to the list of options provided by GT in case those items are re-entered by a different user. In sum, this chapter seeks to give recommendations to solve each type of error in an attempt to improve the reputation of machine translation and provide end users with acceptable translations.

3.2. Errors at Syntactic Level

3.2.1. Organization of Constituents at Phrase Level

In the Arabic noun phrase, the noun precedes the attributive adjective, while in English the attributive adjective precedes the noun, according to the naturalness principle that governs the production of well-formed structures in each language. However, the research observes that in some cases GT sticks to the structure of the source text, which results in forms that are alien to the target language since their constituents are incoherent. It is well known that in phrases, e.g., noun phrases, the position of the modifier in relation to the modified noun affects the message. Arabic starts with the noun, then gives information about it in a direct manner, while English prepares the readers/listeners for the theme, since it describes the object first and then reveals its identity. These different ways of presenting information about the same object in the two languages led GT to commit errors in this area, as shown in Table (1) below:

Table (1): Errors made at noun phrases level

Ex. 1
Source Text: Mendel chose the garden peas for his studies because: garden peas are available in many varieties.
Google Translation: نذساعزّ انجبصالء انؾذٚقخاخزبس يُذل يزٕفشح فٙ انؼذٚذ انجبصالء ؽذٚقخألٌ: يٍ األصُبف.

Ex. 2
Source Text: In a chemical reaction, all of the atoms in the reactants must be present in the products. The reactions must be balanced.
Google Translation: انكًٛٛبئٙ ، ٚغت أٌ ركٌٕ فٙ انزفبػم عًٛغ انزساد انًٕعٕدح فٙ انًٕاد انًزفبػهخ يٕعٕدح فٙ انًُزغبد. ٚغت يزٕاصَخ سدٔد انفؼم.أٌ ركٌٕ

The noun phrase "garden peas" in Ex.1 is not translated correctly. A structure shift takes place because GT changes the order of the phrase in the translation so that it appears as a noun phrase made up of two nouns, "garden" and "peas", in Arabic " انؾذٚقخانجبصٚالء ". This results from GT resorting to literal translation, which, in turn, drives it to treat the word "peas" in its current position in the English sentence as a noun and "garden", its modifier, as an adjective. However, "garden peas" in the underlying structure of the noun phrase serves as a single noun, which in Arabic means "ؽجبد انجبصٚالء". The noun here is used to specify the type of seeds that Mendel selected for his experiments. In other words, the noun phrase is employed to make the idea more specific and precise. This shift shows that GT fails to recognize the underlying structure and the way Arabic makes it manifest in its surface structure. It therefore translates the two words as two nouns, which results in a form that is unfamiliar in Arabic: two consecutive nouns, each with the definite article, "انجبصٚالء انؾذٚقخ". That is to say, GT fails to analyze the noun phrase "garden peas" as a phrase with one noun, in Arabic " بصٚالءؽجبد انج ". Thus, the translation produced by GT may force students to pause, rearrange the sentence and put each word in its appropriate position to get the message.
This, in turn, may weaken the translation of scientific texts, since Ali and Ismail maintain that technical terms are a challenge for students to understand because these terms "have one or many meanings in everyday language" but carry a different, peculiar and precise meaning in a scientific text (2006. Retrieved from: https://urlzs.com/FnT8d).

Moreover, Ex.2 shows that GT does not always stick to the order that exists in the source text. In other words, GT makes its own guesses in translating a sentence regardless of how the words are combined in the source text. Thus, Ex.2 shows that GT fails to order the constituents of the sentence in the right way. English places the attributive adjective first and then the noun, whereas Arabic starts with the noun, which in this case is "ردود الفعل", followed by its adjective, "متوازنة".

Another problem is that GT fails to order the constituents of the noun phrase in the right way, as Ex.3 in Table (2) shows. The noun phrase "electron transport chain" in the second translation provided by GT is not translated correctly, since GT fails to recognize its head, the noun "chain". In addition, GT provides two different translations of the same noun phrase, one right and one wrong, even though it is an established scientific term; this indicates that GT is still unsure about the correct translation.

Table (2): Errors in ordering the noun phrase.

Ex. 3
Source Text: Electron transport chain accepts electrons from the breakdown products of the first two stages (most of them via NADH) and passes these electrons to an electron transport chain.
Google Translation 1: تقبل سلسلة نقل الإلكترونات الإلكترونات من منتجات التكسير في المرحلتين الأوليين (معظمها عبر NADH) وتمرير هذه الإلكترونات إلى سلسلة نقل الإلكترون.
Google Translation 2: تقبل الإلكترون سلسلة النقل الإلكترونات من المنتجات انهيار المرحلتين الأوليين (معظمهم عبر NADH) وتمر هذه الإلكترونات إلى سلسلة نقل الإلكترون.

Therefore, producing correct noun phrases requires that GT draw a map of the items in question in order to decide on the function of each element, and then rearrange them without any loss or distortion that may threaten the quality of the output. For example, GT should have translated the above-mentioned phrases without disturbing the relation between their constituents, since the original is clear and precise. It would therefore be safer for GT to analyze how phrasal slots are ordered in both English /(Art.)(Adj.)N./ and Arabic /(Art.)N.(Adj.)/ and then map onto them to produce correct structures.

To handle this phrase-level translation anomaly, the researcher suggests the procedure described in Figure [2]. First, the sentence provided by the user is split (tokenized) into tokens (words). Then, these tokens are passed on to a Part-of-Speech (POS) tagger that finds the type of each token, i.e., whether a word is a verb, noun, adverb, etc. Using these token types, one can find out whether a sentence complies with the "Art. + Adj. + N." pattern or not. If yes, the noun part undergoes bigram and trigram extraction, where bigrams and trigrams are phrases consisting of 2 and 3 tokens, respectively. The translation of these noun phrases is looked up in a specialized lexicon. For Ex.1, the direct translation of the phrase "garden peas" would be "البازيلاء الحديقة". The user can then detect this anomaly and give his/her feedback by suggesting a new translation, "حبات البازيلاء", which would then be maintained in the lexicon.

Figure (2): Processing of noun phrases.

3.2.2. Organization of Constituents at Sentence level

The simplest sentence in English consists of SVO/C (subject, verb and object/complement) and conveys a certain message.
However, GT clearly commits errors at this level, in particular in arranging the elements that make up the whole sentence. This is due to the nature of the two languages and the features associated with each of them. That is to say, "English is basically an analytic language, i.e., it shows syntactic relationships by word order and function words. Arabic is basically synthetic, i.e., it shows syntactic relationships by its frequent and systematic use of inflected forms" (Hawkins, 1980 as cited in Saraireh 2014). This difference in ordering the constituents of the sentence may hinder understanding of the message, since it may drive GT to commit errors that cause structural ambiguity, which in turn yields different interpretations of the same message, as shown in Table (3) below:

Table (3): Errors made at sentence level.

Ex. 4
Source Text: The number of protons determines the atomic number
Google Translation: يحدد عدد البروتونات العدد الذري.

Ex. 5
Source Text: Electron transport chain accepts electrons from the breakdown products of the first two stages (most of them via NADH) and passes these electrons to an electron transport chain.
Google Translation (Tran. 1): تقبل سلسلة نقل الإلكترونات الإلكترونات من منتجات التكسير في المرحلتين الأوليين (معظمها عبر NADH) وتمرير هذه الإلكترونات إلى سلسلة نقل الإلكترون.
Google Translation (Tran. 2): تقبل الإلكترون سلسلة النقل الإلكترونات من المنتجات انهيار المرحلتين الأوليين (معظمهم عبر NADH) وتمر هذه الإلكترونات إلى سلسلة نقل الإلكترون.

In English, there is only one type of sentence: one that starts with the subject followed by a verb along with its complement, and it is called the verbal sentence. In Arabic, there are two types of sentences: equational and verbal. The former starts with a noun followed by a predicate, while the latter begins with a verb followed by a subject and a complement.
In Ex.4 in Table (3), the English sentence starts with the subject, "the number of protons", followed by the verb "determines" and its complement. This sentence follows the unmarked SVO pattern in English. However, GT reverses the structure of the English sentence in the Arabic sentence, leading to a semantic shift that results in two readings of the Arabic sentence: either the atomic number decides the number of protons, or the number of protons decides the atomic number. In the Arabic sentence, both nouns, the number of protons and the atomic number, could stand in either the subject or the object position. Otherwise stated, readers of this sentence will be confused about its correct meaning, especially readers who are not well acquainted with the Arabic syntactic rule which states that if there are two consecutive nouns in a verbal sentence, and the sentence does not use case markers/inflections to distinguish between them, then the subject is the first noun and the object is the second noun. In other words, the two nouns are sorted according to the order in which they appear in the sentence.

To resolve this ambiguity, GT needs to be improved by adding inflections to the Arabic sentence. The inflections (diacritics) to be used in this case are damma to indicate the subject position and fatha to indicate the object position. However, as GT currently translates the input, the Arabic sentence has two possible readings, which weakens the quality, precision and comprehensibility of the translated text.

Moreover, Ex.5 in Table (3) shows that reordering the elements of a sentence may produce redundant stretches of language, such as two similar nouns used immediately after one another in the same sentence.
Thus, part of the meaning of the sentence may be lost, which, in turn, may alter the intended meaning of the source text, since repetition may lead to ambiguity. In Ex.5, the English sentence starts with the subject "electron transport chain", but GT inverts the order in the Arabic sentence, leading to two similar forms following one another. Accordingly, when it comes to sorting out these two nouns and assigning them the appropriate inflections (diacritics) in Arabic, the result is two nouns with the same inflection, kasrah: "الإلكتروناتِ الإلكتروناتِ". This redundancy may lead students to perceive it as a typo, so that they read the sentence as "تقبل سلسلة نقل الإلكترونات". This means that they might omit the second word "الإلكترونات" in the Arabic sentence "تقبل سلسلة نقل الإلكتروناتِ الإلكتروناتِ", since they might be misled by the wrong ordering produced by GT, which results in redundant words.

Therefore, when translating active sentences in which both the subject and the object contain similar nouns and the inflections do not help in clarifying the meaning, it would be better and safer for GT either to maintain the order of the source text, so that readers need not speculate about what the source text looked like in order to get the bulk of the message, or to separate the two constituents with a verb. Thus, it would be better for GT to produce nominal sentences that start with the subjects, which are "the number of protons" and "electron transport chain" in the examples, and then give information about their function.

These examples show that GT still commits errors in ordering the constituents of both the phrase and the sentence, due to differences in the rules that combine patterns such as NP, SVO, etc. in the languages at hand.
These erroneous and random switches between such patterns lead to errors such as producing a sentence or a phrase with more than one interpretation, producing forms that do not exist in the target language, or wrongly ordering the name of a scientific term. All these errors weaken the level of comprehension and therefore the quality of the translation outcome. In the current situation, GT users have to rely on their intuition to make the sentence sound coherent and cohesive.

Therefore, to solve this problem, GT should adopt the two-step procedure described in Figure [3]. The first step is a "text preprocessing step" in which GT splits the input sentence into tokens in a "Tokenizer"; those tokens are then marked with their grammatical categories in a "POS" tagger. This helps in identifying the sentence pattern employed in the input, based on the order of the elements in the sentence under study. For example, the sentence in Ex.4 conforms to the unmarked sentence pattern in English, /sub.+ v.+ obj./, for it starts with an NP and ends with an NP, "the number of protons" and "the atomic number", respectively. GT then needs to decide whether the two nouns are inflected for case or not. If not, GT proceeds to a second step that preserves the meaning of the ST by allocating both the sub. and the obj. to their correct positions in the sentence pattern. In other words, the proposed system should place the two nouns "the number of protons" and "the atomic number" in the sub. and obj. slots in Arabic, respectively. Thus, the system should be programmed to map the unmarked /sub.+ v.+ obj./ pattern in English to the /v.+ sub.+ obj./ pattern in Arabic. Accordingly, users can suggest a precise translation by adding Arabic diacritics such as fatha and damma to the sentence, to avoid any ambiguity that may weaken the quality of the output.
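The preprocessing and mapping steps just described can be illustrated with a minimal sketch. It is not GT's implementation: the tag table `TOY_TAGS` and the function names are hypothetical, the "tagger" is a hand-written lookup that only covers Ex.4, and the tag names merely follow a common universal-style convention. The sketch tokenizes the sentence, tags it, checks for the /sub.+ v.+ obj./ pattern, and then reorders the constituents into labelled /v.+ sub.+ obj./ slots.

```python
# Hypothetical toy tag table standing in for a real POS tagger.
TOY_TAGS = {
    "the": "DET", "number": "NOUN", "of": "ADP", "protons": "NOUN",
    "determines": "VERB", "atomic": "ADJ",
}


def tokenize(sentence):
    """Split a sentence into lowercase word tokens (whitespace-based)."""
    return sentence.lower().rstrip(".").split()


def pos_tag(tokens):
    """Attach a tag to every token; unknown words get the tag 'X'."""
    return [(tok, TOY_TAGS.get(tok, "X")) for tok in tokens]


def split_svo(tagged):
    """Return (subject, verb, object) if the sentence has exactly one verb
    with material on both sides of it, otherwise None."""
    verb_idx = [i for i, (_, tag) in enumerate(tagged) if tag == "VERB"]
    if len(verb_idx) != 1:
        return None
    v = verb_idx[0]
    subject = [tok for tok, _ in tagged[:v]]
    obj = [tok for tok, _ in tagged[v + 1:]]
    if not subject or not obj:
        return None
    return subject, tagged[v][0], obj


def to_arabic_slots(svo):
    """Reorder English /sub.+ v.+ obj./ into Arabic /v.+ sub.+ obj./ slots,
    keeping an explicit role label on each constituent."""
    subject, verb, obj = svo
    return [("v.", [verb]), ("sub.", subject), ("obj.", obj)]


tagged = pos_tag(tokenize("The number of protons determines the atomic number."))
slots = to_arabic_slots(split_svo(tagged))
for role, words in slots:
    print(role, " ".join(words))
```

Because each constituent keeps its role label after reordering, a later step (for instance the user-supplied diacritics discussed here) would know which noun phrase should carry the subject marking and which the object marking.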
Thus, such diacritics will be added to a specialized lexicon to be reused to resolve the confusion that may occur in identifying the subject and the object in the GT output, as shown in Figure [3]:

Figure (3): Mapping of nominal sentences to verbal sentences.

However, in cases where the subject and the object contain similar words, which in turn may lead to redundancy in the output, GT should be programmed to block those two nouns from following one another, using the procedure described in Figure [4]. First, GT should carry out the same "text preprocessing step" explained previously to analyze the input sentence. Second, GT should decide whether the sub. and the obj. in the input share similar words. If yes, GT needs to map the input sentence into a nominal sentence with the pattern /sub.+ v.+ obj./. This sentence pattern separates the subject from the object by the verb, which in turn reduces the redundancy in the output. At this stage, users can assist GT by providing it with the correct nominal pattern of the sentence under study. Finally, "a diacritics extraction step" takes place to mark both the subject and the object. Such procedures will allow users to enjoy translations of high quality and precision when it comes to translating active sentences.

Figure (4): Mapping of nominal sentences.

3.2.3. Erroneous Shifts from Verbal to Nominal Sentences in Arabic

The simplest sentence in any language is made up of different parts of speech such as nouns, verbs, adjectives and adverbs. However, these categories may be problematic for GT, since languages differ in the way they derive such parts of speech and in the way they combine those elements to communicate a message, as the sentence in Table (4) below shows:

Table (4): Errors of turning a sentence to a noun phrase

Ex. 6
Source Text: Some isotopes are radioactive
Google Translation: بعض النظائر المشعة

GT fails to order the constituents of the sentence because it does not recognize the auxiliary "are" in Ex.6 and neglects it in the process of translation. This, in turn, leads it to translate the sentence into a noun phrase, "النظائر المشعة", which creates a problem in comprehending the sentence because GT replaces the adjective with a noun phrase. In such a case, the reader may search for a main verb after the noun but find nothing, since GT drops the auxiliary "are" from the sentence. Thus, the verbal sentence is turned into a noun phrase. Errors of this kind, where a verb is not translated directly as a verb but is instead turned into a noun, are classified as structure shifts.

Therefore, the researcher suggests that GT be programmed to translate the verb to be (Aux.) and the adjective that follows it in a verbal sentence in English into an adjective, which makes the sentence equational in Arabic, as explained in Figure [5]. In other words, Arabic does not use such pseudo-verbs, which include is, am, are, etc., to introduce adjectives. Therefore, GT should first undergo the preprocessing step to identify the aux. and then translate it, together with the adjective that follows it in English, into an equational sentence in Arabic consisting of a subject and a predicate. Second, GT should undergo an extraction step; that is, the Arabic predicate (adj.) has to be derived from the /aux./ and the /adj./ in the English sentence. However, in Ex.6, GT neglects the presence of such pseudo-verbs; this results in "genitive structures" in English, "مضاف ومضاف اليه" in Arabic, such as: "بعض النظائر المشعة". Such a form may not help readers to distinguish the topic of the sentence, which is called the theme, from the comment that tells the readers more about the theme, which is called the rheme.
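As a rough illustration of the preprocessing and extraction steps proposed for sentences with the verb to be, the sketch below finds a form of "be" followed by an adjective and splits the sentence into a topic (theme) and a predicate (rheme), dropping the auxiliary. The names `COPULAS`, `TOY_TAGS` and `split_equational` are hypothetical, and the tag table is a hand-written stand-in for a real POS tagger that covers only Ex.6; it is a sketch under these assumptions rather than a description of how GT works.

```python
# Forms of "be" listed in the text as pseudo-verbs with no Arabic counterpart.
COPULAS = {"is", "am", "are"}

# Hypothetical toy tag table standing in for a real POS tagger.
TOY_TAGS = {"some": "DET", "isotopes": "NOUN", "are": "AUX", "radioactive": "ADJ"}


def split_equational(sentence):
    """Return the topic and predicate of an English 'X be ADJ' sentence,
    or None if no copula + adjective sequence is found."""
    tokens = sentence.lower().rstrip(".").split()
    for i, tok in enumerate(tokens[:-1]):
        if tok in COPULAS and TOY_TAGS.get(tokens[i + 1]) == "ADJ":
            return {
                "topic": tokens[:i],           # theme: what the sentence is about
                "dropped": tok,                # auxiliary with no Arabic equivalent
                "predicate": tokens[i + 1:],   # rheme: the adjective that follows
            }
    return None


result = split_equational("Some isotopes are radioactive")
print(result)
```

Keeping the topic and the predicate as two separate constituents is what allows a later generation step to render them as subject and predicate of an equational sentence instead of merging them into one noun phrase.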
Therefore, users could add their suggested translation for the sentence, "نظائر مشعة", to be maintained in the lexicon for reuse in similar constructions.

Figure (5): Mapping of sentences with verb to be.

3.3. Errors at Morphological Level

3.3.1. Inappropriate Choice of Suffixes

Affixes in English are of three types: prefixes that are added in front of the word, infixes which are put in the middle of words, and suffixes that come at the end of the word. Each type has a function which helps in constructing a precise meaning.

3.3.1.1. Inflections Attached to Sub-headings

In some cases, GT fails to add the definite article "the", "ال" in Arabic. This is due to the well-known fact that in English, people use the plural form to refer to things in general, while in Arabic the situation is different. In other words, Arabic employs "ال" to refer to things in general, while English uses the definite article to refer to or specify the referent/topic. However, in the example in Table (5), GT fails to add "ال" to the noun phrase "chemical reactions" in the Arabic text, making the noun phrase indefinite, since English and Arabic differ in how they assign the definite article to nouns according to the function of the sentence. In the English text below, the noun phrase which starts with a capital letter /C/ refers to chemical reactions in general, for it introduces the topic of the subsequent sentence. In other words, GT should be programmed to attach "ال" to the noun phrase "Chemical reactions", for it functions as a sub-heading. However, Ex.7 in Table (5) shows that GT still needs to be enhanced in this area: the Arabic noun phrase indicates that the reactions are unknown, which in turn makes the topic vague and not specific enough, owing to the absence of "the", which adds familiarity and smoothness to the subsequent sentence, as the example in Table (5) shows:

Table (5): Errors in treating the definite article

Ex. 7
Source Text: Chemical reactions In chemical reactions, chemical bonds are broken and reformed, leading to new arrangements of atoms.
Google Translation: تفاعلات كيميائية في التفاعلات الكيميائية ، يتم تكسير الروابط الكيميائية وإصلاحها ، مما يؤدي إلى ترتيبات جديدة للذرات

3.3.1.2. Inflections Attached to the Verb

A clear example of errors in affixation is the main verb in the sentence below. GT fails to recognize that the /s/ in its current position indicates a verb that is both active and present and has a singular subject. Instead, GT treats the /s/ as the marker of the plural form of the noun /function/, as shown in Table (6) below:

Table (6): Errors in the selection of parts of speech.

Ex. 8
Source Text: In a multi-cellular organism, cell division functions to repair and renew cells that die
Google Translation: في الكائنات متعددة الخلايا، الانقسام الخلوي وظائف لإصلاح وتجديد الخلايا التي تموت

This example shows that GT fails to distinguish between words that have the same form in the plural and in the simple present. In other words, a category shift took place that changed the word "functions" from a verb in the English sentence to a noun in the Arabic text. The word /function/ has two uses: as a verb meaning to serve/work, or as a noun meaning a job/task. This duality leaves the reader with an output sentence that has no verb, since GT mistranslates the word into a noun, while in the source text it is intended to serve as a verb and not as a plural noun. This makes the Arabic sentence appear verb-less, which in turn does not help readers to get the message of the Arabic text, since readers should not have to rely on intuition to identify the verb of a sentence, especially in scientific texts.
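The ambiguity described above can be made concrete with a small heuristic check, offered only as a sketch under simplifying assumptions. The function name `retag_s_form` is hypothetical, the input is assumed to be already tokenized and tagged, and the rule is deliberately narrow: a word ending in "s" that was tagged as a noun is re-tagged as a verb when it follows another noun and is itself followed by "to" plus a verb, as in "cell division functions to repair".

```python
def retag_s_form(tagged):
    """Re-tag an '-s' form from NOUN to VERB when its context suggests a
    third-person singular present verb: NOUN + <word>s + 'to' + VERB."""
    out = list(tagged)
    for i in range(1, len(tagged) - 2):
        tok, pos = tagged[i]
        if (pos == "NOUN" and tok.endswith("s")
                and tagged[i - 1][1] == "NOUN"
                and tagged[i + 1][0] == "to"
                and tagged[i + 2][1] == "VERB"):
            out[i] = (tok, "VERB")
    return out


# Tags as a tagger might wrongly assign them, with "functions" read as a plural noun.
tagged = [
    ("cell", "NOUN"), ("division", "NOUN"), ("functions", "NOUN"),
    ("to", "PART"), ("repair", "VERB"), ("and", "CCONJ"),
    ("renew", "VERB"), ("cells", "NOUN"),
]
fixed = retag_s_form(tagged)
print(fixed[2])
```

A rule this narrow will miss many cases and is not a substitute for a trained tagger; it only shows that the position of the word and the particle "to" after it carry enough information to recover the verb reading in Ex.8.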
Thus, it is important that GT developers feed GT with the procedure described in Figure [6] to enable it to handle all the words that have the same form in the plural and in the present tense with a third-person singular subject. First, GT will undergo the text preprocessing step to decide on the function of the word in question and what it aims to achieve. That is, GT needs to process both the position of the word in the sentence and the surrounding elements that shape its identity. For example, the position of the word "function" in Ex.8 shows that it is used as a verb for the subject "cell division". In other words, the verbs "repair" and "renew" cannot be the verbs for the subject "cell division", because the particle "to" precedes them. Thus, GT needs to answer this question: "Does the sentence have a verb for the subject 'cell division'?". If not, GT will extract a verb that agrees with the subject in person, number, etc. However, at this stage, GT cannot derive the appropriate form of the verb "function" in Arabic, since it translates it as a noun, not a verb, in Arabic "وظائف". Thus, at this stage, users can suggest a translation for the word "function" as a verb, which is "يعمل على". Accordingly, this translation would be maintained in the lexicon to be reused in similar circumstances.

Figure (6): Processing of words with similar forms in plural and present tense.

3.3.2. Passive Constructions

Passive constructions are used heavily in scientific texts to achieve certain purposes. Swales states: "the passive can be used to give the necessary information in the best possible way; impersonally, concisely, objectively, and giving importance to the most important facts" (1971, p.41). However, when considering GT, a number of errors take place in the translation of certain sentences from active to passive. These errors include:

3.3.2.1. Failure to Distinguish between the Simple Past and Passive Inflections

In some cases, GT mistranslates sentences that contain a passive construction by using a simple past form in place of a passive. In other words, GT fails to distinguish between the simple past form and the participle form that comes after the auxiliary in passive constructions (the passive adjectival). This, in turn, may affect the truth value of the sentence, as Table (7) below shows:

Table (7): Errors in recognizing the passive construction

Ex. 9
Source Text: a disaccharide consists of two monosaccharides joined by a glycosidic linkage.
Google Translation:
يتكون ديساكهاريد من اثنين من السكريات الأحادية التي انضمت إلى ربط glycoside
يتكون ديساكهارايد اثنين من السكريات الأحادية انضم اليهم الربط جليكوزيدية.

The sentence in Ex.9 states a fact about the components of "disaccharides". The verb form usually used to refer to facts in English is the simple present, not the simple past, since reading "joined" as a simple past may suggest that the components change, or that the process of producing a disaccharide took place in the past and is now over. This is not acceptable in the language of science, where things have to be clear, exact and fixed in order to establish mutual trust between the readers and the text/s in hand. In other words, the verb "joined" is not a simple past but a passive construction that GT erroneously recognizes as a simple past. This shows that GT fails to make use of the key words present in the sentence, such as the preposition "by" and the verb "consists", to understand that the sentence is about actions that hold in the present, or about something that takes place whenever a disaccharide is produced. Basically, GT fails to recover the underlying structure of the sentence and translate it as a passive construction, so it goes with the superficial structure, which is the simple past.
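The contextual cues mentioned here, the preposition "by" and an earlier present-tense verb such as "consists", can be combined into a simple decision rule. The sketch below is a hedged illustration, not GT's method: `BE_FORMS` and `classify_ed_form` are hypothetical names, and the set of finite verbs is assumed to be supplied by an earlier tagging step.

```python
# Auxiliary forms of "be" that signal a full passive when they precede a participle.
BE_FORMS = {"is", "are", "was", "were", "be", "been", "being"}


def classify_ed_form(tokens, i, finite_verbs):
    """Classify the '-ed' form at position i using its neighbours.

    finite_verbs is assumed to come from a POS tagger: the set of words in
    the sentence already identified as finite verbs (e.g. {"consists"}).
    """
    if i > 0 and tokens[i - 1] in BE_FORMS:
        return "passive"                      # e.g. "are joined"
    has_earlier_verb = any(tok in finite_verbs for tok in tokens[:i])
    followed_by_by = i + 1 < len(tokens) and tokens[i + 1] == "by"
    if has_earlier_verb and followed_by_by:
        return "passive participle"           # reduced form of "(that are) joined by"
    return "simple past"


ex9 = "a disaccharide consists of two monosaccharides joined by a glycosidic linkage".split()
label = classify_ed_form(ex9, ex9.index("joined"), {"consists"})
print(label)
```

With the sentence of Ex.9 the rule labels "joined" as a passive participle, because the clause already has the finite verb "consists" and the form is followed by "by"; a sentence such as "Mendel joined the society" would fall through to the simple past reading.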
However, this leads to errors in the translation of the passive construction. Because of the simple past form of the verb, "انضم الى", the Arabic sentence may lead the reader to conclude that the process happened in the past, not that it is a process repeated whenever disaccharides are formed. The underlying structure of the sentence is "are joined", "تنضم الى", not "joined" as a past form, "انضمت" in Arabic. Thus, GT should be programmed to benefit from the words in the textual context in its translation box, such as "by" and the verb "consists" in the present case. Such words should help GT recognize the verb in question as a passive form, not a past form.

Another issue is that GT neglects the passive construction that is used to describe certain objects in given sentences, leading to incomplete sentences that do not have an obvious meaning, as the example in Table (8) below shows:

Table (8): Errors at passive construction arrangement level

Ex. 10
Source Text: Substances dissolved in a solvent are called solutes.
Google Translation: تسمى المواد المذابة في مذيب.

In Ex.10, GT does not identify the passive construction, which results in an incomplete sentence: the sentence suggests that there is a name for the materials that dissolve in a solvent, but this name is not given in the Arabic sentence, for GT fails to put the sentence in the correct order to come up with a correct passive construction. Thus, readers may expect to find a concept that refers to those substances that dissolve in a solvent, but they end up with an incomplete sentence. This shows that GT fails to parse the relative clause that is used to describe the term "solutes". In other words, GT fails to retrieve the underlying structure of the relative clause, which states that substances that are dissolved in a solvent are called solutes. GT fails to come up with a linking word that helps to produce a meaningful sentence, which in this case could be the linking pronoun "التي".
Thus, an acceptable Arabic translation that needs to be inserted into the options list may be: "تسمى المواد التي تذوب في مذيب/محلول مواد مذابة".

3.3.2.2. Passive Inflections

GT fails to use the appropriate inflections that indicate that the sentence is passive, as in the verb "تنجذب", which does not carry any inflections to indicate whether it is an active verb, "تَنجذب", or a passive one, "تُنجذب", as the sentence in Table (9) shows:

Table (9): Errors at passive inflections level

Ex. 11
Source Text: Polar substances and ions dissolve in water because opposite charges are attracted to the appropriate ends of water.
Google Translation: تذوب المواد والأيونات القطبية في الماء لأن الرسوم المعاكسة تنجذب إلى نهايات المياه المناسبة.

The underlying structure of the Arabic sentence is that the opposite charges move of their own will, while the English sentence states that they are moved by an external force. In other words, they do not move of their own will; instead, they are attracted by an unmentioned force. This lack of inflections leads to two readings of the constituent "are attracted". However, for this and similar structures, there should be one reading rather than two, particularly in scientific texts as a genre.

Therefore, the researcher suggests that passive constructions should receive special attention from software developers, since scientific texts are loaded with passive structures, for the focus in such texts is on the scientific facts rather than on those who came up with them. Thus, the researcher suggests the procedure explained in Figure [7]. First, the sentence passes through the preprocessing step, which analyzes all its elements. In other words, GT should parse the sentence correctly by identifying the grammatical subject and object. Second, GT should identify the pattern of the sentence: "is it an active /sub. +v. +obj./ or a passive /obj.+ v.+ sub./?".
Next, if the sentence conforms to the pattern /obj.+ v. +sub./, an extraction step for the appropriate passive inflections should take place. However, at present, GT cannot insert the appropriate passive inflections. Thus, users can suggest a translation for the passive sentence by assigning the appropriate diacritics in Arabic to make the sentence meaningful. The unmarked diacritic used in Arabic to indicate the passive construction is damma, which is attached to both the verb and the grammatical subject that follows it. Such suggested translations would be kept in the lexicon to be reused by different users having the same input.

Figure (7): Processing of passive constructions.

3.3.3. Unnecessary Derivation for Certain Words

GT randomly selects a word in the input sentence, derives new forms from that word and inserts those forms in the output. However, this derivation is sometimes done at the expense of other functional/content words in the same sentence. This may lead to loss of meaning and redundancy in the output, such as "مشحونة الشحنة" and "كمفاعل متفاعل" in the examples in Table (10). Such errors occur when GT fails to identify and choose the part of speech that best completes the sentence.

Table (10): Redundancy due to unnecessary repetition

Ex. 12
Source Text: Ionic bonds are electrical attractions between oppositely charged ions.
Google Translation: السندات الأيونية هي عوامل جذب كهربائية بين أيونات مشحونة الشحنة

Ex. 13
Source Text: Aerobic respiration consumes oxygen as a reactant to complete the breakdown of a variety of organic molecules (aerobic is from the Greek aer, air, and bios, life).
Google Translation: يستهلك التنفس الهوائي الأوكسجين كمفاعل متفاعل لإكمال تحليل مجموعة متنوعة من الجزيئات العضوية (الهوائية هي من الهواء الجوي والهوائي والسير اليونانية ، الحياة).

In Ex.12, the constituent "oppositely charged" is rendered incorrectly as "مشحونة الشحنة", while it means "متعاكسة الشحنة". Thus, these unnecessary derivations from the word "charged" took the place of the word "oppositely". Accordingly,