Research Article | Peer-Reviewed

A Comparative Study on the Translation Quality of Chinese Diplomatic Discourse by NMT and LLMs Based on Multidimensional Quality Metrics

Received: 26 September 2025     Accepted: 10 October 2025     Published: 27 October 2025
Abstract

Chinese diplomatic discourse plays a crucial role in articulating China’s position and enhancing its influence in global forums. However, machine translation (MT) often struggles with culturally nuanced and abstract expressions, highlighting the need to compare various advanced MT tools. This study assesses and compares the translation quality of Neural Machine Translation (NMT) systems and Large Language Models (LLMs) in translating Chinese diplomatic texts, focusing on the 2025 China-US tariff statements by China’s Foreign Ministry Spokesperson Lin Jian, with China Daily’s official English versions serving as references. Four NMT tools (Niutrans, Youdao, Google, DeepL) and four LLMs (DeepSeek, Ernie-4.5, ChatGPT-4.0, Gemini) were examined. Using the Multidimensional Quality Metrics (MQM) framework, the study evaluated translations, especially for phrases like “奉陪到底” (fight to the end) and “得道多助,失道寡助” (A just cause enjoys abundant support while an unjust one finds little). Results show that LLMs outperform NMTs: 50% of LLMs (DeepSeek, Ernie-4.5) accurately translated both phrases, while only 25% of NMTs (Google) did so for “奉陪到底,” and none for “得道多助,失道寡助.” Both systems faced issues such as undertranslation, omission, and a lack of diplomatic formality. The findings suggest that LLMs have greater potential to handle cultural nuances and abstract content in diplomatic texts, providing insights for enhancing domain-specific MT training and striking a balance between accuracy and acceptability in conveying Chinese diplomatic messages.

Published in International Journal of Applied Linguistics and Translation (Volume 11, Issue 4)
DOI 10.11648/j.ijalt.20251104.12
Page(s) 107-115
Creative Commons

This is an Open Access article, distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution and reproduction in any medium or format, provided the original work is properly cited.

Copyright

Copyright © The Author(s), 2025. Published by Science Publishing Group

Keywords

Neural Machine Translation (NMT), Large Language Models (LLMs), Translation of Chinese Diplomatic Discourse, Translation Quality Assessment

1. Introduction
Characterized by its rich content, diverse forms, and distinct features, Chinese diplomatic discourse serves as a crucial channel through which the global community can gain an authentic understanding of China. Improving the translation quality and dissemination of this discourse is therefore essential, not only so that the world can hear China’s voice but, more importantly, to promote global understanding of and engagement with China’s messages, thereby strengthening China’s international discursive influence and building a positive global image.
On February 1, 2025, the U.S. government announced a 10% tariff on all Chinese goods imported into the United States, citing fentanyl as a pretext. In response, China enacted retaliatory tariffs on specific goods from the U.S., effective February 10, 2025. The trade dispute temporarily eased during the China-U.S. Geneva Economic and Trade Talks on May 12, 2025. Throughout this period, the incident drew significant international attention. Lin Jian, one of the spokespersons for China’s Ministry of Foreign Affairs, engaged in serious negotiations with the U.S. side.
This study takes the incident as its starting point. It uses Lin Jian’s statements about the incident as the source text and the translations published on the official English website of China Daily as references. Four leading Neural Machine Translation (NMT) systems (Niutrans, Youdao, Google, and DeepL) and four Large Language Models (LLMs) (DeepSeek, Ernie-4.5, ChatGPT-4.0, and Gemini) serve as translation tools. The primary objective is to evaluate how these NMT systems and LLMs perform when translating Chinese diplomatic discourse.
From a theoretical perspective, this study examines how various translation technologies handle Chinese diplomatic discourse, offering an analytical framework and methodological insights for related academic disciplines. The results contribute to translation theory and practice and encourage interdisciplinary integration. From a practical perspective, the research supports the global communication of China’s diplomatic discourse: by identifying where NMT systems and LLMs fall short, it can inform refinements to their translation strategies that reduce cross-cultural misunderstandings and biases. This, in turn, can strengthen China’s influence in international discourse and enhance its soft power. Moreover, the study is concerned with accurately conveying the culturally embedded values of Chinese diplomatic discourse, which helps reduce intercultural friction, foster harmonious international relations, and promote a more constructive environment for global cooperation.
Building on this foundation, the study explores two key research questions:
1) How do various NMT systems and LLMs approach translating Chinese diplomatic discourse?
2) How do the different NMT systems and LLMs compare in performance when translating Chinese diplomatic discourse?
2. Literature Review
2.1. The Study on the Translation of Chinese Diplomatic Discourse
Academic research on the translation of China’s diplomatic discourse primarily focuses on three main dimensions: corpus development, translation strategies and methodologies, and cross-cultural communication.
Research on corpus development has mainly focused on analyzing diplomatic discourse to clarify translation strategies for diplomatic dissemination. Studies in this field have shown that translating diplomatic discourse involves multiple dimensions, including linguistic, cultural, ideological, and cognitive aspects. For example, corpus-based studies reveal how translations exhibit attitudinal changes, employ metaphorical translation techniques, and influence the portrayal of national images. This process extends beyond simple language translation, facilitating the transfer of cultural meanings and significantly shaping international discursive influence.
Research on translation strategies and methodologies systematically analyzes party and government documents, political speeches, and specialized terminology to understand how translation conveys China’s stance in diplomatic discourse. This research highlights the significance of cultural elements, particularly in cross-cultural communication. Translators must convey cultural nuances to advance China’s cultural diplomacy. As cultural mediators, they use theoretical frameworks to bridge textual and ideological differences, enhancing China’s global discourse influence through high-quality translations.
Scholars have also studied cross-cultural communication, primarily investigating how Chinese diplomatic terminology is represented in international media and spread via global news outlets. These studies show that various interconnected factors, such as translation methods, media framing techniques, and cultural contextualization, affect the effectiveness of translating diplomatic discourse across cultures. This process goes beyond simple language translation to serve as a form of cultural transmission.
2.2. Machine Translation (MT) and Translation Quality Assessment (TQA)
The field of Machine Translation (MT) has undergone a significant transformation with the integration of artificial intelligence, neural networks, and deep learning technologies. This change has advanced MT from rule-based systems (RBMT) to contemporary Neural Machine Translation (NMT) architectures. The rise of Large Language Models (LLMs) has further transformed the landscape. Known for their enhanced generation abilities, improved contextual understanding, fluency, and accuracy, LLMs represent a significant step forward in MT.
Despite this notable progress in accuracy, MT still struggles with culturally embedded expressions and creative linguistic devices, such as metaphors. This is due to its limited ability to fully understand socio-cultural nuances, which often leads to semantic errors, incorrect syntax, and grammatical mistakes. As a result, a quality gap remains between machine and human translation.
From the standpoint of Translation Quality Assessment (TQA), examining translation quality has historically been a central concern in the field of translation studies. TQA provides practical guidance for translators by reviewing the strengths and weaknesses of machine translation in structural processing and vocabulary selection. It also helps practitioners choose suitable translation methods based on the specific task requirements.
Contemporary research classifies MT quality assessment into three main methodological approaches:
1) human evaluation, which uses rating and ranking techniques;
2) reference-based automated metrics, divided into non-linguistic, light-linguistic, and heavy-linguistic categories;
3) reference-free estimation, which relies on either manually engineered features or attributes derived from neural networks.
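To make the contrast between the first two approaches concrete, the following is a minimal Python sketch of a toy reference-based, non-linguistic measure: clipped unigram precision against the reference translation. It is an illustration only and is not a metric used in this study; the function names `tokens` and `unigram_precision` are hypothetical, and the two candidate sentences are the Gemini and Youdao outputs quoted in Section 4.

```python
import re
from collections import Counter


def tokens(text: str) -> list[str]:
    """Lowercase the text and split it into word tokens, dropping punctuation."""
    return re.findall(r"[a-z']+", text.lower())


def unigram_precision(candidate: str, reference: str) -> float:
    """Share of candidate tokens that also occur in the reference (clipped counts)."""
    cand = Counter(tokens(candidate))
    ref = Counter(tokens(reference))
    matched = sum(min(n, ref[tok]) for tok, n in cand.items())
    total = sum(cand.values())
    return matched / total if total else 0.0


# Reference and machine outputs as quoted in the article.
reference = "A just cause enjoys abundant support while an unjust one finds little."
gemini = "A just cause attracts much support, while an unjust one finds little."
youdao = (
    "Those who follow the right path will have many supporters, "
    "while those who go against it will have few."
)

print(round(unigram_precision(gemini, reference), 2))
print(round(unigram_precision(youdao, reference), 2))
```

On these sentences the surface overlap differs sharply (about 0.83 for Gemini against about 0.05 for Youdao), although the human rating reported in Section 4 places both outputs in the same category, undertranslation. This is one reason a rating-based human evaluation can diverge from a purely overlap-based score.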
This study employs a human evaluation method based on ratings, using the Multidimensional Quality Metrics (MQM) framework. Developed by the German Research Center for Artificial Intelligence (DFKI) as part of the EU-funded QTLaunchPad initiative, the MQM framework is a versatile system for defining translation quality evaluation metrics.
The MQM framework systematically categorizes translation errors into seven main categories: terminology, accuracy, linguistic conventions, style, locale conventions, audience appropriateness, and design and markup. Within each dimension, errors are further categorized into four severity levels: Neutral, Minor, Major, and Critical. This hierarchy enables a detailed and thorough assessment of translation quality, considering both the type and severity of errors.
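The two-level structure described above (an error category plus a severity level) can be sketched as a small Python data model. This is a hypothetical illustration, not the annotation tooling used in the study: the class names are invented for the example, and the severity assigned in the sample annotation is an assumption, since the study does not report severity levels for individual errors.

```python
from dataclasses import dataclass
from enum import Enum


class Category(Enum):
    """The seven top-level MQM error categories listed in the text."""
    TERMINOLOGY = "terminology"
    ACCURACY = "accuracy"
    LINGUISTIC_CONVENTIONS = "linguistic conventions"
    STYLE = "style"
    LOCALE_CONVENTIONS = "locale conventions"
    AUDIENCE_APPROPRIATENESS = "audience appropriateness"
    DESIGN_AND_MARKUP = "design and markup"


class Severity(Enum):
    """The four severity levels listed in the text."""
    NEUTRAL = "neutral"
    MINOR = "minor"
    MAJOR = "major"
    CRITICAL = "critical"


@dataclass(frozen=True)
class ErrorAnnotation:
    """One annotated error in one system's output (hypothetical record layout)."""
    system: str
    source_span: str
    target_span: str
    category: Category
    severity: Severity
    note: str = ""


# Example record. The category follows the article's label (undertranslation,
# an accuracy-type error); the severity is an assumed value for illustration.
example = ErrorAnnotation(
    system="ChatGPT-4.0",
    source_span="奉陪到底",
    target_span="will accompany it to the end",
    category=Category.ACCURACY,
    severity=Severity.MAJOR,  # assumption: not reported in the study
    note="undertranslation: adversarial tone weakened",
)

print(example.category.value, example.severity.value)
```

Using enumerations rather than free-text labels keeps annotations from different evaluators comparable, which matters if the single-evaluator design is later extended to several raters.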
3. Materials and Methods
3.1. Text Selection
The source text is a 196-character excerpt from the official statement issued by Lin Jian, Spokesperson for the Ministry of Foreign Affairs of China, regarding the 2025 Tariff Incident. The selection of this text is based on two primary considerations.
First, the statements released by the spokesperson of China’s Ministry of Foreign Affairs directly reflect China’s official stance toward the United States and carry important political and strategic implications. Phrases like “奉陪到底” (fight to the end) and “得道多助,失道寡助” (a just cause enjoys abundant support while an unjust one finds little) exemplify common diplomatic language and are typical of official communications. Their inclusion supports the study’s goal, which is to examine how well translation tools render Chinese diplomatic discourse.
Second, these phrases are unique Chinese idioms and proverbs. Accurate translation of such expressions requires a deep understanding of the relevant cultural and historical context, which makes literal translation unfeasible. As a result, these phrases are considered suitable examples for evaluating the performance of current artificial intelligence technologies in translating Chinese diplomatic discourse.
3.2. Selection of Translation Tools
To provide a comprehensive overview of modern translation technologies, this study selected a diverse range of translation tools. Specifically, four Neural Machine Translation (NMT) tools and four Large Language Models (LLMs) were chosen, representing both domestic and international sources. This selection aims to reflect the current state of the art in machine translation and facilitate a balanced comparison of various technological approaches. Detailed information about the selected translation tools is available in Table 1.
Table 1. Translation Tool Information Sheet.

| Category | Products |
| --- | --- |
| Neural Machine Translation (NMT) | Niutrans, Youdao, Google, DeepL |
| Large Language Models (LLMs) | DeepSeek, Ernie-4.5, ChatGPT-4.0, Gemini |

To select NMT systems, this study primarily based its choices on the evaluation results from the quantitative analysis of MT systems conducted by Cady et al. (2023). Using these evaluations, four translation engines that demonstrated notable performance in the Chinese-English translation direction were chosen. This selection is intended to ensure that the NMT tools used in this study are representative of the current state of the art, especially for translation between Chinese and English.
In selecting LLMs, this study chose four mainstream general-purpose models. The process was guided by evaluation results from the authors’ qualitative and quantitative research on the translation capabilities of these large language models, providing a comprehensive assessment of their abilities. Additionally, the popularity and discussion frequency of these models on social platforms were considered to ensure they are widely recognized and used in the field (see Table 1). To capture the latest advancements, the models selected are the most recent versions available at the time of the experiment. This approach enables an up-to-date evaluation of LLMs’ performance in translating Chinese diplomatic discourse.
3.3. Research Procedure
First, the research process began with a systematic collection of translations from the chosen translation tools. For Neural Machine Translation (NMT) tools, the source text was entered individually in plain text format into the designated original text box of each NMT web version. The corresponding translations were then retrieved directly from the output interface.
For Large Language Models (LLMs), a standard prompt, “Chinese to English,” was consistently used. The source text, formatted in plain text, was entered separately into each LLM's web interface. The translations generated by the LLMs were then retrieved from their respective output screens.
Second, after collecting the translations, the next step involved compiling and analyzing the data statistically. Since this study is designed to compare and evaluate the performance of Neural Machine Translation (NMT) systems and Large Language Models (LLMs) in translating Chinese diplomatic discourse into English, the translations of the phrases “奉陪到底” and “得道多助,失道寡助” were explicitly included in the evaluation scope. These phrases were selected due to their distinctive characteristics as Chinese idioms and proverbs, as well as their relevance to the study’s focus on diplomatic discourse.
The translation outputs from NMT systems and LLMs for the two phrases were collected separately. Similar translation approaches were identified and combined to facilitate a simplified analysis. Next, the occurrences of each unique translation approach were tallied. This method provided a systematic and objective way to evaluate the translation strategies used by both groups of tools, enabling a detailed comparison of their effectiveness in managing culturally and politically sensitive expressions.
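The tallying step can be sketched in a few lines of Python. This is a minimal illustration rather than the procedure actually used: the function name `tally` and the dictionary layout are assumptions, while the per-system labels for “奉陪到底” are those reported in Section 4.

```python
from collections import Counter

# The five quality labels used in the evaluation criteria.
LABELS = [
    "correct translation",
    "mistranslation",
    "undertranslation",
    "omission",
    "unidiomatic style",
]


def tally(labels_by_system: dict[str, str]) -> dict[str, tuple[int, float]]:
    """Return (count, percentage) for every label across the given systems."""
    if not labels_by_system:
        raise ValueError("at least one system is required")
    unknown = set(labels_by_system.values()) - set(LABELS)
    if unknown:
        raise ValueError(f"unknown labels: {sorted(unknown)}")
    counts = Counter(labels_by_system.values())
    n = len(labels_by_system)
    return {label: (counts[label], 100 * counts[label] / n) for label in LABELS}


# Labels for the phrase “奉陪到底”, as reported in Section 4.
nmt = {
    "Niutrans": "undertranslation",
    "Youdao": "undertranslation",
    "Google": "correct translation",
    "DeepL": "undertranslation",
}
llm = {
    "DeepSeek": "correct translation",
    "Ernie-4.5": "correct translation",
    "ChatGPT-4.0": "undertranslation",
    "Gemini": "omission",
}

for name, group in (("NMT", nmt), ("LLMs", llm)):
    for label, (count, pct) in tally(group).items():
        print(f"{name:5} {label:20} {count} {pct:.0f}%")
```

With only four systems per group, each system accounts for 25 percentage points, so a difference of one system between groups already appears as a 25-point gap in the proportions.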
Third, the evaluation criteria were established. This study utilized the MQM-Full error classification system from the Multidimensional Quality Metrics (MQM) model as a core framework for assessing translation quality. The criteria were tailored to fit the specific features of the translations being analyzed, ensuring their relevance for translating Chinese diplomatic discourse into English. The final evaluation criteria, with examples, are presented in Table 2. This method provides a systematic and objective evaluation process based on established standards, while also addressing the particular challenges of the subject matter.
Fourth, the quality of the translations was evaluated and the statistical data were collected.
Table 2. Translation Evaluation Criteria and Examples.

| Translation Quality | Explanation | Example |
| --- | --- | --- |
| correct translation | The translation faithfully and thoroughly captures both the explicit details and the subtle nuances of the original text, all while using idiomatic English. | “奉陪到底” is translated as “fight to the end”; “得道多助,失道寡助” is translated as “A just cause enjoys abundant support while an unjust one finds little.” |
| mistranslation | An error occurs when the target content does not accurately match the source content. | “奉陪到底” is translated as “will accompany them to the end”; “得道多助,失道寡助” is translated as “If you get more help, but if you lose it, you will get little help.” |
| undertranslation | Error in the target content that is less specific than the source content. | “奉陪到底” is translated as “will accompany it to the end”; “得道多助,失道寡助” is translated as “Those who follow the right path will have many supporters, while those who go against it will have few.” |
| omission | Error where content present in the source is absent in the target. | Omitting the translation of “奉陪到底” or “得道多助,失道寡助” |
| unidiomatic style | The translation sounds unnatural and doesn’t follow idiomatic English conventions. | “奉陪到底” is translated as “I humbly accompany you to the end”; “得道多助,失道寡助” is translated as “He who is just will have many to help him, but he who is unjust will have few.” |

4. Results and Discussion
This study evaluates the translation performance of Neural Machine Translation (NMT) and Large Language Models (LLMs) in translating the Chinese phrases “奉陪到底” and “得道多助,失道寡助” into English, using the official English translations published on China Daily’s website as references. The results for the phrase “奉陪到底” are shown in Table 3.
Table 3. Translation Quality of NMT and LLMs for the Phrase “奉陪到底” (fight to the end).

| Translation Quality | NMT Quantity | NMT Proportion | LLMs Quantity | LLMs Proportion |
| --- | --- | --- | --- | --- |
| correct translation | 1 | 25% | 2 | 50% |
| mistranslation | 0 | 0% | 0 | 0% |
| undertranslation | 3 | 75% | 1 | 25% |
| omission | 0 | 0% | 1 | 25% |
| unidiomatic style | 0 | 0% | 0 | 0% |

In the official reference translation, the phrase “奉陪到底” is rendered as “fight to the end.” When testing four different Neural Machine Translation (NMT) systems, only Google produced a translation that exactly matched the reference, making it correct. The translations from the other three NMT systems showed varying degrees of semantic differences and stylistic issues.
The translations of “奉陪到底” by Niutrans, DeepL, and Youdao are rendered as “will surely accompany it to the end,” “will certainly accompany it to the end,” and “will surely stand by it to the end,” respectively. These translations use the verbs “accompany” and “stand by.” While “accompany” is a direct translation of “奉陪,” it completely misses the resolute and combative tone inherent in the original Chinese phrase. This rendering is too mild and doesn’t match the typical context where such an expression is used, especially in diplomatic settings. The verb “stand by,” which implies “support” or “adhere to,” is somewhat stronger than “accompany.” However, it still does not fully capture the core meaning of “奉陪到底.” Instead, it is more appropriate for expressing a supportive attitude rather than the adversarial intent that the original phrase implies.
Including adverbs like “surely” and “certainly” in these translations can lead to unnecessary repetition and make the language sound overly conversational. This conflicts with the formal and precise style required for diplomatic language. Since these translations do not fully capture the original text’s semantic strength or its pragmatic role in diplomacy, they are considered undertranslations.
Among the four Large Language Models (LLMs) evaluated, only DeepSeek and Ernie-4.5 successfully conveyed the semantic meaning of the phrase “奉陪到底.” In contrast, ChatGPT’s translation, “will accompany it to the end,” had a similar issue to that seen in the translations from the three Neural Machine Translation (NMT) systems previously discussed. Specifically, the verb choice in ChatGPT’s translation weakened the original phrase’s adversarial semantic features, resulting in an undertranslation. Notably, Gemini failed to translate the idiom entirely, which is categorized as an omission.
These findings highlight the common challenges that modern machine translation systems encounter when processing diplomatic discourse. First, there is a notable difficulty in understanding the core semantics of culturally embedded terms. Second, it is challenging to strike a balance between maintaining the formality required in diplomatic settings and achieving semantic accuracy. Some large language models (LLMs) do show potential advantages over Neural Machine Translation (NMT) systems. However, overall, both LLMs and NMT systems still need to improve their domain-specific training for diplomatic texts. This is especially crucial for preserving semantic nuances and accurately conveying pragmatic functions.
Table 4. Translation Quality of NMT and LLMs for the Phrase “得道多助,失道寡助” (A just cause enjoys abundant support while an unjust one finds little).

| Translation Quality | NMT Quantity | NMT Proportion | LLMs Quantity | LLMs Proportion |
| --- | --- | --- | --- | --- |
| correct translation | 0 | 0% | 2 | 50% |
| mistranslation | 1 | 25% | 0 | 0% |
| undertranslation | 2 | 50% | 2 | 50% |
| omission | 1 | 25% | 0 | 0% |
| unidiomatic style | 0 | 0% | 0 | 0% |

In the official reference translation, the phrase “得道多助,失道寡助” is rendered as “A just cause enjoys abundant support while an unjust one finds little.” As shown in Table 4, none of the four Neural Machine Translation (NMT) systems tested accurately captured the meaning of the original text.
Google renders the phrase as “The righteous will have many supporters, while the unrighteous will have few.” This translation simplifies the complex concept of “道” into a binary of “righteous/unrighteous,” thereby missing the deeper philosophical meaning of the original term. Additionally, the language style is too informal, which detracts from the formal and precise tone required for diplomatic language.
Youdao renders the phrase as “Those who follow the right path will have many supporters, while those who go against it will have few.” This translation uses the term “right path,” turning the abstract concept of “道” into a concrete “path.” As a result, it diminishes the philosophical depth and complexity of the original term. Additionally, the style is informal and colloquial, which is suitable for popular explanations but doesn’t meet the formal standards required for diplomatic contexts. Because of these issues, both this translation and Google’s are considered undertranslations.
Niutrans provides an inaccurate translation, rendering the phrase as “If you get more help, but if you lose it, you will get little help.” This translation misinterprets the original meaning by reducing a profound philosophical idea to a simplistic assumption and overlooking the core concept of “道.” Due to these significant errors, it is considered a mistranslation. In contrast, DeepL’s approach results in an omission, failing to translate this phrase at all.
Among the four Large Language Models (LLMs) evaluated in Table 4, DeepSeek and Ernie-4.5 successfully captured the semantic connotations of the proverb “得道多助,失道寡助.” In contrast, ChatGPT’s translation, “Those who are righteous will gain support, while those who are unjust will find few allies,” largely preserves the intended meaning. However, using “righteous” adds a moral element that isn’t fully aligned with the original concept of “道,” which more accurately means “the just cause.” Additionally, the word “allies” is too specific, implying a narrower form of support compared to the broader idea of “support” conveyed by the original proverb. A better translation would maintain the general concept of “support” rather than limiting it to “allies.”
Gemini’s translation, “A just cause attracts much support, while an unjust one finds little,” is semantically accurate. However, the verb “attracts” adds a level of subjectivity and is less idiomatic than “enjoys,” which is more commonly used in formal contexts to express the act of receiving or benefiting from something. Considering this nuance, the translations provided by both ChatGPT and Gemini, while conveying the general meaning, fall short of the best expression. As a result, they are also categorized as undertranslations.
The test results reveal a marked disparity in the translation of Chinese diplomatic discourse between Neural Machine Translation (NMT) systems and Large Language Models (LLMs). NMT systems generally exhibited suboptimal performance.
In contrast, Large Language Models (LLMs) demonstrated superior semantic understanding capabilities. Notably, the domestic systems DeepSeek and Ernie-4.5 accurately interpreted the original text’s meaning. While ChatGPT and Gemini showed some bias in word choice, they largely preserved the core meaning. This distinction underscores the strengths of LLMs in handling abstract concepts and adapting to cultural nuances, while also highlighting the ongoing challenges faced by current machine translation systems in effectively mastering the stylistic requirements of diplomatic discourse. The research findings suggest that improving the quality of translating diplomatic discourse requires a focus on better conveying abstract concepts and increasing the adaptability of register style.
5. Conclusion
This study uses statements by the spokesperson of China’s Ministry of Foreign Affairs on the 2025 tariff event as the source text for analysis. Four representative Neural Machine Translation (NMT) tools, covering both domestic and international options, were selected: Niutrans, Youdao, Google, and DeepL. In addition, four Large Language Models (LLMs), namely DeepSeek, Ernie-4.5, ChatGPT-4.0, and Gemini, were selected to translate the chosen source text into English.
The research indicates that Large Language Models (LLMs) generally exhibit greater potential than Neural Machine Translation (NMT) systems in translating Chinese diplomatic discourse. This advantage is particularly evident in their ability to understand cultural metaphors and abstract concepts that are prevalent in such texts. Still, both LLMs and NMT systems need specialized tuning for the diplomatic domain to improve their performance.
Machine translation (MT), although a helpful tool, has inherent limitations when used for translating diplomatic discourse. These limitations call for a hybrid approach that combines machine translation with human proofreading to achieve accuracy and capture nuance. Additionally, domain-specific training is crucial for enhancing the quality of translations in diplomatic texts.
In the context of disseminating Chinese diplomatic messages externally, it is essential to balance “accuracy” and “acceptability.” Accuracy preserves the original meaning and purpose, while acceptability ensures the translation is understandable and culturally suitable for the audience.
This study acknowledges certain limitations. Specifically, the selection of source texts was restricted to speeches by the Chinese Foreign Ministry spokesperson, all of which were from the context of the 2025 event, during which the U.S. government announced a 10% tariff increase on all Chinese goods imported to the U.S. due to the fentanyl issue. This narrow focus results in a limited variety of sample types, which may restrict the generalizability of the findings.
The scope of this research is somewhat limited, as it mainly examines the diplomatic discourse related to specific events and does not cover broader types of Chinese diplomatic discourse. As a result, the study’s findings might not entirely reflect the complexities and subtleties of the broader Chinese diplomatic discourse. Additionally, the evaluation process relies on a single evaluator, which could introduce bias and restrict the thoroughness of the analysis.
Future research efforts could benefit from incorporating more representative examples of Chinese diplomatic discourse, such as the concepts of the “Community of Shared Future for Mankind” and the “Belt and Road Initiative.” By analyzing a broader range of text types, including policy documents, speeches by leaders, and white papers, the overall relevance and applicability of the research findings can be enhanced.
Moreover, enhancing the assessment criteria and implementing a multi-party evaluation system could provide a more comprehensive understanding of translation quality. This approach would involve collecting feedback from professional translators, global audiences, political scholars, and other key stakeholders. Such a multifaceted evaluation would support a more balanced assessment of both translation accuracy and its effectiveness in cross-cultural communication.
Abbreviations

| Abbreviation | Full Form |
| --- | --- |
| NMT | Neural Machine Translation |
| LLMs | Large Language Models |

Author Contributions
Dong Lu is the sole author. The author read and approved the final manuscript.
Conflicts of Interest
The author declares no conflicts of interest.
References
[1] Semenov, Alexander & Tsvyk, Anatoly. (2021). The approach to the Chinese diplomatic discourse. Fudan Journal of the Humanities and Social Sciences. 14. 1-22.
[2] Liu, Mingze & Yan, Jiale & Yao, Guangyuan. (2023). Themes and ideologies in China’s diplomatic discourse – a corpus-assisted discourse analysis in China’s official speeches. Frontiers in Psychology. 14.
[3] China Daily. China urges US to reverse tariffs, preserve counternarcotics cooperation. Available from:
[4] China Daily. China plans to add tariffs on US products. Available from:
[5] China Daily. China-US trade talks make substantial progress. Available from:
[6] Guo, Zezhang & Shen, Shu. A corpus-based study on metaphorical modes of China’s diplomatic discourse and corresponding French translation strategies: Taking the speeches made at the regular press conferences of the Ministry of Foreign Affairs from 2020 to 2022 as an example. Advances in Education, Humanities and Social Science Research. 2024, 12(1), 452-462.
[7] Li, Tao & Xu, Fang. Re-appraising self and other in the English translation of contemporary Chinese political discourse. Discourse, Context & Media. 2018, 25(6), 1-8.
[8] Tekwa, Kizito & Mei, Li. (2022). Translation, politics, and development: A corpus-based approach to evaluating China’s development aid discourse. Linguistica Antverpiensia New Series - Themes in Translation Studies. 21.
[9] Zhang, Chenxia & Afzaal, Muhammad & Omar, Abdulfattah & Altohami, Waheed. (2023). A corpus-based analysis of the stylistic features of Chinese and American diplomatic discourse. Frontiers in Psychology. 14.
[10] Fu, Rongbo. (2016). Comparing modal patterns in Chinese-English interpreted and translated discourses in diplomatic setting: A systemic functional approach. Babel. 62. 104-121.
[11] Hu, Kaibao & Li, Xiaoqian. (2022). The image of the Chinese government in the English translations of Report on the Work of the Government: A corpus-based study. Asia Pacific Translation and Intercultural Studies. 9. 1-20.
[12] Liu, Yangyang. A study on language conversion and construction of discourse power in Chinese diplomacy. International Journal of Linguistics, Literature and Translation. 2024, 7(4), 85-91.
[13] Yu, Hailing & Wu, Canzhong. Functions of the pronoun ‘we’ in the English translations of Chinese government reports. Advances in Discourse Analysis of Translation and Interpreting (pp.85-105), 1st Edition. London: Routledge; 2020, 1-240.
[14] Chang, Jiang & Ying, Luo. (2024). A Contrastive Study of the Translator’s Behaviour in English and Spanish Translations of Metaphors in Xi Jinping: The Governance of China. Sinología hispánica. China Studies Review. 17. 113-138.
[15] Xu, Dong & Abdou Moindjie, Mohamed & Mehar Singh, Manjet Kaur. (2024). Assessing narratives in the translation of Chinese political discourse: A perspective from the narrative paradigm. International Journal of English Linguistics. 14. 62-62.
[16] Aina, Sun & Chwee Fang, Ng & Subramaniam, Vijayaletchumy & Ghani, C. Chinese-to-English translation of political discourse: A feature-oriented analysis. International Journal of Materials Science and Applications. 2022, 13(2), 205-213.
[17] Huang, Mengyan & Xie, Zenan. (2025). Translation Strategies of Tautology in Chinese Political Discourse - A Case Study of Xi Jinping: The Governance of China (Volume III). Stallion Journal for Multidisciplinary Associated Research Studies. 4. 1-7.
[18] Wang, Yizhe & Ruan, Hongmei. Study of Chinese political terminology translation and national image shaping. International Journal of Languages, Literature and Linguistics. 2023, 9(5), 378-384.
[19] Li, Tao & Pan, Feng. (2020). Reshaping China’s image: A corpus-based analysis of the English translation of Chinese political discourse. Perspectives. 29. 1-17.
[20] Xu, Dong & Abdou Moindjie, Mohamed & Mehar Singh, Manjet Kaur. Framing narratives in the translation of Chinese political discourse: Case examples from The Governance of China. English Language and Literature Studies. 2024, 14(2), 1-12.
[21] Tian, Xujun. Translators as mediators to mend the psychological gap between source text and target text: A corpus-based study on the Chinese English translation of modal verbs in the Chinese Report on the Work of the Government (2000–2022). PLOS ONE. 2025, 20(3), 1-14.
[22] Lingqian, Zheng & Ren, Wen. Interpreting as an influencing factor on news reports: A study of interpreted Chinese political discourse recontextualized in English news. Perspectives: Studies in Translatology. 2018, 26(5), 691-707.
[23] Gu, James Chonglong & Wang, Binhua. (2021). Interpreter-mediated discourse as a vital source of meaning potential in intercultural communication: The case of the interpreted premier-meets-the-press conferences in China. Language and Intercultural Communication. 21. 1-16.
[24] Pan, Li & Huang, Chuxin. (2020). Stance mediation in media translation of political speeches. In Advances in Discourse Analysis of Translation and Interpreting: Linking Linguistic Approaches with Socio-cultural Interpretation (Chapter 7, pp. 131-149). Routledge.
[25] Zhang, Chenxia. (2025). When translation meets dissemination: Translations of the Chinese diplomatic term Mìngyùn Gòngtóngtǐ in English news reports. Language Sciences. 110. 101727.
[26] Ping, Yuan. Quoting Chinese Political Discourse through Translation: An Analysis of Xi Jinping’s Climate Change Discourse in English-language News Media. International Journal of Chinese and English Translation & Interpreting. 2023(3), 1-17.
[27] Xin, J. & Matheson, D. One Belt, competing metaphors: The struggle over strategic narrative in English-language news media. International Journal of Communication. 2018, 12, 21.
[28] Zhao, Jiaming & Wang, Jiayin. Discursive practices in translating political discourse: Insights from white papers on China-US economic and trade frictions. Humanities and Social Sciences Communications. 2025, 12(1), 1-11.
[29] Stahlberg, F. (2020). Neural machine translation: A review. Journal of Artificial Intelligence Research, 69, 343-418.
[30] Dwivedi, Ritesh & Nand, Parma & Pal, Om. (2024). Hybrid NMT model and comparison with existing machine translation approaches. Multidisciplinary Science Journal. 7. 2025146.
[31] Brown, T., Mann, B., Ryder, N., Subbiah, M., Kaplan, J. D., Dhariwal, P., ... & Amodei, D. (2020). Language models are few-shot learners. Advances in neural information processing systems, 33, 1877-1901.
[32] Jiao, Wenxiang & Wang, Wenxuan & Huang, Jen-Tse & Wang, Xing & Shi, Shuming & Tu, Zhaopeng. (2023). Is ChatGPT A Good Translator? A Preliminary Study.
[33] Khoshafah, Saleh & Tagaddeen, Ibraheem. (2023). Effectiveness of Machine Translation in Rendering Yemeni Culture-Specific Items into English: Sana'ani Dialect as a Case-in-Point. مجلة جامعة صنعاء للعلوم الإنسانية. 5.
[34] Shutova, Ekaterina. (2015). Design and Evaluation of Metaphor Processing Systems. Computational Linguistics. 41. 579-623.
[35] Lihua, Zhao. (2022). The Relationship between Machine Translation and Human Translation under the Influence of Artificial Intelligence Machine Translation. Mobile Information Systems. 2022. 1-8.
[36] Varmazyari, Hamid & Anari, Salar. (2016). House's Newly Revised Translation Quality Assessment Model in Practice: A Case Study. 13. 27-46.
[37] Li, Hanji & Chen, Haiqing. Human vs. AI: An assessment of the translation quality between translators and machine translation. International Journal of Translation, Interpretation, and Applied Linguistics. 2019, 1(1), 43-54.
[38] Thompson, Brian & Post, Matt. (2020). Automatic Machine Translation Evaluation in Many Languages via Zero-Shot Paraphrasing.
[39] Lommel, Arle & Burchardt, Aljoscha & Uszkoreit, Hans. (2014). Multidimensional Quality Metrics (MQM): A Framework for Declaring and Describing Translation Quality Metrics. Tradumàtica: tecnologies de la traducció. 455-463.
[40] The MQM Council. MQM (Multidimensional Quality Metrics). Available from:
[41] Lommel, Arle & Gladkoff, Serge & Melby, Alan & Wright, Sue & Strandvik, Ingemar & Gasova, Katerina & Vaasa, Angelika & Marazzato Sparano, Romina & Faresi, Monica & Innis, Johani & Han, Lifeng & Nenadic, Goran. The Multi-Range Theory of Translation Quality Measurement: MQM Scoring Models and Statistical Quality Control. 2024.
[42] Cady, L. P., Tsou, B. K., & Lee, J. S. (2023, September). Comparing Chinese‐English MT Performance Involving ChatGPT and MT Providers and the Efficacy of AI Mediated Post‐Editing. In Machine Translation Summit XIX (MT Summit 2023) (pp. 205-216). Asia-Pacific Association for Machine Translation.
[43] Weigang, Li & Brom, Pedro. (2025). The Paradox of Poetic Intent in Back-Translation: Evaluating the Quality of Large Language Models in Chinese Translation.
[44] Hendy, Amr & Abdelrehim, Mohamed & Sharaf, Amr & Raunak, Vikas & Gabr, Mohamed & Matsushita, Hitokazu & Kim, Young Jin & Afify, Mohamed & Awadalla, Hany. (2023). How Good Are GPT Models at Machine Translation? A Comprehensive Evaluation.
[45] Othman, Achraf & Chemnad, Khansa & Tlili, Ahmed & Da, Ting & Wang, Huanhuan & Huang, Ronghuai. (2024). Comparative analysis of GPT-4, Gemini, and Ernie as gloss sign language translators in special education. Discover Global Society. 2.
Cite This Article
  • APA Style

    Lu, D. (2025). A Comparative Study on the Translation Quality of Chinese Diplomatic Discourse by NMT and LLMs Based on Multidimensional Quality Metrics. International Journal of Applied Linguistics and Translation, 11(4), 107-115. https://doi.org/10.11648/j.ijalt.20251104.12

    ACS Style

    Lu, D. A Comparative Study on the Translation Quality of Chinese Diplomatic Discourse by NMT and LLMs Based on Multidimensional Quality Metrics. Int. J. Appl. Linguist. Transl. 2025, 11(4), 107-115. doi: 10.11648/j.ijalt.20251104.12

    AMA Style

    Lu D. A Comparative Study on the Translation Quality of Chinese Diplomatic Discourse by NMT and LLMs Based on Multidimensional Quality Metrics. Int J Appl Linguist Transl. 2025;11(4):107-115. doi: 10.11648/j.ijalt.20251104.12

  • @article{10.11648/j.ijalt.20251104.12,
      author = {Dong Lu},
      title = {A Comparative Study on the Translation Quality of Chinese Diplomatic Discourse by NMT and LLMs Based on Multidimensional Quality Metrics},
      journal = {International Journal of Applied Linguistics and Translation},
      volume = {11},
      number = {4},
      pages = {107-115},
      doi = {10.11648/j.ijalt.20251104.12},
      url = {https://doi.org/10.11648/j.ijalt.20251104.12},
      eprint = {https://article.sciencepublishinggroup.com/pdf/10.11648.j.ijalt.20251104.12},
     year = {2025}
    }
    

  • TY  - JOUR
    T1  - A Comparative Study on the Translation Quality of Chinese Diplomatic Discourse by NMT and LLMs Based on Multidimensional Quality Metrics
    
    AU  - Dong Lu
    Y1  - 2025/10/27
    PY  - 2025
    N1  - https://doi.org/10.11648/j.ijalt.20251104.12
    DO  - 10.11648/j.ijalt.20251104.12
    T2  - International Journal of Applied Linguistics and Translation
    JF  - International Journal of Applied Linguistics and Translation
    JO  - International Journal of Applied Linguistics and Translation
    SP  - 107
    EP  - 115
    PB  - Science Publishing Group
    SN  - 2472-1271
    UR  - https://doi.org/10.11648/j.ijalt.20251104.12
    VL  - 11
    IS  - 4
    ER  - 

Author Information