Evaluating Code Synthesis in Large Language Models (LLMs): A Case Study of DeepSeek and ChatGPT

Published: October 18, 2025
Abstract

Large Language Models (LLMs) have become increasingly popular in recent years and are now used across many areas of software development because of their ability to understand and generate natural language, including source code. With more powerful LLMs available, developers face the challenge of choosing the right one for generating source code. While some studies have examined tools such as DeepSeek or ChatGPT, there is little research on how developers can pick the model that best fits their needs. It is important to know whether a model can generate useful code that meets quality standards and whether developers can use that code effectively. This paper presents a way to compare different models by examining how two of them perform. We tested the models on 25 programming tasks written in Python. Both models generated good code, but each exhibited a few issues. We investigated the functional and non-functional qualities of the code synthesized by the models on a program synthesis benchmark containing these 25 tasks. Overall, our evaluation shows that DeepSeek performs better in this comparison, a finding also supported by human reviewers who checked the generated code manually.
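The abstract does not describe the evaluation harness in detail; as an illustration only, the sketch below shows one common way to check the functional quality of model-generated code, by executing each candidate solution against unit tests for a benchmark task. All names here (run_candidate, max_of_list, the toy task and its tests) are hypothetical placeholders and are not taken from the paper.

```python
# Hypothetical illustration: checking functional correctness of generated code
# by executing a candidate solution and running unit tests for one benchmark task.
# These names are placeholders for the general idea of a pass/fail harness over
# a set of programming tasks; they do not come from the paper.

from types import ModuleType

def run_candidate(candidate_source: str,
                  test_cases: list[tuple[tuple, object]],
                  entry_point: str) -> bool:
    """Execute model-generated source and check it against expected outputs."""
    module = ModuleType("candidate")
    try:
        exec(candidate_source, module.__dict__)       # load the generated function
        func = getattr(module, entry_point)
        return all(func(*args) == expected for args, expected in test_cases)
    except Exception:
        return False                                   # any crash counts as a failure

# Toy example task: "return the maximum of a list", with two test cases.
generated = "def max_of_list(xs):\n    return max(xs)\n"
tests = [(([1, 5, 3],), 5), (([-2, -7],), -2)]
print(run_candidate(generated, tests, "max_of_list"))  # True if functionally correct
```

A harness of this kind yields a per-task pass/fail signal; non-functional qualities such as readability or style would require additional, typically manual, review.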

Published in Abstract Book of the National Conference on Advances in Basic Science & Technology
Page(s) 103-103
Creative Commons

This is an Open Access abstract, distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution and reproduction in any medium or format, provided the original work is properly cited.

Copyright

Copyright © The Author(s), 2025. Published by Science Publishing Group

Keywords

Code Generator, DeepSeek, ChatGPT, Developers, Software Development