American Journal of Computer Science and Technology

Research and Implementation of Word Detection System Based on Improved DFA in China

Since the second half of the last century, the intensive use of digital texts and textual databases has created a need for efficient search methods and data structures. Although many traditional pattern matching algorithms exist, such as regular-expression matching, the Aho-Corasick (AC) algorithm, and the Wu-Manber (WM) algorithm, this paper uses a word detection method based on an improved DFA algorithm and focuses on implementing content matching with it. The approach handles emoticons, half-width characters, and repeated words, and it uses ConcurrentSkipListMap to construct the tree of the word filtering system. We introduce the system architecture, which consists mainly of middleware, database, and data processing parts. The algorithm filters words by matching multiple pattern strings; strings with a common prefix share that prefix in the tree, which reduces repeated lookups and saves memory space. A pre-trained word vector model is used to expand and improve the sensitive-word lexicon, with good results. The system implements initialization and modification of the word database as well as matching and highlighting of words, and each of these processes is tested and analyzed. In a simulation, we captured relevant word data and imported it into a MySQL database for storage. The method improves the speed and accuracy of sensitive-word recognition in messages and the efficiency of word matching. We argue that the DFA algorithm is a better approach than the AC algorithm and the other algorithms considered. Functional, system, and performance tests yielded valuable results. The system supports a large thesaurus and matches long texts efficiently, and it meets the requirements of real-time network transmission, so it can be applied in networked settings. In summary, this paper proposes an improved multi-pattern matching algorithm for word detection based on DFA. By optimizing for the characteristics of the text content, the number of basic words, and detection efficiency, the algorithm speeds up problem detection and response and helps keep the network space clean. Our results also show that data from the system's different sources can be reused, which reduces repeated construction costs.
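
The shared-prefix tree described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation, which the abstract does not show; the class name `DfaWordFilter` and the methods `addWord`, `findMatches`, and `mask` are hypothetical names chosen for illustration. The sketch assumes that each DFA state keeps its outgoing transitions in a `ConcurrentSkipListMap` keyed by the next character, and that the longest match at each position is preferred. Handling of emoticons, half-width characters, and repeated words is omitted.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ConcurrentSkipListMap;

/** Hypothetical sketch of a DFA (trie) word filter; not the paper's code. */
final class DfaWordFilter {

    /** One DFA state; outgoing transitions are keyed by the next character. */
    private static final class Node {
        final ConcurrentSkipListMap<Character, Node> next = new ConcurrentSkipListMap<>();
        volatile boolean terminal; // true if a lexicon word ends at this state
    }

    private final Node root = new Node();

    /** Adds a word; words with a common prefix reuse the same chain of states. */
    void addWord(String word) {
        Node node = root;
        for (char c : word.toCharArray()) {
            node = node.next.computeIfAbsent(c, k -> new Node());
        }
        node.terminal = true;
    }

    /** Returns each longest match as a {start, endExclusive} index pair. */
    List<int[]> findMatches(String text) {
        List<int[]> hits = new ArrayList<>();
        int i = 0;
        while (i < text.length()) {
            Node node = root;
            int end = -1; // end of the longest word found starting at i
            for (int j = i; j < text.length(); j++) {
                node = node.next.get(text.charAt(j));
                if (node == null) {
                    break; // no transition: stop scanning from this start
                }
                if (node.terminal) {
                    end = j + 1;
                }
            }
            if (end > 0) {
                hits.add(new int[] {i, end});
                i = end; // resume after the matched word
            } else {
                i++;
            }
        }
        return hits;
    }

    /** Replaces every matched word with asterisks of the same length. */
    String mask(String text) {
        char[] out = text.toCharArray();
        for (int[] hit : findMatches(text)) {
            Arrays.fill(out, hit[0], hit[1], '*');
        }
        return new String(out);
    }
}
```

With the words "bad", "badge", and "spam" loaded, "bad" and "badge" share the states for b, a, d, so the prefix is stored once. A highlighting step could use the index pairs from `findMatches` in place of masking.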

DFA, MySQL, Word Detection System, Word Changing, Word Matching

APA Style

Feng Kai, Tuyatsetseg Badarch. (2023). Research and Implementation of Word Detection System Based on Improved DFA in China. American Journal of Computer Science and Technology, 6(1), 25-32.

ACS Style

Feng Kai; Tuyatsetseg Badarch. Research and Implementation of Word Detection System Based on Improved DFA in China. Am. J. Comput. Sci. Technol. 2023, 6(1), 25-32. doi: 10.11648/j.ajcst.20230601.14

AMA Style

Feng Kai, Tuyatsetseg Badarch. Research and Implementation of Word Detection System Based on Improved DFA in China. Am J Comput Sci Technol. 2023;6(1):25-32. doi: 10.11648/j.ajcst.20230601.14

Copyright © 2023 Authors retain the copyright of this article.
This article is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
