Peer-Reviewed

Rule-Based Sentence Detection Method (RBSDM) for Turkish

Received: 20 March 2013     Published: 2 May 2013
Abstract

The first step in building a corpus, which serves as a representative sample of a language, is identifying sentences. This task is complicated and hard to solve, yet it is an essential part of corpus generation. Different approaches have been tried for finding sentence boundaries in various languages. For Turkish, the best-known approaches rely on statistics and machine learning. In this study, a rule-based method called “Rule-Based Sentence Detection Method for Turkish (RBSDM)” was developed to determine sentence boundaries in contemporary Turkish, taking into account the agglutinative and rule-based structure of the language. The method was tested on two test sets built from randomly selected columns from two Turkish newspapers. RBSDM determines sentence endings correctly and efficiently in terms of time and other costs, with a success rate between 99.60% and 99.80%.
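
The abstract does not list the rules RBSDM applies, so the following is only an illustrative sketch of what rule-based sentence boundary detection can look like, not the authors' method. The function name `split_sentences`, the tiny abbreviation list, and both rules are assumptions made for this example.

```python
import re

# Hypothetical abbreviation list (assumption); a usable system would need
# a far larger, curated list.
ABBREVIATIONS = {"dr", "prof", "doç", "av", "sn"}

# A candidate boundary is a run of ., !, ? or … that is followed by whitespace.
CANDIDATE = re.compile(r"[.!?…]+(?=\s)")


def split_sentences(text):
    """Split text into sentences using two simple hand-written rules."""
    sentences = []
    start = 0
    for match in CANDIDATE.finditer(text):
        words = text[start:match.start()].split()
        last_word = words[-1] if words else ""
        following = text[match.end():].lstrip()

        # Rule 1: a single period after a known abbreviation is not a boundary.
        if match.group() == "." and last_word.lower() in ABBREVIATIONS:
            continue

        # Rule 2: the next token must look like a sentence start (uppercase
        # letter, digit, or opening quote/bracket). This also skips ordinals
        # such as "19. yüzyıl", where a lowercase word follows the period.
        if following and not (
            following[0].isupper()
            or following[0].isdigit()
            or following[0] in "\"'“‘("
        ):
            continue

        sentences.append(text[start:match.end()].strip())
        start = match.end()

    # Whatever remains after the last accepted boundary is the final sentence.
    tail = text[start:].strip()
    if tail:
        sentences.append(tail)
    return sentences


example = "Dr. Ayşe 19. yüzyıl romanlarını inceledi. Sonuçlar şaşırtıcıydı! Siz ne düşünüyorsunuz?"
for sentence in split_sentences(example):
    print(sentence)
# Dr. Ayşe 19. yüzyıl romanlarını inceledi.
# Sonuçlar şaşırtıcıydı!
# Siz ne düşünüyorsunuz?
```

The sketch treats punctuation as a candidate boundary and then lets rules veto it. Rules are cheap to evaluate, which fits the abstract's emphasis on low time cost, but their coverage depends entirely on how complete the rule set and abbreviation list are.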

Published in International Journal of Language and Linguistics (Volume 1, Issue 1)
DOI 10.11648/j.ijll.20130101.11
Page(s) 1-6
Creative Commons

This is an Open Access article, distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution and reproduction in any medium or format, provided the original work is properly cited.

Copyright

Copyright © The Author(s), 2013. Published by Science Publishing Group

Keywords

Linguistics, Natural Language Processing, Corpus, Turkish, Morphological Analysis, Sentence Boundary Detection

Cite This Article
  • APA Style

    Özlem AKTAŞ, Yalçın ÇEBİ. (2013). Rule-Based Sentence Detection Method (RBSDM) for Turkish. International Journal of Language and Linguistics, 1(1), 1-6. https://doi.org/10.11648/j.ijll.20130101.11

    ACS Style

    Özlem AKTAŞ; Yalçın ÇEBİ. Rule-Based Sentence Detection Method (RBSDM) for Turkish. Int. J. Lang. Linguist. 2013, 1(1), 1-6. doi: 10.11648/j.ijll.20130101.11

    AMA Style

    Özlem AKTAŞ, Yalçın ÇEBİ. Rule-Based Sentence Detection Method (RBSDM) for Turkish. Int J Lang Linguist. 2013;1(1):1-6. doi: 10.11648/j.ijll.20130101.11

  • BibTeX

    @article{10.11648/j.ijll.20130101.11,
      author = {Özlem AKTAŞ and Yalçın ÇEBİ},
      title = {Rule-Based Sentence Detection Method (RBSDM) for Turkish},
      journal = {International Journal of Language and Linguistics},
      volume = {1},
      number = {1},
      pages = {1-6},
      doi = {10.11648/j.ijll.20130101.11},
      url = {https://doi.org/10.11648/j.ijll.20130101.11},
      eprint = {https://article.sciencepublishinggroup.com/pdf/10.11648.j.ijll.20130101.11},
      year = {2013}
    }
    

  • RIS

    TY  - JOUR
    T1  - Rule-Based Sentence Detection Method (RBSDM) for Turkish
    AU  - Özlem AKTAŞ
    AU  - Yalçın ÇEBİ
    Y1  - 2013/05/02
    PY  - 2013
    N1  - https://doi.org/10.11648/j.ijll.20130101.11
    DO  - 10.11648/j.ijll.20130101.11
    T2  - International Journal of Language and Linguistics
    JF  - International Journal of Language and Linguistics
    JO  - International Journal of Language and Linguistics
    SP  - 1
    EP  - 6
    PB  - Science Publishing Group
    SN  - 2330-0221
    UR  - https://doi.org/10.11648/j.ijll.20130101.11
    VL  - 1
    IS  - 1
    ER  - 
