Peer-Reviewed

Prediction of Escherichia Coli K-12 Promoters Using Convolutional Neural Network

Received: 11 October 2018     Accepted: 31 October 2018     Published: 30 November 2018
Abstract

Promoters are important cis-acting elements in genomes and play key roles in gene regulation. Each gene is regulated by a specific type of promoter, so identifying the promoter type that regulates a gene is crucial for exploring the gene's function. Although several computational methods for promoter prediction have been proposed, their performance is not satisfactory. The convolutional neural network (CNN) is a powerful deep learning model that has been applied in bioinformatics in recent years. To improve the performance of promoter prediction, in this study six types of Escherichia coli K-12 promoter DNA sequences were collected from the RegulonDB database, and a CNN model was constructed to predict promoters using the Keras platform. The CNN model is composed of two convolutional layers, three dropout layers, four batch normalization layers and one hidden layer. To evaluate the performance of the CNN model, 10-fold cross-validation was performed and receiver operating characteristic (ROC) curves were plotted. The results show that the prediction accuracies for sigma 24, sigma 28, sigma 32, sigma 38, sigma 54 and sigma 70 promoters are 94%, 97%, 95%, 95%, 97% and 83%, respectively. The CNN model achieves the highest accuracy in promoter prediction reported to date. In conclusion, the CNN is the best-performing model for promoter prediction so far, and it is a promising model for both DNA and protein sequence analysis.
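As a rough illustration of the evaluation protocol named in the abstract (10-fold cross-validation and ROC analysis), the following pure-Python sketch splits sample indices into folds and computes an ROC curve with its area. It is not the authors' code: the function names `k_fold_indices`, `roc_curve_points` and `auc` are hypothetical, and the binary promoter versus non-promoter labeling is an assumption that the abstract does not state.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Shuffle indices 0..n_samples-1 and split them into k near-equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Fold i takes every k-th shuffled index, so fold sizes differ by at most one.
    return [indices[i::k] for i in range(k)]


def roc_curve_points(labels, scores):
    """Return (FPR, TPR) points obtained by sweeping the decision threshold.

    Assumed encoding: label 1 = promoter, 0 = non-promoter; a higher score
    means "more promoter-like". Both classes must be present.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    for i, (score, label) in enumerate(ranked):
        if label == 1:
            tp += 1
        else:
            fp += 1
        # Emit a point only after the last sample of a group of tied scores.
        if i + 1 == len(ranked) or ranked[i + 1][0] != score:
            points.append((fp / n_neg, tp / n_pos))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


if __name__ == "__main__":
    folds = k_fold_indices(103, k=10)
    for held_out in folds:
        # Train on the other nine folds, then score the held-out fold.
        train = [i for fold in folds if fold is not held_out for i in fold]
        assert len(train) + len(held_out) == 103
    # Toy labels and scores, invented purely for illustration.
    print(auc(roc_curve_points([1, 0, 1, 0], [0.9, 0.8, 0.7, 0.1])))
```

In a real run, each held-out fold would be scored by a model trained on the other nine folds, and the pooled or per-fold scores would be passed to `roc_curve_points`.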

Published in Computational Biology and Bioinformatics (Volume 6, Issue 2)
DOI 10.11648/j.cbb.20180602.11
Page(s) 31-35
Creative Commons

This is an Open Access article, distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution and reproduction in any medium or format, provided the original work is properly cited.

Copyright

Copyright © The Author(s), 2018. Published by Science Publishing Group

Keywords

Convolutional Neural Network (CNN), Escherichia Coli, Promoter, Prediction

Cite This Article
  • APA Style

    Lu Wang, Ping Wan. (2018). Prediction of Escherichia Coli K-12 Promoters Using Convolutional Neural Network. Computational Biology and Bioinformatics, 6(2), 31-35. https://doi.org/10.11648/j.cbb.20180602.11

    ACS Style

    Lu Wang; Ping Wan. Prediction of Escherichia Coli K-12 Promoters Using Convolutional Neural Network. Comput. Biol. Bioinform. 2018, 6(2), 31-35. doi: 10.11648/j.cbb.20180602.11

    AMA Style

    Lu Wang, Ping Wan. Prediction of Escherichia Coli K-12 Promoters Using Convolutional Neural Network. Comput Biol Bioinform. 2018;6(2):31-35. doi: 10.11648/j.cbb.20180602.11

  • BibTeX

    @article{10.11648/j.cbb.20180602.11,
      author = {Lu Wang and Ping Wan},
      title = {Prediction of Escherichia Coli K-12 Promoters Using Convolutional Neural Network},
      journal = {Computational Biology and Bioinformatics},
      volume = {6},
      number = {2},
      pages = {31-35},
      doi = {10.11648/j.cbb.20180602.11},
      url = {https://doi.org/10.11648/j.cbb.20180602.11},
      eprint = {https://article.sciencepublishinggroup.com/pdf/10.11648.j.cbb.20180602.11},
     year = {2018}
    }
    

  • RIS

    TY  - JOUR
    T1  - Prediction of Escherichia Coli K-12 Promoters Using Convolutional Neural Network
    AU  - Lu Wang
    AU  - Ping Wan
    Y1  - 2018/11/30
    PY  - 2018
    N1  - https://doi.org/10.11648/j.cbb.20180602.11
    DO  - 10.11648/j.cbb.20180602.11
    T2  - Computational Biology and Bioinformatics
    JF  - Computational Biology and Bioinformatics
    JO  - Computational Biology and Bioinformatics
    SP  - 31
    EP  - 35
    PB  - Science Publishing Group
    SN  - 2330-8281
    UR  - https://doi.org/10.11648/j.cbb.20180602.11
    VL  - 6
    IS  - 2
    ER  - 

Author Information
  • College of Life Sciences, Capital Normal University, Beijing, China

  • College of Life Sciences, Capital Normal University, Beijing, China