[ biopathway.org ]


BioIE is a novel system that extracts general biological interactions of arbitrary types, including protein-protein interactions, from the rapidly growing volume of biomedical literature in online resources such as MEDLINE, and annotates the results with terms from biomedical ontologies such as Gene Ontology and MeSH. It builds on the proposal in [7], but improves both the quality and the diversity of the results in two ways: it checks that the arguments of interaction keywords are noun phrases not only in isolation but also in the full sentential context, and it can be retargeted to new kinds of biological interactions in a straightforward way. Its performance is among the state of the art.

For grammatical validation, BioIE uses a full-fledged grammar formalism, Combinatory Categorial Grammar (CCG), which is known to fully characterize the syntactic (and other) aspects of natural languages [8,9]. The grammar carries rich domain-independent linguistic information, and BioIE complements it with an intelligent treatment of unknown words, which are mostly domain-specific: they are treated syntactically as nouns.

For ontological annotation, BioIE handles syntactic variations of ontological terms in the literature by uncovering the syntactic dependencies between proteins and ontological terms that occur in the same sentence. The system currently achieves precision in the nineties and recall in the fifties (percent). Test corpora are available from the following links: test corpus 1 and test corpus 2.
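The idea of treating unknown words as nouns and then checking that the arguments of an interaction keyword are noun phrases can be illustrated with a toy sketch. This is not BioIE's implementation and does not use CCG: the lexicon, the keyword list, the coarse category labels, and the function names below are all hypothetical, and the "full sentential context" check is reduced to requiring that everything to the left and to the right of the keyword forms a single noun phrase.

```python
import re

# Hypothetical toy lexicon of domain-independent words with coarse categories.
LEXICON = {
    "the": "DET", "a": "DET", "an": "DET",
    "of": "PREP", "in": "PREP", "with": "PREP", "by": "PREP",
    "active": "ADJ", "nuclear": "ADJ",
    "protein": "N", "kinase": "N", "receptor": "N",
}

# Hypothetical interaction keywords; adding entries here retargets the sketch
# to new interaction types.
INTERACTION_KEYWORDS = {"binds", "activates", "inhibits", "phosphorylates"}


def categorize(token):
    """Return a coarse category for a token; unknown words are treated as nouns."""
    word = token.lower()
    if word in INTERACTION_KEYWORDS:
        return "KEY"
    return LEXICON.get(word, "N")


def is_noun_phrase(categories):
    """Accept an optional determiner, any adjectives, then one or more nouns."""
    encoded = "".join(category + " " for category in categories)
    return re.fullmatch(r"(DET )?(ADJ )*(N )+", encoded) is not None


def extract(sentence):
    """Return (left argument, keyword, right argument) triples.

    A triple is kept only if the whole left context and the whole right
    context of the keyword are noun phrases (a strong simplification).
    """
    tokens = sentence.rstrip(".").split()
    categories = [categorize(token) for token in tokens]
    triples = []
    for i, category in enumerate(categories):
        if category != "KEY":
            continue
        if is_noun_phrase(categories[:i]) and is_noun_phrase(categories[i + 1:]):
            triples.append(
                (" ".join(tokens[:i]), tokens[i].lower(), " ".join(tokens[i + 1:]))
            )
    return triples


if __name__ == "__main__":
    # "RAD53" and "SWI6" are not in the lexicon, so they default to nouns.
    print(extract("The RAD53 kinase phosphorylates SWI6."))
    # The right context starts with a preposition, so it is rejected.
    print(extract("SWI6 binds in the nucleus."))
```

In this sketch, domain-specific names need no lexicon entries, because any word missing from the lexicon is categorized as a noun. A real grammar-based validator would of course accept far richer sentence structures than the single pattern used here.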


  1. Jung-jae Kim and Jong C. Park, Extracting Contrastive Information from Negation Patterns in Biomedical Literature, ACM Transactions on Asian Language Information Processing (TALIP), 2006. (to appear)
  2. Jong C. Park and Jung-jae Kim, Named Entity Recognition, Chapter Four of the Book "Text Mining for Biology", editors: B. Stapley and S. Ananiadou, Artech House Publishers.
  3. Jung-jae Kim, Zhuo Zhang, Jong C. Park and See-Kiong Ng, BioContrasts: Extracting and Exploiting Protein-Protein Contrastive Relations from Biomedical Literature, Bioinformatics Advance Access published December 20, 2005.
  4. Jung-jae Kim and Jong C. Park, Annotation of Gene Products in the Literature with Gene Ontology Terms using Syntactic Dependencies, K.-Y. Su et al. (Eds.): IJCNLP 2004, Lecture Notes in Artificial Intelligence (LNAI) 3248, pp. 787-796, 2005.
  5. Jung-jae Kim and Jong C. Park, Deciding When to Stop: Enhancing the Performance of Information Extraction with Deeper Linguistic Analysis, Proceedings of the 3rd Korea-Singapore Joint Workshop on Bioinformatics and Natural Language Processing, pp. 41-45, Muju Resort, Jeonbuk, South Korea, February, 2005.
  6. Jung-jae Kim and Jong C. Park, BioIE: Retargetable Information Extraction and Ontological Annotation of Biological Interactions from the Literature, Journal of Bioinformatics and Computational Biology (JBCB), 2(3):551-568, 2004.
  7. Jung-jae Kim and Jong C. Park, BioAR: Anaphora Resolution for Relating Protein Names with Proteome Database Entries, Reference Resolution and its Application Workshop in conjunction with ACL 2004, pp. 79-86, Barcelona, Spain, 2004.
  8. Jung-jae Kim and Jong C. Park, Annotation of Gene Products in the Literature with Gene Ontology Terms using Syntactic Dependencies, The 1st International Joint Conference on Natural Language Processing (IJCNLP-04), pp. 528-534, Hainan Island, China, March, 2004.
  9. Lynette Hirschman, Jong C. Park, Jun-ichi Tsujii, Limsoon Wong, and Cathy Wu, Accomplishments and Challenges in Literature Data Mining for Biology, Bioinformatics, 18(12):1553-1561, December, 2002.
  10. Jong C. Park, Using Combinatory Categorial Grammar to Extract Biomedical Information, IEEE Intelligent Systems, 16(6):62-67, 2001.
  11. Jong C. Park, Bioinformatics and Natural Language Processing, Special Issue in Korean Information Processing, Communications of the Korea Information Science Society (KISS), 19(10):46-51, 2001. (In Korean)
  12. Jong C. Park, Hyun Sook Kim, Jung Jae Kim, Bidirectional Incremental Parsing for Automatic Pathway Identification with Combinatory Categorial Grammar, Proceedings of the Pacific Symposium on Biocomputing (PSB), Hawaii, USA, 6:396-407, 2001.
  13. M. Steedman, The Syntactic Process, The MIT Press, 2000.
  14. J.C. Park and H.J. Cho, Informed Parsing for Coordination with Combinatory Categorial Grammar, International Conference on Computational Linguistics, pp. 593-599, 2000.

Page maintained by Jung-jae Kim
Last modified: March 23, 2006