Overview
BioIE is a novel system that extracts general biological interactions of arbitrary types, including protein-protein interactions, from the rapidly growing volume of biomedical literature in online resources such as MEDLINE, and annotates the results with terms from biomedical ontologies such as Gene Ontology and MeSH. It builds on the proposal in [7] but substantially improves both the quality and the diversity of the results in two ways: it checks whether the arguments of interaction keywords are noun phrases not only in isolation but also in the full sentential context, and it can be retargeted to new biological interactions in a straightforward way. Its performance is among the state of the art. For grammatical validation, it uses a full-fledged grammar formalism, Combinatory Categorial Grammar (CCG), which is known to fully characterize the syntactic (and other) aspects of natural languages [8,9]. The grammar carries rich domain-independent linguistic information, and BioIE complements it with an intelligent treatment of unknown words, which are mostly domain-specific: unknown words are treated syntactically as nouns. Furthermore, during ontological annotation, BioIE handles syntactic variations of ontological terms in the literature by uncovering the syntactic dependencies between proteins and ontological terms in the same sentence. The system currently achieves precision in the nineties and recall in the fifties. Test corpora are available from the following links: test corpus 1 and test corpus 2.
References