ScienceIE

Example

The example has four parts: the plain-text paragraph (with keyphrases set in a distinct typeface for readability), the paragraph annotations visualised with brat, the stand-off keyphrase annotations based on character offsets, and the relation annotations.

Input: excerpt from a scientific publication

Information extraction is the process of extracting structured data from unstructured text, which is relevant for several end-to-end tasks, including question answering. This paper addresses the tasks of named entity recognition (NER), a subtask of information extraction, using conditional random fields (CRF). Our method is evaluated on the ConLL-2003 NER corpus.

Annotated paragraph visualised with brat

Subtask (A): Identification of keyphrases

ID	Start	End
0	0	22
1	150	168
2	204	228
3	230	233
4	249	271
5	279	304
6	306	309
7	343	364
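
The offsets in the table above appear to be 0-based character positions with an exclusive end, so each keyphrase can be recovered by slicing the paragraph. The following is a minimal sketch under that assumption; the names TEXT, SPANS and extract_keyphrases are illustrative and not part of any official ScienceIE tooling.

```python
# Paragraph copied verbatim from the input excerpt.
TEXT = (
    "Information extraction is the process of extracting structured data "
    "from unstructured text, which is relevant for several end-to-end tasks, "
    "including question answering. This paper addresses the tasks of named "
    "entity recognition (NER), a subtask of information extraction, using "
    "conditional random fields (CRF). Our method is evaluated on the "
    "ConLL-2003 NER corpus."
)

# (ID, start, end) rows from the Subtask (A) table.
# Assumption: start is inclusive and end is exclusive, like Python slices.
SPANS = [
    (0, 0, 22), (1, 150, 168), (2, 204, 228), (3, 230, 233),
    (4, 249, 271), (5, 279, 304), (6, 306, 309), (7, 343, 364),
]


def extract_keyphrases(text, spans):
    """Map each keyphrase ID to the substring text[start:end]."""
    return {kid: text[start:end] for kid, start, end in spans}


if __name__ == "__main__":
    for kid, phrase in extract_keyphrases(TEXT, SPANS).items():
        print(kid, phrase)
```

With this reading, ID 0 yields "Information extraction" and ID 3 yields "NER", which matches the keyphrases in the paragraph.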

Subtask (B): Classification of identified keyphrases

ID	Type
0	TASK
1	TASK
2	TASK
3	TASK
4	TASK
5	PROCESS
6	PROCESS
7	MATERIAL
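
As a rough illustration of how the classification output might be consumed, the tab-separated table above can be parsed and the keyphrase IDs grouped by type. This is only a sketch: TYPE_TABLE and group_ids_by_type are hypothetical names, and the task's actual file format may differ from this simplified table.

```python
import csv
import io
from collections import defaultdict

# The Subtask (B) table, reproduced as tab-separated text.
TYPE_TABLE = "\n".join([
    "ID\tType",
    "0\tTASK", "1\tTASK", "2\tTASK", "3\tTASK", "4\tTASK",
    "5\tPROCESS", "6\tPROCESS",
    "7\tMATERIAL",
])


def group_ids_by_type(table):
    """Return a dict mapping each keyphrase type to a list of keyphrase IDs."""
    reader = csv.DictReader(io.StringIO(table), delimiter="\t")
    groups = defaultdict(list)
    for row in reader:
        groups[row["Type"]].append(int(row["ID"]))
    return dict(groups)


if __name__ == "__main__":
    for label, ids in group_ids_by_type(TYPE_TABLE).items():
        print(label, ids)
```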

Subtask (C): Extraction of relationships between two identified keyphrases

ID1	ID2	Type
2	3	SYNONYM-OF
3	4	HYPONYM-OF
5	6	SYNONYM-OF
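
To make the relation rows easier to read, the IDs can be replaced by the keyphrase text they refer to. The sketch below assumes the phrase strings obtained by slicing the paragraph at the Subtask (A) offsets; PHRASES, RELATIONS and readable_relations are illustrative names only.

```python
# Keyphrase text for the IDs that take part in a relation
# (obtained by slicing the paragraph at the Subtask (A) offsets).
PHRASES = {
    2: "named entity recognition",
    3: "NER",
    4: "information extraction",
    5: "conditional random fields",
    6: "CRF",
}

# (ID1, ID2, Type) rows from the Subtask (C) table.
RELATIONS = [
    (2, 3, "SYNONYM-OF"),
    (3, 4, "HYPONYM-OF"),
    (5, 6, "SYNONYM-OF"),
]


def readable_relations(relations, phrases):
    """Turn (ID1, ID2, type) rows into (phrase1, type, phrase2) triples."""
    return [(phrases[a], rel, phrases[b]) for a, b, rel in relations]


if __name__ == "__main__":
    for left, rel, right in readable_relations(RELATIONS, PHRASES):
        print(f"{left} {rel} {right}")
```

Read this way, the second row states that "NER" is a hyponym of "information extraction", in line with the paragraph calling it a subtask.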