Posted on Monday, August 28, 2006 12:12 PM by Jonathan Hodgson
More natural language processing using GATE
After my previous entry about natural language processing, I came across a project by the University of Sheffield called 'General Architecture for Text Engineering' (GATE).
The project aims to be:
- The Eclipse of Natural Language Engineering, the Lucene of Information Extraction, the leading toolkit for Text Mining.
- Used worldwide by thousands of scientists, companies, teachers and students.
- Comprised of an architecture, a free open source framework (or SDK) and graphical development environment.
- Used for all sorts of language processing tasks, including Information Extraction in many languages.
You can read the user guide, watch video demos or download it for yourself.
There is an online entity recognition web service demo. I tried it on the same example I used with Inxight's Thingfinder; although it is not as good as that commercial offering, it is impressive and extensible.

GATE is certainly worth further investigation for processing and 'understanding' unstructured content.