
RIch Text Analysis through Enhanced Tools based on Lexical Resources

The objective of the project is the development of tools for the syntactico-semantic analysis of Spanish and Portuguese. To do this, we aim to build a framework to integrate the different capabilities and resources of the groups involved. In particular, we want to integrate different lines of work on compositional semantics and enriched lexica: the Lexicon-Grammar tables, verbal subcategorization frames, multiword expressions, grammatical formalisms with enough expressivity to integrate this information, and learning mechanisms capable of building complex models from examples at these levels of analysis.

Our main lines of work are:

● Theoretical and practical research on grammar formalisms and parser types (among others, dependency analysis, categorial grammars, and HPSG grammars), in order to reach a consensus on the formalism and analyzer in which the project's results will be implemented.
● Survey and integration of lexical resources from different sources and possibly different languages: integration of verbal lexicons and annotated corpora, extrapolation from annotated corpora in one language to another language, and use of comparable corpora for terminology extraction.
● Feasibility study of applying machine learning techniques to the chosen formalism and the enriched lexicon.

Research group
Funding body
STIC - Amsud