Designing and developing components of an innovative Natural Language Processing System

Starting date
July 8, 2021
Duration (months)
12
Departments
Computer Science
Managers or local contacts
Combi Carlo

The project consists of designing and integrating advanced components into an innovative system for Natural Language Processing (NLP), based on recent Transformer language models.
Transformers are replacing current Recurrent Neural Network models: they allow greater parallelization, which makes training on larger data sets possible. Initial work compared these models with classic NLP techniques. This project completes the selection and evaluation of the most suitable models for a number of problems, and verifies their application across a number of domains and industries.
The project scope also includes the evaluation of smaller-scale models that use fewer resources, in order to improve performance and applicability and to extend the range of problems they can be used on. This involves known techniques such as distillation and pruning, as well as hybrid solutions that combine models with symbolic, rule-based approaches.
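As a rough illustration of the two compression techniques named above, the following sketch shows a common soft-target distillation loss and simple magnitude pruning. The project description does not say which variants are used, so the function names, the temperature and alpha parameters, and the choice of magnitude-based pruning are all assumptions made for this example.

```python
import math


def softmax(logits, temperature=1.0):
    """Convert logits to probabilities, softened by the given temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, label,
                      temperature=2.0, alpha=0.5):
    """Hypothetical helper: weighted sum of a soft-target and a hard-label loss.

    The soft term is the KL divergence from the teacher's softened
    distribution to the student's, scaled by temperature squared.
    The hard term is the cross-entropy of the student on the true label.
    """
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(pt * math.log(pt / ps)
             for pt, ps in zip(p_teacher, p_student) if pt > 0)
    hard = -math.log(softmax(student_logits)[label])
    return alpha * temperature ** 2 * kl + (1 - alpha) * hard


def magnitude_prune(weights, sparsity):
    """Hypothetical helper: zero out the smallest-magnitude share of weights."""
    k = int(len(weights) * sparsity)  # number of weights to remove
    if k == 0:
        return list(weights)
    threshold = sorted(abs(w) for w in weights)[k - 1]
    pruned, removed = [], 0
    for w in weights:
        if abs(w) <= threshold and removed < k:
            pruned.append(0.0)
            removed += 1
        else:
            pruned.append(w)
    return pruned
```

In this sketch a student that reproduces the teacher's logits exactly has a soft-target term of zero, so only the hard-label term remains. Real systems would typically apply the same ideas to tensors in a deep learning framework rather than to Python lists.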
The work is applied to both plain texts and structured documents, addressing the two main steps of document understanding: the structure/layout and the textual content itself.

Sponsors:

Expert.ai s.p.a.
Funds: assigned and managed by the department

Project participants

Carlo Combi
Full Professor
Stefano Simonazzi
Research Scholarship Holders
Research areas involved in the project
Information systems
Information systems applications
