The project consists of designing and integrating advanced components into an innovative system for natural language processing, based on recent Transformer language models.
Transformers are replacing recurrent neural network (RNN) models: they allow a higher degree of parallelization and make training on larger datasets possible. Building on initial work that compared these models with classic NLP techniques, this project completes the selection and evaluation of the most suitable models for a number of problems, verifying their applicability across several domains and industries.
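The parallelization advantage mentioned above can be illustrated with a minimal sketch (not code from the project itself): in scaled dot-product self-attention, the core mechanism of Transformers, every position attends to every other position through a single matrix multiplication, whereas an RNN must process tokens one at a time. All names below (`self_attention`, the weight matrices `Wq`, `Wk`, `Wv`) are illustrative assumptions, not project APIs.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the chosen axis
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention over a whole sequence at once.

    Unlike an RNN step loop, the (seq_len x seq_len) score matrix is
    produced by one matmul, so all positions are computed in parallel.
    """
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)      # all pairwise scores in one matmul
    return softmax(scores, axis=-1) @ V  # weighted mixture of value vectors

rng = np.random.default_rng(0)
seq_len, d_model = 5, 8
X = rng.normal(size=(seq_len, d_model))          # toy token embeddings
Wq, Wk, Wv = (rng.normal(size=(d_model, d_model)) for _ in range(3))
out = self_attention(X, Wq, Wk, Wv)
print(out.shape)  # (5, 8)
```

Because the whole sequence is handled by dense matrix operations, the computation maps directly onto GPUs, which is what enables training on the larger datasets the project relies on.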
The project scope also includes the evaluation of smaller-scale models, obtained through known techniques such as "distilling" and "pruning", in order to improve performance and applicability and to extend the range of problems the models can address while using fewer resources. It further covers hybrid solutions that combine neural models with symbolic, rule-based approaches.
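To make the distillation idea concrete, here is a minimal sketch of the standard soft-target distillation loss (following Hinton-style knowledge distillation, not any specific recipe from this project): a smaller "student" model is trained to match the temperature-softened output distribution of a larger "teacher". The function name and the toy logits are illustrative assumptions.

```python
import numpy as np

def softmax(z, T=1.0):
    # Temperature-scaled softmax; T > 1 softens the distribution
    z = z / T
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, T=2.0):
    """Cross-entropy of the student's soft predictions against the
    teacher's soft targets, scaled by T**2 to keep gradient magnitudes
    comparable across temperatures."""
    p_teacher = softmax(teacher_logits, T)
    log_p_student = np.log(softmax(student_logits, T) + 1e-12)
    return -(p_teacher * log_p_student).sum(axis=-1).mean() * T ** 2

# Toy logits for two examples over three classes
teacher = np.array([[4.0, 1.0, 0.5],
                    [0.2, 3.5, 0.1]])
student_good = teacher + 0.1   # student close to the teacher
student_bad = -teacher         # student contradicting the teacher

loss_good = distillation_loss(student_good, teacher)
loss_bad = distillation_loss(student_bad, teacher)
print(loss_good < loss_bad)  # True: matching the teacher yields a lower loss
```

In practice this loss is combined with the ordinary hard-label loss during training; pruning, by contrast, removes low-magnitude weights or whole attention heads from an already-trained model.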
The project work is applied to both plain text and structured documents, in order to address the two main steps of document understanding: the structure/layout and the textual content itself.