Ljubljana, Slovenia

A Uniform Meaning Representation for NLP Systems

When: 31 July 2023 - 11 August 2023
Language: English
Duration: 2 weeks
Fee: EUR 490

Impressive progress has been made in many aspects of natural language processing (NLP) in recent years. Most notably, the achievements of transformer-based large language models such as ChatGPT would seem to obviate the need for any type of semantic representation beyond what can be encoded as contextualized word embeddings of surface text. Advances have been particularly notable in areas where large training data sets exist and it is advantageous to build an end-to-end training architecture without resorting to intermediate representations. For truly interactive NLP applications, however, a more complete understanding of the information conveyed by each sentence is needed to advance the state of the art. Here "understanding" entails the use of some form of meaning representation. NLP techniques that can accurately capture the required elements of the meaning of each utterance in a formal representation are critical to making progress in these areas and have long been a central goal of the field.
As with end-to-end NLP applications, the dominant approach for deriving meaning representations from raw text is machine learning over appropriate training data, which allows systems to assign appropriate meaning representations to previously unseen sentences. Generating training data that represents meaning, however, is a very different undertaking from collecting human-translated text or transcribed speech, as such data is not "naturally occurring." It requires a consensus on formal meaning representations that humans can use to annotate significant amounts of data for the sole purpose of training machine learning algorithms. This has proven an elusive target because it requires a delicate balance among several factors: the representation must provide fair and equal treatment of multiple languages, be intuitive enough to enable fast, consistent human annotation, and support the training of accurate, useful automatic modules that work well in downstream applications. A meaning representation that strikes the right balance would solve one of the long-standing intellectual problems in the field and have a transformative effect on NLP specifically, and on Artificial Intelligence in general.
In this course, we describe the framework of Uniform Meaning Representations (UMRs), a recent cross-lingual, multi-sentence incarnation of Abstract Meaning Representations (AMRs) that addresses these issues and constitutes such a transformative representation. UMR builds on AMR and retains its essential characteristics, combining the popular PropBank-style predicate-argument structures with semantic role labels, named entity tagging, discourse relations, intra-sentential coreference, and a partial treatment of negation and modality into a single directed acyclic graph, while making the representation cross-lingual and extending it to the document level.
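As a rough illustration of this graph structure (an illustrative AMR-style example in the standard PENMAN notation, not drawn from the course materials; the predicate sense numbers are shown only for illustration), the sentence "The boy wants to go" can be represented as

    (w / want-01
       :ARG0 (b / boy)
       :ARG1 (g / go-01
                :ARG0 b))

Here want-01 and go-01 are PropBank predicate senses, :ARG0 and :ARG1 are semantic roles, and the reuse of the variable b (the boy is both the wanter and the goer) is what makes the structure a directed acyclic graph rather than a tree. UMR extends graphs of this kind across sentence boundaries and across languages.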
We introduce the basic structural representation of UMR and describe its application to multiple languages. We present a formal semantic interpretation of UMR that incorporates a continuation-based semantics for scope phenomena involving modality, negation, and quantification. We describe how UMR encodes tense, aspect, and modality (TAM) information across languages, and we present parsing algorithms that generate AMR and UMR representations for multiple languages. Finally, we introduce an extension of UMR for encoding gesture in multimodal dialogue, Gesture AMR (GAMR), which aligns with speech-based UMR to account for situated grounding in dialogue.

Course leader

Martha Palmer and James Pustejovsky

Target group

Students

Fee info

EUR 490: Early student registration
EUR 690: Early non-academic registration

Scholarships

There are several scholarship options which you can read about on our website.