Do Word Embeddings Really Understand Loughran-McDonald’s Polarities?

Talk / Overview

Market participants often want to infer sentiment from text: news, analyst reports, earnings call transcripts, regulatory filings, and so on. The “old way” is to use a lexicon such as the Loughran-McDonald one, whereas the “new way” is to use embeddings trained in an unsupervised manner, relying on the quality of a “language model”. I will challenge these embeddings: can they really understand the polarity of a text? To do so, I need the natural probabilistic generative model for embeddings, so that I can generate synthetic texts and assess the identifiability of embeddings. This analysis shows that embeddings are not always good at distinguishing synonyms from antonyms. Moreover, by training embeddings on different corpora (Wikipedia, headlines of financial news, bodies of financial news), I will provide evidence not only that it may be impossible for them to capture sentiment polarities, but also that they may attach polarities to terms that should be neutral, such as company names.
For more details, see the paper: https://arxiv.org/abs/2103.09813

Talk / Speakers

Charles-Albert Lehalle

Global Head - Quantitative Research & Development, Abu Dhabi Investment Authority (ADIA)

AMLD EPFL 2022 Do Word Embeddings Really Understand Loughran-McDonald’s Polarities?
