Brain Language Metrics on Earnings Calls Transcripts

Overview

The use of unstructured text (news, company filings, earnings calls, etc.) in financial analysis is expanding quickly across both quantitative and discretionary strategies, as shown by the growing number of academic papers and products in this domain.

The Brain Language Metrics on Earnings Calls Transcripts (BLMECT) dataset monitors several language metrics on the quarterly earnings call transcripts of approximately 7000 global stocks.

The global earnings calls are sourced from Aiera, a global platform that monitors events and communications for a large universe of listed companies.


The metrics are reported separately for the following sections of the transcript: Management Discussion, Analysts’ Questions, and Management Answers to Analysts’ Questions.


Language Metrics 

The dataset is made of two parts; one includes the language metrics for the most recent earnings call transcript for each stock, namely:

  • Financial sentiment calculated using Brain's proprietary Large Language Model approach
  • Percentage of financial-domain words, classified by language type (e.g. “litigious” language)
  • Readability scores
  • Lexical metrics such as lexical density and richness
  • Text statistics such as the transcript length
  • Similarity metrics between documents, also with respect to a specific language type (for example similarity with respect to “litigious” language or “uncertainty” language)
  • Differences in the various language metrics between documents (e.g. delta sentiment, delta readability score, delta percentage of a specific language type, etc.)
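
To make a few of the metrics above concrete, the following is a minimal Python sketch of how lexical richness, lexical density, the percentage of a language type, a similarity score and a delta could be computed. It is not Brain's implementation. The word lists, the function names, the use of a type-token ratio for richness and the use of cosine similarity on word counts are all assumptions chosen for illustration, as is the reading of "between documents" as the current versus the previous transcript.

```python
import math
import re
from collections import Counter

# Illustrative word lists only (assumptions); not the dictionaries used by the dataset.
FUNCTION_WORDS = {"the", "a", "an", "of", "to", "in", "and", "or", "we", "our",
                  "is", "are", "that", "this", "for", "on", "with", "as", "by", "be"}
UNCERTAINTY_WORDS = {"may", "might", "could", "uncertain", "uncertainty", "risk", "risks"}


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def lexical_richness(tokens):
    """Type-token ratio: distinct words divided by total words."""
    return len(set(tokens)) / len(tokens) if tokens else 0.0


def lexical_density(tokens):
    """Share of tokens that are not function words."""
    if not tokens:
        return 0.0
    return sum(1 for t in tokens if t not in FUNCTION_WORDS) / len(tokens)


def language_type_share(tokens, word_list):
    """Percentage of tokens that belong to the given language-type word list."""
    if not tokens:
        return 0.0
    return 100.0 * sum(1 for t in tokens if t in word_list) / len(tokens)


def cosine_similarity(tokens_a, tokens_b, vocabulary=None):
    """Cosine similarity of word counts, optionally restricted to one word list."""
    counts_a, counts_b = Counter(tokens_a), Counter(tokens_b)
    if vocabulary is not None:
        counts_a = Counter({w: c for w, c in counts_a.items() if w in vocabulary})
        counts_b = Counter({w: c for w, c in counts_b.items() if w in vocabulary})
    dot = sum(counts_a[w] * counts_b[w] for w in counts_a.keys() & counts_b.keys())
    norm_a = math.sqrt(sum(c * c for c in counts_a.values()))
    norm_b = math.sqrt(sum(c * c for c in counts_b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Two made-up sentences standing in for a previous and a current transcript.
previous = tokenize("We may face uncertain demand and litigation risk.")
current = tokenize("Demand is strong and we expect growth.")

# A delta metric is the current value minus the previous value.
delta_uncertainty = (language_type_share(current, UNCERTAINTY_WORDS)
                     - language_type_share(previous, UNCERTAINTY_WORDS))
similarity_all = cosine_similarity(current, previous)
similarity_uncertainty = cosine_similarity(current, previous, UNCERTAINTY_WORDS)
```

Passing a word list as `vocabulary` restricts the comparison to that language type, which is one simple way to read "similarity with respect to uncertainty language".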

The dataset is updated daily as new earnings call transcripts are published; for any given stock, new data arrives quarterly with each earnings call. Historical data is available starting from 2012. The data is delivered daily as CSV files published to an S3 bucket.
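
Because files arrive daily while each stock changes only quarterly, a consumer will typically want the most recent record per stock. The sketch below shows one way to do that with Python's standard `csv` module, assuming the daily files have already been downloaded. The column names (`ticker`, `call_date`, `sentiment`) and the sample rows are hypothetical placeholders, not the actual schema of the delivered files.

```python
import csv
import io


def latest_record_per_ticker(csv_text, ticker_col="ticker", date_col="call_date"):
    """Return {ticker: row}, keeping the row with the most recent call date.

    Column names are hypothetical. Dates are assumed to be ISO formatted
    (YYYY-MM-DD), so string comparison matches chronological order.
    """
    latest = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        ticker = row[ticker_col]
        if ticker not in latest or row[date_col] > latest[ticker][date_col]:
            latest[ticker] = row
    return latest


# Made-up placeholder rows, for illustration only.
sample = """ticker,call_date,sentiment
AAA,2024-02-01,0.10
AAA,2024-05-02,0.25
BBB,2024-04-15,-0.05
"""

records = latest_record_per_ticker(sample)
```

Values come back as strings from `csv.DictReader`, so numeric fields need an explicit conversion (for example `float(records["AAA"]["sentiment"])`) before any calculation.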