Brain Language Metrics on Earnings Calls Transcripts


The use of unstructured textual content (news, company filings, earnings calls, etc.) in financial analysis is quickly expanding across both quantitative and discretionary strategies, as demonstrated by the growing number of academic papers and products in this domain.

The Brain Language Metrics on Earnings Calls Transcripts (BLMECT) dataset monitors several language metrics on the quarterly earnings call transcripts of 4500+ US stocks.

The metrics are reported separately for the following sections of the transcript: Management Discussion, Analysts’ Questions, and Management Answers to Analysts’ Questions.

Language Metrics 

The dataset is made of two parts: one includes the language metrics for the most recent earnings call transcript for each stock, while the other compares documents through similarity and difference metrics. The metrics are:

  • Financial sentiment 
  • Percentage of words belonging to the financial domain, classified by language type (e.g. “litigious” language)
  • Readability scores
  • Lexical metrics such as lexical density and richness
  • Text statistics such as the transcript length
  • Similarity metrics between documents, also with respect to a specific language type (for example similarity with respect to “litigious” language or “uncertainty” language)
  • Differences of the various language metrics between documents (e.g. delta sentiment, delta readability score, delta percentage of a specific language type, etc.)
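
To make a few of the metrics listed above more concrete, here is a minimal Python sketch. It is not the methodology used to build the dataset, which is not described here. The tiny `LITIGIOUS_WORDS` list, the tokenizer, the type-token ratio used as a proxy for lexical richness, and the bag-of-words cosine similarity are all illustrative assumptions.

```python
import math
import re
from collections import Counter

# Hypothetical mini word list; a real financial dictionary would be far larger.
LITIGIOUS_WORDS = {"lawsuit", "litigation", "plaintiff", "settlement", "court"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def pct_language_type(tokens, word_list):
    """Share of tokens that belong to the given word list."""
    if not tokens:
        return 0.0
    return sum(1 for t in tokens if t in word_list) / len(tokens)


def type_token_ratio(tokens):
    """Simple lexical richness proxy: distinct words divided by total words."""
    if not tokens:
        return 0.0
    return len(set(tokens)) / len(tokens)


def cosine_similarity(tokens_a, tokens_b):
    """Cosine similarity between bag-of-words count vectors."""
    a, b = Counter(tokens_a), Counter(tokens_b)
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Two made-up transcript snippets standing in for consecutive earnings calls.
previous = tokenize("Revenue grew and the lawsuit was resolved by settlement.")
latest = tokenize("Revenue grew again and margins improved this quarter.")

# Delta metric: change in the share of "litigious" words between documents.
delta_litigious = (
    pct_language_type(latest, LITIGIOUS_WORDS)
    - pct_language_type(previous, LITIGIOUS_WORDS)
)
similarity = cosine_similarity(previous, latest)

print(f"delta litigious share: {delta_litigious:+.3f}")
print(f"cosine similarity:     {similarity:.3f}")
print(f"type-token ratio:      {type_token_ratio(latest):.3f}")
```

A similarity "with respect to a specific language type" could be approximated in the same spirit by keeping only the tokens that appear in the relevant word list before calling `cosine_similarity`. This, too, is an assumption rather than the dataset's documented definition.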

The dataset is updated daily, since new earnings call transcripts are published every day for some of the stocks in the universe. The data for each individual stock changes on a quarterly basis, when its new earnings call is published. The historical dataset is available from 2012.