Brain Language Metrics on Company Filings
Overview
The Brain Language Metrics on Company Filings dataset monitors several language metrics on 10-K and 10-Q reports for 6000+ US stocks.
Recent papers have paid growing attention to the language analysis of company reports and to its possible relation to firms’ future performance.
Some studies argue that the market responds inefficiently to the information in company filings because these reports have become longer and more complex; over the last 20 years, the length of company filings has in fact increased dramatically.
The dataset includes several language metrics calculated for the whole report and for specific sections (e.g. Risk Factors and MD&A sections). Some examples of calculated metrics are:
- Financial sentiment
- Percentage of words belonging to the financial domain, classified by language type (e.g. “litigious” language)
- Readability scores
- Lexical metrics such as lexical density and richness
- Similarity metrics between documents, also with respect to a specific language type (for example similarity with respect to “litigious” language or “uncertainty” language)
- Differences in the various language metrics between documents (e.g. delta sentiment, delta readability score, delta percentage of a specific language type, etc.)
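
To make some of the metric families listed above more concrete, here is a minimal Python sketch. It is an illustration only and does not reproduce the dataset's actual methodology: the word lists `LITIGIOUS` and `STOPWORDS` are tiny hypothetical stand-ins for a real financial dictionary, lexical density is approximated as the share of non-stopword tokens, lexical richness as a type-token ratio, and similarity as cosine similarity on word counts. All function names are assumptions made for this example.

```python
import math
import re
from collections import Counter

# Hypothetical toy word lists; a real analysis would rely on a full
# financial-domain dictionary, which is not reproduced here.
LITIGIOUS = {"lawsuit", "litigation", "plaintiff", "court"}
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "we", "our", "may", "be"}


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())


def pct_language_type(tokens, word_list):
    """Percentage of tokens that belong to a given language type."""
    if not tokens:
        return 0.0
    return 100.0 * sum(t in word_list for t in tokens) / len(tokens)


def lexical_richness(tokens):
    """Type-token ratio: distinct words divided by total words."""
    return len(set(tokens)) / len(tokens) if tokens else 0.0


def lexical_density(tokens):
    """Share of non-stopword tokens, a crude proxy for content words."""
    if not tokens:
        return 0.0
    return sum(t not in STOPWORDS for t in tokens) / len(tokens)


def cosine_similarity(tokens_a, tokens_b, vocabulary=None):
    """Cosine similarity of word counts, optionally restricted to a vocabulary."""
    a, b = Counter(tokens_a), Counter(tokens_b)
    if vocabulary is not None:
        a = Counter({w: c for w, c in a.items() if w in vocabulary})
        b = Counter({w: c for w, c in b.items() if w in vocabulary})
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Two made-up sentences standing in for the same section of consecutive filings.
previous_text = "We may be subject to litigation in the court."
current_text = "We may be subject to litigation and a lawsuit in the court."

prev_tokens = tokenize(previous_text)
curr_tokens = tokenize(current_text)

# "Delta" metric: change in the litigious-language percentage between documents.
delta_litigious = (pct_language_type(curr_tokens, LITIGIOUS)
                   - pct_language_type(prev_tokens, LITIGIOUS))

# Similarity over all words, and restricted to the litigious word list.
similarity_all = cosine_similarity(prev_tokens, curr_tokens)
similarity_litigious = cosine_similarity(prev_tokens, curr_tokens, LITIGIOUS)

print(delta_litigious, similarity_all, similarity_litigious)
print(lexical_richness(curr_tokens), lexical_density(curr_tokens))
```

In this sketch, a positive `delta_litigious` means that the later document uses proportionally more words from the litigious list than the earlier one. Restricting the similarity to a word list is one simple way to compare documents with respect to a single language type.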