Brain Language Metrics on Company Filings
The Brain Language Metrics (BLM) on Company Filings dataset tracks several language metrics computed on the 10-K and 10-Q reports of approximately the 1,000 largest US stocks.
Recent papers have paid growing attention to the language analysis of company reports and to its possible relation to firms’ future performance.
Some of this literature argues that the market responds inefficiently to the information in company filings because these reports have become longer and more complex; over the last 20 years, the length of the average 10-K has in fact increased dramatically.
The dataset includes several language metrics calculated on company filings, for example:
- Financial sentiment
- Percentage of words belonging to the financial domain, classified by language type (e.g. “litigious” language)
- Similarity metrics between documents, including similarity restricted to a specific language type (for example “litigious” or “uncertainty” language)
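To make the last two kinds of metric concrete, here is a minimal sketch in Python of how a language-type percentage and a document similarity could be computed. It is an illustration under stated assumptions, not the dataset's actual methodology: the `LITIGIOUS` word list is a toy placeholder, the function names are hypothetical, and cosine similarity over word counts is only one of several possible similarity measures.

```python
import math
import re
from collections import Counter

# Hypothetical toy word list (an assumption for illustration only);
# a real application would load a full financial-domain dictionary.
LITIGIOUS = {"lawsuit", "litigation", "plaintiff", "court", "claims"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def category_share(tokens, vocabulary):
    """Fraction of tokens that belong to the given word list."""
    if not tokens:
        return 0.0
    return sum(1 for t in tokens if t in vocabulary) / len(tokens)


def cosine_similarity(tokens_a, tokens_b, vocabulary=None):
    """Cosine similarity of word-count vectors.

    If a vocabulary is given, only words in that list are counted,
    which yields a similarity with respect to one language type.
    """
    if vocabulary is not None:
        tokens_a = [t for t in tokens_a if t in vocabulary]
        tokens_b = [t for t in tokens_b if t in vocabulary]
    a, b = Counter(tokens_a), Counter(tokens_b)
    dot = sum(a[w] * b[w] for w in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Two made-up snippets standing in for the text of two filings.
filing_a = tokenize("The company faces a lawsuit in federal court.")
filing_b = tokenize("The company settled the lawsuit and faces new claims.")

print(category_share(filing_b, LITIGIOUS))               # share of "litigious" words
print(cosine_similarity(filing_a, filing_b))             # overall similarity
print(cosine_similarity(filing_a, filing_b, LITIGIOUS))  # "litigious" similarity
```

Restricting the count vectors to one word list before comparing them is what turns a general similarity into a similarity with respect to a language type; swapping in an “uncertainty” word list would give the corresponding metric for that category.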
The dataset is updated daily, since new 10-K and 10-Q reports are released every day for some of the companies in the universe. The largest updates occur around February, April, August and November, when the largest number of reports is released. The historical dataset is available from 2007.