Financial Applications of Natural Language Processing

By Adam Wearne • June 25, 2019

This post was written by Adam Wearne, Sr. Data Scientist at Metis.

In an increasingly competitive financial landscape, gaining an informational edge is vital to constructing and maintaining a superior equity portfolio. Alternative data sources, including satellite images, sensor data from IoT devices, text, and video, are all becoming increasingly important sources of insight for active equity strategies. The volume of unstructured data is so large that, by some estimates, it accounts for over 90% of the entire digital universe. In this post, we'll highlight some of the common Natural Language Processing (NLP) techniques that are used in asset management.

Text data, in particular, is one of the largest and fastest growing forms of alternative data. Uncovering investment insights requires not only domain knowledge of finance, but also a strong grasp of data science and machine learning principles. In the past, the volume and velocity of textual data were manageable enough to be manually analyzed by teams of human experts. But given the volume of text data currently being produced on a daily basis, it is now an untenable task for even a large team of fundamental researchers to wade through it all. Pairing fundamental analysis with NLP techniques is now critical to unlocking the complete picture of how the experts and the masses feel about the market.

Sentiment analysis

Sentiment analysis is perhaps one of the most common methods for gaining investment insight from text. The intuition is pretty apparent here - if we want to gain insight about the expected future return of a stock, it makes sense to know how people feel about that company! The most common methods of sentiment analysis in finance can be largely divided into two camps: lexical and statistical methods. Lexical approaches have an inherent psychological component that is built into the system. This typically involves a panel of domain experts annotating a dictionary of words with their associated semantic polarity and strength. The semantic polarity of a given term is highly domain dependent, and care must be taken when deciding what lexicon to apply to any given problem. Perhaps one of the most notable sentiment dictionaries used in the space of finance and investment management is the Loughran-McDonald dictionary.
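
To make the lexical idea concrete, here is a minimal sketch of dictionary-based scoring. The two word sets below are a tiny hypothetical lexicon invented for illustration, not the actual Loughran-McDonald word lists, and the function name lexicon_sentiment is likewise an assumption. The sketch also ignores polarity strength, which a real annotated dictionary could supply.

```python
import re

# Hypothetical mini-lexicon for illustration only; these are NOT the
# actual Loughran-McDonald word lists.
POSITIVE = {"gain", "gains", "improve", "improved", "strong", "profit"}
NEGATIVE = {"loss", "losses", "lawsuit", "decline", "weak", "impairment"}


def lexicon_sentiment(text):
    """Return a polarity score in [-1, 1] from lexicon word counts.

    The score is (positive hits - negative hits) / (total hits), or 0.0
    when no lexicon word appears in the text.
    """
    tokens = re.findall(r"[a-z]+", text.lower())
    pos = sum(token in POSITIVE for token in tokens)
    neg = sum(token in NEGATIVE for token in tokens)
    total = pos + neg
    return 0.0 if total == 0 else (pos - neg) / total


if __name__ == "__main__":
    for sentence in (
        "Strong quarter with improved profit",
        "Impairment charge leads to a loss",
        "The meeting is on Tuesday",
    ):
        print(sentence, "->", lexicon_sentiment(sentence))
```

Because the score depends entirely on which words are in the dictionary, swapping in a general-purpose lexicon for a finance-specific one can change the result for the same sentence, which is the domain-dependence issue described above.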

Aside from the lexicon-based methods of sentiment analysis, statistical approaches are drawn from many of the standard supervised learning approaches you may have seen in the past. Techniques like logistic regression, ensemble methods, and deep learning all fit the bill here. The interesting research challenges in this arena lie less in applying the aforementioned models and more in questions like: how can one reliably assign a sentiment score to a longer piece of text, and how do we go about determining relative sentiment? A news headline that reads: "Company X wins large lawsuit against Company Y" is certainly good news for Company X, and potentially very bad news for Company Y. An interesting problem in modern applications of NLP to quantitative finance lies in understanding how the same document may have very different implications for different companies.
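
As a rough illustration of the statistical camp, the sketch below trains a bag-of-words logistic regression with stochastic gradient descent in plain Python. The six headlines and their labels are invented for this example, and the function names (tokenize, train, predict_proba) are hypothetical. In practice one would use far more labeled data and an established machine learning library; this is only meant to show the shape of the approach.

```python
import math
from collections import Counter

# Hypothetical labeled headlines (1 = positive, 0 = negative),
# invented purely for illustration.
TRAIN = [
    ("company beats earnings estimates", 1),
    ("company raises full year guidance", 1),
    ("strong demand lifts quarterly profit", 1),
    ("company misses earnings estimates", 0),
    ("company cuts full year guidance", 0),
    ("weak demand hurts quarterly profit", 0),
]


def tokenize(text):
    """Lowercase and split on whitespace."""
    return text.lower().split()


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def score(counts, weights, bias):
    """Linear score: bias plus weighted word counts (unseen words weigh 0)."""
    return bias + sum(weights.get(word, 0.0) * n for word, n in counts.items())


def train(examples, epochs=500, learning_rate=0.5):
    """Fit per-word weights and a bias by stochastic gradient descent."""
    weights, bias = {}, 0.0
    for _ in range(epochs):
        for text, label in examples:
            counts = Counter(tokenize(text))
            error = sigmoid(score(counts, weights, bias)) - label
            bias -= learning_rate * error
            for word, n in counts.items():
                weights[word] = weights.get(word, 0.0) - learning_rate * error * n
    return weights, bias


def predict_proba(text, weights, bias):
    """Estimated probability that the text is positive."""
    return sigmoid(score(Counter(tokenize(text)), weights, bias))


weights, bias = train(TRAIN)

if __name__ == "__main__":
    for headline in ("company beats estimates", "company misses estimates"):
        print(headline, "->", round(predict_proba(headline, weights, bias), 3))
```

Note that this model scores a headline as a whole. It has no way to say that the lawsuit headline above is good for Company X and bad for Company Y, which is exactly the relative-sentiment problem.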

In addition to the specific method of sentiment analysis, one must also be mindful of the source text that is being analyzed. Are we looking at press releases? News headlines? Conference calls or corporate filings? Social media chatter? All of these sources contain potentially useful insights for generating alpha, but how information is conveyed across these different mediums is wildly different. A very negative statement in a Reuters headline would look very different from a very negative tweet.

Beyond Sentiment

Of course, these statistical methods require us to have large labeled datasets. This requirement is compounded if multiple emotional valences are being considered. So, if we're unable to produce such a large labeled dataset, are we stuck? Not at all! One can also employ unsupervised learning strategies along with some human-in-the-loop intervention to produce novel and sustainable investment strategies. Non-sentiment-based approaches may combine elements of topic modeling and clustering, which have the potential for interesting investment applications. In one approach, news headlines are first analyzed using modern dependency parsing and named-entity-recognition techniques in an attempt to determine what is happening and to whom, distilling their contents down into a simple Subject-Verb-Object (SVO) format. Taking the Verb-Object portion of this triplet across many news headlines, one can cluster them and begin to develop a picture of the effect they have on the Subject of a given headline by examining stock returns relative to the publication date of the headline.
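
The last step of that pipeline can be sketched in a few lines, assuming the SVO triples have already been extracted by a parser. Everything in the example is hypothetical: the company names, the return figures, and the function name average_return_by_event are invented for illustration. As a simplification, exact matching on the Verb-Object pair stands in for a real clustering step, which would group similar but not identical phrases.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical pre-extracted records, invented for illustration:
# (subject, verb, object, stock return of the subject after publication).
EVENTS = [
    ("Company A", "wins", "lawsuit", 0.021),
    ("Company B", "wins", "lawsuit", 0.013),
    ("Company C", "recalls", "product", -0.034),
    ("Company D", "recalls", "product", -0.018),
    ("Company E", "announces", "buyback", 0.009),
]


def average_return_by_event(events):
    """Group records by their (verb, object) pair and average the returns.

    Exact matching on the normalized pair is a stand-in for clustering.
    """
    grouped = defaultdict(list)
    for _subject, verb, obj, ret in events:
        grouped[(verb.lower(), obj.lower())].append(ret)
    return {pair: mean(returns) for pair, returns in grouped.items()}


if __name__ == "__main__":
    for pair, avg in average_return_by_event(EVENTS).items():
        print(pair, "->", round(avg, 4))
```

With enough headlines per group, the averaged returns give a first picture of how a type of event tends to affect the Subject, without any labeled sentiment data.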

Risk Modeling

Apart from using sentiment-like approaches to aid in forecasting returns, one can also incorporate text information from corporate filings to provide an alternative perspective on risk modeling. Publicly traded companies in the United States are required by the SEC to annually submit a form (10-K) which details information about the financial performance of the company. There are many standardized sections that companies are obliged to submit, including Risk Factors identified by the company. By running a topic model like Latent Dirichlet Allocation (LDA) over the text in the risk section of corporate 10-Ks and examining how the distribution of topics overlaps between companies, one can gain insight as to which companies share common underlying risk factors. This method provides an alternative view of portfolio risk that can be used to enhance standard returns-based approaches.
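
Once each company has a topic distribution, comparing companies reduces to comparing vectors. The sketch below assumes the topic model has already been fit and uses cosine similarity as one possible overlap measure; other choices would work as well. The three topic mixtures, the company names, and the function names are all hypothetical values made up for illustration.

```python
import math

# Hypothetical topic mixtures over three risk topics, as might come out of a
# topic model fit on 10-K Risk Factors text. Values are invented.
TOPIC_MIX = {
    "Company A": [0.70, 0.20, 0.10],
    "Company B": [0.65, 0.25, 0.10],
    "Company C": [0.05, 0.15, 0.80],
}


def cosine_similarity(p, q):
    """Cosine of the angle between two topic vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(p, q))
    norm = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q))
    return dot / norm if norm else 0.0


def most_similar(name, mixes):
    """Return the other company whose topic mixture is closest to `name`."""
    others = (other for other in mixes if other != name)
    return max(others, key=lambda other: cosine_similarity(mixes[name], mixes[other]))


if __name__ == "__main__":
    for company in TOPIC_MIX:
        print(company, "is closest to", most_similar(company, TOPIC_MIX))
```

Companies whose mixtures sit close together would be candidates for sharing underlying risk factors, even when their historical returns have not moved together.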

Perspectives on the future

There are still many open problems in the space of NLP applied to finance, including aspect-based sentiment analysis, coreference resolution, and evaluating how novel or "surprising" a news article is, among many others. What's needed is more creative minds and enthusiastic data scientists to help drive the field into the future!
