BLOG — May 20, 2024

Alternative data: The Nirvana of rock music?

The power of going beyond traditional metrics to extract signal in quantitative investment management remains at the forefront of financial analytics. At our recent S&P Global Market Intelligence Quantitative Investment Management Forum in New York, we continued the compelling discussions on alternative data from our inaugural London Quantitative IM Forum - not least the lively question of how to define alternative data.

Alternative data = data

"Since when was Nirvana 'alternative rock'?" Seven Eight Co-founder Stephen Cash pondered on our Future of Alternative Data panel. Simply put, alternative data is just data. For those still seeking a definition, it has been described as data which 'doesn't have a ticker'. Whether defined as any data outside of standard, long-consumed sources such as financial statements and regulatory filings, or as unstructured data scraped from sources such as social media and shipping or, more recently, synthetic data from LLMs, the consensus lay with our own Aditya Sharma's definition of alternative data as 'data not from its original asset class.'

Key to alternative data, explained Agostino Capponi, Professor of Financial Analytics at Columbia University, is the information it conveys, and how this flows into investment decision-making. For example, at a macro level, alternative data such as S&P Global's Purchasing Managers' Index™ can be leveraged for nowcasting GDP, and our Panjiva Supply Chain Intelligence data can be leveraged to capture supply chain disruption. Alternative data enables both a speed advantage (faster processing of information) and a breadth advantage (the ability to process more information, and to combine a range of data types and sources rather than processing data in isolation). 'Big Data' is now far more accessible and, in partnership with Data Science tools to 'de-noise' data, alternative data enables investment managers to work at a faster pace and on a bigger scale, as well as across multiple shapes and sizes. Yuyu Fan of Alliance Bernstein highlights 'The Four Vs of Alternative Data': Volume, Velocity, Variety and Veracity. The last is the most important: data quality is fundamental for ensuring accuracy and reliability of information.

Scale vs idiosyncrasy

Agostino Capponi underlined the importance of integrating both structured and unstructured data in an investment management process in order to maximise information extraction. Yet our Future of Alternative Data panel, led by Chris Petrescu of CP Capital, noted that alternative data can be very challenging to use well. Huge databases are hard to manage; conversely, smaller datasets with limited coverage and history carry different risks. Investors must also be mindful of overfitting to a particular market regime, such as a zero interest rate environment. And whilst alternative data is valuable for building larger portfolios of stocks, it can also be very niche, requiring expertise and a lot of time to understand how value can be extracted. A dataset covering only 30 tickers must be approached idiosyncratically, but how do investment managers scale whilst being idiosyncratic? Tony Berkman, Managing Director at Two Sigma, raised this challenge and its inherent oxymoron.

In this vein, the perception of alternative data has evolved over the last few years. Whilst many quantitative funds previously required robust coverage, 15+ years of history and daily frequency, most have come to appreciate that these datasets are few and far between. As such, more firms are willing to compromise on these criteria if they are able to see an edge in the data. Quantitative investors typically prefer to make a lot of small bets across a broad universe, but recognize they are potentially missing out on alpha if they are not seeking new alternative data opportunities. Returning to macro data, our panel considered the significance of having some thesis around the macro-economic environment regardless of asset class, with Ben Cohen, Head of Data Strategy at WorldQuant, noting that it can be important to assess the impact of macro trends that could expose value-add datasets.

Interlinking: S&P Global DNA - Data Nourishing Alpha

As highlighted earlier, investment decisions are not made using discrete datasets - value is extracted by combining a variety of broad data. The foundation of our S&P Global data DNA, our S&P Global Cross Reference suite, contributes to generating alpha by seamlessly linking multiple datasets together via a shared identifier to organise, manage and provide structure to data, and thus to enrich and help our customers understand the data whilst minimizing manual processes.
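
As a rough illustration of what linking via a shared identifier involves, here is a minimal Python sketch. It is not the S&P Global Cross Reference product or its API: the field names (`company_id`, `ticker`, `revenue_musd`, `short_interest_pct`), the identifier values and the figures are all hypothetical placeholders, chosen only to show the mechanics of resolving two datasets to one common key.

```python
# Toy records: all field names, identifiers and values are hypothetical.
fundamentals = [
    {"company_id": "C001", "revenue_musd": 120.0},
    {"company_id": "C002", "revenue_musd": 45.5},
]
short_interest = [
    {"ticker": "AAA", "short_interest_pct": 3.2},
    {"ticker": "BBB", "short_interest_pct": 11.8},
    {"ticker": "ZZZ", "short_interest_pct": 0.4},  # no mapping available
]
# Hypothetical cross-reference table: dataset-specific ticker -> shared identifier.
cross_reference = {"AAA": "C001", "BBB": "C002"}


def link_datasets(fundamentals, short_interest, cross_reference):
    """Join two datasets on a shared identifier resolved via a cross-reference map.

    Returns the linked rows plus the tickers that could not be mapped, so that
    gaps in coverage are visible rather than silently dropped.
    """
    by_company = {row["company_id"]: dict(row) for row in fundamentals}
    unmatched = []
    for row in short_interest:
        company_id = cross_reference.get(row["ticker"])
        if company_id is None or company_id not in by_company:
            unmatched.append(row["ticker"])
            continue
        by_company[company_id]["short_interest_pct"] = row["short_interest_pct"]
    return list(by_company.values()), unmatched


linked, unmatched = link_datasets(fundamentals, short_interest, cross_reference)
for row in linked:
    print(row)
print("unmatched tickers:", unmatched)
```

Returning the unmatched tickers alongside the linked rows is a deliberate choice in this sketch: mapping gaps are a data quality question in their own right, which ties back to 'Veracity'.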

Linking assets

Financial markets are becoming more interlinked, as S&P Global first explored more than a year ago whilst reviewing the relationship between the level of earnings call sentiment and changes in CDS spreads, and the 'spillage' between asset classes that our speakers considered at our London Quantitative IM Forum in Q4 2023. Furthermore, alternative data doesn't need to come from 'new' sources - traditional data with deep point in time history can be harnessed to create alternative data that adds alpha.

Over the last 12-18 months, Fixed Income markets (given the heightened environment of interest rates and inflation) and Credit markets (where single-name CDS volumes have almost regained pre-covid levels) have become more impactful for Equity investors. Returning to our alternative data definition as 'data not from its original asset class', S&P Global Market Intelligence has therefore built a suite of cross-asset signals. Our Bond-Linked Equity Signals enable our customers to link the Equity and Fixed Income markets by leveraging our rich history of proprietary CDS and Bond Pricing data, along with our Cross-Reference mapping. We further extend our concept of linking by combining a broad variety of data, including our capital flow data such as our Equity Short Interest data, retail trade flow data and ETF compositions, as well as our proprietary macro indicators. Returning to 'Velocity', our latest Securities Finance research highlights how our new Intraday data is a Leading Indicator of End-of-Day Borrows in US Equities.

Linking companies

Alternative signals can also be created from linking companies, as demonstrated by S&P Global's new Company Connections: Detailed Estimates product based on traditional data - our deep and comprehensive S&P Capital IQ Sell-Side Analyst Estimates. Research shows that investors' inability to quickly update the asset prices of connected companies with new value-relevant information creates an investment opportunity. In the dynamic world of investing, companies are not isolated entities; they are interconnected. Our dataset and research provide a new way of looking at these relationships through a network of shared sell-side analysts to create quantitative signals. As we look forward, we will continue to link, leveraging our vast S&P Global data estate to evolve our Company Connections suite with supply chain, human capital, textual meta and additional alternative assets to produce Equity and company alternative data signals.
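
To make the idea of a shared-analyst network concrete, the sketch below links two companies whenever the same analyst covers both, then scores a company by the connection-weighted average of its peers' estimate revisions. This is an assumed, simplified construction for illustration, not the methodology behind Company Connections. The analyst names, company labels and revision figures are invented toy inputs.

```python
from collections import defaultdict
from itertools import combinations


def shared_analyst_weights(coverage):
    """Build connection weights from analyst coverage.

    coverage maps analyst -> set of companies covered. The weight between two
    companies is the number of analysts covering both (an assumed weighting).
    """
    weights = defaultdict(lambda: defaultdict(int))
    for companies in coverage.values():
        for a, b in combinations(sorted(companies), 2):
            weights[a][b] += 1
            weights[b][a] += 1
    return weights


def connected_signal(company, weights, revisions):
    """Weighted average of peers' estimate revisions, or None if no peer data."""
    peers = {p: w for p, w in weights.get(company, {}).items() if p in revisions}
    total = sum(peers.values())
    if total == 0:
        return None
    return sum(w * revisions[p] for p, w in peers.items()) / total


# Toy inputs, purely illustrative.
coverage = {
    "analyst_1": {"A", "B", "C"},
    "analyst_2": {"A", "B"},
    "analyst_3": {"C", "D"},
}
revisions = {"B": 0.04, "C": -0.02, "D": 0.01}  # hypothetical estimate revisions

weights = shared_analyst_weights(coverage)
signal_a = connected_signal("A", weights, revisions)
print("shared analysts A-B:", weights["A"]["B"])
print("connected-company signal for A:", signal_a)
```

Here company A shares two analysts with B and one with C, so B's revision counts twice as much as C's in A's signal. A production version would need point-in-time coverage so that the network only reflects information available on each date.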

Textual data

One of the key alternative data thematics for 2024 is textual data, given the increasing adoption of AI and in particular LLMs. As Yuyu Fan of Alliance Bernstein notes, many activities of asset managers are driven by textual data, and many alternative data trends require NLP capabilities to fully leverage their potential. Most textual data is unstructured, from sources such as emails, transcripts, articles and documents. These text files are usually difficult, time-consuming and expensive to analyse and utilize. S&P Global's Textual Data Suite, including our new machine-readable Nikkei News, identifies primary sources of textual information that can be parsed and structured for ease of use. This bypasses the entire process of sourcing, cleansing and maintaining the data, while enabling metadata tagging and linking to other datasets such as financials and estimates, on top of which we run AI and NLP analysis. Yuyu Fan's own NLP techniques and research leverage S&P Global's Machine Readable Transcripts to extract insights for alpha generation. The resulting benefits are impactful proprietary investment signals leveraged across equity and fixed income strategies, as well as timely alerts sent directly to PMs and analysts. In addition, Yuyu's work leveraging S&P Global's Machine-Readable Filings asks whether changes in 10-Ks can help to identify risk, concluding that companies with low similarity scores between sequential filings significantly underperform.
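
For readers wondering what a similarity score between sequential filings might look like in practice, here is a minimal sketch. The research described above does not specify its measure here, so the choice of cosine similarity over simple word counts is an assumption, as are the function names and the short placeholder texts standing in for filing sections.

```python
import math
import re
from collections import Counter


def term_counts(text):
    """Lower-case the text and count alphabetic tokens (a deliberately crude tokenizer)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine_similarity(text_a, text_b):
    """Cosine similarity between bag-of-words vectors; 0.0 if either text is empty."""
    a, b = term_counts(text_a), term_counts(text_b)
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


# Placeholder snippets, not real filing text.
prior_year = "We face risks related to supply chain disruption and interest rates."
current_year = "We face risks related to litigation, cyber security and interest rates."

score = cosine_similarity(prior_year, current_year)
print(f"year-over-year similarity: {score:.3f}")
```

A score near 1 means the language barely changed between filings, while a lower score flags material rewrites that may merit a closer look. Real filings would call for section-level comparison and better tokenization than this sketch uses.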

AI

Leveraging technology and analytics is key for extracting actionable insights from data, and a core transformation right now is moving to a cloud-friendly, AI-ready view. As Michael Hoffmann of Kensho Technologies - S&P Global's AI hub - highlights, AI is a disruptive force for data. From human-readability, to machine-readability, and now AI-readability, data needs to be structured and pre-processed with AI use cases in mind. 'AI-ready data' is thus the new age genre, taking traditional data, interconnectedness and delivery to a new pitch optimized for LLMs. For tabular data, this increasingly looks like specialized LLM-ready APIs. Although LLMs must also be able to interpret unstructured, textual data, existing dense-vector search methods leave much to be desired. A robust design pattern for AI-ready textual data will likely take longer to mature. Yuyu Fan of Alliance Bernstein highlights the importance of pre-training LLMs, using vast corpora of text and in local languages, as well as providing context and well-defined prompts for problems/questions in order to improve the quality of results, concluding that competitive advantages can be created with expert annotations (human feedback) and fine-tuning. However, the challenge with Generative AI is accuracy, and as such verifiability and auditability of data are important - 'Veracity' remains key.

Data democratised

In conclusion, the power of data and alternative data will continue to gain strength - especially as more firms adopt AI technology as mainstream, which will increasingly 'democratise' data. As more market participants gain access to AI tools, we will see Fundamental and Systematic strategies coming closer together. We already observe more quantamental approaches, quantitative strategies increasingly being applied to other asset classes such as Credit and derivatives, and quantitative firms building discretionary teams to help inform them about the data they are using. In future, how will firms monetise their proprietary data? As Tony Berkman, Managing Director at Two Sigma, highlighted, proprietary data may soon be a line item listed on balance sheets, which could even be used, for example, as collateral against a loan. Meanwhile, many companies don't yet realise that they own valuable data. With ever more crowding, investment managers will need to be increasingly creative with how they leverage data, analytics and technology in order to squeeze alpha.

For all the rock music analogies, it is clear that alternative data will never be a case of never mind.


S&P Global provides industry-leading data, software and technology platforms and managed services to tackle some of the most difficult challenges in financial markets. We help our customers better understand complicated markets, reduce risk, operate more efficiently and comply with financial regulation.


This article was published by S&P Global Market Intelligence and not by S&P Global Ratings, which is a separately managed division of S&P Global.