BLOG — Nov 11, 2024
The potential of Large Language Models (LLMs) continues to captivate industries, from finance to healthcare, where the promise of better predictions, streamlined workflows, and nuanced analysis is revolutionizing decision-making. However, with estimates suggesting that over 80% of AI projects fall short of their goals,[1] where do organizations need to invest to succeed with LLMs? This article draws from insights shared in our recent webinar, “Where to Invest to Maximize AI Success: Data or Tech.”
High-quality data is essential for LLMs, which thrive on a blend of structured and unstructured information to deliver accurate results. Beyond sheer volume, the standardization and relevance of data determine the model's effectiveness.
Standardization and Anonymization
High-quality data can be challenging to source and even harder to standardize, especially in sectors like finance, where sensitive information must be carefully managed. Alex Kim, a PhD candidate at the University of Chicago Booth School of Business, emphasizes the need for data standardization and anonymization, particularly when working with sensitive financial data. In his study on financial statement analysis using LLMs, Kim utilized anonymized balance sheets and income statements from Compustat Financials to ensure the model’s predictions were not influenced by identifiable company data, avoiding issues like “look-ahead bias.” This ensures LLMs provide insights based purely on the data provided, without pre-existing knowledge influencing outcomes.
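A minimal sketch of this kind of anonymization, assuming a pandas DataFrame with hypothetical column names (the study's actual preprocessing is not public in this form): company names are replaced with opaque labels and absolute fiscal years with relative period indices, so the model cannot match records to companies or dates it may already know.

```python
import pandas as pd

def anonymize_statements(df: pd.DataFrame) -> pd.DataFrame:
    """Replace identifying fields with neutral placeholders so an LLM
    cannot link rows to known companies (avoiding look-ahead bias)."""
    out = df.copy()
    # Map each company to an opaque label like "Company_1", "Company_2", ...
    labels = {name: f"Company_{i + 1}" for i, name in enumerate(out["company"].unique())}
    out["company"] = out["company"].map(labels)
    # Replace absolute fiscal years with relative period indices (0 = earliest).
    out = out.sort_values(["company", "fiscal_year"])
    out["period"] = out.groupby("company").cumcount()
    return out.drop(columns=["fiscal_year"])

statements = pd.DataFrame({
    "company": ["Acme Corp", "Acme Corp", "Globex"],
    "fiscal_year": [2022, 2023, 2023],
    "net_income": [120.0, 135.0, 80.0],
})
print(anonymize_statements(statements))
```

The same idea extends to any field an LLM could use to recognize the company, such as tickers, segment names, or well-known product lines.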
Combining Contextual and Tabular Data
LLMs excel when they can analyze a mix of textual and tabular data. For example, S&P Global’s machine-readable textual data, including earnings transcripts and filings, can be paired with quantitative data like financial ratios. This multi-layered data approach allows models to create a rich “data fabric,” enabling deeper and more accurate insights. However, data quality doesn’t stop at collection. Rigorous cross-checks, linking to identifiers, and standardizing across languages are essential steps for ensuring consistency, especially for international companies.
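The linking step can be sketched as a join on a shared identifier; the company IDs, periods, and field names below are invented for illustration, but the pattern mirrors how textual and quantitative records are combined into one context a model can consume.

```python
import pandas as pd

# Hypothetical excerpts of machine-readable transcript text, keyed by company ID.
transcripts = pd.DataFrame({
    "company_id": ["C001", "C002"],
    "quarter": ["2024Q3", "2024Q3"],
    "text": ["Margins expanded on cost discipline.", "Demand softened in EMEA."],
})

# Hypothetical quantitative data for the same identifiers and periods.
ratios = pd.DataFrame({
    "company_id": ["C001", "C002"],
    "quarter": ["2024Q3", "2024Q3"],
    "gross_margin": [0.42, 0.31],
})

# Linking on a shared identifier keeps narrative and numbers in one record,
# ready to be passed to a model as combined context.
fabric = transcripts.merge(ratios, on=["company_id", "quarter"], how="inner")
print(fabric)
```

An inner join is the conservative choice here: records that cannot be matched on both identifier and period are dropped rather than paired with missing values.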
Structuring Data to Be AI-Ready
Preprocessing data for AI consumption isn’t a trivial task. Organizations often face hurdles in linking various data types, such as syncing qualitative insights from earnings calls with quantitative performance data. This preprocessing involves not only formatting data but also aligning data from disparate sources—such as combining analyst sentiment with company financials for predictive modeling. S&P Global’s NLP-ready datasets aim to reduce the preprocessing burden by embedding metadata that tags and categorizes key information, helping you move from data to analysis more quickly. “The burden of preparing data for your use case should sit primarily with us,” says Kevin Zacharuk, Product Management Director at S&P Global, emphasizing the goal of letting teams focus directly on high-value analysis.
With metadata tagging, these datasets make it easier to identify trends, track emerging themes, and uncover insights without the need for extensive data preparation. These datasets can also integrate with other sources, such as financials and estimates, providing a more comprehensive view and allowing for interconnected insights across datasets. By simplifying the initial steps, these NLP-ready datasets enable teams to spend more time interpreting and applying insights rather than managing data.
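As a toy illustration of what metadata tagging buys you, the sketch below attaches topic tags to a document so downstream analysis can filter by theme without re-reading raw text. The topic names and keyword lists are invented for illustration; production tagging would rely on richer taxonomies and models, not keyword matching.

```python
# Hypothetical topic taxonomy: topic name -> trigger keywords.
TOPIC_KEYWORDS = {
    "energy_transition": ["renewable", "solar", "emissions"],
    "m_and_a": ["acquisition", "merger", "deal"],
}

def tag_document(text: str) -> list[str]:
    """Return the sorted list of topics whose keywords appear in the text."""
    lowered = text.lower()
    return sorted(
        topic for topic, words in TOPIC_KEYWORDS.items()
        if any(w in lowered for w in words)
    )

doc = "The merger will cut emissions across the combined fleet."
print(tag_document(doc))  # → ['energy_transition', 'm_and_a']
```

Once tags like these are embedded alongside the text, a query such as "all Q3 transcripts tagged energy_transition" becomes a simple filter rather than a text-mining project.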
Language-Agnostic Capabilities
For global applications, LLMs benefit from language-agnostic capabilities. By training models to process data in multiple languages, organizations can unlock insights across diverse markets without needing to rely on machine translation. This multilingual approach allows for direct analysis of regional data, offering deeper and often less biased insights.
The utility of LLMs extends across sectors, but finance provides compelling examples of their transformative impact. Whether identifying potential M&A candidates, tracking competitive risks, or analyzing investor sentiment, the fusion of LLM technology and robust data sets offers actionable insights that were previously unattainable.
Use Case: Financial Statement Analysis
In his research, Alex Kim demonstrated that LLMs could analyze financial statements and make predictions about future earnings. Notably, his study found that with the appropriate prompts, GPT models could emulate human analyst reasoning and predict earnings trends with a similar level of accuracy as dedicated financial analysts. This finding underscores the value of LLMs as both data analyzers and strategic advisors—able to support decision-making alongside traditional analysts rather than replace them entirely.
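A hedged sketch of how such an analysis might be framed for a chat model, under the anonymization described earlier: the prompt wording and field names below are illustrative assumptions, not the study's actual prompt, but they show the shape of the task (relative periods, step-by-step reasoning, a directional prediction).

```python
def build_prompt(statement: dict) -> str:
    """Frame an anonymized financial statement as an earnings-direction question."""
    lines = "\n".join(f"{k}: {v}" for k, v in statement.items())
    return (
        "You are a financial analyst. The company below is anonymized and "
        "periods are relative (t is the most recent year).\n\n"
        f"{lines}\n\n"
        "Step by step, assess trends in profitability and leverage, then "
        "answer: will earnings increase or decrease in period t+1?"
    )

# Illustrative figures only.
statement = {
    "revenue (t-1)": 1000, "revenue (t)": 1150,
    "net_income (t-1)": 80, "net_income (t)": 95,
}
prompt = build_prompt(statement)
print(prompt)
```

The "step by step" instruction matters: chain-of-thought-style prompting is what lets the model emulate an analyst's reasoning rather than pattern-match on numbers alone.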
Use Case: Earnings Call Analysis
Professor Ronen Feldman, from the Hebrew University Business School, shared insights on LLM applications in analyzing earnings calls. By assigning polarity and importance to each phrase, the model can interpret the underlying sentiment and intentions of executives and analysts, allowing for more predictive insights. For example, changes in an executive’s tone on a topic like R&D spending can offer early indicators of a company’s strategic shifts, a level of granularity and immediacy traditional analysis might miss.
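A toy version of phrase-level scoring makes the idea concrete; the word lists and weights below are invented for illustration and are not Feldman's actual model, which would use far richer, context-aware scoring.

```python
# Invented sentiment and importance vocabularies, for illustration only.
POSITIVE = {"growth", "expanded", "strong"}
NEGATIVE = {"decline", "headwinds", "cut"}
IMPORTANT = {"r&d", "guidance", "margin"}

def score_phrase(phrase: str) -> dict:
    """Assign a polarity (positive minus negative words) and an importance
    weight (boosted by strategically significant terms) to one phrase."""
    words = phrase.lower().replace(",", "").split()
    polarity = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    importance = 1 + sum(w in IMPORTANT for w in words)
    return {"phrase": phrase, "polarity": polarity, "importance": importance}

call = [
    "We expect strong growth in cloud revenue",
    "R&D spend will decline next quarter",
]
for p in call:
    print(score_phrase(p))
```

Aggregating such scores over a whole call, weighted by importance, is what turns scattered remarks into a trackable signal, for example a quarter-over-quarter shift in tone around R&D spending.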
Regional Analysis and Language Agnosticism
LLMs also open up cross-regional data analysis. With language-agnostic LLMs, firms can analyze earnings calls or filings in Mandarin, Japanese, or Spanish, deriving insights without the need for translation. For investors or firms looking to expand internationally, this capability provides a clear advantage, offering accurate and culturally nuanced analysis.
The capabilities of LLMs will likely continue to expand as new technological advancements emerge. Multimodal models capable of processing not only text but also images, graphs, and tables are already on the horizon, promising even deeper insights. In areas like ESG, for example, LLMs can interpret sustainability reports with tables and graphs, offering a more comprehensive view of a company’s impact and future risks.
Yet, as LLMs grow in complexity, ensuring that data privacy and quality controls keep pace is critical. Enterprises increasingly demand that data remains within secure environments, especially as sensitive financial and proprietary information comes into play. As Feldman notes, S&P Global’s approach to data privacy—ensuring no data is retained post-processing—helps meet this need without compromising analytical power.
Maximizing the success of LLMs requires a balanced investment in both high-quality data and advanced technology infrastructure. For organizations ready to adopt LLMs, prioritizing standardized, structured data is as crucial as selecting the right tools to preprocess and interpret that data effectively. As the field evolves, the most successful applications will likely be those that combine human expertise with machine intelligence, leveraging LLMs as both analytical partners and strategic amplifiers.
For a more in-depth discussion on where to focus your investments—data or technology—watch the webinar, “Where to Invest to Maximize AI Success: Data or Tech.”
[1] RAND Corporation. (2024). Challenges in AI project outcomes: A study of 65 AI engineers and data scientists. RAND Corporation.