The Ultimate Guide to Alternative Data 2021
Alternative data is rapidly becoming the main identifier of untapped alpha in today’s stock market. Investors and hedge fund managers are using alternative data to analyze factors which determine the potential risk and return of investing in a financial entity. Traditional sources of information about financial entities can only provide a limited depth of investment insights. Using an alternative data set allows traders to analyze portfolios and funds at a granular level of detail, with information from a huge range of data categories and industries.
What is alternative data?
Alternative data is an umbrella term which refers to any information gathered about a financial instrument which doesn’t come from conventional sources. Conventional sources include SEC filings, media reports, press releases and financial records. Data derived from any other sources, including ESG, news sentiment, location, satellite, weather and credit card data, is considered ‘alternative’. It’s a data category relevant to multiple use cases, but alternative data is primarily used in finance to gain unique insights on a company’s expected performance over time. Alternative data providers supply investors with competitive analytics into investment opportunities. Alternative data sources can be used to enrich an investor’s pre-existing datasets to beat average market returns and unlock new sources of alpha.
What size is the alternative data market?
The alternative data market is growing - spending on alternative data is set to climb from $232 million in 2016 to a projected $1.7 billion by the end of 2020. This explosive growth is thanks to private-equity firms, hedge funds, principal investors and other financial service providers turning to alternative data to gain an edge over their competitors.
This growth in demand for alternative data is being reflected in supply. On Datarade’s marketplace, we’re seeing more and more alternative data providers and vendors offering their ready-to-use datasets, and we’re updating our list of alternative data products on a daily basis.
What are some examples of alternative data?
The attributes of alternative data can be separated into three categories: data related to people and user input, data related to business processes and transactions, and data collected by sensors.
Alternative data examples include:
User Input - Social Media Sentiment, News Sentiment, Web Traffic, App Usage, Survey
Alternative data providers use web scraping tools to estimate public sentiment based on content from social media platforms and news websites. Web traffic and app usage data are other examples of alternative data. They indicate consumer intent and behavioral patterns. These insights are valuable for marketers and investors to make timely decisions and accurate forecasts. Alternative data providers aggregate all consumer information so that it’s privacy-compliant, and ready-to-use.
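As a minimal sketch of the kind of lexicon-based scoring a sentiment provider might run over scraped text (the word lists and scoring scheme here are illustrative, not a production lexicon):

```python
# Illustrative lexicon-based sentiment scoring for scraped text.
# Real providers use far richer lexicons and ML models; this only
# shows the shape of the signal.

POSITIVE = {"beat", "growth", "strong", "record", "upgrade"}
NEGATIVE = {"miss", "decline", "weak", "lawsuit", "downgrade"}

def sentiment_score(text: str) -> float:
    """Return a score in [-1, 1]: +1 all positive, -1 all negative."""
    words = text.lower().split()
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    if pos + neg == 0:
        return 0.0
    return (pos - neg) / (pos + neg)

print(sentiment_score("record growth but lawsuit looms"))  # 2 pos, 1 neg -> ~0.33
```

Aggregated over thousands of posts or headlines per ticker per day, even a crude score like this becomes a time series that can be back-tested against returns.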
Business Processes - Credit and Debit Card, POS, Consumer Receipts, E-Receipts, Supply Chain Relationships, Industry Data
The attributes of alternative data for investment and finance are often linked to business processes. Consumer transaction and POS data reveals how a product, brand or company is performing in terms of sales. This then provides indicators for traders about investment profitability.
Similarly, supply chain relationships are insights derived from the fulfillment records of toll stations, freight airports, and maritime ports. Supply chain alternative data indicates how well-connected a distribution network is and what the rate of trade between locations is. This can be used by manufacturers and haulage companies for global vessel tracking, and to streamline supply line operations. Investors can also use it to predict stock performance.
In the automotive industry, car dealerships prepare quarterly reports that reveal sales numbers. Car insurance policy data, obtained through a partner company, can predict these numbers very accurately. This kind of alternative data is only available through an alternative data vendor, and allows financial institutions to buy or sell shares before any public announcement has been made.
Sensors - Geospatial, Satellite, Weather
Alternative data with attributes referring to location and geography is collected via various sensors and signals. These attributes are linked to a variety of use cases, from finance decisions to environmental monitoring.
Satellite images can be monitored closely to predict how a crop is going to perform over the course of a year. A prediction about market supply of the crop can then be made. This will in turn affect demand, and determine the prices of various products made out of these crops. Crop yields are largely dependent on environmental factors like bush fires, droughts, and cloud coverage, which are of interest to environmental analysts.
This makes satellite data a relevant and vital source of information for companies that manufacture products out of these crops. It is also crucial for the stocks of companies that depend on these raw materials.
What are the sources of alternative data?
Again, there are 3 main sources of alternative data: user input, business processes and sensors. Across the alternative data market, these sources are used by providers to collect clean data and offer it via datasets and APIs.
Alternative data from the online activity of individual users is collected through search engines like Google, Bing, and Yahoo. It’s also collected from social media sites and e-commerce stores using web scraping tools.
As a source of alternative data, user input is most useful when it’s collected at scale. The more consumer profiles in a dataset, the more comprehensive the insights will be.
Credit and debit card transactions, sales records, insurance records, and data collected by government agencies all refer to alternative data derived from business processes. It’s sometimes called ‘exhaust data’, as this type of data is usually a by-product of standard business processes. Business process data is highly valued by companies, but it can be expensive depending on collection methods and licensing fees.
Alternative data collected via businesses processes is usually highly-accurate because it tends to be collected digitally and in real-time.
As the Internet of Things progresses at a rapid rate, sensors are becoming more prevalent across the world. CCTV, POS systems, beacons, electric toll gate transmissions, and various smart home devices are all sensors which constantly emit signals and provide data for analysis.
Satellite data imaging is improving every day and the accuracy and definition of geo-locator devices is also on the rise.
This type of alternative data is crucial for monitoring weather patterns. Satellite data is also important for analyzing consumer shopping behavior and for vessel tracking.
Though sensors offer impressive scale and accuracy, the alternative data collected from them is not always easy to analyze and put into immediate action. That’s where alternative data providers offer a solution. An alternative data provider will deliver datasets which are integration-ready, machine-readable, and instantly actionable.
Assessing the quality of financial alternative data is challenging due to 3 major factors.
First, transparency and credibility. Understanding the value of data collected from a given source requires a track record, yet some datasets have no research or history behind them, making it difficult to predict whether they can be used effectively in financial decision-making and services. Moreover, much of the information collected at the most basic level – say, from product sales or services – cannot be translated directly into investment signals. There is no way of knowing in advance how valuable this data will be for tradable securities.
Second, relevance and processing. Most data collected from alternative sources arrives in an unstructured format and needs substantial processing through natural language processing or other AI-driven techniques before it becomes intelligible. These datasets are also often limited in history: a lot of data must be collected before proper back-testing is possible, so vendors must wait until a sufficient historical archive has accumulated for the data to become relevant. Because the data is unstructured, a great deal of formatting is also required to produce content with integrity, in a format useful to the companies investing in it. All these factors make collection and processing difficult, and vendors must meet these demands to have credibility.
Third, capacity. Capacity determines how much you can actually trade, and whether the investment is worth the insight received from data providers. Niche data that only covers a small number of stocks in sectors like technology, healthcare, or retail can be of limited value, and the more users target these niche datasets, the less valuable they become because of arbitrage. Vendors must therefore also have sufficiently sophisticated data models to detect these trend changes.
How is alternative data collected?
As alternative data seeks to be a unique and non-traditional source of information, the collection methods vary quite radically:
Alt. insights via Web scraping
Web scraping or web harvesting is the practice of collecting data from various websites on the internet. Scrapers or bots visit various web pages and download relevant information which is then processed through a collection of text processing functions.
This information can then be extracted and exported to a spreadsheet, or transformed into a form that is easy to understand. Web scrapers can extract contacts and other details from a page and export them as Excel sheets or in other formats.
Web scraping is widely used in lead generation, market analysis, price comparison, and competition monitoring, gathering data from multiple sources for analysis.
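The extraction step described above can be sketched with the standard library alone. A real scraper would fetch pages over HTTP (e.g. with urllib or requests) and respect robots.txt; here the HTML is inline so the example is self-contained:

```python
# Minimal sketch of a scraping extraction step: pull link targets out
# of an HTML page using only Python's standard library.
from html.parser import HTMLParser

class LinkExtractor(HTMLParser):
    """Collect href targets from anchor tags."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href":
                    self.links.append(value)

page = ('<html><body><a href="/earnings">Q3 earnings</a>'
        '<a href="/contact">Contact</a></body></html>')
parser = LinkExtractor()
parser.feed(page)
print(parser.links)  # ['/earnings', '/contact']
```

In practice the extracted fields (links, contacts, prices) are then written to a spreadsheet or database, exactly as described above.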
Acquisition of raw data
Raw data is a collection of data that must be processed before it can be used, as it is unintelligible in its original form. Sensors are a great source of raw information, which must be cleaned of noise, interference, and other contaminants before it can be used effectively to gather market intelligence.
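One of the simplest cleaning steps applied to raw sensor readings is smoothing, which dampens point noise before the series is analyzed. Real pipelines use far more robust filters; this sketch only illustrates the idea:

```python
# Sketch of noise-dampening for a raw sensor series: a trailing
# moving average pulls one-off spikes back toward the local mean.

def moving_average(values, window=3):
    """Smooth a numeric series with a trailing window mean."""
    smoothed = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed

raw = [10.0, 10.2, 30.0, 10.1, 9.9]  # one noisy spike at index 2
print(moving_average(raw))  # the 30.0 spike is damped toward its neighbors
```

The choice of window trades responsiveness against noise suppression, which is why vendors tune cleaning parameters per sensor type.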
3rd party licensing
Some companies can get licenses for collecting exhaust data. This is the data that is a by-product of a business process. Different companies can have different rates for selling licensed exhaust data such as POS transactions, debit or credit card transaction details, etc. This data is then processed into a structured format that is sold to various companies. Major players in this field include organizations like Quandl, YipitData, and iResearch.
What are the challenges with alternative data?
As stated above, the collection of quality and valuable alternative data is perhaps the main challenge. There are various sources of alternative data extraction.
Non-traditional data sources – collecting digital exhaust from web traffic, or gathering logistics data that quantifies a company’s shipping activities.
Unstructured data sources – data collected from web scraping, social media, surveys, etc. This data is highly unstructured, and significant investment needs to be made in processing it through machine learning or natural language processing.
Aggregated transactions – This is financial transaction data, which has high licensing fees.
Satellite imaging and geolocation data – collected from various sensors.
The issue common to all these sources is collection. Not only are the collection methods expensive, but they also require a lot of computing power. Around 2.5 exabytes of data are generated each day, which requires huge storage servers, processing capacity, computing power, and analytical resources. And this is not a fixed amount: the amount of data created doubles roughly every 40 months, so collection will always remain the biggest problem with alternative data.
Alternative data is unstructured
Since the collection methods are varied and the volumes are huge, this data is also very loosely structured. It cannot be received and integrated in its raw form, and it cannot be utilized until vendors provide a highly processed, quality version of the original data.
Some of this data is more structured than the rest, with patterns that make it easier to categorize. However, the bulk of unstructured data – audio, video, or social media content, for example – has no patterns or labels that allow easy categorization.
This unstructured data cannot be consumed without transformation through various analytical platforms. This could mean expensive proprietary algorithms, advanced technology, a combination of multiple data sources that transform the data into a structured format, etc.
The main challenge is narrowing down the data, cutting the noise and interference, enhancing the connection of various data points. All this also needs to be done transparently and dynamically which instills confidence in investors.
Incomplete or unverifiable data
The problem with unstructured data is that it also tends to be incomplete. For example, the slightest gap in a time-dependent series can make the complete dataset useless for historical analysis, and the unstructured nature of the data also makes quantitative analysis difficult.
Structured datasets gathered from users’ website activity over time, by contrast, are easier to implement in a model, and easier to test and turn into insights that could be valuable for trading. For unstructured data to become useful, a deep archive is needed that can support quantitative analysis.
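The gap problem described above is straightforward to check mechanically. As a sketch, assuming a daily series with one observation per calendar day:

```python
# Sketch of a completeness check for a daily time series: list the
# calendar days missing between consecutive observations. A single
# gap like this can disqualify a dataset for historical back-testing.
from datetime import date, timedelta

def find_gaps(dates):
    """Return calendar days missing between consecutive observations."""
    missing = []
    for prev, curr in zip(dates, dates[1:]):
        step = prev + timedelta(days=1)
        while step < curr:
            missing.append(step)
            step += timedelta(days=1)
    return missing

observed = [date(2021, 1, 1), date(2021, 1, 2), date(2021, 1, 5)]
print(find_gaps(observed))  # the missing 3rd and 4th of January
```

A real check would also account for weekends, holidays, and the expected sampling frequency of the particular dataset.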
The data might be of limited use
Incomplete data has a very limited scope. In one recent project, building a scoring model outside the US, it was discovered that alternative data contributed 8% of the model’s ROC.
ROC is a measure of how powerful a predictive model is: the higher the ROC, the better the accuracy. Since alternative data contributed 8%, it cannot be considered completely useless.
However, this is small compared to the 92% of predictive power that came from data sourced by vendors using traditional means of collection, which limits the applicability of alternative data severely.
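The "ROC" figure discussed above usually refers to the area under the ROC curve (AUC): the probability that a randomly chosen positive case receives a higher model score than a randomly chosen negative one. A pairwise sketch of that definition:

```python
# Sketch of AUC (area under the ROC curve) computed directly from its
# probabilistic definition: compare every positive/negative score pair,
# counting ties as half a win. Library implementations (e.g. in
# scikit-learn) are faster but equivalent.

def roc_auc(labels, scores):
    """AUC via pairwise comparison of positive and negative scores."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

labels = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.6, 0.2]
print(roc_auc(labels, scores))  # 0.75: one of four pairs is misordered
```

An AUC of 0.5 is no better than chance, so a data source "contributing 8%" is measured by how much it lifts this number above the baseline model.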
Privacy concerns can be crucial
Most of the alternative data sources and collection methods we have discussed are a result of commercial activities by users. This presents a huge concern related to user security.
Customers can have severe security concerns. Previously, there were only cursory restrictions on the usage of third-party data, and originators operated with much more freedom over their internal datasets.
With the General Data Protection Regulation (GDPR), the EU has greatly tightened data protection for consumers as of 2018. This is the case in many other jurisdictions as well, and violators face severe financial penalties.
Potential lack of transparency among data providers
Even if an investment firm has managed to procure a dataset that is deemed desirable, the problem of sourcing remains. The alternative data sector is still new, and data firms or owners can be fairly inexperienced.
They also use proprietary algorithms to process the data that are not transparent, which is a concern for many investment firms.
Alternative data collection also involves crossing a lot of bureaucratic barriers with the companies that own the raw data.
How to assess the quality of alternative data?
Alternative data quality can be evaluated against the following attributes:
Uniqueness: each real-world entity should appear only once in the dataset. Consider collecting the health records of 100 patients from a hospital and finding over 100 rows: either irrelevant data has entered the dataset, or records have been replicated for some patients.
Business requirements and decisions depend on how well the data has been analyzed, and any duplication can skew results and produce inaccurate outcomes. Ideally, 100% uniqueness is desirable; in a real-world scenario, it can be difficult to achieve with alternative data.
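The uniqueness check described above is easy to automate. A sketch, with an illustrative record layout:

```python
# Sketch of a uniqueness check: flag identifier values that appear
# more than once in a dataset where each entity should occur exactly
# once. The field names here are illustrative.

def duplicate_keys(records, key):
    """Return the set of key values that appear more than once."""
    seen, dupes = set(), set()
    for rec in records:
        value = rec[key]
        if value in seen:
            dupes.add(value)
        seen.add(value)
    return dupes

patients = [
    {"patient_id": "P001", "visit": "2021-03-01"},
    {"patient_id": "P002", "visit": "2021-03-02"},
    {"patient_id": "P001", "visit": "2021-03-01"},  # replicated entry
]
print(duplicate_keys(patients, "patient_id"))  # {'P001'}
```

A provider would run a check like this per batch and report a uniqueness ratio alongside the delivered data.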
Timeliness: time has a gigantic impact on data. Consider previous sales or product launches – scenarios where collected data tends to be accurate only for a certain period. Driving real decisions from data relies not only on gathering correct information, but also on the timeliness of its collection, since the accuracy and value of data fade over time.
The number of traffic accidents five years ago has little relevance when a company is deciding what infrastructure is required now and in the future.
Validity: datasets collected through traditional sources go through a lot of standardized testing. Validity establishes how data items can be traced back to their source, in case there is a need to assess how credible that source is. Data items must be connected to real-world contexts, or the data simply lacks integrity.
Accuracy: the accuracy of a dataset can be determined by comparing it against an established version of the truth. This reference point can be based on various business requirements, and any deviation from it reduces the accuracy of the data items. A data item that accurately reflects the characteristics of real-world objects is credited as accurate.
Consistency: data must align with a preconceived pattern in the majority of scenarios. In a collection of birth dates, for example, the US follows the MM/DD/YYYY format, whereas most of the rest of the world uses DD/MM/YYYY. Any fluctuation in this consistency can render the entire dataset invalid.
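The date-format consistency check just described can be sketched with a parse-and-flag pass. Format strings below follow Python's strptime conventions:

```python
# Sketch of a consistency check: flag entries in a date column that do
# not parse under the dataset's declared format (MM/DD/YYYY here).
from datetime import datetime

def inconsistent_dates(dates, fmt="%m/%d/%Y"):
    """Return the entries that do not parse under the expected format."""
    bad = []
    for d in dates:
        try:
            datetime.strptime(d, fmt)
        except ValueError:
            bad.append(d)
    return bad

column = ["03/14/1990", "12/01/1985", "25/07/1992"]  # last one is DD/MM
print(inconsistent_dates(column))  # ['25/07/1992']
```

Note the limitation: a DD/MM date whose day is 12 or lower (e.g. "05/07/1992") parses under either convention, so format checks catch only unambiguous violations; full consistency needs metadata about the source.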
What are the use cases for alternative data?
ESG Risk Management
ESG performance analysis assigns investment portfolios with an ESG score. Higher scores indicate a higher level of corporate activism and sustainable operations. A financial entity’s market volatility (beta) is often linked to its ESG performance. Increasingly, corporate and private investors are using ESG consensus scores to calculate the risk of an investment based on the likelihood of the entity in question breaching ESG regulations, or having global sanctions imposed on it.
Alternative data for investment purposes
Geographic Revenue Exposure
Geographic revenue exposure is the process of displaying a company’s revenue according to its geographical attributes. In other words, it tracks how the revenue of the company will be affected by geopolitical and macroeconomic factors in a specific country or region. Data points from across alternative datasets can be used to calculate the company’s revenue and identify any trends or risks in the given geographical area.
Alternative data is used to enhance a financier’s equity research procedures and proprietary stock market datasets. The unique insights captured in an alternative dataset allow researchers to build more accurate predictive models and conduct more detailed analytics into a stock or company.
Real Estate Investment
Alternative data is being used more and more often for real estate investment and property valuation. Alternative data providers can offer information about building permits, solar permits, construction starts, and public contracts. This helps property investors spot opportunities which wouldn’t be obvious if they’re analyzing conventional data alone.
An explosion at Austria’s natural gas hub in 2017 impacted the availability and supply of gas all across the EU. This created huge uncertainty in the futures market, with prices skyrocketing.
Dataminr’s clients already had insight into this foreseeable event and had every chance to act before the market moved. The company has received over half a billion dollars in funding across 9 rounds; its Series E round alone raised $391 million.
One of the biggest issues in regulated institutions is the possibility of insider trading, or dumping of shares before any major events. This is very hard to prevent because communications between traders happen over calls, and they are extremely hard to monitor.
Digital Monitoring is an AI-powered company that can process language and deliver human-centric insights to detect risk. Its product monitors a large amount of mostly unstructured data and uses AI to transform it into intelligible data points, so institutions can take informed action.
Traditional tools can be very unpredictable when it comes to determining how the financial market will react to certain current events. The amount of data that has to be taken into consideration is huge. This is where artificial intelligence has come to the rescue of data providers, and buyers.
Kensho is a company that combines artificial intelligence, natural language processing, and GUIs. It uses secure cloud computing models that provide many tools for processing this volume of data.
Kensho is capable of analyzing millions of data points by scanning hundreds of thousands of customized variables, which can be linked to economic reports, earnings releases, or company product launches. The product performs very similarly to a search engine like Google, except it is specialized for financial analysis and allows traders to ask questions in plain English.
Humans can see the difference between images taken before and after a natural calamity has ravaged a place, but it is simply impossible for them to process millions of such images promptly. Yet this kind of data is plentiful and readily available from satellite images, drones, and other aerial vehicles.
This is a lucrative source of information that financial institutions can use to forecast future events, and turn trades into profits.
Orbital is one such company that makes this possible, allowing users to source, process and analyze satellite and geospatial images and other data. Government agencies and businesses can then act on the processed information. The product can also be used in other sectors such as energy and agriculture.
Alternative credit scoring
Banks and fintech companies often face a difficult situation where they do not have a lot of credit information on small business or low-income households.
Without traditional credit scores and histories, lenders file these cases straight into the rejection pile, or charge these applicants interest rates far higher than those available to people with good credit ratings. However, a poor credit history is no definite proof that a company or person will default on a loan or fall into arrears.
Aire is a start-up founded in London in 2014 that created a credit score for such people with the help of alternative data, helping small businesses and individuals qualify for credit. The company employs an algorithmic formula, built with machine learning to imitate human judgement, which generates a score based on the character and capacity of the candidate. Whenever a candidate’s record does not hold enough information to generate a conventional credit score, Aire’s API takes over.
It is integrated into various lending platforms, and it conducts virtual interviews that assess the applicant’s financial maturity, career, potential and lifestyle.
What are the best practices for alternative data use?
Assessing the value of alternative data sets
Quality checks are important when procurement teams are collecting the data. This could be anything from verification of the backtesting models, to the underlying code.
Internal users should also be consulted on the potential value of the data. Although this is a time-consuming task, some tools can help with it, and it should be treated as highly important.
Receiving alternative data or non-traditional data in a standard format remains one of the highest priorities of investment managers. This could be accomplished with various industry-accepted integrations that can send data from one platform to another. The same mechanisms or integrations can be used to get earnings estimates, rating information, and market prices.
Data teams at various institutions are employing analytics languages like Python or R. There are also tools like Excel, Tableau and Matlab that can be used to create a consumption experience and make an analyst’s life much easier. In the end, the goal of these integration methodologies is to make data transmission and handling easier.
Ensuring data quality
The quality of a dataset speaks to how accurate, complete and timely the data is, which determines whether it fits the requirements of an intelligence job. The data owner should be capable of collecting data from the source continuously, and sufficient measures should be taken to ensure the physical collection and delivery of the data.
The data collection practices of the source should also be monitored closely. Mechanisms such as time-stamps can validate the entire collection process, and care should be taken to remove selection bias in order to ensure data quality.
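One concrete form of the time-stamp validation mentioned above is checking that collection timestamps arrive in strictly increasing order, which catches reordered or back-filled records. A sketch with illustrative timestamps:

```python
# Sketch of a time-stamp validation pass: flag positions where a
# record's collection timestamp is not later than its predecessor's.
from datetime import datetime

def out_of_order(timestamps):
    """Return indices where a timestamp is not later than its predecessor."""
    return [i for i in range(1, len(timestamps))
            if timestamps[i] <= timestamps[i - 1]]

stamps = [
    datetime(2021, 6, 1, 9, 0),
    datetime(2021, 6, 1, 9, 5),
    datetime(2021, 6, 1, 9, 3),  # arrived out of order
]
print(out_of_order(stamps))  # [2]
```

Flagged records can then be quarantined or re-sequenced before the dataset is delivered to buyers.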
Dealing with inexperienced data sources
Often, the issue with a dataset is that it comes from an inexperienced source. Owners look to monetize their data without knowing how valuable it is or how it can be used.
Many kits and guidelines now help explain the mechanics of commercializing datasets, and awareness of this is spreading rapidly. Even 5 years ago there was little help available in this sector, but that is changing.
Sell-side and other third-party integrators
Firms want to speak to data sources directly, since direct contact gives them leverage over their competitors. Third-party integrators can do a great deal in identifying and delivering credible datasets.
However, the value of the data can be greatly reduced if everyone can access it, so buyers seeking distribution deals might want to lock in exclusive, one-on-one deals with the sources.
Securing reliable data is hard work
Processing the data into a format that everyone can consume is not a simple task. Additional sources are needed to confirm the validity of many datasets, and some must be anonymized before they can be delivered to investment managers, which makes the process hard to verify. Buyers need to understand the quality of the data they are receiving if they are to have a transparent relationship with vendors.
How is alternative data priced?
There are two categories of buyers of alternative data.
The first category is portfolio managers, who reap a tiny fraction of the alpha from each of many datasets. These funds run diverse strategies, aiming to make a lot of small bets.
The second category is investment managers who reap huge sections of the alpha from a small number of data sets. These funds tend to have concentrated portfolios, which means they are aiming to make large, high conviction bets.
The price of alternative data largely depends on the source, the type of information, the breadth of the dataset, and the buying firm itself.
The budget for alternative data is on the rise. In 2018, nearly two-thirds of the market had zero or near-zero budget for alternative data, but there are now more buyers in the market and demand has surged by 8.8%. In 2018, a quarter of these buyers had a budget just above $1 million; by 2019, this figure had risen to 53%.
The potential value hidden within alternative data is practically incalculable, and with so much data changing every day, any alternative data source could turn into a goldmine at any time.
If you want to stay ahead of the game in the stock market, check out the top alternative data providers on Datarade to start your hunt for the right data for you.
Who are the best Alternative Data providers?
Finding the right Alternative Data provider for you really depends on your unique use case and data requirements, including budget and geographical coverage. Popular Alternative Data providers that you might want to buy Alternative Data from are FinPricing, Gravy Analytics, Alqami, Sensefolio, and QueXopa.
Where can I buy Alternative Data?
Data providers and vendors listed on Datarade sell Alternative Data products and samples. Popular Alternative Data products and datasets available on our platform are Sentifi ESG Alternative Data based Intelligence Analytics (45k global equities) by Sentifi, FACTSET Alternative Data (Global Coverage) - Novel ESG, Market & Business Intelligence by FACTSET, and Alqami Human Mobility Data Global | location data, map data, clustering or contextualisation by Alqami.
How can I get Alternative Data?
You can get Alternative Data via a range of delivery methods - the right one for you depends on your use case. For example, historical Alternative Data is usually available to download in bulk and delivered using an S3 bucket. On the other hand, if your use case is time-critical, you can buy real-time Alternative Data APIs, feeds and streams to download the most up-to-date intelligence.
What are similar data types to Alternative Data?
Alternative Data is similar to Stock Market Data, ESG Data, Merger & Acquisition Data, Proprietary Market Data, and Commodity Data. These data categories are commonly used for Hedge Funds and Alternative Data analytics.