
The Transformative Power of Alternative Data To Source Untapped Alpha

The transformation underway in investment decision-making is being led by the quest for untapped alpha in new data sets that are generated every second, en masse. As the volume and frequency of alternative data grow and the machine learning (ML) technologies used to assess it mature, this transformation will only accelerate.

Traditionally, systematic funds relied on technical data to gain an information edge, whilst discretionary funds used fundamental data. Today, these distinctions are fast disappearing as both fund types blend multiple data sets, including alternative data such as asset sentiment in social media, news and blogs, to make investment decisions.

Do all social media and news sentiment detection efforts produce reliable investment signals? No, they do not. Generating quality investment signals depends on several factors: evaluating the source of an event report to validate its credibility; evaluating event reports related to an asset for materiality; comparing intraday sentiment to historical levels at the asset and asset/event level; and relating sentiment to signals from more traditional data sets to make alpha-generating decisions. Machine learning technologies can help make these assessments in real time. Randomly listening to anyone who may be tweeting about a stock, for example, and merely looking for positive or negative words to detect sentiment within a message is unlikely to produce reliable investment signals.
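One of the factors above, comparing intraday sentiment to historical levels, can be illustrated with a simple z-score. This is a minimal sketch under stated assumptions, not Sentifi's method: the function name `sentiment_zscore`, the sample values and the choice of a z-score as the comparison are all hypothetical.

```python
from statistics import mean, stdev


def sentiment_zscore(history, current):
    """Return how many sample standard deviations `current` sits from
    the mean of `history` (hypothetical helper, for illustration only)."""
    if len(history) < 2:
        raise ValueError("need at least two historical observations")
    sigma = stdev(history)
    if sigma == 0:
        # No historical variation: treat the reading as not abnormal.
        return 0.0
    return (current - mean(history)) / sigma


# Hypothetical daily sentiment scores for one asset, then today's reading.
history = [0.10, 0.12, 0.08, 0.11, 0.09]
print(round(sentiment_zscore(history, 0.25), 2))
```

A large absolute value flags sentiment that is unusual relative to the asset's own history, which is a different question from whether a single message contains positive or negative words.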

Sophisticated sentiment analytics do not simply capture the emotions associated with an event report for an asset. Rather, they help overcome alpha decay in traditional data sets by evaluating shifts in the perceived risk of an asset, country or region, the degree of abnormal shifts in attention, and sentiment towards an asset, and by enabling benchmarking in real time. Real-time data is critical because it improves the speed and accuracy of investment decisions.

Traditional data sets can have time lags and lower frequency, whilst sentiment around an asset in social media is generated and accessible in real time. Incorporating these analytics in machine learning allocation models can surface optimal allocation levels to deliver above-benchmark returns.

A clear example of this is seen in Illustration 1. Sentiment scores, attention and attention buzz (above-normal sentiment levels), along with derivative sentiment analytics (e.g. attention-weighted sentiment score), for constituents of the S&P 500 were aggregated alongside S&P 500 prices. A machine learning model was then trained on those alternative and traditional data sets to optimise allocation levels to the S&P 500.
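To make the idea of an attention-weighted sentiment score concrete, the sketch below aggregates constituent-level sentiment into a single index-level figure, weighting each constituent by its attention. The article does not give Sentifi's formulas, so the weighting scheme, the ratio used for buzz and the names `attention_weighted_sentiment` and `attention_buzz` are assumptions made for illustration.

```python
def attention_weighted_sentiment(constituents):
    """Aggregate (sentiment_score, attention) pairs into one score,
    weighting each sentiment score by its share of total attention."""
    total_attention = sum(attention for _, attention in constituents)
    if total_attention == 0:
        return 0.0
    weighted = sum(score * attention for score, attention in constituents)
    return weighted / total_attention


def attention_buzz(current_attention, normal_attention):
    """Ratio of current attention to its normal level (assumed definition);
    values above 1 indicate above-normal attention."""
    if normal_attention <= 0:
        raise ValueError("normal_attention must be positive")
    return current_attention / normal_attention


# Hypothetical constituents: (sentiment score, attention count).
constituents = [(0.5, 100), (-0.5, 300)]
print(attention_weighted_sentiment(constituents))
print(attention_buzz(300, 100))
```

Weighting by attention means a heavily discussed constituent moves the aggregate more than a rarely mentioned one, whereas an unweighted average of the two scores above would be zero.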

Over 480 rule-based strategies were created with corresponding backtests. Of these, 80% outperformed the S&P 500. The best strategy (rebalancing every two weeks within 3 days of the Sentifi S&P 500 sentiment signal) delivered cumulative returns of 104.30%, versus 38.17% for the S&P 500 over the same period (excluding transaction costs).
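The cumulative-return comparison can be sketched as follows. This is a toy illustration rather than the backtest described above: the period returns and allocation levels are invented, uninvested capital is assumed to earn nothing, and transaction costs are ignored. The helper names `cumulative_return` and `apply_allocations` are hypothetical.

```python
def cumulative_return(period_returns):
    """Compound a sequence of simple period returns into one cumulative return."""
    growth = 1.0
    for r in period_returns:
        growth *= 1.0 + r
    return growth - 1.0


def apply_allocations(benchmark_returns, allocations):
    """Scale each period's benchmark return by the allocation held for that
    period. Allocations must be set before the period starts, otherwise the
    backtest has look-ahead bias. The uninvested remainder is assumed to earn 0."""
    if len(benchmark_returns) != len(allocations):
        raise ValueError("returns and allocations must have the same length")
    return [a * r for a, r in zip(allocations, benchmark_returns)]


# Hypothetical per-period benchmark returns and signal-driven allocations.
benchmark = [0.05, -0.10, 0.04]
allocations = [1.0, 0.0, 1.0]
strategy = apply_allocations(benchmark, allocations)

print(f"benchmark: {cumulative_return(benchmark):.2%}")
print(f"strategy:  {cumulative_return(strategy):.2%}")
```

In this toy case the strategy outperforms only because it is out of the market during the losing period. A real backtest would also need to account for transaction costs, which the figures quoted above exclude.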

Illustration 1: Outperformance of the benchmark with Sentifi S&P 500 sentiment data (26-Feb-2017 to 28-Oct-2020)


Source: Sentifi. Sentifi data goes back to February 2015. The start date of 26 February 2017 represents two years of training the machine learning model on Sentifi data before the backtest begins.

Whilst fundamental data in earnings reports continues to be an important factor in investment decision-making, alternative data highlights when the perception of an asset changes due to evolving market conditions. For example, material sourcing is not traditionally considered a common factor affecting stock prices in the healthcare sector. However, during Covid, as vaccine development became the priority, social media captured the changing perception of healthcare stocks based on a company’s perceived ability to develop a Covid vaccine quickly and to source materials for that development.

Equally, whilst economic data can be a reliable gauge of growth, employment and prices during times of economic stability, the global pandemic has shown that higher-frequency insights from social media, news and blogs around country-level Covid restrictions surface the anticipated impact on sectors and companies, closing the time lag in traditional data sets.

Throughout history, sentiment has been a driver of asset valuations. However, as data in social media, news and blogs proliferates, tapping alpha means capturing momentum shifts in these media.


Marina Goche is CEO at Sentifi


The views expressed in this article are those of the author and do not necessarily reflect the views of AlphaWeek or its publisher, The Sortino Group


© The Sortino Group Ltd

All Rights Reserved. No part of this publication may be reproduced, stored in a retrieval system or transmitted in any form or by any means, electronic, mechanical, photocopying, recording or scanning or otherwise, except under the terms of the Copyright, Designs and Patents Act 1988 or under the terms of a licence issued by the Copyright Licensing Agency or other Reprographic Rights Organisation, without the written permission of the publisher.