
Newsdai model

Can news trending foreshadow market impact and returns?

TL;DR: I developed a simple model for finding financial memes in news and measuring their rate of propagation. I also tested whether a rising number of news stories about a stock correlates with its return momentum continuing, especially when the return is negative. I called this model Newsd.ai!

Scenario

Here is a practical example. During and immediately after a sudden market move, we need to distinguish between the following scenarios resulting from breaking news about the company or instrument:
  • the price move is instant and atomic, leaving no time to react
  • the price move becomes a trend, as the story keeps developing


The Newsdai model breaks stories into two groups. Most stories are after-the-fact restatements of an event or trivial statements of fact that nonetheless lead to a single "atomic" price move, creating a jump in return. This category leaves no time to form an actionable strategy or trade. If the event happens abruptly and becomes widely published, it may be too late to act on it, or the signal could be too weak to bother with.

Far more interesting is the second category, where the story keeps developing. The timing could be just right for a profitable trade. The Newsdai model identifies those story threads by extracting keywords belonging to an event category from the thread and keeping track of keyword frequency, i.e. term frequency.

The escalation of the event is determined by estimating the rate at which established keywords propagate in the media, along with the effects of that propagation. In essence, Newsdai measures the rate of propagation of the financial memes that are most likely to create a significant market impact on the securities mentioned.
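The keyword tracking described above can be sketched in a few lines of Python. This is only an illustration under my own assumptions, not the Newsdai code itself: the function names, the monthly bucketing, the whitespace tokenizer and the "per 1000 tokens" normalization are all hypothetical choices.

```python
from collections import Counter, defaultdict

def term_frequency_by_month(stories, keywords):
    """stories: iterable of (iso_date, text) pairs, e.g. ("2007-08-01", "...").
    Returns {"YYYY-MM": {keyword: occurrences per 1000 tokens}}."""
    counts = defaultdict(Counter)   # month -> keyword -> raw count
    totals = Counter()              # month -> total number of tokens
    for iso_date, text in stories:
        month = iso_date[:7]
        tokens = text.lower().split()
        totals[month] += len(tokens)
        for tok in tokens:
            tok = tok.strip('.,;:!?"()')  # crude punctuation cleanup
            if tok in keywords:
                counts[month][tok] += 1
    return {
        m: {k: 1000.0 * counts[m][k] / totals[m] for k in keywords}
        for m in sorted(totals) if totals[m]
    }

def propagation_rate(series):
    """Period-over-period change of a term-frequency series."""
    return [b - a for a, b in zip(series, series[1:])]
```

Feeding the per-month values of a single keyword into `propagation_rate` gives a rough measure of how quickly that keyword is spreading; a real pipeline would likely use stemming and a proper tokenizer instead of `str.split`.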

As an example, let's look at the housing crisis and find the market-impacting keywords that were trending 6-12 months before the crisis hit.

The plot below shows the most trending keywords. The horizontal axis is the date and the vertical axis is the term frequency of each keyword, aggregated over news published on the internet.

For perspective, the term frequencies are plotted against a scaled Standard and Poor's index.

The keywords trending in the months before the 2008 crisis (the Lehman bankruptcy happened on Sept 15) were:
  • crunch - expands to "credit crunch"
  • write-down - has to do with banks writing down their investments due to bad financial decisions
  • turmoil - expands to "market turmoil"
  • subprime - expands to "credit subprime"

In the graph below I fit the term frequency curve with a simple polynomial and check where the fitted curve crosses thresholds set at multiples of the standard deviation of the noise. When the curve crosses the high-water mark of the threshold, that indicates danger territory.


From the graph, the threshold was breached in Feb 2008, more than 6 months before the actual crisis.
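The fit-and-threshold check can be sketched as follows. This is a minimal illustration under assumptions of mine: a quadratic fit, a "noise" baseline taken from the first few observations, and a threshold of baseline mean plus a multiple of the baseline standard deviation. The names `polyfit`, `polyval` and `first_breach` are hypothetical helpers written with the standard library only.

```python
from statistics import mean, stdev

def polyfit(xs, ys, degree=2):
    """Least-squares polynomial fit via the normal equations.
    Returns coefficients ordered from the constant term upward."""
    n = degree + 1
    a = [[sum(x ** (i + j) for x in xs) for j in range(n)] for i in range(n)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(n)]
    # Gaussian elimination with partial pivoting
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, n):
            f = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= f * a[col][c]
            b[r] -= f * b[col]
    # Back substitution
    coeffs = [0.0] * n
    for i in reversed(range(n)):
        s = sum(a[i][j] * coeffs[j] for j in range(i + 1, n))
        coeffs[i] = (b[i] - s) / a[i][i]
    return coeffs

def polyval(coeffs, x):
    """Evaluate the fitted polynomial at x."""
    return sum(c * x ** k for k, c in enumerate(coeffs))

def first_breach(ys, baseline_len, n_sigma=2.0, degree=2):
    """Index of the first point after the baseline where the fitted curve
    exceeds baseline mean + n_sigma * baseline stdev, or None if it never does."""
    xs = list(range(len(ys)))
    coeffs = polyfit(xs, ys, degree)
    base = ys[:baseline_len]
    threshold = mean(base) + n_sigma * stdev(base)
    for x in xs[baseline_len:]:
        if polyval(coeffs, x) > threshold:
            return x
    return None
```

Fitting the curve first and thresholding the fit, rather than the raw series, keeps a single noisy month from triggering the alarm.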

Of course, those examples are unrealistic in terms of time frame: almost no one forms a market view or enters a trade months before the event.

To test the Newsdai meme propagation rate technique, I created a simple test.

NewsdAI news-escalation backtesting

In this backtest, I looked at an available corpus of news stories reliably linked to financial instruments, most of them US company stocks.

To test my hypothesis that the NewsdAI model, which identifies trending stories, can also forecast a trend in the stock price, I developed the following backtesting strategy. I was looking for the opposite of a "no news is good news" event. I scanned the corpus for a news story about a company followed by a significant market move, followed in turn by one or more stories within a week of the move. That would indicate that the story keeps developing and that the initial market move is likely to continue in the same direction.

There were three types of backtesting strategies: trend, refined_trend and null.

Trend Strategy

  1. Identify all significant market moves exceeding 5% that are preceded by a large (>1000 words) story within the previous 10 business days.
  2. If there is a follow-up to the story (>1000 words), enter a position in the company on the NEXT business day, in the same direction as the preceding market move. If the stock price went up, go long at the closing price of the NEXT business day after the news; if down, short it at the NEXT day's closing price.
  3. Liquidate the position after 10 business days.
This "play" strategy could be refined further by filtering into subcategories, requiring certain keywords that represent the category to be present in the body of the story or in the headline.
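The three steps of the trend strategy can be sketched for a single stock like this. It is an illustration, not the backtest I actually ran: the data layout (a list of daily closes indexed by business day, and stories as `(day_index, word_count)` pairs), the function name `trend_trades`, and the 5-business-day follow-up window standing in for "within a week" are all assumptions.

```python
def trend_trades(closes, stories, move=0.05, min_words=1000,
                 lookback=10, followup=5, hold=10):
    """closes: daily closing prices, one per business day.
    stories: list of (day_index, word_count) for this stock.
    Returns a list of (entry_day, direction, trade_return)."""
    big = sorted(d for d, words in stories if words > min_words)
    trades = []
    for t in range(1, len(closes)):
        ret = closes[t] / closes[t - 1] - 1.0
        if abs(ret) <= move:
            continue
        # Step 1: a large story within the previous `lookback` business days
        if not any(t - lookback <= d < t for d in big):
            continue
        # Step 2: first large follow-up story shortly after the move
        follow = next((d for d in big if t < d <= t + followup), None)
        if follow is None:
            continue
        entry = follow + 1      # NEXT business day after the follow-up news
        exit_ = entry + hold    # Step 3: liquidate after `hold` business days
        if exit_ >= len(closes):
            continue
        direction = 1 if ret > 0 else -1  # long after up moves, short after down moves
        trades.append((entry, direction,
                       direction * (closes[exit_] / closes[entry] - 1.0)))
    return trades
```

Entering at the close of the day after the follow-up story, rather than on the story day itself, avoids using prices that were not yet available when the news came out.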

Refined Trend Strategy

  • Find a list of all the words that create a significant market impact of more than 2.5%, starting from a tokenized and stemmed list of subject, verb and object combinations. Aggregate the impact grouped by keyword and take the top 1500 (positive) and bottom 1500 (negative) keywords.
  • Clean the list; around 2000 keywords will remain.
  • Repeat steps 1 and 2 of the Trend Strategy above, but only enter a trade if either the preceding or the following market-impact story contains a match for a keyword.
  • Similarly, liquidate 10 business days after the trade.
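The keyword selection and the extra entry filter can be sketched roughly as below. The sketch assumes the stories are already tokenized and stemmed, and that "impact" is aggregated as the sum of the returns of the significant moves a keyword appeared with; the post does not pin down the aggregation, so treat that and the function names as my assumptions. The manual cleaning step is left out.

```python
from collections import defaultdict

def keyword_impact(events, min_move=0.025):
    """events: iterable of (stemmed_tokens, market_return) pairs.
    Sums the return of every significant move per keyword."""
    impact = defaultdict(float)
    for tokens, ret in events:
        if abs(ret) > min_move:
            for tok in set(tokens):  # count each keyword once per story
                impact[tok] += ret
    return dict(impact)

def top_and_bottom(impact, n=1500):
    """Keep the n most negative and the n most positive keywords."""
    ranked = sorted(impact, key=impact.get)
    return set(ranked[:n]) | set(ranked[-n:])

def story_matches(tokens, keywords):
    """Entry filter: trade only if the story contains a selected keyword."""
    return any(tok in keywords for tok in tokens)
```

In the refined strategy a trade from the trend strategy would be kept only when `story_matches` is true for the preceding or the follow-up story.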

Null Strategy

To test against the null hypothesis, I also considered the same strategy without any news input:
  • Identify all significant market moves exceeding 5%.
  • If the stock price went up, go long at the closing price of the NEXT business day after the market move; if down, short it at the NEXT day's closing price.
  • Liquidate the position after 10 business days.
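For comparison, the null strategy reduces to a few lines once the news conditions are dropped. As before, this is a sketch under assumed inputs (a list of daily closes for one stock); `null_trades` is a hypothetical name.

```python
def null_trades(closes, move=0.05, hold=10):
    """Follow every large move, with no news filter.
    Returns a list of (entry_day, direction, trade_return)."""
    trades = []
    for t in range(1, len(closes)):
        ret = closes[t] / closes[t - 1] - 1.0
        entry = t + 1          # NEXT business day after the market move
        exit_ = entry + hold   # liquidate after `hold` business days
        if abs(ret) <= move or exit_ >= len(closes):
            continue
        direction = 1 if ret > 0 else -1
        trades.append((entry, direction,
                       direction * (closes[exit_] / closes[entry] - 1.0)))
    return trades
```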
Below is the table with results:

Name | Return | Sample size | Sharpe
trend | 1.6% | 1680 | 0.05
refined_trend | 6.5% | 52 | 1.3
null | -1.2% | 33512 | -0.01



Notice that the third, "null" strategy has a negative total return, which indicates that the general tendency of the market is to recover after large moves, in what is called a reversal pattern.


In the first two strategies, trend and refined_trend, the return is positive and the reversal becomes a momentum pattern. Still, for the plain trend strategy the total return remains practically zero in terms of significance.


Only after refining the strategy by matching keywords such as "approv"; "licens"; "propos"; "clearanc" does the return get anywhere close to being practical.


These keywords were chosen as an example. They usually indicate some sort of deal between the company and a vendor or a regulatory body. They belong to a classification category that the NewsdAI technology assigns dynamically.
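For reference, here is one way the summary columns of the results table could be computed from a list of per-trade returns. The post does not spell out the exact definitions, so this sketch assumes that Return is the mean per-trade return and that Sharpe is the unannualized ratio of mean to sample standard deviation; `summarize` is a hypothetical helper.

```python
from statistics import mean, stdev

def summarize(returns):
    """returns: list of per-trade returns as fractions (0.05 means 5%)."""
    n = len(returns)
    avg = mean(returns)
    spread = stdev(returns) if n > 1 else 0.0
    sharpe = avg / spread if spread > 0 else float("nan")
    return {"return": avg, "sample_size": n, "sharpe": sharpe}
```

With a sample as small as 52 trades, the ratio is sensitive to a handful of outliers, so it is worth reading it together with the sample size.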

Technology stack

The technology I used to extract keywords is mostly based on Python libraries.
The initial prototype is based on:
  • News database with news scraped off the internet and tagged with company codes
  • Front end with a search-like interface
    • kx-based web sockets
    • bokeh server with embedPy
  • kdb database with prices and volumes
  • Python-based NLP
  • Simple pluggable strategies as Python scripts
  • Machine learning starting with simple caching


Conclusion


The Newsdai model of detecting and measuring the escalation of a news trend correlates positively with the market return developing a trend within a time horizon of 10 business days. By narrowing the strategy with additional keyword filters, one can increase the signal and start getting noticeable returns.


Next Steps

Possible next steps to improve the quality of the model are:
  • Pluggable corpus specific to a certain sector or industry
  • Consider intraday effects by analyzing 1-min bar price/volume data
  • Consider negative/positive sentiment of the news
