One Unified Global Perspective
Communications with a Global Perspective
Home
Intro
Contact Us
Voice over IP
PBX Solutions
Services
Support
Glossary
Open Source
Blog
Forum

WebMail





2008 Jun 16 - Mon

Keyword Matching (non-text streams)

In a previous blog article, I presented the beginnings of a Keyword Lookup class. This article takes that work, turns it into a template and makes it useful for longest match lookups.

I'll have to compare this lookup library with what a C++ map or unordered_map does for lookup speed. I'm hoping that when a comparison is made, that this routine is indeed faster. I'm thinking it might be because, even though a map will do a binary search through it's map, complete strings are compared at each step. With this library, keyward patterns are added to a rooted tree of characters, possibly reducing the amount of time spent mactching.

I must admit that at each match step, there is a linear search performed through a list of likely character candidates. To imporve the search, I've been thinking that once the pattern tree has been created, it could be sorted so that each step search can be done with a binary search. This will be something for next time.

In the meantime, this routine does work for finding maximum matches. For example when performing a long distance rate lookup, this will find the most specific rate code from a list of various length candidates, something which is difficult to do with a map.

This keyword match algorithm is template based. 'class T' is the type to be returned upon a successful match. It can be an index, a pointer, a number, or anything else suitable. The class constructor requires an initializer... basically a value to be given the equivalent meaning of NULL. On no match, this value is returned. Use 'AddPattern' to add patterns and their associated 'meanings'. Use FindMatch to perform the lookup.

// this is kind of a subset of Aho Corasick algorithm
// only full keyword matching, no text searches
// no on failure coding

#ifndef CKEYWORDMATCH_H_
#define CKEYWORDMATCH_H_

#include <string>
#include <vector>
#include <stdexcept>
#include <iostream>

template<class T> class CKeyWordMatch {
public:
  explicit CKeyWordMatch<T>( T initializer, size_t size );
  virtual ~CKeyWordMatch(void);
  void ClearPatterns( void );
  void AddPattern( const std::string &sPattern, T object );
  T FindMatch( const std::string &sMatch );
  size_t size( void ) { return m_vNodes.size(); };
protected:
    T m_Initializer;
  struct structNode {
    size_t ixLinkToNextLevel;  // next letter of same word
    size_t ixLinkAtSameLevel;  // look for other letters at same location
    T object;  // upon match, (returned when keyword found)
    char chLetter;  // the letter at this node
    explicit structNode( T initializer ) : ixLinkToNextLevel( 0 ), ixLinkAtSameLevel( 0 ), 
      object( initializer ), chLetter( 0 ) {};
  };
  std::vector<structNode> m_vNodes;
private:
};

template<class T> CKeyWordMatch<T>::CKeyWordMatch( T initializer, size_t size )
: m_Initializer( initializer )
{
  m_vNodes.reserve( size );
  ClearPatterns();
}

template<class T> CKeyWordMatch<T>::~CKeyWordMatch(void) {
  m_vNodes.clear();
}

template<class T> void CKeyWordMatch<T>::ClearPatterns() {
  m_vNodes.clear();
  structNode node( m_Initializer );
  m_vNodes.push_back( node ); // root node with nothing
}

template<class T> void CKeyWordMatch<T>::AddPattern( 
              const std::string &sPattern, T object ) {
  std::string::const_iterator iter = sPattern.begin(); 
  if ( sPattern.end() == iter ) {
    throw std::invalid_argument( "zero length pattern" );
  }
  size_t ixNode = 0;
  size_t ix;
  bool bDone = false;
  while ( !bDone ) {
    char ch = *iter;
    ix = m_vNodes[ ixNode ].ixLinkToNextLevel;
    if ( 0 == ix ) { // end of chain, so add letter
      structNode node( m_Initializer );
      node.chLetter = ch;
      m_vNodes.push_back( node );
      ix = m_vNodes.size() - 1;
      m_vNodes[ ixNode ].ixLinkToNextLevel = ix;
      ixNode = ix;
    }
    else { // find letter at this level
      bool bLevelDone = false;
      size_t ixLevel = ix;  // set from above
      while ( !bLevelDone ) {
        if ( ch == m_vNodes[ ixLevel ].chLetter ) { 
          // found matching character
          ixNode = ixLevel;
          bLevelDone = true;
        }
        else {
          // move onto next node at this level to find character
          size_t ixLinkAtNextSameLevel 
            = m_vNodes[ ixLevel ].ixLinkAtSameLevel;
          if ( 0 == ixLinkAtNextSameLevel ) {
            // add a new node at this level
            structNode node( m_Initializer );
            node.chLetter = ch;
            m_vNodes.push_back( node );
            ix = m_vNodes.size() - 1;
            m_vNodes[ ixLevel ].ixLinkAtSameLevel = ix;
            ixNode = ix;
            bLevelDone = true;
          }
          else {
            // check the new node, nothing to do here
            // check next in sequence
            ixLevel = ixLinkAtNextSameLevel;
          }
        }
      }
    }
    ++iter;
    if ( sPattern.end() == iter ) {
      if ( m_Initializer != m_vNodes[ ixNode ].object ) {
        throw std::domain_error( "Pattern already present" );
      }
      m_vNodes[ ixNode ].object = object;  // assign and finish
      bDone = true;
    }
  }
}

template<class T> T CKeyWordMatch<T>::FindMatch( const std::string &sPattern ) {
  // traverse structure looking for matches, object at longest match is returned
  std::string::const_iterator iter = sPattern.begin(); 
  if ( sPattern.end() == iter ) {
    throw std::runtime_error( "zero length pattern" );
  }
  T object = m_Initializer;
  size_t ixNode = 0;
  size_t ix;
  bool bDone = false;
  while ( !bDone ) {
    char ch = *iter;
    ix = m_vNodes[ ixNode ].ixLinkToNextLevel;
    if ( 0 == ix ) {
      bDone = true;  // no more matches to be found so exit
    }
    else {
      // compare characters at this level
      bool bLevelDone = false;
      size_t ixLevel = ix;  // set from above
      while ( !bLevelDone ) {
        if ( ch == m_vNodes[ ixLevel ].chLetter ) {
            if ( m_Initializer != m_vNodes[ ixLevel ].object )
                object = m_vNodes[ ixLevel ].object;
          ixNode = ixLevel;
          bLevelDone = true;
        }
        else {
          ixLevel = m_vNodes[ ixLevel ].ixLinkAtSameLevel;
          if ( 0 == ixLevel ) {  // no match so end
            bLevelDone = true;
            bDone = true;
          }
        }
      }
    }
    ++iter;
    if ( sPattern.end() == iter ) {
      bDone = true;
    }
  }
  return object;
}

#endif /*CKEYWORDMATCH_H_*/

[/OpenSource/Programming] permanent link


Mean Reversion Thoughts

While still putting together the code for a trading solution, I've been thinking about what algorithms to implement for a trading strategy. I have access to live intra-day tick and quote data, so mean-reversion aka contrarian strategies seem like interesting candidates.

In the course of manual trading, I've learned that one needs to keep track of a number of items: current portfolio costs, current holding costs, existing profit/losses, expected market direction, current market location, external influences. This is a lot to do manually. Hence the desire to implment tools to automate, or even semi-automate the process.

A paper by Subramanian Ramamoorthy called A strategy for stock trading based on multiple models and trading rules discusses a state space mechanism for determining how to manage the portfolio composition. Another item he brings to the foreground is a description of the Sharpe Ratio, a ratio which helps one to keep profit consistent rather than widely dynamic.

Using different terminology, the makers of NeoTicker have a blog with an article called Counter-Trend Trading with Simple Range Exhaustion System. The key point, which could be hard to do, is "most counter-trend traders will try to time their entries as close to the extreme reversal points as possible to maximize the profits and minimize the risk exposures". Using multiple time frame charts, and reading the tape, along with some possibly helpful technical analysis tools, it might be possible to home in on the zones of reversal.

Working my way into a little scalping in the futures, an older article at Interactive Brokers explains the birth of the Dow Mini Futures. Some interesting points:

  • "try to identify the leader in a group and how its price movement can help us predict movement in others in the group"
  • "we start to trade it by hand so we can get a better understanding of the nuances in that particular trade"
  • "We have a trader and a programmer trade together for a while and then we start the process of automation. We define our risk parameters and write the rules that we feel give us an opportunity to be profitable."
  • "In our back testing we saw that if we were patient it would be profitable for us. The hard part was learning to be patient because our other successful trades were very high frequency. In the mini-sized Dow we may be in and out of 5 to 10 trades in a less than minute."
  • hedge the mini dow with the underlying basket of stocks
  • "We don't have scalping targets. We generate a theoretical value and make markets based purely on that value If we our pricing is accurate and we should naturally be able to scalp."
  • "In the Dow because the bid-ask spread is so tight most of our profits are generated from trading."
  • "he dow has a much tighter spread compared to the mini-spu. Also it is much easier to watch the stocks in the underlying basket to ascertain their effect on the future."
  • "The Russell tends to be trendier than other indices."

[/Trading/AutomatedTrading] permanent link



Blog Content ©2012
Ray Burkholder
All Rights Reserved
ray@oneunified.net
(441) 500-7292
Available for Contract Work
Resume

RSS: Click to see the XML version of this web page.

twitter
View Ray 
Burkholder's profile on LinkedIn
technorati
Add to Technorati Favorites



June
Su Mo Tu We Th Fr Sa
16
         


Main Links:
Monitoring Server
SSH Tools
QuantDeveloper Code

Special Links:
Frink

Blog Links:
Sergey Solyanik
Marc Andreessen
HotGigs
Micro Persuasion
... Reasonable ...
Chris Donnan
BeyondVC
lifehacker
Trader Mike
Ticker Sense
HeadRush
TraderFeed
Stock Bandit
The Daily WTF
Guy Kawaski
J. Brant Arseneau
Steve Pavlina
Matt Cutts
Kevin Scaldeferri
Joel On Software
Quant Recruiter
Blosxom User Group
Wesner Moise
Julian Dunn
Steve Yegge
Max Dama

2008
Months
Jun




Mason HQ

Disclaimer: This site may include market analysis. All ideas, opinions, and/or forecasts, expressed or implied herein, are for informational purposes only and should not be construed as a recommendation to invest, trade, and/or speculate in the markets. Any investments, trades, and/or speculations made in light of the ideas, opinions, and/or forecasts, expressed or implied herein, are committed at your own risk, financial or otherwise.