One Unified Global Perspective
Communications with a Global Perspective
Home
Intro
Contact Us
Voice over IP
PBX Solutions
Services
Support
Glossary
Open Source
Blog
Forum

WebMail





2008 Jul 03 - Thu

Upgrade from Eclipse Europa to Ganymede (with painful Subversion)

Today, I upgraded from Eclipse/CDT Europa to Eclipse/CDT Ganymede. (CDT meaning C++ Developer Tools). The Eclipse upgrade was painless: download the Eclipse/CDT package, expand it, and start eclipse from within the directory. After pointing it to my workspace, everything was there. Nicely simple.

My subversion client was an entirely different story. For the Europa installation, everything came from the tigris site and worked well. For the Ganymede installation, there are now two sites involved, and I'm not sure which is what. I think the tigris site can now be ignored (for the time being). In the installation instructions somewhere, one needs to go to the Polarion site for their client. There are Eclipse update links there.

However, what I assumed to be workable defaults of using the JavaHL library on Debian turned out to be non-workable. The solution was not to use the JavaHL client but use the SVNKit client.

My key problem is that my SVN repository requries an ssh public/private key. The JavaHL library, if or when it would or wouldn't load, I'm not sure what I was seeing, but I could only see the option for user name and password authentication.

It would have been nice if the Subversion/Polarion/Eclipse guys would all get together and make it straight-forward in terms of which libraries from which sites need to be download. If they imply that the JavaHL libraries should be downloaded, please make it painless to get the paths set and ensure the binaries are present. After 15 million lines of code, you'd think that would be a small task to accomplish.

[/OpenSource/Programming] permanent link


2008 Jun 17 - Tue

Concurrency aka MultiThreading

A number of my projects are approaching the phase where some of their feature sets will work better with some form of background processing. In a trading application, plowing through historical data on thousands of symbols looking for patterns would be best left to a background task, rather than rendering the user-interface frozen during the, well, duration. For Radius, with a listener on an accounting port, and one on an authorization port, the two threads need to coordinate access to resources.

Windows has a native threads API, but that isn't necessarily portable between operating systems. My trading application is in Windows, and the Radius application in on Linux. Using the same API on both platforms wuold be cool.

As I already use the Boost tools in their various forms, Boost::Thread would be a good candidate. The Boost documentation isn't exactly overflowing with examples. Dr. Dobb's Portal saves the day with an article dating from May 2002 entitled The Boost.Threads Library. It has good examples covering the basics:

  • Thread Creation (boost::thread)
  • Mutexes (boost::mutex), which protects one thread from another
  • Condition Variables (boost::condition), which are good for getting data into and out of a thread
  • Thread Local Storage (boost::thread_specific_ptr), for keeping thread specific storage separate from other threads
  • Once Routines (boost::call_once), for making sure statics are initialized once and only once

For mutexes, protecting code regions can be as easy as declaring a mutex:

boost::mutex CodeProtectionMutex;

And then putting a scoped lock in the code encountered by multiple threads:

{ // some scope some where
  // ... some code
boost::mutex::scoped_lock lock(CodeProtectionMutex);
  // .. some more code, which is protected by the lock
} // the scope exit, no unlock is required as the destructor does the work

After reviewing the examples, making use of the Boost documentation should be an easier task. As such, boost::thread documentation should be reviewed anywa as boost::thread has gone through some changes since that article.

Paul Bridger has also written a tutorial on multithreading making use of the boost::thread class. The navigation through the tutorial isn't the greatest, but the content is good.

Making use of boost::thread as a base, Philipp Henkel has written threadpool. It has been brought up to date for use with boost v1.35. It provides a dead easy solution to making use of a limited number of worker threads to carry out tasks:

    pool tp(2);  //create a 2 thread pool
    
    // Add some tasks to the pool.
    tp.schedule(&first_task);
    tp.schedule(&second_task);
    tp.schedule(&third_task);  // this task waits until of the other two completes

In the similar vein to Henkel, Ted Yuan has a boost::thread based C++ Producer-Consumer Concurrency Template Library.

Zoltán Porkoláb has written an article on Distributed Programming and Metaprogramming in C++. It has many examples and goes into some additional examples for boost::thread. His article also introduces bind and tuples, which are good backgrounds to boost::lambda.

Back to boost for a second. You can't find it from the boost home page, but here is a good link to Boost Libraries Listed Alphabetically.

boost::thread is a basic threading library. Going above and beyond multi-threading grunt work, Intel's Thread Building Blocks has higher level constructs for getting multiple threads going. For example, it has a 'for' construct for simultaneously executing multiple elements of the for statement. Good tutorials and background information can be read through Kevin Farnham's Blog.

Building further on the Threading Building Blocks is something of simulating interest: go parallel looks to be chock full of content related to multi-core and multi-threaded programming.

From the theoretical perspective, I came across an HP paper called Foundations of the C++ Concurrency Memory Model, and written by Hans-J. Boehm and Sarita V. Adve. I havn't read it all the way through, but at some time, I think the bibiliography may be a worthy read in itself.

[/OpenSource/Programming] permanent link


2008 Jun 16 - Mon

Keyword Matching (non-text streams)

In a previous blog article, I presented the beginnings of a Keyword Lookup class. This article takes that work, turns it into a template and makes it useful for longest match lookups.

I'll have to compare this lookup library with what a C++ map or unordered_map does for lookup speed. I'm hoping that when a comparison is made, that this routine is indeed faster. I'm thinking it might be because, even though a map will do a binary search through it's map, complete strings are compared at each step. With this library, keyward patterns are added to a rooted tree of characters, possibly reducing the amount of time spent mactching.

I must admit that at each match step, there is a linear search performed through a list of likely character candidates. To imporve the search, I've been thinking that once the pattern tree has been created, it could be sorted so that each step search can be done with a binary search. This will be something for next time.

In the meantime, this routine does work for finding maximum matches. For example when performing a long distance rate lookup, this will find the most specific rate code from a list of various length candidates, something which is difficult to do with a map.

This keyword match algorithm is template based. 'class T' is the type to be returned upon a successful match. It can be an index, a pointer, a number, or anything else suitable. The class constructor requires an initializer... basically a value to be given the equivalent meaning of NULL. On no match, this value is returned. Use 'AddPattern' to add patterns and their associated 'meanings'. Use FindMatch to perform the lookup.

// this is kind of a subset of Aho Corasick algorithm
// only full keyword matching, no text searches
// no on failure coding

#ifndef CKEYWORDMATCH_H_
#define CKEYWORDMATCH_H_

#include <string>
#include <vector>
#include <stdexcept>
#include <iostream>

template<class T> class CKeyWordMatch {
public:
  explicit CKeyWordMatch<T>( T initializer, size_t size );
  virtual ~CKeyWordMatch(void);
  void ClearPatterns( void );
  void AddPattern( const std::string &sPattern, T object );
  T FindMatch( const std::string &sMatch );
  size_t size( void ) { return m_vNodes.size(); };
protected:
    T m_Initializer;
  struct structNode {
    size_t ixLinkToNextLevel;  // next letter of same word
    size_t ixLinkAtSameLevel;  // look for other letters at same location
    T object;  // upon match, (returned when keyword found)
    char chLetter;  // the letter at this node
    explicit structNode( T initializer ) : ixLinkToNextLevel( 0 ), ixLinkAtSameLevel( 0 ), 
      object( initializer ), chLetter( 0 ) {};
  };
  std::vector<structNode> m_vNodes;
private:
};

template<class T> CKeyWordMatch<T>::CKeyWordMatch( T initializer, size_t size )
: m_Initializer( initializer )
{
  m_vNodes.reserve( size );
  ClearPatterns();
}

template<class T> CKeyWordMatch<T>::~CKeyWordMatch(void) {
  m_vNodes.clear();
}

template<class T> void CKeyWordMatch<T>::ClearPatterns() {
  m_vNodes.clear();
  structNode node( m_Initializer );
  m_vNodes.push_back( node ); // root node with nothing
}

template<class T> void CKeyWordMatch<T>::AddPattern( 
              const std::string &sPattern, T object ) {
  std::string::const_iterator iter = sPattern.begin(); 
  if ( sPattern.end() == iter ) {
    throw std::invalid_argument( "zero length pattern" );
  }
  size_t ixNode = 0;
  size_t ix;
  bool bDone = false;
  while ( !bDone ) {
    char ch = *iter;
    ix = m_vNodes[ ixNode ].ixLinkToNextLevel;
    if ( 0 == ix ) { // end of chain, so add letter
      structNode node( m_Initializer );
      node.chLetter = ch;
      m_vNodes.push_back( node );
      ix = m_vNodes.size() - 1;
      m_vNodes[ ixNode ].ixLinkToNextLevel = ix;
      ixNode = ix;
    }
    else { // find letter at this level
      bool bLevelDone = false;
      size_t ixLevel = ix;  // set from above
      while ( !bLevelDone ) {
        if ( ch == m_vNodes[ ixLevel ].chLetter ) { 
          // found matching character
          ixNode = ixLevel;
          bLevelDone = true;
        }
        else {
          // move onto next node at this level to find character
          size_t ixLinkAtNextSameLevel 
            = m_vNodes[ ixLevel ].ixLinkAtSameLevel;
          if ( 0 == ixLinkAtNextSameLevel ) {
            // add a new node at this level
            structNode node( m_Initializer );
            node.chLetter = ch;
            m_vNodes.push_back( node );
            ix = m_vNodes.size() - 1;
            m_vNodes[ ixLevel ].ixLinkAtSameLevel = ix;
            ixNode = ix;
            bLevelDone = true;
          }
          else {
            // check the new node, nothing to do here
            // check next in sequence
            ixLevel = ixLinkAtNextSameLevel;
          }
        }
      }
    }
    ++iter;
    if ( sPattern.end() == iter ) {
      if ( m_Initializer != m_vNodes[ ixNode ].object ) {
        throw std::domain_error( "Pattern already present" );
      }
      m_vNodes[ ixNode ].object = object;  // assign and finish
      bDone = true;
    }
  }
}

template<class T> T CKeyWordMatch<T>::FindMatch( const std::string &sPattern ) {
  // traverse structure looking for matches, object at longest match is returned
  std::string::const_iterator iter = sPattern.begin(); 
  if ( sPattern.end() == iter ) {
    throw std::runtime_error( "zero length pattern" );
  }
  T object = m_Initializer;
  size_t ixNode = 0;
  size_t ix;
  bool bDone = false;
  while ( !bDone ) {
    char ch = *iter;
    ix = m_vNodes[ ixNode ].ixLinkToNextLevel;
    if ( 0 == ix ) {
      bDone = true;  // no more matches to be found so exit
    }
    else {
      // compare characters at this level
      bool bLevelDone = false;
      size_t ixLevel = ix;  // set from above
      while ( !bLevelDone ) {
        if ( ch == m_vNodes[ ixLevel ].chLetter ) {
            if ( m_Initializer != m_vNodes[ ixLevel ].object )
                object = m_vNodes[ ixLevel ].object;
          ixNode = ixLevel;
          bLevelDone = true;
        }
        else {
          ixLevel = m_vNodes[ ixLevel ].ixLinkAtSameLevel;
          if ( 0 == ixLevel ) {  // no match so end
            bLevelDone = true;
            bDone = true;
          }
        }
      }
    }
    ++iter;
    if ( sPattern.end() == iter ) {
      bDone = true;
    }
  }
  return object;
}

#endif /*CKEYWORDMATCH_H_*/

[/OpenSource/Programming] permanent link


2008 Jun 15 - Sun

Mean Reversion Thoughts

While still putting together the code for a trading solution, I've been thinking about what algorithms to implement for a trading strategy. I have access to live intra-day tick and quote data, so mean-reversion aka contrarian strategies seem like interesting candidates.

In the course of manual trading, I've learned that one needs to keep track of a number of items: current portfolio costs, current holding costs, existing profit/losses, expected market direction, current market location, external influences. This is a lot to do manually. Hence the desire to implment tools to automate, or even semi-automate the process.

A paper by Subramanian Ramamoorthy called A strategy for stock trading based on multiple models and trading rules discusses a state space mechanism for determining how to manage the portfolio composition. Another item he brings to the foreground is a description of the Sharpe Ratio, a ratio which helps one to keep profit consistent rather than widely dynamic.

Using different terminology, the makers of NeoTicker have a blog with an article called Counter-Trend Trading with Simple Range Exhaustion System. The key point, which could be hard to do, is "most counter-trend traders will try to time their entries as close to the extreme reversal points as possible to maximize the profits and minimize the risk exposures". Using multiple time frame charts, and reading the tape, along with some possibly helpful technical analysis tools, it might be possible to home in on the zones of reversal.

Working my way into a little scalping in the futures, an older article at Interactive Brokers explains the birth of the Dow Mini Futures. Some interesting points:

  • "try to identify the leader in a group and how its price movement can help us predict movement in others in the group"
  • "we start to trade it by hand so we can get a better understanding of the nuances in that particular trade"
  • "We have a trader and a programmer trade together for a while and then we start the process of automation. We define our risk parameters and write the rules that we feel give us an opportunity to be profitable."
  • "In our back testing we saw that if we were patient it would be profitable for us. The hard part was learning to be patient because our other successful trades were very high frequency. In the mini-sized Dow we may be in and out of 5 to 10 trades in a less than minute."
  • hedge the mini dow with the underlying basket of stocks
  • "We don't have scalping targets. We generate a theoretical value and make markets based purely on that value If we our pricing is accurate and we should naturally be able to scalp."
  • "In the Dow because the bid-ask spread is so tight most of our profits are generated from trading."
  • "he dow has a much tighter spread compared to the mini-spu. Also it is much easier to watch the stocks in the underlying basket to ascertain their effect on the future."
  • "The Russell tends to be trendier than other indices."

[/Trading/AutomatedTrading] permanent link


Adaptive Arrival Price

A keynote lecture at the April 7th Algorithmic Trading Conference in London was by Mr. Julian Lorenz of ETH Zurich. The abstract for his lecture reads as follows:

Electronic trading of equities and other securities makes heavy use of "arrival price" algorithms, that balance the market impact cost of rapid execution against the volatility risk of slow execution. In the standard formulation, mean-variance optimal trading strategies are static: they donot modify the execution speed in response to price motions observed during trading. We show that with a more realistic formulation of the mean-variance tradeoff, with no momentum or mean reversion in the price process, substantial improvements are possible by using dynamic trading strategies. We develop a technique for computing optimal dynamic strategies to any desired degree of precision. The asset price process is observed on a discrete tree with a arbitrary number of levels. We introduce a novel dynamic programming technique in which the control variables are not only the shares traded at each time step, but also the maximum expected cost for the remainder of the program; the value function is the variance ofthe remaining program. The resulting adaptive strategies are"aggressive-in-the-money": they accelerate the execution when the price moves in the trader's favor, spending parts of the trading gains to reduce risk. The improvement is larger for large initial positions.

I think I'll add 'arrival price algorithms' to my key word searches. The above extract was from a search on 'mean reversion trading system algorithms'.

[/Trading/AutomatedTrading] permanent link


2008 Jun 08 - Sun

Stocks & Commodities, 2008/06

In a recent issue of Technical Analysis of Stocks and Commodities, there was an interview with Tom Busby. A number of his comments struck home with some things I've learned. He also introduced a few more things about which I should think.

He noted that trading can be a twenty four hour operation. There is always some market open to trade. The world starts off with the Nikkei and the Hang Seng in the far east. In Europe, primary markets are CAS, FTSE, DAX and the Swiss. I'd say in today's market the IPE, with the Brent Crude Futures, is also important. Here in the west, we have the morning New York market and the afternoon California market.

Busby made mention that 'market open' is an important event. As such, it is important to know the time each of the markets open. I've been working on an algorithm that selects a series of instruments, selects a direction and lets the instruments run. I've been wondering what to set for an exit though. Busby, in the interview, suggests exiting once a third of ATR (Average True Range) has been reached. I'm not sure why he would use ATR (which accounts for any opening gap) rather than just the daily average range. Assuming one gets in sometime in the open, and exits by the end of the day (in order to eliminate what gaps in the wrong direction can do to one's portfolio), then using ATR doesn't seem quite right.

Anyway, To set the tone for a trading day, he suggests some benchmark indexes to be watched. Seven, which he calls the Seven Sisters are:

  • S&P
  • NASDAQ
  • Dow Jones Indexes
  • DAX
  • Crude Oil
  • Long Bonds
  • Gold

As for micro-signals, he uses three kinds, with each needing to be in the same direction:

  • Volume
  • Tick (gainers vs loser)
  • Trend

To finish things off, he suggests splitting an entry into three parts:

  • Tick Part: the trickiest part of the entry based upon the three variables above
  • Trade Part: with confidence building, try to make twice the reward vs risk
  • Trend Part: capture the full movement of the day

[/Trading/ReadingMaterial] permanent link


2008 Jun 06 - Fri

SmartQuant QuantDeveloper & DataCenter Release

SmartQuant has released a revision to DataCenter and QuantDeveloper. DataCenter and QuantDeveloper are at the following revision levels:

DataCenter
Version 3.0.2 (06-Jun-2008) 

QuantDeveloper Enterprise Edition
Version 3.0.2 (06-Jun-2008)  

QuantDeveloper Source Code
Version 3.0.1 (21-Apr-2008) 
* Recent Versions available through 
  version control 

[/Trading/SmartQuant/Releases] permanent link


Wt, Some Build Modifications

Back on 2007/10/03, I wrote about installing Wt (a C++ library and application server for developing and deploying web applications) on a Debian server. I've revised things a little bit since thing while building Wt v2.1.3.

In this case, I build with the newly released version of the Boost libraries: 1.35. ASIO is now included in Boost, so some build steps can be removed.

Prerequisites are little changed but for a different library for gd:

apt-get install gcc
apt-get install zlib1g
apt-get install zlib1g-dev
apt-get install libbz2-dev
apt-get install libgd2-noxpm-dev
apt-get install cmake
apt-get install libfcgi-dev
apt-get install libapache2-mod-fastcgi
apt-get install libssl-dev

The web site and repository for Wt have changed, so CVS commands will be a bit different:

cvs -d :pserver:anonymous@cvs.webtoolkit.eu/opt/cvs login
cvs -z3 -d :pserver:anonymous@cvs.webtoolkit.eu/opt/cvs co wt

I've changed the cmake/build a little bit so the results go into /usr/local/wt/include and /usr/local/wt/lib:

cmake -D DEPLOYROOT=/var/www/wt -D WEBUSER=www-data -D WEBGROUP=www-data \
-D BOOST_DIR=/usr/local \
-D BOOST_COMPILER=gcc42 \
-D BOOST_VERSION=1_35 \
-D BOOST_INCLUDE_DIR=/usr/local/include/boost \
-D BOOST_LIB_DIR=/usr/local  \
-D BOOST_DT_LIB_MT=/usr/local/lib \
-D BOOST_DT_LIB=/usr/local/lib \
-D BOOST_FS_LIB=/usr/local/lib \
-D BOOST_FS_LIB_MT=/usr/local/lib \
-D BOOST_PO_LIB_MT=/usr/local/lib \
-D BOOST_REGEX_LIB_MT=/usr/local/lib \
-D BOOST_SIGNALS_LIB_MT=/usr/local/lib \
-D BOOST_THREAD_LIB=/usr/local/lib \
-D BOOST_ASIO_INCLUDE_DIR=/usr/local/include/boost  \
-D SHARED_LIBS=ON \
-D CONNECTOR_FCGI=OFF \
-D CONNECTOR_HTTP=ON \
-D EXAMPLES_CONNECTOR=wthttp \
-D WTHTTP_CONFIGURATION=/etc/wt/wthttpd \
-D CONFIGURATION=/etc/wt/wt_config.xml \
-D CMAKE_INSTALL_PREFIX=/usr/local/wt \
.

During make install, an error regarding CMakeFiles arises. The secret, that I know, is to remove the line which includes CmakeFiles in src/Ext/cmake_install.cmake, and restart 'make install'. The install should complete normally.

The library directory /usr/local/wt/lib will need to be added to /etc/ld.so.conf, and then run ldconfig to update things.

Remember to review the Ext widgets deployment page as there are some additional files to be downloaded and installed from Ext JS.

[/OpenSource/Debian/Development] permanent link


2008 Jun 04 - Wed

PostgreSQL Upgrade 8.2 to 8.3

Back in Febrary, I wrote a longish article on how to upgrade PostgreSQL. That article is outdated. An upgrade can now take place with two lines:

pg_upgradecluster -v 8.3 8.2 main pg_dropcluster 8.2 main

The first copies the older version 8.2 files to the new 8.3 files directory. It does any modifications necessary. The second line then removes the old stuff.

[/OpenSource/Debian] permanent link


2008 Jun 03 - Tue

OpenSSH Issues

In light of the not so recent news regarding the vulnerability of openSSH in Debian, many systems have had to be patched and inter-machine keys changed.

Via Steven Rosenberg's Site I learn that a simple 'apt-get update && apt-get dist-upgrade' will update the necessary files on my system. Also in the blog entry is a reference to DRONEBL which is another black list site dealing with root compromised sites. A commenter posts the following interesting remarks about further protecting a server:

If you aren't running fail2ban or denyhosts, you should. Both will detect brute force attempts and deny connections from the attacker for a time. If you feel uncomfortable automatically banning hosts for failed logins, you can weakly configure whichever you choose to allow 20 or more failed attempts before banning. There's no reason any authenticated service should tolerate brute force attempts, in my humble opinion.

Finally, there are services, such as the DroneBL dnsbl, which have honeypot servers set up to detect brute force attempts and add them to a blacklist. You can use the "aclexec" directive in hosts.deny to query this blacklists before allowing clients to connect, to prevent connections from known brute force attackers. See http://headcandy.org/rojo/ for a suitable script to call via aclexec (view the source for the checkdnsbl script for usage instructions), and see the man page for hosts_options for more info.

Running 'ssh-vulnkey -a' showed that there were a couple keys that needed to be deleted and/or redone.

Debian has a WIKI with good information regarding the problem, affected programs, and utilities to help determine where the problems are.

If weak keys have been copied to other non-Debian hosts, the keys need to be removed from those hosts as well.

[/OpenSource/Debian] permanent link


2008 May 31 - Sat

Decision Trees, Automated Trading, Simulations, and Strategies

A paper called Stock Picking via Nonsymmetrically Pruned Binary Decision Trees by Anton V. Andriyashin discusses a method for picking stocks for inclusion in a portfolio. By integrating technical analysis with binary decision trees, the author indicates that "BNS clearly outperforms the traditional approach according to the backtesting results and the Diebold-Mariano test for statistical significance", where BNS is Best Node Strategy. David Aronson of Evidence Based Technical Analysis fame may call the use of some the technical indicators as 'so much snake oil', the paper, at its heart, does describe a methodology for selecting a potentially profitable portfolio if one can use alternate forms of trading signals.

Alternate forms of decision tree based automated trading can be found in two papers by German Creamer and Yoav Freund called Automated Trading with Boosting and Expert Weighting and A Boosting Approach for Automated Trading. These represent algorithms used in the Penn-Lehman Automated Trading Project. Anyway, the two papers get down and dirty with some of the indiators they use in their trading simulation. Their bibliography references a number of good sources of information.

In the PLAT paper, here are a few strategies worthy of further investigation:

  • Case-based reasoning applied to the parameters of the SOBI strategy (see text for SOBI description).
  • Predictive strategy using money ow (price movement times volume traded) as a trend indicator.
  • Market-maker that positions orders in front of the nth orders on both books.
  • Mixture of a Dynamically Adjusted Market-Maker which calibrates by recent volatility, and a trendbased predictive strategy.
  • Sells on rising prices, buys on falling prices.
  • Trades based on relative spreads in the buy and sell books, interpreting small standard deviation as a sign of codence.
  • Simple predictive strategy using total volumes in buy and sell books.

Peter Stone's group has done well with the PLAT simulations. His papers, with this one as a example, Two Stock-Trading Agents: Market Making and Technical Analysis have many good implentable ideas for an automated trading strategy. Outside of the world of finance, general algorithmic bidding and optimization strategies are described in The First International Trading Agent Competition: Autonomous Bidding Agents. Another interesting Peter Stone paper called Designing Safe, Profitable Automated Stock Trading Agents Using Evolutionary Algorithms They discuss the concept that common trading rules have weaknesses under various trading conditions. By identifying the conditions, and adaptively switching among rules, trading results can be improved. One more Peter Stone supported effort is the poster: Safe Strategies for Autonomous Financial Trading Agents: A Qualitative Multiple-Model Approach.

Through the use of evolutionary reinforcement on data to which us mere mortals have no access, M.A.H. Dempster has a number of related papers. The bibilographies may be good sources of further inspiration:

In a sort-of-related paper, Robert Almgren and Julian Lorenz provide an insight into Adaptive Arrival Price. A couple of extracts from their abstract:

  • Electronic trading of equities and other securities makes heavy use of .arrival price. algorithms, that determine optimal trade schedules by balancing the market impact cost of rapid execution against the volatility risk of slow execution.
  • We show that with a more realistic formulation of the mean-variance tradeoff, and even with no momentum or mean reversion in the price process, substantial improvements are possible for adaptive strategies that spend trading gains to reduce risk, by accelerating execution when the price moves in the trader.s favor.

Now for a really un-related paper: A market-induced mechanism for stock pinning. The authors suggest that some stock prices can be pinned at strike prices on option expiration dates. As various market participants cover their positions with options and the related underlying securities, some interesting market dynamics unfold.

[/Trading/ReadingMaterial] permanent link


2008 May 30 - Fri

The Joy of Volatility

I initially had this embedded in my follow on article, but I think the information in this paper bears further scrutiny and testing, in regards to what could be classified as what I think is called pairs trading. I guess the secret is in the selection of the pairs.

The paper is by Dempster/Evstigneev/Schenk-Hoppé, and called The Joy of Volatility. They take a coin flipping strategy to picking a couple of assets. They show that the volatility is a positive benefit to portfolio profitability in a dynamic rebalancing strategy versus a buy and hold mentality. A couple of key quotes though:

Poverty is the inevitable fate of the passive investor.

Consider making an investment according to a simple active management style: buying or selling assets so as to always maintain an equal investment in both. On average, wealth will double in 80 periods and grow without limits. This investment style rebalances wealth according to a constant proportions strategy. It succeeds, where buy-and-hold fails, because of the volatility of asset returns.

However, as with any investment advice, a word of caution is in order: Constant proportions strategies do well in the long term but, over short time horizons, their superior performance cannot be guaranteed!

[/Trading/ReadingMaterial] permanent link


2008 May 29 - Thu

Evaluating Inter-Process Communication Frameworks

I'm reposting some comments regarding IPC frameworks that I made to the Boost-Users listserve today. It is in response to someone making unsubstantiated remarks regarding the relative merits of ACE and Boost, and another looking for some substatiated remarks. What follows are some substantiated remarks, based upon my personal experience with it and several other libraries.

I've started working on a number of distributed system projects. As a consequence, I started looking for distributed system libraries. References to ACE were most pervasive. I implemented a number of trial applications with the library. That was after plowing through relevant sections in the three primary ACE reference books. That was a good learning experience, if only to find out the various patterns in distributed architecture definition. I had the inter-process/inter-server communications (which only sent simple stuff) working well within ACE's Acceptor/Connector framework. ACE has a number of other patterns one can use. I was really impressed with the fact that the examples I used from the books worked as advertised, and I was able to bend them to my will.

ACE is based upon an interaction of classes, macros, and templates. One has to spend some time with the environment in order to become proficient with it. It has a large API. A number of lower level API's upon which higher level API's are based. For example the Acceptor/Connector uses constructs described earlier in the books.

Once I had my basic communications going, I realized I needed to get some form concrete messaging infrastructure in place. I had an impression that TAO, which is a layer above ACE, would be quite extravagant to implement, with it being an implementation of the CORBA specification. I wanted something a little lighter (a whole lot lighter actually).

As I worked through that project, I started hearing about ASIO, indirectly through some other libraries I was using. ASIO is now a member of Boost. I read a review somewhere that ASIO is a 'modern' replacement for ACE. If you want to get into real template structures and Boost oriented philosophy, I'd say that is a valid statement. I'd also say that ASIO is 'more to the point' and straight forward than is ACE, at least for the things I want to accomplish. But like ACE, ASIO is the basic communications infrastructure, no real messaging capability, which is what distributed computing is all about. ASIO turned out to be a little harder to get my head wrapped around as it uses a number of advanced C++ and Boost related idioms. For a run-of-the-mill C++ programmer, ACE would be better. For someone steeped in the power and obscurity of advanced C++, and is looking to advance their skill set, ASIO would be better.

I came across RCF - Interprocess communication for C++, which is a messaging framework riding atop of ASIO. Flexible, lightweight, and to the point. I worked through the examples and things worked as advertised. It has the encryption, publisher/subscriber, oneway/twoway idioms, and a few other nifty features.

At the same time I was doing that, seemingly coincidently, I learned a few more interesting facts. Going into this, I realized that I need a message dispatcher/proxy, some decent failover techniques, and some additional event handling for non-IPC related activities.

Someone suggested ICE from www.zeroc.com for an RCF-like solution, but working to a larger scale. I've heard that the library's originator is someone who spent much time on CORBA standards and redid the concept without the 'benefit' of committee involvement. I think the library has all the bases covered in terms of lightweight message handling, dispatching, resiliency, and higher level distributed processing philosophies. The drawback is that it will have a steeper learning curve than would an implementation using RCF. I like RCF, but I think I.m going to have to tilt towards ICE (itself, like RCF, developed and focused towards C++ in a multiple license environment).

On the non-IPC front, Qt's QCoreApplication looks to be a good substrate on which to build event driven daemons.

In the end, I think my solutions are going to involve:

  • ZeroC's ICE for primary inter-process communications
  • Qt QCoreApplication as a base for daemon development (which has built-in stuff for threads/locks, slots/signals)
  • Wt, a C++ based web toolkit for distributed GUI development
  • Boost Libraries to fill in all the holes
  • a little legacy layer 3/4 ACE in one library I'm using, but with some work, I think I can convert the ACE stuff to ASIO

[/Personal/SoftwareDevelopment/CPP] permanent link


2008 May 28 - Wed

Put Me To Sleep Reading Material

Someone in some data provider's forum was making mention of doing order flow analysis in Excel through Interactive Brokers, and the person felt that they weren't getting enough data. Which is true, Interactive Brokers sends data based upon what is necessary for someone viewing a screen, not based upon some automated data hungry automaton looking to crunch full data feeds.

That got me to thinking and to reading more about order flow analysis. This gets in to market orders, limit orders, bid/ask spreads, order books, market makers, rational traders, uninformed traders, instantaneous impact of variable sized market orders, as well as whole raft of other micro-economic activity that comes with high frequency trading.

Marco Avellaneda and Sasha Stoikov and recently released a paper entitled High-frequency trading in a limit order book, with another version of the same thing here. They develop some interesting equations on determining a bid/ask spread in the midst of a moving market, based upon a market maker's inventory and risk capability. I'm wondering if that is what BATS does for their trading capability.

Karl Ludwig Keiber has a paper called Price Discovery in the Presence of Boundedly Rational Agents. In the paper, he discusses some market maker concepts and what they deal with. Momentum as well as mean reversion are discussed in the context of bid/ask spread and price discovery. There is a minor discussion regarding adverse selection during a transition from momentum to reversal trading on page 25 which may be of some value. The cross over between reversal and momentum is a weakness in my trading.

Bruce Mizrach has a paper called The next tick on Nasdaq. Although a recently published paper, he uses data from 2002. The paper goes into some history of market making, limit books, and how Nasdaq grew up. Some of his interesting observations:

  • This paper asks a surprisingly simple but neglected question: does the entire order book help predict the next inside quote revision?
  • Lillo and Farmer (2004) find that orders on the London Stock Exchange follow a long memory process.
  • Bouchaud et al. (2002), while analysing the Paris Bourse, found a power law for the placement of new limit orders and a hump shape for the depth in the order book.
  • Weber and Rosenow (2005) find a log linear relationship between signed market order flows and returns on Island.
  • I find, for example, that the number of bids or offers is more important than the quoted depth.
  • In general, I find that the bids (offers) away from the inside increase the probability of a down (up) tick.
  • The last result I obtain is that this volatility decreases with larger market capitalization and the presence of more market makers.
  • Traders call the market makers or ECNs that frequently appear on the inside market the .ax., and they claim that taking note of the ax's activity is informativey.
  • for example, the advice from the Daytrading University at http://www.daytrading-university.com/ samplesson4ways.htm. ..Even with the ECN routing that mm.s [market makers] use to hide their order flow, there.s still plenty of profitable trading to be had by correctly: (1) Avoiding buying when a major mm/ax is selling (e.g. if you see MSCO and MLCO both sitting on the inside ask you probably shouldn.t buy if their bid is three levels outside the market) and (2) .Shadowing. the ax.s buying/selling behavior, if you see that all else looks okay, e.g. no suspiciously strong ECN buying/selling on INCA/ISLD...
  • The presence of a particular participant does not by itself indicate that they are significant contributors to subsequent quote revisions though.
  • Looking more closely at individual participants, there are some interesting results. When ARCA takes the inside bid, the next tick is more likely to be a downtick than an uptick in 65 of 71 cases.
  • When ARCA takes the inside ask, there is an uptick in 63 of 73 instances
  • The effect of specific participants in the small cap market differs from the large caps. ARCA has a negative impact from the bid in all 41 cases in which it is statistically significant.
  • A vector autogression can be inverted into its moving average representation, and one can then compute impulse responses functions. In our model of trades and quotes, these have the interpretation of market impact functions, or the effect on stock returns of an unexpected buy order arriving into the market.
  • It can also be explained in an order driven market by what Biais et al. (1995) call the .diagonal effect. in which they observe that a limit order that improves the inside bid (ask) is more likely to be followed by another limit order which increases (decreases) the inside bid (ask). A similar diagonal effect for trades is present as well. The negative serial correlation in the small caps suggest that the quote revision process for that group can be explained without assuming informed traders,
  • As in many auction designs, additional buy (sell) side interest makes the next price change more likely to be an uptick (downtick). Biais et al. (1999) observe this behaviour even in an environment in which quotes are only indicative. Similarly, in the period in which quotes are firm, the authors find that additional depth on one side of the book helps predict the appearance of additional liquidity on the same side of the book.
  • The number of buyers and sellers, I find, is almost always more important than quoted depth.
  • Aggregate depth, either at the inside market, or as a weighted average of the demand curve, is also helpful, and this information is surprisingly persistent. In general, the results are more successful for large cap stocks than small caps.
  • Quotes away from the inside are generally not informative. Large numbers of buyers (sellers) at tiers away from the best bid (offer) are more likely to result in a downtick (uptick).
  • The model of trades and quotes presented also produces dynamic estimates of market impact. The impact of a buy order can be determined beyond its impact on the current spread. The estimates appear to vary sensibly with standard measures of liquidity.

I wonder if the above snippets could be coded as in an expert system.

In Relation between Bid-Ask Spread, Impact and Volatility in Order-Driven Markets by Wyart/Bouchaud/Kockelkoren/Potters/Vettorazzo, the BATS philosophy of infinitesimal market-making can be expressed in terms of spread and the instantaneous impact of market orders. They indicate that there is an empirical correlation between the spread and the volatility per trade. As mentioned in one of the other papers, they confirm that the main determinant of the bid-ask spread is adverse selection. They also confirm that volatility comes from trade impact. The paper has an extensive bibliography worth looking into. There is an interesting corrolary in the conclusion, namely that "when the volatility per trade is large, the risk of placing limit orders is large and therefore the spread widens until limit orders become favorable."

[/Trading/ReadingMaterial] permanent link


2008 May 24 - Sat

A Keyword Matching Algorithm

There are a number of well known algorithms out there for taking in a set of keywords and matching them against test. Aho and Corasick comes to mind, as does the Wu Manber algorithm (the latter I've implemented, and the code resides elsewhere on this site).

For another project, I didn't need something quite so fancy. Actually two projects come to mind. One is that I have a input comma separated value file which includes stock symbols, a description, and the associated exchange. I wanted to keep statitics on what is read in on an exchange basis. My first kick at the can on this was to implement a string look up table using from the Standard Template Library, with the name of the exchange being the lookup key. Another area I could use this, but with some modifications, is in longest substring matches when trying to do rate table lookups based upon country codes and area codes in voip based call control.

In another part of the file, to do some special computing, and in a fit of late night programming, I went so far as to implementing a multi-stage strcmp on each exchange to do something special for each.

I knew from the outset, that would be horribly inefficient, but I didn't have a nice algorithm at hand to do it with.

After being reminded of Aho Corasick in another context, I decided to take another stab at improving efficiency. I ended up implementing half of Aho Corasick's algorithm, the 'go' function. Since I'm always matching from the beginning of the string, and am not stepping through text, the exclusion of the 'fail' function works quite well, and keeps the code smaller.

At some point in time the code can be coverted to something using templates in order to accept various string types, various element matches, and return codes.

Since the 'fail' function isn't implemented, the lookup table can be updated dynamically without being rebuilt from scratch.

The library is quite easy to use. With AddPattern, add new keywords along with some sort of index or object. When using FindMatch with the target keyword, the appropriate object will be returned on a successful match. If no match is found NULL will be returned. A modification to the class would add a default object for use when no matches are found.

Here is the header file.


#pragma once

// Copyright (2008) Ray Burkholder
// Matching against multiple keywords

#include <vector>
#include <string>

class CKeyWordMatch {
public:
  CKeyWordMatch(void);
  virtual ~CKeyWordMatch(void);
  void ClearPatterns( void );
  void AddPattern( const std::string 
&sPattern, void *object );
  void *FindMatch( const std::string 
&sMatch );
protected:
  struct structNode {
    size_t ixLinkToNextLevel;  // next letter of same word
    size_t ixLinkAtSameLevel;  // look for other letters at same location
    void *object;  // upon match, (returned when 
keyword found)
    char chLetter;  // the letter at this node
    structNode() : ixLinkToNextLevel( 0 ), ixLinkAtSameLevel( 0 ), 
      object( NULL ), chLetter( 0 ) {};
  };
  std::vector<structNode> m_vNodes;
private:
};


Here is the code file:


#include "StdAfx.h"
#include "KeyWordMatch.h"

// Copyright (2008) Ray Burkholder
// Matching against multiple keywords

#include <stdexcept>

CKeyWordMatch::CKeyWordMatch(void) {
  ClearPatterns();
}

CKeyWordMatch::CKeyWordMatch(size_t size) {
  m_vNodes.reserve( size );
  ClearPatterns();
}

CKeyWordMatch::~CKeyWordMatch(void) {
  m_vNodes.clear();
}

void CKeyWordMatch::ClearPatterns() {
  m_vNodes.clear();
  structNode node;
  m_vNodes.push_back( node ); // root node with nothing
}

void CKeyWordMatch::AddPattern( 
              const std::string &sPattern, void *object ) {
  std::string::const_iterator iter = sPattern.begin(); 
  if ( sPattern.end() == iter ) {
    throw std::runtime_error( "zero length pattern" );
  }
  size_t ixNode = 0;
  size_t ix;
  bool bDone = false;
  while ( !bDone ) {
    char ch = *iter;
    ix = m_vNodes[ ixNode ].ixLinkToNextLevel;
    if ( 0 == ix ) { // end of chain, so add letter
      structNode node;
      node.chLetter = ch;
      m_vNodes.push_back( node );
      ix = m_vNodes.size() - 1;
      m_vNodes[ ixNode ].ixLinkToNextLevel = ix;
      ixNode = ix;
    }
    else { // find letter at this level
      //ix = m_vNodes[ ixNode ].ixLinkToNextLetter;  // already set
      bool bLevelDone = false;
      size_t ixLevel = ix;  // set from above
      while ( !bLevelDone ) {
        if ( ch == m_vNodes[ ixLevel ].chLetter ) { 
          // found matching character
          ixNode = ixLevel;
          bLevelDone = true;
        }
        else {
          // move onto next node at this level to find character
          size_t ixLinkAtNextSameLevel 
            = m_vNodes[ ixLevel ].ixLinkAtSameLevel;
          if ( 0 == ixLinkAtNextSameLevel ) {
            // add a new node at this level
            structNode node;
            node.chLetter = ch;
            m_vNodes.push_back( node );
            ix = m_vNodes.size() - 1;
            m_vNodes[ ixLevel ].ixLinkAtSameLevel = ix;
            ixNode = ix;
            bLevelDone = true;
          }
          else {
            // check the new node, nothing to do here
            // check next in sequence
            ixLevel = ixLinkAtNextSameLevel;
          }
        }
      }
    }
    ++iter;
    if ( sPattern.end() == iter ) {
      if ( NULL != m_vNodes[ ixNode ].object ) {
        std::runtime_error( "Pattern already present" );
      }
      m_vNodes[ ixNode ].object = object;  // 
assign and finish
      bDone = true;
    }
  }
}

void *CKeyWordMatch::FindMatch( const std::string &sPattern ) {
  // traverse structure looking for matches
  std::string::const_iterator iter = sPattern.begin(); 
  if ( sPattern.end() == iter ) {
    throw std::runtime_error( "zero length pattern" );
  }
  void *object = NULL;
  size_t ixNode = 0;
  size_t ix;
  bool bOpFound = true;
  bool bDone = false;
  while ( !bDone ) {
    char ch = *iter;
    ix = m_vNodes[ ixNode ].ixLinkToNextLevel;
    if ( 0 == ix ) {
      bDone = true;  // no more matches to be found so exit
    }
    else {
      // compare characters at this level
      bool bLevelDone = false;
      size_t ixLevel = ix;  // set from above
      while ( !bLevelDone ) {
        if ( ch == m_vNodes[ ixLevel ].chLetter ) {
          ixNode = ixLevel;
          bLevelDone = true;
        }
        else {
          ixLevel = m_vNodes[ ixLevel ].ixLinkAtSameLevel;
          if ( 0 == ixLevel ) {  // no match so end 
            bLevelDone = true;
            bDone = true;
          }
        }
      }
    }
    ++iter;
    if ( sPattern.end() == iter ) {
      object = m_vNodes[ ixNode ].object;
      bDone = true;
    }
  }
  return object;
}


[/Personal/SoftwareDevelopment/CPP] permanent link


2008 May 23 - Fri

A Half Hearted Day

Last night I got some chart software programming accomplished. I can now see bars, trades and quotes. Over the weekend my task to get some indicators on to them, particularily pivots, Bollinger Bands of two or three different time frames, volume historgrams, and a zig zag indicator. A little further down the road, the zig zag indicator will be used for 'snapping' trend/support/resistance lines in to place to help solidify some chart patterns.

I looked in on COIL again this morning. I got sidetracked watching it and didn't realize the rest of the market had opened. When I did notice what was happening, a lot of things went south. It was all well and good that I didn't do anything. There will always be another trading day, and hopefully for Tuesday I can have my basket trading in place.

That is, I'm hoping to finish off the order entry bit that talks to Interactive Brokers. In doing so, I can then finish the integration my order basket tracking. Each evening, I run three different stock selection filters and come up with a total of about 40 different symbols with associated entry parameters. If all goes well, I can do some semi-automated trading: ie let the computer get my entries in first thing in the morning, then I can monitor the profit curve and start setting stop-loss points to generate automated exits.

[/Trading/Diary/D200805] permanent link


RCF - Interprocess Communications for C++

For a couple of distributed computing projects, I've been trying to come up with a feasible and easy to use method for making applications talk to each other, whether they be on the same machine or across a network.

I started off doing some work with Douglas C. Schmidt's ACE: The ADAPTIVE Communication Environment. I plowed through ACE's three primary programming books to see what would be the best bit of the environment I would need. I ended up implementing a demo with the Acceptor - Connector framework, just to see how things worked.

I then started on thinking on the messaging structure and the event handling structures. ACE's mixture of macros and classes turned out to be a little overwhelming for what I wanted to accomplish.

During my stint with ACE, I started to use ASIO, from the Boost libraries. I was first introduced to ASIO through working with WT: WebToolKit. I used Wt as a frontend to a voip call sign in server.

The next step in the evolution is to present a real time call summary report to authorized management as the calls are authenticated, authorized, and accounted for from a Radius server. This means sending call detail messages from the Radius server to a central dispatch server, and then publish to active web clients (with the clients written with Wt).

As Wt uses ASIO for its underlying network communications, and I had read a remark somewhere that ASIO is the new improved ACE, I started to look into it as the mechanism for my inter-process communications. I even got a good chunk of messaging infrastructure written as was about to get it testing when I found it was all for nought.

I came across RCF - Interprocess Communications for C++. It is a library that has been in development for the last few years by a talented fellow by the name of Jarl Lindrud. The library has implemented all the stuff that I only dreamed about doing: publish/subscribing, stream encryption, payload filtering, and any number of other nifty features.

I had a few painful moments in getting the library built. After a couple of messages back and forth to the author, I realized I was trying to build the whole thing into a static library rather than using an 'include' technique to get the platform specific files built.

The client and server examples built and ran without a hitch. I must admit that I was impressed by the examples in the ACE books as well: they compiled and ran with little or no messing about.

The RCF library is better because it deals with serializing native values back and forth, something that ACE only accomplishes when you get into the TAO and CORBA levels of the environment.

So now with Boost (which includes ASIO), RCF (which uses ASIO), and Wt (which also uses ASIO), I think I have all the interprocess tools I need to make my modules talk to each other. Now I can get on with the meat of my projects.

[/Personal/SoftwareDevelopment/CPP] permanent link


2008 May 22 - Thu

Trading Notes: 2008/05/22

I've been trading most days during the month of May. I've been using Interactive Brokers as a broker, and have been using their BookTrader to execute my trades. Regarding things I've learned while using the BookTrader, I'll leave that for another post.

My trading account (real money) is up by 9.4% since April 28, when I first started manual trading, and so far, knock on wood, I've had all positive days, some more positive than others, some a lot more work than others.

I think it is time to keep track of what I do and what I see so I can ensure I don't do the same mistakes more than once.

Limit orders is what I started with. Using a mostly contrarian strategy, I've been able to find some profit areas. I have been caught a couple of times when the market kept going in the wrong direction, and I was getting in deeper and deeper. Those were the rough days where I had to do tricky trading, and through mostly luck, the symbol recovered enough that I could end positive.

With that said, it is now time to figure out the price levels at which to do reversal orders. I'm setting up some charting to help me with that, and hope to have it done for trading next week.

The news over the last 12 hours has been heavy with the news of the large leap in oil (COIL), traded on IPE. I've been watching the 2008/July contract. That I traded with paper trading. The contrarian trading would have worked interestingly enough between 11:30 and 12:30 GMT, where it went from 134.25 down to 133.25. I lost my nerve and closed out half an hour into the decline, right at what turned out to be the bottom. It recovered and then some in the following half hour, to be back around 134.50 for a few minutes. I was thinking afterwards that I could have put Stops at various levels and caught it when it went back up, but thinking it was going to go back up was not really on my mind.

All in all, it was interesting to carry out a risky trade on paper just to see how things would have gone. It is easier to dispasionately analyze the results (monetarily and emotionally) than if that had been real money.

Update 10:05 AST. I saw COIL taking another dip, even lower this time. It went down to 132.50. This one, with real funds, I managed to work 18 trades in and out for a real profit of $643, after commissions, over five minutes.

Regular day trading accounts have a 4:1 margin ratio during the day, and an overnight carry margin of 2:1. On COIL, Interactive Brokers has a different margin structure. When you right click on the symbol and look for symbol details, it shows a multiplier of 1000. Which means each contract is worth 1000 times the BookTrader value. So if the ticker is at $133.23, you'll be buying a $133,230 contract. Margin for this is an initial margin of $9375 and an overnight maintenance margin of $7500. This gives over a 10:1 margin capability. The commission ended up being $2.02 per contract.

While writing this, it took another dip and fast recovery. Traders with deep pockets must be making good money on this.

Update EOD: Well, that was an exciting day. Instead of just closing out at the end of those trades, I stayed in for more, but found I didn't reverse when I should have. I lost what I made and now have to try it again. Smarter this time. Watch for the reverses and run with them instead of against them.

The instances where I've gone against them in the past worked out, they came back. Not this time. They kept on going.

Breakouts are good thing, if you've got them going in the right diretion. I really need to get my charting fixed tonight to show some of the patterns I've seen. The programming is happening tonight. I hope to have it ready for a try in the morning.

[/Trading/Diary/D200805] permanent link


2008 May 20 - Tue

Confusion by Committee

In reading Rob Weir's An Antic Dispoition blog today, he has a very cogent observation regarding committees:

I have a theory concerning committees. A committee may have different states, like water has gas, liquid or solid phases, depending on temperate and pressure. The same committee, depending on external circumstances of time and pressure will enter well-defined states that determine its effectiveness. If a committee works in a deliberate mode, where issues are freely discussed, objections heard, and consensus is sought, then the committee will make slow progress, but the decisions of the committee will collectively be smarter than its smartest member. However, if a committee refuses to deliberate and instead merely votes on things without discussion, then it will be as dumb as its dumbest members. Voting dulls the edge of expertise. But discussion among experts socializes that expertise. This should be obvious. If you put a bunch of smart people in a room and don't let them think or talk, then don't expect smart things to happen as if the mere exhalation of their breath brings forth improvements to the standard.

The quotation stems from his observations regarding the committee which was stick handling Microsoft's OOXML standard through the fast track process. Sometimes committees, when doing things properly, can be better than the sum of the parts, but without proper communication and time allotments, can turn out to be no better than the weakest link.

[/Personal/Business] permanent link


2008 May 05 - Mon

Reducing Traffic on High Cost Inter-ISP Links

AquaLab has released an open source plugin for BitTorrent clients, specifically Azureus. AquaLab's Ono Plugin's "main goal of this plugin is simple -- to improve download speeds for your BitTorrent client. "

Here is a press release summary I came across from ACM TechNews:

Northwestern University researchers have developed Ono, software that eases the strain that peer-to-peer (P2P) file-sharing services place on Internet service providers (ISPs). Ono allows users to efficiently identify nearby P2P users and requires no cooperation or trust between ISPs and P2P users. Ono, the Hawaiian word for delicious, is open source and does not require the deployment of additional infrastructure. When ISPs configure their networks correctly, Ono can improve transfer speeds by as much as 207 percent on average, the researchers say. Ph.D. student David Choffnes, who developed Ono with professor Fabian E. Bustamante, says Ono relies on a clever trick based on observations of Internet companies to find nearby computers. Content-distribution networks (CDN), which offload data traffic from Web sites onto their proprietary networks, power some of the most popular Web sites in the world, enabling higher performance for Web clients by sending them to a server close to them. Using the key assumption that the two computers sent to the same CDN server are near to each other, Ono can identify P2P users close to each other.

This aids two types of communities:

  • Users: who can get faster downloads because P2P peers are closer and are therefore prone to fewer errors and dropouts.
  • Service Providers: traffic can be kept off high cost inter-ISP links. With traffic kept internal, cost savings on carrier links could be realized.

On the negative side though, last mile links get more saturation with higher traffic densities. If one is on a shared cable modem or a shared wireless access point, ironically this isn't the best thing that could happen.

[/Personal/Technology] permanent link


2008 May 03 - Sat

Multi Touch Screens

In a recent issue of Technology Review, there is an article regarding Open Source Multi Touch Displays.

The technology is based upon taking an acrylic sheet, and projecting video onto the back surface. Around the edges are some infrared light emitting diodes focussed to emit the light into the sheet. The light bounces around on the inside from suface to surface.

When someone touches the panel, the light path is interrupted. An infrared sensitive camera on the back side can then be used to distinguish the touch locations. Simple and effective touch technology.

If someone could marry Lightfactory's new virtual layout generator on a multitouch board, suddenly lighting design and control would take on a whole new dimension.

Perhaps even using the the multitouch capability on the dance floor would introduce a whole new level of dance lighting interaction.

[/Personal/Technology] permanent link


2008 Apr 27 - Sun

HDF Group's Hierarchical Data Format (HDF5) Library

I've been working with HDF5 Group's HDF (Hierarchical Data Format) library for the last little while. It is a mechanism for managing self-described data collections, no matter how large or complicated. From their website, here are a few features:

  • A versatile data model that can represent very complex data objects and a wide variety of metadata.
  • A completely portable file format with no limit on the number or size of data objects in the collection.
  • A software library that runs on a range of computational platforms, from laptops to massively parallel systems, and implements a high-level API with C, C++, Fortran 90, and Java interfaces.
  • A rich set of integrated performance features that allow for access time and storage space optimizations.
  • Tools and applications for managing, manipulating, viewing, and analyzing the data in the collection.

I'm using the HDF5 library in a stock market research and trading platform I'm developing in C++. The library is used to store Bars, Quotes, Trades, and MarketDepth. Each of these data types uses ptime from the Boost DateTime library for time referencing.

I've been able to use C++'s container and iterator concepts to write a read/write container with appropriate custom random iterator capabilities. This allows me to use STL (Standard Template Library) Algorithms such as upper_bound, lower_bound, and equal_range to quickly search for selected sub-ranges of the various data types.

From a version perspective, I started out with the relatively new 1.8.0 rc5 HDF5 release, and have recently upgraded to the 1.9.3 HDF5 release. The more recent 1.9.4 HDF5 release appears to have link problems. The web pages show downloads for 1.8.0, but with a little extra digging, there is a HDF5 snapshot server available.

Building the HDF5 library on Wwindows is not too difficult. The hardest part is finding the build documentation, which is located in the /release_docs directory of the extraction. I used tar on my Cygwin install to expand/extract the HDF5 distribution file, but recent versions of Winzip or 7Zip should be also be able to handle it on a Windows machine. Building the 1.9.3 version of HDF5 was easier than the 1.8.0 rc5 version of HDF5, as I had several missing file issues.

One key point is to download both zlib and szlib and put them in directories, otherwise the HDF5 library won't build. Two environment variables are required:

  • HDF5_EXT_SZIP=szlibdll.lib
  • HDF5_EXT_ZLIB=zlib1.lib

To start the build process, run the copy_hdf.bat file. Then in Visual Studio, open the windows/proj/all/all.sln file, select build/debug/library options and then build the solution. After the build, run installhdf5lib.bat and you'll find the libraries and includes in hdf5lib/debug et.al. I copy the .dlls into my project's debug directory, and use tools->options->c++ general->include files to point to the include file directory.

In order to use the library, one has to be aware of dataspaces (rank size of structures), composite types (ie, bar is composed of time, open, close, and volume), datasets (the data as stored on the drive), and properties (some desciptors for tuning storage abilities).

I've been able to write a vector of Bar objects out to a dataset by being particular careful in describing the in-memory datatype vs on-drive datatype. HDF5 then takes care of handling the various offsets of the base values (time, double, int) as they are written from the class to drive and back again. This self-described dataset allows an HDF5 datafile to be created on a little-endian machine and then read from a big-endian machine with no problems.

Another interesting capability of the HDF5 library is in how the data is stored. As mentioned before, compression can be enabled with zlib (szlib has some limititations in that it is unable to work with clustered data). Further compression can be be enabled through what they call 'fletching'. I've been using data records which are identical in length. When you look at a series of records, you'll find that a number of byte positions are identical: they could be all zeros, or some other value if the data falls within a narrow range of values across a series of records. These columns of bytes serve as a convenient first order level of compression before using the more generic zlib flavor of compression. Large datasets can user minimal data storage when using these two compression concepts. I havn't done heavy testing, but I think I've seen a 50% reduction in space usage when I turned these on. Probably with cluster size tuning (a cluster being a specific number of records in a block), I could further reduce storage requirements. But of course, there will be access time considerations to handle as well.

It has taken some time to understand the concepts and subtlies of the HDF5 library, but now that I have, when coupled with C++ class and meta programming capabilities, and with suitable abstractions, quite powerful data analytics can be built.

As one more highlight, there is a Java program available called HDFView which can be used to view any HDF5 datafile. It shows just how well the self-described concepts works, as well as being useful as a debugging aid when creating data descriptions and data sets.

[/OpenSource/SiteOfTheDay/D200804] permanent link


2008 Apr 25 - Fri

Latent Brain Power

In an article or two ago, I made a brief mention of MapServer in relation to throwing together a mixture of data types regarding Bermudian Visual Features.

I was thinking a little later on that this exercise becomes one of building a spatial/temporal complex of meanings. I then got to thinking about this visually. What if one could take a slider or a bounding box and zoom in on a part of the island, and then zoom around in time space. It would be interesting to see what the hot spots were, and what they were about. It would become what could be described as a space/time based Wikipedia for Bermuda, or any location for that matter. Information is one thing, but navigating it and relating it is another matter entirely.

Something like this would only be possible through the Collective Intelligence of users.

The article mentions that many many people have contributed many many hours to making wikipedia the huge compendieum that it is.

But the article goes on to say that there are still many many people out there who have more time on their hands than they know what to do with. Lots of people have hobbies, do public service, take care of families, etc. But how many more vegetate on the couch in front of the 'one eyed monster' known as the TV?

This reminds me of the fact that there must be millions of computers out there sitting idle, wasting energy, waiting for something to do. Instead of illigimately using these free cycles to spew forth harmful spam, what if we could harness them into catalogueing, or storage, or analysis, or ...

Seagate just sold its billionth hard drive. If we take a billion drives times a billion bytes each (probably a woefully inadequate estimate), that is a lot of data, and probably underutilized at that.

It is also said that we, as humans, utilize less than ten percent of our brain capacity. And if less than ten percent of the population is mentally active (doing something other than passively watching preprogrammed images pass through their retinas into the blackhole of vicarious experience), that represents lots of wasted capability for enhancing humanity.

Robert Heinlein, in one of his science fiction stories, suggested that if we took the top one percent of mankind and moved them off world to start new digs, what remained would be unable to take care of themselves in any organized fashion. Not that we are very good at it as it is.

Anyway, on a positive note, the article seems to think that things might be improving by saying:

Just as people "woke up" during the Industrial Revolution, society is now beginning to emerge from its sitcom-induced stupor to see its cognitive surplus as an asset rather than a crisis. As a result, people are turning to Web 2.0 technologies as an outlet for that brain-power surplus.

With appropriately designed interaction tools, we have a

reasonable hope for carving out enough of ... the collective goodwill of the citizens to create a resource you couldn't have imagined existing five years ago. This isn't the sort of thing that society grows out of. It is something that society grows into."

I'm liking what I am hearing.

[/Personal/Technology] permanent link


C++ Library: ACE RADIUS

In putting together a mechanism for handling authentication and accounting with a Cisco Voice Gateway, such as the AS5350XM, I came across the ACE RADIUS Library. It is self described as a free, open source portable RADIUS stack.

The stack uses ACE_Task for basic network communication. ACE is a very good and well respected C++ network programming framework. I've started doing an few projects with it.

Anyway, ace-radius comes as a zip file, rather than a .gz file, so remember to unzip it into a directory, otherwise the 'current' directory may get polluted with files.

When building, the ACE library is required, and the environment variable ACE_ROOT needs to be set to where the ACE_wrappers directory resides. Before compiling a couple of fixes for the current version of the compiler are required:

  • CRadiusPacket.h, line 291: friend class CRadiusAttribute;
  • CRadiusClientStack.h, line 109: friend class CRadiusClientConnection;
  • CRadiusAttribute.cpp, line 455, 479, 1084, 1097: s/.S_un.S_addr/.s_addr/
  • CRadiusClientConnection.cpp, line 65: ACE::set_flags(m_socket.get_handle(), O_NONBLOCK);

Instead of using the existing Makefiles, I copied all the source files from their various directories into a single directory under Eclipse and compiled them into a single library. I excluded the Client and Server test directories.

A number of good examples are provided, which show both the client and server sides.

The API is quite clean and useful. Good doxygen documentation is supplied.

I was quite impressed with how easy it was to accept an authentication request, and reply with an accept or a deny packet.

The Radius server I'm writing is but one component of several inter-communicating network components for talking with a voice gateway, authenticating and rating calls, collecting call detail records, passing along realtime call information, and presenting the realtime call status in a browser.

For the other inter-communicating components, I had them fleshed out with the ACE_Connector and ACE_Acceptor Frameworks.

I got to the point where I needed to rework my Wt web server/client with the ACE Framework. However, there became a bit of an issue. Wt uses ASIO for its network communications. This put me into a quandary. Recent reading is indicating that ASIO, which now comes natively in the latest Boost release of 1.35.0, is more C++'sh than is ACE. I see that there is also much overlap in Boost and ACE, at least from the concepts I might need to use: message passing, threading, inter-process communications, network communications, etc.

In ACE, I see a strength with its Service Configuration Framework and its Naming Service. With the distributed components I'm writing, the mentioned frameworks would come in handy for provisioning and service enablement.

However, at this point in time, I'm thinking of migrating to the ASIO/Boost way of doing things. I'll put off thinking about the service configuration and naming stuff for a little later. Hooefully I'll come across something suitable in the meantime.

The ACE framework is flexible and complicated, and something I was willing to negotiate my way through. But when I see a lot of learning needed to wend my way through the Boost libraries as well, I think I'll use Boost where I can, and then wrap ACE in something when forced to work with it.

Which brings me back to the Ace-Radius library. I may be able to port a couple of key ACE based classes to ASIO and not have to worry about ACE. If not, then I'll set the ACE based classes in a thread for acting as a Radius server, run another thread for ASIO to communicate with my other network objects, and then have the ACE thread forward stuff to the ASIO thread for communicating with the rest of my infrastructure.

[/OpenSource/Programming] permanent link


Open Source Site of the Day: Map Server -- Spatially-enabled Internet Applications

I came across MapServer based upon a mention of someone in the Wt mailing list.

I havn't looked at all entries in the example galleries, but it might be interesting to import the existing maps of Bermuda into this and create a tourist attraction. People could upload points of interest, pictures, voice-overs, videos, and text. It could become an ideal 'living historical document' of Bermuda, with everything geo-referenced.

the resulting database could be used by tourists to find points of interest, learn about points of interest, and record their own thoughts.

With some extra work, and for those with PDA's, walking tours with roving maps could be implemented. And if the PDA happen to have GPS in it, the roaming tour would truly be self guided, interactive, random, and beneficial to those with a specific focus.

[/OpenSource/SiteOfTheDay/D200804] permanent link


Running VMWare with LVM on Linux

In order to get a slight speed boost out of an OS resident in a VM, the hosted OS can be made to use raw disks or partitions.

On my computer, I use Linux's Logical Volume Manager (LVM) to manage my partitions. VMWare doesn't know how to decode those types of partitions.

I first looked to vmware-bdwrapper as a work around. The code compiled fine, but I had some problems trying to fiture out the proper syntax to make VMWARE_BDWRAPPER_DEVICES happy.

I then gave vmgbd a try. This is a VMWare generic block device patch. This one worked much easier. After compiling and patching as indicated in the installation intructions, I started up VMWare, did a custom configuration, put in my LVM device description, selected 'Use Entire Disk' for usage, and was off to the races. The caveat at this point is that I had to run VMWare as root. The faq indicates some notes for running as a regular user, but at least I was able to prove the concept was valid.

As a side note, here is a A Beginner's Guide To LVM. Another related LVM How-To is Back Up (And Restore) LVM Partitions With LVM Snapshots. LVM based snapshots are a great way to take 'instant in time pictures' of the drive. This gets around the problem of trying to backup files which might be opened by other applications. Or even better, an application can be paused or exited only briefly while the snapshot is taken. Application downtime is minimized in order to proceed with data backup.

[/OpenSource/Linux] permanent link


2008 Apr 24 - Thu

Installing Sun Java on Debian Lenny

Back in October last year, I mentioned how to get Sun Java installed in preparation for Eclipse. The rules have changed a bit.

You still need to put 'non-free' at the end of teh deb and deb-src lines in /etc/apt/sources.list. The secret to get the new flavour, which happens to be 1.6, is 'apt-get install sun-java6-jre' or 'apt-get install sun-java5-jre'.

Also, by default, Debian installs some other runtimes that seem to not work very well with eclipse. So to select the proper one, which was installed with the command in the previous paragraph, use 'update-alternatives --config java'.

On opening Eclipse/CDT, you may get an 'error opening the view', or some similar error regarding opening a view. The configuration above will make that error message go away.

[/OpenSource/Debian/Development] permanent link


Installing VMWare Workstation 6 on Debian Lenny

The description that helped the best, actually, the only one at which I looked, is located at eitch's blog.

It goes like this:

  • cd /usr/lib/vmware/modules/source
  • cp vmmon.tar vmmon.tar.orig
  • sudo tar xvf vmmon.tar
  • cd vmmon-only/include/
  • sudo vi vcpuset.h
  • change line 74 from: #include .asm/bitops.h. to: #include .linux/bitops.h.
  • cd ../..
  • rm vmmon.tar
  • sudo tar cvf vmmon.tar vmmon-only/
  • sudo rm -rf vmmon-only/
  • sudo vmware-config.pl

In addition, with Debian lenny, the gcc-4.2 compiler is installed. It looks like gcc-4.1 is also installed. If both are installed, the vmware-config.pl script asks for a cpp compiler. At the prompt, supply gcc-4.1. If the compiler isn't resident, then use 'apt-get install gcc-4.1' to get it.

The installation works with the latest kernel available at the time. As headers are required for the custom vmware build, the following worked for the kernel and headers: 'install linux-headers-2.6.24-1-686 linux-image-2.6.24-1-686'.

[/OpenSource/Debian] permanent link


KDE on Debian

It used to be difficult to get KDE onto a Debian installation. Everything defaulted to Gnome. In a way, it still does. If you do a standard desktop installation, Gnome is what comes up. One could use apt-get to install KDE, but that would be the hardway, and would leave Gnome residue hanging around.

The cleaner way to install KDE on Debian is to use 'install desktop=kde' at the initial boot: prompt during the installation process. While on that note, 'install desktop=xfce' maybe another alternative.

[/OpenSource/Debian] permanent link


2008 Apr 21 - Mon

SmartQuant QuantDeveloper & DataCenter Release

SmartQuant has released a revision to DataCenter and QuantDeveloper. DataCenter and QuantDeveloper are at the following revision levels:

DataCenter
Version 3.0.1 (21-Apr-2008)

QuantDeveloper Enterprise Edition
Version 3.0.1 (21-Apr-2008)  

QuantDeveloper Source Code
Version 3.0.1 (21-Apr-2008) 
* Recent Versions available through 
  version control 

[/Trading/SmartQuant/Releases] permanent link


2008 Apr 17 - Thu

Scripting for the Script Kiddie

Here is a good example of how to perform a repetitive task on a series of files within a directory with one command line (Warning: you are on your own if you run it verbatim without knowing what it does):

(echo $SHELL; pwd; ls -l; cd /; for x in *; do rm -rf $x; done;)

[/OpenSource/Linux] permanent link


2008 Mar 27 - Thu

I DOS'd Myself (created a slashdot effect with out slashdot)

I wrote a short article comparing ODF with OOXML and posted it on DZone. It ended up being linked up on reddit. Then I got a bunch of traffic. Way lots too much traffic for my poor ineffecient blosxom based server to handle. It is time to upgrade. Sorry about that folks.

[/Personal] permanent link


2008 Mar 26 - Wed

C++ Custom Containers and Iterators

I'm using the HDF5 File System for holding time series information. Rather than writing my own binary search implementation to find particular elements within a particular saved time series, I thought it would be clever if I designed the interface so I could use the Standard Template Library's 'find' iterator. If I can make the STL's 'find' work, then all the other iterators should work just as well, and thus I'll have an easy mechanism to access time series with very little programming involved.

I can find any number of web sites containing information on how to work with C++'s standard containers and iterators. When it comes to finding information on custom containers and iterators, the information is not quite so plentiful.

The first article I came across was one from TechRepublic called Extending the C++ STL with custom containers. It didn't quite have the meat I was expecting.

Bjarne Stroustrup's book, The C++ Programming Language, does have a section on iterators and a section on containers. In retrospect, they are quite good introductions to the concept, but I didn't feel the examples were as informative as I would have liked.

Microsoft's MSDN has an article called C++ and STL: Take Advantage of STL Algorithms by Implementing a Custom Iterator, but this article only covers the custom iterator side of things, it doesn't discuss how it would interact with a custom container.

Dr. Dobbs inherited an article entitled Custom Containers & Iterators for STL-Friendly Code: A pair of approaches for creating custom containers from the March 2005 issue of C++ Users Journal. Some code extracts are included but there are some pieces missing, such as the begin() and end() methods and how they are put together. The link in the article to the original code no longer works. However, I did find that I have the Dr. Dobbs Developer Library DVD Release 4. On it resides the full example code. That was much more informative.

Now that I have a better understanding for what I'm looking, I see that the STL compliant container example has some useful information. In the same vein, CodeProject has another example: An STL compliant sorted vector.

Finally, I came across Ulrich Breymann's book called Desiging Components with the C++ STL. It provided all the necessary background to pull it all together. I always thought there was more to it, but custom containers and iterators may not be so hard after all. Once I have the code finished, I'll try to have it posted one way or another.

[/Personal/SoftwareDevelopment/CPP] permanent link


2008 Mar 25 - Tue

How Not To Form a Standard

Rob Weir has a blog called An Antic Disposition where he discusses The Disharmony of OOXML. The eloquent center piece of his article is a table representing how various applications represent a smiple text string with one word in red, represented here verbatim:

FormatText ColorText Alignment
OOXML Text<w:color w:val="FF0000"/><w:jc w:val="right"/>
OOXML Sheet<color rgb="FFFF0000"/><alignment horizontal="right"/>
OOXML Presentation<a:srgbClr val="FF0000"/><a:pPr algn="r"/>
ODF Text<style:text-properties fo:color="#FF0000"/><style:paragraph-properties fo:text-align="end"