2008 Jul 03 - Thu
Upgrade from Eclipse Europa to Ganymede (with painful Subversion)
Today, I upgraded from Eclipse/CDT Europa to Eclipse/CDT Ganymede.
(CDT meaning C++ Developer Tools). The Eclipse upgrade was painless:
download the Eclipse/CDT package, expand it, and start eclipse from
within the directory. After pointing it to my workspace, everything
was there. Nicely simple.
My subversion client was an entirely different story. For the Europa installation,
everything came from the tigris site and worked well. For the Ganymede installation,
there are now two sites involved, and I'm not sure which is what. I think
the tigris site can now be ignored (for the time being). In the
installation instructions somewhere, one needs to go to the
Polarion site for their client. There
are Eclipse update links there.
However, what I assumed to be workable defaults of using the JavaHL library on
Debian turned out to be non-workable. The solution was not to use the JavaHL
client but use the SVNKit client.
My key problem is that my SVN repository requries an ssh public/private key.
The JavaHL library, if or when it would or wouldn't load, I'm not sure what I was
seeing, but I could only see the option for user name and password authentication.
It would have been nice if the Subversion/Polarion/Eclipse guys would all get
together and make it straight-forward in terms of which libraries from
which sites need to be download. If they imply that the JavaHL libraries should
be downloaded, please make it painless to get the paths set and ensure the binaries
are present. After 15 million lines of code, you'd think that would be a
small task to accomplish.
[/OpenSource/Programming]
permanent link
2008 Jun 17 - Tue
Concurrency aka MultiThreading
A number of my projects are approaching the phase where some of their feature sets
will work better with some form of background processing. In a trading application,
plowing through historical data on thousands of symbols looking for patterns would be
best left to a background task, rather than rendering the user-interface frozen during
the, well, duration. For Radius, with a listener on an accounting port, and one on an
authorization port, the two threads need to coordinate access to resources.
Windows has a native threads API, but that isn't necessarily portable between operating systems.
My trading application is in Windows, and the Radius application in on Linux. Using the same API on both
platforms wuold be cool.
As I already use the Boost tools in their various forms, Boost::Thread would be a good candidate.
The Boost documentation isn't exactly overflowing with examples. Dr. Dobb's Portal
saves the day with an article dating from May 2002 entitled
The Boost.Threads Library.
It has good examples covering the basics:
- Thread Creation (boost::thread)
- Mutexes (boost::mutex), which protects one thread from another
- Condition Variables (boost::condition), which are good for getting data into and out of a thread
- Thread Local Storage (boost::thread_specific_ptr), for keeping thread specific storage separate from other threads
- Once Routines (boost::call_once), for making sure statics are initialized once and only once
For mutexes, protecting code regions can be as easy as declaring a mutex:
boost::mutex CodeProtectionMutex;
And then putting a scoped lock in the code encountered by multiple threads:
{ // some scope some where
// ... some code
boost::mutex::scoped_lock lock(CodeProtectionMutex);
// .. some more code, which is protected by the lock
} // the scope exit, no unlock is required as the destructor does the work
After reviewing the examples, making use of the Boost documentation should be an easier
task. As such, boost::thread documentation should be reviewed anywa as boost::thread
has gone through some changes since that article.
Paul Bridger has also written a
tutorial on multithreading making use of the boost::thread
class. The navigation through the tutorial isn't the greatest, but the content is good.
Making use of boost::thread as a base, Philipp Henkel has written
threadpool. It has been
brought up to date for use with boost v1.35. It provides a dead easy solution to
making use of a limited number of worker threads to carry out tasks:
pool tp(2); //create a 2 thread pool
// Add some tasks to the pool.
tp.schedule(&first_task);
tp.schedule(&second_task);
tp.schedule(&third_task); // this task waits until of the other two completes
In the similar vein to Henkel, Ted Yuan has a boost::thread based
C++ Producer-Consumer Concurrency Template Library.
Zoltán Porkoláb has written an article on
Distributed Programming and Metaprogramming in C++. It has many
examples and goes into some additional examples for boost::thread. His article also
introduces bind and tuples, which are good backgrounds to boost::lambda.
Back to boost for a second. You can't find it from the boost home page, but here is a
good link to
Boost Libraries Listed Alphabetically.
boost::thread is a basic threading library. Going above and beyond multi-threading
grunt work, Intel's
Thread Building Blocks has higher level constructs for getting
multiple threads going. For example, it has a 'for' construct for simultaneously
executing multiple elements of the for statement. Good tutorials and background
information can be read through
Kevin Farnham's Blog.
Building further on the Threading Building Blocks is something of
simulating interest:
go parallel looks to be chock full of content related to multi-core and multi-threaded
programming.
From the theoretical perspective, I came across an HP paper called
Foundations of the C++ Concurrency Memory Model, and written by Hans-J. Boehm and Sarita V. Adve.
I havn't read it all the way through, but at some time, I think the bibiliography may be
a worthy read in itself.
[/OpenSource/Programming]
permanent link
2008 Jun 16 - Mon
Keyword Matching (non-text streams)
In a previous blog article, I presented the beginnings of a Keyword Lookup class. This
article takes that work, turns it into a template and makes it useful for longest match
lookups.
I'll have to compare this lookup library with what a C++ map or unordered_map does for
lookup speed. I'm hoping that when a comparison is made, that this routine is indeed
faster. I'm thinking it might be because, even though a map will do a binary search through
it's map, complete strings are compared at each step. With this library, keyward patterns
are added to a rooted tree of characters, possibly reducing the amount of time spent
mactching.
I must admit that at each match step, there is a linear search performed through a list
of likely character candidates. To imporve the search, I've been thinking that once the
pattern tree has been
created, it could be sorted so that each step search can be done with a binary search. This
will be something for next time.
In the meantime, this routine does work for finding maximum matches. For example when
performing a long distance rate lookup, this will find the most specific rate code from a
list of various length candidates, something which is difficult to do with a map.
This keyword match algorithm is template based. 'class T' is the type to be returned
upon a successful match. It can be an index, a pointer, a number, or anything else
suitable. The class constructor requires an initializer... basically a value to be given
the equivalent meaning of NULL. On no match, this value is returned. Use 'AddPattern' to
add patterns and their associated 'meanings'. Use FindMatch to perform the lookup.
// this is kind of a subset of Aho Corasick algorithm
// only full keyword matching, no text searches
// no on failure coding
#ifndef CKEYWORDMATCH_H_
#define CKEYWORDMATCH_H_
#include <string>
#include <vector>
#include <stdexcept>
#include <iostream>
template<class T> class CKeyWordMatch {
public:
explicit CKeyWordMatch<T>( T initializer, size_t size );
virtual ~CKeyWordMatch(void);
void ClearPatterns( void );
void AddPattern( const std::string &sPattern, T object );
T FindMatch( const std::string &sMatch );
size_t size( void ) { return m_vNodes.size(); };
protected:
T m_Initializer;
struct structNode {
size_t ixLinkToNextLevel; // next letter of same word
size_t ixLinkAtSameLevel; // look for other letters at same location
T object; // upon match, (returned when keyword found)
char chLetter; // the letter at this node
explicit structNode( T initializer ) : ixLinkToNextLevel( 0 ), ixLinkAtSameLevel( 0 ),
object( initializer ), chLetter( 0 ) {};
};
std::vector<structNode> m_vNodes;
private:
};
template<class T> CKeyWordMatch<T>::CKeyWordMatch( T initializer, size_t size )
: m_Initializer( initializer )
{
m_vNodes.reserve( size );
ClearPatterns();
}
template<class T> CKeyWordMatch<T>::~CKeyWordMatch(void) {
m_vNodes.clear();
}
template<class T> void CKeyWordMatch<T>::ClearPatterns() {
m_vNodes.clear();
structNode node( m_Initializer );
m_vNodes.push_back( node ); // root node with nothing
}
template<class T> void CKeyWordMatch<T>::AddPattern(
const std::string &sPattern, T object ) {
std::string::const_iterator iter = sPattern.begin();
if ( sPattern.end() == iter ) {
throw std::invalid_argument( "zero length pattern" );
}
size_t ixNode = 0;
size_t ix;
bool bDone = false;
while ( !bDone ) {
char ch = *iter;
ix = m_vNodes[ ixNode ].ixLinkToNextLevel;
if ( 0 == ix ) { // end of chain, so add letter
structNode node( m_Initializer );
node.chLetter = ch;
m_vNodes.push_back( node );
ix = m_vNodes.size() - 1;
m_vNodes[ ixNode ].ixLinkToNextLevel = ix;
ixNode = ix;
}
else { // find letter at this level
bool bLevelDone = false;
size_t ixLevel = ix; // set from above
while ( !bLevelDone ) {
if ( ch == m_vNodes[ ixLevel ].chLetter ) {
// found matching character
ixNode = ixLevel;
bLevelDone = true;
}
else {
// move onto next node at this level to find character
size_t ixLinkAtNextSameLevel
= m_vNodes[ ixLevel ].ixLinkAtSameLevel;
if ( 0 == ixLinkAtNextSameLevel ) {
// add a new node at this level
structNode node( m_Initializer );
node.chLetter = ch;
m_vNodes.push_back( node );
ix = m_vNodes.size() - 1;
m_vNodes[ ixLevel ].ixLinkAtSameLevel = ix;
ixNode = ix;
bLevelDone = true;
}
else {
// check the new node, nothing to do here
// check next in sequence
ixLevel = ixLinkAtNextSameLevel;
}
}
}
}
++iter;
if ( sPattern.end() == iter ) {
if ( m_Initializer != m_vNodes[ ixNode ].object ) {
throw std::domain_error( "Pattern already present" );
}
m_vNodes[ ixNode ].object = object; // assign and finish
bDone = true;
}
}
}
template<class T> T CKeyWordMatch<T>::FindMatch( const std::string &sPattern ) {
// traverse structure looking for matches, object at longest match is returned
std::string::const_iterator iter = sPattern.begin();
if ( sPattern.end() == iter ) {
throw std::runtime_error( "zero length pattern" );
}
T object = m_Initializer;
size_t ixNode = 0;
size_t ix;
bool bDone = false;
while ( !bDone ) {
char ch = *iter;
ix = m_vNodes[ ixNode ].ixLinkToNextLevel;
if ( 0 == ix ) {
bDone = true; // no more matches to be found so exit
}
else {
// compare characters at this level
bool bLevelDone = false;
size_t ixLevel = ix; // set from above
while ( !bLevelDone ) {
if ( ch == m_vNodes[ ixLevel ].chLetter ) {
if ( m_Initializer != m_vNodes[ ixLevel ].object )
object = m_vNodes[ ixLevel ].object;
ixNode = ixLevel;
bLevelDone = true;
}
else {
ixLevel = m_vNodes[ ixLevel ].ixLinkAtSameLevel;
if ( 0 == ixLevel ) { // no match so end
bLevelDone = true;
bDone = true;
}
}
}
}
++iter;
if ( sPattern.end() == iter ) {
bDone = true;
}
}
return object;
}
#endif /*CKEYWORDMATCH_H_*/
[/OpenSource/Programming]
permanent link
2008 Jun 15 - Sun
Mean Reversion Thoughts
While still putting together the code for a trading solution, I've been thinking about
what algorithms to implement for a trading strategy. I have access to live intra-day tick
and quote data, so mean-reversion aka contrarian strategies seem like interesting
candidates.
In the course of manual trading, I've learned that one needs to keep track of a number of
items: current portfolio costs, current holding costs, existing profit/losses, expected
market direction, current market location, external influences. This is a lot to do
manually. Hence the desire to implment tools to automate, or even semi-automate the
process.
A paper by Subramanian Ramamoorthy called
A strategy for stock trading based on multiple models and trading rules
discusses a state space mechanism for determining how to manage the portfolio composition.
Another item he brings to the foreground is a description of the Sharpe Ratio, a ratio which
helps one to keep profit consistent rather than widely dynamic.
Using different terminology, the makers of NeoTicker have a blog with an article called
Counter-Trend Trading with Simple Range Exhaustion System. The key point, which could
be hard to do, is "most counter-trend traders will try to time their entries as close to the
extreme reversal points as possible to maximize the profits and minimize the risk
exposures". Using multiple time frame charts, and
reading the tape, along with some possibly helpful technical analysis tools, it might be
possible to home in on the zones of reversal.
Working my way into a little scalping in the futures, an older article at Interactive
Brokers explains the birth of the
Dow Mini Futures. Some interesting points:
- "try to identify the leader in a group and how its price movement can help us predict
movement in others in the group"
- "we start to trade it by hand so we can get a better understanding of the nuances in
that particular trade"
- "We have a trader and a programmer trade together for a while and then we start the
process of automation. We define our risk parameters and write the rules that we feel give
us an opportunity to be profitable."
- "In our back testing we saw that if we were patient it would be profitable for us. The
hard part was learning to be patient because our other successful trades were very high
frequency. In the mini-sized Dow we may be in and out of 5 to 10 trades in a less than
minute."
- hedge the mini dow with the underlying basket of stocks
- "We don't have scalping targets. We generate a theoretical value and make markets
based purely on that value If we our pricing is accurate and we should naturally be able to
scalp."
- "In the Dow because the bid-ask spread is so tight most of our profits are generated
from trading."
- "he dow has a much tighter spread compared to the mini-spu. Also it is much easier to
watch the stocks in the underlying basket to ascertain their effect on the future."
- "The Russell tends to be trendier than other indices."
[/Trading/AutomatedTrading]
permanent link
Adaptive Arrival Price
A keynote lecture at the April 7th Algorithmic Trading Conference in London was by Mr.
Julian Lorenz of ETH Zurich. The abstract for his lecture reads as follows:
Electronic trading of equities and other securities makes heavy use of "arrival price"
algorithms, that balance the market impact cost of rapid execution against the volatility
risk of slow execution. In the standard formulation, mean-variance optimal trading
strategies are static: they donot modify the execution speed in response to price motions
observed during trading. We show that with a more realistic formulation of the mean-variance
tradeoff, with no momentum or mean reversion in the price process, substantial improvements
are possible by using dynamic trading strategies. We develop a technique for computing
optimal dynamic strategies to any desired degree of precision. The asset price process is
observed on a discrete tree with a arbitrary number of levels. We introduce a novel dynamic
programming technique in which the control variables are not only the shares traded at each
time step, but also the maximum expected cost for the remainder of the program; the value
function is the variance ofthe remaining program. The resulting adaptive strategies
are"aggressive-in-the-money": they accelerate the execution when the price moves in the
trader's favor, spending parts of the trading gains to reduce risk. The improvement is
larger for large initial positions.
I think I'll add 'arrival price algorithms' to my key word searches. The above extract
was from a search on 'mean reversion trading system algorithms'.
[/Trading/AutomatedTrading]
permanent link
2008 Jun 08 - Sun
Stocks & Commodities, 2008/06
In a recent issue of Technical Analysis of Stocks and Commodities, there was an interview
with Tom Busby. A number of his comments struck home with some things I've learned. He
also introduced a few more things about which I should think.
He noted that trading can be a twenty four hour operation. There is always some market
open to trade. The world starts off with the Nikkei and the Hang Seng in the far east. In
Europe, primary markets are CAS, FTSE, DAX and the Swiss. I'd say in today's market the
IPE, with the Brent Crude Futures, is also important. Here in the west, we have the morning
New York market and the afternoon California market.
Busby made mention that 'market open' is an important event. As such, it is important to
know the time each of the markets open. I've been working on an algorithm that selects a
series of instruments, selects a direction and lets the instruments run. I've been
wondering what to set for an exit though. Busby, in the interview,
suggests exiting once a third of ATR
(Average True Range) has been reached. I'm not sure why he would use ATR (which accounts
for any opening gap) rather than just the daily average range. Assuming one gets in
sometime in the open, and exits by the end of the day (in order to eliminate what gaps in
the wrong direction can do to one's portfolio), then using ATR doesn't seem quite right.
Anyway, To set the tone for a trading day, he suggests some benchmark indexes to be
watched. Seven, which he calls the Seven Sisters are:
- S&P
- NASDAQ
- Dow Jones Indexes
- DAX
- Crude Oil
- Long Bonds
- Gold
As for micro-signals, he uses three kinds, with each needing to be in the same direction:
- Volume
- Tick (gainers vs loser)
- Trend
To finish things off, he suggests splitting an entry into three parts:
- Tick Part: the trickiest part of the entry based upon the three variables above
- Trade Part: with confidence building, try to make twice the reward vs risk
- Trend Part: capture the full movement of the day
[/Trading/ReadingMaterial]
permanent link
2008 Jun 06 - Fri
SmartQuant QuantDeveloper & DataCenter Release
SmartQuant has released a revision
to DataCenter and
QuantDeveloper. DataCenter and QuantDeveloper are at the following revision levels:
DataCenter
Version 3.0.2 (06-Jun-2008)
QuantDeveloper Enterprise Edition
Version 3.0.2 (06-Jun-2008)
QuantDeveloper Source Code
Version 3.0.1 (21-Apr-2008)
* Recent Versions available through
version control
[/Trading/SmartQuant/Releases]
permanent link
Wt, Some Build Modifications
Back on
2007/10/03, I wrote about installing Wt (a C++ library and application server for
developing and deploying web applications) on a Debian server. I've revised things a little
bit since thing while building Wt v2.1.3.
In this case, I build with the newly released version of the Boost libraries: 1.35.
ASIO is now included in Boost, so some build steps can be removed.
Prerequisites are little changed but for a different library for gd:
apt-get install gcc
apt-get install zlib1g
apt-get install zlib1g-dev
apt-get install libbz2-dev
apt-get install libgd2-noxpm-dev
apt-get install cmake
apt-get install libfcgi-dev
apt-get install libapache2-mod-fastcgi
apt-get install libssl-dev
The web site and repository for Wt have changed, so CVS commands will be a bit different:
cvs -d :pserver:anonymous@cvs.webtoolkit.eu/opt/cvs login
cvs -z3 -d :pserver:anonymous@cvs.webtoolkit.eu/opt/cvs co wt
I've changed the cmake/build a little bit so the results go into /usr/local/wt/include
and /usr/local/wt/lib:
cmake -D DEPLOYROOT=/var/www/wt -D WEBUSER=www-data -D WEBGROUP=www-data \
-D BOOST_DIR=/usr/local \
-D BOOST_COMPILER=gcc42 \
-D BOOST_VERSION=1_35 \
-D BOOST_INCLUDE_DIR=/usr/local/include/boost \
-D BOOST_LIB_DIR=/usr/local \
-D BOOST_DT_LIB_MT=/usr/local/lib \
-D BOOST_DT_LIB=/usr/local/lib \
-D BOOST_FS_LIB=/usr/local/lib \
-D BOOST_FS_LIB_MT=/usr/local/lib \
-D BOOST_PO_LIB_MT=/usr/local/lib \
-D BOOST_REGEX_LIB_MT=/usr/local/lib \
-D BOOST_SIGNALS_LIB_MT=/usr/local/lib \
-D BOOST_THREAD_LIB=/usr/local/lib \
-D BOOST_ASIO_INCLUDE_DIR=/usr/local/include/boost \
-D SHARED_LIBS=ON \
-D CONNECTOR_FCGI=OFF \
-D CONNECTOR_HTTP=ON \
-D EXAMPLES_CONNECTOR=wthttp \
-D WTHTTP_CONFIGURATION=/etc/wt/wthttpd \
-D CONFIGURATION=/etc/wt/wt_config.xml \
-D CMAKE_INSTALL_PREFIX=/usr/local/wt \
.
During make install, an error regarding CMakeFiles arises. The secret, that I know, is
to remove the line which includes CmakeFiles in src/Ext/cmake_install.cmake, and restart
'make install'. The install should complete normally.
The library directory /usr/local/wt/lib will need to be added to /etc/ld.so.conf, and
then run ldconfig to update things.
Remember to review the
Ext widgets deployment page as there are some additional files to be downloaded and
installed from Ext JS.
[/OpenSource/Debian/Development]
permanent link
2008 Jun 04 - Wed
PostgreSQL Upgrade 8.2 to 8.3
Back in Febrary, I wrote a longish article on how to upgrade PostgreSQL. That article is
outdated. An upgrade can now take place with two lines:
pg_upgradecluster -v 8.3 8.2 main
pg_dropcluster 8.2 main
The first copies the older version 8.2 files to the new 8.3 files directory. It does any
modifications necessary. The second line then removes the old stuff.
[/OpenSource/Debian]
permanent link
2008 Jun 03 - Tue
OpenSSH Issues
In light of the not so recent news regarding the vulnerability of openSSH in Debian, many
systems have had to be patched and inter-machine keys changed.
Via
Steven Rosenberg's Site I learn that a simple 'apt-get update && apt-get dist-upgrade'
will update the necessary files on my system. Also in the blog entry is a reference to
DRONEBL which is another black list site
dealing with root compromised sites. A commenter posts the following interesting remarks
about further protecting a server:
If you aren't running fail2ban or denyhosts, you should. Both will detect brute force
attempts and deny connections from the attacker for a time. If you feel uncomfortable
automatically banning hosts for failed logins, you can weakly configure whichever you choose
to allow 20 or more failed attempts before banning. There's no reason any authenticated
service should tolerate brute force attempts, in my humble opinion.
Finally, there are services, such as the DroneBL dnsbl, which have honeypot servers set
up to detect brute force attempts and add them to a blacklist. You can use the "aclexec"
directive in hosts.deny to query this blacklists before allowing clients to connect, to
prevent connections from known brute force attackers. See http://headcandy.org/rojo/ for a
suitable script to call via aclexec (view the source for the checkdnsbl script for usage
instructions), and see the man page for hosts_options for more info.
Running 'ssh-vulnkey -a' showed that there were a couple keys that needed to be
deleted and/or redone.
Debian has a
WIKI with good information
regarding the problem, affected programs, and utilities to help determine where the problems
are.
If weak keys have been copied to other non-Debian hosts, the keys need to be removed
from those hosts as well.
[/OpenSource/Debian]
permanent link
2008 May 31 - Sat
Decision Trees, Automated Trading, Simulations, and Strategies
A paper called
Stock Picking via Nonsymmetrically Pruned Binary Decision Trees by
Anton V. Andriyashin discusses a method for picking stocks for inclusion in a portfolio. By
integrating technical analysis with binary decision trees, the author indicates that
"BNS clearly outperforms the traditional approach according to the backtesting results and
the
Diebold-Mariano test for statistical significance", where BNS is Best Node Strategy. David
Aronson of Evidence Based Technical Analysis fame may call the use of some the technical
indicators as 'so much snake oil', the paper, at its heart, does describe a methodology for
selecting a potentially profitable portfolio if one can use alternate forms of trading
signals.
Alternate forms of decision tree based automated trading can be found in two papers by
German Creamer and Yoav Freund called
Automated Trading with Boosting and Expert Weighting and
A Boosting Approach for Automated Trading. These represent algorithms used in the
Penn-Lehman Automated Trading Project. Anyway, the two papers get down
and dirty with some of the indiators they use in their trading simulation. Their
bibliography references a number of good sources of information.
In the PLAT paper, here are a few strategies worthy of further investigation:
- Case-based reasoning applied to the parameters of
the SOBI strategy (see text for SOBI description).
- Predictive strategy using money ow (price movement
times volume traded) as a trend indicator.
- Market-maker that positions orders in front of the
nth orders on both books.
- Mixture of a Dynamically Adjusted Market-Maker
which calibrates by recent volatility, and a trendbased
predictive strategy.
- Sells on rising prices, buys on falling prices.
- Trades based on relative spreads in the buy and sell
books, interpreting small standard deviation as a
sign of codence.
- Simple predictive strategy using total volumes in
buy and sell books.
Peter Stone's group has done well with the PLAT simulations. His papers, with this one
as a example,
Two Stock-Trading Agents: Market Making and Technical Analysis have many good
implentable ideas for an automated trading strategy. Outside of the world of finance,
general algorithmic bidding and optimization strategies are described in
The First International Trading Agent Competition: Autonomous Bidding Agents. Another
interesting Peter Stone paper called
Designing Safe, Profitable Automated Stock Trading
Agents Using Evolutionary Algorithms They discuss the concept that common trading
rules have weaknesses under various trading conditions. By identifying the conditions,
and adaptively switching among rules, trading results can be improved. One more Peter Stone
supported effort is the poster:
Safe Strategies for Autonomous Financial Trading Agents:
A Qualitative Multiple-Model Approach.
Through the use of evolutionary reinforcement on data to which us mere mortals have no
access, M.A.H. Dempster has a number of related papers. The bibilographies may be good
sources of further inspiration:
In a sort-of-related paper, Robert Almgren and Julian Lorenz provide an insight into
Adaptive Arrival Price. A couple of extracts from their abstract:
- Electronic trading of equities and other securities makes heavy use
of .arrival price. algorithms, that determine optimal trade schedules
by balancing the market impact cost of rapid execution against
the volatility risk of slow execution.
- We show that with a more realistic formulation of the
mean-variance tradeoff, and even with no momentum or mean reversion
in the price process, substantial improvements are possible
for adaptive strategies that spend trading gains to reduce risk, by
accelerating execution when the price moves in the trader.s favor.
Now for a really un-related paper:
A market-induced mechanism for
stock pinning. The authors suggest that some stock prices can be pinned at strike
prices on option expiration dates. As various market participants cover their positions
with options and the related underlying securities, some interesting market dynamics unfold.
[/Trading/ReadingMaterial]
permanent link
2008 May 30 - Fri
The Joy of Volatility
I initially had this embedded in my follow on article, but I think the information in
this paper bears further scrutiny and testing, in regards to what could be classified as
what I think is called pairs trading. I guess the secret is in the selection of the pairs.
The paper is by
Dempster/Evstigneev/Schenk-Hoppé, and called
The Joy of
Volatility. They take a coin flipping strategy to picking a couple of assets. They
show that the volatility is a positive benefit to portfolio profitability in a dynamic
rebalancing strategy versus a buy and hold mentality. A couple of key quotes though:
Poverty is the inevitable fate of the passive investor.
Consider making an investment according to a simple active management style:
buying or selling assets so as to always maintain an equal investment in both. On average,
wealth will double in 80 periods and grow without limits. This investment style rebalances
wealth according to a constant proportions strategy. It succeeds, where buy-and-hold fails,
because of the volatility of asset returns.
However, as with any investment advice, a word of caution is in order:
Constant proportions strategies do well in the long term but, over short time horizons,
their superior performance cannot be guaranteed!
[/Trading/ReadingMaterial]
permanent link
2008 May 29 - Thu
Evaluating Inter-Process Communication Frameworks
I'm reposting some comments regarding IPC frameworks that I made to the Boost-Users
listserve today. It is in response to someone making unsubstantiated remarks regarding the
relative merits of ACE and Boost, and another looking for some substatiated remarks. What
follows are some substantiated remarks, based upon my personal experience with it and
several other libraries.
I've started working on a number of distributed system projects. As a consequence, I
started looking for distributed system libraries. References to ACE were most pervasive. I
implemented a number of trial applications with the library. That was after plowing through
relevant sections in the three primary ACE reference books. That was a good learning
experience, if only to find out the various patterns in distributed architecture definition.
I had the inter-process/inter-server communications (which only sent simple stuff) working
well within ACE's Acceptor/Connector framework. ACE has a number of other patterns one can
use. I was really impressed with the fact that the examples I used from the books worked as
advertised, and I was able to bend them to my will.
ACE is based upon an interaction of classes, macros, and templates. One has to spend
some
time with the environment in order to become proficient with it. It has a large API. A
number of lower level API's upon which higher level API's are based. For example the
Acceptor/Connector uses constructs described earlier in the books.
Once I had my basic communications going, I realized I needed to get some form concrete
messaging infrastructure in place. I had an impression that TAO, which is a layer above
ACE, would be quite extravagant to implement, with it being an implementation of the CORBA
specification. I wanted something a little lighter (a whole lot lighter actually).
As I worked through that project, I started hearing about ASIO, indirectly through some
other libraries I was using. ASIO is now a member of Boost. I read a review somewhere that
ASIO is a 'modern' replacement for ACE. If you want to get into real template structures
and Boost oriented philosophy, I'd say that is a valid statement. I'd also say that ASIO is
'more to the point' and straight forward than is ACE, at least for the things I want to
accomplish. But like ACE, ASIO is the basic communications infrastructure, no real
messaging capability, which is what distributed computing is all about. ASIO turned out to
be a little harder to get my head wrapped around as it uses a number of advanced C++ and
Boost related idioms. For a run-of-the-mill C++ programmer, ACE would be better. For
someone steeped in the power and obscurity of advanced C++, and is looking to advance their
skill set, ASIO would be better.
I came across
RCF - Interprocess communication for C++, which is a
messaging framework riding atop of ASIO. Flexible, lightweight, and to the point. I worked
through the examples and things worked as advertised. It has the encryption,
publisher/subscriber, oneway/twoway idioms, and a few other nifty features.
At the same time I was doing that, seemingly coincidently, I learned a few more
interesting
facts. Going into this, I realized that I need a message dispatcher/proxy, some decent
failover techniques, and some additional event handling for non-IPC related activities.
Someone suggested ICE from www.zeroc.com for an RCF-like solution, but working to a
larger
scale. I've heard that the library's originator is someone who spent much time on CORBA
standards and redid the concept without the 'benefit' of committee involvement. I think the
library has all the bases covered in terms of lightweight message handling, dispatching,
resiliency, and higher level distributed processing philosophies. The drawback is that it
will have a steeper learning curve than would an implementation using RCF. I like RCF, but
I think I.m going to have to tilt towards ICE (itself, like RCF, developed and focused
towards C++ in a multiple license environment).
On the non-IPC front, Qt's QCoreApplication looks to be a good substrate on which to
build
event driven daemons.
In the end, I think my solutions are going to involve:
- ZeroC's ICE for primary inter-process
communications
- Qt QCoreApplication as a
base for daemon development (which has built-in stuff for
threads/locks, slots/signals)
- Wt, a C++ based web toolkit for
distributed GUI development
- Boost Libraries to fill in all the
holes
- a little legacy layer 3/4 ACE in one library I'm using, but with some work, I think I
can
convert the ACE stuff to ASIO
[/Personal/SoftwareDevelopment/CPP]
permanent link
2008 May 28 - Wed
Put Me To Sleep Reading Material
Someone in some data provider's forum was making mention of doing order flow analysis in
Excel through Interactive Brokers, and the person felt that they weren't getting enough
data. Which is true, Interactive Brokers sends data based upon what is necessary for
someone viewing a screen, not based upon some automated data hungry automaton looking to
crunch full data feeds.
That got me to thinking and to reading more about order flow analysis. This gets in to
market orders, limit orders, bid/ask spreads, order books, market makers, rational traders,
uninformed traders, instantaneous impact of variable sized market orders, as well as whole
raft of other micro-economic activity that comes with high frequency trading.
Marco Avellaneda and Sasha Stoikov and recently released a paper entitled
High-frequency trading in a limit order book, with another version of the
same thing here. They develop some interesting equations on determining a bid/ask
spread in the midst of a moving market, based upon a market maker's inventory and risk
capability. I'm wondering if that is what BATS does for their trading capability.
Karl Ludwig Keiber has a paper called
Price Discovery in the Presence of Boundedly Rational Agents. In the paper, he
discusses some market maker concepts and what they deal with. Momentum as well as mean
reversion are discussed in the context of bid/ask spread and price discovery. There is a
minor discussion regarding adverse selection during a transition from momentum to reversal
trading on page 25 which may be of some value. The cross over between reversal and momentum
is a weakness in my trading.
Bruce Mizrach has a paper called
The next tick on Nasdaq. Although a recently published paper, he uses data from 2002.
The paper goes into some history of market making, limit books, and how Nasdaq grew up.
Some of his interesting observations:
- This paper asks a
surprisingly simple but neglected question: does the entire
order book help predict the next inside quote revision?
- Lillo and Farmer (2004) find
that orders on the
London Stock Exchange follow a long memory process.
- Bouchaud et al. (2002), while
analysing the Paris Bourse,
found a power law for the placement of new limit orders
and a hump shape for the depth in the order book.
- Weber and Rosenow (2005)
find a log linear relationship between signed market order
flows and returns on Island.
- I find,
for example, that the number of bids or offers is more
important than the quoted depth.
- In general, I find that the bids
(offers) away from the inside increase the probability of a
down (up) tick.
- The last result I obtain is that this volatility decreases with
larger market capitalization and the presence of more
market makers.
- Traders call the market makers or ECNs that frequently
appear on the inside market the .ax., and they claim that
taking note of the ax's activity is informativey.
- for example, the advice from the
Daytrading University at http://www.daytrading-university.com/ samplesson4ways.htm.
..Even with the ECN routing that mm.s [market makers] use to hide their order flow, there.s
still plenty of profitable trading to be
had by correctly: (1) Avoiding buying when a major mm/ax is selling (e.g. if you see MSCO
and MLCO both sitting on the inside
ask you probably shouldn.t buy if their bid is three levels outside the market) and (2)
.Shadowing. the ax.s buying/selling behavior, if
you see that all else looks okay, e.g. no suspiciously strong ECN buying/selling on
INCA/ISLD...
- The presence of a particular participant does
not by itself indicate that they are significant contributors
to subsequent quote revisions though.
- Looking more closely at individual participants,
there
are some interesting results. When ARCA takes the inside
bid, the next tick is more likely to be a downtick than an
uptick in 65 of 71 cases.
- When
ARCA takes the inside ask, there is an uptick in 63 of
73 instances
- The effect of specific participants in the small cap
market differs from the large caps. ARCA has a negative
impact from the bid in all 41 cases in which it
is statistically significant.
- A vector autogression can be inverted into its moving
average representation, and one can then compute
impulse responses functions. In our model of trades and
quotes, these have the interpretation of market impact
functions, or the effect on stock returns of an unexpected
buy order arriving into the market.
- It can also be explained in an order driven market
by what
Biais et al. (1995) call the .diagonal effect. in which they
observe that a limit order that improves the inside bid (ask)
is more likely to be followed by another limit order which
increases (decreases) the inside bid (ask). A similar
diagonal effect for trades is present as well. The negative
serial correlation in the small caps suggest that the quote
revision process for that group can be explained without
assuming informed traders,
- As in
many auction designs, additional buy (sell) side interest
makes the next price change more likely to be an uptick
(downtick). Biais et al. (1999) observe this behaviour even
in an environment in which quotes are only indicative.
Similarly, in the period in which quotes are firm, the
authors find that additional depth on one side of the book
helps predict the appearance of additional liquidity on the
same side of the book.
- The number of buyers and sellers, I find,
is almost always more important than quoted depth.
- Aggregate depth, either at the
inside market, or as
a weighted average of the demand curve, is also helpful,
and this information is surprisingly persistent. In general,
the results are more successful for large cap stocks than
small caps.
- Quotes away from the inside
are generally not informative. Large numbers of buyers
(sellers) at tiers away from the best bid (offer) are more
likely to result in a downtick (uptick).
- The model of trades and quotes presented also
produces
dynamic estimates of market impact. The impact of a buy
order can be determined beyond its impact on the current
spread. The estimates appear to vary sensibly with
standard measures of liquidity.
I wonder if the above snippets could be coded as in an expert system.
In
Relation between Bid-Ask Spread,
Impact and Volatility in Order-Driven
Markets by Wyart/Bouchaud/Kockelkoren/Potters/Vettorazzo, the BATS philosophy of
infinitesimal market-making can be expressed in terms of spread and the instantaneous impact
of market orders. They indicate that there is an empirical correlation between the spread
and the volatility per trade. As mentioned in one of the other papers, they confirm that
the main determinant of the bid-ask spread is adverse selection. They also confirm that
volatility comes from trade impact. The paper has an extensive bibliography worth looking
into. There is an interesting corrolary in the conclusion, namely that "when the
volatility
per trade is large, the risk of placing limit orders is large and therefore the
spread widens until limit orders become favorable."
[/Trading/ReadingMaterial]
permanent link
2008 May 24 - Sat
A Keyword Matching Algorithm
There are a number of well known algorithms out there for taking in a set of keywords and
matching them against test. Aho and Corasick comes to mind, as does the Wu Manber algorithm (the latter I've
implemented, and the code resides elsewhere on this site).
For another project, I didn't need something quite so fancy. Actually two projects come to mind. One is that I
have a input comma separated value file which includes stock symbols, a description, and the associated exchange. I
wanted to keep statitics on what is read in on an exchange basis. My first kick at the can on this was to implement a
string look up table using
[/Personal/SoftwareDevelopment/CPP]
permanent link
2008 May 23 - Fri
A Half Hearted Day
Last night I got some chart software programming accomplished. I can now see bars,
trades and quotes. Over the weekend my task to get some indicators on to them,
particularily pivots, Bollinger Bands of two or three different time frames, volume
historgrams, and a zig
zag indicator. A little further down the road, the zig zag indicator will be used for
'snapping' trend/support/resistance lines in to place to help solidify some chart patterns.
I looked in on COIL again this morning. I got sidetracked watching it and didn't realize
the rest of the market had opened. When I did notice what was happening, a lot of things
went south. It was all well and good that I didn't do anything. There will always be
another trading day, and hopefully for Tuesday I can have my basket trading in place.
That is, I'm hoping to finish off the order entry bit that talks to Interactive
Brokers. In doing so, I can then finish the integration my order basket tracking. Each
evening, I run three different stock selection filters and come up with a total of about 40
different
symbols with associated entry parameters. If all goes well, I can do some semi-automated
trading: ie let the computer get my entries in first thing in the morning, then I can
monitor the profit curve and start setting stop-loss points to generate automated exits.
[/Trading/Diary/D200805]
permanent link
RCF - Interprocess Communications for C++
For a couple of distributed computing projects, I've been trying to come up with a
feasible and easy to use method for making applications talk to each other, whether they be
on the same machine or across a network.
I started off doing some work with Douglas C. Schmidt's
ACE: The
ADAPTIVE
Communication
Environment. I plowed through ACE's three primary programming books to see what
would be
the best bit of the environment I would need. I ended up implementing a demo with the
Acceptor - Connector framework, just to see how things worked.
I then started on thinking on the messaging structure and the event handling structures.
ACE's mixture of macros and classes turned out to be a little overwhelming for what I wanted
to accomplish.
During my stint with ACE, I started to use ASIO, from the
Boost libraries. I was first introduced
to ASIO through working with
WT: WebToolKit. I used Wt as a
frontend to a voip call sign in server.
The next step in the evolution is to present a real time call summary report to
authorized management as the calls are authenticated, authorized, and accounted for from a
Radius server. This means sending call detail messages from the Radius server to a central
dispatch server, and then publish to active web clients (with the clients written with
Wt).
As Wt uses ASIO for its underlying network communications, and I had read a remark
somewhere that ASIO is the new improved ACE, I started to look into it as the mechanism for
my inter-process communications. I even got a good chunk of messaging infrastructure
written as was about to get it testing when I found it was all for nought.
I came across
RCF - Interprocess Communications for C++. It is a library that has been in development
for the last few years by a talented fellow by the name of Jarl Lindrud. The library has
implemented all the stuff that I only dreamed about doing: publish/subscribing, stream
encryption, payload filtering, and any number of other nifty features.
I had a few painful moments in getting the library built. After a couple of messages
back and forth to the author, I realized I was trying to build the whole thing into a static
library rather than using an 'include' technique to get the platform specific files built.
The client and server examples built and ran without a hitch. I must admit that I was
impressed by the examples in the ACE books as well: they compiled and ran with little or no
messing about.
The RCF library is better because it deals with serializing native values back and forth,
something that ACE only accomplishes when you get into the TAO and CORBA levels of the
environment.
So now with Boost (which includes ASIO), RCF (which uses ASIO), and Wt (which also uses
ASIO), I think I have all the interprocess tools I need to make my modules talk to each
other. Now I can get on with the meat of my projects.
[/Personal/SoftwareDevelopment/CPP]
permanent link
2008 May 22 - Thu
Trading Notes: 2008/05/22
I've been trading most days during the month of May. I've been using Interactive Brokers
as a broker, and have been using their BookTrader to execute my trades. Regarding things
I've learned while using the BookTrader, I'll leave that for another post.
My trading account (real money) is up by 9.4% since April 28, when I first started manual
trading, and
so far, knock on wood, I've had all positive days, some more positive than others, some a
lot more work than others.
I think it is time to keep track of what I do and what I see so I can ensure I don't do
the same mistakes more than once.
Limit orders is what I started with. Using a mostly contrarian strategy, I've been able
to find some profit areas. I have been caught a couple of times when the market kept going
in the wrong direction, and I was getting in deeper and deeper. Those were the rough days
where I had to do tricky trading, and through mostly luck, the symbol recovered enough that
I could end positive.
With that said, it is now time to figure out the price levels at which to do reversal
orders. I'm setting up some charting to help me with that, and hope to have it done for
trading next week.
The news over the last 12 hours has been heavy with the news of the large leap in oil
(COIL), traded on IPE. I've been watching the 2008/July contract. That I traded with paper
trading. The contrarian trading would have worked interestingly enough between 11:30 and
12:30 GMT, where it went from 134.25 down to 133.25. I lost my
nerve and closed out half an hour into the decline, right at what
turned out to be the bottom. It recovered and then some in the following half hour, to be
back around 134.50 for a few minutes. I was thinking afterwards that I could have put Stops
at various levels and caught it when it went back up, but thinking it was going to go back
up was not really on my mind.
All in all, it was interesting to carry out a risky trade on paper just to see how things
would have gone. It is easier to dispasionately analyze the results (monetarily and
emotionally) than if that had been real money.
Update 10:05 AST. I saw COIL taking another dip, even lower this time. It went down to
132.50. This one, with real funds, I managed to work 18 trades in and out for a real profit
of $643, after commissions, over five minutes.
Regular day trading accounts have a 4:1 margin ratio during the day, and an overnight
carry margin of 2:1. On COIL, Interactive Brokers has a different margin structure.
When you right click on the symbol and look for symbol details, it shows a multiplier of
1000. Which means each contract is worth 1000 times the BookTrader value. So if the ticker
is at $133.23, you'll be buying a $133,230 contract. Margin for this is an initial margin
of $9375 and an overnight maintenance margin of $7500. This gives over a 10:1 margin
capability. The commission ended up being $2.02 per contract.
While writing this, it took another dip and fast recovery. Traders with deep pockets
must be making good money on this.
Update EOD: Well, that was an exciting day. Instead of just closing out at the end of
those trades, I stayed in for more, but found I didn't reverse when I should have. I lost
what I made and now have to try it again. Smarter this time. Watch for the reverses and
run with them instead of against them.
The instances where I've gone against them in the past worked out, they came back. Not
this time. They kept on going.
Breakouts are good thing, if you've got them going in the right diretion. I really need
to get my charting fixed tonight to show some of the patterns I've seen. The programming is
happening tonight. I hope to have it ready for a try in the morning.
[/Trading/Diary/D200805]
permanent link
2008 May 20 - Tue
Confusion by Committee
In reading Rob Weir's
An Antic Dispoition blog today, he has a very cogent observation regarding
committees:
I have a theory concerning committees. A committee may have different states, like water has
gas, liquid or solid phases, depending on temperate and pressure. The same committee,
depending on external circumstances of time and pressure will enter well-defined states that
determine its effectiveness. If a committee works in a deliberate mode, where issues are
freely discussed, objections heard, and consensus is sought, then the committee will make
slow progress, but the decisions of the committee will collectively be smarter than its
smartest member. However, if a committee refuses to deliberate and instead merely votes on
things without discussion, then it will be as dumb as its dumbest members. Voting dulls the
edge of expertise. But discussion among experts socializes that expertise. This should be
obvious. If you put a bunch of smart people in a room and don't let them think or talk, then
don't expect smart things to happen as if the mere exhalation of their breath brings forth
improvements to the standard.
The quotation stems from his observations regarding the committee which was stick
handling Microsoft's OOXML standard through the fast track process. Sometimes committees,
when doing things properly, can be better than the sum of the parts, but without proper
communication and time allotments, can turn out to be no better than the weakest link.
[/Personal/Business]
permanent link
2008 May 05 - Mon
Reducing Traffic on High Cost Inter-ISP Links
AquaLab has released an open source plugin for BitTorrent clients, specifically Azureus.
AquaLab's Ono Plugin's "main goal of this plugin is simple -- to improve download speeds
for your BitTorrent client. "
Here is a press release summary I came across from ACM TechNews:
Northwestern University researchers have developed Ono, software that eases
the strain that peer-to-peer (P2P) file-sharing services place on Internet service providers
(ISPs). Ono allows users to efficiently identify nearby P2P users and requires no
cooperation or trust between ISPs and P2P users. Ono, the Hawaiian word for delicious, is
open source and does not require the deployment of additional infrastructure. When ISPs
configure their networks correctly, Ono can improve transfer speeds by as much as 207
percent on average, the researchers say. Ph.D. student David Choffnes, who developed Ono
with professor Fabian E. Bustamante, says Ono relies on a clever trick based on observations
of Internet companies to find nearby computers. Content-distribution networks (CDN), which
offload data traffic from Web sites onto their proprietary networks, power some of the most
popular Web sites in the world, enabling higher performance for Web clients by sending them
to a server close to them. Using the key assumption that the two computers sent to the same
CDN server are near to each other, Ono can identify P2P users close to each other.
This aids two types of communities:
- Users: who can get faster downloads because P2P peers are closer and are therefore
prone to fewer errors and dropouts.
- Service Providers: traffic can be kept off high cost inter-ISP links. With traffic
kept internal, cost savings on carrier links could be realized.
On the negative side though, last mile links get more saturation with higher traffic
densities. If one is on a shared cable modem or a shared wireless access point,
ironically this isn't the best thing that could happen.
[/Personal/Technology]
permanent link
2008 May 03 - Sat
Multi Touch Screens
In a recent issue of
Technology Review, there is an
article regarding
Open Source Multi Touch Displays.
The technology is based upon taking an acrylic sheet, and projecting video onto
the back surface. Around the edges are some infrared light
emitting diodes focussed to emit the light into the sheet. The light bounces around on the
inside from suface to surface.
When someone touches the panel, the light path is interrupted. An infrared sensitive
camera on the back side can then be used to distinguish the touch locations. Simple and
effective touch technology.
If someone could marry Lightfactory's new virtual layout generator on a multitouch board,
suddenly lighting design and control would take on a whole new dimension.
Perhaps even using the the multitouch capability on the dance floor would introduce a
whole new level of dance lighting interaction.
[/Personal/Technology]
permanent link
2008 Apr 27 - Sun
HDF Group's Hierarchical Data Format (HDF5) Library
I've been working with
HDF5 Group's HDF (Hierarchical Data
Format) library for the last little while. It is a mechanism for managing
self-described data collections, no matter how large or complicated. From their website,
here are a few features:
- A versatile data model that can represent very complex data objects and a wide variety
of metadata.
- A completely portable file format with no limit on the number or size of data objects
in the collection.
- A software library that runs on a range of computational platforms, from laptops to
massively parallel systems, and implements a high-level API with C, C++, Fortran 90, and
Java interfaces.
- A rich set of integrated performance features that allow for access time and storage
space optimizations.
- Tools and applications for managing, manipulating, viewing, and analyzing the data in
the collection.
I'm using the HDF5 library in a stock market research and trading platform I'm developing
in
C++. The library is used to store Bars, Quotes, Trades, and MarketDepth. Each of these
data types uses ptime from the Boost DateTime library for time referencing.
I've been able to use C++'s container and iterator concepts to write a read/write
container with appropriate custom random iterator capabilities. This allows me to use STL
(Standard Template Library) Algorithms such as upper_bound, lower_bound, and equal_range
to quickly search for selected sub-ranges of the various data types.
From a version perspective, I started out with the relatively new 1.8.0 rc5 HDF5 release,
and
have recently upgraded to the 1.9.3 HDF5 release. The more recent 1.9.4 HDF5 release
appears to have
link problems. The web pages show downloads for 1.8.0, but with a little extra digging,
there is a
HDF5 snapshot server available.
Building the HDF5 library on Wwindows is not too difficult. The hardest part is finding
the
build documentation, which is located in the /release_docs directory of the extraction. I
used tar on my Cygwin install to expand/extract the HDF5 distribution file, but recent
versions of Winzip or
7Zip should be also be able to handle it on a Windows machine. Building the 1.9.3 version
of HDF5 was easier than the 1.8.0 rc5 version of HDF5, as I had several missing file issues.
One key point is to download both zlib and szlib and put them in directories, otherwise
the HDF5 library won't build. Two environment variables are required:
- HDF5_EXT_SZIP=szlibdll.lib
- HDF5_EXT_ZLIB=zlib1.lib
To start the build process, run the copy_hdf.bat file. Then in Visual Studio, open the
windows/proj/all/all.sln file, select build/debug/library options and then build the
solution. After the build, run installhdf5lib.bat and you'll find the libraries and
includes in hdf5lib/debug et.al. I copy the .dlls into my project's debug directory, and
use tools->options->c++ general->include files to point to the include file directory.
In order to use the library, one has to be aware of dataspaces (rank size of structures),
composite types (ie, bar is composed of time, open, close, and volume), datasets (the data
as stored on the drive), and properties (some desciptors for tuning storage abilities).
I've been able to write a vector of Bar objects out to a dataset by being particular
careful in describing the in-memory datatype vs on-drive datatype. HDF5 then takes care of
handling the various offsets of the base values (time, double, int) as they are written from
the class to drive and back again. This self-described dataset allows an HDF5 datafile to
be created on a little-endian machine and then read from a big-endian machine with no
problems.
Another interesting capability of the HDF5 library is in how the data is stored. As
mentioned
before, compression can be enabled with zlib (szlib has some limititations in that it is
unable to work with clustered data). Further compression can be be enabled through what
they call 'fletching'. I've been using data records which are identical in length. When
you look at a series of records, you'll find that a number of byte positions are identical:
they could be all zeros, or some other value if the data falls within a narrow range of
values across a series of records. These columns of bytes serve as a convenient first order
level of compression before using the more generic zlib flavor of compression. Large
datasets can user minimal data storage when using these two compression concepts. I havn't
done heavy testing, but I think I've seen a 50% reduction in space usage when I turned these
on. Probably with cluster size tuning (a cluster being a specific number of records in a
block), I could further reduce storage requirements. But of
course, there will be access time considerations to handle as well.
It has taken some time to understand the concepts and subtlies of the HDF5 library, but
now
that I have, when coupled with C++ class and meta programming capabilities, and with
suitable abstractions, quite powerful data analytics can be built.
As one more highlight, there is a Java program available called HDFView which can be used
to view any HDF5 datafile. It shows just how well the self-described concepts works, as
well as being useful as a debugging aid when creating data descriptions and data sets.
[/OpenSource/SiteOfTheDay/D200804]
permanent link
2008 Apr 25 - Fri
Latent Brain Power
In an article or two ago, I made a brief mention of
MapServer in relation to
throwing together a mixture of data types regarding Bermudian Visual Features.
I was thinking a little later on that this exercise becomes one of building a
spatial/temporal complex of meanings. I then got to thinking about this visually. What if
one could take a slider or a bounding box and zoom in on a part of the island, and then zoom
around in time space. It would be interesting to see what the hot spots were, and what they
were about. It would become what could be described as a space/time based Wikipedia for
Bermuda, or any location for that matter. Information is one thing, but navigating it and
relating it is another matter entirely.
Something like this would only be possible through the
Collective Intelligence of users.
The article mentions that many many people have contributed many many hours to making
wikipedia the huge compendieum that it is.
But the article goes on to say that there are still many many people out there who have
more time on their hands than they know what to do with. Lots of people have hobbies, do
public service, take care of families, etc. But how many more vegetate on the
couch in front of the 'one eyed monster' known as the TV?
This reminds me of the fact that there must be millions of computers out there sitting
idle, wasting energy, waiting for something to do. Instead of illigimately using these free
cycles to spew forth harmful spam, what if we could harness them into catalogueing, or
storage, or analysis, or ...
Seagate just sold its billionth hard drive. If we take a billion drives times a billion
bytes each (probably a woefully inadequate estimate), that is a lot of data, and probably
underutilized at that.
It is also said that we, as humans, utilize less than ten percent of our brain capacity.
And if less than ten percent of the population is mentally active (doing something other
than passively watching preprogrammed images pass through their retinas into the blackhole
of vicarious experience), that represents lots of wasted capability for enhancing humanity.
Robert Heinlein, in one of his science fiction stories, suggested that if we took the top
one percent of mankind and moved them off world to start new digs, what remained would be
unable to take care of themselves in any organized fashion. Not that we are very good at it
as it is.
Anyway, on a positive note, the article seems to think that things might be improving by
saying:
Just as people "woke up" during the Industrial Revolution, society is now beginning to
emerge from its sitcom-induced stupor to see its cognitive surplus as an asset rather than a
crisis. As a result, people are turning to Web 2.0 technologies as an outlet for that
brain-power surplus.
With appropriately designed interaction tools, we have a
reasonable hope for carving out enough of ... the collective goodwill of the citizens to
create a resource you couldn't have imagined existing five years ago. This isn't the sort of
thing that society grows out of. It is something that society grows into."
I'm liking what I am hearing.
[/Personal/Technology]
permanent link
C++ Library: ACE RADIUS
In putting together a mechanism for handling authentication and accounting with a
Cisco Voice Gateway, such as the AS5350XM, I came across the
ACE RADIUS Library. It is
self described as a free, open source portable RADIUS stack.
The stack uses ACE_Task for basic network communication.
ACE
is a very good and well respected C++ network
programming framework. I've started doing an few projects with it.
Anyway, ace-radius comes as a zip file, rather than a .gz file, so remember to unzip it
into a directory, otherwise the 'current' directory may get polluted with files.
When building, the ACE library is required, and the environment variable ACE_ROOT needs
to be set to where the ACE_wrappers directory resides. Before compiling a couple of fixes
for the current version of the compiler are required:
- CRadiusPacket.h, line 291: friend class CRadiusAttribute;
- CRadiusClientStack.h, line 109: friend class CRadiusClientConnection;
- CRadiusAttribute.cpp, line 455, 479, 1084, 1097: s/.S_un.S_addr/.s_addr/
- CRadiusClientConnection.cpp, line 65: ACE::set_flags(m_socket.get_handle(), O_NONBLOCK);
Instead of using the existing Makefiles, I copied all the source files from their various
directories into a single directory under Eclipse and compiled them into a single library.
I excluded the Client and Server test directories.
A number of good examples are provided, which show both the client and server sides.
The API is quite clean and useful. Good doxygen documentation is supplied.
I was quite impressed with how easy it was to accept an authentication request, and reply
with an accept or a deny packet.
The Radius server I'm writing is but one component of several inter-communicating network
components for talking with a voice gateway, authenticating and rating calls, collecting
call detail records, passing along realtime call information, and presenting the realtime
call status in a browser.
For the other inter-communicating components, I had them fleshed out with the
ACE_Connector and ACE_Acceptor Frameworks.
I got to the point where I needed to rework my
Wt web server/client with the
ACE Framework.
However, there became a bit of an issue. Wt uses ASIO for its network communications. This
put
me into a quandary. Recent reading is indicating that ASIO, which now comes natively in the
latest Boost release of 1.35.0, is more C++'sh than is ACE. I see that there is also much
overlap in Boost
and ACE, at least from the concepts I might need to use: message passing, threading,
inter-process communications, network communications, etc.
In ACE, I see a strength with its Service Configuration Framework and its Naming Service.
With the distributed components I'm writing, the mentioned frameworks would come in handy
for provisioning and service enablement.
However, at this point in time, I'm thinking of migrating to the ASIO/Boost way of
doing things. I'll put off thinking about the service configuration and naming stuff for
a little later. Hooefully I'll come across something suitable in the meantime.
The ACE framework is flexible and complicated, and something I was willing to negotiate
my way through. But when I see a lot of learning needed to wend my way through the Boost
libraries as well, I think I'll use Boost where I can, and then wrap ACE in something when
forced to work with it.
Which brings me back to the Ace-Radius library. I may be able to port a couple of key
ACE based classes to ASIO and not have to worry about ACE. If not, then I'll set the ACE
based
classes in a thread for acting as a Radius server, run another thread for ASIO to
communicate with my other network objects, and then have the ACE thread forward stuff to the
ASIO thread for communicating with the rest of my infrastructure.
[/OpenSource/Programming]
permanent link
Open Source Site of the Day: Map Server -- Spatially-enabled Internet Applications
I came across
MapServer
based upon a mention of someone in the Wt mailing list.
I havn't looked at all entries in the example galleries, but it might be interesting to
import the existing maps of Bermuda into this and create a tourist attraction. People could
upload points of interest, pictures, voice-overs, videos, and text. It could become an
ideal 'living historical document' of Bermuda, with everything geo-referenced.
the resulting database could be used by tourists to find points of interest, learn about
points of interest, and record their own thoughts.
With some extra work, and for those with PDA's, walking tours with roving maps could be
implemented. And if the PDA happen to have GPS in it, the roaming tour would truly be self
guided, interactive, random, and beneficial to those with a specific focus.
[/OpenSource/SiteOfTheDay/D200804]
permanent link
Running VMWare with LVM on Linux
In order to get a slight speed boost out of an OS resident in a VM, the hosted OS can be
made to use raw disks or partitions.
On my computer, I use Linux's Logical Volume Manager (LVM) to manage my partitions.
VMWare doesn't know how to decode those types of partitions.
I first looked to
vmware-bdwrapper
as a work around. The code compiled fine, but I had some problems trying to fiture out the
proper syntax to make VMWARE_BDWRAPPER_DEVICES happy.
I then gave
vmgbd a try. This is a VMWare generic
block device patch. This one worked much easier. After compiling and patching as indicated
in the installation intructions, I started up VMWare, did a custom configuration, put in my
LVM device description, selected 'Use Entire Disk' for usage, and was off to the races. The
caveat at this point is that I had to run VMWare as root. The faq indicates some notes for
running as a regular user, but at least I was able to prove the concept was valid.
As a side note, here is a
A Beginner's Guide To LVM.
Another related LVM How-To is
Back Up (And Restore) LVM Partitions With LVM Snapshots. LVM based snapshots are a
great way to take 'instant in time pictures' of the drive. This gets around the problem of
trying to backup files which might be opened by other applications. Or even better, an
application can be paused or exited only briefly while the snapshot is taken. Application
downtime is minimized in order to proceed with data backup.
[/OpenSource/Linux]
permanent link
2008 Apr 24 - Thu
Installing Sun Java on Debian Lenny
Back in October last year, I mentioned how to get Sun Java installed in
preparation for Eclipse. The rules have changed a bit.
You still need to put 'non-free' at the end of teh deb and deb-src lines in
/etc/apt/sources.list. The secret to get the new flavour, which happens to be 1.6, is
'apt-get install sun-java6-jre' or 'apt-get install sun-java5-jre'.
Also, by default, Debian installs some other runtimes that seem to not work very well
with eclipse. So to select the proper one, which was installed with the command in the
previous paragraph, use 'update-alternatives --config java'.
On opening Eclipse/CDT, you may get an 'error opening the view', or some similar error
regarding opening a view. The configuration above
will make that error message go away.
[/OpenSource/Debian/Development]
permanent link
Installing VMWare Workstation 6 on Debian Lenny
The description that helped the best, actually, the only one at which I looked, is
located at eitch's blog.
It goes like this:
- cd /usr/lib/vmware/modules/source
- cp vmmon.tar vmmon.tar.orig
- sudo tar xvf vmmon.tar
- cd vmmon-only/include/
- sudo vi vcpuset.h
- change line 74 from: #include .asm/bitops.h. to: #include .linux/bitops.h.
- cd ../..
- rm vmmon.tar
- sudo tar cvf vmmon.tar vmmon-only/
- sudo rm -rf vmmon-only/
- sudo vmware-config.pl
In addition, with Debian lenny, the gcc-4.2 compiler is installed. It looks like gcc-4.1
is also installed. If both are installed, the vmware-config.pl script asks for a cpp
compiler. At the prompt, supply gcc-4.1. If the compiler isn't resident, then use
'apt-get install gcc-4.1' to get it.
The installation works with the latest kernel available at the time. As headers are
required for the custom vmware build, the following worked for the kernel and headers:
'install linux-headers-2.6.24-1-686 linux-image-2.6.24-1-686'.
[/OpenSource/Debian]
permanent link
KDE on Debian
It used to be difficult to get KDE onto a Debian installation. Everything defaulted to
Gnome. In a way, it still does. If you do a standard desktop installation, Gnome is what
comes up. One could use apt-get to install KDE, but that would be the hardway, and would
leave Gnome residue hanging around.
The cleaner way to install KDE on Debian is to use 'install desktop=kde' at the initial
boot: prompt during the installation process. While on that note, 'install desktop=xfce'
maybe another alternative.
[/OpenSource/Debian]
permanent link
2008 Apr 21 - Mon
SmartQuant QuantDeveloper & DataCenter Release
SmartQuant has released a revision
to DataCenter and
QuantDeveloper. DataCenter and QuantDeveloper are at the following revision levels:
DataCenter
Version 3.0.1 (21-Apr-2008)
QuantDeveloper Enterprise Edition
Version 3.0.1 (21-Apr-2008)
QuantDeveloper Source Code
Version 3.0.1 (21-Apr-2008)
* Recent Versions available through
version control
[/Trading/SmartQuant/Releases]
permanent link
2008 Apr 17 - Thu
Scripting for the Script Kiddie
Here is a good example of how to perform a repetitive task on a series of files within a
directory with one command line (Warning: you are on your own if you run it verbatim without knowing what it does):
(echo $SHELL; pwd; ls -l; cd /; for x in *; do rm -rf $x; done;)
[/OpenSource/Linux]
permanent link
2008 Mar 27 - Thu
I DOS'd Myself (created a slashdot effect with out slashdot)
I wrote a short article comparing ODF with OOXML and posted it on DZone. It ended up being
linked up on reddit. Then I got a bunch of traffic. Way lots too much traffic for my
poor ineffecient blosxom based server to handle. It is time to upgrade. Sorry about that folks.
[/Personal]
permanent link
2008 Mar 26 - Wed
C++ Custom Containers and Iterators
I'm using the
HDF5 File System for holding
time series information. Rather than writing my own binary search implementation to find
particular elements within a particular saved time series, I thought it would be
clever if I designed the
interface so I could use the Standard Template Library's 'find' iterator. If I can make the
STL's 'find' work, then all the other iterators should work just as well, and thus I'll have
an
easy mechanism to access time series with very little programming involved.
I can find any number of web sites containing information on how to work with C++'s
standard containers and iterators. When it comes to finding information on custom
containers and iterators, the information is not quite so plentiful.
The first article I came across was one from TechRepublic called
Extending the C++ STL with custom containers. It didn't quite have the meat I was
expecting.
Bjarne Stroustrup's book, The C++ Programming Language, does have a section on
iterators and a section on containers. In retrospect, they are quite good introductions
to the concept, but I didn't feel the examples were as informative as I would have liked.
Microsoft's MSDN has an article called
C++ and STL: Take Advantage of STL Algorithms by Implementing a Custom Iterator, but
this article only covers the custom iterator side of things, it doesn't discuss how it would
interact with a custom container.
Dr. Dobbs inherited an article entitled
Custom Containers & Iterators for STL-Friendly Code:
A pair of approaches for creating custom containers from the March 2005 issue of C++
Users Journal. Some code extracts are included but there are some pieces missing, such as
the begin() and end() methods and how they are put together. The link in the article to the
original code no longer works. However, I did find that I have the Dr. Dobbs Developer
Library DVD Release 4. On it resides the full example code. That was much more
informative.
Now that I have a better understanding for what I'm looking, I see that the
STL compliant container example has some useful information. In the same vein,
CodeProject has another example:
An STL compliant sorted vector.
Finally, I came across Ulrich Breymann's book called Desiging Components with the C++
STL. It provided all the necessary background to pull it all together. I always thought
there was more to it, but custom containers and iterators may not be so hard after all.
Once I have the code finished, I'll try to have it posted one way or another.
[/Personal/SoftwareDevelopment/CPP]
permanent link
2008 Mar 25 - Tue
How Not To Form a Standard
Rob Weir has a blog called An Antic Disposition where he discusses
The Disharmony of OOXML.
The eloquent center piece of his article is a table representing how various applications represent a smiple
text string with one word in red, represented here verbatim:
| Format | Text Color | Text Alignment |
|---|
| OOXML Text | <w:color
w:val="FF0000"/> | <w:jc w:val="right"/> | | OOXML Sheet | <color
rgb="FFFF0000"/> | <alignment horizontal="right"/> | | OOXML Presentation | <a:srgbClr
val="FF0000"/> | <a:pPr algn="r"/> | | ODF Text | <style:text-properties
fo:color="#FF0000"/> | <style:paragraph-properties fo:text-align="end" |
|