2008 Aug 28 - Thu
Console to Cisco Device from FreeBSD
I'm now involved in running an ISP. An existing ISP. One that has been through several hands already. One that
has many strange corners and alleys. Unknown servers. Inconsistent switch configurations. I could go on, but suffice
it to say, my job is to make it all work.... better. And in the meantime, bring myself up to speed on a bunch more
technologies. I've been mostly a Linux hack to date. I get to add FreeBSD and NetBSD to my list. Baby steps first.
The task of connecting a Cisco device to the serial port of a FreeBSD computer and communicating with it turned out
to be quite easy. So for my reference (which I found at
O'Reilly BSD DevCenter):
cu -l /dev/cuaa0 -s 9600
This uses the first serial port at 9600 bits per second. Connection and remote access is as simple as that. To exit
the session:
~.
2008 Aug 22 - Fri
DNS Tools
For my reference, a command for looking at ownership of w.x.y.z:
dig +trace z.y.x.w.in-addr.arpa.
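The reversed-octet name that dig expects is easy to get wrong by hand. As a minimal sketch (the function name ReverseLookupName is my own, and it only checks that there are four dot-separated parts, not that each is a valid number), the name can be built like this:

```cpp
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Build the reverse-lookup name for a dotted-quad IPv4 address,
// e.g. "w.x.y.z" becomes "z.y.x.w.in-addr.arpa."
// Hypothetical helper: only the number of octets is validated.
std::string ReverseLookupName( const std::string &sAddress ) {
  std::vector<std::string> vOctets;
  std::istringstream ss( sAddress );
  std::string sOctet;
  while ( std::getline( ss, sOctet, '.' ) ) {
    vOctets.push_back( sOctet );
  }
  if ( 4 != vOctets.size() ) {
    throw std::invalid_argument( "expected four octets" );
  }
  std::string sName;
  // walk the octets from last to first
  for ( std::vector<std::string>::const_reverse_iterator iter = vOctets.rbegin();
        iter != vOctets.rend(); ++iter ) {
    sName += *iter + ".";
  }
  return sName + "in-addr.arpa.";
}
```

The returned string can be pasted straight after 'dig +trace'.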
DNS Operations List
DNS List: Namedroppers: IETF DNSEXT
working group.
DNS Operations
2008 Aug 06 - Wed
Labour Saving Devices (Software Wise)
Today's Dr. Dobb's Report via email from Jonathan Erickson has a reference to Michael
Abrash's book
Graphics Programming Black Book.
Based upon the book's table of contents, the content covers many interesting algorithms in
and outside of graphics programming.
However, to download the book, one has to download over 76 individual files. What a
pain, especially if one is using Windows.
Actually, it isn't so bad. I manually downloaded the preludes and afterwords. Cygwin
came to the rescue for automating the chapter downloads. By starting a Cygwin shell,
putting the following content in a file called 'getch', and running it, I was able to
automate the download of the 70 individual chapters. Those running Linux or BSD
can use this little bash script as is.
for (( i = 1; i <= 70; i++ ))
do
wget http://www.byte.com/abrash/chapters/gpbb$i.pdf
done
The 'wget' program is a useful tool for downloading web pages without using a browser.
And by creating a for loop with a variable substitution, one can iteratively download each
of the chapters.
2008 Jul 03 - Thu
Upgrade from Eclipse Europa to Ganymede (with painful Subversion)
Today, I upgraded from Eclipse/CDT Europa to Eclipse/CDT Ganymede.
(CDT meaning C/C++ Development Tooling). The Eclipse upgrade was painless:
download the Eclipse/CDT package, expand it, and start eclipse from
within the directory. After pointing it to my workspace, everything
was there. Nicely simple.
My subversion client was an entirely different story. For the Europa installation,
everything came from the tigris site and worked well. For the Ganymede installation,
there are now two sites involved, and I'm not sure which is what. I think
the tigris site can now be ignored (for the time being). In the
installation instructions somewhere, one needs to go to the
Polarion site for their client. There
are Eclipse update links there.
However, what I assumed to be workable defaults of using the JavaHL library on
Debian turned out to be non-workable. The solution was not to use the JavaHL
client but use the SVNKit client.
My key problem is that my SVN repository requires an ssh public/private key.
I'm not sure whether the JavaHL library was actually loading or not, but either way
I could only see the option for user name and password authentication.
It would have been nice if the Subversion/Polarion/Eclipse guys would all get
together and make it straightforward in terms of which libraries from
which sites need to be downloaded. If they imply that the JavaHL libraries should
be downloaded, please make it painless to get the paths set and ensure the binaries
are present. After 15 million lines of code, you'd think that would be a
small task to accomplish.
2008 Jun 17 - Tue
Concurrency aka MultiThreading
A number of my projects are approaching the phase where some of their feature sets
will work better with some form of background processing. In a trading application,
plowing through historical data on thousands of symbols looking for patterns would be
best left to a background task, rather than rendering the user-interface frozen during
the, well, duration. For Radius, with a listener on an accounting port, and one on an
authorization port, the two threads need to coordinate access to resources.
Windows has a native threads API, but that isn't necessarily portable between operating systems.
My trading application is on Windows, and the Radius application is on Linux. Using the same API on both
platforms would be cool.
As I already use the Boost tools in their various forms, Boost::Thread would be a good candidate.
The Boost documentation isn't exactly overflowing with examples. Dr. Dobb's Portal
saves the day with an article dating from May 2002 entitled
The Boost.Threads Library.
It has good examples covering the basics:
- Thread Creation (boost::thread)
- Mutexes (boost::mutex), which protect shared data from simultaneous access by multiple threads
- Condition Variables (boost::condition), which are good for getting data into and out of a thread
- Thread Local Storage (boost::thread_specific_ptr), for keeping thread specific storage separate from other threads
- Once Routines (boost::call_once), for making sure statics are initialized once and only once
For mutexes, protecting code regions can be as easy as declaring a mutex:
boost::mutex CodeProtectionMutex;
And then putting a scoped lock in the code encountered by multiple threads:
{ // some scope some where
// ... some code
boost::mutex::scoped_lock lock(CodeProtectionMutex);
// .. some more code, which is protected by the lock
} // the scope exit, no unlock is required as the destructor does the work
After reviewing the examples, making use of the Boost documentation should be an easier
task. The boost::thread documentation should be reviewed anyway, as boost::thread
has gone through some changes since that article.
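To see the scoped lock doing real work, here is a small sketch in which several threads increment one shared counter. It assumes a C++11 compiler and uses std::thread, std::mutex, and std::lock_guard, the standard library counterparts of the Boost classes above, so that it builds without Boost. The names IncrementMany and RunWorkers are made up for the illustration.

```cpp
#include <mutex>
#include <thread>
#include <vector>

std::mutex CodeProtectionMutex;  // plays the role of the boost::mutex above
long nSharedCounter = 0;         // the resource the threads compete for

// Each worker bumps the shared counter nTimes, taking the lock every time.
void IncrementMany( int nTimes ) {
  for ( int i = 0; i < nTimes; ++i ) {
    std::lock_guard<std::mutex> lock( CodeProtectionMutex );
    ++nSharedCounter;
  } // lock released here by the destructor on every pass
}

// Start nThreads workers, wait for all of them, and return the final count.
long RunWorkers( int nThreads, int nTimes ) {
  nSharedCounter = 0;
  std::vector<std::thread> vThreads;
  for ( int i = 0; i < nThreads; ++i ) {
    vThreads.emplace_back( IncrementMany, nTimes );
  }
  for ( std::thread &t : vThreads ) {
    t.join();
  }
  return nSharedCounter;
}
```

With the lock in place, RunWorkers( 4, 10000 ) should come back with 40000; without the lock the total would be unpredictable.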
Paul Bridger has also written a
tutorial on multithreading making use of the boost::thread
class. The navigation through the tutorial isn't the greatest, but the content is good.
Making use of boost::thread as a base, Philipp Henkel has written
threadpool. It has been
brought up to date for use with boost v1.35. It provides a dead easy solution to
making use of a limited number of worker threads to carry out tasks:
pool tp(2); //create a 2 thread pool
// Add some tasks to the pool.
tp.schedule(&first_task);
tp.schedule(&second_task);
tp.schedule(&third_task); // this task waits until one of the other two completes
In a similar vein to Henkel, Ted Yuan has a boost::thread based
C++ Producer-Consumer Concurrency Template Library.
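For my own reference, the producer-consumer idea boils down to a queue guarded by a mutex and a condition variable. The sketch below is not Ted Yuan's library. It is a bare-bones illustration using the C++11 standard library equivalents (std::condition_variable and friends), and CSimpleQueue and SumThroughQueue are names I invented for it.

```cpp
#include <condition_variable>
#include <mutex>
#include <queue>
#include <thread>

// Hypothetical minimal blocking queue: Push never blocks, Pop waits for data.
template<class T> class CSimpleQueue {
public:
  void Push( const T &item ) {
    {
      std::lock_guard<std::mutex> lock( m_mutex );
      m_queue.push( item );
    }
    m_cond.notify_one(); // wake one waiting consumer
  }
  T Pop() {
    std::unique_lock<std::mutex> lock( m_mutex );
    // the predicate guards against spurious wakeups
    m_cond.wait( lock, [this]{ return !m_queue.empty(); } );
    T item = m_queue.front();
    m_queue.pop();
    return item;
  }
private:
  std::mutex m_mutex;
  std::condition_variable m_cond;
  std::queue<T> m_queue;
};

// The producer (this thread) pushes 1..nItems, the consumer thread adds them up.
int SumThroughQueue( int nItems ) {
  CSimpleQueue<int> queue;
  int nSum = 0;
  std::thread consumer( [&queue, &nSum, nItems]{
    for ( int i = 0; i < nItems; ++i ) {
      nSum += queue.Pop();
    }
  } );
  for ( int i = 1; i <= nItems; ++i ) {
    queue.Push( i );
  }
  consumer.join(); // nSum is only read after the consumer has finished
  return nSum;
}
```

A real library adds bounded queues, shutdown handling, and multiple consumers, but the locking pattern stays the same.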
Zoltán Porkoláb has written an article on
Distributed Programming and Metaprogramming in C++. It has many
examples and goes into some additional examples for boost::thread. His article also
introduces bind and tuples, which are good backgrounds to boost::lambda.
Back to boost for a second. You can't find it from the boost home page, but here is a
good link to
Boost Libraries Listed Alphabetically.
boost::thread is a basic threading library. Going above and beyond multi-threading
grunt work, Intel's
Threading Building Blocks has higher level constructs for getting
multiple threads going. For example, it has a parallel 'for' construct for
executing multiple iterations of a loop simultaneously. Good tutorials and background
information can be read through
Kevin Farnham's Blog.
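To get a feel for what a parallel 'for' does, here is a hand-rolled sketch built on C++11 std::thread. It is not the Threading Building Blocks API, which handles partitioning and load balancing far more cleverly. ParallelFor and SquareAll are names made up for this illustration, and the range is simply cut into equal slices, one per thread.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Call fn(i) for every i in [begin, end), splitting the range across
// up to nThreads threads. Hypothetical helper, not a TBB function.
void ParallelFor( std::size_t begin, std::size_t end, std::size_t nThreads,
                  const std::function<void( std::size_t )> &fn ) {
  if ( 0 == nThreads ) nThreads = 1;
  const std::size_t nTotal = ( end > begin ) ? end - begin : 0;
  const std::size_t nChunk = ( nTotal + nThreads - 1 ) / nThreads; // round up
  std::vector<std::thread> vThreads;
  for ( std::size_t start = begin; start < end; start += nChunk ) {
    const std::size_t stop = std::min( start + nChunk, end );
    vThreads.emplace_back( [start, stop, &fn]{
      for ( std::size_t i = start; i < stop; ++i ) fn( i );
    } );
  }
  for ( std::thread &t : vThreads ) {
    t.join();
  }
}

// Example use: every element is written by exactly one thread, so no lock is needed.
std::vector<int> SquareAll( std::size_t n ) {
  std::vector<int> v( n );
  ParallelFor( 0, n, 4, [&v]( std::size_t i ){ v[ i ] = static_cast<int>( i * i ); } );
  return v;
}
```

The loop body must be safe to run concurrently. Here that holds because each index touches only its own element.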
Building further on the Threading Building Blocks is something of
stimulating interest:
go parallel looks to be chock full of content related to multi-core and multi-threaded
programming.
From the theoretical perspective, I came across an HP paper called
Foundations of the C++ Concurrency Memory Model, and written by Hans-J. Boehm and Sarita V. Adve.
I haven't read it all the way through, but at some time, I think the bibliography may be
a worthy read in itself.
2008 Jun 16 - Mon
Keyword Matching (non-text streams)
In a previous blog article, I presented the beginnings of a Keyword Lookup class. This
article takes that work, turns it into a template and makes it useful for longest match
lookups.
I'll have to compare this lookup library with what a C++ map or unordered_map does for
lookup speed. I'm hoping that when a comparison is made, this routine is indeed
faster. I'm thinking it might be because, even though a map will do a binary search through
its keys, complete strings are compared at each step. With this library, keyword patterns
are added to a rooted tree of characters, possibly reducing the amount of time spent
matching.
I must admit that at each match step, there is a linear search performed through a list
of likely character candidates. To improve the search, I've been thinking that once the
pattern tree has been
created, it could be sorted so that each step's search can be done with a binary search. This
will be something for next time.
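As a rough sketch of the binary search idea, the structure below keeps each node's children in a sorted vector and uses std::lower_bound at every step. It is a simplified stand-in, not the class from this article: CSortedTrie is a hypothetical name, it stores int values only, it keeps the children sorted as patterns are inserted rather than sorting once at the end, and it silently overwrites duplicate patterns.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical simplified trie with sorted child lists (illustration only).
class CSortedTrie {
private:
  struct Child { char chLetter; std::size_t ixNode; };
  struct Node {
    std::vector<Child> vChildren; // kept sorted by chLetter
    int value;
    bool bHasValue;
    Node() : value( 0 ), bHasValue( false ) {}
  };
  static bool LetterLess( const Child &child, char ch ) { return child.chLetter < ch; }
public:
  CSortedTrie() : m_vNodes( 1 ) {} // node 0 is the root

  void AddPattern( const std::string &sPattern, int value ) {
    std::size_t ixNode = 0;
    for ( char ch : sPattern ) {
      std::vector<Child> &vChildren = m_vNodes[ ixNode ].vChildren;
      std::vector<Child>::iterator iter
        = std::lower_bound( vChildren.begin(), vChildren.end(), ch, LetterLess );
      if ( vChildren.end() != iter && ch == iter->chLetter ) {
        ixNode = iter->ixNode; // letter already present
      }
      else {
        const std::size_t ixNew = m_vNodes.size();
        Child child = { ch, ixNew };
        vChildren.insert( iter, child ); // insert in sorted position first
        m_vNodes.push_back( Node() );    // may reallocate; vChildren is not used after this
        ixNode = ixNew;
      }
    }
    m_vNodes[ ixNode ].value = value;
    m_vNodes[ ixNode ].bHasValue = true;
  }

  // Returns the value of the longest stored pattern that prefixes sMatch,
  // or noMatch when no pattern matches.
  int FindMatch( const std::string &sMatch, int noMatch ) const {
    int result = noMatch;
    std::size_t ixNode = 0;
    for ( char ch : sMatch ) {
      const std::vector<Child> &vChildren = m_vNodes[ ixNode ].vChildren;
      std::vector<Child>::const_iterator iter
        = std::lower_bound( vChildren.begin(), vChildren.end(), ch, LetterLess );
      if ( vChildren.end() == iter || ch != iter->chLetter ) break;
      ixNode = iter->ixNode;
      if ( m_vNodes[ ixNode ].bHasValue ) result = m_vNodes[ ixNode ].value;
    }
    return result;
  }
private:
  std::vector<Node> m_vNodes;
};
```

Whether the binary search actually beats the linear scan will depend on how many distinct letters appear at each level. For short child lists the linear scan may well win, so this needs measuring.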
In the meantime, this routine does work for finding maximum matches. For example when
performing a long distance rate lookup, this will find the most specific rate code from a
list of various length candidates, something which is difficult to do with a map.
This keyword match algorithm is template based. 'class T' is the type to be returned
upon a successful match. It can be an index, a pointer, a number, or anything else
suitable. The class constructor requires an initializer... basically a value to be given
the equivalent meaning of NULL. On no match, this value is returned. Use 'AddPattern' to
add patterns and their associated 'meanings'. Use FindMatch to perform the lookup.
// this is kind of a subset of Aho Corasick algorithm
// only full keyword matching, no text searches
// no on failure coding
#ifndef CKEYWORDMATCH_H_
#define CKEYWORDMATCH_H_
#include <string>
#include <vector>
#include <stdexcept>
#include <iostream>
template<class T> class CKeyWordMatch {
public:
  // constructor is named without <T>; the template-id form is rejected by newer compilers
  explicit CKeyWordMatch( T initializer, size_t size );
  virtual ~CKeyWordMatch(void);
  void ClearPatterns( void );
  void AddPattern( const std::string &sPattern, T object );
  T FindMatch( const std::string &sMatch );
  size_t size( void ) { return m_vNodes.size(); };
protected:
  T m_Initializer;
  struct structNode {
    size_t ixLinkToNextLevel; // next letter of same word
    size_t ixLinkAtSameLevel; // look for other letters at same location
    T object; // upon match, (returned when keyword found)
    char chLetter; // the letter at this node
    explicit structNode( T initializer ) : ixLinkToNextLevel( 0 ), ixLinkAtSameLevel( 0 ),
      object( initializer ), chLetter( 0 ) {};
  };
  std::vector<structNode> m_vNodes;
private:
};
template<class T> CKeyWordMatch<T>::CKeyWordMatch( T initializer, size_t size )
: m_Initializer( initializer )
{
  m_vNodes.reserve( size );
  ClearPatterns();
}
template<class T> CKeyWordMatch<T>::~CKeyWordMatch(void) {
  m_vNodes.clear();
}
template<class T> void CKeyWordMatch<T>::ClearPatterns() {
  m_vNodes.clear();
  structNode node( m_Initializer );
  m_vNodes.push_back( node ); // root node with nothing
}
template<class T> void CKeyWordMatch<T>::AddPattern(
  const std::string &sPattern, T object ) {
  std::string::const_iterator iter = sPattern.begin();
  if ( sPattern.end() == iter ) {
    throw std::invalid_argument( "zero length pattern" );
  }
  size_t ixNode = 0;
  size_t ix;
  bool bDone = false;
  while ( !bDone ) {
    char ch = *iter;
    ix = m_vNodes[ ixNode ].ixLinkToNextLevel;
    if ( 0 == ix ) { // end of chain, so add letter
      structNode node( m_Initializer );
      node.chLetter = ch;
      m_vNodes.push_back( node );
      ix = m_vNodes.size() - 1;
      m_vNodes[ ixNode ].ixLinkToNextLevel = ix;
      ixNode = ix;
    }
    else { // find letter at this level
      bool bLevelDone = false;
      size_t ixLevel = ix; // set from above
      while ( !bLevelDone ) {
        if ( ch == m_vNodes[ ixLevel ].chLetter ) {
          // found matching character
          ixNode = ixLevel;
          bLevelDone = true;
        }
        else {
          // move onto next node at this level to find character
          size_t ixLinkAtNextSameLevel
            = m_vNodes[ ixLevel ].ixLinkAtSameLevel;
          if ( 0 == ixLinkAtNextSameLevel ) {
            // add a new node at this level
            structNode node( m_Initializer );
            node.chLetter = ch;
            m_vNodes.push_back( node );
            ix = m_vNodes.size() - 1;
            m_vNodes[ ixLevel ].ixLinkAtSameLevel = ix;
            ixNode = ix;
            bLevelDone = true;
          }
          else {
            // check the new node, nothing to do here
            // check next in sequence
            ixLevel = ixLinkAtNextSameLevel;
          }
        }
      }
    }
    ++iter;
    if ( sPattern.end() == iter ) {
      if ( m_Initializer != m_vNodes[ ixNode ].object ) {
        throw std::domain_error( "Pattern already present" );
      }
      m_vNodes[ ixNode ].object = object; // assign and finish
      bDone = true;
    }
  }
}
template<class T> T CKeyWordMatch<T>::FindMatch( const std::string &sPattern ) {
  // traverse structure looking for matches, object at longest match is returned
  std::string::const_iterator iter = sPattern.begin();
  if ( sPattern.end() == iter ) {
    throw std::runtime_error( "zero length pattern" );
  }
  T object = m_Initializer;
  size_t ixNode = 0;
  size_t ix;
  bool bDone = false;
  while ( !bDone ) {
    char ch = *iter;
    ix = m_vNodes[ ixNode ].ixLinkToNextLevel;
    if ( 0 == ix ) {
      bDone = true; // no more matches to be found so exit
    }
    else {
      // compare characters at this level
      bool bLevelDone = false;
      size_t ixLevel = ix; // set from above
      while ( !bLevelDone ) {
        if ( ch == m_vNodes[ ixLevel ].chLetter ) {
          if ( m_Initializer != m_vNodes[ ixLevel ].object )
            object = m_vNodes[ ixLevel ].object;
          ixNode = ixLevel;
          bLevelDone = true;
        }
        else {
          ixLevel = m_vNodes[ ixLevel ].ixLinkAtSameLevel;
          if ( 0 == ixLevel ) { // no match so end
            bLevelDone = true;
            bDone = true;
          }
        }
      }
    }
    ++iter;
    if ( sPattern.end() == iter ) {
      bDone = true;
    }
  }
  return object;
}
#endif /*CKEYWORDMATCH_H_*/
2008 Jun 06 - Fri
Wt, Some Build Modifications
Back on
2007/10/03, I wrote about installing Wt (a C++ library and application server for
developing and deploying web applications) on a Debian server. I've revised things a little
bit since then while building Wt v2.1.3.
In this case, I built with the newly released version of the Boost libraries: 1.35.
ASIO is now included in Boost, so some build steps can be removed.
Prerequisites are little changed but for a different library for gd:
apt-get install gcc
apt-get install zlib1g
apt-get install zlib1g-dev
apt-get install libbz2-dev
apt-get install libgd2-noxpm-dev
apt-get install cmake
apt-get install libfcgi-dev
apt-get install libapache2-mod-fastcgi
apt-get install libssl-dev
The web site and repository for Wt have changed, so CVS commands will be a bit different:
cvs -d :pserver:anonymous@cvs.webtoolkit.eu/opt/cvs login
cvs -z3 -d :pserver:anonymous@cvs.webtoolkit.eu/opt/cvs co wt
I've changed the cmake/build a little bit so the results go into /usr/local/wt/include
and /usr/local/wt/lib:
cmake -D DEPLOYROOT=/var/www/wt -D WEBUSER=www-data -D WEBGROUP=www-data \
-D BOOST_DIR=/usr/local \
-D BOOST_COMPILER=gcc42 \
-D BOOST_VERSION=1_35 \
-D BOOST_INCLUDE_DIR=/usr/local/include/boost \
-D BOOST_LIB_DIR=/usr/local \
-D BOOST_DT_LIB_MT=/usr/local/lib \
-D BOOST_DT_LIB=/usr/local/lib \
-D BOOST_FS_LIB=/usr/local/lib \
-D BOOST_FS_LIB_MT=/usr/local/lib \
-D BOOST_PO_LIB_MT=/usr/local/lib \
-D BOOST_REGEX_LIB_MT=/usr/local/lib \
-D BOOST_SIGNALS_LIB_MT=/usr/local/lib \
-D BOOST_THREAD_LIB=/usr/local/lib \
-D BOOST_ASIO_INCLUDE_DIR=/usr/local/include/boost \
-D SHARED_LIBS=ON \
-D CONNECTOR_FCGI=OFF \
-D CONNECTOR_HTTP=ON \
-D EXAMPLES_CONNECTOR=wthttp \
-D WTHTTP_CONFIGURATION=/etc/wt/wthttpd \
-D CONFIGURATION=/etc/wt/wt_config.xml \
-D CMAKE_INSTALL_PREFIX=/usr/local/wt \
.
During make install, an error regarding CMakeFiles arises. The secret, as far as I know, is
to remove the line which includes CMakeFiles in src/Ext/cmake_install.cmake, and restart
'make install'. The install should then complete normally.
The library directory /usr/local/wt/lib will need to be added to /etc/ld.so.conf, and
then run ldconfig to update things.
Remember to review the
Ext widgets deployment page as there are some additional files to be downloaded and
installed from Ext JS.
2008 Jun 04 - Wed
PostgreSQL Upgrade 8.2 to 8.3
Back in February, I wrote a longish article on how to upgrade PostgreSQL. That article is
outdated. An upgrade can now take place with two lines:
pg_upgradecluster -v 8.3 8.2 main
pg_dropcluster 8.2 main
The first copies the older version 8.2 files to the new 8.3 files directory. It does any
modifications necessary. The second line then removes the old stuff.
2008 Jun 03 - Tue
OpenSSH Issues
In light of the not so recent news regarding the vulnerability of OpenSSH in Debian, many
systems have had to be patched and inter-machine keys changed.
Via
Steven Rosenberg's Site I learn that a simple 'apt-get update && apt-get dist-upgrade'
will update the necessary files on my system. Also in the blog entry is a reference to
DRONEBL which is another black list site
dealing with root compromised sites. A commenter posts the following interesting remarks
about further protecting a server:
If you aren't running fail2ban or denyhosts, you should. Both will detect brute force
attempts and deny connections from the attacker for a time. If you feel uncomfortable
automatically banning hosts for failed logins, you can weakly configure whichever you choose
to allow 20 or more failed attempts before banning. There's no reason any authenticated
service should tolerate brute force attempts, in my humble opinion.
Finally, there are services, such as the DroneBL dnsbl, which have honeypot servers set
up to detect brute force attempts and add them to a blacklist. You can use the "aclexec"
directive in hosts.deny to query this blacklists before allowing clients to connect, to
prevent connections from known brute force attackers. See http://headcandy.org/rojo/ for a
suitable script to call via aclexec (view the source for the checkdnsbl script for usage
instructions), and see the man page for hosts_options for more info.
Running 'ssh-vulnkey -a' showed that there were a couple keys that needed to be
deleted and/or redone.
Debian has a
WIKI with good information
regarding the problem, affected programs, and utilities to help determine where the problems
are.
If weak keys have been copied to other non-Debian hosts, the keys need to be removed
from those hosts as well.
2008 Apr 27 - Sun
HDF Group's Hierarchical Data Format (HDF5) Library
I've been working with
The HDF Group's HDF5 (Hierarchical Data
Format) library for the last little while. It is a mechanism for managing
self-described data collections, no matter how large or complicated. From their website,
here are a few features:
- A versatile data model that can represent very complex data objects and a wide variety
of metadata.
- A completely portable file format with no limit on the number or size of data objects
in the collection.
- A software library that runs on a range of computational platforms, from laptops to
massively parallel systems, and implements a high-level API with C, C++, Fortran 90, and
Java interfaces.
- A rich set of integrated performance features that allow for access time and storage
space optimizations.
- Tools and applications for managing, manipulating, viewing, and analyzing the data in
the collection.
I'm using the HDF5 library in a stock market research and trading platform I'm developing
in
C++. The library is used to store Bars, Quotes, Trades, and MarketDepth. Each of these
data types uses ptime from the Boost DateTime library for time referencing.
I've been able to use C++'s container and iterator concepts to write a read/write
container with appropriate custom random iterator capabilities. This allows me to use STL
(Standard Template Library) Algorithms such as upper_bound, lower_bound, and equal_range
to quickly search for selected sub-ranges of the various data types.
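To illustrate the sub-range search, here is a sketch against a plain std::vector rather than my HDF5-backed container. The Bar layout is a guess for illustration, a 64-bit integer stands in for Boost's ptime, and SelectRange and BarTimeLess are names invented here. Any container that offers random access iterators over time-ordered records should work the same way.

```cpp
#include <algorithm>
#include <cstdint>
#include <utility>
#include <vector>

struct Bar {          // hypothetical layout, for illustration only
  std::int64_t time;  // stand-in for boost::posix_time::ptime
  double open;
  double close;
  long volume;
};

// Comparison between a Bar and a bare timestamp, in both argument orders,
// as needed by lower_bound (element, value) and upper_bound (value, element).
struct BarTimeLess {
  bool operator()( const Bar &bar, std::int64_t t ) const { return bar.time < t; }
  bool operator()( std::int64_t t, const Bar &bar ) const { return t < bar.time; }
};

typedef std::vector<Bar>::const_iterator BarIter;

// Return [first, last) covering bars with tBegin <= time <= tEnd.
// Assumes vBars is already sorted by time.
std::pair<BarIter, BarIter> SelectRange( const std::vector<Bar> &vBars,
                                         std::int64_t tBegin, std::int64_t tEnd ) {
  BarIter first = std::lower_bound( vBars.begin(), vBars.end(), tBegin, BarTimeLess() );
  BarIter last = std::upper_bound( first, vBars.end(), tEnd, BarTimeLess() );
  return std::make_pair( first, last );
}
```

Both searches are logarithmic in the number of records, which is what makes this attractive when the records live on disk.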
From a version perspective, I started out with the relatively new 1.8.0 rc5 HDF5 release,
and
have recently upgraded to the 1.9.3 HDF5 release. The more recent 1.9.4 HDF5 release
appears to have
link problems. The web pages show downloads for 1.8.0, but with a little extra digging,
there is a
HDF5 snapshot server available.
Building the HDF5 library on Windows is not too difficult. The hardest part is finding
the
build documentation, which is located in the /release_docs directory of the extraction. I
used tar on my Cygwin install to expand/extract the HDF5 distribution file, but recent
versions of Winzip or
7Zip should also be able to handle it on a Windows machine. Building the 1.9.3 version
of HDF5 was easier than the 1.8.0 rc5 version of HDF5, with which I had several missing file issues.
One key point is to download both zlib and szlib and put them in directories, otherwise
the HDF5 library won't build. Two environment variables are required:
- HDF5_EXT_SZIP=szlibdll.lib
- HDF5_EXT_ZLIB=zlib1.lib
To start the build process, run the copy_hdf.bat file. Then in Visual Studio, open the
windows/proj/all/all.sln file, select build/debug/library options and then build the
solution. After the build, run installhdf5lib.bat and you'll find the libraries and
includes in hdf5lib/debug et al. I copy the .dlls into my project's debug directory, and
use tools->options->c++ general->include files to point to the include file directory.
In order to use the library, one has to be aware of dataspaces (rank size of structures),
composite types (ie, bar is composed of time, open, close, and volume), datasets (the data
as stored on the drive), and properties (some descriptors for tuning storage abilities).
I've been able to write a vector of Bar objects out to a dataset by being particularly
careful in describing the in-memory datatype vs on-drive datatype. HDF5 then takes care of
handling the various offsets of the base values (time, double, int) as they are written from
the class to drive and back again. This self-described dataset allows an HDF5 datafile to
be created on a little-endian machine and then read from a big-endian machine with no
problems.
Another interesting capability of the HDF5 library is in how the data is stored. As
mentioned
before, compression can be enabled with zlib (szlib has some limitations in that it is
unable to work with clustered data). Further compression can be enabled through what
they call the 'shuffle' filter. I've been using data records which are identical in length. When
you look at a series of records, you'll find that a number of byte positions are identical:
they could be all zeros, or some other value if the data falls within a narrow range of
values across a series of records. Grouping these columns of bytes together serves as a convenient first order
level of compression before using the more generic zlib flavor of compression. Large
datasets can use minimal data storage when using these two compression concepts. I haven't
done heavy testing, but I think I've seen a 50% reduction in space usage when I turned these
on. Probably with cluster size tuning (a cluster being a specific number of records in a
block), I could further reduce storage requirements. But of
course, there will be access time considerations to handle as well.
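The byte-column idea is easier to see in a few lines of code. This is only my sketch of the concept, not HDF5's implementation: ShuffleBytes is a made-up name, and it regroups the bytes of fixed-length records so that all the first bytes come first, then all the second bytes, and so on. Long runs of identical bytes then sit next to each other, which a general compressor such as zlib handles well.

```cpp
#include <cstddef>
#include <vector>

// Regroup the bytes of fixed-length records column by column:
// byte b of record r moves to position b * nRecords + r.
// Input whose size is not a multiple of nRecordSize is returned unchanged.
std::vector<unsigned char> ShuffleBytes( const std::vector<unsigned char> &vIn,
                                         std::size_t nRecordSize ) {
  if ( 0 == nRecordSize || 0 != vIn.size() % nRecordSize ) {
    return vIn;
  }
  const std::size_t nRecords = vIn.size() / nRecordSize;
  std::vector<unsigned char> vOut( vIn.size() );
  for ( std::size_t r = 0; r < nRecords; ++r ) {
    for ( std::size_t b = 0; b < nRecordSize; ++b ) {
      vOut[ b * nRecords + r ] = vIn[ r * nRecordSize + b ];
    }
  }
  return vOut;
}
```

For three 4-byte records 0 0 1 5, 0 0 1 6, and 0 0 1 7, the result is 0 0 0, 0 0 0, 1 1 1, 5 6 7: three runs of repeated bytes followed by the only column that actually varies.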
It has taken some time to understand the concepts and subtleties of the HDF5 library, but
now
that I have, when coupled with C++ class and meta programming capabilities, and with
suitable abstractions, quite powerful data analytics can be built.
As one more highlight, there is a Java program available called HDFView which can be used
to view any HDF5 datafile. It shows just how well the self-described concept works, as
well as being useful as a debugging aid when creating data descriptions and data sets.