2008 Nov 01 - Sat
MinHeap: C++ Template Implementation of Minimum Heap Algorithm
I am in the process of developing a trading application based in C++. I've realized that
it would be useful to test code and algorithms off line, before connecting to a
paper-trading system, or even a live scenario.
For code testing and algorithm testing, I need to send the quotes and trades I've
collected through an order fullfilment simualation engine. As the solution I'm developing
depends upon multiple symbols, I need a synchronization process to ensure the simulated
quotes and trades are processed in the same temporal order as I recieved them in real time.
One option for solving this ordering problem would be to simply merge/sort all the
vectors and process the single vector. I didn't like that idea as it wasn't amenable to
incorporating time based events as they are generated by the fulfillment algorithm.
I ended up designing a 'carrier' to hold each vector. In addition to a few other
house-keeping chores, The carrier manages the index into
the vector for the latest datum to be processed. The various carriers are then linked into
another vector, with the vector sorted in ascending date/time order based upon the current
datum pointed to by the carrier.
My first kick at the can for maintaining the vector of carriers was a brute force sift
down approach, with the sifting occuring after each datum was processed. This could turn
out to be O(n*m/2) time where n is the number of carriers and m is the number of total
datums across all carriers.
After looking at performance figures, I figured I could improve on this somewhat. The
MinHeap algorithm looked like a suitable candidate.
The Standard Template Library implementation of the algorithm uses pops and pushes to get
carriers into and out of the vector. This creates too much overhead. I basically needed a
mechanism to maintain minheap order as the carriers are added to the vector, and then
perform an in-place re-minheap when an datum from a carrier was 'consumed'. Once a carrier
is exhausted of all datums in its vector, the carrier is moved to the end of the carrier
vector with an 'archival' call.
Because, during the process of adding carriers to the vector, they are added at the end
and sifted up, once carrier archival starts to occur, no further carriers can be added. A
flag in the code is used to check for this condition.
The code is written for Visual Studio, but with a few minor mods, will hopefully run with
GCC.
There are two template parameters: C, T. T is the basic type of the carrier, and C is used for determining the
type of the operands for the lt operator in the SiftDown and SiftUp functions. I'm sure there is a more elegant way of handling the
storage of pointers in the carrier vector and trying to perform comparisons of the class represented by the pointer.
I havn't take the time to read up on that yet.
The following is the 'lt' operator of the class C:
static bool lt( CMergeCarrierBase *plhs, CMergeCarrierBase *prhs ) { return plhs->m_dt < prhs->m_dt; };
Here is the rest of the code:
#pragma once
#include <vector>
#include <assert.h>
// http://cis.stvincent.edu/html/tutorials/swd/heaps/heaps.html
// http://en.wikipedia.org/wiki/Binary_heap
// http://en.wikipedia.org/wiki/Heapsort
// http://people.cs.vt.edu/~shaffer/Book/C++2e/progs/minheap.h
// http://www.staroceans.com/minmaxHeap1.htm
// http://www.cppreference.com/wiki/stl/algorithm/is_heap is_heap()
template<class T, class C> class CMinHeap {
public:
CMinHeap<T,C>( size_t size );
CMinHeap<T,C>( void );
virtual ~CMinHeap( void );
void Append( T ); // automatic sift up
T RemoveEnd( void );
T GetRoot( void ) { assert( 0 != m_cntActiveItems ); return m_vT.front(); };
void ArchiveRoot( void ); // root item of no further use, swap to end and sift down
void SiftDown( void ) { SiftDown( 0 ); }; // reorder new root after use and change
bool Empty( void ) { return m_vT.empty(); };
size_t Size( void ) { return m_vT.size(); };
protected:
inline size_t Parent( size_t ix ) { return ( ix - 1 ) / 2; };
inline size_t RightChild( size_t ix ) { return 2 * ( ix + 1 ); };
inline size_t LeftChild( size_t ix ) { return ( 2 * ix ) + 1; };
inline bool isLeaf( size_t ix ) { return LeftChild( ix ) >= m_cntActiveItems; };
inline bool hasOneLeaf( size_t ix ) { return m_cntActiveItems == RightChild( ix ); };
inline size_t ixLastItem( void );
void SiftDown( size_t ix ); // from ix downwards, reordering item
void SiftUp( size_t ix ); // from ix upwards towards root, when appending new items
inline void Swap( size_t ix, size_t iy );
private:
std::vector<T> m_vT;
size_t m_cntActiveItems;
bool m_bArchivalStarted; // prevents further Appends
};
template<class T, class C> CMinHeap<T,C>::CMinHeap(size_t size)
: m_cntActiveItems( 0 ), m_bArchivalStarted( false )
{
m_vT.reserve( size );
}
template<class T, class C> CMinHeap<T,C>::CMinHeap( void )
: m_cntActiveItems( 0 ), m_bArchivalStarted( false )
{
}
template<class T, class C> CMinHeap<T,C>::~CMinHeap() {
}
template<class T, class C> size_t CMinHeap<T,C>::ixLastItem() {
assert( 0 < m_cntActiveItems );
return m_cntActiveItems - 1;
}
template<class T, class C> void
CMinHeap<T,C>::Append( T item ) {
assert( !m_bArchivalStarted );
m_vT.push_back( item );
++m_cntActiveItems;
SiftUp( ixLastItem() );
}
template<class T, class C> T CMinHeap<T,C>::RemoveEnd( void ) {
assert( !m_vT.empty() );
T item = m_vT.back();
m_vT.pop_back();
return item;
}
template<class T, class C> void
CMinHeap<T,C>::ArchiveRoot() {
// swap with last item and SiftDown
assert( 0 < m_cntActiveItems );
m_bArchivalStarted = true;
Swap( 0, ixLastItem() );
--m_cntActiveItems;
if ( 1 < m_cntActiveItems ) { // sift only if 2 or more items
SiftDown();
}
}
template<class T, class C> void
CMinHeap<T,C>::Swap( size_t ix, size_t iy ) {
assert( ix < m_cntActiveItems );
assert( iy < m_cntActiveItems );
T tmp = m_vT.at( ix );
m_vT.at( ix ) = m_vT.at( iy );
m_vT.at( iy ) = tmp;
}
template<class T, class C> void
CMinHeap<T,C>::SiftUp( size_t ix ) {
size_t cur = ix;
size_t parent;
while ( 0 != cur ) {
parent = Parent( cur );
if ( C::lt( m_vT.at( parent ), m_vT.at( cur ) ) ) {
break; // done sifting
}
else { // swap and try again
Swap( cur, parent );
cur = parent;
}
}
}
template<class T, class C> void
CMinHeap<T,C>::SiftDown( size_t ix ) {
size_t cur = ix;
while ( !isLeaf( cur ) ) {
if ( hasOneLeaf( cur ) ) {
size_t left = LeftChild( cur );
if ( C::lt( m_vT.at( left ), m_vT.at( cur ) ) ) {
Swap( left, cur );
cur = left;
}
else {
break;
}
}
else { // has two leaves
size_t right = RightChild( cur );
size_t left = LeftChild( cur );
// right side has shorter distance by default for same or greater
bool bGoRight = !( C::lt( m_vT.at( left ), m_vT.at( right ) ) );
if ( bGoRight ) {
if ( C::lt( m_vT.at( right ), m_vT.at( cur ) ) ) {
Swap( right, cur );
cur = right;
}
else {
break;
}
}
else { // test with left
if ( C::lt( m_vT.at( left ), m_vT.at( cur ) ) ) {
Swap( left, cur );
cur = left;
}
else {
break;
}
}
}
}
}
// if we needed to build a heap from pre-assigned vector:
// for (int i=n/2-1; i>=0; i--) siftdown(i);
|