One Unified Global Perspective
Communications with a Global Perspective
Home
Intro
Contact Us
Voice over IP
PBX Solutions
Services
Support
Glossary
Open Source
Blog
Forum

WebMail





2009 May 03 - Sun

Time Series Analysis on RRD Files

Crist Clark, in a posting on the NANOG mailing list, started an interesting thread on analyzing network traffic based upon frequency analysis rather than the traditional time based analysis. He started the thread by asking about Fourier Analysis on network traffic time series. A number of responses indicated that Wavelet Analysis might be the 'more modern' approrach. This type of analysis has been used for Network Traffic Anomoalies Detection. The responses indicate that operating systems can be deduced through analysis of RTD (Round Trip Delay) of ping generated traffic.

The thread started with:

Crist Clark started:

Has anyone found any value in examining network utilization numbers with Fourier analyses? After staring at pretty MRTG graphs for a bit too long today, I'm wondering if there are some interesting periodic characteristics in the data that could be easily teased out beyond, "Well, the diurnal fluctuations are obvious, but looks like we may have some hourly traffic spikes in there too. And maybe some of those are bigger every fourth hour."

Dave Plonka Responded:

Such techniques are used in the are of network anomaly detection. For instance, a search for "network anomaly detection" at scholar.google.com will yield very many results.
Our 2002 paper, "A Signal Analysis of Network Traffic Anomalies" [ACM SIGCOMM Internet Measurement Workshop 2002, Barford, et al.], is one such work. We mention that we use wavelet analysis rather than Fourier analysis because wavelet/framelet analysis is able to localize events both in the frequency and time domains, whereas Fourier analysis would localize the events only in frequency, so an iterative approach (with varying intervals of time) would be necessary. In general, this is the reason why Fourier analysis has not been a common technique used in network anomaly detection.
That work used data stored in RRD files at five minute intervals. Our subsequent work used data stored at one second intervals, again in RRD files.

Anton Kapela had a couple of messages and a link (look for Kapela):

Indeed, there are. Interesting things emerge in frequency (or phase) space - bits/sec, packets/sec, and ave size, etc. - all have new meaning, often revealing subtle details otherwise missed. The UW paper [Barford/Plonka et. al] is one of my favories and often referenced in other publications.
Along similar lines, I presented a lightning talk at nanog that demonstrates using windowed Ft's (mostly Gaussian or Hamming) in three-axis graphs (i.e. 'waterfalls') available in common tools (buadline, sigview, labview, etc) for characterizing round trip times through various network queues and queue states. Unexpectedly, interesting details regarding host IP stacks and OS scheduler behavior became visible.
I want to suggest that time windowed Ft might be a reasonable middle ground, certainly for Crist's case. Naturally, the trade-offs will be in frequency accuracy (ie. longer window) vs. temporal accuracy (ie. short window). Another solution for your needs might be cascaded FIR "bandpass" filters, but again, you're subject to time/frequency error trade-offs as related a filter's bandwidth.
While you're at it, consider processing your time series data into histogram stacks, or nested histograms. I haven't specifically seen a paper covering this, but another UW gent (DW, are you reading this?) used to process their 30 second ifmib data into a raw .ps file, and printed this out weekly/daily. The trends visible here were quite interesting, but I don't think much further work was done to see if anything super-interesting was more/less visible in this form than traditional ones.
... one point - since packets/bits/etc data is more monotonic than not (math wizards, please debate/chime in) and since it's not a 'signal' in the continuous sense, you might find value in differentially filtering the input data *before* FT or wavelet processing. This would serve to remove the weird-looking "DC" offset in the output simply by creating a semi-even distribution of both positive and negative input sample values.



Blog Content ©2009
Ray Burkholder
All Rights Reserved
ray@oneunified.net
(441) 505 7293
Available for Contract Work
Resume

RSS: Click to see the XML version of this web page.

twitter
View Ray 
Burkholder's profile on LinkedIn
technorati
Add to Technorati Favorites



May
Su Mo Tu We Th Fr Sa
         
3
           


Main Links:
Monitoring Server
SSH Tools
QuantDeveloper Code

Special Links:
Frink

Blog Links:
Sergey Solyanik
Marc Andreessen
HotGigs
Micro Persuasion
... Reasonable ...
Chris Donnan
BeyondVC
lifehacker
Trader Mike
Ticker Sense
HeadRush
TraderFeed
Stock Bandit
The Daily WTF
Guy Kawaski
J. Brant Arseneau
Steve Pavlina
Matt Cutts
Kevin Scaldeferri
Joel On Software
Quant Recruiter
Blosxom User Group
Wesner Moise
Julian Dunn
Steve Yegge
Max Dama

2009
Months
May




Mason HQ

Disclaimer: This site may include market analysis. All ideas, opinions, and/or forecasts, expressed or implied herein, are for informational purposes only and should not be construed as a recommendation to invest, trade, and/or speculate in the markets. Any investments, trades, and/or speculations made in light of the ideas, opinions, and/or forecasts, expressed or implied herein, are committed at your own risk, financial or otherwise.