The Ops Data Choke

Posted by scottk on Apr 14, 2010 in Ramblings

With all this chatter about big database warehouses and driving deep into business data to bring out insights, I think it’s also good to take a look at one of the other areas that has experienced a huge choke on data and has been running into it for decades. There was a lot of talk these past couple of days about all the application data that is out there because we have new remote sensor devices (like smart phones) creating new streams of data. Interestingly, if you look in your data center, you already have platforms with the potential to generate volumes upon volumes of sensor data, and traditionally a huge percentage of it is thrown away. The concepts of RRD files and sar data collection rest upon the idea that there is no freakin’ way you are going to be able to collect all that data and store it long term.
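To make the "thrown away" part concrete, here is a minimal Python sketch of the kind of consolidation an RRD-style store performs: raw samples are collapsed into one min, max and average per time bucket. This is not RRDtool's actual code. The function name `consolidate`, the one-reading-per-second rate and the traffic numbers are all made up for illustration.

```python
from statistics import mean


def consolidate(samples, bucket_size):
    """Collapse raw samples into one (min, max, avg) tuple per bucket."""
    buckets = []
    for start in range(0, len(samples), bucket_size):
        chunk = samples[start:start + bucket_size]
        buckets.append((min(chunk), max(chunk), mean(chunk)))
    return buckets


# Hypothetical data: one reading per second for fifteen minutes.
# The link idles at 10 units, with a five-second burst to 900.
raw = [10] * 900
raw[400:405] = [900] * 5

# One fifteen-minute bucket is all that survives long term.
summary = consolidate(raw, bucket_size=900)
low, high, avg = summary[0]
print(f"min={low} max={high} avg={avg:.2f}")
```

The max still says a burst happened somewhere in those fifteen minutes, but when it started, how long it lasted and whether it was one burst or five are gone for good once the raw samples are dropped.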

We are just now getting to a point where turning on the pieces that generate system data isn’t so much of a concern. Previously, had you been running a dual single-core server, the idea of setting sar or even SNMP data collection to every minute gave you great pause; percentage-wise, that’s a lot of cycles going to system status instead of the application. With dual quad cores that kind of concern is starting to pass away, and quad will soon be a small number of cores to be running on one processor. Once you had the data, pushing it over a 10Mb line could take up a fairly decent swath of that bandwidth, but we’re at 1Gb now and will most likely see 10Gb as the de facto standard in the near future, so the network layer becomes less of a concern. The last hurdle standing in the way is a decent way to collect and store this data. It’s not only storage size for all of this incoming data; the speed of getting that constant flow into the system also needs to be handled well. Additionally, we need to be able to pull it out quickly, so that we can be notified when something changes and trigger events. There isn’t a clear-cut solution to this problem, though I think we are starting to see things that could take it on.

If only we could get down to the raw level of the data stored, so that when we go back to compare we can look at exact counters instead of min, max and average values over a fifteen-minute period. How awesome would it be to use things like K-means clustering to assign fingerprints to servers, which could let ops be more predictive about failure horizons based on events that seem unrelated to the naked eye. To be able to look back with precision at how much traffic our network was pushing two years ago and look for spikes in sub-minute time frames within that period. To have fine-grained enough power usage data to determine what device is plugged into an outlet based on its fingerprint over time. To be able to see when purchasing new hardware will outweigh the costs of running legacy hardware, based upon very concrete numbers and a model around them. All these things are there and they are not very far away.
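As a rough idea of what the K-means fingerprinting could look like, here is a small pure-Python sketch of Lloyd's algorithm run over per-server feature vectors. Everything specific in it is an assumption for illustration: the `kmeans` helper, the choice of two features (average CPU percent and network Mb/s) and the sample values are hypothetical. Real fingerprints would need many more features, and those features would have to be scaled so that no single one dominates the distance.

```python
import math
import random


def kmeans(points, k, iterations=20, seed=0):
    """Plain Lloyd's algorithm: returns (centroids, label per point)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct points

    def nearest(p):
        return min(range(k), key=lambda i: math.dist(p, centroids[i]))

    for _ in range(iterations):
        # Assignment step: group each point with its closest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(p)].append(p)
        # Update step: move each centroid to the mean of its group
        # (an empty group keeps its previous centroid).
        centroids = [
            tuple(sum(dim) / len(c) for dim in zip(*c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, [nearest(p) for p in points]


# Hypothetical fingerprints: (avg CPU %, avg network Mb/s) per server.
servers = {
    "web01": (10.0, 5.0), "web02": (12.0, 6.0), "web03": (11.0, 4.0),
    "db01": (80.0, 60.0), "db02": (82.0, 58.0), "db03": (79.0, 61.0),
}
centroids, labels = kmeans(list(servers.values()), k=2)
for name, label in zip(servers, labels):
    print(name, "-> cluster", label)
```

A server whose readings drift away from the cluster it normally sits in would be the kind of thing worth looking at before anything actually fails, which is only possible if the raw readings were kept in the first place.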

