By Kathryn Ash, President, IPCopper, Inc.
It turns out that IT people do get plenty of exercise. From the job description it sounds like a desk job, but the promise of getting all the answers without leaving the desk hasn’t panned out. Take a small 50 Mbps network: at one-third utilization, around the clock, it produces roughly 10 TB of traffic per month. That’s only about one hard drive’s worth, so why doesn’t everybody capture their data in full and reap the benefits of packet capture by solving technical problems, finding security flaws and, well, getting all the answers? Why does all troubleshooting still start with a ping, just as it did decades ago? The answer is glaringly simple: capturing the packets is easy. Making sense of the data is the hard part.
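A quick back-of-the-envelope check of that figure, as a sketch. The link rate and utilization come from the example above; capturing both directions of a full-duplex link is my assumption, since that roughly reproduces the 10 TB number:

```python
# Illustrative check of the monthly capture volume quoted above.
# Link rate and 1/3 utilization are from the article; full-duplex
# capture (both directions) is an assumption.

LINK_MBPS = 50                      # nominal link rate, megabits per second
UTILIZATION = 1 / 3                 # assumed average utilization, 24/7
SECONDS_PER_MONTH = 30 * 24 * 3600  # one 30-day month

def monthly_capture_tb(mbps, utilization, duplex=True):
    """Terabytes captured in one 30-day month at the given average load."""
    directions = 2 if duplex else 1
    bytes_per_sec = mbps * 1e6 / 8 * utilization * directions
    return bytes_per_sec * SECONDS_PER_MONTH / 1e12

print(round(monthly_capture_tb(LINK_MBPS, UTILIZATION), 1))                # 10.8
print(round(monthly_capture_tb(LINK_MBPS, UTILIZATION, duplex=False), 1))  # 5.4
```

Capturing only one direction would halve that to about 5.4 TB per month; either way, it fits comfortably on a single commodity drive.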
Take a mundane yet essential security task such as making sure all computers on the corporate campus are using up-to-date SSL. You could check every computer on the network. Or you could check every packet on the network. The first takes your time and effort. The second is done by a machine, which examines every packet to answer two questions: is it SSL and, if so, which version?
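The per-packet version of that check can be sketched in a few lines. This is an illustration under simplifying assumptions, not the article's product: it only inspects the SSL/TLS record header at the start of a TCP payload, and real traffic would also need stream reassembly and port filtering:

```python
# A minimal sketch of the per-packet SSL check described above: look at
# the first bytes of a TCP payload and, if it is an SSL/TLS record,
# report the record-layer version. Function and table names are ours.

TLS_VERSIONS = {
    (3, 0): "SSL 3.0",
    (3, 1): "TLS 1.0",
    (3, 2): "TLS 1.1",
    (3, 3): "TLS 1.2",   # TLS 1.3 also uses this value at the record layer
}

def ssl_version(payload: bytes):
    """Return the record-layer version string, or None if not SSL/TLS."""
    # Valid TLS record content types: 0x14-0x17 (CCS, alert, handshake, data)
    if len(payload) < 3 or payload[0] not in (0x14, 0x15, 0x16, 0x17):
        return None
    return TLS_VERSIONS.get((payload[1], payload[2]))

# Example: a handshake record announcing TLS 1.0 at the record layer
print(ssl_version(bytes([0x16, 0x03, 0x01, 0x00, 0x2F])))  # TLS 1.0
print(ssl_version(b"GET / HTTP/1.1"))                      # None
```

Run that over every captured packet and any host still emitting "SSL 3.0" records stands out immediately, with the offending packets as evidence.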
Making sense of packet capture data unlocks numerous possibilities for managing, monitoring, controlling and securing computer networks, from detecting and keeping tabs on a new device the second it sends out its first ARP to ferreting out zombie computers and alerting when a client computer’s bandwidth utilization suddenly looks more like a server’s. The same goes for identifying servers, tracking which computers checked in with the antivirus update server or finding out who is sucking up all the bandwidth. This is all in addition to figuring out who is downloading or uploading files to China and what those files contain. It’s all in the packets.
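The first item on that list, catching a new device at its first ARP, can be sketched as a simple set-membership check over the capture stream. The dict-based packet representation here is an assumption for illustration; a real system would pull these fields from its decoded capture records:

```python
# Sketch of new-device detection: remember every source MAC seen in an
# ARP packet and flag a device the moment its first ARP appears.
# The {"type": ..., "src_mac": ...} packet shape is illustrative only.

known_macs = set()

def on_packet(pkt: dict):
    """Return the MAC of a never-before-seen device, else None."""
    if pkt.get("type") != "arp":
        return None
    mac = pkt["src_mac"]
    if mac in known_macs:
        return None
    known_macs.add(mac)
    return mac   # new device: alert on it, inventory it, watch it

stream = [
    {"type": "arp", "src_mac": "aa:bb:cc:00:00:01"},
    {"type": "tcp", "src_mac": "aa:bb:cc:00:00:01"},  # not ARP, ignored
    {"type": "arp", "src_mac": "aa:bb:cc:00:00:01"},  # already known
    {"type": "arp", "src_mac": "aa:bb:cc:00:00:02"},  # new device
]
alerts = [mac for mac in map(on_packet, stream) if mac]
print(alerts)   # ['aa:bb:cc:00:00:01', 'aa:bb:cc:00:00:02']
```

The bandwidth-anomaly checks mentioned above follow the same pattern: keep a small running aggregate per host and alert when it crosses a profile boundary.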
While those terabytes of data may prove to be worth their virtual weight in gold, without the processing power and a system to unlock the value from the packets, they aren’t worth even the cost of the hard drive they sit on. A single packet capture appliance lacks the oomph needed to extract value from the data – it bottlenecks at either the hard drives or the processor, resulting in long waits for queries, packet loss or both. Distributed packet capture systems, however, aggregate and orchestrate the processing power of multiple machines to blast through hundreds or even thousands of terabytes of full packet capture while capturing new packets at the same time.
In today’s computing environment a distributed system of four to eight machines, even with low-cost processors (yes, even yesterday’s desktops), has ample capacity and responsiveness to crunch the load from a 50 Mbps network. To get a one-minute response to a query spanning one month of data, you are looking at a ratio of about 43,000:1, that is, one minute to process what took over 43,000 minutes to capture. A low-cost chassis with one regular HDD delivers roughly 1 Gbps of processing throughput, while an SSD delivers 5-7 Gbps. A system of eight machines therefore translates to 8 to 56 Gbps of raw processing throughput, maybe even, on a really good day, 100 Gbps. That brings the ratio down to around 1,000:1.

Cutting out the payload would make it possible to get through that month of data in one to two minutes (and if your software doesn’t report on the payload, what’s the use of keeping it anyway?). The power to process the payload, paired with software that reports on it, is what gives you that almost magical ability to get answers and solve problems with the data to back them up – without having to hoof it around campus, checking individual computers one by one. Rather than cutting out the payload to speed up queries, software for a good distributed packet capture system multiplies the processing throughput of the hardware 10 to 100 times, making it possible both to capture the payload and to get reports spanning one month of full packet data in less than one minute, even with a small setup of only four to eight machines. This is a game changer when it comes to packet capture and managing and monitoring networks, not least because reports and aggregates take far less storage space than raw packet capture, meaning the sky’s the limit when it comes to the depth and breadth of the reports possible.
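The ratios above reduce to straightforward arithmetic. Here is a sketch using the article’s own figures (50 Mbps capture rate, 1 Gbps per HDD box, up to 56 Gbps for eight SSD boxes, and a 100x software multiplier):

```python
# Reproducing the query-time ratios quoted above as arithmetic.
# All rates are the article's figures; the function name is ours.

CAPTURE_MBPS = 50
MINUTES_PER_MONTH = 30 * 24 * 60          # 43,200 minutes of capture

def query_minutes(processing_gbps):
    """Minutes to scan one month of 50 Mbps capture at the given rate."""
    data_megabits = CAPTURE_MBPS * MINUTES_PER_MONTH * 60
    return data_megabits / (processing_gbps * 1000) / 60

print(MINUTES_PER_MONTH)                  # 43200 -> the ~43,000:1 target
print(round(query_minutes(1), 1))         # one HDD box at 1 Gbps: 2160.0 min
print(round(query_minutes(56), 1))        # eight SSD boxes at 56 Gbps: 38.6 min
print(round(query_minutes(56 * 100), 2))  # with a 100x software multiplier: 0.39 min
```

In other words, raw hardware alone gets an eight-machine system to tens of minutes per monthly query; it is the software multiplier that pushes full-payload queries under the one-minute mark.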
Once you get a taste of what a distributed system offers, you can expand it further by adding more hardware to increase the lookback period. This in turn makes it possible to trace problems from the beginning, rather than investigating them mid-stream and attempting to extrapolate: seeing how a problem started brings you a lot closer to what triggered it than seeing how it ended. Incidentally, adding more hardware also adds to the available raw processing power, making it possible to do even more in less time – one of the beauties of a distributed system is its affordable scalability.
In addition to getting results and relegating marathons to your free time, you can also feel good about doing your part to combat e-waste. Recycling is always good, and saving money by reincarnating the old, slow desktops that everyone hates into supercomputers for networking makes you a “green” champion, in more ways than one.
About the Author
Kathryn Ash is the President of IPCopper, Inc., a manufacturer of network appliances based in Portland, Oregon. She has been with the company for over a decade, guiding the development and marketing of its cutting-edge technology for packet capture and analysis, most recently presiding over the debut of its newest product, Lateral Data Processing for Distributed Packet Capture. Email Kathryn at firstname.lastname@example.org or visit http://www.ipcopper.com/.