Visualizing Netflow data

This is the first post in a series on visualizing Netflow data. The post starts with some basic Netflow concepts and some guidelines to set up an environment for reproducing the samples in these posts. After this, we'll use FlowPlotter to create our first visualizations.

What is Netflow?
Netflow data is a recording of all traffic passing a certain network interface or device and can be invaluable during Incident Response and forensic investigations. Unlike full packet captures (FPC), Netflow only contains the meta-data from the network traffic, like:

  • start and end time of the flow

  • source and destination IP addresses

  • source and destination ports

  • source and destination AS

  • network protocol

  • the next hop the traffic has been sent to

  • number of bytes and packets

The actual data is not part of the Netflow record. So, while it's possible to generate flows from FPCs, you can't do it the other way around. Because Netflow only contains meta-data, it's generally about 20 times smaller than FPC, which allows for longer retention times. And meta-data can still tell a lot about what happened: which hosts a certain workstation talked to, how much data was sent, what port or protocol was used and a lot more. For instance, a large number of bytes sent in only a relatively small number of packets from a single host over port 53 could raise suspicion.

A Netflow setup usually consists of Netflow exporters and Netflow collectors. The exporter sends the Netflow data over UDP to one or more collectors, which in turn can distribute the data to other collectors. Netflow exporters are available on numerous network devices, but mostly on the higher-end ones. You will not find them on SOHO equipment, but a switch with port mirroring and a Linux (virtual) machine are all you need to get started. Security Onion can be a good starting point for this. If you don't know Security Onion (you should though ;)): it's a full-fledged IDS that you can download for free as a pre-configured virtual machine or as an Ubuntu bootstrap.

You should also know that there are multiple variants of Netflow. Cisco created the original specification and lots of vendors created their own implementations, like jflow and sflow. But the most important versions are Cisco's Netflow v5 and v9 and IPFIX, the open standard Netflow version by the IETF. It should also be noted that different collectors have their own storage format. The demos in these posts are based on the SiLK suite and will not work by default on other collector types like the nfdump suite.

Setting up the environment
Security Onion has Argus pre-installed as a Netflow collector. This is a great tool, but it doesn't support IPFIX or Netflow v9 at the moment. At the home office I've installed the SiLK suite on my Onion. SiLK, which stands for the System for Internet-Level Knowledge, is a collection of Netflow tools developed by the CERT/NetSA (Network Situational Awareness) Team to facilitate security analysis in large networks. A very nice guide to install this suite on Security Onion can be found here: http://www.appliednsm.com/silk-on-security-onion/ (replace the version numbers in the wget commands with the latest releases: https://tools.netsa.cert.org/).

For the demos in this blog post I've installed SiLK on my SIFT workstation. You can use the installation guide for Security Onion mentioned above; following the Preparation and Installation sections should be sufficient to get a running analysis environment. As demo data I've downloaded and installed the reference data set from the CERT/NetSA site: https://tools.netsa.cert.org/silk/referencedata.html. Installation steps for this are also provided on that page.

If everything is installed correctly you should be able to run a command like this:

rwfilter --start-date=2004/10/04:20 --end-date=2005/01/08:05 --sensor=S0,S1 --type=in,inweb,out,outweb --all-destination=stdout | rwuniq --fields=proto --sort-output --values=records,bytes,packets,stime,etime

The SiLK suite contains several tools to slice and dice your flows. The commands can be chained by piping the results of one tool into the next. The rwfilter tool is the one that lets you select the Netflow records you want to work with. The basic operations include filtering based on time, IP address, CIDR, port, protocol etc. You can select either the set of records that passed or failed the criteria by adding the --pass or --fail arguments to the statement. By selecting stdout as the destination you can chain the results to the next rwfilter or a different SiLK tool, like rwuniq to summarize results or rwstats to create TopN or BottomN lists. These tools have numerous options and possibilities. CERT/NetSA supplies a comprehensive set of documentation that describes them all in detail: https://tools.netsa.cert.org/silk/docs.html.
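
As a rough sketch of such a chain, the function below splits a saved flow file into TCP and non-TCP records and lists the ten destination ports that received the most bytes. The function name top_tcp_dports and the file names are made up for illustration. The switches themselves (--protocol, --pass, --fail, --fields, --values, --count, --top) are regular rwfilter and rwstats switches, but it's worth checking them against the documentation of your SiLK version.

```sh
# Sketch: chain rwfilter into rwstats (function and file names are hypothetical).
top_tcp_dports() {
    flow_file=$1      # SiLK flow file, e.g. ../flow.rw
    rest_file=$2      # file that receives the non-TCP (failing) records

    # --pass receives the records that match --protocol=6 (TCP),
    # --fail receives everything else; only one of them can go to stdout.
    rwfilter "$flow_file" --protocol=6 --pass=stdout --fail="$rest_file" |
        rwstats --fields=dport --values=bytes --count=10 --top
}

# Example call:
# top_tcp_dports ../flow.rw ../non_tcp.rw
```

Writing the failing records to their own file keeps them around for a second pass, for instance to look at UDP or ICMP traffic separately.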

Visualizing Netflow data with FlowPlotter
Although the Netflow data only contains the meta-data, there is still a lot of data to plough through. It's often hard to see the big picture or outliers. Visualization can help a lot with that. A simple yet powerful framework to visualize SiLK Netflow data is FlowPlotter. FlowPlotter can create several kinds of charts like geo-maps, line charts, tree maps, time lines, pie charts and more. It takes the standard output of an rwfilter query and creates an HTML page containing a chart of the data.

FlowPlotter can be downloaded from this GitHub repo: https://github.com/automayt/FlowPlotter and needs to be run from the folder containing the flowplotter.sh script. You basically pipe the rwfilter output into the flowplotter.sh script and supply the chart type and some additional parameters. For instance, if we want to create a line chart of the number of bytes sent during each hour, we create a statement like this one:

rwfilter --start-date=2004/10/04:20 --end-date=2005/01/08:05 --sensor=S0,S1 --type=in,inweb,out,outweb --all-destination=../flow.rw

cat ../flow.rw | ./flowplotter.sh linechart 3600 Bytes > ../charts/linechart.html

We take the output from rwfilter and save it in a file. We cat the file to FlowPlotter and tell it to create a line chart of the number of bytes using 3600 second bins. Under the hood FlowPlotter uses the rwuniq statement for this. We send the stdout to an HTML file that can be viewed in a browser:

This chart now shows us the amount of data that has been sent every hour. This way it's easy to spot any unusual peaks in data flows, for instance in the case of data exfiltration.
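
To make the binning step concrete, here is a small sketch of what 3600 second bins mean, using plain awk on made-up sample records rather than SiLK itself. The function name bin_bytes and the two-column input format (epoch seconds, bytes) are assumptions for illustration. On real flow data, the --bin-time switch of rwuniq should do this rounding for you, for example rwuniq --fields=stime --bin-time=3600 --values=bytes --sort-output.

```sh
# Sketch: sum bytes per time bin (hypothetical helper, not part of FlowPlotter).
bin_bytes() {
    # $1    = bin size in seconds
    # stdin = one "epoch_seconds bytes" pair per line
    awk -v bin="$1" '
        {
            start = $1 - ($1 % bin)   # round the timestamp down to the bin start
            total[start] += $2        # accumulate bytes for that bin
        }
        END { for (s in total) print s, total[s] }
    ' | sort -n
}

# Example with made-up records:
# printf '%s\n' '3600 100' '3700 50' '7199 5' '7200 10' | bin_bytes 3600
# prints:
# 3600 155
# 7200 10
```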

One of the more interesting charts to analyze outliers is the bubble chart. You can create one with the following statement:

cat ../flow.rw | ./flowplotter.sh bubblechart sIP > ../charts/bubble.html

The bubble chart displays the relationship between the number of bytes and packets in the chart, in this case based on the source IP address:

Most IP addresses are plotted on a straight 45-degree line, representing "normal" traffic. The IP addresses that are not in the vicinity of that line are the interesting ones. IP addresses with a very high number of bytes per packet might indicate data exfiltration, while the ones with a low number of bytes per packet might indicate malware beacons. The size of the bubble indicates the number of flow records.
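
The same bytes-per-packet reasoning can be sketched on the command line. The function below reads lines of the form sIP|bytes|packets, which is what a command like rwuniq --fields=sIP --values=bytes,packets --no-titles --delimited is assumed to produce, and flags the addresses that fall outside a band. The function name flag_outliers and the thresholds in the example are arbitrary choices for illustration, not values taken from FlowPlotter.

```sh
# Sketch: flag source IPs with unusual bytes-per-packet ratios (hypothetical helper).
flag_outliers() {
    # $1    = low threshold, $2 = high threshold (bytes per packet)
    # stdin = one "sIP|bytes|packets" record per line
    awk -F'|' -v low="$1" -v high="$2" '
        $3 > 0 {
            ratio = $2 / $3
            if (ratio > high)
                print $1, "high", int(ratio)   # large payloads, e.g. bulk transfer
            else if (ratio < low)
                print $1, "low", int(ratio)    # tiny packets, e.g. periodic check-ins
        }
    '
}

# Example with made-up records:
# printf '%s\n' '10.0.0.1|150000|100' '10.0.0.2|50000|100' '10.0.0.3|4400|100' |
#     flag_outliers 60 1400
# prints:
# 10.0.0.1 high 1500
# 10.0.0.3 low 44
```

What counts as unusual depends on the network, so the thresholds are best derived from your own baseline traffic.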

The next interesting chart is the geo-map, which plots source or destination IP addresses on a world map. This requires some extra work because we need to supply SiLK with a mapping between IP ranges and countries. You can download this file for free from the MaxMind website:

wget http://geolite.maxmind.com/download/geoip/database/GeoIPCountryCSV.zip

This file then needs to be converted to a SiLK-specific format using the following command:

unzip -p GeoIPCountryCSV.zip | rwgeoip2ccmap --csv-input > /usr/local/share/silk/country_codes.pmap

And then use the following command to create the chart:

cat ../flow.rw | ./flowplotter.sh geomap dcc Bytes > ../charts/geo.html

While it's not unusual to see data flowing to countries like China and Russia, it may be worth a look if you see large quantities going that way ;).

Interactive charts
FlowPlotter also provides two interactive charts based on the D3 libraries. While the charts above help you to quickly spot anomalies, these interactive charts can help you while doing some deeper analysis. Although the concept is nice, it's lacking some flexibility in its current form. During analysis you may want to compare different slices while sifting through data, which requires manual re-generation of the dataset and chart. The charts can still be insightful though. One chart enables you to quickly identify different types of servers in the flow data:

The other interactive chart can display the relations between IP addresses and the top talkers. This can be useful to identify lateral movement and stations involved in data exfiltration.

Customizations
FlowPlotter is basically built as a bash script and a couple of HTML templates. The bash script accumulates the data using SiLK commands and injects the results into the templates. Both parts are easily adaptable to serve your own needs. You can create a dashboard based on these charts, run a cron job that refreshes the data at given intervals and put it on a large LCD panel in a SOC.
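
A refresh step for such a dashboard could look roughly like the sketch below. The function name refresh_chart, the FLOWPLOTTER variable, the wrapper script name and all paths are assumptions for illustration. Only the flowplotter.sh linechart call is taken from the example earlier in this post, and the flow file itself would still have to be regenerated by an rwfilter run first.

```sh
# Sketch of a refresh step for a cron-driven dashboard (all paths are assumptions).
FLOWPLOTTER=${FLOWPLOTTER:-./flowplotter.sh}

refresh_chart() {
    flow_file=$1     # SiLK flow file produced earlier by rwfilter
    out_html=$2      # chart page shown on the dashboard
    tmp_html="$out_html.tmp"

    # Render into a temporary file first, so the dashboard never
    # shows a half-written page if the chart generation fails.
    if "$FLOWPLOTTER" linechart 3600 Bytes < "$flow_file" > "$tmp_html"; then
        mv "$tmp_html" "$out_html"
    else
        rm -f "$tmp_html"
        return 1
    fi
}

# Example crontab entry for a wrapper script that calls refresh_chart
# (hypothetical location, runs at the top of every hour):
# 0 * * * * cd /opt/FlowPlotter && ./refresh_chart.sh
```

The cd in the crontab entry is there because FlowPlotter needs to be run from the folder containing the flowplotter.sh script.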

Conclusion
While FlowPlotter already provides a couple of useful charts out of the box, it's easy to adapt it to your specific needs or even create your own charts. This is where its real power lies: it provides a low-cost and highly flexible solution.

In the next post we'll take a look at FlowBat, a web-based suite to facilitate queries on SiLK Netflow data. FlowBat also provides visualizations and dashboard functionality.