The Network Forensics Engine of the Future

2019-02-27 Matthias Vallentin
tenzir network forensics

Actionable Insight at your Fingertips

We are excited to announce the release of Tenzir, the next-generation engine for network forensics at scale. Tenzir is made for security operations centers (SOCs) that want to build a high-performance stack for real-time incident response.

Today’s SOCs struggle to manage massive amounts of network activity. They use separate tools for acquiring, streaming, storing, searching, and analyzing data. Eventually, orchestrating this bag of tools becomes too daunting a task, not only at large scale, when high data volumes overload critical components, but also at small scale, when the deployment itself becomes overly complex. With Tenzir, we end this low-level data wrangling and give SOCs a firm foundation on which to build efficient security workflows, allowing them to tackle the challenges they were built for: keeping users safe and the network healthy.

Our alpha release represents the first step towards this vision. It ingests common representations of network activity (Zeek and PCAP), offers a rich-typed query language, and exports the data in a format of your choice (JSON, ASCII, CSV, Zeek, PCAP). Sounds like a bare-bones SIEM? That’s not wrong, but there are some salient differences:

  • Built for network forensics: our data store is purpose-built to support common queries in the domain, such as checking indicators over the entire time spectrum and not just the last two months.

  • Interactive queries: our multi-level indexing approach delivers sub-second response times over the entire data set - a perfect fit for the explorative workflows of incident responders and threat hunters.

  • High-throughput streaming: we rely on end-to-end streaming to ingest massive amounts of data. Dynamic backpressure ensures that the system does not keel over when stuffing too much data into it.

  • Rich and typed data model: our type-rich data model retains domain semantics during ingestion and carries them through to the query language. All types support meaningful operations, e.g., IP addresses support top-k prefix search and containers support membership queries. Moreover, our typed expression syntax allows you to search over all fields that have a particular type or attribute.

  • Seamless integrations: we are embedding Tenzir deep into systems for big data analytics, without inefficient JSON over REST APIs.
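To make the backpressure idea from the streaming bullet above more concrete, here is a minimal Python sketch. It is not Tenzir's implementation: it assumes the simplest possible mechanism, a bounded buffer that blocks the producer whenever the consumer falls behind. The function name run_pipeline and the capacity parameter are made up for this illustration.

```python
import queue
import threading


def run_pipeline(events, capacity=4):
    """Push events through a bounded buffer and return what the consumer saw.

    Hypothetical illustration only: the bounded queue is the backpressure
    mechanism, because put() blocks the producer while the buffer is full.
    """
    buffer = queue.Queue(maxsize=capacity)
    sentinel = object()  # marks the end of the stream
    seen = []

    def consumer():
        while True:
            item = buffer.get()
            if item is sentinel:
                break
            seen.append(item)

    worker = threading.Thread(target=consumer)
    worker.start()
    for event in events:
        buffer.put(event)  # blocks while the buffer is full
    buffer.put(sentinel)
    worker.join()
    return seen
```

The producer can never get more than capacity events ahead of the consumer, so memory use stays bounded regardless of the input rate.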

Use Cases

Tenzir is a data layer for ingesting massive amounts of network activity data, querying it interactively, and optionally post-processing it with more complex analytics downstream. Let’s have a look at a few examples of how you could use Tenzir.

Exploratory Data Analysis

For now, the best way to interact with Tenzir is to use our powerful command line interface. The screencast below shows a common session of importing and exporting data.

The typical deployment of Tenzir is to start a server via tenzir start and then (usually in another terminal) import your data with tenzir import <format>, where format is the data format to ingest. For example, to import a bunch of Zeek logs you could write:

gunzip -c *.log.gz | tenzir import zeek

You can also ingest PCAP data:

tenzir import pcap -r trace.pcap # read from trace
tenzir import pcap -i en0        # read from network interface

Each command displays its help text with -h. The PCAP format supports a few extra options, such as a flow cut-off that ignores packets after the first k bytes of the TCP byte stream.
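The flow cut-off idea can be sketched in a few lines of Python. This is a simplified illustration, not Tenzir's code: it assumes packets arrive as (flow key, payload) pairs, counts payload bytes in arrival order, and ignores TCP sequence numbers and direction. The name apply_cutoff is hypothetical.

```python
from collections import defaultdict


def apply_cutoff(packets, cutoff):
    """Yield packets until their flow has delivered `cutoff` payload bytes.

    `packets` is an iterable of (flow_key, payload_bytes) pairs. A packet
    that starts before the cut-off is kept whole; later ones are dropped.
    """
    seen = defaultdict(int)  # flow key -> payload bytes seen so far
    for flow, payload in packets:
        if seen[flow] < cutoff:
            yield flow, payload
        seen[flow] += len(payload)
```

Since most of the forensically interesting content (handshakes, headers, requests) tends to sit at the beginning of a connection, a cut-off like this trades a small loss of detail for a large reduction in stored volume.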

Let’s say you want to check whether you’ve been hit with a certain indicator. Assuming your feed tells you that the IP address 6.6.6.6 is serving malware, you could look for it in several ways:

tenzir export ascii ':addr == 6.6.6.6'  # search all IP address fields
tenzir export ascii 'resp_h == 6.6.6.6' # search Zeek logs with field resp_h
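The difference between the two queries can be illustrated with a small Python sketch. It assumes an event is a plain dictionary whose address fields hold ipaddress objects; this is a stand-in for illustration and says nothing about how Tenzir represents or indexes events internally. The helper names are hypothetical.

```python
import ipaddress

ADDRESS_TYPES = (ipaddress.IPv4Address, ipaddress.IPv6Address)


def match_any_addr(record, needle):
    """Mimic ':addr == X': true if any address-typed field equals X."""
    target = ipaddress.ip_address(needle)
    return any(
        isinstance(value, ADDRESS_TYPES) and value == target
        for value in record.values()
    )


def match_field(record, name, needle):
    """Mimic 'name == X': true only if the named field equals X."""
    return record.get(name) == ipaddress.ip_address(needle)


# A made-up connection event with two address fields and one string field.
event = {
    "orig_h": ipaddress.ip_address("10.0.0.1"),
    "resp_h": ipaddress.ip_address("6.6.6.6"),
    "service": "http",
}
```

The type-based form finds the indicator no matter which field it appears in, which is handy when you do not know the schema of every log type in advance.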

Additionally, you can restrict your search to a specific time range:

tenzir export json '#time > 1 week ago && :addr == 6.6.6.6' # search last week only

The output here would be in line-delimited JSON. Since Tenzir has a flexible data model, you can render the output in many formats, even as a Zeek log:

tenzir export -e 10 zeek ':addr == 6.6.6.6'
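Line-delimited JSON, as produced by the export json command above, is easy to post-process in a script. The following Python sketch shows one way to do so; the function name parse_ndjson is hypothetical, and the sketch makes no assumption about which fields the events contain.

```python
import json


def parse_ndjson(lines):
    """Yield one decoded event per non-empty line of line-delimited JSON."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)
```

In a script you could pass sys.stdin to parse_ndjson and pipe the output of tenzir export json into it.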

For an overview of the query language, please have a look at our documentation.

Building Realtime Correlation Systems

Tenzir is a low-level yet powerful building block on top of which users can develop novel security solutions. Earlier this month at the DFN Conference in Hamburg, Germany, we gave a talk on how to do exactly this: live correlation of threat intelligence with historical data. The idea is that newly arriving indicators of compromise should automatically trigger lookups in your historical log archive. Why is that useful? Because by the time a new indicator is published, the threat actors may already have been wreaking havoc in your network for quite a while. In fact, it takes an average of 8 months until a complex attack is detected. Therefore, you not only want to feed new indicators into your intrusion detection systems but also correlate them with past activity.

Intel Control

We built a virtual analyst that subscribes to an intelligence feed, translates newly arriving indicators into Tenzir queries, interprets the results, and publishes them back to the intelligence provider. Here is the actual system architecture:

In this example, we used the Malware Information Sharing Platform (MISP) as our intelligence provider since it is commonly used throughout Europe. The efficient and flexible data plane provided by Tenzir made building this correlation system a breeze, highlighting how we can iterate quickly in order to deliver scalable solutions.
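The translation step of such a virtual analyst could look roughly like the Python sketch below. It is not the code of our actual system: it only handles IP address indicators, reuses the query syntax shown earlier in this post, and the function name indicator_to_query and its since parameter are made up for illustration.

```python
import ipaddress


def indicator_to_query(value, since=None):
    """Translate an IP address indicator into a query string.

    Parsing the value first rejects anything that is not a valid address,
    so untrusted feed content never ends up verbatim in a query.
    Raises ValueError for non-IP indicators.
    """
    addr = ipaddress.ip_address(value)
    query = f":addr == {addr}"
    if since:
        # Optionally restrict the lookup to a time range, e.g. "1 week ago".
        query = f"#time > {since} && {query}"
    return query
```

The resulting string can then be handed to tenzir export, and the matching events reported back to the intelligence provider as sightings.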

Becoming an Alpha Tester

If you are interested in alpha testing the current version, we’d love to hear from you. Just drop us an email. We ship Tenzir as a Docker image or as a single binary for other environments (e.g., FreeBSD, which we like to use on our dev boxes).

The building block of Tenzir is VAST, which was a dissertation project supervised by the creator of Zeek. VAST’s user interface is identical to Tenzir’s, but VAST lacks some performance optimizations and integrations. However, if you want to get started immediately and don’t have a massive dataset, browse over to our GitHub repository at https://github.com/vast-io/vast.

What’s Coming Next?

We are currently working on the following topics:

  • Integration with the big-data ecosystem. After having completed the conceptual phase of this feature, we are now in the middle of implementing native export to Apache Spark, R, and Python/Pandas. We can’t wait to show this feature to you soon.

  • Native NetFlow ingestion. Even though we have a long history with the Zeek community, many SOCs still rely heavily on NetFlow data. We are working on a NetFlow collector so that you can just point your gear to us and call it a day.

  • Forensic Readiness for the GDPR. Based in Europe? Then you have to abide by the GDPR. This doesn’t mean forensics is out of scope, quite the opposite: The art is to store the right granularity of data. We are working on on-the-fly anonymization and pseudonymization techniques such that your data backend never stores information it shouldn’t in plain text.
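To give a flavor of what pseudonymization at ingestion time can mean, here is a minimal Python sketch. It is not the technique we are building: it assumes a generic keyed hash (HMAC-SHA256) applied to a field value before it is stored, and the name pseudonymize and the truncation to 16 hex digits are arbitrary choices for this illustration.

```python
import hashlib
import hmac


def pseudonymize(value: str, key: bytes) -> str:
    """Replace a value with a keyed, deterministic pseudonym.

    The same value and key always yield the same pseudonym, so events can
    still be correlated, but the original value is not stored in plain text.
    """
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:16]  # shortened for readability in this sketch
```

Because the mapping is deterministic, an analyst holding the key can still check whether a known indicator occurs in the data by pseudonymizing the indicator and searching for the result.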

Anything else you’d like to see? Just reach out to us via email or social media. We’d be thrilled to hear from you!