Configuring a Test Lab for Data Analysis

March 14, 2016 by Frank Siemons

The world of IT security revolves around data. This data mainly consists of logs that are generated, collected, stored and analyzed. Given the large volume of data that security logging can generate, this can put a huge load on storage and processing resources.

In a professional production environment, this should be well organised, and there should be enough resources to allow for a reliable and well-performing setup.

A Security Professional or Security Trainee, however, will need to test and practice data analysis and manipulation within a much smaller environment, with a considerably lower budget and with less internal and external support.


To keep expenses low and configuration flexibility high, it is recommended to use as much virtualized "hardware" as possible. VMware ESX(i) is a very good candidate for a test lab. After registering for a free license, this hypervisor can be installed on relatively inexpensive hardware (mind the supported CPU requirements, though). On top of this hypervisor, many different virtual hosts can be installed and linked together via virtual networks connected to virtual switches. These include, for instance, Linux and Unix-based proxy servers, IDS and firewall appliances, and Splunk and syslog servers.

Another option is to use a hosted hypervisor such as VMware Workstation, Windows Virtual PC or VirtualBox. This adds some complexity to the network setup, however, and it can even limit the configuration options. It is much harder, for instance, to run two isolated networks through a single host, as is needed for a transparent proxy server, which is discussed next.


There are many free and open source proxy servers available. pfSense (based on FreeBSD) seems to be the easiest to manage, and it has many advanced features built in, such as Squid proxy, SSL inspection, antivirus and VPN. Remote syslog is simply a selectable option for the firewall logs, but to export the proxy logs as well, the following line needs to be added to the Custom Options field within the "proxy server" settings: access_log syslog:LOG_LOCAL4 (followed by a restart). For SSL inspection, which can provide HTTPS proxy logging, the Squid 3 proxy package needs to be installed via the web UI.
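
The facility named in that option ends up encoded in the priority value that prefixes each syslog message, so it is a handy way to check that the proxy logs are really arriving. A minimal Python sketch, assuming the receiving host sees classic BSD-style syslog lines that begin with `<PRI>` (the function name `decode_pri` and the sample message are illustrative and not taken from pfSense or Squid):

```python
import re

# Facility names indexed by numeric code, following the syslog RFCs.
FACILITIES = [
    "kern", "user", "mail", "daemon", "auth", "syslog", "lpr", "news",
    "uucp", "cron", "authpriv", "ftp", "ntp", "audit", "alert", "clock",
    "local0", "local1", "local2", "local3",
    "local4", "local5", "local6", "local7",
]
SEVERITIES = ["emerg", "alert", "crit", "err",
              "warning", "notice", "info", "debug"]

PRI_RE = re.compile(r"^<(\d{1,3})>")


def decode_pri(message):
    """Return (facility, severity) names for a syslog line starting with <PRI>."""
    match = PRI_RE.match(message)
    if match is None:
        raise ValueError("no <PRI> prefix found")
    # PRI = facility * 8 + severity
    facility, severity = divmod(int(match.group(1)), 8)
    if facility >= len(FACILITIES):
        raise ValueError("facility code out of range")
    return FACILITIES[facility], SEVERITIES[severity]


if __name__ == "__main__":
    # Hypothetical message, made up for this example.
    sample = "<166>Mar 14 10:00:00 proxy squid[1234]: example access log entry"
    print(decode_pri(sample))  # ('local4', 'info')
```

If the decoded facility is not `local4`, the messages are most likely coming from another source, such as the firewall log export.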

Some other examples of (Linux based) open source proxy servers are ClearOS and CacheGuard. These are more complex to fine-tune, and their interface is less intuitive, but they too will do their job as a virtual proxy server.


Snort is a free, open source intrusion detection and prevention system. It runs on Linux with low overhead, which makes it a great candidate for a virtual test lab IDS/IPS solution. Installing and configuring Snort is relatively easy, and the rules are freely available a few weeks after their release by major vendors. Logs are exported to an external logging host via remote syslog (rsyslog).
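
Once Snort alerts reach the logging host, it helps to split them into fields before analysis. The sketch below assumes the commonly seen one-line alert layout `[gid:sid:rev] message [Classification: ...] [Priority: n] {PROTO} src -> dst`. The exact layout depends on the Snort version and output plugin, so treat the pattern, the function name `parse_snort_alert` and the sample line as assumptions to adapt:

```python
import re

# Pattern for the assumed one-line alert layout; adjust to match real output.
ALERT_RE = re.compile(
    r"\[(?P<gid>\d+):(?P<sid>\d+):(?P<rev>\d+)\]\s+"   # [generator:signature:revision]
    r"(?P<msg>.*?)\s+"                                 # rule message
    r"(?:\[Classification:\s*(?P<classification>[^\]]*)\]\s+)?"
    r"\[Priority:\s*(?P<priority>\d+)\]:?\s+"
    r"\{(?P<proto>[^}]+)\}\s+"
    r"(?P<src>\S+)\s+->\s+(?P<dst>\S+)"
)


def parse_snort_alert(line):
    """Return a dict of alert fields, or None if the line does not match."""
    match = ALERT_RE.search(line)
    if match is None:
        return None
    alert = match.groupdict()
    for key in ("gid", "sid", "rev", "priority"):
        alert[key] = int(alert[key])
    return alert


if __name__ == "__main__":
    # Hypothetical alert line, made up for this example.
    sample = ("snort[1234]: [1:1000001:1] ICMP test detected "
              "[Classification: Misc activity] [Priority: 3] "
              "{ICMP} 192.168.1.10 -> 192.168.1.1")
    print(parse_snort_alert(sample))
```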


A great starting point for centralized logging and analysis is Splunk Light. This product is free after registration, and it can be easily installed on, for instance, an Ubuntu Server. Considering the host itself is virtualized, this entire server comes at no cost. Another option available is Splunk Enterprise Free. This starts off as a 60-day trial allowing up to 500 MB of data to be indexed per day, but it will provide the option to change to a perpetual Free license after those 60 days.
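
To get a feel for the 500 MB daily indexing allowance, a rough estimate of the lab's log volume is useful. The following Python sketch is only back-of-the-envelope arithmetic. The event rates and the average event size are made-up inputs, and 1 MB is assumed to be 1024 × 1024 bytes:

```python
SECONDS_PER_DAY = 24 * 60 * 60
DAILY_LIMIT_MB = 500  # daily indexing allowance mentioned in the text


def daily_volume_mb(events_per_second, avg_event_bytes):
    """Estimate the log volume per day in MB (assuming 1 MB = 1024 * 1024 bytes)."""
    total_bytes = events_per_second * avg_event_bytes * SECONDS_PER_DAY
    return total_bytes / (1024 * 1024)


def fits_daily_limit(events_per_second, avg_event_bytes, limit_mb=DAILY_LIMIT_MB):
    """Return True if the estimated daily volume stays within the limit."""
    return daily_volume_mb(events_per_second, avg_event_bytes) <= limit_mb


if __name__ == "__main__":
    # Hypothetical lab load: 10 events per second at about 300 bytes each.
    print(round(daily_volume_mb(10, 300), 1))  # 247.2
    print(fits_daily_limit(10, 300))           # True
    print(fits_daily_limit(50, 300))           # False
```

Even a modest sustained rate adds up over a day, so it is worth checking before pointing every lab host at the indexer.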

A Splunk Universal Forwarder is needed to collect the external and internal logs. This forwarder can be installed on the same host as the Splunk Light search head (the web UI) or on a separate host. Configuration and customization of the data feeds and the listening ports of the Splunk Universal Forwarders can all be done via the Splunk management web interface. Most proxy servers, such as the pfSense setup mentioned earlier, send their logs via remote syslog (rsyslog). Once configured, the Splunk Universal Forwarder listens for this incoming syslog data, processes it according to a customized format and then forwards it into a selected Splunk index.
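
Before pointing real devices at the listener, a hand-made test message can confirm that the syslog input works end to end. This sketch uses `SysLogHandler` from Python's standard library to send one message over UDP. The host, port, logger name and message text are placeholders for whatever the lab listener is actually configured with:

```python
import logging
import logging.handlers


def build_syslog_logger(host, port, name="lab-test"):
    """Create a logger that sends messages to a remote syslog listener over UDP."""
    handler = logging.handlers.SysLogHandler(
        address=(host, port),
        facility=logging.handlers.SysLogHandler.LOG_LOCAL4,
    )
    handler.setFormatter(logging.Formatter("%(name)s: %(message)s"))
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.addHandler(handler)
    logger.propagate = False  # keep test messages out of the root logger
    return logger


if __name__ == "__main__":
    # Placeholder address: replace with the lab listener's host and port.
    logger = build_syslog_logger("127.0.0.1", 5514)
    logger.info("test event from the lab")
```

Searching the target index for the test message then shows whether the listener, the parsing and the forwarding are all in place.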

SIEM - Security Information and Event Management

A simple SIEM solution is almost essential for sifting through the large volume of logs this lab can process once you take into account, for instance, proxy, router, server and internal database events. A SIEM solution can correlate events and automatically take action after a detection, based on predefined rules or use cases. Professional SIEM solutions such as IBM QRadar, HP ArcSight and McAfee ESM can be very expensive, and the overhead of these systems can be too much for a small lab environment. Some no-cost open source options are available, however. AlienVault OSSIM is one example of a freely available open source SIEM solution. It can be downloaded and installed as a (single) virtual machine to assist with event analysis.
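
To illustrate what a correlation rule does, here is a toy example in Python: raise an alert when one source produces a given number of failed logins within a time window. It is a simplified sketch and not how OSSIM or any commercial SIEM implements its rules. The event tuple layout, the `auth_failure` label and the default thresholds are assumptions made for the example:

```python
from collections import defaultdict, deque


def correlate_failed_logins(events, threshold=5, window_seconds=60):
    """Return (timestamp, source, count) alerts for bursts of failed logins.

    events is an iterable of (timestamp_seconds, source_ip, event_type) tuples.
    """
    recent = defaultdict(deque)  # source_ip -> timestamps inside the window
    alerts = []
    for timestamp, source, event_type in sorted(events):
        if event_type != "auth_failure":
            continue
        window = recent[source]
        window.append(timestamp)
        # Drop failures that have fallen out of the time window.
        while window and timestamp - window[0] > window_seconds:
            window.popleft()
        if len(window) >= threshold:
            alerts.append((timestamp, source, len(window)))
            window.clear()  # start counting again after raising an alert
    return alerts


if __name__ == "__main__":
    # Made-up events: five quick failures from one host, one from another.
    events = [(t, "10.0.0.5", "auth_failure") for t in range(5)]
    events.append((100, "10.0.0.9", "auth_failure"))
    print(correlate_failed_logins(events))  # [(4, '10.0.0.5', 5)]
```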


Data mining and data science are hot topics at the moment. For anyone who wants to delve deeper into this subject, there is the option to set up a Hadoop test server as well. Normally Hadoop runs in a cluster configuration, but this can be reduced to a single-node pseudo-cluster. This single server then contains both the data node and the name node. To exchange data between the Splunk server and the Hadoop server, Hadoop Connect can be downloaded and installed. This requires Splunk Enterprise Free and will not work on Splunk Light, because it depends on app support within Splunk. Installation manuals and videos are available on the Splunk site.


As mentioned at the beginning of this article, security is in the end all about data. Once this test lab has been set up, data needs to be generated. This requires some activity within the virtual network: running the home LAN through the proxy server, setting up a honeypot on the perimeter or running simulated attacks within the network. Because of the use of virtual components, the test lab will be flexible enough to adapt to the required scenarios.

Frank Siemons

Frank Siemons is an Australian security researcher at InfoSec Institute. His track record consists of many years of systems and security administration, both in Europe and in Australia.

He currently holds many certifications, such as CISSP, and has a Master's degree in InfoSys Security from Charles Sturt University. He has a true passion for anything related to pentesting and vulnerability assessment. His Twitter handle is @franksiemons.