
AI for intrusion detection: Conquering the unknown unknowns

By Emmanuel Tsukerman, April 3, 2020


“There are known knowns. These are things we know that we know. There are known unknowns. That is to say, there are things that we know we don't know. But there are also unknown unknowns. There are things we don't know we don't know.”

- Former U.S. Secretary of Defense Donald Rumsfeld



Many cybersecurity newcomers view intrusion detection as primarily preventing distant, covert intruders and hackers from entering your network and wreaking havoc. This “firewall” use case is important, but intrusion detection is about so much more.

In practice, the list of use cases is a mile long. Just to name a few, it includes:

  • spam filtering (blocking unwanted email)
  • financial fraud (blocking unwanted transactions)
  • insider threat detection (blocking malicious actions by employees)
  • phishing detection (blocking phishing attempts of all forms)

All these use cases, including the firewall use case, have one thing in common: no security system defending against these types of attacks is complete without an ability to catch the unknown unknowns. 


Understanding unknown unknowns


Without such an ability, security systems will never be able to catch the latest schemes that malicious actors work so hard to invent. To illustrate, consider three quick examples of techniques hackers use to evade vanilla firewalls:

  1. Packet fragmentation – allows hackers to evade pattern matching by fragmenting their packets
  2. Spoofing source IP address – allows hackers to appear like legitimate users to the firewall
  3. Spoofing source port – allows hackers to bypass port-specific rules in the firewall
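To make the first trick concrete, here is a toy sketch, not a real firewall. It assumes a hypothetical filter that matches one byte signature (`SIGNATURE`, chosen only for illustration) against each packet's payload independently. Real firewalls and IDSs often reassemble fragments before inspecting them, and practical evasion then targets subtler gaps such as overlapping fragments, so treat this purely as an illustration of why per-packet pattern matching is brittle.

```python
# Hypothetical byte signature that a naive rule-based filter looks for.
SIGNATURE = b"/etc/passwd"


def fragment(payload: bytes, size: int) -> list[bytes]:
    """Split a payload into consecutive chunks of at most `size` bytes."""
    return [payload[i:i + size] for i in range(0, len(payload), size)]


def per_packet_match(packets: list[bytes]) -> bool:
    """Naive inspection: check each packet on its own, with no reassembly."""
    return any(SIGNATURE in p for p in packets)


def reassembled_match(packets: list[bytes]) -> bool:
    """Reassemble the stream first, then search the whole payload."""
    return SIGNATURE in b"".join(packets)


payload = b"GET /../../etc/passwd HTTP/1.0"
fragments = fragment(payload, 8)  # no 8-byte chunk can hold the 11-byte signature

print(per_packet_match([payload]))   # True: the unfragmented request is caught
print(per_packet_match(fragments))   # False: the same request slips through
print(reassembled_match(fragments))  # True: reassembly restores the match
```

The filter did nothing wrong by its own rules; the attacker simply changed the shape of the traffic. That is the general weakness of any system that only recognizes patterns it was told about in advance.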

New methods for avoiding firewalls are constantly being invented, leaving any rule-based security system perpetually lagging behind. This issue, however, is not limited to firewalls.

Similar concerns exist in all other types of intrusion detection. For instance, in insider threat detection, it’s impossible to anticipate all the ways in which a disgruntled employee might cause trouble. Will they exfiltrate data? Destroy it? Falsify it? Introduce malware? Share credentials? 

In financial fraud, it’s difficult to guess what criminals will do. Will they impersonate customers? Or pose as the bank? Overcharge? Forge checks? Or will they introduce a new scheme entirely?

New attacks are constantly being created (new instances of unknown unknowns), and a reliable method is needed to defend against them. Rule-based systems simply cannot keep up.


Using machine learning for intrusion detection


Fortunately, there is one proven method, and arguably the only one, for stopping unknown unknowns: an “anomaly detection” system.

By definition, anomaly detection is a form of machine learning architected to solve problems in which many types of undesired behavior need to be detected, but there’s only a small amount of data for each of them. In its most extreme form, the algorithm is designed to recognize, at run time, undesired behaviors that have never been seen before!
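The core idea can be sketched in a few lines: learn what normal looks like from benign data only, then flag anything that sits far from it, with no examples of attacks required. This is a deliberately minimal per-feature z-score model, not the method used in any particular product. The three features (kilobytes sent, connections per minute, distinct destination ports), the training values, and the threshold of 4 are all hypothetical, and the model assumes roughly Gaussian, independent features; production systems typically use richer models such as isolation forests or autoencoders.

```python
from statistics import mean, pstdev


class GaussianAnomalyDetector:
    """Fit per-feature mean/std on benign data only; flag points far from it."""

    def __init__(self, threshold: float = 4.0):
        self.threshold = threshold
        self.means: list[float] = []
        self.stds: list[float] = []

    def fit(self, benign: list[list[float]]) -> "GaussianAnomalyDetector":
        columns = list(zip(*benign))  # one tuple of values per feature
        self.means = [mean(c) for c in columns]
        # Guard against zero variance so a constant feature cannot divide by zero.
        self.stds = [pstdev(c) or 1e-9 for c in columns]
        return self

    def score(self, x: list[float]) -> float:
        """Largest absolute z-score across all features."""
        return max(abs(v - m) / s for v, m, s in zip(x, self.means, self.stds))

    def is_anomaly(self, x: list[float]) -> bool:
        return self.score(x) > self.threshold


# Hypothetical benign traffic: [kilobytes sent, connections/min, distinct ports]
benign = [
    [12.0, 3, 2],
    [15.0, 4, 2],
    [11.0, 2, 1],
    [14.0, 5, 3],
    [13.0, 3, 2],
    [16.0, 4, 2],
]
det = GaussianAnomalyDetector().fit(benign)

print(det.is_anomaly([14.0, 4, 2]))     # False: looks like the training data
print(det.is_anomaly([13.0, 60, 500]))  # True: scan-like burst never seen before
print(det.is_anomaly([5000.0, 3, 2]))   # True: exfiltration-like volume
```

Neither flagged example appears in the training set, and the detector was never told what a scan or an exfiltration looks like; it only knows that these points are unlike normal traffic.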

Network intrusions are a perfect example to illustrate anomaly detection. The ways in which an intrusion can take place are endless. An attack can be:

  • IP sweep
  • password guess
  • port sweep
  • teardrop denial of service (DoS)
  • ping of death DoS
  • SATAN probe
  • phf remote-to-local attack
  • buffer overflow user-to-root attack
  • and on and on
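Instead of writing one signature per attack in this list, an anomaly detector can learn a baseline for a behavioral feature and flag whatever deviates from it. The sketch below assumes a hypothetical log of `(source IP, destination port)` connection records from one time window, counts distinct destination ports per source, and scores each source by its distance from the median in units of the median absolute deviation (MAD). The addresses, data, and threshold of 5 are illustrative only.

```python
from collections import defaultdict
from statistics import median


def distinct_ports_per_source(connections):
    """Count distinct destination ports contacted by each source IP."""
    ports = defaultdict(set)
    for src, dst_port in connections:
        ports[src].add(dst_port)
    return {src: len(p) for src, p in ports.items()}


def robust_outliers(counts, threshold=5.0):
    """Flag sources whose count is far from the median, measured in MADs."""
    values = list(counts.values())
    med = median(values)
    # Fall back to 1.0 if the MAD is zero (most sources identical).
    mad = median([abs(v - med) for v in values]) or 1.0
    return sorted(src for src, v in counts.items() if abs(v - med) / mad > threshold)


# Hypothetical connection log for one time window.
connections = [
    ("10.0.0.1", 443), ("10.0.0.1", 80),
    ("10.0.0.2", 443), ("10.0.0.2", 53),
    ("10.0.0.3", 443),
    ("10.0.0.4", 80), ("10.0.0.4", 443), ("10.0.0.4", 53),
    ("10.0.0.5", 443), ("10.0.0.5", 22),
]
connections += [("10.0.0.66", port) for port in range(1, 1025)]  # sweep-like host

counts = distinct_ports_per_source(connections)
print(robust_outliers(counts))  # ['10.0.0.66']
```

No portsweep rule was written, yet the sweeping host stands out. The median and MAD are used instead of the mean and standard deviation so that the attacker's own extreme count does not inflate the baseline. A single feature only covers part of the list: an IP sweep, for example, would show up in distinct destination hosts rather than ports, which is why real detectors combine many features.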

It will never be possible to have sufficient data for the growing number of attacks out there. And there are, of course, attacks that haven’t yet been seen — those unknown unknowns.


Importance of anomaly detection


It’s no wonder that AI for intrusion detection keeps making the news, whether it’s to discover phishing attacks, protect against payment fraud, stop data leaks and insider attacks, or improve spam detection.

This is why an anomaly detection component, which can detect previously unknown behavior, is an essential, forward-looking part of any intrusion detection solution.
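One way to picture anomaly detection as a *component* rather than a replacement is a layered check: known signatures handle the known knowns cheaply, and an anomaly score catches what no rule anticipated. The sketch below is a hypothetical illustration of that layering, not the architecture of any specific product. The signature table, the `Verdict` labels, and the single feature (payload length) are all assumptions; payload length alone is far too crude for real traffic.

```python
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass
class Verdict:
    label: str   # "known_attack", "anomaly", or "benign"
    reason: str


# Hypothetical signature table for attacks we already know about.
KNOWN_SIGNATURES = {
    b"/etc/passwd": "path traversal",
    b"/cgi-bin/phf": "phf exploit",
}


def build_baseline(benign_sizes):
    """Learn mean and std of payload length from benign traffic only."""
    return mean(benign_sizes), pstdev(benign_sizes) or 1e-9


def classify(payload: bytes, baseline, z_threshold: float = 4.0) -> Verdict:
    # Layer 1: exact signatures catch known attacks.
    for sig, name in KNOWN_SIGNATURES.items():
        if sig in payload:
            return Verdict("known_attack", name)
    # Layer 2: anomaly score catches what no signature anticipated.
    mu, sigma = baseline
    z = abs(len(payload) - mu) / sigma
    if z > z_threshold:
        return Verdict("anomaly", f"payload length z-score {z:.1f}")
    return Verdict("benign", "no signature match, within baseline")


baseline = build_baseline([20, 22, 24, 26, 28])
print(classify(b"GET /index.html HTTP/1.0", baseline).label)        # benign
print(classify(b"GET /../../etc/passwd HTTP/1.0", baseline).label)  # known_attack
print(classify(b"A" * 5000, baseline).label)                        # anomaly
```

The ordering is a design choice: signature hits come with a precise explanation, so they are checked first, while the anomaly layer trades that precision for the ability to flag behavior nobody has written a rule for yet.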

Working with one of the top next-generation firewall providers in the world, I’ve designed and architected machine learning components of network intrusion detection systems at a scale and importance rivaled by few. Drawing on this experience, I’ve collaborated with Infosec to create a series of courses for security-conscious professionals.

The courses in the Cybersecurity Data Science Learning Path provide guidance on how to implement a beginner-friendly intrusion detection system via anomaly detection. You’ll also learn how to use machine learning to defend against malware and perform penetration testing. Check out the learning path and learn the techniques that are revolutionizing cybersecurity!

Emmanuel Tsukerman

Dr. Tsukerman graduated from Stanford University and UC Berkeley. In 2017, his machine-learning-based anti-ransomware product was named one of PC Magazine's Top 10 Ransomware Products. In 2018, he designed a machine-learning-based malware detection system for Palo Alto Networks' WildFire service (over 30,000 customers).

In 2019, Dr. Tsukerman authored the Machine Learning for Cybersecurity Cookbook and launched the Infosec Skills Cybersecurity Data Science Learning Path.