Unsupervised feature selection for anomaly-based network intrusion detection using cluster validity indices.
Abstract
In recent years, there has been a rapid increase in Internet usage, which has in turn led to a
rise in malicious network activity. Network Intrusion Detection Systems (NIDS) are tools
that monitor network traffic with the purpose of rapidly and accurately detecting malicious
activity. These systems provide a time window for responding to emerging threats and
attacks aimed at exploiting vulnerabilities that arise from issues such as misconfigured
firewalls and outdated software.
Anomaly-based network intrusion detection systems construct a profile of legitimate or
normal traffic patterns using machine learning techniques, and monitor network traffic for
deviations from the profile, which are subsequently classified as threats or intrusions. Due
to the richness of information contained in network traffic, it is possible to define large
feature vectors from network packets. This often leads to redundant or irrelevant features
being used in network intrusion detection systems, which typically reduces the detection
performance of the system.
The purpose of feature selection is to remove irrelevant or redundant features from a feature
space, thereby improving the performance of learning algorithms and, as a result, the
classification accuracy. Previous approaches have performed feature selection via optimization
techniques, using the classification accuracy of the NIDS on a subset of the data
as an objective function. While this approach has been shown to improve the performance
of the system, it is unrealistic to assume that labelled training data is available in operational
networks, which precludes the use of classification accuracy as an objective function
in a practical system.
This research proposes a method for feature selection in network intrusion detection that
does not require any access to labelled data. The algorithm uses normalized cluster validity
indices as an objective function that is optimized over the search space of candidate
feature subsets via a genetic algorithm. Feature subsets produced by the algorithm are
shown to improve the classification performance of an anomaly-based network intrusion
detection system over the NSL-KDD dataset. Despite not requiring access to labelled
data, the classification performance of the proposed system approaches that of effective
feature subsets that were derived using labelled training data.
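
The general idea of searching feature subsets with a label-free objective can be illustrated with a minimal sketch. It is not the implementation evaluated in this research: the abstract does not name the cluster validity indices or the genetic algorithm settings, so the sketch assumes the silhouette coefficient (already bounded in [-1, 1]) as a stand-in index, plain k-means with k = 2 as the clustering step, and arbitrary population size, generation count and mutation rate. All function names, the synthetic data and the choice to seed the population with the full feature set are hypothetical.

```python
import math
import random


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns one cluster label per point."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels


def silhouette(points, labels):
    """Mean silhouette coefficient in [-1, 1]; higher means better separation."""
    clusters = sorted(set(labels))
    if len(clusters) < 2:
        return -1.0
    total = 0.0
    for i, p in enumerate(points):
        mean_dist = {}
        for c in clusters:
            d = [math.dist(p, q) for j, q in enumerate(points) if labels[j] == c and j != i]
            mean_dist[c] = sum(d) / len(d) if d else None
        a = mean_dist[labels[i]]
        if a is None:  # singleton cluster contributes 0 by convention
            continue
        b = min(v for c, v in mean_dist.items() if c != labels[i])
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)


def fitness(data, mask, k=2):
    """Label-free objective: validity index of a clustering on the selected features."""
    cols = [i for i, bit in enumerate(mask) if bit]
    if not cols:
        return -1.0  # an empty subset is the worst possible candidate
    projected = [tuple(row[i] for i in cols) for row in data]
    return silhouette(projected, kmeans(projected, k))


def select_features(data, pop_size=12, generations=15, mutation_rate=0.1, seed=1):
    """Genetic search over bit masks; a 1 at position i keeps feature i."""
    rng = random.Random(seed)
    n = len(data[0])
    # Assumption: include the full feature set so the result is never worse than it.
    population = [[1] * n]
    population += [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size - 1)]
    for _ in range(generations):
        ranked = sorted(population, key=lambda m: fitness(data, m), reverse=True)
        next_gen = [list(m) for m in ranked[:2]]  # elitism: keep the two best masks
        while len(next_gen) < pop_size:
            # Tournament of three: the lowest index in the ranked list wins.
            a = ranked[min(rng.sample(range(pop_size), 3))]
            b = ranked[min(rng.sample(range(pop_size), 3))]
            child = [rng.choice(pair) for pair in zip(a, b)]  # uniform crossover
            child = [bit ^ (rng.random() < mutation_rate) for bit in child]  # bit flips
            next_gen.append(child)
        population = next_gen
    return max(population, key=lambda m: fitness(data, m))


def make_demo_data(n=40, seed=2):
    """Synthetic rows: two cluster-bearing features followed by three noise features."""
    rng = random.Random(seed)
    rows = []
    for i in range(n):
        centre = 0.0 if i < n // 2 else 5.0
        informative = [rng.gauss(centre, 0.5), rng.gauss(centre, 0.5)]
        noise = [rng.uniform(0.0, 5.0) for _ in range(3)]
        rows.append(tuple(informative + noise))
    return rows


if __name__ == "__main__":
    data = make_demo_data()
    best = select_features(data)
    print("selected mask:", best, "index:", round(fitness(data, best), 3))
```

Because the two best masks are carried over unchanged in every generation, the returned subset scores at least as well on the index as the full feature set. Whether a higher index translates into better detection performance is an empirical question that a sketch like this cannot answer.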