
============================================================================
,,_ ____ _ _ ___
o" )~ / ___| _ __ ___ _ __| |_ / \ |_ _|
'''' \___ \| '_ \ / _ \| '__| __| / _ \ | |
___) | | | | (_) | | | |_ / ___ \ | |
|____/|_| |_|\___/|_| \__| /_/ \_\___|
_ __ _ __ ___ _ __ _ __ ___ ___ ___ ___ ___ ___ _ __
| '_ \| '__/ _ \ '_ \| '__/ _ \ / __/ _ \/ __/ __|/ _ \| '__|
| |_) | | | __/ |_) | | | (_) | (_| __/\__ \__ \ (_) | |
| .__/|_| \___| .__/|_| \___/ \___\___||___/___/\___/|_|
|_| |_|
~ A REALLY smart preprocessor module for Snort ~
by BlackLight <blacklight@autistici.org>, http://0x00.ath.cx
============================================================================
This document describes the AI preprocessor module for Snort.
It also describes how to get it, install it, configure it and use it correctly.
Table of contents:
1. What's Snort AI preprocessor
2. How to get Snort AI preprocessor
3. Installation
3.1 Dependencies
3.2 Configure options
4. Basic configuration
5. Correlation rules
6. Output database
7. Web interface
8. Additional correlation modules
9. Additional documentation
===============================
1. What's Snort AI preprocessor
===============================
Snort AI preprocessor is a preprocessor module for Snort whose purpose is to make
Snort's alerts easier to read. It does so by:
- clustering false positive alarms, emphasizing their root cause, in order to
reduce log pollution;
- clustering similar alerts by type and by hierarchies over IP addresses and
ports that the user can define according to the kind of traffic and the
topology of the network;
- reconstructing the flows of multi-step attacks from correlation rules between
hyperalerts, which may be provided by the developer, by third parties, or
written by the user for the specific scenario of the network.
In the near future it will also be possible to correlate hyperalerts
automatically, by self-learning from the acquired alerts.
===================================
2. How to get Snort AI preprocessor
===================================
It is strongly suggested to get the latest release of Snort AI preprocessor
from GitHub -> http://github.com/BlackLight/Snort_AIPreproc
$ git clone git://github.com/BlackLight/Snort_AIPreproc.git
If git is not available on the machine or cannot be used, you can also choose
"download source" from the same page and get the source code in tar.gz format.
===============
3. Installation
===============
The installation procedure is the usual one:
$ ./configure
$ make
$ make install
If Snort is not installed under /usr, you may need to pass the --prefix option
to configure to select the directory where Snort is installed (for example
./configure --prefix=$HOME/local/snort). If the prefix points to the actual
Snort installation directory, after the installation the module binaries will be
placed in $SNORT_DIR/lib/snort_dynamicpreprocessor and automatically loaded by
Snort at its next start. Moreover, a new directory named corr_rules will be
created, in /etc/snort if the prefix was /usr or in $SNORT_DIR/etc otherwise,
containing XML files that describe the default correlation rules provided by the
developer. This set can be extended at any time with new XML files describing
more hyperalerts, provided by third parties or written by the user.
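The placement rule above can be summarized in a small helper. This is a hypothetical Python sketch (install_paths is not part of the module or its build system) that only mirrors the rule as described in this section; treating /usr as the default prefix is an assumption.

```python
import posixpath


def install_paths(prefix="/usr"):
    """Expected locations of the module's files for a given --prefix.

    Hypothetical helper: it only restates the rule described in this README.
    """
    prefix = prefix.rstrip("/") or "/"
    if prefix == "/usr":
        corr_rules_dir = "/etc/snort/corr_rules"
    else:
        corr_rules_dir = posixpath.join(prefix, "etc", "corr_rules")
    return {
        # Shared objects that Snort loads automatically at its next start
        "preprocessor_dir": posixpath.join(prefix, "lib", "snort_dynamicpreprocessor"),
        # Default correlation rules (XML hyperalert models)
        "corr_rules_dir": corr_rules_dir,
    }
```

For example, install_paths("/home/user/local/snort") points the correlation rules at /home/user/local/snort/etc/corr_rules, while the default /usr prefix puts them in /etc/snort/corr_rules.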
================
3.1 Dependencies
================
Dependencies required for a correct compilation and configuration:
- pthread (REQUIRED), used for running multiple threads inside the module. On
a Debian-based system, install libpthread-dev if you don't already have it.
- libxml2 (REQUIRED), used for parsing the XML files in the corr_rules
directory. On a Debian-based system, install libxml2-dev if you don't already
have it.
- libgraphviz (RECOMMENDED), used for generating PNG (and in future PS too)
files representing hyperalert correlation graphs from the .dot files
generated by the software. You can remove this dependency from the
compilation process by passing --without-graphviz to ./configure, but in
this case the correlation graphs will only be available as .dot files, which
are not easily readable by a human, and you may need external graph rendering
software to convert them into a more readable format. On a Debian-based
system, install libgraphviz-dev if you don't already have it.
- libmysqlclient (OPTIONAL), used for reading alert information saved on a
MySQL DBMS and, in general, for enabling MySQL support in the module. This
option is disabled by default (unless specified otherwise, the module reads the
alerts from Snort's plain log files), and can be enabled by passing
--with-mysql to ./configure. On a Debian-based system you may need to install
libmysqlclient-dev.
- libpq (OPTIONAL), used for reading alert information saved on a PostgreSQL
DBMS and, in general, for enabling PostgreSQL support in the module. This option
is disabled by default, and can be enabled by passing --with-postgresql to
./configure. On a Debian-based system you may need to install libpq-dev.
- A DBMS (RECOMMENDED), for writing clusters, correlations and packet stream
information to a database, which makes the analysis easier. MySQL and
PostgreSQL are supported for now.
- Perl (RECOMMENDED), used by the CGI script of the web interface that saves
the packet stream associated to an alert in .pcap format, to be analyzed with
tools like tcpdump and Wireshark.
- XML::Simple Perl module (RECOMMENDED), used by the 'correlate.cgi' CGI script
for reading and writing the XML files of manual (un)correlations. A quick way
to install it on a Unix system is through CPAN:
# cpan XML::Simple
- Python 2.6 (OPTIONAL), used for interfacing the SnortAI module with Python
scripts through the snortai module (see the README file in pymodule/) and for
writing new correlation modules (see example_module.py in corr_modules/).
Compile the module passing the --with-python option to the ./configure script if
you want this feature. You need the Python interpreter and libpython2.6
installed on your system.
=====================
3.2 Configure options
=====================
You can pass the following options to the ./configure script before compiling:
--with-mysql - Enables MySQL DBMS support in the module (requires
libmysqlclient)
--with-pq - Enables PostgreSQL DBMS support in the module (requires libpq)
--with-python - Enables Python support in the module (requires the Python 2.6
interpreter and libpython2.6)
--without-graphviz - Disables Graphviz support in the module, so no PNG or PS
files representing hyperalert correlation graphs will be generated
======================
4. Basic configuration
======================
After installing the module in the Snort installation directory, you need to
configure it in snort.conf. A sample configuration looks like the following:
preprocessor ai: \
alertfile "/your/snort/dir/log/alert" \
alert_bufsize 30 \
alert_clustering_interval 300 \
alert_correlation_weight 5000 \
alert_history_file "/your/snort/dir/log/alert_history" \
alert_serialization_interval 3600 \
bayesian_correlation_interval 1200 \
bayesian_correlation_cache_validity 600 \
cluster ( class="dst_port", name="privileged_ports", range="1-1023" ) \
cluster ( class="dst_port", name="unprivileged_ports", range="1024-65535" ) \
cluster ( class="src_addr", name="local_net", range="192.168.1.0/24" ) \
cluster ( class="src_addr", name="dmz_net", range="155.185.0.0/16" ) \
cluster ( class="src_addr", name="vpn_net", range="10.8.0.0/24" ) \
cluster ( class="dst_addr", name="local_net", range="192.168.1.0/24" ) \
cluster ( class="dst_addr", name="dmz_net", range="155.185.0.0/16" ) \
cluster ( class="dst_addr", name="vpn_net", range="10.8.0.0/24" ) \
cluster_max_alert_interval 14400 \
clusterfile "/your/snort/dir/log/clustered_alerts" \
corr_modules_dir "/your/snort/dir/share/snort_ai_preproc/corr_modules" \
correlation_graph_interval 300 \
correlation_rules_dir "/your/snort/dir/etc/corr_rules" \
correlated_alerts_dir "/your/snort/dir/log/correlated_alerts" \
correlation_threshold_coefficient 0.5 \
database ( type="dbtype", name="snort", user="snortusr", password="snortpass", host="dbhost" ) \
database_parsing_interval 30 \
hashtable_cleanup_interval 300 \
manual_correlations_parsing_interval 120 \
max_hash_pkt_number 1000 \
neural_clustering_interval 1200 \
neural_network_training_interval 43200 \
neural_train_steps 10 \
output_database ( type="dbtype", name="snort", user="snortusr", password="snortpass", host="dbhost" ) \
output_neurons_per_side 20 \
tcp_stream_expire_interval 300 \
use_knowledge_base_correlation_index 1 \
use_stream_hash_table 1 \
webserv_banner "Snort AIPreprocessor module" \
webserv_dir "/your/snort/dir/share/snort_ai_preproc/htdocs" \
webserv_port 7654
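The cluster lines in the sample above define the clustering hierarchies. As a rough illustration of how they could be read and matched, here is a hypothetical Python sketch: parse_clusters, in_range and cluster_name are illustrative names, the regular expression assumes the class/name/range attribute order used in the sample, and returning the first matching cluster is a simplification of the module's actual hierarchical clustering.

```python
import ipaddress
import re

# One "cluster ( class=..., name=..., range=... )" directive, with the
# attributes in the same order as in the sample snort.conf (an assumption)
CLUSTER_RE = re.compile(
    r'cluster\s*\(\s*class="(?P<cls>\w+)",\s*name="(?P<name>[^"]+)",'
    r'\s*range="(?P<range>[^"]+)"\s*\)'
)


def parse_clusters(conf_text):
    """Return {class: [(name, range), ...]} in declaration order."""
    hierarchy = {}
    for m in CLUSTER_RE.finditer(conf_text):
        hierarchy.setdefault(m.group("cls"), []).append((m.group("name"), m.group("range")))
    return hierarchy


def in_range(value, rng):
    """True if a port or IP address falls in a single port/IP, a subnet or a port range."""
    if "/" in rng or "." in rng or ":" in rng:
        # Subnet such as 192.168.1.0/24, or a single IP address
        return ipaddress.ip_address(value) in ipaddress.ip_network(rng, strict=False)
    if "-" in rng:
        # Port range such as 1024-65535
        low, high = (int(part) for part in rng.split("-"))
        return low <= int(value) <= high
    # Single port
    return int(value) == int(rng)


def cluster_name(hierarchy, cls, value):
    """Name of the first cluster of class `cls` containing `value`, or None."""
    for name, rng in hierarchy.get(cls, []):
        if in_range(value, rng):
            return name
    return None
```

With the dst_port lines of the sample, port 22 falls in "privileged_ports" and port 8080 in "unprivileged_ports".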
The options are the following:
- alertfile: The file where Snort saves its alerts, if they are saved to a file
and not to a database (default if not specified: /var/log/snort/alert)
- alert_correlation_weight: When this number of alerts is stored in the
"memory" of the software (i.e. in the alert history file or in the output
database), the weight of the heuristic correlation indexes (Bayesian network and
neural network) reaches approximately 0.95, on a scale from 0 to 1.
This parameter expresses how much the heuristic indexes should be weighted, and
it can be seen as a kind of "learning rate" for the alert correlation algorithm
(default value if not specified: 5000)
- alert_history_file: Binary file keeping track of the history of all the
alerts received by the IDS, so that the module can draw statistical correlation
inferences over the past
- alert_serialization_interval: Interval, in seconds, between two consecutive
serializations of the alert buffer to the history file (default if not
specified: 3600 seconds, i.e. 1 hour, as it is quite an expensive operation in
terms of resources if the system receives many alerts)
- alert_bufsize: Size of the buffer holding the alerts to be sent in batches to
the serializer thread. When the buffer is full, it is sent and emptied even if
the alert_serialization_interval has not expired yet, in order to avoid
overflows, other memory problems or deadlocks (default value if not specified:
30)
- alert_clustering_interval: Interval, in seconds, between two consecutive
clusterings of the alerts in the log according to the provided clustering
hierarchies (default if not specified: 300 seconds)
- bayesian_correlation_interval: Interval, in seconds, within which two alerts
in the history must occur to be considered (more or less strongly) correlated
(default: 1200 seconds). NOTE: A value of 0 disables the Bayesian correlation.
Setting it to 0 is strongly suggested while your alert log is still "learning",
i.e. when you don't have enough alerts yet. After this period, you can set the
correlation interval to any value.
- bayesian_correlation_cache_validity: Interval, in seconds, for which an entry
in the Bayesian correlation hash table (i.e. a pair of alerts with the
associated historical Bayesian correlation) is considered valid before being
updated (default: 600 seconds)
- corr_modules_dir: This software supports a kind of plugins, or "modules over
the module", that allow the user to specify extra correlation rules and
indexes. These modules are .so files placed in this directory (default if not
specified: PREFIX/share/snort_ai_preproc/corr_modules) and dynamically loaded
by the module. For more information on how to write your own module, see
section 8 ("Additional correlation modules") in this file.
- correlation_graph_interval: Interval, in seconds, between two consecutive
builds of the correlation graph of the clustered alerts (default if not
specified: 300 seconds)
- correlation_rules_dir: Directory where the correlation rules are saved, as XML
files (default if not specified: /etc/snort/corr_rules)
- correlated_alerts_dir: Directory where the information about correlated
alerts will be saved, as .dot files ready to be rendered as graphs and, if
libgraphviz support is enabled, as .png and .ps files as well (default if not
specified: /var/log/snort/clustered_alerts)
- correlation_threshold_coefficient: The threshold the software uses for
deciding that two alerts are correlated is avg(correlation_coefficient) + k *
std_deviation(correlation_coefficient). This option sets the value of k, which
is 0.5 by default and should be chosen according to how "sensitive" you want the
correlation algorithm to be. A value of k=0 means "consider as correlated all
the pairs of alerts whose correlation coefficient is greater than the average
one" (negative values of k are allowed as well but, unless you know what you're
doing, they are not recommended, as correlation constraints may appear where no
correlation exists). As k increases, the threshold for two alerts to be
considered correlated increases too, and a high value of k may lead to an empty
correlation graph
- clusterfile: File where the clustered alerts will be saved by the module
(default if not specified: /var/log/snort/clustered_alerts)
- cluster_max_alert_interval: Maximum time interval, in seconds, between two
alerts for considering them as part of the same cluster (default: 14400
seconds, i.e. 4 hours). Set this option to 0 if you want to cluster alerts
regardless of how much time passed between them
- cluster: Clustering hierarchy or list of hierarchies to be applied for
grouping similar alerts. This option needs to specify:
-- class: Class of the cluster node. It may be src_addr, dst_addr, src_port
or dst_port
-- name: Name for the clustering node
-- range: Range of the clustering node. It can include a single port or IP
address, an IP range (specified as subnet x.x.x.x/x), or a port
range (specified as xxx-xxx)
- database: If Snort saves its alerts to a database and the module was compiled
with database support (e.g. --with-mysql), this option specifies the
information for accessing that database. Its fields are:
-- type: DBMS to be used (so far MySQL and PostgreSQL are supported)
-- name: Database name
-- user: Username for accessing the database
-- password: Password for accessing the database
-- host: Host holding the database
- database_parsing_interval: Interval, in seconds, between two consecutive reads
of the alerts from the database (default if not specified: 30 seconds)
- hashtable_cleanup_interval: Interval, in seconds, between two consecutive
cleanups of the hash table of TCP streams (default if not specified: 300
seconds). Set this option to 0 for performing no cleanup on the stream hash
table
- max_hash_pkt_number: Maximum number of packets that each element of the stream
hash table should hold, set it to 0 for no limit (default value if not
specified: 1000)
- manual_correlations_parsing_interval: Interval, in seconds, between two
consecutive executions of the thread that parses the manually set alert
correlations (default value if not specified: 120 seconds)
- neural_clustering_interval: Interval, in seconds, between two consecutive
executions of the thread that clusters (using k-means) the alerts on the output
layer of the neural network in order to recognize likely attack scenarios. Set
this to 0 if you want no clustering (default if not specified: 1200 seconds)
- neural_network_training_interval: Interval, in seconds, between two
consecutive executions of the thread that trains the neural network using the
set of recent alerts (default if not specified: 43200 seconds, i.e. 12 hours)
- neural_train_steps: Number of steps to take in each training cycle for the
neural network (default: 10)
- output_database: Specify this option if you want to save the outputs of the
module (correlated alerts, clustered alerts, alert information and the
associated packet streams, and so on) to a relational database as well (by
default the module only saves them to plain files). The fields here are the
same as for the 'database' option.
The structure of this database can be seen in the files schemas/*.sql (replace
* with the name of your DBMS). To initialize the tables needed by the module,
just import the right file into your database, e.g. for MySQL:
$ mysql -uusername -ppassword dbname < schemas/mysql.sql
- output_neurons_per_side: Number of output neurons per side on the output
layer of the neural network (which is a rectangular matrix). A higher number
allows a higher granularity over similar alerts, but since the number of output
neurons grows with the square of this value, so does the computational
complexity of the training and evaluation algorithms (default value if not
specified: 20)
- tcp_stream_expire_interval: Interval, in seconds, after which a TCP stream
that receives no more packets is marked as "expired", unless it is marked as
suspicious (default if not specified: 300 seconds)
- use_knowledge_base_correlation_index: Set this option to 0 if you do not want
to use the knowledge base alert correlation index (default value if not
specified: 1)
- use_stream_hash_table: Set this option to 0 if you do not want to use the
hash table for storing the packet streams associated to alerts; disabling it is
a good choice on a system where many alerts are triggered (default value if not
specified: 1)
- webserv_banner: Banner of the web server, to be placed on the error pages and
in the "Server" HTTP reply header
- webserv_dir: Directory containing the contents of the web server running
over the module (default if none is specified:
$PREFIX/share/snort_ai_preproc/htdocs)
- webserv_port: Port where the web server will listen (default if none is
specified: 7654). Set this value to 0 if you don't want to run the web server
and the web interface over the module (in this case, if you want to see the web
graphical visualization of the alerts, you should manually copy the files
contained in htdocs/ into a web server directory)
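The threshold described for correlation_threshold_coefficient can be sketched in a few lines of Python. This is an illustration under assumptions, not the module's code: the pair coefficients are given as a plain dict, and the population standard deviation (statistics.pstdev) is used, since this document does not say which variant the module computes.

```python
from statistics import mean, pstdev


def correlation_threshold(coefficients, k=0.5):
    """avg(coefficients) + k * std_deviation(coefficients)."""
    values = list(coefficients)
    return mean(values) + k * pstdev(values)


def correlated_pairs(pair_coefficients, k=0.5):
    """Alert pairs whose coefficient is strictly above the threshold."""
    threshold = correlation_threshold(pair_coefficients.values(), k)
    return sorted(pair for pair, c in pair_coefficients.items() if c > threshold)


coeffs = {("a", "b"): 0.9, ("a", "c"): 0.1, ("b", "c"): 0.5}
```

Here the mean is 0.5 and the standard deviation about 0.33, so with the default k=0.5 the threshold is about 0.66 and correlated_pairs(coeffs) returns [("a", "b")]; with k=2.0 the threshold exceeds every coefficient and the result is empty, which is the "empty correlation graph" case mentioned above.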
====================
5. Correlation rules
====================
The hyperalert correlation rules are specified in the $SNORT_DIR/etc/corr_rules
directory through a very simple XML syntax, and any user can add new ones.
The files in there must be named after the Snort alert ID they model, with the
dots replaced by dashes. For example, if we want to model a TCP portscan alert
(Snort ID: 122.1.0) as a hyperalert, i.e. as an alert with pre-conditions and
post-conditions to be correlated to other alerts, then we need to create a file
named 122-1-0.xml inside $SNORT_DIR/etc/corr_rules with a content like the
following:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE hyperalert PUBLIC "-//blacklighth//DTD HYPERALERT SNORT MODEL//EN" "http://0x00.ath.cx/hyperalert.dtd">
<hyperalert>
<snort-id>122.1.0</snort-id>
<desc>(portscan) TCP Portscan</desc>
<pre>HostExists(+DST_ADDR+)</pre>
<post>HasService(+DST_ADDR+, +ANY_PORT+)</post>
</hyperalert>
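Hyperalert files like the one above can also be generated from a script, which may be handy when modeling many Snort IDs at once. A minimal Python sketch, where write_hyperalert and hyperalert_filename are hypothetical helpers (not part of the module) that apply the naming rule described above (dots in the Snort ID become dashes in the file name) and reproduce the DOCTYPE of the example:

```python
import os
import xml.etree.ElementTree as ET

# DOCTYPE copied from the example hyperalert file
DOCTYPE = ('<!DOCTYPE hyperalert PUBLIC "-//blacklighth//DTD HYPERALERT SNORT MODEL//EN" '
           '"http://0x00.ath.cx/hyperalert.dtd">')


def hyperalert_filename(snort_id):
    """Snort ID 122.1.0 -> file name 122-1-0.xml"""
    return snort_id.replace(".", "-") + ".xml"


def write_hyperalert(directory, snort_id, desc=None, pre=None, post=None):
    """Write a hyperalert model file into `directory` and return its path."""
    root = ET.Element("hyperalert")
    ET.SubElement(root, "snort-id").text = snort_id      # mandatory tag
    for tag, value in (("desc", desc), ("pre", pre), ("post", post)):
        if value is not None:                             # optional tags
            ET.SubElement(root, tag).text = value
    path = os.path.join(directory, hyperalert_filename(snort_id))
    with open(path, "w", encoding="utf-8") as f:
        f.write('<?xml version="1.0" encoding="UTF-8"?>\n')
        f.write(DOCTYPE + "\n")
        f.write(ET.tostring(root, encoding="unicode") + "\n")
    return path
```

Calling write_hyperalert("/your/snort/dir/etc/corr_rules", "122.1.0", desc="(portscan) TCP Portscan", pre="HostExists(+DST_ADDR+)", post="HasService(+DST_ADDR+, +ANY_PORT+)") would produce 122-1-0.xml with the same content as the portscan example.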
The <snort-id> tag is mandatory for identifying the hyperalert, while the
<desc> tag is optional, and so are <pre> and <post> if an alert has no
pre-conditions and/or post-conditions. In this example we say that the
pre-condition for a TCP portscan to be successful is that the host +DST_ADDR+
exists (the macro +DST_ADDR+ is automatically expanded at runtime and replaced
with the target address of the portscan). The post-condition of a portscan is
that the attacker knows that +DST_ADDR+ runs a service on +ANY_PORT+
(+ANY_PORT+ is another macro that is expanded at runtime). The hyperalert models
in corr_rules are the knowledge base used for correlating the alerts triggered
by Snort: the more information it contains, the more accurate and complete the
correlation will be. The macros recognized and automatically expanded in these
XML files are:
- +SRC_ADDR+: The IP address triggering the alert
- +DST_ADDR+: The target IP address in the alert
- +SRC_PORT+: The port from which the alert was triggered
- +DST_PORT+: The port on which the alert was triggered
- +ANY_ADDR+: Identifies any IP address
- +ANY_PORT+: Identifies any port
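A minimal sketch of the macro expansion described above, assuming an alert represented as a plain dict with the hypothetical keys src_addr, dst_addr, src_port and dst_port (the module's internal alert structure is not described here). +ANY_ADDR+ and +ANY_PORT+ are left untouched, since they stand for any value and act as wildcards during matching.

```python
# Hypothetical mapping from macros to keys of an alert represented as a dict
MACRO_FIELDS = {
    "+SRC_ADDR+": "src_addr",
    "+DST_ADDR+": "dst_addr",
    "+SRC_PORT+": "src_port",
    "+DST_PORT+": "dst_port",
}


def expand_macros(condition, alert):
    """Replace the alert-specific macros in a pre/post-condition string.

    +ANY_ADDR+ and +ANY_PORT+ are kept as they are: they match any value.
    """
    for macro, field in MACRO_FIELDS.items():
        if macro in condition:
            condition = condition.replace(macro, str(alert[field]))
    return condition


alert = {"src_addr": "192.168.1.5", "dst_addr": "10.8.0.1",
         "src_port": 40000, "dst_port": 22}
```

With this alert, expand_macros("HasService(+DST_ADDR+, +ANY_PORT+)", alert) gives "HasService(10.8.0.1, +ANY_PORT+)".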
The correlation between two alerts A and B is computed by comparing the
post-conditions of A with the pre-conditions of B. If the correlation
coefficient computed in this way is greater than a certain threshold (see "Basic
configuration -> correlation_threshold_coefficient"), then the alerts are marked
as correlated, i.e. the alert A leads to the alert B. This comparison supports
an elementary reasoning algorithm for making inferences on the conditions. For
example, if A has the post-condition "HasService(+DST_ADDR+, +ANY_PORT+)" and B
has the pre-condition "HasService(+DST_ADDR+, 22)", a match between A and B is
triggered. The same happens if A has "HostExists(10.8.0.0/24)" as post-condition
and B has "HostExists(10.8.0.1)" as pre-condition.
There is no fixed semantics for the predicates in pre-conditions and
post-conditions: any predicate can be used. The only constraint is that the
same predicate must have the same semantics and prototype in all of the
hyperalerts. For example, if HasService has a prototype like
"HasService(ip_addr, port)", then all the hyperalerts should follow this
prototype, otherwise the matching would fail. New predicates can be defined in
the hyperalerts as well, provided that they respect this constraint.
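The matching examples in this section (a wildcard port matching port 22, a subnet matching an address inside it, and predicates with the same name and prototype) can be illustrated with a small Python sketch. It is a hypothetical boolean match between one macro-expanded post-condition and one pre-condition; how the module turns such matches into a correlation coefficient is not shown. Non-IP arguments (such as ports) are compared as plain strings, which is a simplifying assumption.

```python
import ipaddress
import re

WILDCARDS = {"+ANY_ADDR+", "+ANY_PORT+"}
PREDICATE_RE = re.compile(r"^\s*(\w+)\s*\((.*)\)\s*$")


def parse_predicate(text):
    """'HasService(10.8.0.1, 22)' -> ('HasService', ['10.8.0.1', '22'])"""
    m = PREDICATE_RE.match(text)
    if m is None:
        raise ValueError("not a predicate: %r" % text)
    return m.group(1), [arg.strip() for arg in m.group(2).split(",")]


def as_network(arg):
    """IP address or subnet as an ip_network, or None for anything else (e.g. ports)."""
    try:
        return ipaddress.ip_network(arg, strict=False)
    except ValueError:
        return None


def args_match(a, b):
    if a in WILDCARDS or b in WILDCARDS:
        return True
    net_a, net_b = as_network(a), as_network(b)
    if net_a is not None and net_b is not None:
        # A subnet matches any address or subnet it overlaps with
        return net_a.overlaps(net_b)
    return a == b


def conditions_match(post, pre):
    """Does a (macro-expanded) post-condition satisfy a pre-condition?"""
    name_a, args_a = parse_predicate(post)
    name_b, args_b = parse_predicate(pre)
    # Same predicate name and same prototype (argument count) are required
    return (name_a == name_b and len(args_a) == len(args_b)
            and all(args_match(x, y) for x, y in zip(args_a, args_b)))
```

For example, conditions_match("HostExists(10.8.0.0/24)", "HostExists(10.8.0.1)") is True, while "HostExists(10.9.0.1)" as pre-condition would not match.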
==================
6. Output database
==================
If the output_database option is specified in the configuration, the alerts and
their clusters, correlations and packet streams will be saved to a database as
well. This is strongly suggested: first, because it makes managing the module's
information easier (even complex searches only need a SELECT query, instead of
grep-ing or manually searching inside a text file); second, because the web
interface of the module works ONLY if the output_database option is specified
(the web interface strongly depends on the unique IDs assigned to the alerts by
the database interface). Note that to use this option you must explicitly tell
the ./configure script which DBMS you're going to use, so that it knows which
APIs to use for interfacing with the database, e.g. via --with-mysql or
--with-postgresql.
After compiling the module, pick the right .sql file from the schemas/
directory (for example mysql.sql or postgresql.sql), or from
$PREFIX/share/snort_ai_preprocessor/schemas after the installation of the
module, and import it into your database. For MySQL:
$ mysql -uusername -ppassword dbname < schemas/mysql.sql
For PostgreSQL:
$ psql -U username -W dbname < schemas/postgresql.sql
You can check the structure of the database from the SQL file for your DBMS, or
from the E/R schema saved in schemas/database_ER.png.
================
7. Web interface
================
The module provides an optional (but strongly recommended) web interface for
browsing the triggered (and already clustered) security alerts, their
correlations and their packet streams from your browser. This feature can be
switched off by setting the configuration option "webserv_port" of the module to
0. Otherwise, if neither webserv_dir nor webserv_port is specified, the web
server thread starts together with the module, using the directory
$PREFIX/share/snort_ai_preproc/htdocs as document root and listening for
incoming connections on port 7654.
To view the alert web interface correctly you need a browser supporting
JavaScript, AJAX and SVG (it was successfully tested with Firefox 3.5, Chrome
and Opera 10); connect, for example, to the address http://localhost:7654. You
can drag and drop the nodes of the graph, modifying its layout on the fly, or
use the "redraw" function. Each node represents a clustered alert: to view the
information about that cluster and the alerts grouped in it, just click on the
node. From the same interface you can optionally save the stream of packets
associated to a certain alert in .pcap format (which can be analyzed with tools
like tcpdump and Wireshark). This feature, however, relies on the CGI script
pcap.cgi inside the document root, and it requires the Perl interpreter to be
installed on the machine.
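The .pcap export above is performed by the Perl script pcap.cgi. Purely to show what such a file contains, here is a Python sketch of the classic libpcap file layout (a 24-byte global header followed by a 16-byte header before each packet); it is not a port of pcap.cgi, and the Ethernet link type and the (ts_sec, ts_usec, bytes) packet tuples are assumptions of this sketch.

```python
import struct

PCAP_MAGIC = 0xA1B2C3D4      # classic libpcap magic number (microsecond timestamps)
LINKTYPE_ETHERNET = 1        # assumption: packets are raw Ethernet frames


def pcap_bytes(packets, snaplen=65535, linktype=LINKTYPE_ETHERNET):
    """Serialize (ts_sec, ts_usec, frame_bytes) tuples into a .pcap file image."""
    # Global header: magic, version 2.4, GMT offset, timestamp accuracy,
    # snapshot length, link-layer type
    out = bytearray(struct.pack("<IHHiIII", PCAP_MAGIC, 2, 4, 0, 0, snaplen, linktype))
    for ts_sec, ts_usec, frame in packets:
        captured = frame[:snaplen]
        # Per-packet header: timestamp, captured length, original length
        out += struct.pack("<IIII", ts_sec, ts_usec, len(captured), len(frame))
        out += captured
    return bytes(out)
```

Writing the result to a file opened in binary mode ("wb") gives a capture that tcpdump -r or Wireshark can open.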
The web server running over the module is a true web server with its own
document root, so you can use it as a stand-alone web server as well and place
your documents and files inside it. You can also place there CGI scripts or
applications written in the language you prefer, as long as they are executable
by all users and have the extension ".cgi".
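As an illustration of the CGI support described above, here is a hypothetical hello.cgi that could be placed in the document root. This sketch assumes that the module's web server follows the usual CGI convention (the script writes headers, an empty line and then the body to standard output) and that python3 is available on the PATH; neither detail is stated in this document.

```python
#!/usr/bin/env python3
# hello.cgi: hypothetical example; make it executable with `chmod a+x hello.cgi`
import sys


def main(out=sys.stdout):
    # Usual CGI output: headers, an empty line, then the body
    out.write("Content-Type: text/plain\r\n\r\n")
    out.write("Hello from a CGI script served by the SnortAI web server\n")


if __name__ == "__main__":
    main()
```

If the assumptions hold, the script would be reachable at http://localhost:7654/hello.cgi with the default port.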
A powerful feature offered by the web interface allows the user to manually
"mark" two alerts as correlated, if the system didn't do that, or as not
correlated, if the system made a mistake correlating two uncorrelated alerts.
These decisions are made simply by clicking the corresponding button on the web
page and then clicking the two alerts to be marked as correlated or
uncorrelated. After that, all the alerts of those types will be marked as
correlated, or uncorrelated.
=================================
8. Additional correlation modules
=================================
It is possible to add extra parameters and indexes for evaluating the
correlation between two alerts in an extremely simple way. The directory
specified in the configuration option "corr_modules_dir" contains the extra
modules (as binary shared libraries -> .so). Each of these modules should
contain a function whose prototype is
double AI_corr_index ( AI_snort_alert*, AI_snort_alert* )
taking two alerts as parameters and returning a correlation value between them,
and one whose prototype is
double AI_corr_index_weight ()
returning a coefficient in [0,1] expressing the weight of that index. An
example module is contained in the corr_modules directory of the source tree,
or in PREFIX/share/snort_ai_preproc/corr_modules after the installation.
When you write your own module, just add a line like the one already present in
the Makefile of the corr_modules directory for compiling it, then type `make'.
You may need to link your module source file(s) against libsf_ai_preproc.la if
you want to use some of the module's functions, for example for reading the
alerts stored in the history file or in the database, the current correlations,
and so on.
It is also possible to write your own modules in Python. See the file
'example_module.py' in corr_modules/ for a quick overview. All you need to do is
to declare in your module the functions AI_corr_index (taking two arguments, two
alert descriptions) and AI_corr_index_weight (taking no arguments), both
returning a real value between 0 and 1 describing, respectively, the correlation
value between the two alerts and the weight of that index. You can also access
the alert information and all the alerts acquired so far by the module by
importing the 'snortai' module in your Python code. You can compile and install
it by moving to the 'pymodule/' directory and running
$ python setup.py build
$ [sudo] python setup.py install
You can acquire the current alerts with code like the following:
import snortai

alerts = snortai.alerts()
for alert in alerts:
    # Access the alert information here (see pymodule/test.py for the fields)
    print(alert)
The fields in the alert class can be viewed in
pymodule/test.py and corr_modules/example_module.py examples. Take these
files as guides for interfacing your Python scripts with SnortAI module
or writing new correlation modules in Python.
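A minimal Python correlation module following the interface described above might look like the sketch below. The attribute names in COMPARED_FIELDS are hypothetical placeholders (check pymodule/test.py and corr_modules/example_module.py for the real field names), and both the "fraction of matching fields" index and the fixed weight of 0.5 are arbitrary choices for illustration.

```python
# Hypothetical attribute names: replace them with the real alert fields
# listed in pymodule/test.py and corr_modules/example_module.py
COMPARED_FIELDS = ("src_ip", "dst_ip", "dst_port")


def AI_corr_index(alert1, alert2):
    """Correlation in [0, 1]: fraction of compared fields with equal, non-missing values."""
    shared = 0
    for field in COMPARED_FIELDS:
        value1 = getattr(alert1, field, None)
        value2 = getattr(alert2, field, None)
        if value1 is not None and value1 == value2:
            shared += 1
    return shared / float(len(COMPARED_FIELDS))


def AI_corr_index_weight():
    """Weight of this index in [0, 1]; 0.5 is an arbitrary example value."""
    return 0.5
```

Two alerts sharing the destination address and port but coming from different sources would get an index of 2/3 with this sketch.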
===========================
9. Additional documentation
===========================
The additional documentation about the code, functions and data structures can
be generated automatically with Doxygen by typing `make doc', and is then
installed in $PREFIX/share/snort_ai_preproc/doc by `make install'.