============================================================================
   ,,_        ____                   _        _    ___ 
  o"  )~     / ___| _ __   ___  _ __| |_     / \  |_ _|
   ''''      \___ \| '_ \ / _ \| '__| __|   / _ \  | | 
              ___) | | | | (_) | |  | |_   / ___ \ | | 
             |____/|_| |_|\___/|_|   \__| /_/   \_\___|


              _ __  _ __ ___ _ __  _ __ ___   ___ ___  ___ ___  ___  _ __ 
             | '_ \| '__/ _ \ '_ \| '__/ _ \ / __/ _ \/ __/ __|/ _ \| '__|
             | |_) | | |  __/ |_) | | | (_) | (_|  __/\__ \__ \ (_) | |   
             | .__/|_|  \___| .__/|_|  \___/ \___\___||___/___/\___/|_|   
             |_|            |_|                                           

 ~ A REALLY smart preprocessor module for Snort ~
 by BlackLight <blacklight@autistici.org>, http://0x00.ath.cx
============================================================================


This document describes the AI preprocessor module for Snort.
It also describes how to get it, install it, configure it and use it correctly.

Table of contents:
	1. What's Snort AI preprocessor
	2. How to get Snort AI preprocessor
	3. Installation
	3.1 Dependencies
		3.2 Configure options
	4. Basic configuration
	5. Correlation rules
	6. Output database
	7. Web interface
	8. Additional correlation modules
	9. Additional documentation


===============================
1. What's Snort AI preprocessor
===============================

Snort AI preprocessor is a preprocessor module for Snort whose purpose is to
make Snort's alerts easier to read. It clusters false positive alarms,
emphasizing their root cause, in order to reduce log pollution; it groups
similar alerts according to their type and to hierarchies over IP addresses
and ports that the user can define, depending on the kind of traffic and the
topology of the network; and it reconstructs the flow of a multi-step attack
using correlation rules between hyperalerts, provided by the developer, by
third parties, or written by the user, again depending on the network
scenario. In the near future it will also be possible to correlate the
hyperalerts automatically, by self-learning on the basis of the acquired
alerts.


===================================
2. How to get Snort AI preprocessor
===================================

It is strongly suggested to get the latest, always-fresh release of Snort AI
preprocessor from GitHub (http://github.com/BlackLight/Snort_AIPreproc):

git clone git://github.com/BlackLight/Snort_AIPreproc.git

If git is not available on the machine or cannot be used, from the same page you
can also choose "download source" and download the source code in tar.gz format.


===============
3. Installation
===============

The installation procedure is the usual one:

$ ./configure
$ make
$ make install

If you did not install Snort in the /usr directory you may need to use the
--prefix option of configure to select the directory where you installed
Snort (for example ./configure --prefix=$HOME/local/snort). If the prefix was
specified correctly, and it actually points to the location where Snort was
installed, the module binaries will be placed in
$SNORT_DIR/lib/snort_dynamicpreprocessor after the installation and
automatically loaded by Snort at the next start. Moreover, a new directory
named corr_rules will be created, in /etc/snort if the prefix was /usr or in
$SNORT_DIR/etc otherwise, containing XML files describing the default
correlation rules provided by the developer. This set can be enriched at any
moment with new XML files, provided by third parties or created by the user,
describing more hyperalerts.


================
3.1 Dependencies
================

Dependencies required for a correct compilation and configuration:

- pthread (REQUIRED), used for running multiple threads inside the module. On
a Debian-based system, install libpthread-dev if you don't already have it.

- libxml2 (REQUIRED), used for parsing the XML files in the corr_rules
directory. On a Debian-based system, install libxml2-dev if you don't already
have it.

- libgraphviz (RECOMMENDED), used for generating PNG (and, in the future, PS)
files representing hyperalert correlation graphs from the .dot files generated
by the software. You can remove this dependency from the compilation process
by passing --without-graphviz to ./configure, but in that case you will only
have .dot files, not easily readable by a human, for representing correlation
graphs, and you may need an external graph rendering tool to convert them to a
more readable format. On a Debian system, install libgraphviz-dev if you don't
already have it.

- libmysqlclient (OPTIONAL), used if you want to read alert information saved
on a MySQL DBMS, or enable MySQL support in the module. This option is
disabled by default (if not specified otherwise, the module reads the alerts
from Snort's plain log files) and can be enabled by passing the option
--with-mysql to ./configure. On a Debian-based system you may need to install
libmysqlclient-dev.

- libpq (OPTIONAL), used if you want to read alert information saved on a
PostgreSQL DBMS, or enable PostgreSQL support in the module. This option is
disabled by default and can be enabled by passing the option
--with-postgresql to ./configure. On a Debian-based system you may need to
install libpq-dev.

- A DBMS (RECOMMENDED), MySQL and PostgreSQL are supported for now, for
writing cluster, correlation and packet stream information to a database,
making the analysis easier.

- Perl (RECOMMENDED), used by the CGI script in the web interface that saves
the packet stream associated to an alert in .pcap format, to be analyzed by
tools like tcpdump and Wireshark.

- XML::Simple Perl module (RECOMMENDED), used by the 'correlate.cgi' CGI
script for reading and writing manual (un)correlation XML files. A quick way
to install it on a Unix system is through CPAN:

# cpan XML::Simple

- Python 2.6 (OPTIONAL), used for interfacing the SnortAI module with Python
scripts through the snortai module (see the README file in pymodule/) and for
writing new correlation modules (see example_module.py in corr_modules/).
Compile the module passing the --with-python option to the ./configure script
if you want this feature. You need the Python interpreter and libpython2.6
installed on your system.

=====================
3.2 Configure options
=====================

You can pass the following options to the ./configure script before compiling:

--with-mysql - Enables MySQL DBMS support in the module (it requires
libmysqlclient)

--with-postgresql - Enables PostgreSQL DBMS support in the module (it requires
libpq)

--with-python - Enables support for the snortai Python bindings and for
correlation modules written in Python (it requires Python 2.6 and
libpython2.6)

--without-graphviz - Disables Graphviz support in the module, avoiding the
generation of PNG or PS files representing hyperalert correlations


======================
4. Basic configuration
======================

After installing the module in the Snort installation directory, a
configuration for it is required in snort.conf. A sample configuration may
look like the following:


preprocessor ai: \
	alertfile "/your/snort/dir/log/alert" \
	alert_bufsize 30 \
	alert_clustering_interval 300 \
	alert_correlation_weight 5000 \
	alert_history_file "/your/snort/dir/log/alert_history" \
	alert_serialization_interval 3600 \
	bayesian_correlation_interval 1200 \
	bayesian_correlation_cache_validity 600 \
	cluster ( class="dst_port", name="privileged_ports", range="1-1023" ) \
	cluster ( class="dst_port", name="unprivileged_ports", range="1024-65535" ) \
	cluster ( class="src_addr", name="local_net", range="192.168.1.0/24" ) \
	cluster ( class="src_addr", name="dmz_net", range="155.185.0.0/16" ) \
	cluster ( class="src_addr", name="vpn_net", range="10.8.0.0/24" ) \
	cluster ( class="dst_addr", name="local_net", range="192.168.1.0/24" ) \
	cluster ( class="dst_addr", name="dmz_net", range="155.185.0.0/16" ) \
	cluster ( class="dst_addr", name="vpn_net", range="10.8.0.0/24" ) \
	cluster_max_alert_interval 14400 \
	clusterfile "/your/snort/dir/log/clustered_alerts" \
	corr_modules_dir "/your/snort/dir/share/snort_ai_preproc/corr_modules" \
	correlation_graph_interval 300 \
	correlation_rules_dir "/your/snort/dir/etc/corr_rules" \
	correlated_alerts_dir "/your/snort/dir/log/correlated_alerts" \
	correlation_threshold_coefficient 0.5 \
	database ( type="dbtype", name="snort", user="snortusr", password="snortpass", host="dbhost" ) \
	database_parsing_interval 30 \
	hashtable_cleanup_interval 300 \
	manual_correlations_parsing_interval 120 \
	max_hash_pkt_number 1000 \
	neural_clustering_interval 1200 \
	neural_network_training_interval 43200 \
	neural_train_steps 10 \
	output_database ( type="dbtype", name="snort", user="snortusr", password="snortpass", host="dbhost" ) \
	output_neurons_per_side 20 \
	tcp_stream_expire_interval 300 \
	use_knowledge_base_correlation_index 1 \
	use_stream_hash_table 1 \
	webserv_banner "Snort AIPreprocessor module" \
	webserv_dir "/prefix/share/htdocs" \
	webserv_port 7654


The options are the following:

- alertfile: The file where Snort saves its alerts, if they are saved to a
file and not to a database (default if not specified: /var/log/snort/alert)


- alert_correlation_weight: When this number of alerts is stored in the
"memory" of the software (i.e. in the alert history file or in the output
database), the weight of the heuristic correlation indexes (bayesian network
and neural network) will be more or less equal to 0.95, on a scale from 0 to
1. This parameter expresses how much the heuristic indexes should be
weighted, and can be considered a kind of "learning rate" for the alert
correlation algorithm (default value if not specified: 5000)


- alert_history_file: The file keeping track of the history, in binary
format, of all the alerts received by the IDS, so that the module can build
statistical correlation inferences over the past


- alert_serialization_interval: The interval that should occur between one
serialization of a buffer of alerts to the history file and the next (default
if not specified: 1 hour, as this is a quite expensive operation in terms of
resources if the system receives many alerts)


- alert_bufsize: Size of the buffer containing the alerts to be sent, as a
group, to the serializer thread. The buffer is sent and emptied whenever it is
full, even if the alert_serialization_interval has not expired yet, in order
to avoid overflows, other memory problems or deadlocks (default value if not
specified: 30)


- alert_clustering_interval: The interval that should occur between one
clustering of the alerts in the log, according to the provided clustering
hierarchies, and the next (default if not specified: 300 seconds)


- bayesian_correlation_interval: Interval, in seconds, that should occur
between two alerts in the history for considering them, more or less strongly,
correlated (default: 1200 seconds). NOTE: a value of 0 disables the bayesian
correlation. Setting it to 0 is strongly suggested while your alert log is
still "learning", i.e. when you don't have enough alerts yet. After this
period, you can set the correlation interval to any value.


- bayesian_correlation_cache_validity: Interval, in seconds, for which an
entry in the bayesian correlation hash table (i.e. a pair of alerts with the
associated historical bayesian correlation) is considered valid before being
updated (default: 600 seconds)


- corr_modules_dir: This software supports a kind of plugin, or "module over
the module", allowing the user to specify extra correlation rules and
indexes. These modules are .so files placed in this directory (default if not
specified: PREFIX/share/snort_ai_preproc/corr_modules) and dynamically loaded
by the module. For more information on how to write your own module, see the
dedicated section in this file.


- correlation_graph_interval: The interval that should occur between one
build of the correlation graph between the clustered alerts and the next
(default if not specified: 300 seconds)


- correlation_rules_dir: Directory where the correlation rules are saved as
XML files (default if not specified: /etc/snort/corr_rules)


- correlated_alerts_dir: Directory where the information about correlated
alerts will be saved, as .dot files ready to be rendered as graphs and, if
libgraphviz support is enabled, as .png and .ps files as well (default if not
specified: /var/log/snort/clustered_alerts)


- correlation_threshold_coefficient: The threshold the software uses to state
that two alerts are correlated is avg(correlation coefficient) + k *
std_deviation(correlation coefficient). The value of k is specified through
this option (0.5 by default) and depends on how "sensitive" you want the
correlation algorithm to be. A value of k=0 means "consider as correlated
every pair of alerts whose correlation coefficient is greater than the
average" (negative values of k are allowed as well, but unless you know what
you're doing they are not recommended, as correlation constraints may appear
where no correlation exists). As k grows, the threshold for considering two
alerts correlated grows as well; a very high value of k may just lead to an
empty correlation graph. A short numeric sketch of this threshold is given
right after this option list.


- clusterfile: File where the clustered alerts will be saved by the module
(default if not specified: /var/log/snort/clustered_alerts)


- cluster_max_alert_interval: Maximum time interval, in seconds, allowed
between two alerts for considering them part of the same cluster (default:
14400 seconds, i.e. 4 hours). Specify 0 for this option if you want to
cluster alerts regardless of how much time occurred between them


- cluster: Clustering hierarchy, or list of hierarchies, to be applied for
grouping similar alerts. This option needs to specify:
	-- class: Class of the cluster node. It may be src_addr, dst_addr,
			src_port or dst_port
	-- name: Name for the clustering node
	-- range: Range of the clustering node. It can include a single port or IP
			address, an IP range (specified as subnet x.x.x.x/x), or a port
			range (specified as xxx-xxx)


- database: If Snort saves its alerts to a database and the module was
compiled with database support (e.g. --with-mysql), this option specifies the
information for accessing that database. The fields inside are:
	-- type: DBMS to be used (so far MySQL and PostgreSQL are supported)
	-- name: Database name
	-- user: Username for accessing the database
	-- password: Password for accessing the database
	-- host: Host holding the database


- database_parsing_interval: The interval that should occur between one read
of the alerts from the database and the next (default if not specified: 30
seconds)


- hashtable_cleanup_interval: The interval that should occur between one
cleanup of the hash table of TCP streams and the next (default if not
specified: 300 seconds). Set this option to 0 to perform no cleanup on the
stream hash table


- max_hash_pkt_number: Maximum number of packets that each element of the
stream hash table should hold; set it to 0 for no limit (default value if not
specified: 1000)


- manual_correlations_parsing_interval: Interval in seconds between one
execution of the thread parsing the manually set alert correlations and the
next (default value if not specified: 120 seconds)


- neural_clustering_interval: Interval in seconds between one execution of
the thread that clusters (using k-means) the alerts on the output layer of
the neural network, in order to recognize likely attack scenarios, and the
next. Set this to 0 if you want no clustering (default if not specified: 1200
seconds)


- neural_network_training_interval: Interval in seconds between one execution
of the thread training the neural network on the set of recent alerts and the
next (default if not specified: 43200 seconds)


- neural_train_steps: Number of steps to take in each training cycle of the
neural network (default: 10)


- output_database: Specify this option if you want to save the outputs of the
module (correlated alerts, clustered alerts, alert information and the
associated packet streams, and so on) to a relational database as well (by
default the module only saves the alerts on static plain files). The options
here are the same as for the 'database' option. The structure of this
database can be seen in the files schemas/*.sql (replace * with the name of
your DBMS). If you want to initialize the tables needed by the module, just
feed the right file to your database, e.g. for MySQL:

$ mysql -uusername -ppassword dbname < schemas/mysql.sql


- output_neurons_per_side: Number of output neurons per side of the output
layer of the neural network (which is a rectangular matrix). A higher number
allows a higher granularity over similar alerts, but a linear increase of
this value produces a quadratic increase of the computational complexity of
the training and evaluation algorithms (default value if not specified: 20)


- tcp_stream_expire_interval: The interval after which a TCP stream is marked
as "expired", if no more packets are received within it and it is not
"marked" as suspicious (default if not specified: 300 seconds)


- use_knowledge_base_correlation_index: Set this option to 0 if you do not want
to use the knowledge base alert correlation index (default value if not
specified: 1)


- use_stream_hash_table: Set this option to 0 if you do not want to use the
hash table for storing the streams of packets associated to alerts; this is a
good choice on a system where many alerts are triggered (default value if not
specified: 1)


- webserv_banner: Banner of the web server, to be placed on the error pages
and in the "Server" HTTP reply header


- webserv_dir: Directory containing the contents of the web server running
over the module (default if none is specified:
$PREFIX/share/snort_ai_preprocessor/htdocs)


- webserv_port: Port the web server will listen on (default if none is
specified: 7654). Set this value to 0 if you don't want to run the web server
over the module and have the web interface (in this case, if you still want
the graphical web visualization of the alerts, you should manually copy the
files contained in htdocs/ into a web server directory)
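
The following is a small numeric sketch, in Python, of the threshold described
under correlation_threshold_coefficient above; the coefficient values are made
up, and the population standard deviation is used purely for illustration:

import math

# Made-up correlation coefficients between pairs of alerts
coefficients = [0.10, 0.35, 0.20, 0.80, 0.15]
k = 0.5  # correlation_threshold_coefficient

avg = sum(coefficients) / len(coefficients)
std_dev = math.sqrt(sum((c - avg) ** 2 for c in coefficients) / len(coefficients))
threshold = avg + k * std_dev

# Pairs whose correlation coefficient exceeds the threshold
# (here: 0.32 + 0.5 * 0.254 ~= 0.45) are marked as correlated
print threshold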

====================
5. Correlation rules
====================

The hyperalert correlation rules are specified in the
$SNORT_DIR/etc/corr_rules directory through a very simple XML syntax, and any
user can add new ones. The files in there must be named after the Snort alert
ID they model. For example, if we want to model a TCP portscan alert (Snort
ID: 122.1.0) as a hyperalert, i.e. as an alert with pre-conditions and
post-conditions to be correlated to other alerts, then we need to create a
file named 122-1-0.xml inside $SNORT_DIR/etc/corr_rules with a content like
the following:


<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE hyperalert PUBLIC "-//blacklighth//DTD HYPERALERT SNORT MODEL//EN" "http://0x00.ath.cx/hyperalert.dtd">

<hyperalert>
	<snort-id>122.1.0</snort-id>
	<desc>(portscan) TCP Portscan</desc>

	<pre>HostExists(+DST_ADDR+)</pre>
	<post>HasService(+DST_ADDR+, +ANY_PORT+)</post>
</hyperalert>


The <desc> tag is optional, and so are <pre> and <post> if an alert has no
pre-conditions and/or post-conditions, while the <snort-id> tag is mandatory
for identifying the hyperalert. In this case we say that the pre-condition
for a TCP portscan to be successful is that the host +DST_ADDR+ exists (the
macro +DST_ADDR+ will automatically be expanded at runtime and substituted
with the target address of the portscan). The post-condition of a portscan
consists in the attacker knowing that +DST_ADDR+ runs a service on
+ANY_PORT+ (+ANY_PORT+ is another macro expanded at runtime). The hyperalert
models in corr_rules form the knowledge base used for correlating the alerts
triggered by Snort: the more information it contains, the more accurate and
complete the correlation will be. The macros recognized and automatically
expanded in these XML files are:


- +SRC_ADDR+: The IP address triggering the alert
- +DST_ADDR+: The target IP address in the alert
- +SRC_PORT+: The port from which the alert was triggered
- +DST_PORT+: The port on which the alert was triggered
- +ANY_ADDR+: Identifies any IP address
- +ANY_PORT+: Identifies any port


The correlation between two alerts A and B is made by comparing the
post-conditions of A with the pre-conditions of B. If the correlation
coefficient computed in this way is greater than a certain threshold (see
"Basic configuration -> correlation_threshold_coefficient") then the alerts
are marked as correlated, i.e. the alert A determines the alert B. This is
backed by an elementary reasoning algorithm for doing inferences on the
conditions. For example, if A has the post-condition
"HasService(+DST_ADDR+, +ANY_PORT+)" and B has the pre-condition
"HasService(+DST_ADDR+, 22)", a match between A and B is triggered. The same
happens if A has "HostExists(10.8.0.0/24)" as post-condition and B has
"HostExists(10.8.0.1)" as pre-condition.

There is no fixed semantics for the predicates in pre-conditions and
post-conditions; any predicates can be used. The only constraint is that the
same predicate has the same semantics and prototype in all of the
hyperalerts. For example, if HasService has a prototype like
"HasService(ip_addr, port)", then all the hyperalerts should follow this
prototype, otherwise the matching would fail. Any new predicate can be
defined in the hyperalerts as well, provided that it respects this constraint.


==================
6. Output database
==================

If the output_database option is specified in the configuration, the alerts,
and the related clusters, correlations and packet stream information, will be
saved to a database as well. This is strongly suggested, first because it
makes the management of the module's information easier (even complex
searches just take a SELECT query, instead of grep-ing or manually searching
inside a text file), and second because the web interface of the module can
work ONLY if the output_database option is specified (the web interface
strongly depends on the unique IDs assigned to the alerts by the database
interface). Note that in order to use this option you should explicitly tell
the ./configure script which DBMS you're going to use, so that it knows which
APIs to use for interfacing with the database, e.g. via --with-mysql or
--with-postgresql.

After you compile the module, you should pick the right .sql file from the
schemas/ directory (for example mysql.sql or postgresql.sql), or from
$PREFIX/share/snort_ai_preprocessor/schemas after the installation of the
module, and import it into your database:

$ mysql -uusername -ppassword dbname < schemas/mysql.sql (for MySQL)
$ psql -U username -W dbname < schemas/postgresql.sql (for PostgreSQL)

You can check the structure of the database from the SQL file for your DBMS,
or from the E/R schema saved in schemas/database_ER.png.
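
With the alerts in a relational database, even ad-hoc analyses become simple
queries. A purely illustrative Python sketch using the MySQLdb bindings
follows; the table name below is a placeholder, so take the real table names
from schemas/mysql.sql:

import MySQLdb

conn = MySQLdb.connect(host='dbhost', user='snortusr', passwd='snortpass',
                       db='snort')
cur = conn.cursor()

# 'correlated_alerts_table' is a hypothetical name, check schemas/mysql.sql
cur.execute("SELECT COUNT(*) FROM correlated_alerts_table")
print cur.fetchone()[0]

conn.close()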


================
7. Web interface
================

The module provides an optional (but strongly recommended) web interface for
browsing the triggered (and already clustered) security alerts, their
correlations and their packet stream information from your browser. This
feature can be switched off by setting the configuration option
"webserv_port" of the module to 0. Otherwise, if neither webserv_dir nor
webserv_port is specified, the web server thread starts with the module,
picking by default the directory $PREFIX/share/snort_ai_preproc/htdocs as
document root and listening for incoming connections on port 7654.

You should use a browser supporting JavaScript, AJAX and SVG in order to view
the alert web interface correctly (successfully tested with Firefox 3.5,
Chrome and Opera 10), connecting for example to the address
http://localhost:7654. You can drag and drop the nodes in the graph,
modifying its layout on the fly, or use the "redraw" function. Each node
represents a clustered alert. To view the information about a cluster and the
alerts grouped inside it, just click on the node. You can optionally save the
stream of packets associated to a certain alert in .pcap format (analyzable
by tools like tcpdump and Wireshark) from this same interface. This feature,
however, is based on the CGI script pcap.cgi inside the document root, and it
requires the Perl interpreter to be installed on the machine.

The web server running over the module is a true web server with its own
document path, so you can use it as a stand-alone web server as well and
place your own documents and files inside. You can moreover place there CGI
scripts or applications written in the language you prefer, as long as they
are files executable by any user and have the ".cgi" extension.

A powerful feature offered by the web interface is the one that allows the
user to manually "mark" two alerts as correlated, if the system didn't do so,
or as not correlated, if the system made a mistake by correlating two
uncorrelated alerts. These decisions are made simply by clicking the right
button on the web page and then clicking the two alerts to mark as correlated
or uncorrelated. After that, all the alerts of those types will be marked as
correlated, or uncorrelated.


=================================
8. Additional correlation modules
=================================

It is possible to add extra parameters and indexes for evaluating the
correlation between two alerts in an extremely simple way. The directory
specified in the configuration option "corr_modules_dir" contains the extra
modules (as binary shared libraries, i.e. .so files). Each of these modules
should contain a function whose prototype is


double  AI_corr_index ( AI_snort_alert*, AI_snort_alert* )


taking two alerts as parameters and returning a correlation value between
them, and one whose prototype is


double  AI_corr_index_weight ()


returning a coefficient in [0,1] expressing the weight of that index. An
example module is contained in the corr_modules directory of the source tree,
or in PREFIX/share/snort_ai_preproc/corr_modules after installation.

When you write your own module, just add to the Makefile in the corr_modules
directory a line like the one already present there for compiling it, then
type `make'. You may need to link your module source file(s) against
libsf_ai_preproc.la if you want to use some of the functions from the main
module, for example for reading the alerts stored in the history file or in
the database, the current correlations, and so on.

It is also possible to write your own modules in the Python language. See the
file 'example_module.py' in corr_modules/ for a quick overview. All you need
to do is declare in your module the functions AI_corr_index (taking two
arguments, i.e. two alert descriptions) and AI_corr_index_weight (taking no
argument), both returning a real value between 0 and 1 describing,
respectively, the correlation value between the two alerts and the weight of
that index. You can also access the alert information, and all the alerts
acquired so far by the module, by importing the 'snortai' module in your
Python code. You can compile and install it by moving to the 'pymodule/'
directory and running

$ python setup.py build
$ [sudo] python setup.py install
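
The following is a minimal sketch of what such a Python correlation module
could look like. Only the two function names and their signatures come from
this README; the alert attribute names used inside are hypothetical, so refer
to corr_modules/example_module.py and pymodule/test.py for the real interface:

# Sketch of a Python correlation module; attribute names are hypothetical

def AI_corr_index(alert1, alert2):
	# Return the correlation value, in [0,1], between the two alerts.
	# 'dst_addr' and 'src_addr' are illustrative attribute names only.
	if getattr(alert1, 'dst_addr', None) == getattr(alert2, 'src_addr', None):
		return 0.8
	return 0.1

def AI_corr_index_weight():
	# Weight, in [0,1], of this correlation index
	return 0.2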

You can acquire the current alerts by writing code like the following:

import snortai

alerts = snortai.alerts()

for alert in alerts:
	# Access the alerts information

The fields of the alert class can be viewed in the pymodule/test.py and
corr_modules/example_module.py examples. Take these files as guides for
interfacing your Python scripts with the SnortAI module or for writing new
correlation modules in Python.


===========================
9. Additional documentation
===========================

The additional documentation about the code, functions and data structures
can be generated automatically by Doxygen by typing `make doc'; after `make
install' it will be placed in $PREFIX/share/snort_ai_preproc/doc.