Running Binpac-generated Analyzer Standalone (updated)
From BroWiki
The following describes how to decouple binpac from Bro so that it can be run as a standalone compiler. It is based on the Bro 1.4 release (the latest at the time of writing) and gives more detail than the earlier explanation on this wiki; for example, it includes an HTTP pac specification that is independent of Bro.
Decoupling binpac from Bro
In the Bro 1.4 release, binpac sits under bro-1.4/aux/. To separate it, copy the whole directory to a different location. Decoupling binpac from Bro then requires two main changes: one at the compiler level and one at the pac level.
Dependencies at the compiler level.
At the compiler level, binpac depends on Bro's regular expression library (RE.h). To remove this dependency, we link binpac against libpcre (available from the PCRE project) and add some stub code in between. The following steps, taken from the original binpac wiki with a few minor changes, describe how to make this work for the binpac shipped with Bro 1.4.
Create these three header files and add them to the folder binpac/lib:
RE.h
#include "binpac_pcre.h"
#include "binpac_stdalone.h"
binpac_stdalone.h
#ifndef __BINPAC_STDALONE_H__
#define __BINPAC_STDALONE_H__

#include <stdio.h>

// Debug output goes to stderr when binpac runs outside Bro.
#define DEBUG_MSG(x...) fprintf(stderr, x)

#endif
binpac_pcre.h
#ifndef __BINPAC_PCRE_H__
#define __BINPAC_PCRE_H__

#include <stdio.h>
#include <assert.h>
#include <string>
using namespace std;

// TODO: use configure to figure out the location of pcre.h
#include "pcre.h"

// Stand-in for Bro's RE_Matcher, implemented on top of libpcre.
class RE_Matcher {
public:
    RE_Matcher(const char* pat) {
        // Anchor the pattern so that only prefix matches succeed.
        pattern_ = "^";
        pattern_ += "(";
        pattern_ += pat;
        pattern_ += ")";
        pcre_ = NULL;
        pextra_ = NULL;
    }

    ~RE_Matcher() {
        if (pcre_) {
            pcre_free(pcre_);
        }
    }

    // Returns 1 on success, 0 if the pattern fails to compile.
    int Compile() {
        const char *err = NULL;
        int erroffset = 0;
        pcre_ = pcre_compile(pattern_.c_str(),
                             0,           // options
                             &err,
                             &erroffset,
                             NULL);       // default character tables
        if (pcre_ == NULL) {
            fprintf(stderr,
                    "Error in RE_Matcher::Compile(): %d:%s\n",
                    erroffset, err);
            return 0;
        }
        return 1;
    }

    // Returns the length of the match anchored at s[0], or -1 if
    // there is no match.
    int MatchPrefix(const binpac::uint8* s, int n) {
        assert(pcre_);
        const int MAX_NUM_OFFSETS = 30;   // must be a multiple of 3
        int offsets[MAX_NUM_OFFSETS];
        int ret = pcre_exec(pcre_,
                            pextra_,      // no pcre_study() data
                            (const char*) s, n,
                            0,            // start offset
                            0,            // options
                            offsets,
                            MAX_NUM_OFFSETS);
        if (ret < 0) {
            return -1;
        }
        assert(offsets[0] == 0);
        return offsets[1];
    }

protected:
    pcre *pcre_;
    string pattern_;
    pcre_extra *pextra_;
};

#endif
Notice that there are a couple of minor changes with respect to the original code from the binpac wiki. First, the original code leaves pextra_ uninitialized; here the constructor sets it to NULL. Second, the original MatchPrefix() takes a const char*, which does not match the const binpac::uint8* that binpac_regex.h passes to it (the compiler will not implicitly convert between the two pointer types). The latter may be due to a mismatch between the Bro version the wiki was written for and the one used here. The code above resolves both issues.
Recompiling binpac with the above changes will generate the standalone binpac compiler.
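To see what the stub is expected to do without building against libpcre, the MatchPrefix() contract (length of a match anchored at the first byte, or -1 if none) can be sketched with C++11's std::regex. This is a swapped-in technique for illustration only, not what this page uses; the class name StdRegexMatcher is hypothetical and std::uint8_t stands in for binpac::uint8. One caveat applies to both this sketch and the libpcre stub: like pcre_exec(), std::regex's default ECMAScript grammar returns the first alternative that matches, not the longest one. Bro's own DFA-based RE_Matcher::MatchPrefix() returns, as far as we can tell, the longest match, so a pattern whose first alternative is empty, such as HTTP_HEADER_NAME (`|([^: \t]+:)`) in the http.pac file on this page, may match zero bytes under PCRE-style semantics.

```cpp
#include <cassert>
#include <cstdint>
#include <regex>
#include <string>

// Hypothetical std::regex stand-in for the libpcre-based RE_Matcher.
// std::uint8_t plays the role of binpac::uint8.
class StdRegexMatcher {
public:
    explicit StdRegexMatcher(const char* pat) : pattern_(pat) {}

    // Returns 1 on success, 0 if the pattern is invalid (like Compile()).
    int Compile() {
        try {
            re_ = std::regex(pattern_);  // ECMAScript grammar by default
            compiled_ = true;
            return 1;
        } catch (const std::regex_error&) {
            return 0;
        }
    }

    // Returns the length of the match that starts at s[0], or -1.
    int MatchPrefix(const std::uint8_t* s, int n) const {
        assert(compiled_);
        const char* begin = reinterpret_cast<const char*>(s);
        std::cmatch m;
        // match_continuous anchors the match at the first character,
        // playing the role of the "^(" ... ")" wrapper in RE_Matcher.
        if (!std::regex_search(begin, begin + n, m, re_,
                               std::regex_constants::match_continuous))
            return -1;
        return static_cast<int>(m.length(0));
    }

private:
    std::string pattern_;
    std::regex re_;
    bool compiled_ = false;
};
```

For example, StdRegexMatcher("[0-9]{3}") returns 3 on "200 OK" and -1 on "OK 200", while StdRegexMatcher("|([^: \\t]+:)") returns 0 on "Host: x" because the empty alternative is tried first.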
Dependencies at the pac level.
The example parsers written in the pac language that come with the Bro/binpac distribution are also tied to Bro. To create a protocol parser with standalone binpac, we need to remove those dependencies. The following files provide a pac specification of the HTTP parser that has been separated from Bro.
The following http.pac file is a pac specification of the HTTP protocol without Bro dependencies. It was derived from the original http.pac, http-protocol.pac and http-analyzer.pac files in the Bro 1.4 distribution by removing the Bro dependencies and merging the result into a single file, since all it does now is parse the protocol; it no longer generates any Bro events.
http.pac
# $Id:$
%include binpac-lib.pac
##
## Protocol parser specification
##
%extern{
#include "http-baseconn.h"
%}
analyzer HTTP withcontext {
connection: HTTP_Conn;
flow: HTTP_Flow;
};
enum ExpectBody {
BODY_EXPECTED,
BODY_NOT_EXPECTED,
BODY_MAYBE,
};
enum DeliveryMode {
UNKNOWN_DELIVERY_MODE,
CONTENT_LENGTH,
CHUNKED,
MULTIPART,
};
## token = 1*<any CHAR except CTLs or separators>
## separators = "(" | ")" | "<" | ">" | "@"
## | "," | ";" | ":" | "\" | <">
## | "/" | "[" | "]" | "?" | "="
## | "{" | "}" | SP | HT
## reserved = ";" | "/" | "?" | ":" | "@" | "&" | "=" | "+" |
## "$" | ","
type HTTP_TOKEN = RE/[^()<>@,;:\\"\/\[\]?={} \t]+/;
type HTTP_WS = RE/[ \t]*/;
type HTTP_URI = RE/[[:alnum:][:punct:]]+/;
type HTTP_PDU(is_orig: bool) = case is_orig of {
true -> request: HTTP_Request;
false -> reply: HTTP_Reply;
};
type HTTP_Request = record {
request: HTTP_RequestLine;
msg: HTTP_Message(BODY_MAYBE);
};
function expect_reply_body(reply_status: int): ExpectBody
%{
// TODO: check if the request is "HEAD"
if ( (reply_status >= 100 && reply_status < 200) ||
reply_status == 204 || reply_status == 304 )
return BODY_NOT_EXPECTED;
return BODY_EXPECTED;
%}
type HTTP_Reply = record {
reply: HTTP_ReplyLine;
msg: HTTP_Message(expect_reply_body(reply.status.stat_num));
};
type HTTP_RequestLine = record {
method: HTTP_TOKEN;
: HTTP_WS;
uri: HTTP_URI;
: HTTP_WS;
version: HTTP_Version;
} &oneline;
type HTTP_ReplyLine = record {
version: HTTP_Version;
: HTTP_WS;
status: HTTP_Status;
: HTTP_WS;
reason: bytestring &restofdata;
} &oneline;
type HTTP_Status = record {
stat_str: RE/[0-9]{3}/;
} &let {
stat_num: int = bytestring_to_int(stat_str, 10);
};
type HTTP_Version = record {
: "HTTP/";
vers_str: RE/[0-9]+\.[0-9]+/;
} &let {
vers_num: double = bytestring_to_double(vers_str);
};
type HTTP_Headers = HTTP_Header[] &until($input.length() == 0);
type HTTP_Message(expect_body: ExpectBody) = record {
headers: HTTP_Headers;
body_or_not: case expect_body of {
BODY_NOT_EXPECTED -> none: empty;
default -> body: HTTP_Body(expect_body);
};
};
#
# Multi-line headers are supported by allowing header names to be
# empty.
#
type HTTP_HEADER_NAME = RE/|([^: \t]+:)/;
type HTTP_Header = record {
name: HTTP_HEADER_NAME &transient;
: HTTP_WS;
value: bytestring &restofdata &transient;
} &oneline;
type MIME_Line = record {
line: bytestring &restofdata &transient;
} &oneline;
type MIME_Lines = MIME_Line[]
&until($context.flow.is_end_of_multipart($input));
# TODO: parse multipart message according to MIME
type HTTP_Body(expect_body: ExpectBody) =
case $context.flow.delivery_mode() of {
CONTENT_LENGTH -> body: bytestring
&length = $context.flow.content_length(),
&chunked;
CHUNKED -> chunks: HTTP_Chunks;
MULTIPART -> multipart: MIME_Lines;
default -> unknown: HTTP_UnknownBody(expect_body);
};
type HTTP_UnknownBody(expect_body: ExpectBody) = case expect_body of {
BODY_MAYBE, BODY_NOT_EXPECTED -> maybenot: empty;
BODY_EXPECTED -> rest: bytestring &restofflow &chunked;
};
type HTTP_Chunks = record {
chunks: HTTP_Chunk[] &until($element.chunk_length == 0);
headers: HTTP_Headers;
};
type HTTP_Chunk = record {
length_line: bytestring &oneline;
data: bytestring &length = chunk_length &chunked;
opt_crlf: case chunk_length of {
0 -> none: empty;
default -> crlf: bytestring &oneline &check(trailing_crlf == "");
};
} &let {
chunk_length: int = bytestring_to_int(length_line, 16);
};
##
## Connection and flow definitions
##
connection HTTP_Conn(http_conn: BaseConn) {
upflow = HTTP_Flow(true);
downflow = HTTP_Flow(false);
};
flow HTTP_Flow(is_orig: bool) {
flowunit = HTTP_PDU(is_orig) withcontext (connection, this);
%member{
int content_length_;
DeliveryMode delivery_mode_;
bytestring end_of_multipart_;
double msg_start_time_;
int msg_begin_seq_;
int msg_header_end_seq_;
bool build_headers_;
%}
%init{
content_length_ = 0;
delivery_mode_ = UNKNOWN_DELIVERY_MODE;
msg_start_time_ = 0;
msg_begin_seq_ = 0;
msg_header_end_seq_ = -1;
%}
%cleanup{
end_of_multipart_.free();
%}
function content_length(): int
%{
return content_length_;
%}
function delivery_mode(): DeliveryMode
%{
return delivery_mode_;
%}
function end_of_multipart(): const_bytestring
%{
return end_of_multipart_;
%}
function is_end_of_multipart(line: const_bytestring): bool
%{
if ( line.length() < 4 + end_of_multipart_.length() )
return false;
int len = end_of_multipart_.length();
// line =?= "--" end_of_multipart_ "--"
return ( line[0] == '-' && line[1] == '-' &&
line[len + 2] == '-' && line[len + 3] == '-' &&
strncmp((const char*) line.begin() + 2,
(const char*) end_of_multipart_.begin(),
len) == 0 );
%}
};
##
## Sample event
##
function scb_store_method_uri_version(conn: BaseConn,
is_orig: bool,
method: const_bytestring,
uri: const_bytestring,
version: const_bytestring): bool
%{
//
// Store the parsed fields
//
bytestring_to_string(method, conn->method);
bytestring_to_string(uri, conn->uri);
bytestring_to_string(version, conn->version);
return true;
%}
refine typeattr HTTP_RequestLine += &let {
process_request: bool = scb_store_method_uri_version($context.connection.http_conn,
$context.flow.is_orig,
method,
uri,
version.vers_str);
};
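The HTTP_Chunk and HTTP_Chunks types in http.pac above describe chunked transfer coding: a hexadecimal length line, that many data bytes and a CRLF, repeated until a zero-length chunk, followed by trailer headers terminated by an empty line (the HTTP_Headers field). The following hand-written C++ sketch walks the same layout for a complete message held in memory, assuming CRLF line endings. decode_chunked() is a hypothetical name and this is not code generated by binpac.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <string>

// Hypothetical decoder for the chunked layout described by HTTP_Chunk /
// HTTP_Chunks: "<hex length>CRLF <data> CRLF" repeated until a zero
// length, then trailer header lines up to an empty line.
// Returns false if the input is malformed or truncated.
bool decode_chunked(const std::string& in, std::string* body) {
    std::size_t pos = 0;
    body->clear();
    for (;;) {
        std::size_t eol = in.find("\r\n", pos);
        if (eol == std::string::npos)
            return false;
        std::string length_line = in.substr(pos, eol - pos);
        char* endp = nullptr;
        // strtol stops at the first non-hex character, so chunk
        // extensions such as ";name=value" are ignored.
        long len = std::strtol(length_line.c_str(), &endp, 16);
        if (endp == length_line.c_str() || len < 0)
            return false;
        pos = eol + 2;
        if (len == 0)
            break;  // last chunk: no data and no trailing CRLF
        if (in.size() - pos < static_cast<std::size_t>(len) + 2)
            return false;
        body->append(in, pos, static_cast<std::size_t>(len));
        pos += static_cast<std::size_t>(len);
        if (in.compare(pos, 2, "\r\n") != 0)
            return false;
        pos += 2;
    }
    // Trailer headers (the HTTP_Headers field): skip to the empty line.
    for (;;) {
        std::size_t eol = in.find("\r\n", pos);
        if (eol == std::string::npos)
            return false;
        if (eol == pos)
            return true;
        pos = eol + 2;
    }
}
```

Note that, as listed, http.pac never assigns delivery_mode_ after %init sets it to UNKNOWN_DELIVERY_MODE, so the CHUNKED branch of HTTP_Body is only reached if you add code that sets it, for example from a Transfer-Encoding header.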
Notice that a sample event has been added at the end of the http.pac file using the refine and &let directives. It only illustrates how to add events that are triggered in real time as the parser resolves header fields on the fly; it is not part of the HTTP parser and can be removed. The event scb_store_method_uri_version() stores the method, URI and version number of an HTTP request on a per-connection basis. The BaseConn class in which these fields are stored is application specific and is therefore defined outside the scope of binpac, as shown next.
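scb_store_method_uri_version() relies on a helper, bytestring_to_string(), whose definition does not appear on this page. Because the BaseConn fields are fixed-size char arrays of HEADERFIELD_MAXSIZE bytes and a bytestring is not NUL-terminated, such a helper must bound the copy and add a terminator. Below is a minimal sketch of what it might do; copy_bytes_to_cstr() is a hypothetical name, and it takes plain pointers so that it compiles without binpac.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper: copy the bytes in [begin, end) into dst as a
// NUL-terminated C string, truncating to at most dstsize - 1 bytes.
// Returns the number of bytes copied, not counting the terminator.
std::size_t copy_bytes_to_cstr(const unsigned char* begin,
                               const unsigned char* end,
                               char* dst, std::size_t dstsize) {
    if (dstsize == 0)
        return 0;  // no room even for the terminator
    std::size_t n = end > begin ? static_cast<std::size_t>(end - begin) : 0;
    if (n > dstsize - 1)
        n = dstsize - 1;  // leave room for the terminator
    std::memcpy(dst, begin, n);
    dst[n] = '\0';
    return n;
}
```

Inside the %{ ... %} body, a call might look like copy_bytes_to_cstr(method.begin(), method.end(), conn->method, sizeof(conn->method)), assuming binpac::uint8 is unsigned char and const_bytestring exposes end() alongside the begin() used in is_end_of_multipart().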
Compiling the above http.pac specification with binpac produces two files, http_pac.cc and http_pac.h, which implement the HTTP parser in C++. We can then compile them and link them with a separate main program to run the parser in standalone mode.
An example of a main program that parses a local file (http_conversation.txt) containing an HTTP conversation is shown next. The program takes an optional integer argument giving the byte offset at which the input is cut into two pieces, to emulate stop-and-go parsing of a continuous stream of incoming bytes. Per the pac specification above, a connection object must be passed when allocating a connection parser. This connection class, which we named BaseConn in our example, can be defined arbitrarily; here it stores the per-connection fields filled in by scb_store_method_uri_version().
http-baseconn.h
#ifndef HTTP_BASECONN_H
#define HTTP_BASECONN_H
class BaseConn {
public:
#define HEADERFIELD_MAXSIZE 256
char method[HEADERFIELD_MAXSIZE];
char uri[HEADERFIELD_MAXSIZE];
char version[HEADERFIELD_MAXSIZE];
BaseConn();
~BaseConn();
protected:
};
#endif
http-baseconn.cc
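#include "http-baseconn.h"

BaseConn::BaseConn()
{
    // Start with empty strings so that unparsed fields print as "".
    method[0] = '\0';
    uri[0] = '\0';
    version[0] = '\0';
}

BaseConn::~BaseConn()
{
}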
http-main.cc
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include "http_pac.h"

#define FILEPATH "./http_conversation.txt"
#define DATALEN 4096

/*
 * Optionally pass an integer parameter
 * to specify where to cut the message
 * to emulate stop-and-go parsing
 */
int main(int argc, char* argv[])
{
    binpac::uint8 data[DATALEN];
    BaseConn *conn = new BaseConn();
    binpac::HTTP::HTTP_Conn *parser;
    int cut = 0;
    int len;
    int fd;

    if(argc == 2)
        cut = atoi(argv[1]);

    if((fd = open(FILEPATH, O_RDONLY)) < 0)
        goto end;

    // Read at most DATALEN bytes; len is the number actually read.
    if((len = read(fd, data, DATALEN)) < 0)
        goto close;

    // Clamp the cut offset to [0, len].
    if(cut < 0)
        cut = 0;
    if(cut > len)
        cut = len;

    parser = new binpac::HTTP::HTTP_Conn(conn);

    // Feed the originator (client) flow in two pieces,
    // [0, cut) and [cut, len), skipping empty pieces.
    if(cut > 0)
        parser->NewData(true, data, data + cut);
    if(len > cut)
        parser->NewData(true, data + cut, data + len);

    printf("\nMessage parsed:\n\n");
    printf(" method: %s\n", conn->method);
    printf(" uri: %s\n", conn->uri);
    printf(" version: %s\n\n", conn->version);

    delete parser;

close:
    close(fd);
end:
    delete conn;
    return(0);
}
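A single cut exercises only one split point. To stress the parser's buffering further, the input can be fed in many small pieces, for example one byte at a time. The helper below is a hypothetical, self-contained C++11 sketch that splits [data, data + len) into consecutive pieces of at most `piece` bytes and passes each half-open range to a callback; in http-main.cc the callback would wrap parser->NewData().

```cpp
#include <algorithm>
#include <cassert>
#include <string>

// Hypothetical helper: call feed(begin, end) for consecutive pieces of
// at most `piece` bytes that together cover [data, data + len).
template <typename Feed>
void feed_in_pieces(const unsigned char* data, int len, int piece, Feed feed) {
    assert(piece > 0);
    for (int off = 0; off < len; off += piece) {
        int n = std::min(piece, len - off);
        feed(data + off, data + off + n);
    }
}
```

In http-main.cc this might be used as feed_in_pieces(data, len, 1, [&](const binpac::uint8* b, const binpac::uint8* e) { parser->NewData(true, b, e); }), assuming binpac::uint8 is unsigned char and the program is compiled as C++11 or later.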