Synchronizing Policy Scripts
From BroWiki
Introduction
To reduce the load on an individual Bro, analysis can be split across multiple peers that form a distributed cluster. In such a setup, incoming traffic is usually split into slices, and each slice is forwarded to one particular peer. Every peer performs its analysis on a different slice of the whole traffic. Splitting the traffic entails a loss of state information, because no single peer sees everything. To mitigate this loss, Bro provides mechanisms to exchange state operations among the peers.
In an ideal setup, the cluster appears transparently as one single Bro and yields the same analysis results. However, this goal is not easy to achieve, since the setup involves many tradeoffs.
Synchronizing
There are two different ways of synchronizing:
- Script level: the &synchronized attribute.
- Event level: exchange of events. The events are exchanged after a peer connection has been established.
In general, it is recommended to synchronize as much as possible at the script level. Event-level synchronization should be considered only if this fails, because it is more expensive.
Common pitfalls
- counters and sets
- Some scripts use counters that are tied to sets (e.g. scan.bro). As soon as an element expires from a set, the counter is decremented in an expire function. If these two operations do not occur together, inconsistencies are the consequence. Imagine now that we synchronize the two variables. First, the expiration of the set element is propagated to the remote side (and decrements the counter in the expire function there as well!). Second, the counter decrement in the expire function is propagated as another update, although the counter has already been decremented by the propagated expiration. This inconsistency may lead to a count underflow or even worse problems.
- Countermeasures:
- We can take advantage of a peculiarity of sets to mitigate this problem. Inserting an already existing element into a set has no effect. One solution is to omit the counter and use the cardinality of the set instead (e.g. |my_set|). Now, multiple incoming expiration operations for the same element don't decrement the count twice.
- If there is no way to drop the semantically coupled set or table from the expire function, you can still suppress state updates. Consider the following example:
global foo: table[addr] &synchronized &create_expire = 10 mins &expire_func=foo_exp;
global bar: table[addr] &synchronized &create_expire = 30 mins;
function foo_exp(...) : interval
    {
    # Do not propagate the following delete to the peers.
    suspend_state_updates();
    delete bar[idx];
    resume_state_updates();
    # Expire the entry immediately.
    return 0 secs;
    }
- When an element expires from the table foo, the expire function deletes the corresponding entry in bar. Since bar is synchronized, this delete operation would be propagated to the participating peers, which expire the entry themselves as well. In such a case, a lot of superfluous delete operations are transferred. By wrapping the statement in suspend_state_updates() and resume_state_updates(), you suppress any state updates in this block.
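The double decrement described above can be reproduced outside Bro. The following is a minimal Python model, not Bro script: the class Peer, its method names, and the way remote operations are replayed are assumptions made purely for illustration. It contrasts a separately maintained counter with the cardinality of the set when the same expiration effectively arrives twice.

```python
class Peer:
    """Toy model of one peer's state: a set plus a counter kept alongside it."""

    def __init__(self, elements):
        self.members = set(elements)
        self.counter = len(self.members)

    def expire(self, element):
        """Expire an element; the 'expire function' also decrements the counter."""
        self.members.discard(element)  # removing an absent element is a no-op
        self.counter -= 1              # the counter has no such protection

    def apply_remote_counter_update(self):
        """Replay the decrement that the remote expire function propagated."""
        self.counter -= 1


peer = Peer({"10.0.0.1"})

# 1) The remote expiration arrives and runs the local expire function.
peer.expire("10.0.0.1")
# 2) The remote counter decrement arrives as a separate state operation.
peer.apply_remote_counter_update()

print(peer.counter)       # -1: decremented twice for a single expiration
print(len(peer.members))  # 0: the cardinality is still correct
```

Python integers can go negative, so the model shows -1. A Bro count is unsigned, which is why the same sequence shows up there as an underflow.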
- splitting multi-indexed tables
- Several tables (and therefore sets) have more than one index. For example, the set distinct_peers: set[addr, addr] of the scan analyzer is indexed by originator and responder address. To count the number of distinct responders, the natural approach was to introduce a new table[addr], e.g. num_distinct_peers: table[addr] of count &default=0 &create_expire = 5 mins. However, this cumbersome approach adds unnecessary complexity. First, an additional table used only for counting needs extra memory and CPU; second, difficult synchronization issues can emerge (see above).
- Countermeasures:
- We suggest splitting such multi-indexed tables and sets into a multi-dimensional table. num_distinct_peers: table[addr] of count becomes distinct_peers: table[addr] of set[addr].
- Note that after the set has been split up, the expire semantics have to be adapted. The &create_expire = 5 mins...
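As an illustration of the split, here is a small Python model (not Bro script; the function names record_connection and num_distinct_peers_of and the sample addresses are hypothetical). A dict of sets stands in for table[addr] of set[addr], and the number of distinct responders is derived from the size of the inner set, so no separate counter table is needed.

```python
from collections import defaultdict

# Stand-in for distinct_peers: table[addr] of set[addr]
distinct_peers = defaultdict(set)


def record_connection(orig, resp):
    """Remember that originator `orig` contacted responder `resp`."""
    distinct_peers[orig].add(resp)  # adding a known responder changes nothing


def num_distinct_peers_of(orig):
    """Number of distinct responders, derived from the inner set."""
    return len(distinct_peers.get(orig, ()))


record_connection("10.0.0.1", "192.168.1.10")
record_connection("10.0.0.1", "192.168.1.11")
record_connection("10.0.0.1", "192.168.1.10")  # duplicate, ignored by the set

print(num_distinct_peers_of("10.0.0.1"))  # 2
print(num_distinct_peers_of("10.0.0.9"))  # 0, unknown originator
```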
- changing local sets
- If a set or table belongs to a multi-dimensional data type (e.g. global foo: table[addr] of set[addr] &synchronized), the inner data type (in this case the set[addr]) usually has local scope. Each peer can then fill its local set[addr] with entries, independently of the others. As soon as the global synchronized table propagates changes for a particular index, a unique identifier (#hostname#pid#counter) for the local set is used to recognize it on the remote machine. Yet, race conditions can lead to inconsistencies when the peers have different local set contents referring to the same table index.
- Countermeasures:
- We use the attribute &mergeable to avoid the race conditions. (todo)
- Further, different ids for the same set are treated as an alias. (todo)
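The race can be pictured with a short Python model. This is only a sketch of the general idea of merging, under the assumption that merging means taking the union of both sets; it does not describe how Bro implements &mergeable, and the function names and sample data are hypothetical. Two peers have filled their local sets for the same index differently: replacing one set with the other loses entries, whereas the union keeps them all.

```python
def assign_replace(local, incoming):
    """Plain assignment: the incoming set replaces the local one."""
    return set(incoming)


def assign_merge(local, incoming):
    """Merging assignment: the result is the union of both sets."""
    return local | incoming


# Local sets of two peers for the same table index.
peer_a = {"192.168.1.10", "192.168.1.11"}
peer_b = {"192.168.1.11", "192.168.1.12"}

replaced = assign_replace(peer_a, peer_b)
merged = assign_merge(peer_a, peer_b)

print(sorted(replaced))  # ['192.168.1.11', '192.168.1.12'], 192.168.1.10 is lost
print(sorted(merged))    # ['192.168.1.10', '192.168.1.11', '192.168.1.12']
```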
- threshold checks
- Many scripts trigger alerts or notices when a threshold is passed. Before sets could be merged, crossing a threshold could be detected by a simple equality check, for example |distinct_ports[orig]| == possible_port_scan_thresh. Since we avoid race conditions in local sets with the &mergeable attribute, the cardinality of a set may jump rather than continuously increment by one.
- Countermeasures:
- Hence, the == operator becomes >= in such cases.
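To see why the equality check can miss a scan, consider the following Python model (not Bro script; the threshold value and the port numbers are made up, and the merge is assumed to behave like a set union). A single merge takes the set from three elements to seven, so its size is never exactly equal to the threshold.

```python
POSSIBLE_PORT_SCAN_THRESH = 5  # hypothetical value for this sketch

local_ports = {21, 22, 23}         # ports seen by this peer, cardinality 3
remote_ports = {25, 80, 110, 443}  # ports seen by another peer

# One merge adds four elements at once: the cardinality jumps from 3 to 7.
merged_ports = local_ports | remote_ports

equality_fires = len(merged_ports) == POSSIBLE_PORT_SCAN_THRESH
at_least_fires = len(merged_ports) >= POSSIBLE_PORT_SCAN_THRESH

print(len(merged_ports))  # 7
print(equality_fires)     # False: the size skipped over 5
print(at_least_fires)     # True
```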
Configuration examples
The default set of configuration files doesn't ship with support for clustering, so we encourage you to create your own overlay configuration files using the @prefix mechanism.