Synchronizing Policy Scripts

From BroWiki

Introduction

To reduce the load on an individual Bro, analysis can be split across multiple peers that form a distributed cluster. In such a setup, incoming traffic is usually split into slices, and each slice is forwarded to one particular peer. Every peer performs its analysis on a different slice of the whole traffic. Splitting the traffic entails a loss of state information, because no single peer sees all of it. To mitigate this loss, Bro provides mechanisms to exchange state operations among the peers.

In an ideal setup, the cluster appears transparently as one single Bro and yields the same analysis results. This goal is not easy to reach, since the setup involves many tradeoffs.

Synchronizing

There are two different ways of synchronizing:

  • Script level: &synchronized attribute.
  • Event level: exchange of events. The events are exchanged after a peer connection has been established.

In general, it is recommended to synchronize as much as possible at the script level. Event-level synchronization is more expensive and should be considered only if script-level synchronization is not sufficient.
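
As a minimal sketch of script-level synchronization, the declaration below marks a set as synchronized so that changes made on one peer are propagated to the others. The set name and the handler body are hypothetical and chosen for illustration only.

```bro
# Hypothetical set of originators seen anywhere in the cluster.
# &synchronized asks Bro to propagate every change to the peers.
global seen_origs: set[addr] &synchronized;

event connection_attempt(c: connection)
	{
	# The insert is applied locally and sent to the connected peers.
	# Inserting an address that is already present has no effect.
	add seen_origs[c$id$orig_h];
	}
```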


Common pitfalls

counters and sets 
Some scripts use counters that are coupled to sets (e.g. scan.bro). As soon as an element expires from the set, an expire function decrements the counter. If these two operations are not applied together, inconsistencies are the consequence. Imagine now that we synchronize both variables. First, the expiration of the set element is propagated to the remote side, where the expire function decrements the counter as well. Second, the decrement of the counter in the local expire function is propagated as another update, although the remote counter has already been decremented as a result of the propagated expiration. This inconsistency may lead to a count underflow or even worse problems.
Countermeasures:
  • We can take advantage of a property of sets to mitigate this problem: inserting an element that already exists has no effect, and the same holds for removing an element that is already gone. One solution is therefore to omit the counter and use the cardinality of the set instead (e.g. |my_set|). Multiple incoming expiration operations for the same element then cannot decrement the count twice.
  • If there is no way to avoid touching a semantically coupled set or table in the expire function, you can still suppress state updates. Consider the following example:
global foo: table[addr] &synchronized &create_expire = 10 mins &expire_func=foo_exp;
global bar: table[addr] &synchronized &create_expire = 30 mins;

function foo_exp(...) : interval
	{
	suspend_state_updates();
	delete bar[idx];
	resume_state_updates();

	return 0 secs;
	}
When an element expires from the table foo, foo_exp deletes the corresponding entry in bar. Since bar is synchronized, this delete operation would normally be propagated to the participating peers. The peers, however, expire the element from foo themselves and run the same delete, so a lot of superfluous delete operations would be transferred. By enclosing the statement in suspend_state_updates() and resume_state_updates(), you suppress any state updates within this block.
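
The first countermeasure above, deriving the count from the set instead of keeping a separate counter, could look roughly like the following sketch. The set name, the helper function and the expiration interval are hypothetical.

```bro
# Hypothetical set of tracked hosts; no separate counter is kept.
global tracked_hosts: set[addr] &synchronized &create_expire = 10 mins;

function num_tracked_hosts(): count
	{
	# The count is derived from the set itself, so a duplicate
	# expiration or delete arriving from a peer cannot skew it.
	return |tracked_hosts|;
	}
```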
splitting multi-indexed tables
Several tables (and likewise sets) have more than one index. For example, the set distinct_peers: set[addr, addr] of the scan analyzer is indexed by originator and responder address. To count the number of distinct responders per originator, the obvious approach was to introduce an additional table[addr], e.g. num_distinct_peers: table[addr] of count &default=0 &create_expire = 5 mins. However, this approach adds unnecessary complexity. First, an additional table used only for counting costs extra memory and CPU time. Second, it can cause difficult synchronization issues (see above).
Countermeasures:
  • We suggest splitting such multi-indexed tables and sets into a multi-dimensional table. distinct_peers: set[addr, addr] becomes distinct_peers: table[addr] of set[addr]. The separate counter num_distinct_peers: table[addr] of count is then no longer needed, because the count is available as |distinct_peers[orig]|.
  • Note that after having split up the set, the expire semantics have to be adapted. The &create_expire = 5 mins ...
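
A minimal sketch of the split form follows. The helper function is hypothetical, and expiration attributes are deliberately left out, since the adapted expire semantics are not spelled out here.

```bro
# Before the split (for comparison):
#   global distinct_peers: set[addr, addr];
#   global num_distinct_peers: table[addr] of count &default=0;

# After the split: one table that yields the set of responders
# for each originator.
global distinct_peers: table[addr] of set[addr];

# Hypothetical helper that replaces the lookup in the counter table.
function num_distinct_responders(orig: addr): count
	{
	if ( orig !in distinct_peers )
		return 0;

	return |distinct_peers[orig]|;
	}
```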
changing local sets 
If a set or table is part of a multi-dimensional data type (e.g. global foo: table[addr] of set[addr] &synchronized), the inner data type (in this case the set[addr]) usually has local scope. Each peer can then fill its local set[addr] with entries, independently of the others. As soon as the global synchronized table propagates changes for a particular index, a unique identifier (#hostname#pid#counter) for the local set is used to recognize it on the remote machine. Yet race conditions can lead to inconsistencies when the peers hold local sets with different content that refer to the same table index.
Countermeasures:
  • We use the attribute &mergeable to avoid the race conditions. (todo)
  • Further, different ids for the same set are treated as an alias. (todo)
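
The following sketch shows where the &mergeable attribute could be placed, assuming it is attached to the inner set when that set is created. The function and variable names are hypothetical, and the exact merge behaviour is not described in this article.

```bro
global foo: table[addr] of set[addr] &synchronized;

# Hypothetical helper that records val under the index idx.
function foo_add(idx: addr, val: addr)
	{
	if ( idx !in foo )
		{
		# The inner set is created locally. &mergeable marks it
		# so that sets created on different peers for the same
		# index can be merged rather than replacing each other.
		local empty_set: set[addr] &mergeable;
		foo[idx] = empty_set;
		}

	add foo[idx][val];
	}
```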
threshold checks 
Many scripts trigger alerts or notices when a threshold is passed. Before sets could be merged, crossing a threshold could be detected by a simple check for equality, for example |distinct_ports[orig]| == possible_port_scan_thresh. Since we avoid race conditions in local sets with the &mergeable attribute, the cardinality of a set may now jump by more than one instead of increasing continuously in steps of one, so an equality check can miss the threshold.
Countermeasures:
  • Hence, the == operator becomes >= in such cases.
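
A threshold check along these lines is sketched below. Because >= stays true on every later insert, the sketch also remembers which originators have already been reported. That bookkeeping set, the helper function and the threshold value are assumptions made for illustration and are not taken from this article.

```bro
# Illustrative value only.
const possible_port_scan_thresh = 20 &redef;

global distinct_ports: table[addr] of set[port] &synchronized;

# Hypothetical helper state: originators that were already reported.
global reported_scanners: set[addr];

function port_scan_threshold_crossed(orig: addr): bool
	{
	if ( orig !in distinct_ports || orig in reported_scanners )
		return F;

	# Use >= rather than ==, since a merge can raise the
	# cardinality by more than one at a time.
	if ( |distinct_ports[orig]| >= possible_port_scan_thresh )
		{
		add reported_scanners[orig];
		return T;
		}

	return F;
	}
```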

Configuration examples

The default set of configuration files doesn't ship with support for clustering, so we encourage you to create your own overlay configuration files using the @prefix mechanism.
