Computing eavesdropping probability on an "Ideal" TOR network

First of all, consider this TOR network as an "ideal" one (http://en.wikipedia.org/wiki/Ideal_gas)

We know agencies (e.g. NSA) run a number of both rely and exit nodes of the TOR network and is not very difficult to understand the reason :)
Number of running total nodes nodes(R) and exit nodes(E) is freely available[1].
Default number of hops (h) randomically chosen  inside TOR network is equal to 3: 2 (h-1) relay nodes and 1 exit node.

Unique paths ($N$) between a Tor client (Alice) and a chosen destination(Bob) is given by the number of combination[2] of $h$ distinct elements of set $R$ multiplying the number of exit nodes, minus those chains where is present the same exit nodes more than once:

$N = \dbinom{R}{h-1}E - \dbinom{R-1}{h-2}E$

in this case $h \ll R$ so we could apply the approximation: 

$for\ n \ll k: \dbinom{n}{k} \approx \frac{n^k}{k!} $


$N = \dbinom{R}{h-1}E - \dbinom{R-1}{h-2}E \approx \frac{R^{h-1}}{(h-1)!}E - o(R^{h-1})$

$N \approx \frac{R^{h-1}}{(h-1)!}E$

At the time of writing these values are:

$R = 4202$

$E = 934 $


$h = 3 $

We can easily find the N value

$N = \dbinom{4202}{2}934 - \dbinom{4201}{1}934 \approx \frac{4202^2}{2!} \cdot 934 \approx 8 \cdot 10^{9} $

We know an "ideal TOR network" is a strong tool for anonymization, and even if some of the nodes we’re using are compromised we could feel quite safe. Obviously, in case the entire path is compromised every recipient: sender, destination and payload of the session is no longer secret. Let’s calculate the probability a connection over TOR is fully eavesdropped by controlling due a full compromised chain:

By definition probability($p$) of an event is given by:

$p = \frac{Number\ of\ Favorable\ Outcomes}{Total\ Number\ of\ Possible\ Outcomes} $

in this case $p$ is given by the ratio between the number of possible unsafe chains($N_c$) and the total number of possible chains($N$).

$p = \frac{N_c}{N} \approx \frac{R_c^{h-1}}{R^{h-1}}\cdot\frac{E_c}{E} = \left (\frac{R_c}{R} \right )^{h-1}\cdot\left (\frac{E_c}{E} \right )$

where $N_c$ and $E_c$ are respectly the numbers of compromised relay and exit nodes.

let's call ratios $\frac{Rc}{R}$ and $\frac{Ec}{E}$ respectly $Q_R$ and $Q_E$ and here it is our simple and beautiful equation:

$\boldsymbol{p = Q_R^{h-1}\cdot Q_E}$

It is worth calculating the "best strategy" in deploying malicios nodes in order to maximize $p$:
$p \approx \frac{R_c^{h-1}}{R^{h-1}}\cdot\frac{E_c}{E} $

the number of "evil" nodes to be deployed is $R_c$ both relay and exit nodes: $E_c$ is a subset of $R_c$.

$E_c = R_c \cdot z$

where $z$ is the ratio between relaying nodes and exit nodes.

$p \approx \frac{R_c^{h-1}}{R^{h-1}}\cdot\frac{R_c \cdot z}{E} ; 0 \le z \le 1$
$p(z) \approx \frac{R_c^{h} \cdot z}{R^{h-1} \cdot E}$

it's trivial to find maximum value of $p(z)$ is at $z=1$, in other words when all the to-be-deployed nodes are setted as exit nodes.
Once achieved this piece of information let's calculate how many new nodes must be deployed in a TOR network in order to gain a given probability of fully eavesdrop someone.

the previous formulas could be rewritten in function of "evil nodes"($R_c$) and due to the previous result these nodes will be all exit nodes ($z=1; R_c=E_c$):

$p(R_c) \approx \frac{R_c^h}{(R+R_c)^{h-1} \cdot (E+R_c)}$

and rewrite this in term of normalized $R_c$ :

$X = \frac{R_c}{R} ; R_c = R \cdot X$
$p(X) \approx \frac{(R \cdot X)^h}{(R+R \cdot X)^{h-1} \cdot (E+R \cdot X)}$

here is a plot of the formula above:

x-axis represents how many "evil" exit nodes has been deployed versus the "good" nodes;
for instance: if it has been deployed 2000 "evil" nodes and there are 1000 "good" nodes already deployed the corresponding value on x-axis  is 2 (2000/1000).

The graph shows clearly must be deployed a large number of evil exit nodes in order to have a reasonable probability to eavesdrop someone.
In order to have a 50% of chance to eavesdrop someone must be deployed a number a of evil nodes greater about four times the number of "good" exit nodes or taking control of 4/5 of the total exit nodes...


TOR is a great tool for anonymity, and its power comes from its distributed nature.
This short paper is merely a mathematical analysis of an "ideal TOR Network", a simplified network model useful to show some important properties of it.
Probability of being eavesdropped on a TOR network is dependent of number of compromised nodes:

$\boldsymbol{p = Q_R^{h-1}\cdot Q_E}$

next, the "optimous" strategy in deploying "evil" nodes in order to maximize $p$ is to configure them all as exit nodes

$p \approx \frac{R_c^{h}}{R^{h-1} \cdot E}$
this result is as trivial as interesting:
in order to maximize efforts to find "evil" nodes, looking for them among exit nodes would be a good strategy too.

The last consideration should convince us deploying new "evil" nodes in order to eavesdrop someone isn't a good strategy at all.
Let's consider the "at the time of writing" number of exit nodes in the real TOR network:
$E = 934$

a potential attacker, in order to gain 50% probability to eavesdrop someone, needs to deploy about 4000 own nodes, or compromise about 750 "good" nodes.
If I was the attacker I would keep a low profile and invest resources and time in a good strategy aimed to compromise "good" nodes and not, actually, in a baroque, expensive and noisy nodes mass deploying.

[1] e.g. http://torstatus.blutmagie.de/

No comments: