We know agencies (e.g. NSA) run a number of both rely and exit nodes of the TOR network and is not very difficult to understand the reason :)
Number of running total nodes nodes(R) and exit nodes(E) is freely available[1].
Default number of hops (h) randomically chosen inside TOR network is equal to 3: 2 (h-1) relay nodes and 1 exit node.
Unique paths ($N$) between a Tor client (Alice) and a chosen destination(Bob) is given by the number of combination[2] of $h$ distinct elements of set $R$ multiplying the number of exit nodes, minus those chains where is present the same exit nodes more than once:
$N = \dbinom{R}{h-1}E - \dbinom{R-1}{h-2}E$
in this case $h \ll R$ so we could apply the approximation:
$for\ n \ll k: \dbinom{n}{k} \approx \frac{n^k}{k!} $
thus:
$N = \dbinom{R}{h-1}E - \dbinom{R-1}{h-2}E \approx \frac{R^{h-1}}{(h-1)!}E - o(R^{h-1})$
$N \approx \frac{R^{h-1}}{(h-1)!}E$
At the time of writing these values are:
$R = 4202$
$E = 934 $
assume:
$h = 3 $
We can easily find the N value
$N = \dbinom{4202}{2}934 - \dbinom{4201}{1}934 \approx \frac{4202^2}{2!} \cdot 934 \approx 8 \cdot 10^{9} $
We know an "ideal TOR network" is a strong tool for anonymization, and even if some of the nodes we’re using are compromised we could feel quite safe. Obviously, in case the entire path is compromised every recipient: sender, destination and payload of the session is no longer secret. Let’s calculate the probability a connection over TOR is fully eavesdropped by controlling due a full compromised chain:
By definition probability($p$) of an event is given by:
$p = \frac{Number\ of\ Favorable\ Outcomes}{Total\ Number\ of\ Possible\ Outcomes} $
in this case $p$ is given by the ratio between the number of possible unsafe chains($N_c$) and the total number of possible chains($N$).
where $N_c$ and $E_c$ are respectly the numbers of compromised relay and exit nodes.
let's call ratios $\frac{Rc}{R}$ and $\frac{Ec}{E}$ respectly $Q_R$ and $Q_E$ and here it is our simple and beautiful equation:
$\boldsymbol{p = Q_R^{h-1}\cdot Q_E}$
It is worth calculating the "best strategy" in deploying malicios nodes in order to maximize $p$:
$p \approx \frac{R_c^{h-1}}{R^{h-1}}\cdot\frac{E_c}{E} $
the number of "evil" nodes to be deployed is $R_c$ both relay and exit nodes: $E_c$ is a subset of $R_c$.
$E_c = R_c \cdot z$
where $z$ is the ratio between relaying nodes and exit nodes.
$p \approx \frac{R_c^{h-1}}{R^{h-1}}\cdot\frac{R_c \cdot z}{E} ; 0 \le z \le 1$
$p(z) \approx \frac{R_c^{h} \cdot z}{R^{h-1} \cdot E}$
it's trivial to find maximum value of $p(z)$ is at $z=1$, in other words when all the to-be-deployed nodes are setted as exit nodes.
Once achieved this piece of information let's calculate how many new nodes must be deployed in a TOR network in order to gain a given probability of fully eavesdrop someone.
the previous formulas could be rewritten in function of "evil nodes"($R_c$) and due to the previous result these nodes will be all exit nodes ($z=1; R_c=E_c$):
$p(R_c) \approx \frac{R_c^h}{(R+R_c)^{h-1} \cdot (E+R_c)}$
and rewrite this in term of normalized $R_c$ :
$X = \frac{R_c}{R} ; R_c = R \cdot X$
$p(X) \approx \frac{(R \cdot X)^h}{(R+R \cdot X)^{h-1} \cdot (E+R \cdot X)}$
here is a plot of the formula above:
x-axis represents how many "evil" exit nodes has been deployed versus the "good" nodes;
for instance: if it has been deployed 2000 "evil" nodes and there are 1000 "good" nodes already deployed the corresponding value on x-axis is 2 (2000/1000).
The graph shows clearly must be deployed a large number of evil exit nodes in order to have a reasonable probability to eavesdrop someone.
In order to have a 50% of chance to eavesdrop someone must be deployed a number a of evil nodes greater about four times the number of "good" exit nodes or taking control of 4/5 of the total exit nodes...
Conclusions
TOR is a great tool for anonymity, and its power comes from its distributed nature.
This short paper is merely a mathematical analysis of an "ideal TOR Network", a simplified network model useful to show some important properties of it.
Probability of being eavesdropped on a TOR network is dependent of number of compromised nodes:
$\boldsymbol{p = Q_R^{h-1}\cdot Q_E}$
next, the "optimous" strategy in deploying "evil" nodes in order to maximize $p$ is to configure them all as exit nodes
$p \approx \frac{R_c^{h}}{R^{h-1} \cdot E}$
this result is as trivial as interesting:
in order to maximize efforts to find "evil" nodes, looking for them among exit nodes would be a good strategy too.
The last consideration should convince us deploying new "evil" nodes in order to eavesdrop someone isn't a good strategy at all.
Let's consider the "at the time of writing" number of exit nodes in the real TOR network:
$E = 934$
a potential attacker, in order to gain 50% probability to eavesdrop someone, needs to deploy about 4000 own nodes, or compromise about 750 "good" nodes.
If I was the attacker I would keep a low profile and invest resources and time in a good strategy aimed to compromise "good" nodes and not, actually, in a baroque, expensive and noisy nodes mass deploying.
[1] e.g. http://torstatus.blutmagie.de/