Evaluation

We evaluate the effectiveness of TIE for achieving our goal of minimizing sensitivity to equipment failures on the Abilene network and a tier-1 ISP backbone. We obtain the network topology

and the egress sets $\{E(p)\}$ as described in the Appendix. For this problem, we set the IGP link weights to the geographic distance between the PoPs to approximate the propagation delay. We optimize TIE for two sets of topology changes $\Delta G$ (single link failures and single node failures) and three different delay thresholds

(

, and

We ran the simulation and the optimization phases on different machines because the raw measurement data could only be stored on one machine, and the CPLEX license resides on another. The simulation phase ran on a 900MHz Ultrasparc-III Copper processor of a Sun Fire 15000. This phase consumed

MB of RAM and took

and

seconds to build the constraints for all pairs

for the Abilene and ISP networks, respectively. The optimization phase ran on a 196 MHz MIPS R10000 processor on an SGI Challenge. This phase consumed just under

MB of RAM and took

seconds and

minutes to run for the Abilene and ISP networks, respectively. We expect that the optimization phase would complete much faster if we invoke the CPLEX library directly from a C program rather than the AMPL interpreter.

For the Abilene network, $\alpha$ was equal to

for 93% of the

tuples and had only four distinct values ( $\alpha \in [1,4]$ ); $\beta$ was zero for $90\%$ of the

tuples and had only three distinct values ( $\beta \in \{0, 1, 3251\}$ ). The ISP network has a much larger number of destination prefixes and distinct egress sets, which resulted in a broader range of values for the parameters ( $\alpha \in [1,19]$ and $\beta \in \{0, 1, 3411, 4960, 5185, 5009\}$ ). However, the vast majority of $\alpha$ values ( $88\%$ ) were equal to one, and $69\%$ of $\beta$ values were zero. The small number of distinct values for the parameters, and the large number of $\alpha(i,p,e)=1$ and $\beta(i,p,e)=0$ , help reduce the overhead of configuring and storing the parameters, as discussed in more detail in Section VI.

**Figure 5:** Comparison of egress-selection schemes on the Abilene network under single-node failures with TIE optimized for single-link failures and .
$\begin{figure}\begin{center} \begin{tabular}{cc} \epsfig{figure=./abilene.popf.... ... (CCDF) & (b) Routing sensitivity (CCDF) \end{tabular} \end{center}\end{figure}$

After generating the values of $\alpha(i,p,e)$ and $\beta(i,p,e)$ for each one of these scenarios, we simulate the behavior of each network with this configuration. For comparison, we also simulate the behavior of the network using hot-potato routing (by setting $\alpha(i,p,e)=1$ and $\beta(i,p,e)=0$ for all

), and the fixed ranking egress selection (by setting $\alpha(i,p,e) = 0$ for all

, and $\beta(i,p,e) = d(G,i,b(G,i,p))$ ). We simulate the behavior of these egress-selection policies under the set of all single-link failures and the set of all single-node failures. For conciseness, we only present the results for single-node failures, the results for the other instances lead to the same conclusions. We compare the three mechanisms using two metrics:

Figure 5(a) presents the complementary cumulative distribution function (CCDF) of the delay ratio for the Abilene network. A delay ratio equal to one means that the delay after the failure is the same as the delay in the original network. Many of the node failures do not affect the path between an ingress node and a best egress node for a prefix. Therefore, we omit all values that had a delay ratio of one. Given that the link weights are set according to geographic distance, the delay ratio achieved by hot-potato routing represents the smallest feasible delay ratio. Fixed ranking represents the delay to reach the old egress point after the failure. In this plot, we present the results for TIE optimized for single-link failures and

, and evaluate the schemes against single-node failures. The results of TIE optimized for single-node failures were very similar (in fact most of the values of $\alpha$ and $\beta$ were the same).

**Figure 6:** Comparison of egress-selection schemes on the ISP network under single-node failures for TIE optimized for single-link failures and .
$\begin{figure}\begin{center} \begin{tabular}{cc} \epsfig{figure=./cbb.popf.l3.d... ... (CCDF) & (b) Routing sensitivity (CCDF) \end{tabular} \end{center}\end{figure}$

Despite being optimized for a different set of topology changes, TIE still behaves according to the original goal. TIE exceeds the delay threshold of

for only $20\%$ of the $(i,p,\delta)$ , and hot-potato routing also exceeds the threshold in each of these cases. Fixing the ranking of egress points leads to delays that are higher than the delay achieved by TIE in the majority of instances. Whenever the fixed-ranking scheme lies below the threshold of

, TIE is below it as well. When the fixed-ranking scheme exceeds the threshold, TIE shifts to an egress point that is at or below the threshold. This is the reason why the TIE curve lies below the fixed-ranking curve for delay ratios under

Below the threshold of

TIE, has higher delay than hot-potato routing in exchange for lower sensitivity values as shown in Figure 5(b). This graph plots the CCDF of routing sensitivity for all $(i,\delta)$ pairs. Fixing the ranking of egress points has the lowest sensitivity. In fact, the fixed-ranking scheme has a non-zero sensitivity only when the best egress point fails, forcing even this scheme to change to the second-ranked egress point (i.e., the one that was second-closest at the initial topology). The TIE curve follows the fixed ranking for most points. TIE only experiences egress changes when they are unavoidable. The gap between the hot-potato and the TIE curve--around $15\%$ of the $(i,\delta)$ pairs--represents the scenarios for which egress-selection disruptions could be avoided without violating the delay threshold.

Although we observe similar behavior in the results for the large ISP network (presented in Figures 6(a) and 6(b)), the gap between the curves is not as large as for the Abilene network. In this case, we optimize TIE for single-link failures with a delay threshold

. The ISP network has many more choices of egress points per prefixes than the Abilene network. Therefore, the delay to reach the closest egress point in the original topology is likely to be very small, and setting the threshold to three times this delay still gives reasonably short delays. This network also has more path diversity than the Abilene network. In a more diverse graph, it is more likely that there is still a low-delay path to the initial egress point, even after the failure. Contrasting the delay ratio and routing sensitivity of the two networks illustrates that there is not a single policy that fits all networks. Compared to the Abilene network, the ISP network could safely put more emphasis on setting the $\beta$ values, because its rich connectivity makes it unlikely that equipment failures would lead to significant changes in the IGP distance between a pair of routers. The TIE mechanism is flexible enough to accommodate both of these networks.

In this section, we assume that the egress set for each destination prefix is stable when determining the values of $\alpha$ and $\beta$ . Our evaluation shows that even when the an egress node is removed from the egress set, TIE behaves as expected. We can extend the formulation of this problem to find solutions that are robust to egress-set changes. For instance, we can configure TIE to react slowly to the announcement of new routes (i.e., additions to the egress set) by setting the values of $\alpha(\cdot,p,e)$ and $\beta(\cdot,p,e)$ to be very high for all $e \not \in E(p)$ . We can also extend our notion of graph transformations $\delta$ to include changes to the egress sets.