Reposted from: http://netlab.caltech.edu/projects/ns2tcplinux/ns2linux/tutorial/index.html
A mini-tutorial for TCP-Linux in NS-2
(Part of the NS-2 enhancement project)
David X. Wei
Netlab @ Caltech
Initial Draft: May 2006;
Revision 1 for parameter tunings: Sep 2007.
This tutorial is for people who want to use TCP-Linux in NS-2 simulations. For information on how to install TCP-Linux into NS-2, see the TCP-Linux website. For general tutorials on NS-2, see the NS-2 website.
Table of Contents:
- Change your existing NS-2 simulation script to use TCP-Linux with default parameters
- Change parameters of Linux congestion control modules in TCP-Linux simulations
- Change parameters of Linux system in TCP-Linux simulations
- Update the TCP congestion control module source codes with a newer linux kernel
- Develop your own congestion control algorithm with TCP-Linux
- Q&A
- Acknowledgment
Change your existing NS-2 simulation script to use TCP-Linux with default parameters
It is very simple to change an existing NS-2 simulation script to use TCP-Linux. The following changes are needed:
- Change the tcp agent to "Agent/TCP/Linux".
- Make sure the TCP Sink has Sack1 support. That is, use either Agent/TCPSink/Sack1 or Agent/TCPSink/Sack1/DelAck. Currently, TCP-Linux does not support receivers without SACK. (More accurately, the results without SACK have not been validated against emulation results.)
- Add a TCP command "select_ca <the name of your congestion control algorithm>".
- (Optional) Delete any assignment of windowOption_. Once a congestion control algorithm is selected, the value of windowOption_ has no effect on the real code, so this step is not required. Deleting the assignment simply avoids confusing others who read the script.
The following table shows an example. The left side is a simple NS-2 simulation script that uses SACK1 as the TCP sender. The right side is the corresponding NS-2 simulation script that uses TCP-Linux as the TCP sender. Once the two blue lines on the left are changed to the two red lines on the right, TCP-Linux is in effect.
WARNING: You also need to set the "window_" option of the tcp agent large enough to see the performance difference. "window_" is the upper bound of the congestion window in a TCP. It is 20 by default.
A script for TCP-Sack1 | A script for TCP-Linux using Highspeed TCP (hstcp) |
#Create a simulator object | #Create a simulator object |
set ns [new Simulator] | set ns [new Simulator] |
#Create two nodes and a link | #Create two nodes and a link |
#setup sender side | #setup sender side |
#set up receiver side | #set up receiver side |
#logical connection | #logical connection |
#Setup a FTP over TCP connection | #Setup a FTP over TCP connection |
#Schedule the life of the FTP | #Schedule the life of the FTP |
#Schedule the stop of the simulation | #Schedule the stop of the simulation |
#Start the simulation | #Start the simulation |
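The warning about "window_" can be illustrated with a small standalone C model. This is only a sketch, assuming that the window a sender can actually use is the smaller of its congestion window and the "window_" cap; the function name usable_window is hypothetical and is not part of NS-2 or TCP-Linux.

```c
#include <assert.h>

/* Hypothetical helper, not part of NS-2: models window_ as an upper bound
 * on the congestion window, as the warning above describes. */
unsigned int usable_window(unsigned int cwnd, unsigned int window_cap)
{
    assert(window_cap > 0);   /* a zero cap would stall the flow */
    return cwnd < window_cap ? cwnd : window_cap;
}
```

Under this model, a high-speed algorithm that grows its congestion window to 500 packets still sends at most 20 packets per round trip with the default cap, which is why two algorithms can look identical until "window_" is raised.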
Change parameters of Linux congestion control modules in TCP-Linux simulations
Linux kernel congestion control modules may have module parameters, for example, the alpha, beta and gamma parameters in TCP Vegas. All the connections in a Linux system share the same parameter values.
TCP-Linux supports changing congestion control parameters in the simulation either system-wide (as Linux does) or on a per-connection basis.
The command set_ca_default_param changes the default value of a parameter system-wide. This is a very efficient implementation (similar to Linux) without simulation overhead. However, once a default value is changed system-wide, all connections that use the default value of this parameter are affected. (Connections that have defined their own local parameter values are not affected.)
The command set_ca_param changes a parameter value in a particular flow. The change is local to this flow. Other flows running the same congestion control algorithm continue to use their own local parameter values (if they have ever called set_ca_param) or the default value. This implementation adds simulation overhead for each packet processed, so the simulation runs slower if many flows set local parameters.
For optimum simulation performance, set the default values to the values used by the majority of the flows, so that these flows do not have to set local values. Set local values only for the flows that cannot use the defaults.
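The relation between default and local values can be modeled in a few lines of standalone C. This is a sketch of the semantics described above, not the TCP-Linux implementation; all type and function names here are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: one shared default per parameter ... */
struct ca_param_default {
    int value;
};

/* ... plus an optional per-flow override. */
struct ca_param_local {
    bool is_set;   /* true once the flow has set its own value */
    int value;
};

/* Models set_ca_default_param: every flow without a local value sees it. */
void set_default(struct ca_param_default *d, int v)
{
    d->value = v;
}

/* Models set_ca_param: only this flow is affected. */
void set_local(struct ca_param_local *l, int v)
{
    l->is_set = true;
    l->value = v;
}

/* The value a flow actually uses: its local value if set, else the default. */
int effective_value(const struct ca_param_default *d,
                    const struct ca_param_local *l)
{
    assert(d != NULL && l != NULL);
    return l->is_set ? l->value : d->value;
}
```

In this model a later change of the default reaches every flow that never set a local value, while a flow with a local value keeps it.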
Changing Default Parameters
If all the TCP/Linux flows in the simulation have the same set of parameter values, we can use the "set_ca_default_param" command to change the default parameters at any time. If any flow calls this command, all other flows see the change, too.
The format of the command is:
<tcp instance> set_ca_default_param <algorithm name> <parameter name> <new value>
To print out the current default value of a parameter, use the command get_ca_default_param:
<tcp instance> get_ca_default_param <algorithm name> <parameter name>
Changing Local Parameters on a per-flow basis
If a particular flow has to use a different value for a parameter, we can use the "set_ca_param" command to change the local value of the parameter at any time. This command may slow down the simulation.
The format of the command is:
<tcp instance> set_ca_param <algorithm name> <parameter name> <new value>
To print out the current local value of a parameter, use the command get_ca_param:
<tcp instance> get_ca_param <algorithm name> <parameter name>
Example
The following table shows an example of changing parameters in TCP Vegas.
At time 3 sec, the Vegas parameters (both alpha and beta) are changed to 40. A value of 40 is equivalent to 20 packets because Vegas keeps the lowest bit of each parameter as a fractional bit to preserve accuracy, so the value is twice the packet count.
Note that this is a global change to the Vegas parameters. All TCP-Linux flows that run TCP-Vegas without per-connection parameters are affected. In the following example, only tcp(1) changes the default parameters, but tcp(2) sees the new values too.
At time 6 sec, tcp(3) changes its local alpha and beta to 20 (equivalent to 10 packets). Because its alpha is smaller than that of the other flows, tcp(3) gets lower throughput from 6 sec onward.
And at the bottleneck queue, the queue length will be around 9 packets from 0 to 3 seconds, around 60 packets from 3 to 6 seconds, and around 50 packets from 6 to 10 seconds.
A script for TCP-Linux using TCP-Vegas (vegas) |
#Create a simulator object |
set ns [new Simulator] |
#Create a bottleneck link. |
#set up receiver side |
#logical connection |
#Setup a FTP over TCP connection |
#Schedule the life of the FTP |
#change default parameters, all TCP/Linux will see the changes! |
# change local parameters, only tcp(3) is affected. |
#Start the simulation |
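The parameter units and the expected queue lengths in this example can be checked with a little arithmetic. The following C sketch rests on several assumptions that the text does not state explicitly: the scenario has three Vegas flows (tcp(1), tcp(2) and tcp(3)), the initial defaults correspond to alpha = 2 and beta = 4 packets, and each flow keeps roughly the midpoint of alpha and beta queued at the bottleneck. All function names are hypothetical.

```c
#include <assert.h>

#define PARAM_SHIFT 1   /* assumption: one fractional bit, so 40 means 20 packets */

/* Convert a Vegas parameter value (as given to set_ca_param) to packets. */
unsigned int param_to_packets(unsigned int value)
{
    return value >> PARAM_SHIFT;
}

/* Convert a target in packets to the parameter value to pass in. */
unsigned int packets_to_param(unsigned int packets)
{
    return packets << PARAM_SHIFT;
}

/* Rough per-flow backlog: Vegas aims to keep between alpha and beta
 * packets queued, so take the midpoint as the estimate. */
unsigned int flow_backlog(unsigned int alpha_pkts, unsigned int beta_pkts)
{
    assert(alpha_pkts <= beta_pkts);
    return (alpha_pkts + beta_pkts) / 2;
}
```

Under these assumptions the three phases give 3 * 3 = 9 packets, then 3 * 20 = 60 packets, then 20 + 20 + 10 = 50 packets, which agrees with the queue lengths quoted above.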
Change parameters of Linux system in TCP-Linux simulations
The patch lets the simulation change Linux parameters (outside the congestion control modules) in the same way as congestion control module parameters. The Linux system is regarded as a special module "linux". Hence, get_ca_default_param, set_ca_default_param, get_ca_param and set_ca_param can also tune the Linux parameters. For example:
set_ca_default_param linux sysctl_tcp_abc 0
turns off the ABC option in the Linux system.
All the Linux system variables are listed in tcp/linux/ns-linux-param.c. The following table summarizes all the parameters currently available:
Variable Name | Default value | Description |
sysctl_tcp_abc | 1 | 0: Turn off Appropriate Byte Counting (ABC); 1: Turn on ABC. Turn on for faster cwnd growth in bulk transfer. |
tcp_max_burst | 3 | The maximum number of packets that can be sent back-to-back during loss recovery. This parameter controls the maximum burst size. |
debug_level | 1 | The verbosity level of debug messages. 0: print everything including INFO; 1: print ERROR and NOTICE; 2: print ERROR only |
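The three debug_level settings in the table amount to a severity threshold. A minimal C sketch of that rule follows; the enum and function names are hypothetical and do not come from tcp/linux/ns-linux-param.c.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical severity codes, ordered so that a larger number is more severe. */
enum msg_severity { MSG_INFO = 0, MSG_NOTICE = 1, MSG_ERROR = 2 };

/* A message is printed when its severity is at least debug_level:
 * 0 -> INFO, NOTICE and ERROR; 1 -> NOTICE and ERROR; 2 -> ERROR only. */
bool should_print(enum msg_severity severity, int debug_level)
{
    assert(debug_level >= 0 && debug_level <= 2);
    return (int)severity >= debug_level;
}
```

With the default level of 1, INFO messages are suppressed while NOTICE and ERROR messages are shown.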
Update the TCP congestion control module source codes with a newer linux kernel
Take the following steps:
- Download the latest linux kernel source code (e.g. from kernel.org). (For simplicity of the explanation, let's say you place the kernel source code in the /tmp/kernel_src/ directory.)
- Go to the TCP-Linux directory inside your NS-2 directory: cd <NS2-Directory>/tcp/linux
- Run sh migrate.sh <path of the linux kernel> <directory name you want to save the src code>.
For example,
sh migrate.sh /tmp/kernel_src/ 2.6.25
will copy all the relevant files (usually tcp_* files in the net/ipv4 directory) from /tmp/kernel_src/ to <NS2-Directory>/tcp/linux/2.6.25/, remove the old directory <NS2-Directory>/tcp/linux/src/, and create a soft link from <NS2-Directory>/tcp/linux/2.6.25/ to <NS2-Directory>/tcp/linux/src/.
- Compile, run and compare the simulation results with Linux experiment results.
You might encounter one of the following problems in the last step:
- If the new kernel source code has new congestion control algorithms in new files, add entries to the Makefile to let the compiler know about the new code:
tcp/linux/src/<new code name>.o
- If your algorithm requires access to many new fields in the Linux TCP structure, you might need to add more fields to struct tcp_sock in tcp/linux/ns-linux-util.h.
Develop a new congestion control algorithm with TCP-Linux
Here we explain the very basic concepts which are enough for developing simple loss-based algorithms.
An example: the implementation of a very simple Reno
The following table gives the implementation of a very simple Reno. (In fact, it is a FACK, since TCP-Linux takes care of all loss detection, loss recovery and rate-halving.)
This Reno implementation includes two parameters: the AI parameter (alpha) and the MD parameter (beta). For every RTT without loss, the congestion window is increased by alpha packets. On every loss event, the congestion window is reduced by cwnd/beta. By default, alpha=1 and beta=2, as in the standard Reno algorithm.
Naive Reno (u32 in the code is equivalent to unsigned int) |
/* This is a very naive Reno implementation, shown as an example of how to develop a new congestion control algorithm with TCP-Linux. */ |
/* This file itself should be copied to tcp/linux/ directory. */ |
/* To let the compiler compile this file, an entry "tcp/linux/<NameOfThisFile>.o" should be added to Makefile */ |
/* This definition lets the compiler know the name of this protocol */ |
/* These two header files link your implementation to TCP-Linux */ |
/* Define a parameter alpha for AI parameter */ |
/* Define a parameter beta for MD parameter */ |
/* This is equivalent to opencwnd in other implementations of NS-2. */ |
/* This function returns the slow-start threshold after a loss. */ |
/* This function returns the congestion window after a loss; it is called AFTER the function ssthresh (above) */ |
/* a constant record for this congestion control algorithm */ |
/* defines an initialization function */ |
As in the example above, an implementation includes five parts:
- The header files to link the implementation to TCP-Linux;
- Optionally, the definition and declaration of parameters. Parameters have to be defined static because different modules might have different parameters with the same variable names;
- Implementation of (at least) the three congestion control functions defined in struct tcp_congestion_ops: cong_avoid, ssthresh and min_cwnd;
- A static record of struct tcp_congestion_ops to store the function calls and the algorithm's name. (In the example, the algorithm is named "naive_reno".)
- A module initialization function which calls tcp_register_congestion_control to register the module.
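The parts listed above can be sketched in standalone C. This is not the tutorial's naive_reno source: it is a minimal illustration written from the description (alpha packets added per loss-free RTT, window reduced by cwnd/beta on loss). The two structs are cut-down stand-ins for the definitions in tcp/linux/ns-linux-util.h, the sketch_ names are hypothetical, and the header includes and the call to tcp_register_congestion_control are left out so that the code compiles on its own.

```c
#include <assert.h>

typedef unsigned int u32;

/* Stand-in for the fields of struct tcp_sock used here. */
struct tcp_sock {
    u32 snd_cwnd;        /* congestion window, in packets */
    u32 snd_ssthresh;    /* slow-start threshold */
    u32 snd_cwnd_cnt;    /* ACKs counted toward the next increase */
    u32 snd_cwnd_clamp;  /* upper bound of the congestion window */
};

/* Stand-in holding only the name and the three required callbacks. */
struct tcp_congestion_ops {
    char name[16];
    void (*cong_avoid)(struct tcp_sock *sk, u32 ack, u32 rtt,
                       u32 in_flight, int good_ack);
    u32 (*ssthresh)(struct tcp_sock *sk);
    u32 (*min_cwnd)(struct tcp_sock *sk);
};

static int alpha = 1;  /* AI: packets added per loss-free RTT */
static int beta = 2;   /* MD: window shrinks by cwnd/beta per loss event */

void sketch_cong_avoid(struct tcp_sock *sk, u32 ack, u32 rtt,
                       u32 in_flight, int good_ack)
{
    (void)ack; (void)rtt; (void)in_flight; (void)good_ack;
    if (sk->snd_cwnd < sk->snd_ssthresh) {
        sk->snd_cwnd++;                   /* slow start: +1 per ACK */
    } else if (++sk->snd_cwnd_cnt >= sk->snd_cwnd) {
        sk->snd_cwnd += (u32)alpha;       /* a full window of ACKs was seen */
        sk->snd_cwnd_cnt = 0;
    }
    if (sk->snd_cwnd > sk->snd_cwnd_clamp)
        sk->snd_cwnd = sk->snd_cwnd_clamp;
}

u32 sketch_ssthresh(struct tcp_sock *sk)
{
    u32 reduced;
    assert(beta > 0);
    reduced = sk->snd_cwnd - sk->snd_cwnd / (u32)beta;
    return reduced > 2 ? reduced : 2;     /* never go below 2 packets */
}

u32 sketch_min_cwnd(struct tcp_sock *sk)
{
    return sk->snd_ssthresh;  /* equal to ssthresh: no slow start after recovery */
}

/* The record that a real module would pass to the registration call. */
struct tcp_congestion_ops sketch_reno = {
    .name = "sketch_reno",
    .cong_avoid = sketch_cong_avoid,
    .ssthresh = sketch_ssthresh,
    .min_cwnd = sketch_min_cwnd,
};
```

For example, a flow in congestion avoidance with a window of 10 packets needs 10 acknowledgments to reach 11 packets, and a loss at that point yields a threshold of 11 - 11/2 = 6 packets.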
After copying the file to tcp/linux and changing the Makefile, you can run the algorithm by adding "select_ca naive_reno" to your tcl script.
To develop your algorithm seriously, please read the following details.
To fully understand the process, readers are expected to have knowledge of C programming.
For more complicated algorithms, readers are encouraged to read the Linux kernel documentation: Documentation/networking/tcp.txt (in any Linux kernel source tree of version 2.6.13 or above).
Data structure interface
TCP-Linux exposes several important variables in Linux TCP to NS-2 (in the tcp_sock structure in tcp/linux/ns-linux-util.h in the NS-2 code patched with TCP-Linux), as listed in the following table. Most of these variables are read-only, except snd_ssthresh, snd_cwnd, snd_cwnd_cnt, and icsk_ca_priv (the ones highlighted in red).
Variable Name | type (32bit by default) | Meanings | equivalence in existing NS-2 TCP |
snd_nxt | unsigned | The sequence number of the next byte that TCP is going to send. | t_seqno_*size_ |
snd_una | unsigned | The sequence number of the next byte that TCP is waiting for acknowledgment | (highest_ack_+1)*size_ |
mss_cache | unsigned | The size of a packet | size_ |
srtt | unsigned | 8 times the smoothed RTT | t_srtt_ |
rx_opt.rcv_tsecr | unsigned | Value of timestamp echoed by the last acknowledgment | ts_echo_ |
rx_opt.saw_tstamp | bool | Whether a timestamp is seen in the last acknowledgment | !hdr_flags::access(pkt)->no_ts_ |
snd_ssthresh | unsigned | Slow-Start threshold | ssthresh_ |
snd_cwnd | unsigned | Congestion window | trunc(cwnd_) |
snd_cwnd_cnt | unsigned (16bit) | Fraction of congestion window which is not accumulated to 1 | trunc(cwnd_*cwnd_)%cwnd_ |
snd_cwnd_clamp | unsigned (16bit) | Upper bound of the congestion window | wnd_ |
snd_cwnd_stamp | unsigned | The last time that the congestion window was changed (to detect idling and other situations) | n/a |
bytes_acked | unsigned | The number of bytes that were acknowledged in the last acknowledgment (for ABC) | n/a |
icsk_ca_state | unsigned (8bit) | The current congestion control state, which can be one of the following: TCP_CA_Open: normal state; TCP_CA_Recovery: loss recovery after a fast retransmission; TCP_CA_Loss: loss recovery after a timeout. (The following two states are not effective in TCP-Linux but are effective in Linux.) TCP_CA_Disorder: duplicate packets detected, but the threshold has not been reached, so TCP assumes that packet reordering is happening; TCP_CA_CWR: the congestion window is decreasing (after local congestion in the NIC, ECN, etc.). | n/a |
icsk_ca_priv | unsigned[16] | Private data of the individual congestion control algorithm for this flow | n/a |
icsk_ca_ops | struct tcp_congestion_ops* | A pointer to the congestion control algorithm structure for this flow | n/a |
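Two of the table entries need a conversion before use: srtt is stored as 8 times the smoothed RTT, and snd_nxt and snd_una count bytes rather than packets. The following C sketch shows both conversions on a cut-down stand-in structure. The struct name tcp_sock_view and the helper names are hypothetical, and the sketch assumes a 32-bit unsigned int, as the table header states.

```c
#include <assert.h>

/* Stand-in for a few read-only fields of struct tcp_sock from the table. */
struct tcp_sock_view {
    unsigned int snd_nxt;    /* next byte to send */
    unsigned int snd_una;    /* next byte awaiting acknowledgment */
    unsigned int mss_cache;  /* packet size in bytes */
    unsigned int srtt;       /* 8 times the smoothed RTT */
};

/* Smoothed RTT in the same time unit as srtt (fractional part discarded). */
unsigned int smoothed_rtt(const struct tcp_sock_view *tp)
{
    return tp->srtt >> 3;
}

/* Outstanding data in whole packets. Unsigned subtraction stays correct
 * when the sequence numbers wrap around. */
unsigned int outstanding_packets(const struct tcp_sock_view *tp)
{
    assert(tp->mss_cache > 0);
    return (tp->snd_nxt - tp->snd_una) / tp->mss_cache;
}
```

A congestion control function would apply the same conversions to the real struct tcp_sock fields.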
Congestion control algorithm interface
The congestion control algorithm interface is described in struct tcp_congestion_ops, which is a structure of function pointers. The structure is defined as below (in tcp/linux/ns-linux-util.h in the NS-2 code patched with TCP-Linux):
struct tcp_congestion_ops {
    char name[16];
    void (*cong_avoid)(struct tcp_sock *sk, unsigned int ack, unsigned int rtt, unsigned int in_flight, int good_ack);
    unsigned int (*ssthresh)(struct tcp_sock *sk);
    unsigned int (*min_cwnd)(struct tcp_sock *sk);
    unsigned int (*undo_cwnd)(struct tcp_sock *sk);
    void (*rtt_sample)(struct tcp_sock *sk, unsigned int usrtt);
    void (*set_state)(struct tcp_sock *sk, unsigned int newstate);
    void (*cwnd_event)(struct tcp_sock *sk, enum tcp_ca_event ev);
    void (*pkts_acked)(struct tcp_sock *sk, unsigned int num_acked, ktime_t last);
    void (*init)(struct tcp_sock *sk);
    void (*release)(struct tcp_sock *sk);
};
name[16] is the name of the TCP congestion control algorithm. This will be the name used in the "select_ca" command in the tcl script.
struct tcp_sock *sk is always a pointer to the TCP data structure of the flow.
The three function calls cong_avoid, ssthresh and min_cwnd (in red) are REQUIRED to be implemented. The others are optional. They are explained in the table below:
function name | explanation |
cong_avoid | This function is called every time an acknowledgment is received and the congestion window can be increased. This is equivalent to opencwnd in tcp.cc. ack is the number of bytes that are acknowledged in the latest acknowledgment; rtt is the rtt measured by the latest acknowledgment; in_flight is the number of packets in flight before the latest acknowledgment; good_ack is an indicator of whether the current situation is normal (no duplicate ack, no loss and no SACK). Value: 1 for normal, 0 for dubious |
ssthresh | This function is called when the TCP flow detects a loss. It returns the slow start threshold of a flow, after a packet loss is detected. |
min_cwnd | This function is called when the TCP flow detects a loss, after ssthresh has been called. It returns the congestion window of a flow after a packet loss is detected. For many algorithms, this is equal to ssthresh. Some other algorithms might set min_cwnd smaller than ssthresh; in that case, there will be a slow start after loss recovery. |
undo_cwnd | returns the congestion window of a flow, after a false loss detection (due to false timeout or packet reordering) is confirmed. This function is not effective in the current version of TCP-Linux. |
rtt_sample | This function is called when a new RTT sample is obtained. It is mainly used by delay-based congestion control algorithms which usually need accurate timestamps. usrtt is the RTT value in microsecond (us) unit. |
set_state | This function is called when the congestion state of the TCP is changed. newstate is the state code for the state that TCP is going to be in. The possible states are listed in the data structure interface table. It is to notify the congestion control algorithm and is used by some algorithms which turn off their special control during loss recovery. |
cwnd_event | This function is called when there is an event that may be of interest to the congestion control algorithm. ev is the congestion event code. The possible events are: CA_EVENT_FAST_ACK: an acknowledgment in sequence is received; CA_EVENT_SLOW_ACK: an acknowledgment not in sequence is received; CA_EVENT_TX_START: first transmission when no packet is in flight; CA_EVENT_CWND_RESTART: congestion window is restarted; CA_EVENT_COMPLETE_CWR: congestion window recovery is finished; CA_EVENT_FRTO: fast recovery timeout happens; CA_EVENT_LOSS: retransmission timeout happens. |
pkts_acked | This function is called when there is an acknowledgment that acknowledges some new packets. num_acked is the number of packets that are acknowledged by this acknowledgment. last is the time (in microseconds) when the latest acked packet was sent. A value of 0 means no timestamp measurement was collected for this acked packet. |
init | This function is called after the first acknowledgment is received and before the congestion control algorithm is called for the first time. If the congestion control algorithm has private data, it should initialize its private data here. |
release | This function is called when the flow finishes. If the congestion control algorithm has allocated additional memory other than the 16 unsigned int of icsk_ca_priv, it should delete the additional memory here to avoid memory leak. |
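The optional init and rtt_sample hooks, together with the icsk_ca_priv storage, can be illustrated by a sketch that tracks the minimum RTT of a flow, as a delay-based algorithm might. The struct tcp_sock below is a one-field stand-in for the real definition, and minrtt_state, minrtt_priv, minrtt_init and minrtt_rtt_sample are hypothetical names introduced only for this example.

```c
#include <assert.h>

/* Stand-in for the one field of struct tcp_sock this sketch needs. */
struct tcp_sock {
    unsigned int icsk_ca_priv[16];  /* per-flow private storage */
};

/* Hypothetical private state; it must fit in the 16 unsigned ints. */
struct minrtt_state {
    unsigned int min_rtt_us;  /* smallest RTT sample so far */
    unsigned int samples;     /* number of samples seen */
};

/* View the private storage of a flow as this algorithm's state. */
struct minrtt_state *minrtt_priv(struct tcp_sock *sk)
{
    return (struct minrtt_state *)sk->icsk_ca_priv;
}

/* init hook: reset the private data before it is first used. */
void minrtt_init(struct tcp_sock *sk)
{
    assert(sizeof(struct minrtt_state) <= sizeof(sk->icsk_ca_priv));
    minrtt_priv(sk)->min_rtt_us = 0;
    minrtt_priv(sk)->samples = 0;
}

/* rtt_sample hook: usrtt is the new RTT sample in microseconds. */
void minrtt_rtt_sample(struct tcp_sock *sk, unsigned int usrtt)
{
    struct minrtt_state *st = minrtt_priv(sk);
    if (st->samples == 0 || usrtt < st->min_rtt_us)
        st->min_rtt_us = usrtt;
    st->samples++;
}
```

Because the state lives entirely inside icsk_ca_priv, this sketch allocates no extra memory and would not need a release function.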
The process to implement a new (and simple) congestion control algorithm
- Understand the data structure interface and congestion control interface
- Give a name for your congestion control algorithm — this name will be used in the "select_ca" command.
- Implement at least the three required congestion control functions (cong_avoid, ssthresh, and min_cwnd) in the congestion control interface
- Create a constant record struct tcp_congestion_ops YourCongestionControlStructure {…} with the name, cong_avoid, ssthresh, and min_cwnd (and any other congestion control functions you implemented) filled in
- Include two header files: "ns-linux-util.h" and "ns-linux-c.h".
- Copy your file (mytcpfile.c) to the tcp/linux/ directory
- Add an entry to the Makefile, tcp/linux/mytcpfile.o, to let the compiler know to compile your file
- Compile, run and check the simulation results
Q&A
1. What happens if the select_ca command selects a non-existing congestion control algorithm (e.g. "highsped" by a typo)?
TCP-Linux will first display an error message on the screen: "Error: do not find highsped as a congestion control algorithm". Then, TCP-Linux calls the default congestion control algorithm in tcp.cc. (In this case, the value of windowOption_ is effective.)
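The behavior in this answer follows from a lookup by name. The C sketch below models it with a small registry. It is an illustration under assumptions, not the TCP-Linux code: register_algo and find_algo are hypothetical names, the struct is reduced to its name field, and "highspeed" is used as the correctly spelled name only because the question calls "highsped" a typo.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_ALGOS 8

/* Stand-in: only the name field matters for the lookup. */
struct tcp_congestion_ops {
    char name[16];
};

static struct tcp_congestion_ops *registry[MAX_ALGOS];
static size_t registry_len = 0;

/* Hypothetical model of registration. Returns 0 on success, -1 if full. */
int register_algo(struct tcp_congestion_ops *ops)
{
    assert(ops != NULL);
    if (registry_len == MAX_ALGOS)
        return -1;
    registry[registry_len++] = ops;
    return 0;
}

/* Look an algorithm up by name. Returns NULL when the name is unknown,
 * in which case the caller reports an error and uses its default. */
struct tcp_congestion_ops *find_algo(const char *name)
{
    size_t i;
    for (i = 0; i < registry_len; i++)
        if (strncmp(registry[i]->name, name, sizeof registry[i]->name) == 0)
            return registry[i];
    return NULL;
}
```

A misspelled name never matches a registered record, so the lookup fails and the flow ends up on the default algorithm.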
2. I found some fairness issues in TCP Vegas.
Please check the known Linux bugs page to make sure it is really a problem of the algorithm, not a bug in the Linux implementation.
Acknowledgment
This work is inspired and greatly helped by Prof. Pei Cao at Stanford and by Prof. Steven Low at Caltech. Many thanks to them!