Abstract
Wide-area data transfers in high-performance computing infrastructures and big data applications are increasingly carried over dynamically provisioned dedicated network connections, which provide high capacity with no competing traffic. The throughput of TCP and UDT over such connections depends, often in a complex non-linear way, on the congestion control mechanism, buffer sizes, and the number of parallel streams. In addition, our extensive TCP and UDT throughput measurements and time traces, collected over a suite of physical and emulated 10 Gbps connections with round-trip times (RTTs) of 0-366 ms, show significant statistical and temporal variations. Consequently, parameter selection and optimization for transport methods require data analytics to complement analytical throughput models. We present analytics based on the concavity and convexity of throughput profiles, which provide insight into peak throughput and into whether throughput is superior or inferior to a linear interpolation over RTT. In particular, we propose the utilization-concavity coefficient, a scalar metric that incorporates both the utilization and the concavity of the profile to characterize the overall performance of a transport protocol. These measurement-based analytics enable us to select a transport protocol and its parameters for a given host and connection configuration to achieve high throughput with statistical guarantees.