Many advanced computational science applications require a balanced and scalable distributed cyberinfrastructure. Designing and implementing distributed cyberinfrastructure that meets the diverse needs of these applications presents a challenging research agenda. This talk aims to convey some of our experience and excitement in the research and development of cyberinfrastructure for scientific applications.
Specifically, we discuss three scales at which RADICAL-Cybertools are being designed and developed to support advances in computational science. At the wide-area distributed computing scale, we highlight the concepts and abstractions underlying the RADICAL-WLMS (Work-Load Management System) and the importance of integrating resource and application information on heterogeneous and dynamic resources such as OSG and XSEDE. At the application level, we discuss how RADICAL-Pilot provides an effective resource management abstraction for domain-specific workflow systems and enables the scalable execution of thousands (and soon tens of thousands) of concurrent and coupled simulations. At the system level, we discuss how RADICAL-Pilot is being interfaced with OpenMPI to support the effective execution of tens of thousands of simulations.