Today, petascale distributed memory systems perform large-scale simulations and generate massive amounts of data in a distributed fashion at unprecedented rates. When clustering data, traditional analysis methods may require comparing individual records with each other in an iterative process, and therefore involve moving data across system nodes. As both the data volume and the number of nodes grow, clustering methods put increasing pressure on storage and interconnect bandwidth; the methods become inefficient and do not scale. New methodologies are needed to analyze data that is distributed across the nodes of large distributed memory systems.
When analyzing structural biology datasets, we focus on specific properties of the data records, such as the molecular geometry or the location of a molecule in a docking pocket. In this talk we propose a methodology that enables the scalable analysis of large datasets composed of millions of individual structural biology records in a distributed manner on large distributed memory systems. The methodology consists of two general steps. The first step extracts concise properties, or features, of each data record in parallel and represents them as metadata. The second step clusters the records based on the extracted features using machine-learning techniques. We apply the methodology to two different computational structural biology datasets to identify geometrical features. Our results show that our approach enables scalable clustering analyses of large-scale computational structural biology datasets on large distributed memory systems, and that our method achieves higher accuracy than traditional analysis approaches.
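To make the two steps concrete, the sketch below illustrates the pattern in Python under stated assumptions; it is not the talk's implementation. The per-record features (radius of gyration and centroid) are hypothetical stand-ins for the geometric properties discussed above, and multiprocessing.Pool with scikit-learn's KMeans stands in for the parallel metadata extraction and machine-learning clustering on a distributed memory system.

```python
# Minimal sketch of the two-step methodology, assuming each "record"
# is an N x 3 NumPy array of atomic coordinates. The specific features
# and clustering algorithm are illustrative choices, not the authors'.
from multiprocessing import Pool

import numpy as np
from sklearn.cluster import KMeans


def extract_features(coords: np.ndarray) -> np.ndarray:
    """Step 1: reduce one record to a small feature vector (metadata)."""
    centroid = coords.mean(axis=0)
    # Radius of gyration: RMS distance of atoms from the centroid,
    # a concise proxy for molecular geometry (hypothetical feature).
    rg = np.sqrt(((coords - centroid) ** 2).sum(axis=1).mean())
    return np.concatenate(([rg], centroid))


def cluster_records(records: list, n_clusters: int = 2) -> np.ndarray:
    """Step 2: cluster the compact feature vectors, not the full records,
    so only small metadata (not raw data) needs to move between workers."""
    with Pool() as pool:  # parallel feature extraction, one record per task
        features = np.vstack(pool.map(extract_features, records))
    return KMeans(n_clusters=n_clusters, n_init=10).fit_predict(features)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic records: 50 compact and 50 extended coordinate sets.
    records = [rng.normal(scale=1.0, size=(100, 3)) for _ in range(50)]
    records += [rng.normal(scale=5.0, size=(100, 3)) for _ in range(50)]
    print(cluster_records(records))
```

On a real distributed memory system, the Pool would be replaced by node-local extraction (e.g., one MPI rank per data partition), but the design point is the same: because only the small feature vectors are gathered for clustering, communication cost grows with the number of records rather than with the raw data size.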