A Parallel EM Algorithm for Model-Based Clustering Applied to the Exploration of Large Spatio-Temporal Data...

by Wei-chen Chen, George Ostrouchov, David R Pugmire, Prabhat, Michael Wehner

Publication Type

Journal

Journal Name

Technometrics

Publication Date

November, 2013

Page Numbers

513 to 523

Volume

Issue


Abstract

We develop a parallel EM algorithm for multivariate Gaussian mixture models and use it to
perform model-based clustering of a large climate data set. Three variants of the EM algorithm
are reformulated in parallel, and a new, faster variant is presented. All are implemented
using the single program, multiple data (SPMD) programming model, which can take
advantage of the combined memory of large distributed computer architectures to
process larger data sets. Displaying the estimated mixture model rather than the data allows
us to explore multivariate relationships in a way that scales to data of arbitrary size. We study
the performance of our methodology on simulated data and apply it to a
high-resolution climate data set produced by the Community Atmosphere Model (CAM5). This article
has supplementary material online.
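
As a rough illustration of the SPMD idea described in the abstract, the sketch below runs plain EM for a Gaussian mixture in which each "rank" holds one chunk of the data, computes local sufficient statistics, and a sum across ranks feeds an identical M-step everywhere. This is not the authors' implementation: it is univariate rather than multivariate, it simulates ranks with a serial loop instead of MPI, and it does not reproduce any of the EM variants from the article. The names `local_stats`, `allreduce_sum`, and `em_spmd` are hypothetical.

```python
import math


def local_stats(chunk, weights, means, variances):
    """E-step on one rank's chunk: return local sufficient statistics."""
    k = len(weights)
    n_k = [0.0] * k  # sum of responsibilities per component
    s1 = [0.0] * k   # sum of responsibility * x
    s2 = [0.0] * k   # sum of responsibility * x^2
    loglik = 0.0
    for x in chunk:
        dens = [
            w * math.exp(-0.5 * (x - m) ** 2 / v) / math.sqrt(2.0 * math.pi * v)
            for w, m, v in zip(weights, means, variances)
        ]
        total = sum(dens)
        loglik += math.log(total)
        for j in range(k):
            r = dens[j] / total
            n_k[j] += r
            s1[j] += r * x
            s2[j] += r * x * x
    return n_k, s1, s2, loglik


def allreduce_sum(per_rank):
    """Stand-in for an MPI allreduce (assumption): element-wise sum over ranks."""
    k = len(per_rank[0][0])
    n_k = [sum(p[0][j] for p in per_rank) for j in range(k)]
    s1 = [sum(p[1][j] for p in per_rank) for j in range(k)]
    s2 = [sum(p[2][j] for p in per_rank) for j in range(k)]
    loglik = sum(p[3] for p in per_rank)
    return n_k, s1, s2, loglik


def em_spmd(chunks, weights, means, variances, iters=50):
    """Run EM where only small summed statistics cross rank boundaries."""
    n = sum(len(c) for c in chunks)
    history = []  # log-likelihood at the parameters used in each E-step
    for _ in range(iters):
        # Each rank works only on its own chunk (simulated serially here).
        per_rank = [local_stats(c, weights, means, variances) for c in chunks]
        n_k, s1, s2, loglik = allreduce_sum(per_rank)
        history.append(loglik)
        # M-step: identical on every rank because the inputs are global sums.
        weights = [nk / n for nk in n_k]
        means = [a / nk for a, nk in zip(s1, n_k)]
        variances = [
            max(b / nk - m * m, 1e-6)  # small floor guards against collapse
            for b, nk, m in zip(s2, n_k, means)
        ]
    return weights, means, variances, history
```

Because the M-step depends only on the summed statistics, splitting the same data into one chunk or many should give the same estimates up to floating-point rounding, which is what lets the data stay distributed across the memory of many nodes.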
