Abstract
Analyzing and clustering documents is a complex
problem. One previously explored approach to this problem borrows from
nature, imitating the flocking behavior of birds. One limitation of this
method of document clustering is its O(n^2) complexity. As the number of
documents grows, it becomes increasingly difficult to generate results in a
reasonable amount of time. In the last few years, the graphics processing
unit (GPU) has received attention for its ability to solve highly parallel
and semi-parallel problems much faster than the traditional sequential
processor. In this paper, we exploit this architecture and apply its
strengths to the flocking-based document clustering
problem. Using the CUDA platform from NVIDIA, we developed a document
flocking implementation that runs on an NVIDIA GeForce GPU. The GPU
implementation ran from thirty-six to nearly sixty times faster than
the CPU implementation.