P16: a blog by Matt Kangas home archive
01 Jul 2008

Supercomputing for the masses: Nvidia CUDA

This is a post I should have written two weeks ago, but I've been lazy. (Or busy...)

I went to a seminar by Nvidia about CUDA, which is a framework for writing apps to run directly on an Nvidia GPU. Not just graphics apps, either. Nvidia GPUs are massively parallel -- up to 128 hardware threads -- and you can write arbitrary C code to run on them.
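To give a feel for what "arbitrary C code" means here: a CUDA kernel is just a C function marked `__global__`, and the runtime fans it out across the hardware threads. A minimal sketch (my own toy example, not something from the seminar):

```cuda
// Toy CUDA kernel: each hardware thread squares one element of the array.
// blockIdx/blockDim/threadIdx are built-ins that identify this thread.
__global__ void square(float *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)                     // guard: grid may be larger than n
        data[i] = data[i] * data[i];
}

// Launched from ordinary host C code with CUDA's <<<blocks, threads>>> syntax,
// e.g. 128 threads per block, enough blocks to cover n elements:
//   square<<<(n + 127) / 128, 128>>>(d_data, n);
```

The same source file holds both host and device code; nvcc splits them apart at compile time.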

The seminar was part of a financial technology conference, so they had a third-party app provider on hand who, naturally, was building solutions on CUDA. They said they were seeing a 50x speedup (one Nvidia GPU versus one Intel CPU core) when running their compute-heavy applications.

I came home and immediately installed their CUDA SDK for Mac OS X on my MacBook Pro. To my surprise, it ran with minimal hassle (*), and I was promptly playing with particle field and liquid simulations. But my Mac doesn't have enough VRAM to run some of the more intensive demos (MonteCarlo, BlackScholes).

(*) The only real tweak necessary was: "export DYLD_LIBRARY_PATH=/usr/local/cuda/lib"

Takeaways:

One thing CUDA doesn't provide is a way to manage and process massive datasets. The cards have somewhat limited memory, and you have to write an app that runs on the host and feeds the card with data. For search applications like topic clustering -- which I'd like to use this for -- CUDA alone doesn't provide an answer.
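Concretely, the host app ends up owning the data pipeline: it has to carve the dataset into chunks that fit in VRAM and shuttle them over the bus. A rough sketch of that pattern (function and kernel names are mine, not from the SDK samples):

```cuda
#include <cuda_runtime.h>

// Hypothetical kernel doing the per-element work; stands in for anything real.
__global__ void process(float *buf, int n);

// Host-side loop: stream a dataset larger than VRAM through one device buffer,
// `chunk` elements at a time.
void run_in_chunks(float *host_data, size_t total, size_t chunk)
{
    float *d_buf;
    cudaMalloc((void **)&d_buf, chunk * sizeof(float));

    for (size_t off = 0; off < total; off += chunk) {
        size_t n = (total - off < chunk) ? (total - off) : chunk;

        // Feed the card a chunk, run the kernel, copy results back.
        cudaMemcpy(d_buf, host_data + off, n * sizeof(float),
                   cudaMemcpyHostToDevice);
        process<<<(n + 127) / 128, 128>>>(d_buf, (int)n);
        cudaMemcpy(host_data + off, d_buf, n * sizeof(float),
                   cudaMemcpyDeviceToHost);
    }
    cudaFree(d_buf);
}
```

Those memcpy calls are exactly the overhead you pay for the card's limited memory: all the chunking, scheduling, and dataset management lives in your host code, not in CUDA.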

Perhaps it would make sense eventually to use Hadoop plus CUDA -- write your map/reduce tasks in CUDA, and rely on Hadoop to distribute data around a cluster of Nvidia-accelerated boxes?

Two spiffy related projects:

Special thanks to Rich Hecker for posting a heads-up about the CUDA seminar to the NY-newtech list.