Revisiting GPU Computing

by ThomasA

A few years ago (ah well, I guess back in 2011…) I started experimenting with scientific computing on GPUs. The research project I was working on had equipped a couple of quite powerful servers with as many NVIDIA Tesla C1060s as we could cram in there.

Back then it was a lot of work getting algorithms to run on GPUs. First I would have to install the drivers for the GPUs manually, which required some detective work to find the right configuration files to edit. Then I would have to install the CUDA toolkit manually. Once that worked, I could start writing code for it. As some may know, I like doing my computing in Python (for example this). Before that, I was an enthusiastic Matlab user, and ArrayFire (back then they were called AccelerEyes) offered a very nice solution, Jacket, which made it very easy to perform computations on the GPU in Matlab. Unfortunately, that solution was discontinued.

This was around the time I was starting to use Python instead. As far as I recall, PyCUDA was more or less the only option at the time to access the GPU from Python. This was a bit challenging as you would have to write your own kernels in CUDA C to be plugged into Python. Developing software for the GPU in CUDA C was way less efficient than Python coding. On top of that, things had to be optimised quite specifically for a particular GPU architecture. With each new generation of GPU, details changed quite drastically and your existing code would run inefficiently or not at all on newer GPUs. This made it too challenging to keep up and I decided to focus on more efficient code development in Python (when I say more efficient – I don’t mean in terms of execution time, I mean in terms of development time) and quietly mothballed my GPU computing.

Fast forward to today. A lot has happened since, and the newest generations of NVIDIA's GPUs make the good old ones I was experimenting with look almost ridiculous. Not least, the explosion of research in and applications of deep neural networks has resulted in several high-quality software libraries for computing on GPUs. Most of these libraries seem to be quite high-level, meaning that you can interface to the GPU and execute various operations on it at a high level of abstraction. This includes simply calling functions directly in Python.

The emergence of new, high-level tools for GPU computing in Python (among others) has convinced me that the time is ripe for giving GPU computing another go. So I went and bought a GeForce GTX 1080 Ti for my office workstation to get back in the game with some newer hardware. The Tesla C1060s from back in the day were GPUs aimed at scientific computing, and especially later generations of the Tesla line focused on getting good double precision floating point performance. The GeForce card here is a gaming card and is considerably weaker in double precision than in single precision, but the newer Tesla cards are much too expensive, so I chose a GeForce card to keep the cost down.

Over the next few weeks I am going to be experimenting with various possibilities for interfacing to the GPU from Python. Luckily, this has become a walk in the park compared to my earlier attempts:

  1. First I installed the NVIDIA GPU driver from this PPA (I am running Ubuntu). This seems to be quite a stable archive that does not resort to installing all sorts of unstable, bleeding-edge packages on your system. For me, it just worked out of the box without any manual configuration file editing. Wonderful!
  2. Since I use Continuum Analytics’ Anaconda distribution for all my Python needs, it is very convenient that it can also install the CUDA toolkit:
    conda install cudatoolkit
    And this also worked out of the box for me.
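
If you want to double-check that the two steps above took effect, something along these lines might do. It is only a rough sketch using the Python standard library: the function name gpu_setup_report is made up for illustration, and it assumes that the driver put the nvidia-smi tool on your PATH and that Numba is the package you intend to use with the toolkit.

```python
import importlib.util
import shutil


def gpu_setup_report():
    """Return simple yes/no checks for the setup steps described above.

    gpu_setup_report is a hypothetical helper name, not part of any library.
    """
    return {
        # nvidia-smi normally ships with the NVIDIA driver (assumption:
        # the driver package placed it on the PATH).
        "nvidia_smi_on_path": shutil.which("nvidia-smi") is not None,
        # Checks whether the numba package can be found, without importing it.
        "numba_installed": importlib.util.find_spec("numba") is not None,
    }


if __name__ == "__main__":
    for check, ok in gpu_setup_report().items():
        print(f"{check}: {'yes' if ok else 'no'}")
```

A "no" here does not necessarily mean something is broken; it only tells you where to look first.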

So, the first library I will be trying out is Numba: https://devblogs.nvidia.com/seven-things-numba/. Stay tuned for experiments and don’t hesitate to let me know of any great packages/toolboxes/libraries you think I should try.
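
To give a flavour of what "calling the GPU directly from Python" can look like, here is a minimal sketch of an element-wise addition with Numba's CUDA support. I have not benchmarked anything yet, so take it purely as an illustration. The names add, add_cpu and add_kernel are my own for this example, and the plain-Python fallback is only there so the snippet also runs on a machine without Numba or a CUDA-capable GPU.

```python
try:
    from numba import cuda
    HAVE_GPU = cuda.is_available()
except Exception:  # Numba missing or unusable: fall back to plain Python
    cuda = None
    HAVE_GPU = False


def add_cpu(x, y):
    """Plain-Python fallback: element-wise sum of two sequences."""
    return [a + b for a, b in zip(x, y)]


if HAVE_GPU:
    import numpy as np

    @cuda.jit
    def add_kernel(x, y, out):
        # Each GPU thread handles one element of the arrays.
        i = cuda.grid(1)
        if i < out.size:
            out[i] = x[i] + y[i]

    def add(x, y):
        """Element-wise sum computed on the GPU in single precision."""
        x = np.asarray(x, dtype=np.float32)
        y = np.asarray(y, dtype=np.float32)
        out = np.zeros_like(x)
        threads = 128  # threads per block, an arbitrary choice here
        blocks = (x.size + threads - 1) // threads
        # Numba copies the NumPy arrays to the GPU and back automatically.
        add_kernel[blocks, threads](x, y, out)
        return out.tolist()
else:
    add = add_cpu


if __name__ == "__main__":
    print(add([1.0, 2.0, 3.0], [10.0, 20.0, 30.0]))
```

Note that the kernel itself is written in Python; there is no CUDA C in sight, which is exactly what makes this attractive compared to my PyCUDA days.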