Contribution to scientific publications in GPUGrid

Looking at my account page in GPUGrid I notice that my computations have led to some publications. Nice.
  3063rd/3113   Giorgino et al., J. Chem. Theory Comput., 2012 (cancer)
  3389th/5798   Sadiq et al., PNAS, 2012 (hiv)
  1801st/1995   Venken et al., JCTC, 2013 (hiv)
  1886th/3349   Buch et al., JCIM, 2013 (cancer)
  3741st/4477   Pérez-Hernández et al., JCP, 2013 (methods)
  564th/2163    Bisignano et al., JCIM, 2014 (methods)
  369th/1283    Doerr et al., JCTC, 2014 (methods)
  573rd/2838    Stanley et al., Nat Commun, 2014 (cancer)
  1453rd/3611   Ferruz et al., JCIM, 2015 (methods)
  1313th/4128   Ferruz et al., Sci Rep, 2016 (brain)
  3083rd/4815   Stanley et al., Sci Rep, 2016 (cancer)
  4319th/4730   Noé et al., Nat Chem, 2017 (methods)
Anyone with a device that supports BOINC can contribute their computing power to scientific endeavours. Also see the BOINC wiki.

Announcement: 11th International Conference on Parallel Programming and Applied Mathematics

The conference will take place September 6-9, 2015, in Krakow, Poland.

Parallel Programming and Applied Mathematics, PPAM for short, is a biennial conference started in 1994, with the proceedings published by Springer in the Lecture Notes in Computer Science series; see PPAM. It is sponsored by IBM, Intel, Springer, AMD, RogueWave, and HP. The last conference had a fee of 420 EUR.

It is held in conjunction with the 6th Workshop on Language-Based Parallel Programming.

Prominent speakers are:

Continue reading

Day 2, Workshop Programming of Heterogeneous Systems in Physics

Day 2 of the conference featured the talks below. Prof. Dr. Bernd Brügmann gave a short introduction, pointing out that Jena ranks 10th in physics in Germany and has about 100,000 inhabitants and 20,000 students.

  1. Dr. Karl Rupp, Vienna, Lessons Learned in Developing the Linear Algebra Library ViennaCL. Notes: C++ operator overloading normally creates temporaries, and special trickery is necessary to circumvent this; ViennaCL is not callable from Fortran because of the C++ operator overloading; see Karl Rupp's slides; with CUDA 5 and 6, OpenCL and CUDA are more or less on par.
  2. Prof. Dr. Rainer Heintzmann, Jena, CudaMat – a toolbox for Cuda computations. Continue reading

Day 1, Workshop Programming of Heterogeneous Systems in Physics

As announced in Workshop Programming of Heterogeneous Systems in Physics, July 2014, I attended this two-day conference in Jena, Germany. Below are speakers and pictures with my personal notes.

  1. Dipl.-Ing. Hans Pabst from Intel, Programming for the future: scaling forward with cores and vectors. Hans Pabst Continue reading

Announcement: Workshop Programming of Heterogeneous Systems in Physics, July 2014

This workshop is planned for 14-15 July 2014 in Jena, Germany.

The workshop is organized by Bernd Brügmann (University Jena), Xing Cai (Simula and University Oslo), Gundolf Haase (University Graz), and Gerhard Zumbusch (Chair, University Jena).

The last workshop from 2011 had two high-profile speakers regarding CUDA technology: Dr. Timo Stich and Vasily Volkov. The 2014 workshop features Dr. Stich, Dr. Karl Rupp, Hans Pabst (Intel), and others.

Addendum 06-Aug-2014: See my notes taken at the workshop in

  1. Day 1, Workshop Programming of Heterogeneous Systems in Physics
  2. Day 2, Workshop Programming of Heterogeneous Systems in Physics

Output from deviceQuery for NVidia GTX 560

This article “CUDA-Enabled GPUs” made me check my GPU again using deviceQuery in /usr/local/cuda/samples/sdk/1_Utilities/deviceQuery:

 CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 1 CUDA Capable device(s)

Device 0: "GeForce GTX 560"
  CUDA Driver Version / Runtime Version          6.0 / 5.0
  CUDA Capability Major/Minor version number:    2.1
  Total amount of global memory:                 1023 MBytes (1072889856 bytes)
  ( 7) Multiprocessors x ( 48) CUDA Cores/MP:    336 CUDA Cores
 Continue reading 

Newer GPUGRID Tasks Keep GPU Really Hot

Recent GPUGRID tasks, like I60R2-NATHAN_KIDKIXc22 or 27x0-SANTI_RAP74wtCUBIC, really keep my NVidia GTX 560 hot, i.e., at 70 °C or higher.

Fri Sep  6 17:16:38 2013       
| NVIDIA-SMI 5.325.15   Driver Version: 325.15         |                       
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|   0  GeForce GTX 560     Off  | 0000:01:00.0     N/A |                  N/A |
| 92%   76C  N/A     N/A /  N/A |      866MB /  1023MB |     N/A      Default |
 Continue reading 

Running CPU/GPU Intensive Jobs on Titan Supercomputer

There is an INCITE program (HPC Call for Proposals) where one can apply for compute time for CPU/GPU-intensive jobs; the link is INCITE.

From the FAQ: The INCITE program is open to US- and non-US-based researchers and research organizations needing large allocations of computer time, supporting resources, and data storage to pursue transformational advances in science and engineering.

The machines in question: Mira and Titan.

Vasily Volkov (UC Berkeley): Unrolling parallel loops

Loop unrolling is not only good for sequential programming; it has similarly dramatic effects in highly parallel code as well, see Unrolling parallel loops (local copy), and also #pragma unroll in the NVidia CUDA programming guide.

Some bullet points of the presentation:

• More resources consumed per thread
• Each load costs about 2 arithmetic instructions: shared memory has 32 banks vs. 32 streaming processors, but runs at half the clock rate, so 3 loads are 6x more expensive than 1 FMA
• A simple optimization technique resembling loop unrolling, often resulting in a 2x speedup

Dead link: On Vasily Volkov's homepage you can find more information on CUDA optimizations.

Cédric Augonnet, Samuel Thibault, and Raymond Namyst call Vasily Volkov a "CUDA-hero" in How to get portable performance on accelerator-based platforms without the agonizing pain.

In a similar vein, Dr. Mark Harris describes the beneficial effect of unrolling in his presentation on parallel reduction.