On 23-Oct-2017 Filippo Mantovani gave a talk in Darmstadt on “Mobile technology for production-ready high-performance computing systems: The path of the Mont-Blanc project”. Unfortunately I was unable to attend, but Mr. Mantovani sent me his Darmstadt seminar slides. Since his slides and documents are of great interest to anyone using, or planning to use, ARM in HPC, I have copied them here so they are easily available. I also copied the report “MB3_D6.4 Report on application tuning and optimization on ARM platform”.
- Introduction to OpenMP
- OpenMP Tasking In Depth
- OpenMP Recap
- OpenMP and Performance
- Advanced OpenMP Features
Some very striking slides are reproduced here.
This is the output of
lstopo --of png > ~/tmp/lstopo.png
for a machine with an AMD FX-8120 eight-core processor (Bulldozer architecture); see AMD Bulldozer CPU Architecture Overview.
Added 06-Jan-2018: Below is the output for a Skylake Core i7-6600U in an HP EliteBook notebook:
Georg Hager’s blog posted an illustrative article on the OpenMP performance of icc versus g++. Dr. Georg Hager is one of the authors of Introduction to High Performance Computing for Scientists and Engineers.
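```fortran
double precision, dimension(N) :: a, b, c, d
! initialization etc. omitted
s = walltime()
!$omp parallel private(R,i)
do R = 1, NITER
  ! each thread works on its share of the vector triad
  !$omp do
  do i = 1, N
    a(i) = b(i) + c(i) * d(i)
  enddo
  ! implicit barrier at the end of the worksharing loop
  !$omp end do
enddo
!$omp end parallel
e = walltime()
! R is private in the parallel region, so its value is undefined here:
! use NITER, and count 2 flops (one add, one multiply) per iteration
MFlops = 2.d0*N*NITER/(e-s)/1.d6
```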
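As a worked illustration of the flop accounting, assuming the usual convention that the triad `a(i) = b(i) + c(i) * d(i)` counts as one addition plus one multiplication per element, the benchmark executes 2·N·NITER floating-point operations in the wall-clock interval between the two `walltime()` calls `s` and `e`, so the rate is:

$$
P\,[\text{MFlop/s}] = \frac{2 \cdot N \cdot N_{\text{ITER}}}{(e - s) \cdot 10^{6}}
$$

For hypothetical values N = 10^6 and NITER = 100 finishing in 0.1 s, this gives 2·10^8 / 0.1 / 10^6 = 2000 MFlop/s.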