Tuesday, January 19, 2016

Conference Attendance HiPEAC - Day 2 - Papers

Another conference day.  Much of my time is spent talking with other attendees and doing "work", such as preparing my presentation, send emails, etc.  However, I do take some time to actually sit in on other presentations, so here are two highlights:

PARSECs - This work explores rewriting some of the PARSEC benchmarks to use a task-based parallelism (OpenMP tasks), rather than pthreads.  For many workloads, these changes provide improved scaling.  For almost all workloads, the code size was reduced as the original thread pools, job queues, etc could be removed.  In the near future, these revised versions should be released.

HRF-Relaxed - The original OpenCL had no memory model; however, many vendors implemented one.  Now, C++ and other languages use SC for DRF (sequential consistency for data-race-free programs).  Unfortunately, if you use this consistency model in OpenCL, you will lose performance.  Instead, this work proposes a hierarchical race-free model, where the races are only checked at a certain scope of the program.

Monday, January 18, 2016

Conference Attendance HiPEAC - Day 1 - MULTIPROG

It is once again, conference time.  For North Americans, this might seem rather early as I am writing from Prague, Czech Republic (or at least when I started 12 hours ago).  I am attending HiPEAC, which is the premier European computer architecture conference.  HiPEAC is a dual-track conference.  Throughout the three days there is the paper-track, where the accepted papers to TACO (such as mine) are presented.  And simultaneously there are workshops.  For the first day, I am starting with the MULTIPROG workshop, which is on Programmability and Architectures for Heterogeneous Multicores.

Let's start with the keynote, given by David Kaeli of Northeastern University.
- Concurrent execution of compute kernels
- Scheduling of kernels, deadlines
- Sharing / access to host memory (i.e., RAM)

The current model of using a GPGPU is that it runs 1 computation kernel; however, there are many problems that would better decompose into several separate kernels.  It would also be valuable if there were further examples of these problems (i.e., benchmarks).  Now, whenever you try running multiple anything on a computational resource, there is a runtime scheduling problem.  Which should run to best complete the overall problem.  A follow-on research question explores this question a cloud-based environment where the GPU may be shared across entirely independent compute kernels.  This requires the kernels to be tagged with IDs to ensure that their memory is kept separate.  All of this sounds as if we need an OS for the GPU.

Following the late-morning break, we heard next from MECCA (MEeting the Challenges in Computer Architecture) - 3Ps: parallelism, power, and performance.  Consider parallel program annotations for describing the concurrency, runtime management of caches using the annotations to indicate the flow of data and transfer the data before it is required and with the appropriate coherence states and indicate when a block is dead and can be evicted from the cache.

Then there was lunch, resting from my flights, then networking, especially the part where I stood by my poster and discussed my research for 3 hours.  Now to rest for day 2.

Wednesday, January 13, 2016

Repost: Avoid Panicking about Performance

In a recent post, another blogger related how a simple attempt to improve performance nearly spiraled out of control.  The lesson is that always measure and understand your performance problem before attempting any solution.  Now, the very scope of your measurements and understanding can vary depending on the complexity of your solution.  And when your "optimizations" have caused the system to go sideways, it is time to take a careful appraisal of whether to revert or continue.  I have done both, and more often have I wished that I reverted rather than continued.  Afterall, it is better for the code to work slowly rather than not work.

Again, always measure before cutting.