Monday, April 8, 2019

Presentation: The Quest for Energy Proportionality in Mobile & Embedded Systems

This is a summary of the presentation on "The Quest for Energy Proportionality in Mobile & Embedded Systems" by Lin Zhong.

We want mobile and other systems to be energy efficient, and in particular to use energy in proportion to the intensity of the required operation.  However, processor architectures are energy proportional only over limited regions, given certain physical and engineering constraints on the design.  ARM's big.LITTLE gives a greater range in efficiency by placing two similar cores onto the same chip; however, it is constrained by the need to keep the cores cache coherent.

The recent TI SoC boards also contain another ARM core, running the Thumb ISA for energy efficiency.  This additional core was hidden behind a TI driver (originally to support MP3 playing), but was recently exposed, allowing further designs to use it as part of computation.  However, this core is not cache coherent with the main core on the board.

So Linux was extended to be deployed onto both cores (compiled for the different ISAs), while keeping the data structures, etc. in the common, shared memory space.  The application can then run and migrate between the cores, based on application hints as to the required intensity of operations.  On migration, one of the core domains is put to sleep and releases the memory to the other core.  This design avoids synchronization between the two domains, which simplifies the code; the cost is acceptable because concurrency demands are low in the mobile space.  This was a rare demonstration of software-managed cache coherence.
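
The hint-driven migration described above can be sketched as a toy model.  This is only an illustration, not the kernel design from the talk: the names `Domain`, `TwoDomainSystem`, `hint`, and the "low" / "high" intensity labels are all hypothetical, and the sketch assumes exactly one domain is awake at a time.

```python
from enum import Enum


class Domain(Enum):
    """The two core domains; names are hypothetical labels for this sketch."""
    STRONG = "main core"
    WEAK = "hidden Thumb core"


class TwoDomainSystem:
    """Toy model: one domain is awake and owns the shared memory at a time."""

    def __init__(self):
        self.active = Domain.STRONG
        self.memory_owner = Domain.STRONG
        self.migrations = 0

    def hint(self, intensity):
        """Accept an application hint ("low" or "high") and migrate if needed."""
        target = Domain.WEAK if intensity == "low" else Domain.STRONG
        if target is not self.active:
            self._migrate(target)
        return self.active

    def _migrate(self, target):
        # The current domain goes to sleep and releases the shared memory,
        # so the two domains never touch it concurrently (assumption of
        # this sketch; no locking between domains is modeled).
        self.memory_owner = target
        self.active = target
        self.migrations += 1


system = TwoDomainSystem()
system.hint("low")    # sensing-style workload: move to the weak domain
system.hint("low")    # already there: no migration
system.hint("high")   # heavy workload: move back to the strong domain
print(system.active.value, system.migrations)
```

Because ownership of memory moves as a whole with the active domain, the model needs no synchronization between the domains, which mirrors the simplification noted in the summary.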

In terms of range, DVFS provides about a 4x change in power, and big.LITTLE adds another 5x.  The hidden Thumb core supports an additional 10x reduction in power for low-intensity tasks, such as mobile sensing.  Together, these cover a significant part of the energy / computation space.
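
As a rough back-of-the-envelope check, the factors quoted above can be multiplied to estimate the overall power range.  This assumes the three mechanisms stack independently, which the talk summary does not state explicitly; the function and dictionary names are made up for this sketch.

```python
from math import prod

# Approximate power-range factors quoted in the summary above.
POWER_RANGE_FACTORS = {
    "DVFS": 4,
    "big.LITTLE": 5,
    "hidden Thumb core": 10,
}


def combined_range(factors):
    """Multiply the per-mechanism factors, assuming they stack independently."""
    return prod(factors.values())


print(combined_range(POWER_RANGE_FACTORS))  # 4 * 5 * 10 = 200
```

Under that assumption, the three mechanisms together span roughly a 200x range in power.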

However, this does not cover the entire space of computation.  At the lowest end, there is still an energy-intensive ADC (analog-to-digital conversion) component.  This component is the equivalent of tens of thousands of gates.  However, many computations could be pushed into the analog space.  This saves power in two ways: only a simpler result needs to be converted for digital consumption, and the computation can be performed on lower-quality input (tolerating noise), which reduces the energy demand.