“Optimizing Embedded Software for Power Efficiency,” Part Four

This is the fourth and final post in my series based on Rob Oshana and Mark Kraeling's articles on “Optimizing Embedded Software for Power Efficiency,” which ran on Embedded.com in May. Part Four moves beyond power utilization with respect to memory access to peripheral and algorithmic optimization. For peripherals – where the main communication forms “for embedded processors include DMA, SRIO, Ethernet, PCI Express, and RF antenna interfaces” – the considerations are “burst size, speed grade, transfer width and general communication modes.”

Although each protocol is different for the I/O peripherals and the internal DMA, they all share the fact that they are used to read/write data. As such, one basic goal is to maximize the throughput while the peripheral is active in order to maximize efficiency and the time the peripheral/device can be in a low-power state, thus minimizing the active clock times.

The most basic way to do this is to increase transfer/burst size. For DMA, the programmer has control over burst size and transfer size in addition to the start/end address (and can follow the alignment and memory accessing rules we discussed in earlier subsections of data path optimization). Using the DMA, the programmer can decide not only the alignment, but also the transfer “shape”, for lack of a better word. What this means is that using the DMA, the programmer can transfer blocks in the form of two-dimensional, three-dimensional, and four-dimensional data chunks, thus transferring data types specific to specific applications on the alignment chosen by the programmer without spending cycles transferring unnecessary data. (Source: Embedded.com)
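
The “shape” idea is easier to see in code. Below is a minimal sketch of a two-dimensional DMA descriptor plus a software model of what the engine does with it; the struct, field names, and values are hypothetical stand-ins, not any real controller's register map.

```c
#include <stdint.h>
#include <string.h>
#include <stdio.h>

/* Hypothetical 2D DMA descriptor: moves num_lines lines of line_bytes each,
   with independent source/destination strides so only the wanted sub-block
   is transferred. Burst size is included for completeness; a real engine
   would use it to pack bus transactions as densely as possible. */
typedef struct {
    const uint8_t *src;
    uint8_t       *dst;
    size_t         line_bytes;  /* bytes per line (inner dimension)     */
    size_t         num_lines;   /* lines per block (outer dimension)    */
    size_t         src_stride;  /* bytes between starts of source lines */
    size_t         dst_stride;  /* bytes between starts of dest lines   */
    size_t         burst_bytes; /* bus burst size per transaction       */
} dma_2d_desc;

/* Software model of the engine: copy line by line, skipping the padding
   implied by the strides, so no cycles go to data the app doesn't need. */
static void dma_2d_run(const dma_2d_desc *d)
{
    for (size_t line = 0; line < d->num_lines; ++line)
        memcpy(d->dst + line * d->dst_stride,
               d->src + line * d->src_stride,
               d->line_bytes);
}

int main(void)
{
    /* Example: pull a 4x8 tile out of a 16-byte-wide frame buffer.
       Bytes outside the tile are never read or transferred. */
    uint8_t frame[16 * 16], tile[4 * 8];
    for (size_t i = 0; i < sizeof frame; ++i)
        frame[i] = (uint8_t)i;

    dma_2d_desc d = {
        .src = frame, .dst = tile,
        .line_bytes = 8, .num_lines = 4,
        .src_stride = 16,  /* full frame line    */
        .dst_stride = 8,   /* packed tile line   */
        .burst_bytes = 8,  /* one line per burst */
    };
    dma_2d_run(&d);
    printf("tile starts %u..%u\n", tile[0], tile[7]);
    return 0;
}
```

The same descriptor idea extends to the three- and four-dimensional transfers the article mentions by nesting additional stride/count pairs.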

This section is followed by a look at “whether the core should move data from internal core memory or whether a DMA should be utilized in order to save power.” Other peripherals also come under discussion, with a good-sized section on I/O peripherals. There’s then a bit about polling – a no-no in terms of efficiency.
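
To make the polling point concrete, here's a generic contrast (not the article's code) between busy-polling and an interrupt-plus-sleep idiom. The status-register address is made up, and wfi is the ARM-style wait-for-interrupt instruction; substitute whatever your core and vendor headers actually provide.

```c
#include <stdint.h>
#include <stdbool.h>

#define RX_READY 0x1u
static volatile uint32_t *const UART_STATUS =
    (volatile uint32_t *)0x40001000u;   /* placeholder address */

/* Power-hungry: the core spins at full clock until data shows up. */
uint32_t wait_rx_polling(void)
{
    while ((*UART_STATUS & RX_READY) == 0)
        ;                               /* burns cycles the whole time */
    return *UART_STATUS;
}

static volatile bool rx_flag;           /* set from the UART ISR */

void uart_isr(void) { rx_flag = true; }

/* Power-friendly: the core sleeps until the peripheral interrupts it. */
uint32_t wait_rx_interrupt(void)
{
    while (!rx_flag)
        __asm volatile ("wfi");         /* core clock can gate off here */
    rx_flag = false;
    return *UART_STATUS;
}
```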

The article (and the series) concludes with a discussion on algorithmic power optimization which, according to Oshana and Kraeling, gives you the least power-savings bang for the work-required buck.

Algorithmic optimization covers optimization at the core application level, code structuring, data structuring (which in some cases could be considered data path optimization), data manipulation, and instruction selection.

There’s some info on the software pipelining technique (including code snippets) and on eliminating recursive procedure calls, plus some advice on reducing accuracy. (Yes, you heard right: perfection can be the enemy of good.) Too much accuracy can overcomplicate things and suck up more cycles without gaining you much of anything with respect to true precision. (Sort of like when kids carry out a division problem to a dozen decimal places…)
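
The article’s snippets are target-specific, so as a rough, generic illustration, here’s what those three ideas can look like in plain C (the function names and numbers are mine, not the authors’):

```c
#include <stdint.h>
#include <math.h>

/* 1. Software pipelining, done by hand here (compilers do the same thing):
   fetch the operands for iteration i+1 while the multiply-accumulate for
   iteration i is still in flight, overlapping the load and compute units. */
int32_t dot_pipelined(const int16_t *a, const int16_t *b, uint32_t n)
{
    if (n == 0)
        return 0;
    int32_t acc = 0;
    int16_t a0 = a[0], b0 = b[0];          /* prologue: prime the pipeline */
    for (uint32_t i = 1; i < n; ++i) {
        int16_t a1 = a[i], b1 = b[i];      /* load the next pair...        */
        acc += (int32_t)a0 * b0;           /* ...while using the current   */
        a0 = a1; b0 = b1;
    }
    return acc + (int32_t)a0 * b0;         /* epilogue: drain the pipeline */
}

/* 2. Eliminating a recursive procedure call: this iterative Fibonacci gives
   the same answer without per-call stack traffic and call overhead, so the
   core finishes sooner and can drop back into a low-power state. */
uint32_t fib_iter(uint32_t n)
{
    uint32_t a = 0, b = 1;
    while (n--) {
        uint32_t t = a + b;
        a = b;
        b = t;
    }
    return a;
}

/* 3. Reducing accuracy: single-precision sqrtf instead of double-precision
   sqrt. On a core without double-precision hardware, doubles are emulated
   in software at a large cycle (and power) cost, for digits the
   application may never need. */
float rms(const float *x, uint32_t n)
{
    float acc = 0.0f;
    for (uint32_t i = 0; i < n; ++i)
        acc += x[i] * x[i];
    return sqrtf(acc / (float)n);
}
```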

Overall, the Oshana-Kraeling series isn’t exactly beach reading, but I found it to be quite instructive.


—————————————————————————————————

The full series of articles is linked here:
Optimizing embedded software for power efficiency: Part 1 – Measuring power
Optimizing embedded software for power efficiency: Part 2 – Minimizing hardware power
Optimizing embedded software for power efficiency: Part 3 – Optimizing data flow and memory
Optimizing embedded software for power efficiency: Part 4 – Peripheral and algorithmic optimization

They are all excerpted from Oshana and Kraeling’s book, Software Engineering for Embedded Systems.