Tuesday, 13 May 2014

HammerPong #4: LED Strip Refresh Rate Headaches

So onwards and upwards (and back down the other side)... Time to have a go at making the LED strips that will provide the display surface for the game. I'd already decided that each game strip would be made up of three 5 metre, 150 LED strings mounted side by side, and I thought it might be nice to stagger the centre strip, placing each of its LEDs midway between the pairs of LEDs on either side. I hoped this would give an illusion of denser pixels and also make it easier to animate the chevron-style shapes I've been thinking about for the game.

I did some eBay surfing looking for suitable construction materials and found an adhesive-backed roll of 5mm thick, 75mm wide, solid neoprene rubber tape. This seemed perfect: I could mount the 3 LED strips side by side on the sticky side and then cover the remaining exposed adhesive and the LED strips with a layer of clear sticky tape. After a bit more searching I decided to try covering them with some signmakers' masking tape (100mm wide, low-tack adhesive paper tape) instead, since I thought this would give a nice diffusion of the LED colours.

Sticking down the LED strips went really well. I didn't remove their backing tape since there wasn't much point: they stuck down fine with the backing still in place, and it keeps a few options open if it all goes wrong. Must confess I did get in a bit of an angry mess with the paper tape, which was frustratingly difficult to lay down on top of the adhesive in one go and all too easy to crease or tear. I did end up with some joins, which I wasn't too happy about, but after applying a layer of clear sticky tape to the whole strip it didn't look so bad after all, and once I fired up one of the strips the effect was actually pretty good!

I applied the multiplexing approach I described before, using a modified version of the Adafruit Neopixel Library running on an Arduino Uno. I had replaced my original Toshiba 4514 multiplexer chip with a higher speed Texas Instruments CD74HC4514EN, and it seemed to work fine driving the 3 strips together. I had fun trying some nice particle system sketches before having a go at some graphics for the Hammer Pong game.

I tried animating a “puck” shooting along the strip, which all seemed to work fine.... except I was a little bit disappointed with the maximum speed I was getting. I could not see any problems in the code, so I got out the calculator:

150 pixels per strip
x 24 bits per pixel
x 3 strips
x ~1.25us per bit
= ~13.5ms to refresh all three strips
= ~74 frames per second


OK, for video, 74fps would be pretty good! However, moving the puck one pixel at a time means the fastest it can run the length of the 150 pixel strip is about 2 seconds. It gets even worse if we add in the other game strip: the maximum frame update rate would be halved and the distance doubled. That would mean about 8 seconds for the puck to reach the opposing player if it moved 1 pixel per frame. A bit slow. Hmmmmmm...
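The same back-of-envelope numbers can be reproduced with a few lines of C. This is only a sketch: the function names are made up for illustration, it assumes the nominal 1.25us bit time with 24 bits per pixel, and, like the calculation above, it ignores the reset gap between frames and any rendering time.

```c
#include <stdint.h>

/* Nominal WS2812 timing: 800 kbps => 1250 ns per bit, 24 bits per pixel.
   (Assumed nominal values; reset gap and rendering time are ignored.) */
#define WS2812_BIT_NS     1250u
#define WS2812_PIXEL_BITS 24u

/* Hypothetical helper: nanoseconds to clock out one frame when `strips`
   strings of `pixels` LEDs are loaded one after another (the multiplexed
   arrangement, where only one strip is driven at a time). */
uint64_t serial_frame_ns(uint32_t strips, uint32_t pixels)
{
    return (uint64_t)strips * pixels * WS2812_PIXEL_BITS * WS2812_BIT_NS;
}

/* Whole frames per second achievable at that frame time. */
uint32_t frames_per_second(uint64_t frame_ns)
{
    return (uint32_t)(1000000000ull / frame_ns);
}

/* Milliseconds for a puck stepping one pixel per frame to travel
   `distance_px` pixels. */
uint64_t traverse_ms(uint32_t distance_px, uint64_t frame_ns)
{
    return (uint64_t)distance_px * frame_ns / 1000000ull;
}
```

For three strings, `serial_frame_ns(3, 150)` comes out at 13,500,000ns, which is 74 whole frames per second and 2025ms for the puck to cross 150 pixels. For all six strings it is 27,000,000ns, 37 frames per second and 8100ms for 300 pixels.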

Yeah of course this is easily solved by making the puck image move more than one pixel between frames, which would be the usual way of doing things. The problem is that the LEDs are very bright - just like I want them to be - and persistence of vision effects make position jumps between the frames really obvious - you see the image frozen in several locations along the strip, spoiling the sense of fluid movement. I really want single pixel per frame motion to make it look smooth, dang!

So what to do about it? Well, the WS2812 protocol sets some basic restrictions. The data rate is fixed at 800kbps, so we cannot update faster than 1.25us per bit. Also, we have to refresh the entire strip at once due to the serial nature of the load operation (we can't just load the pixels that have changed and leave the others). So updating a 150 LED strip will always take a minimum of 4.5ms (150 pixels x 24 bits x 1.25us), and there is nothing we can do about that (other than cutting the strip into shorter lengths and addressing them separately maybe... but I don't want to go there!).

But we do have the possibility of loading the data to all the strips in parallel, so there is no specific reason why we can't load all 6 strips (three per game strip) in the same 4.5ms cycle. So what could stop us doing this?
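A minimal sketch of the difference between the two approaches, under the same assumptions as before (nominal 800kbps timing, hypothetical function names, reset gap ignored). When every strip gets its bit in the same 1.25us slot, the frame time depends only on the strip length, not on how many strips there are.

```c
#include <stdint.h>

#define WS2812_BIT_NS     1250u   /* 800 kbps, nominal */
#define WS2812_PIXEL_BITS 24u

/* One strip after another: total bits = strips * pixels * 24. */
uint64_t frame_ns_serial(uint32_t strips, uint32_t pixels)
{
    return (uint64_t)strips * pixels * WS2812_PIXEL_BITS * WS2812_BIT_NS;
}

/* All strips clocked together, one strip per bit of an 8 bit port:
   each 1.25 us slot carries one bit for every strip at once, so the
   strip count drops out (up to 8 strips on one port). */
uint64_t frame_ns_parallel(uint32_t pixels)
{
    return (uint64_t)pixels * WS2812_PIXEL_BITS * WS2812_BIT_NS;
}
```

With six strips of 150 pixels, `frame_ns_serial(6, 150)` gives 27ms while `frame_ns_parallel(150)` stays at 4.5ms.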

Well, firstly we will need to render all the data into a memory buffer before the strip update (at this data rate we will not have time to render images on the fly). Yikes... that's going to be a lot of memory by microcontroller standards: 6 strips x 150 pixels x 3 bytes per pixel = 2700 bytes, already more than the 2k of RAM on the ATmega328 microcontroller (sad face).

We could reduce this by using a lookup table (“palette”) of colour values and storing a palette index for each pixel instead of the 24 bit RGB colour. Let's say we have an 8 bit palette index (up to 256 colours), with perhaps 64 colours actually defined in that 8 bit colour space.

6 strips x 150 pixels x 1 byte per pixel = 900 bytes
plus palette: 64 colours x 3 bytes per colour = 192 bytes
= 1092 bytes total

This is much more doable, and we can save more memory by not storing the palette in RAM at all, e.g. by using PROGMEM data stored in the much more spacious 32k of flash. However, the next problem is processing speed...
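Here is a sketch of the two buffer layouts in C, just to check the sums with `sizeof`. All the names are hypothetical, the palette entries are illustrative, and the G, R, B byte order assumes the usual WS2812 ordering.

```c
#include <stdint.h>

#define NUM_STRIPS       6
#define PIXELS_PER_STRIP 150
#define PALETTE_SIZE     64

/* Option 1: full colour per pixel, 3 bytes each (G, R, B). */
typedef uint8_t rgb_frame_t[NUM_STRIPS][PIXELS_PER_STRIP][3];   /* 2700 bytes */

/* Option 2: one 8 bit palette index per pixel. */
typedef uint8_t indexed_frame_t[NUM_STRIPS][PIXELS_PER_STRIP];  /* 900 bytes */

/* The palette: 3 bytes (G, R, B) per colour = 192 bytes.
   On an AVR this could be marked PROGMEM (from <avr/pgmspace.h>) to keep
   it in flash and read back with pgm_read_byte(); it is a plain const
   array here so the sketch compiles with any C compiler. */
static const uint8_t palette[PALETTE_SIZE][3] = {
    { 0x00, 0x00, 0x00 },   /* index 0: off */
    { 0x00, 0xFF, 0x00 },   /* index 1: full red (G=0, R=255, B=0) */
    { 0xFF, 0x00, 0x00 },   /* index 2: full green */
    /* entries not listed are zero-initialised, i.e. off */
};

/* Return the G, R, B bytes for one pixel of an indexed frame.
   Indices beyond the defined palette fall back to entry 0 (off). */
const uint8_t *pixel_colour(indexed_frame_t frame, int strip, int pixel)
{
    uint8_t idx = frame[strip][pixel];
    return palette[idx < PALETTE_SIZE ? idx : 0];
}
```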

To be honest I have never had to be so concerned about performance at this level before, ever. But when we are talking about bit-banging at 800kHz every CPU cycle counts. For an ATmega328 running at 16MHz each clock cycle is 62.5ns, which gives just 20 clock cycles per 1.25us bit. Now that *is* pretty fast, but we have to bang these bits pretty fast too. Reading the assembly language code in the Neopixel library really shows how carefully the timing of this stuff needs to be managed.

However, if we can update a single strip by writing LOW and HIGH byte values to an 8 bit port register at this data rate, there is really no reason we cannot update 6 strips (or even 8, one for each port bit) at the same time without breaking a sweat... it's the same number of port writes, right? The extra overhead will be preparing the next port byte value, where we'll need to load data from 6 different memory addresses (one per strip) instead of just one, and might need a palette lookup for each one too.
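Preparing those port bytes is essentially a bit transpose: take the current colour byte for each strip and regroup the bits so that each port value carries one bit for every strip. Below is a plain C sketch of that step (function name hypothetical, MSB first as WS2812 expects, strip *s* assumed wired to port bit *s*). On the real hardware this work would have to fit into the gaps between bits, or be done ahead of time.

```c
#include <stdint.h>

#define NUM_STRIPS 6   /* strips on port bits 0..5; up to 8 would fit */

/* Turn one colour byte from each strip into the 8 successive values to
   put on the output port. out[0] carries every strip's most significant
   bit; bit s of each output value belongs to strip s. */
void transpose_to_port(const uint8_t in[NUM_STRIPS], uint8_t out[8])
{
    for (int b = 0; b < 8; b++) {
        uint8_t mask = (uint8_t)(0x80u >> b);   /* current bit, MSB first */
        uint8_t port = 0;
        for (int s = 0; s < NUM_STRIPS; s++) {
            if (in[s] & mask)
                port |= (uint8_t)(1u << s);     /* strip s sends a 1 */
        }
        out[b] = port;
    }
}
```

The inner loop is the "load from 6 different memory addresses" cost mentioned above, repeated for every byte of every pixel, which is why it gets tight on a 16MHz part.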

Reading this really useful blog post (http://cpldcpu.wordpress.com/2014/01/14/light_ws2812-library-v2-0-part-i-understanding-the-ws2812/), we do have up to 9us of idle time to play with between bits. This might make it all possible on an ATmega328 with some tight assembly language code, but you know what, maybe it's time to look at something with a bit more grunt. Like an ARM board....
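A common way to drive several strips from one port is three writes per bit: all strip lines high, then the data value (so strips sending a 0 drop low early while strips sending a 1 stay high), then all lines low. The sketch below assumes that scheme and that the port values have already been prepared with one bit per strip. It records the writes in a buffer so the sequence can be checked on a PC; the port stand-in and all names are hypothetical, and the cycle-counted delays a real implementation needs are left out.

```c
#include <stdint.h>
#include <stddef.h>

#define STRIP_MASK 0x3Fu   /* port bits 0..5 drive the six strips;
                              bits 6 and 7 are assumed unused */

/* Stand-in for the real output port: on hardware each write would go to
   the port register, followed by a cycle-counted delay. */
typedef struct {
    uint8_t writes[64];
    size_t  count;
} port_log_t;

static void port_write(port_log_t *rec, uint8_t value)
{
    if (rec->count < sizeof rec->writes)
        rec->writes[rec->count++] = value;
}

/* Clock 8 prepared port values (one bit of every strip per value,
   MSB first) out to all strips at once. */
void send_port_bits(port_log_t *rec, const uint8_t bits[8])
{
    for (int b = 0; b < 8; b++) {
        port_write(rec, STRIP_MASK);                     /* every strip high        */
        /* ...wait for the short "0" high time... */
        port_write(rec, (uint8_t)(bits[b] & STRIP_MASK)); /* 0-strips go low        */
        /* ...wait until the longer "1" high time has elapsed... */
        port_write(rec, 0x00);                           /* every strip low         */
        /* ...low part of the bit: the slack where the next value could be
           prepared, as long as it stays under the WS2812 reset threshold */
    }
}
```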

I've ordered an Arduino Due as a start. I may yet try to get it working on the Uno, but the comparatively huge amount of memory and much faster CPU speed of the Due does cure a few headaches. Just need to wait for it to arrive now - watch this space.....



  1. The BeagleBone Black's AM335x processor has a built-in feature for these kinds of problems. Inside the chip there are two 32-bit Programmable Real-time Units (PRUs) that run a small assembly instruction set designed for interfacing with hardware. Because they run alongside the main processor, you don't have to worry about them eating up clock cycles or missing a beat.

    There are already a few libraries out there for driving WS281x strips with these PRUs, with example projects driving over 500 meters of LED strips. See http://trmm.net/Category:LEDscape and http://www.nycresistor.com/2013/07/27/ledscape/

    1. Interesting to know... Thanks! I am not familiar with the BeagleBone but had looked at the PSoC 4 for similar reasons. On this project I have it driving 6 strips in parallel in one update cycle using a C language routine on an Arduino Due, although interrupts are disabled during updates, which caused some head scratching with the intended MIDI/serial comms.

    2. Yeah, trying to bit-bang multiple timing sensitive protocols on a single processor can get really tough.
      Because each of the processes has its own necessary running and idle time, you can usually fit in your own code during the breaks. When you have to deal with multiple protocols, though? Unless you can get them to synchronize, they end up overlapping and failing.

      Have you tried offloading the device communications to a separate device while keeping the Arduino for the big picture processing?
