Library allowing up to 16 strings of 60 WS2811 or WS2812 LEDs to be driven from a single FRDM-KL25Z board. It uses hardware DMA to run at the full 800 kHz bit rate without much CPU burden.

Dependents:   Multi_WS2811_test

After being frustrated by the SPI system's performance, I ended up using an approach inspired by Paul Stoffregen's OctoWS2811. This uses 3 of the 4 DMA channels triggered by the TPM0 timer PWM and overflow events.

This design will allow for up to 16 strings of up to 60 (limited by RAM space) WS2811/WS2812 LEDs to be driven on a single port. Adding more strings takes the same time to DMA, because the bits are output in parallel.
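The RAM limit mentioned above can be estimated with a short sketch. It uses the per-LED figures from the comments at the top of WS2811.cpp (96 bytes of bit data plus 96 bytes of all-1's data per LED position, and 3 malloc'd bytes per LED per string). The function names are hypothetical, and the few padding words around the DMA buffers are ignored.

```cpp
#include <cstddef>

// Hypothetical helper names; the figures come from the comments in WS2811.cpp.
constexpr std::size_t kBitsPerLed = 24;        // 8 bits each for G, R and B
constexpr std::size_t kBytesPerPortWord = 4;   // one 32-bit port word per bit time

// Static DMA buffers: one word per bit for the data plus one word per bit
// of all-1's. They are shared by every string, so only the longest string
// length matters here, not the number of strings.
constexpr std::size_t staticDmaBytes(std::size_t maxLedsPerStrip)
{
    return 2 * kBitsPerLed * kBytesPerPortWord * maxLedsPerStrip;
}

// Heap usage: 3 bytes (G, R, B) per LED for every string.
constexpr std::size_t pixelBytes(std::size_t strips, std::size_t ledsPerStrip)
{
    return 3 * strips * ledsPerStrip;
}

// Total estimate, excluding stack, padding words and other program data.
constexpr std::size_t totalBytes(std::size_t strips, std::size_t ledsPerStrip)
{
    return staticDmaBytes(ledsPerStrip) + pixelBytes(strips, ledsPerStrip);
}
```

For 16 strings of 60 LEDs, totalBytes(16, 60) comes to 14400 bytes, which leaves little of the KL25Z's 16K SRAM for the stack and everything else.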

Here is my test program:

Import program: Multi_WS2811_test

Test program for my Multi_WS2811 library that started out as a fork of heroic/WS2811. My library uses hardware DMA on the FRDM-KL25Z to drive up to 16 strings of WS2811 or WS2812 LEDs in parallel.

Here's 60 LEDs on a single string, at 10% brightness: https://www.icloud.com/sharedalbum/#B015oqs3qeGdFY

Note though that the 3.3V output from the FRDM-KL25Z's GPIO pins is OUT OF SPEC for driving the 5V WS2812 inputs, which require 3.5V for a logic HIGH signal. It only works on my board if I don't connect my scope or logic analyzer to the output pin. I recommend that you add a 5V buffer to the outputs to properly drive the LED strings. I added a CD4504 to do the 3.3 to 5V translation (mostly because I had one). You could use (say) a 74HCT244 to do 8 strings.

Each LED in a string takes 24/800e3 seconds (30 usec) to DMA, so if MAX_LEDS_PER_STRIP is set to 60, it takes 1.8 msec to actually do the DMA, plus 64 usec of guard time, or about 1.86 msec per frame (roughly 536 frames/second). Of course, actually composing the frame will take most of the time in a real program.
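The frame-time arithmetic can be written out as a small sketch. It uses integer nanoseconds to avoid rounding surprises; the function names are hypothetical, and the guard time is passed in as a parameter rather than assumed.

```cpp
#include <cstdint>

constexpr std::uint32_t kBitTimeNsec = 1250;   // one bit at 800 kHz
constexpr std::uint32_t kBitsPerLed  = 24;     // 8 bits each for G, R and B

// Time to shift out one frame, in nanoseconds: 24 bit times per LED,
// plus the guard (latch) time that follows the data.
constexpr std::uint64_t frameTimeNsec(std::uint32_t leds, std::uint32_t guardUsec)
{
    return std::uint64_t(leds) * kBitsPerLed * kBitTimeNsec
           + std::uint64_t(guardUsec) * 1000u;
}

// Upper bound on the frame rate if composing the frame took no time at all.
constexpr double framesPerSecond(std::uint32_t leds, std::uint32_t guardUsec)
{
    return 1e9 / static_cast<double>(frameTimeNsec(leds, guardUsec));
}
```

Because all strings are clocked out in parallel, the result depends only on the LEDs per string, not on how many strings are attached.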

The way I have my code set up, I can use up to 8 pins on PORTD. However, changing the IO_PORT and IO_GPIO definitions near the top of WS2811.cpp will change the selected port (the port clock enabled in clock_init() must match).

Alternatively, you could use another port to get more strings. Watch out for pin mux conflicts, though.

Here are your choices:

  • PORTE: 15 total: PTE0-PTE5, PTE20-PTE25, PTE29-PTE31
  • PORTD: 8 total: PTD0-PTD7
  • PORTC: 16 total: PTC0-PTC13, PTC16-17
  • PORTB: 16 total: PTB0-PTB11, PTB16-19
  • PORTA: 15 total: PTA0-PTA5, PTA12-PTA20
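The pin ranges in this list can be turned into bit masks to check a pin choice before wiring anything up. This is only a sketch: the helper names are hypothetical and the masks are built purely from the ranges listed above, not from the chip's reference manual.

```cpp
#include <cstdint>

// Mask with bits first..last (inclusive) set.
constexpr std::uint32_t pinRange(unsigned first, unsigned last)
{
    return (last >= 31 ? 0xFFFFFFFFu : ((1u << (last + 1)) - 1u))
           & ~((1u << first) - 1u);
}

// Pins listed for each port in the text above.
constexpr std::uint32_t kPortEPins = pinRange(0, 5) | pinRange(20, 25) | pinRange(29, 31);
constexpr std::uint32_t kPortDPins = pinRange(0, 7);
constexpr std::uint32_t kPortCPins = pinRange(0, 13) | pinRange(16, 17);
constexpr std::uint32_t kPortBPins = pinRange(0, 11) | pinRange(16, 19);
constexpr std::uint32_t kPortAPins = pinRange(0, 5) | pinRange(12, 20);

// Number of set bits, i.e. how many strings the port could drive.
constexpr unsigned countPins(std::uint32_t mask)
{
    return mask == 0 ? 0u : (mask & 1u) + countPins(mask >> 1);
}

// True if the given pin number is one of the listed pins for that port.
constexpr bool pinAvailable(std::uint32_t portMask, unsigned pin)
{
    return pin < 32 && ((portMask >> pin) & 1u) != 0;
}
```

A mask like this says nothing about pin mux conflicts; a pin that exists may still be claimed by another peripheral in your program.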

Here is how the DMA channels are interleaved:

[Figure: timing diagram of the interleaved DMA channels, /media/uploads/bikeNomad/ws2812.png]

The way I have it set up to generate the three phases of the required waveform is this:

I have timer TPM0 set up to generate events at overflow (OVF), at 250 nsec (CH0), and at 650 nsec (CH1). At 1250 nsec it resets to 0.
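As a quick check of these timings, here is a sketch of the conversion from nanoseconds to timer ticks, assuming the 48 MHz unprescaled timer clock that WS2811.cpp uses. It mirrors the NSEC_TO_TICKS macro in that file; the constexpr names are hypothetical.

```cpp
#include <cstdint>

constexpr std::uint32_t kTimerClockMHz = 48;   // TPM0 clock, no prescaling

// Same integer arithmetic as NSEC_TO_TICKS in WS2811.cpp (truncates).
constexpr std::uint32_t nsecToTicks(std::uint32_t nsec)
{
    return nsec * kTimerClockMHz / 1000u;
}

// Inverse conversion, also truncating, to see what a tick count really means.
constexpr std::uint32_t ticksToNsec(std::uint32_t ticks)
{
    return ticks * 1000u / kTimerClockMHz;
}

constexpr std::uint32_t kPeriodTicks = nsecToTicks(1250); // full bit time
constexpr std::uint32_t kCh0Ticks    = nsecToTicks(250);  // CH0 compare
constexpr std::uint32_t kCh1Ticks    = nsecToTicks(650);  // CH1 compare
```

The 650 nsec point does not land on a whole tick (it is 31.2 ticks), so the integer division truncates it to 31 ticks, a little under 650 nsec.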

At timer count = 0, DMA0 fires, because it's triggered by TPM0's overflow (OVF) event. This results in the data lines being driven to a constant "1" level, as the data that DMA0 is programmed to transfer is a single, all-1's word. (This is the easiest way to explain what is happening, and it is how I had wanted it to work. To get it to work, though, I had to hold the 1's in a RAM buffer as large as the one holding the data bits, which uses up precious RAM.)

At 250 nsec, DMA1 fires, because it's triggered by TPM0's CH0 compare event. This drives either a 0 or 1 level to the pins, because DMA1 is programmed to transfer our data bytes to the pins.

At 650 nsec, DMA2 fires, because it's triggered by TPM0's CH1 compare event. This results in the data lines being driven to a constant "0" level, as the data that DMA2 is programmed to transfer is a single, all-0's word.

At 1250 nsec, the timer resets to 0, and the whole cycle repeats.
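The cycle described above can be modelled in plain C++ without any hardware. This is a sketch, not the library's code: packByte follows the same idea as WS2811::writeByte (one port word per bit, one pin per string), while BitSlot, threePhase and highTimeNsec are hypothetical names used only to show what each pin sees.

```cpp
#include <cstdint>
#include <vector>

// The three port writes that make up one 1250 nsec bit time.
struct BitSlot {
    std::uint32_t atOverflow; // t = 0: every pin driven high
    std::uint32_t atCh0;      // t = 250 nsec: pins carrying a 0 bit go low
    std::uint32_t atCh1;      // t = 650 nsec: every pin driven low
};

// Set or clear one string's pin in eight consecutive port words, MSB first.
void packByte(std::uint8_t value, std::uint32_t pinMask, std::uint32_t* words)
{
    for (std::uint8_t bit = 0x80; bit != 0; bit >>= 1, ++words) {
        if (value & bit)
            *words |= pinMask;
        else
            *words &= ~pinMask;
    }
}

// Expand packed data words into the sequence of writes the port would see.
std::vector<BitSlot> threePhase(const std::vector<std::uint32_t>& dataWords)
{
    std::vector<BitSlot> slots;
    slots.reserve(dataWords.size());
    for (std::uint32_t w : dataWords)
        slots.push_back({0xFFFFFFFFu, w, 0u});
    return slots;
}

// High time one pin sees during a slot: 650 nsec for a 1 bit, 250 for a 0 bit.
unsigned highTimeNsec(const BitSlot& slot, std::uint32_t pinMask)
{
    if ((slot.atOverflow & pinMask) == 0)
        return 0;
    return (slot.atCh0 & pinMask) ? 650u : 250u;
}
```

Packing 0xA5 onto pin 2 and 0x0F onto pin 3 of the same eight words shows why extra strings cost no extra DMA time: both bytes share the same eight slots, and each pin simply sees its own pattern of long and short pulses.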

Because this library uses three of timer TPM0's six channels (and sets TPM0 to 800 kHz), you will need to select TPM1 or TPM2 output pins if you want to use PwmOut pins in your program (for instance, for RC servos, which want a 50 Hz frequency). If you just want to change discrete LED brightnesses, you can use TPM0's CH3, CH4, or CH5 pins. Just make sure that you set up your PwmOut instance at the same 800 kHz frequency.

Here is a table showing the assignment of timer resources to PwmOut capable pins in the FRDM-KL25Z:

KL25Z pin   Arduino name    Timer   Channel
PTA3        -               TPM0    CH0
PTC1        A5              TPM0    CH0
PTD0        D10             TPM0    CH0
PTE24       -               TPM0    CH0
PTA4        D4              TPM0    CH1
PTC2        A4              TPM0    CH1
PTD1        D13/LED_BLUE    TPM0    CH1
PTE25       -               TPM0    CH1
PTA5        D5              TPM0    CH2
PTC3        -               TPM0    CH2
PTD2        D11             TPM0    CH2
PTE29       -               TPM0    CH2
PTC4        -               TPM0    CH3
PTD3        D12             TPM0    CH3
PTE30       -               TPM0    CH3
PTC8        D6              TPM0    CH4
PTD4        D2              TPM0    CH4
PTE31       -               TPM0    CH4
PTA0        -               TPM0    CH5
PTC9        D7              TPM0    CH5
PTD5        D9              TPM0    CH5
PTE26       -               TPM0    CH5
PTA12       D3              TPM1    CH0
PTB0        A0              TPM1    CH0
PTE20       -               TPM1    CH0
PTA13       D8              TPM1    CH1
PTB1        A1              TPM1    CH1
PTE21       -               TPM1    CH1
PTA1        D0/USBRX        TPM2    CH0
PTB18       LED_RED         TPM2    CH0
PTB2        A2              TPM2    CH0
PTE22       -               TPM2    CH0
PTA2        D1/USBTX        TPM2    CH1
PTB19       LED_GREEN       TPM2    CH1
PTB3        A3              TPM2    CH1
PTE23       -               TPM2    CH1
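The table above can be encoded as a lookup to check a PwmOut pin choice. This is a sketch under the assumption, taken from the text, that TPM0 CH0 to CH2 are reserved by the library, TPM0 CH3 to CH5 only work at the library's 800 kHz rate, and TPM1/TPM2 pins are free to run at any frequency. The type and function names are hypothetical.

```cpp
#include <cstring>

enum class PwmUse { NotPwm, TakenByLibrary, FixedAt800kHz, AnyFrequency };

struct PwmPin { const char* pin; int tpm; int channel; };

// Pin, timer and channel, copied from the table above.
static const PwmPin kPwmPins[] = {
    {"PTA3", 0, 0},  {"PTC1", 0, 0},  {"PTD0", 0, 0}, {"PTE24", 0, 0},
    {"PTA4", 0, 1},  {"PTC2", 0, 1},  {"PTD1", 0, 1}, {"PTE25", 0, 1},
    {"PTA5", 0, 2},  {"PTC3", 0, 2},  {"PTD2", 0, 2}, {"PTE29", 0, 2},
    {"PTC4", 0, 3},  {"PTD3", 0, 3},  {"PTE30", 0, 3},
    {"PTC8", 0, 4},  {"PTD4", 0, 4},  {"PTE31", 0, 4},
    {"PTA0", 0, 5},  {"PTC9", 0, 5},  {"PTD5", 0, 5}, {"PTE26", 0, 5},
    {"PTA12", 1, 0}, {"PTB0", 1, 0},  {"PTE20", 1, 0},
    {"PTA13", 1, 1}, {"PTB1", 1, 1},  {"PTE21", 1, 1},
    {"PTA1", 2, 0},  {"PTB18", 2, 0}, {"PTB2", 2, 0}, {"PTE22", 2, 0},
    {"PTA2", 2, 1},  {"PTB19", 2, 1}, {"PTB3", 2, 1}, {"PTE23", 2, 1},
};

// Classify a pin name according to the rules stated in the lead-in.
PwmUse classifyPwmPin(const char* name)
{
    for (const PwmPin& p : kPwmPins) {
        if (std::strcmp(p.pin, name) == 0) {
            if (p.tpm != 0)
                return PwmUse::AnyFrequency;          // TPM1 or TPM2
            return p.channel >= 3 ? PwmUse::FixedAt800kHz
                                  : PwmUse::TakenByLibrary;
        }
    }
    return PwmUse::NotPwm;                            // not in the table
}
```

For an RC servo you would look for a pin that classifies as AnyFrequency, for example PTA12 (D3) on TPM1.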

Committer: bikeNomad
Date: Sat Jan 04 00:40:08 2014 +0000
Child: 1:86a910560879
Commit message: Initial revision of library.

Changed in this revision

Colors.cpp
Colors.h
LedStrip.cpp
LedStrip.h
WS2811.cpp
WS2811.h
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/Colors.cpp	Sat Jan 04 00:40:08 2014 +0000
@@ -0,0 +1,90 @@
+#include <math.h>
+#include <mbed.h>
+#include "Colors.h"
+
+void HSBtoRGB(float hue, float saturation, float brightness, uint8_t *pr, uint8_t *pg, uint8_t *pb)
+{
+    uint8_t r = 0, g = 0, b = 0;
+    if (saturation == 0) {
+        r = g = b = (uint8_t) (brightness * 255.0f + 0.5f);
+    } else {
+        float h = (hue - (float)floor(hue)) * 6.0f;
+        float f = h - (float)floor(h);
+        float p = brightness * (1.0f - saturation);
+        float q = brightness * (1.0f - saturation * f);
+        float t = brightness * (1.0f - (saturation * (1.0f - f)));
+        switch ((int) h) {
+            case 0:
+                r = (int) (brightness * 255.0f + 0.5f);
+                g = (int) (t * 255.0f + 0.5f);
+                b = (int) (p * 255.0f + 0.5f);
+                break;
+            case 1:
+                r = (int) (q * 255.0f + 0.5f);
+                g = (int) (brightness * 255.0f + 0.5f);
+                b = (int) (p * 255.0f + 0.5f);
+                break;
+            case 2:
+                r = (int) (p * 255.0f + 0.5f);
+                g = (int) (brightness * 255.0f + 0.5f);
+                b = (int) (t * 255.0f + 0.5f);
+                break;
+            case 3:
+                r = (int) (p * 255.0f + 0.5f);
+                g = (int) (q * 255.0f + 0.5f);
+                b = (int) (brightness * 255.0f + 0.5f);
+                break;
+            case 4:
+                r = (int) (t * 255.0f + 0.5f);
+                g = (int) (p * 255.0f + 0.5f);
+                b = (int) (brightness * 255.0f + 0.5f);
+                break;
+            case 5:
+                r = (int) (brightness * 255.0f + 0.5f);
+                g = (int) (p * 255.0f + 0.5f);
+                b = (int) (q * 255.0f + 0.5f);
+                break;
+        }
+    }
+    *pr = r;
+    *pg = g;
+    *pb = b;
+}
+
+float* RGBtoHSB(uint8_t r, uint8_t g, uint8_t b, float* hsbvals)
+{
+    float hue, saturation, brightness;
+    if (!hsbvals) {
+        hsbvals = new float[3];
+    }
+    uint8_t cmax = (r > g) ? r : g;
+    if (b > cmax) cmax = b;
+    uint8_t cmin = (r < g) ? r : g;
+    if (b < cmin) cmin = b;
+
+    brightness = ((float) cmax) / 255.0f;
+    if (cmax != 0)
+        saturation = ((float) (cmax - cmin)) / ((float) cmax);
+    else
+        saturation = 0;
+    if (saturation == 0)
+        hue = 0;
+    else {
+        float redc = ((float) (cmax - r)) / ((float) (cmax - cmin));
+        float greenc = ((float) (cmax - g)) / ((float) (cmax - cmin));
+        float bluec = ((float) (cmax - b)) / ((float) (cmax - cmin));
+        if (r == cmax)
+            hue = bluec - greenc;
+        else if (g == cmax)
+            hue = 2.0f + redc - bluec;
+        else
+            hue = 4.0f + greenc - redc;
+        hue = hue / 6.0f;
+        if (hue < 0)
+            hue = hue + 1.0f;
+    }
+    hsbvals[0] = hue;
+    hsbvals[1] = saturation;
+    hsbvals[2] = brightness;
+    return hsbvals;
+}
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/Colors.h	Sat Jan 04 00:40:08 2014 +0000
@@ -0,0 +1,51 @@
+#include <mbed.h>
+
+#ifndef __included_colors_h
+#define __included_colors_h
+
+/**
+ * Converts the components of a color, as specified by the HSB
+ * model, to an equivalent set of values for the default RGB model.
+ * <p>
+ * The <code>saturation</code> and <code>brightness</code> components
+ * should be floating-point values between zero and one
+ * (numbers in the range 0.0-1.0).  The <code>hue</code> component
+ * can be any floating-point number.  The floor of this number is
+ * subtracted from it to create a fraction between 0 and 1.  This
+ * fractional number is then multiplied by 360 to produce the hue
+ * angle in the HSB color model.
+ * <p>
+ * The result is written through the three output pointers
+ * <code>pr</code>, <code>pg</code>, and <code>pb</code> as 8-bit red,
+ * green, and blue components (0-255 each). None of the pointers may
+ * be null, since each is dereferenced unconditionally.
+ * @param     hue   the hue component of the color
+ * @param     saturation   the saturation of the color
+ * @param     brightness   the brightness of the color
+ * @param     pr   receives the red component
+ * @param     pg   receives the green component
+ * @param     pb   receives the blue component
+ */
+void HSBtoRGB(float hue, float saturation, float brightness, uint8_t *pr, uint8_t *pg, uint8_t *pb);
+
+/**
+ * Converts the components of a color, as specified by the default RGB
+ * model, to an equivalent set of values for hue, saturation, and
+ * brightness that are the three components of the HSB model.
+ * <p>
+ * If the <code>hsbvals</code> argument is <code>null</code>, then a
+ * new array is allocated to return the result. Otherwise, the method
+ * returns the array <code>hsbvals</code>, with the values put into
+ * that array.
+ * @param     r   the red component of the color
+ * @param     g   the green component of the color
+ * @param     b   the blue component of the color
+ * @param     hsbvals  the array used to return the
+ *                     three HSB values, or <code>null</code>
+ * @return    an array of three elements containing the hue, saturation,
+ *                     and brightness (in that order), of the color with
+ *                     the indicated red, green, and blue components.
+ */
+float* RGBtoHSB(uint8_t r, uint8_t g, uint8_t b, float* hsbvals);
+
+#endif
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/LedStrip.cpp	Sat Jan 04 00:40:08 2014 +0000
@@ -0,0 +1,80 @@
+#include "LedStrip.h"
+
+LedStrip::LedStrip(int n)
+{
+   // Allocate 3 bytes per pixel:
+    numLEDs = n;
+    pixels = (uint8_t *)malloc(numPixelBytes());
+    if (pixels) {
+        memset(pixels, 0x00, numPixelBytes()); // Init to RGB 'off' state
+    }
+}
+
+LedStrip::~LedStrip()
+{
+    free(pixels);
+}
+ 
+uint32_t LedStrip::total_luminance(void)
+{
+    uint32_t running_total;
+    running_total = 0;
+    for (int i=0; i< numPixelBytes(); i++)
+        running_total += pixels[i];
+    return running_total;
+}
+
+// Convert R,G,B to combined 32-bit color
+uint32_t LedStrip::Color(uint8_t r, uint8_t g, uint8_t b)
+{
+    // Pack the three 8-bit components into the low 24 bits of a word,
+    // in the G, R, B order used for the pixel buffer: green in bits 16-23,
+    // red in bits 8-15, blue in bits 0-7.
+    return ((uint32_t)g << 16) | ((uint32_t)r << 8) | (uint32_t)b;
+}
+
+// store the rgb component in our array
+void LedStrip::setPixelColor(uint16_t n, uint8_t r, uint8_t g, uint8_t b)
+{
+    if (n >= numLEDs) return; // '>=' because arrays are 0-indexed
+
+    pixels[n*3  ] = g;
+    pixels[n*3+1] = r;
+    pixels[n*3+2] = b;
+}
+
+void LedStrip::setPixelR(uint16_t n, uint8_t r)
+{
+    if (n >= numLEDs) return; // '>=' because arrays are 0-indexed
+
+    pixels[n*3+1] = r;
+}
+
+void LedStrip::setPixelG(uint16_t n, uint8_t g)
+{
+    if (n >= numLEDs) return; // '>=' because arrays are 0-indexed
+
+    pixels[n*3] = g;
+}
+
+void LedStrip::setPixelB(uint16_t n, uint8_t b)
+{
+    if (n >= numLEDs) return; // '>=' because arrays are 0-indexed
+
+    pixels[n*3+2] = b;
+}
+
+void LedStrip::setPackedPixels(uint8_t * buffer, uint32_t n)
+{
+    if (n > numLEDs) return; // n is a pixel count, so n == numLEDs (the whole strip) is valid
+    memcpy(pixels, buffer, (size_t) (n*3));
+}
+
+void LedStrip::setPixelColor(uint16_t n, uint32_t c)
+{
+    if (n >= numLEDs) return; // '>=' because arrays are 0-indexed
+
+    pixels[n*3  ] = (c >> 16);
+    pixels[n*3+1] = (c >>  8);
+    pixels[n*3+2] =  c;
+}
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/LedStrip.h	Sat Jan 04 00:40:08 2014 +0000
@@ -0,0 +1,44 @@
+// Parent class for all addressable LED strips.
+// Partially based on work by and (c) 2011 Jelmer Tiete
+// whose library is ported from the Arduino implementation of Adafruit Industries
+// found at: http://github.com/adafruit/LPD8806
+// and their strips: http://www.adafruit.com/products/306
+// Released under the MIT License: http://mbed.org/license/mit
+
+// This is a pure virtual parent class for all LED strips, so that different types
+// of strip may be used in a single array or container.
+
+#include "mbed.h"
+
+#ifndef LEDSTRIP_H
+#define LEDSTRIP_H
+
+class LedStrip
+{
+public:
+    LedStrip(int n);
+    virtual ~LedStrip(); // virtual so that deleting through a LedStrip* is safe
+
+    virtual void begin(void)=0;
+    virtual void show(void)=0;
+    virtual void blank(void)=0;
+
+    static uint32_t Color(uint8_t r, uint8_t g, uint8_t b);
+
+    uint16_t numPixels(void) { return numLEDs; }
+    uint16_t numPixelBytes(void) { return numLEDs * 3; }
+    uint32_t total_luminance(void);
+
+    void setPixelB(uint16_t n, uint8_t b);
+    void setPixelG(uint16_t n, uint8_t g);
+    void setPixelR(uint16_t n, uint8_t r);
+    
+    void setPixelColor(uint16_t n, uint32_t c);
+    void setPixelColor(uint16_t n, uint8_t r, uint8_t g, uint8_t b);
+    void setPackedPixels(uint8_t * buffer, uint32_t n);
+
+protected:
+    uint8_t *pixels;     // Holds LED color values
+    uint16_t numLEDs;     // Number of RGB LEDs in strand
+};
+#endif
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/WS2811.cpp	Sat Jan 04 00:40:08 2014 +0000
@@ -0,0 +1,399 @@
+// 800 kHz WS2811 driver driving potentially many LED strings.
+// Uses 3-phase DMA
+// 16K SRAM less stack, etc.
+//
+// Per LED: 3 bytes (malloc'd) for RGB data
+//
+// Per LED position (shared by all strips; each bit is one 32-bit port word):
+//          96 bytes (static) for bit data
+//        + 96 bytes (static) for ones data
+//        = 192 bytes
+//
+//        40 LEDs max per string = 7680 bytes static
+//
+//        40 LEDs total (1 string):  7680 + 40*3 = 7800 bytes
+//        80 LEDs total (2 strings): 7680 + 80*3 = 7920 bytes
+
+#include "MKL25Z4.h"
+#include "LedStrip.h"
+#include "WS2811.h"
+
+//
+// Configuration
+//
+
+// Define MONITOR_TPM0_PWM as non-zero to monitor PWM timing on PTD0 and PTD1
+// PTD0 TPM0/CH0 PWM_1 J2/06
+// PTD1 TPM0/CH1 PWM_2 J2/12 (also LED_BLUE)
+#define MONITOR_TPM0_PWM 0
+
+// define DEBUG_PIN to identify a pin in PORTD used for debug output
+// #define DEBUG_PIN 4 /* PTD4 debugOut */
+
+#ifdef DEBUG_PIN
+#define DEBUG 1
+#endif
+
+#if DEBUG
+#define DEBUG_MASK (1<<DEBUG_PIN)
+#define RESET_DEBUG (IO_GPIO->PDOR &= ~DEBUG_MASK)
+#define SET_DEBUG (IO_GPIO->PDOR |= DEBUG_MASK)
+#else
+#define DEBUG_MASK 0
+#define RESET_DEBUG (void)0
+#define SET_DEBUG (void)0
+#endif
+
+static PORT_Type volatile * const IO_PORT = PORTD;
+static GPIO_Type volatile * const IO_GPIO = PTD;
+
+// 48 MHz clock, no prescaling.
+#define NSEC_TO_TICKS(nsec) ((nsec)*48/1000)
+#define USEC_TO_TICKS(usec) ((usec)*48)
+static const uint32_t CLK_NSEC = 1250;
+static const uint32_t tpm_period    = NSEC_TO_TICKS(CLK_NSEC);
+static const uint32_t tpm_p0_period = NSEC_TO_TICKS(250);
+static const uint32_t tpm_p1_period = NSEC_TO_TICKS(650);
+static const uint32_t guardtime_period = USEC_TO_TICKS(55);   // guardtime minimum 50 usec.
+
+enum DMA_MUX_SRC {
+    DMA_MUX_SRC_TPM0_CH_0     = 24,
+    DMA_MUX_SRC_TPM0_CH_1,
+    DMA_MUX_SRC_TPM0_Overflow = 54,
+};
+
+enum DMA_CHAN {
+    DMA_CHAN_START = 0,
+    DMA_CHAN_0_LOW = 1,
+    DMA_CHAN_1_LOW = 2,
+    N_DMA_CHANNELS
+};
+
+volatile bool WS2811::dma_done = true;
+
+// class static
+bool WS2811::initialized = false;
+
+// class static
+uint32_t WS2811::enabledPins = 0;
+
+#define WORD_ALIGNED __attribute__ ((aligned(4)))
+
+#define DMA_LEADING_ZEROS  2
+#define BITS_PER_RGB       24
+#define DMA_TRAILING_ZEROS 1
+
+static struct {
+    uint32_t start_t1_low[ DMA_LEADING_ZEROS ];
+    uint32_t dmaWords[ BITS_PER_RGB * MAX_LEDS_PER_STRIP ];
+    uint32_t trailing_zeros_1[ DMA_TRAILING_ZEROS ];
+
+    uint32_t start_t0_high[ DMA_LEADING_ZEROS - 1 ];
+    uint32_t allOnes[ BITS_PER_RGB * MAX_LEDS_PER_STRIP ];
+    uint32_t trailing_zeros_2[ DMA_TRAILING_ZEROS + 1 ];
+} dmaData WORD_ALIGNED;
+
+// class static
+void WS2811::hw_init()
+{
+    if (initialized) return;
+
+    dma_data_init();
+    clock_init();
+    dma_init();
+    io_init();
+    tpm_init();
+
+    initialized = true;
+
+    SET_DEBUG;
+    RESET_DEBUG;
+}
+
+// class static
+void WS2811::dma_data_init()
+{
+    memset(dmaData.allOnes, 0xFF, sizeof(dmaData.allOnes));
+
+#if DEBUG
+    for (unsigned i = 0; i < BITS_PER_RGB * MAX_LEDS_PER_STRIP; i++)
+        dmaData.dmaWords[i] = DEBUG_MASK;
+#endif
+}
+
+// class static
+
+/// Enable PORTD, DMA and TPM0 clocking
+void WS2811::clock_init()
+{
+    SIM->SCGC5 |= SIM_SCGC5_PORTD_MASK;
+    SIM->SCGC6 |= SIM_SCGC6_DMAMUX_MASK | SIM_SCGC6_TPM0_MASK; // Enable clock to DMA mux and TPM0
+    SIM->SCGC7 |= SIM_SCGC7_DMA_MASK;  // Enable clock to DMA
+
+    SIM->SOPT2 |= SIM_SOPT2_TPMSRC(1); // Clock source: MCGFLLCLK or MCGPLLCLK
+}
+
+// class static
+
+/// Configure GPIO output pins
+void WS2811::io_init()
+{
+    uint32_t m = 1;
+    for (uint32_t i = 0; i < 32; i++) {
+        // set up each pin
+        if (m & enabledPins) {
+            IO_PORT->PCR[i] = PORT_PCR_MUX(1) // GPIO
+                              | PORT_PCR_DSE_MASK; // high drive strength
+        }
+        m <<= 1;
+    }
+
+    IO_GPIO->PDDR |= enabledPins;      // set as outputs
+
+#if MONITOR_TPM0_PWM
+    // PTD0 CH0 monitor: TPM0, high drive strength
+    IO_PORT->PCR[0] = PORT_PCR_MUX(4) | PORT_PCR_DSE_MASK;
+    // PTD1 CH1 monitor: TPM0, high drive strength
+    IO_PORT->PCR[1] = PORT_PCR_MUX(4) | PORT_PCR_DSE_MASK;
+    IO_GPIO->PDDR  |= 3;               // set as outputs
+    IO_GPIO->PDOR &= ~(enabledPins | 3);     // initially low
+#else
+    IO_GPIO->PDOR &= ~enabledPins;     // initially low
+#endif
+
+#if DEBUG
+    IO_PORT->PCR[DEBUG_PIN] = PORT_PCR_MUX(1) | PORT_PCR_DSE_MASK;
+    IO_GPIO->PDDR |= DEBUG_MASK;
+    IO_GPIO->PDOR &= ~DEBUG_MASK;
+#endif
+}
+
+// class static
+
+/// Configure DMA and DMAMUX
+void WS2811::dma_init()
+{
+    // reset DMAMUX
+    DMAMUX0->CHCFG[DMA_CHAN_START] = 0;
+    DMAMUX0->CHCFG[DMA_CHAN_0_LOW] = 0;
+    DMAMUX0->CHCFG[DMA_CHAN_1_LOW] = 0;
+
+    // wire our DMA event sources into the first three DMA channels
+    // t=0: all enabled outputs go high on TPM0 overflow
+    DMAMUX0->CHCFG[DMA_CHAN_START] = DMAMUX_CHCFG_ENBL_MASK | DMAMUX_CHCFG_SOURCE(DMA_MUX_SRC_TPM0_Overflow);
+    // t=tpm_p0_period: all of the 0 bits go low.
+    DMAMUX0->CHCFG[DMA_CHAN_0_LOW] = DMAMUX_CHCFG_ENBL_MASK | DMAMUX_CHCFG_SOURCE(DMA_MUX_SRC_TPM0_CH_0);
+    // t=tpm_p1_period: all outputs go low.
+    DMAMUX0->CHCFG[DMA_CHAN_1_LOW] = DMAMUX_CHCFG_ENBL_MASK | DMAMUX_CHCFG_SOURCE(DMA_MUX_SRC_TPM0_CH_1);
+
+    NVIC_SetVector(DMA0_IRQn, (uint32_t)&DMA0_IRQHandler);
+    NVIC_EnableIRQ(DMA0_IRQn);
+}
+
+// class static
+
+/// Configure TPM0 to do two different PWM periods at 800kHz rate
+void WS2811::tpm_init()
+{
+    // set up TPM0 for proper period (800 kHz = 1.25 usec ±600nsec)
+    TPM_Type volatile *tpm = TPM0;
+    tpm->SC = TPM_SC_DMA_MASK          // enable DMA
+              | TPM_SC_TOF_MASK        // reset TOF flag if set
+              | TPM_SC_CMOD(0)         // disable clocks
+              | TPM_SC_PS(0);          // 48MHz / 1 = 48MHz clock
+    tpm->MOD = tpm_period - 1;         // 48MHz / 800kHz
+
+    // No Interrupts; High True pulses on Edge Aligned PWM
+    tpm->CONTROLS[0].CnSC = TPM_CnSC_MSB_MASK | TPM_CnSC_ELSB_MASK | TPM_CnSC_DMA_MASK;
+    tpm->CONTROLS[1].CnSC = TPM_CnSC_MSB_MASK | TPM_CnSC_ELSB_MASK | TPM_CnSC_DMA_MASK;
+
+    // set TPM0 channel 0 to fire at tpm_p0_period (250 nsec):
+    // pins carrying a 0 bit go low here (0 code high time)
+    tpm->CONTROLS[0].CnV = tpm_p0_period;
+
+    // set TPM0 channel 1 to fire at tpm_p1_period (650 nsec):
+    // all pins go low here (1 code high time)
+    tpm->CONTROLS[1].CnV = tpm_p1_period;
+
+    NVIC_SetVector(TPM0_IRQn, (uint32_t)&TPM0_IRQHandler);
+    NVIC_EnableIRQ(TPM0_IRQn);
+}
+
+WS2811::WS2811(unsigned n, unsigned pinNumber)
+    : LedStrip(n)
+    , pinMask(1U << pinNumber)
+{
+    enabledPins |= pinMask;
+    initialized = false;
+}
+
+// class static
+void WS2811::startDMA()
+{
+    hw_init();
+    
+    wait_for_dma_done();
+    dma_done = false;
+
+    DMA_Type volatile * dma   = DMA0;
+    TPM_Type volatile *tpm   = TPM0;
+    uint32_t nBytes = sizeof(dmaData.start_t1_low)
+                      + sizeof(dmaData.dmaWords)
+                      + sizeof(dmaData.trailing_zeros_1);
+
+    tpm->SC = TPM_SC_DMA_MASK        // enable DMA
+              | TPM_SC_TOF_MASK  // reset TOF flag if set
+              | TPM_SC_CMOD(0)   // disable clocks
+              | TPM_SC_PS(0);    // 48MHz / 1 = 48MHz clock
+    tpm->MOD = tpm_period - 1;       // 48MHz / 800kHz
+
+    tpm->CNT = tpm_p0_period - 2 ;
+    tpm->STATUS = 0xFFFFFFFF;
+
+    dma->DMA[DMA_CHAN_START].DSR_BCR = DMA_DSR_BCR_DONE_MASK; // clear/reset DMA status
+    dma->DMA[DMA_CHAN_0_LOW].DSR_BCR = DMA_DSR_BCR_DONE_MASK; // clear/reset DMA status
+    dma->DMA[DMA_CHAN_1_LOW].DSR_BCR = DMA_DSR_BCR_DONE_MASK; // clear/reset DMA status
+
+    // t=0: all outputs go high
+    // triggered by TPM0_Overflow
+    // source is one word of 0 then 24 x 0xffffffff, then another 0 word
+    dma->DMA[DMA_CHAN_START].SAR     = (uint32_t)(void*)dmaData.start_t0_high;
+    dma->DMA[DMA_CHAN_START].DSR_BCR = DMA_DSR_BCR_BCR_MASK & nBytes; // length of transfer in bytes
+
+    // t=tpm_p0_period: some outputs (the 0 bits) go low.
+    // Triggered by TPM0_CH0
+    // Start 2 words before the actual data to avoid garbage pulses.
+    dma->DMA[DMA_CHAN_0_LOW].SAR     = (uint32_t)(void*)dmaData.start_t1_low; // set source address
+    dma->DMA[DMA_CHAN_0_LOW].DSR_BCR = DMA_DSR_BCR_BCR_MASK & nBytes; // length of transfer in bytes
+
+    // t=tpm_p1_period: all outputs go low.
+    // Triggered by TPM0_CH1
+    // source is constant 0x00000000 (first word of start_t1_low; no source increment)
+    dma->DMA[DMA_CHAN_1_LOW].SAR     = (uint32_t)(void*)dmaData.start_t1_low; // set source address
+    dma->DMA[DMA_CHAN_1_LOW].DSR_BCR = DMA_DSR_BCR_BCR_MASK & nBytes; // length of transfer in bytes
+
+    dma->DMA[DMA_CHAN_0_LOW].DAR
+    = dma->DMA[DMA_CHAN_1_LOW].DAR
+      = dma->DMA[DMA_CHAN_START].DAR
+        = (uint32_t)(void*)&IO_GPIO->PDOR;
+
+    SET_DEBUG;
+
+    dma->DMA[DMA_CHAN_0_LOW].DCR     = DMA_DCR_EINT_MASK // enable interrupt on end of transfer
+                                       | DMA_DCR_ERQ_MASK
+                                       | DMA_DCR_D_REQ_MASK // clear ERQ on end of transfer
+                                       | DMA_DCR_SINC_MASK // increment source each transfer
+                                       | DMA_DCR_CS_MASK
+                                       | DMA_DCR_SSIZE(0) // 32-bit source transfers
+                                       | DMA_DCR_DSIZE(0); // 32-bit destination transfers
+
+    dma->DMA[DMA_CHAN_1_LOW].DCR     = DMA_DCR_EINT_MASK // enable interrupt on end of transfer
+                                       | DMA_DCR_ERQ_MASK
+                                       | DMA_DCR_D_REQ_MASK // clear ERQ on end of transfer
+                                       | DMA_DCR_CS_MASK
+                                       | DMA_DCR_SSIZE(0) // 32-bit source transfers
+                                       | DMA_DCR_DSIZE(0); // 32-bit destination transfers
+
+    dma->DMA[DMA_CHAN_START].DCR     = DMA_DCR_EINT_MASK // enable interrupt on end of transfer
+                                       | DMA_DCR_ERQ_MASK
+                                       | DMA_DCR_D_REQ_MASK // clear ERQ on end of transfer
+                                       | DMA_DCR_SINC_MASK // increment source each transfer
+                                       | DMA_DCR_CS_MASK
+                                       | DMA_DCR_SSIZE(0) // 32-bit source transfers
+                                       | DMA_DCR_DSIZE(0);
+
+    tpm->SC |= TPM_SC_CMOD(1);         // enable internal clocking
+}
+
+void WS2811::writePixel(unsigned n, uint8_t *p)
+{
+    uint32_t *dest = dmaData.dmaWords + n * BITS_PER_RGB;
+    writeByte(*p++, pinMask, dest + 0); // G
+    writeByte(*p++, pinMask, dest + 8); // R
+    writeByte(*p, pinMask, dest + 16); // B
+}
+
+// class static
+void WS2811::writeByte(uint8_t byte, uint32_t mask, uint32_t *dest)
+{
+    for (uint8_t bm = 0x80; bm; bm >>= 1) {
+        // MSBit first
+        if (byte & bm)
+            *dest |= mask;
+        else
+            *dest &= ~mask;
+        dest++;
+    }
+}
+
+void WS2811::begin()
+{
+    blank();
+    show();
+}
+
+void WS2811::blank()
+{
+    memset(pixels, 0x00, numPixelBytes());
+
+#if DEBUG
+    for (unsigned i = DMA_LEADING_ZEROS; i < DMA_LEADING_ZEROS + BITS_PER_RGB; i++)
+        dmaData.dmaWords[i] = DEBUG_MASK;
+#else
+    memset(dmaData.dmaWords, 0x00, sizeof(dmaData.dmaWords));
+#endif
+}
+
+void WS2811::show()
+{
+
+    uint16_t i, n = numPixels(); // 3 bytes per LED
+    uint8_t *p = pixels;
+
+    for (i=0; i<n; i++ ) {
+        writePixel(i, p);
+        p += 3;
+    }
+}
+
+extern "C" void DMA0_IRQHandler()
+{
+    DMA_Type volatile *dma = DMA0;
+    TPM_Type volatile *tpm = TPM0;
+
+    uint32_t db;
+
+    db = dma->DMA[DMA_CHAN_0_LOW].DSR_BCR;
+    if (db & DMA_DSR_BCR_DONE_MASK) {
+        dma->DMA[DMA_CHAN_0_LOW].DSR_BCR = DMA_DSR_BCR_DONE_MASK; // clear/reset DMA status
+    }
+
+    db = dma->DMA[DMA_CHAN_1_LOW].DSR_BCR;
+    if (db & DMA_DSR_BCR_DONE_MASK) {
+        dma->DMA[DMA_CHAN_1_LOW].DSR_BCR = DMA_DSR_BCR_DONE_MASK; // clear/reset DMA status
+    }
+
+    db = dma->DMA[DMA_CHAN_START].DSR_BCR;
+    if (db & DMA_DSR_BCR_DONE_MASK) {
+        dma->DMA[DMA_CHAN_START].DSR_BCR = DMA_DSR_BCR_DONE_MASK; // clear/reset DMA status
+    }
+
+    tpm->SC = TPM_SC_TOF_MASK;  // reset TOF flag; disable internal clocking
+
+    SET_DEBUG;
+
+    // set TPM0 to interrupt after guardtime
+    tpm->MOD = guardtime_period - 1; // 48MHz * 55 usec
+    tpm->CNT = 0;
+    tpm->SC  = TPM_SC_PS(0)        // 48MHz / 1 = 48MHz clock
+               | TPM_SC_TOIE_MASK  // enable interrupts
+               | TPM_SC_CMOD(1);   // and internal clocking
+}
+
+extern "C" void TPM0_IRQHandler()
+{
+    TPM0->SC = 0; // disable internal clocking
+    TPM0->SC = TPM_SC_TOF_MASK;        
+    RESET_DEBUG;
+    WS2811::dma_done = true;
+}
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/WS2811.h	Sat Jan 04 00:40:08 2014 +0000
@@ -0,0 +1,63 @@
+// Mbed library to control WS2801-based RGB LED Strips
+// some portions (c) 2011 Jelmer Tiete
+// This library is ported from the Arduino implementation of Adafruit Industries
+// found at: http://github.com/adafruit/LPD8806
+// and their strips: http://www.adafruit.com/products/306
+// Released under the MIT License: http://mbed.org/license/mit
+//
+/*****************************************************************************/
+
+// Heavily modified by Jas Strong, 2012-10-04
+// Changed to use a virtual base class and to use software SPI.
+//
+// Modified by Ned Konz, December 2013.
+// Using three-phase DMA a la Paul Stoffregen's version.
+
+#ifndef MBED_WS2811_H
+#define MBED_WS2811_H
+
+#include "mbed.h"
+#include "LedStrip.h"
+
+#define MAX_LEDS_PER_STRIP 60
+
+extern "C" void DMA0_IRQHandler();
+extern "C" void TPM0_IRQHandler();
+
+class WS2811 : public LedStrip
+{
+public:
+    WS2811(unsigned n, unsigned pinNumber);
+
+    virtual void begin();
+    virtual void show();
+    virtual void blank();
+
+    static void startDMA();
+
+private:
+    uint32_t pinMask;
+
+    void writePixel(unsigned n, uint8_t *p);
+
+    // Class Static:
+
+    static bool initialized;
+    static uint32_t enabledPins;
+    static volatile bool dma_done;
+    static void wait_for_dma_done() { while (!dma_done) __WFI(); }
+
+    static void writeByte(uint8_t byte, uint32_t mask, uint32_t *dest);
+
+    static void hw_init();
+        static void io_init();
+        static void clock_init();
+        static void dma_init();
+        static void tpm_init();
+        static void dma_data_init();
+        
+    friend void TPM0_IRQHandler();
+};
+
+#endif
+