DMA and fast GPIO

30 Sep 2010

Hi,

I have read the posts on fast GPIO and am looking for ideas on how to increase bandwidth when moving a block of data from SRAM through an 8-bit GPIO port to an external clocked device.  The previous posts are good for toggling IO bits and bytes but not for moving large amounts (tens of KBytes) of data.

A straightforward approach would be to write a byte to the GPIO port and then toggle a separate GPIO pin to clock the data into the external device.  This does not meet my performance needs (tens of MBytes/sec).  I've read the stuff on DMA but 1) are there any hooks to implement this with the mbed library and 2) how do I clock the external device or provide a synchronized clock enable (and use some other pin as a clock)?

 

Burt.

30 Sep 2010

Hi Burt and welcome.

Good question. First up: no, there isn't (as far as I know) any part of the Mbed library that connects the GPDMA controller to a GPIO pin. Probably with good reason, as you'll see when you read on...

The first problem you face is finding an 8-bit-wide port on the Mbed DIP pins. The only eight contiguous pins of any one port are P0.4 through P0.11 and, as you can see straight away, there's an alignment problem to overcome. The next problem is providing a "write strobe" pulse. The GPDMA doesn't connect to any GPIO pin to provide such an output, so the only way I can see to provide it is with a timer output or PWM output running at the same rate as the DMA moves the data. However, you'd then need a mechanism to sync the strobing pin to the DMA transfers, which is not easy.
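To make the alignment problem concrete, here is a minimal host-side sketch, assuming the bus sits on P0.4 to P0.11. The names `kBusShift`, `kBusMask`, `byte_to_port_bits` and `merge_into_port` are illustrative only and are not part of the Mbed library.

```cpp
#include <cstdint>

// Assumption: the 8-bit bus is wired to P0.4..P0.11, the contiguous run
// mentioned above. All names here are illustrative, not mbed library calls.
constexpr unsigned kBusShift = 4;
constexpr std::uint32_t kBusMask = 0xFFu << kBusShift;  // bits 11:4 = 0x0FF0

// Position one data byte on the bus pins.
constexpr std::uint32_t byte_to_port_bits(std::uint8_t value) {
    return static_cast<std::uint32_t>(value) << kBusShift;
}

// Merge the shifted byte into an existing port value, leaving other pins alone.
constexpr std::uint32_t merge_into_port(std::uint32_t port, std::uint8_t value) {
    return (port & ~kBusMask) | byte_to_port_bits(value);
}
```

Because the shifted byte straddles bits 7:0 and 15:8 of the port, a plain byte-wide DMA write cannot place it in one go. One option would be to pre-shift the data into 16-bit halfwords before the transfer, at the cost of doubling the buffer size.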

I'm interested to know though, what's "coming in" that needs such a high bandwidth out? Perhaps if you elaborate slightly on your application we might be able to think of alternative strategies that may be of more help.

--Andy

30 Sep 2010

Hi Andy,

I guess with the LPC1768 I don't need tens of MBytes/sec but only 6MBytes/sec.  I would still like a technique that scales to tens of MBytes/sec with the LPC1800.  I am downloading large opcode lists from the USB port into memory and then from memory either into Flash or through the 8-bit port to a control engine which is implemented in an FPGA for fast real time processing.  Later I want to move the saved file from Flash to the 8-bit port.

The LPC1768 USB can download data at 12Mbits/sec (1.5MBytes/sec) and an external SPI flash should be able to read data at 50Mbits/sec (6.25MBytes/sec) using an SSP port.  I would like to move the data at least at this rate to the 8-bit port.  If not, then my only other high speed option is using the SSP port to move the data, but I was planning on using the second SSP port to communicate status back and forth with the FPGA.

The LPC1800 is claiming HS USB speeds (480Mbits/sec), and I would really like to move data at a 20-30MByte/sec rate using the same architecture and protocol for the FPGA's sake.  I am looking at the high end version of the product to move 20-30MBytes/sec.  The FPGA already can handle the higher rate in either configuration.
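The figures quoted above are raw line rates divided by eight. A small sketch of that arithmetic follows; `mbit_to_mbyte` is a hypothetical helper name, and protocol overhead (USB framing, SPI command bytes) is deliberately ignored, so real throughput will be lower.

```cpp
// Convert a raw line rate in Mbit/s to MByte/s (8 bits per byte, no overhead).
constexpr double mbit_to_mbyte(double mbit_per_s) {
    return mbit_per_s / 8.0;
}

// Rates mentioned in the post above.
constexpr double kFullSpeedUsb = mbit_to_mbyte(12.0);   // 1.5 MByte/s
constexpr double kSpiFlashRead = mbit_to_mbyte(50.0);   // 6.25 MByte/s
constexpr double kHighSpeedUsb = mbit_to_mbyte(480.0);  // 60 MByte/s
```

On these numbers the 20-30MByte/sec target is below the raw HS USB ceiling, while the 8-bit port on the LPC1768 only has to keep up with the 6.25MByte/sec flash read rate.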

Burt.

30 Sep 2010

Burt,

In my project I just completed a DMA -> SSP -> Flash system and it works well. I didn't need speed, so I tamed the SSP clock to 10Mb/sec. The write time of the flash is the limiting factor for me at that point (reading the flash is fine, I get sustained throughput with that). Your best bet for a sustained data rate is probably to use SSP -> FPGA at max clock with 16-bit transfers under DMA. But an FPGA is always likely to outstrip you speed-wise.
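For the 16-bit transfer idea, the byte stream has to be packed two bytes per SSP frame. A minimal sketch, assuming the first byte goes in the upper half of the frame (the FPGA side would have to agree on that order); `pack_frame` and `frames_needed` are hypothetical names.

```cpp
#include <cstddef>
#include <cstdint>

// Pack two consecutive data bytes into one 16-bit frame.
// Assumption: the first byte occupies the upper 8 bits of the frame.
constexpr std::uint16_t pack_frame(std::uint8_t first, std::uint8_t second) {
    return static_cast<std::uint16_t>((first << 8) | second);
}

// Number of 16-bit frames needed for a block; an odd trailing byte
// still costs a whole frame and needs padding.
constexpr std::size_t frames_needed(std::size_t bytes) {
    return (bytes + 1) / 2;
}
```

Halving the number of frames also halves the number of DMA transfers needed for a given block.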

I have always wondered how Sky managed to get a data stream from a geostationary satellite to my TV faster than I can get it across the same PCB ;) Just kidding. But looks like you're going to have to scratch your head a bit.

--Andy

30 Sep 2010

Andy,

I read that the GPIO ports are byte addressable.  I can dedicate port 2 bits 7:0 to my 8-bit port even though only 6 of the bits are connected to the mbed pins.  In my "real" design, all those pins will be accessible.  Can the PortOut do byte writes?  If not, what is the best way for me to test it out?
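As a host-side model of what a byte write to bits 7:0 of a port does, here is a small sketch. `write_lane0` is a hypothetical name; on the real part the write would go to the byte-wide alias of the port register, which the device header is expected to expose as something like `LPC_GPIO2->FIOPIN0` (worth checking in LPC17xx.h before relying on it).

```cpp
#include <cstdint>

// Model of a byte-wide write to lane 0 (bits 7:0) of a 32-bit port register:
// only the low byte changes, bits 31:8 keep their previous value.
constexpr std::uint32_t write_lane0(std::uint32_t port, std::uint8_t value) {
    return (port & ~std::uint32_t{0xFF}) | value;
}
```

This is why dedicating bits 7:0 of one port to the bus is attractive: no shifting or masking is needed per byte, unlike a bus that starts at bit 4.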

Burt.

01 Oct 2010 . Edited: 01 Oct 2010

Burt,

Ah, I didn't realise you're putting down the LPC1768 and will have all the pins available. OK, so PortOut... you need the speed, so this may be the point where the Mbed libraries get in the way. Let me give you a demo. Right now I'm working on SD cards, which means one of my Mbeds is on the logic analyzer with a connection to p8 (P0.6), so I can do some tests.

The following mini-program is designed to bang Port0 using DMA. For simplicity I just send 16 bytes from the buffer directly at the port. Using the LA I can see that the time between edges is 60ns, which gives a throughput of more than 16MB/sec. The mini demo program also shows how to bash a port using memory-to-memory DMA transfers.
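The throughput figure follows from the edge spacing, assuming each edge of the alternating 0xFF/0x00 pattern marks one byte written to the port. A sketch of the arithmetic (`bytes_per_second` is a hypothetical helper):

```cpp
// One byte is written per edge, so throughput is the reciprocal of the
// time between edges. Input in nanoseconds, result in bytes per second.
constexpr double bytes_per_second(double ns_per_byte) {
    return 1e9 / ns_per_byte;
}

// 60 ns between edges, as measured on the logic analyzer.
constexpr double kMeasuredRate = bytes_per_second(60.0);  // about 16.7 million bytes/s
```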

Have a play :)

regards,

--Andy

 

#include "mbed.h"

#define DMA_CHANNEL_ENABLE      1
#define DMA_TRANSFER_TYPE_M2M   (0UL << 11)   
#define DMA_CHANNEL_TCIE        (1UL << 31)
#define DMA_CHANNEL_SRC_INC     (1UL << 26)
#define DMA_MASK_IE             (1UL << 14)
#define DMA_MASK_ITC            (1UL << 15)

DigitalOut myled1(LED1);
DigitalOut myled2(LED2);
DigitalOut myled3(LED3);
DigitalOut myled4(LED4);
DigitalOut P20(p20);

int main() {
    char buffer[256];
    
    /* Prepare to trigger the LA. */
    P20 = 0;
    
    /* Ensure the LEDs are off. */
    myled1 = 0;
    myled2 = 0;
    myled3 = 0;
    myled4 = 0;
    
    /* Produce an alternating bit pattern to flood
       to P0. Fill the whole buffer so no part of it
       is left uninitialised. */
    for (unsigned i = 0; i < sizeof(buffer); i += 2) {
        buffer[i] = 0xFF;
        buffer[i+1] = 0;
    }
    
    /* Setup P0.6 (p8) and P0.7 (p7) as outputs. */
    LPC_PINCON->PINSEL0 &= ~(3UL << 12);
    LPC_GPIO0->FIODIR   |=  (1UL <<  6);
    LPC_PINCON->PINSEL0 &= ~(3UL << 14);
    LPC_GPIO0->FIODIR   |=  (1UL <<  7);

    /* Power up the GPDMA. */
    LPC_SC->PCONP |= (1UL << 29);
    LPC_GPDMA->DMACConfig = 1;
    LPC_GPDMA->DMACIntTCClear = 0x1;
    NVIC_EnableIRQ(DMA_IRQn);
       
    /* Prep Channel0 to send the first 16 bytes of the buffer to P0.
       Source and destination widths are left at 0 (byte-wide). */
    LPC_GPDMACH0->DMACCSrcAddr  = (uint32_t)buffer;
    LPC_GPDMACH0->DMACCDestAddr = (uint32_t)&LPC_GPIO0->FIOPIN;
    LPC_GPDMACH0->DMACCLLI      = 0;
    LPC_GPDMACH0->DMACCControl  = DMA_CHANNEL_TCIE | DMA_CHANNEL_SRC_INC | 16;

    /* Trigger the LA. */
    P20 = 1;
    
    /* Fire GPDMA Channel0 */
    LPC_GPDMACH0->DMACCConfig = DMA_CHANNEL_ENABLE | 
                                DMA_TRANSFER_TYPE_M2M |
                                DMA_MASK_IE |
                                DMA_MASK_ITC;
    
    while(1) {
        myled1 = 1;
        wait(0.2);
        myled1 = 0;
        wait(0.2);
    }
}

/** DMA_IRQHandler
 */
extern "C" void DMA_IRQHandler(void) __irq {
    P20 = 0;
    if (LPC_GPDMA->DMACIntStat & 1) {
        if (LPC_GPDMA->DMACIntTCStat & 1) {
            myled3 = 1;    
            LPC_GPDMA->DMACIntTCClear = 1;
        }
        if (LPC_GPDMA->DMACIntErrStat & 1) {
            myled4 = 1;
            LPC_GPDMA->DMACIntErrClr = 1;
        }
    }
}
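The control word written to DMACCControl in the listing can be built from named fields. The sketch below runs on a host and only mirrors the bits used above (transfer size in the low 12 bits, source increment at bit 26, terminal count interrupt at bit 31), plus the destination increment at bit 27 as I understand the LPC17xx layout. `dma_control` and `fits_one_descriptor` are hypothetical names.

```cpp
#include <cstdint>

constexpr std::uint32_t kTransferSizeMask = 0xFFFu;    // bits 11:0
constexpr std::uint32_t kSrcIncrement     = 1u << 26;  // step through the buffer
constexpr std::uint32_t kDstIncrement     = 1u << 27;  // leave clear for a fixed port address
constexpr std::uint32_t kTerminalCountIrq = 1u << 31;  // interrupt when the count reaches zero

// Build a control word for byte-wide transfers (width and burst fields left at 0).
constexpr std::uint32_t dma_control(std::uint32_t transfers, bool src_inc,
                                    bool dst_inc, bool tc_irq) {
    return (transfers & kTransferSizeMask)
         | (src_inc ? kSrcIncrement : 0u)
         | (dst_inc ? kDstIncrement : 0u)
         | (tc_irq ? kTerminalCountIrq : 0u);
}

// The size field is 12 bits wide, so one descriptor moves at most 4095 items.
constexpr bool fits_one_descriptor(std::uint32_t transfers) {
    return transfers <= kTransferSizeMask;
}
```

For the tens of KBytes mentioned at the start of the thread, a single descriptor is not enough, so the transfer would need to be chained with linked list items (the DMACCLLI register, set to 0 in the listing) or restarted from the interrupt handler.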

16 Mar 2012

Hi Andy,

After reading some of your replies, I think you may be the right person to answer this. You tested the GPDMA -> SSP -> Flash link, while I am trying the opposite direction: FPGA (not on the mbed) -> SSP (on the mbed) -> GPDMA (on the mbed) -> USB DMA (on the mbed) -> USB (Android-powered device).

In my application, raw data acquired from a DAQ system (24 bits * at least 128 channels per block, at a 2k sample rate) will be transmitted into the mbed through the SPI interface and passed on over USB to an Android-powered device using the Android Accessory Protocol.

Since the data comes from an FPGA rather than an SD card, the SSP interface (SPI in TI mode) seems to offer higher performance for a continuous bit stream. My questions are: 1. What data rate did you get in your test, and which mbed peripheral did you use? Do you have more code to share with us?

2. In my app, the mbed will act as the master and the DAQ system with the FPGA will be the slave. Do you have any suggestions for the following situation?

When I tested the AndroidAccessory example from the cookbook, I found that it does not implement a real USB host. For simplicity it uses a trick, going through the serial-USB port so that all data is transferred under the serial protocol instead of the USB protocol. Samuel Mokrani tested USBSerial and told me that so far he has never reached more than 580 kBytes/s on the USB bus (this maximum was reached with USBSerial). He recommends activating DMA in the USB peripheral of the mbed's LPC1768.

I suppose that USB host support is still under development for the mbed and that no reliable, stable library is available for direct use yet. Furthermore, UART0, one of the LPC1768 UARTs, is connected to the USB port on the mbed board, and a library that maps serial data to USB data is available in the mbed library. That is why the trick was used, for simplicity.

If a stable library were built on the mbed side, it might be possible to reach a higher data rate such as 777.216 kBytes per second (24 bits * 128 channels * 2024 samples per second), or perhaps just a few modifications to USBSerial could achieve this goal.
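The required rate can be checked with a few constants. This is only the arithmetic from the figures in this post; the constant names are made up for illustration.

```cpp
#include <cstdint>

constexpr std::uint32_t kBytesPerSample = 24 / 8;   // 24-bit samples
constexpr std::uint32_t kChannels       = 128;
constexpr std::uint32_t kSampleRateHz   = 2024;

// Payload the link has to sustain, in bytes and in bits per second.
constexpr std::uint32_t kRequiredBytesPerSec =
    kBytesPerSample * kChannels * kSampleRateHz;            // 777,216
constexpr std::uint32_t kRequiredBitsPerSec = kRequiredBytesPerSec * 8;  // 6,217,728

// Best USBSerial figure reported above.
constexpr std::uint32_t kReportedUsbSerialBytesPerSec = 580000;
```

So the SSP side needs a clock of a little over 6.2 Mbit/s before any framing overhead, and the USB side needs roughly a third more throughput than the 580 kBytes/s reported for USBSerial.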

In short, if I want to implement a data link from the SSP bus to USB at a rate of at least 777.216 kBytes per second, what would you recommend? Is there any code you can point me to?

Best regards,

Quan