memcpy.s and DMA



Post by manitou » Thu Jun 04, 2015 12:14 pm

I did some experiments with memcpy() on the pyboard (168 MHz). The firmware uses memcpy() in the interpreter and in various support services, e.g. memory copies for flash. memcpy() is implemented in C in lib/libc/string0.c. If the source and destination are both word-aligned, a loop copies 32-bit words (ldr.w/str.w); otherwise it copies bytes (ldrb/strb). I measured copy rates with C code that I added to the firmware, using sys_tick_get_microseconds() and 1024-byte arrays aligned on a multiple of 8 bytes. Times are in microseconds (us) and rates are in megabits/second (mbs).

Code:

     test                rate (mbs)  time (us)  notes
     loop set               148.95       55     my own C tests, bytes
     loop copy              148.95       55     my own C tests, bytes
     word loop copy         744.73       11     my own C tests, words (32-bit)
     memset                 819.20       10     firmware
     memcpy                 512.00       16     firmware, word-aligned
     unaligned memcpy       167.02       49     firmware
     memcpy from flash      455.11       18     firmware, word-aligned
     memcpys               1365.33        6     newlib, unrolled loop, best case
     DMA                   1024.00        8     memory-to-memory, 32-bit words
     DMA from flash         819.20       10     0x08004000
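
To make the word-versus-byte difference in the table concrete, here is a minimal sketch of a copy routine that picks the word loop only when both pointers are 4-byte aligned. It is not the firmware's string0.c source; the name copy_words_or_bytes is made up for illustration, and the sketch assumes a compiler that tolerates the type-punned 32-bit accesses, as firmware builds usually do.

```c
#include <stddef.h>
#include <stdint.h>

// Hypothetical illustration, not the actual memcpy() from lib/libc/string0.c.
void *copy_words_or_bytes(void *dst, const void *src, size_t n) {
    unsigned char *d = dst;
    const unsigned char *s = src;

    if ((((uintptr_t)d | (uintptr_t)s) & 3u) == 0) {
        // Both pointers are 4-byte aligned: move one 32-bit word per iteration.
        uint32_t *dw = (uint32_t *)d;
        const uint32_t *sw = (const uint32_t *)s;
        for (; n >= 4; n -= 4) {
            *dw++ = *sw++;
        }
        d = (unsigned char *)dw;
        s = (const unsigned char *)sw;
    }

    // Unaligned input, or the 0-3 byte tail left over from the word loop.
    while (n--) {
        *d++ = *s++;
    }
    return dst;
}
```

The word loop does a quarter as many iterations as the byte loop, which is in line with the gap between the aligned and unaligned rows above.
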
The table includes memcpys(), an ARM assembler version of memcpy() from newlib (an embedded C library used on other MCUs). The newlib version uses a loop unrolled 8 times, copying 8 bytes per step (ldrd/strd). I added memcpy.s to the Makefile and to the stmhal directory.
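
The unrolled 8-byte copy can be sketched in C as follows. This is only an illustration of the idea, not newlib's assembler: the function name copy_unrolled64 is hypothetical, it assumes both pointers are 8-byte aligned and the length is a multiple of 64 bytes, and whether the compiler actually emits ldrd/strd for it depends on the toolchain and flags.

```c
#include <stddef.h>
#include <stdint.h>

// Hypothetical sketch of an x8 unrolled copy of 8-byte units.
// Assumes 8-byte aligned pointers; any n_bytes % 64 remainder is NOT copied
// and would have to be handled by the caller.
void copy_unrolled64(uint64_t *dst, const uint64_t *src, size_t n_bytes) {
    for (size_t blocks = n_bytes / 64; blocks > 0; blocks--) {
        // Eight 8-byte moves per loop iteration (64 bytes).
        dst[0] = src[0];
        dst[1] = src[1];
        dst[2] = src[2];
        dst[3] = src[3];
        dst[4] = src[4];
        dst[5] = src[5];
        dst[6] = src[6];
        dst[7] = src[7];
        dst += 8;
        src += 8;
    }
}
```

A 1024-byte buffer is exactly 16 such blocks, so the test arrays above never hit a remainder, which is why the table labels that row as best case.
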

I also added a test of DMA memory-to-memory copy and confirmed that the source was copied to the destination. Code below (edited to pass WORDS instead of BYTES to HAL_DMA_Start):

Code:

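// Memory-to-memory copy test using the pyboard's DMA2 controller, 32-bit transfers.
// Built as part of the stmhal firmware, which supplies the HAL definitions
// and sys_tick_get_microseconds().
#include <stdio.h>
#include <string.h>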

#define micros sys_tick_get_microseconds

#define BYTES 1024
#define WORDS (BYTES/sizeof(int))  // parenthesized so it expands safely inside larger expressions
char src[BYTES] __attribute__ ((aligned (8)));
char dst[BYTES] __attribute__ ((aligned (8)));
int *srcw = (int *) src, *dstw = (int *)dst;

unsigned int t1;

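// Print a label, the rate in megabits/second, and the elapsed time.
// bits per microsecond is numerically the same as megabits per second.
void prmbs(char *lbl,int us,int bits) {
    float mbs = (float)bits/us;
    printf("%s %.2f mbs %d us\n",lbl,(double)mbs,us);
}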


void tom() {
    int i;
    DMA_HandleTypeDef DMA_Handle;

    __DMA2_CLK_ENABLE();
    DMA_Handle.Instance = DMA2_Stream0;

    // Need to deinit DMA first
    DMA_Handle.State = HAL_DMA_STATE_READY;
    HAL_DMA_DeInit(&DMA_Handle);

    DMA_Handle.Init.Channel = DMA_CHANNEL_0;
    DMA_Handle.Init.Direction = DMA_MEMORY_TO_MEMORY;
    DMA_Handle.Init.PeriphInc = DMA_PINC_ENABLE;
    DMA_Handle.Init.MemInc = DMA_MINC_ENABLE;
    DMA_Handle.Init.PeriphDataAlignment = DMA_PDATAALIGN_WORD;
    DMA_Handle.Init.MemDataAlignment = DMA_MDATAALIGN_WORD;
    DMA_Handle.Init.Mode = DMA_NORMAL;
    DMA_Handle.Init.Priority = DMA_PRIORITY_HIGH;
    DMA_Handle.Init.FIFOMode = DMA_FIFOMODE_DISABLE;
    DMA_Handle.Init.FIFOThreshold = DMA_FIFO_THRESHOLD_HALFFULL;
//  DMA_Handle.Init.MemBurst = DMA_MBURST_SINGLE;
//   DMA_Handle.Init.PeriphBurst = DMA_PBURST_SINGLE;
    DMA_Handle.Init.MemBurst = DMA_MBURST_INC4;
    DMA_Handle.Init.PeriphBurst = DMA_PBURST_INC4;
    HAL_DMA_Init(&DMA_Handle);

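    int errs=0;
    memset(dst,0,BYTES);
    for (i=0;i<BYTES;i++) src[i] =  i;   // fill source with a repeating 0..255 pattern
    t1 = micros();
    // transfer length is counted in 32-bit data items (WORDS), not bytes
    HAL_DMA_Start(&DMA_Handle,(uint32_t)srcw,(uint32_t)dstw, WORDS);
    HAL_DMA_PollForTransfer(&DMA_Handle, HAL_DMA_FULL_TRANSFER , 2000);  // timeout in ms
    t1 = micros() -t1;
    prmbs("dma",t1,BYTES*8);
    // verify the copy; the cast keeps the check independent of char signedness
    for (i=0;i<BYTES;i++) if ((unsigned char)dst[i] !=  i%256)errs++;
    printf("errs %d\n",errs);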
}

EDIT:
Corrected the DMA code: HAL_DMA_Start must be called with WORDS, not BYTES. Results table updated.
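
The WORDS/BYTES mix-up comes from the transfer length being counted in data items of the configured width rather than in bytes, as far as I understand the STM32 DMA. With word-wide transfers, 1024 bytes is 256 items; passing 1024 would ask for 4096 bytes and run past both buffers. A small sketch of the conversion, where dma_item_count is a hypothetical helper and not part of the HAL:

```c
#include <stddef.h>

// Hypothetical helper (not a HAL function): number of DMA data items needed
// to move byte_count bytes when each item is item_size bytes wide.
// Returns 0 if item_size is 0 or the byte count is not a whole number of items.
size_t dma_item_count(size_t byte_count, size_t item_size) {
    if (item_size == 0 || byte_count % item_size != 0) {
        return 0;
    }
    return byte_count / item_size;
}
```

For the test above, dma_item_count(1024, sizeof(int)) gives the same value as WORDS on the pyboard, where int is 4 bytes.
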

Earlier results for DUE, maple, and teensy 3 are at
https://github.com/manitou48/DUEZoo/blo ... em2mem.txt
