memcpy.s and DMA
Posted: Thu Jun 04, 2015 12:14 pm
I did some experiments with memcpy() on the pyboard (168 MHz). The firmware uses memcpy() in the interpreter and for various support services, e.g. memory copies for flash. memcpy() is implemented in C in lib/libc/string0.c. If the source and destination are both word-aligned, a loop copies words (ldr.w/str.w); otherwise it copies bytes (ldrb/strb). I measured copy rates with C code I added to the firmware, using sys_tick_get_microseconds() and 1024-byte arrays aligned on a multiple of 8. Times are in microseconds (us), rates in megabits/second (mbs).
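To illustrate the word-versus-byte behaviour described above, here is a minimal sketch of an alignment-dispatching copy. It is not the code from lib/libc/string0.c; the name copy_words_or_bytes is made up for this example, and the pointer casts assume a compiler that tolerates this libc-style type punning.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: copy 32-bit words when both pointers are
 * word-aligned, otherwise fall back to a byte loop. */
void *copy_words_or_bytes(void *dst, const void *src, size_t n) {
    unsigned char *d = dst;
    const unsigned char *s = src;

    /* OR the addresses: both are 4-byte aligned only if the low two bits are 0. */
    if ((((uintptr_t)d | (uintptr_t)s) & 3u) == 0) {
        uint32_t *dw = (uint32_t *)d;
        const uint32_t *sw = (const uint32_t *)s;
        for (; n >= 4; n -= 4) {
            *dw++ = *sw++;          /* one word load, one word store */
        }
        d = (unsigned char *)dw;
        s = (const unsigned char *)sw;
    }

    /* Unaligned case, or the 0..3 bytes left after the word loop. */
    while (n--) {
        *d++ = *s++;
    }
    return dst;
}
```

The byte loop issues four times as many loads and stores for the same buffer, which is in line with the gap between the aligned and unaligned rows of the results.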
The table includes memcpys(), an ARM assembler version of memcpy() from newlib (an embedded C library used on other MCUs). The newlib version uses unrolled (x8) 8-byte copies (ldrd/strd). I added memcpy.s to the Makefile and to the stmhal directory.
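For readers who have not looked at the newlib source, the unrolling idea can be sketched in C roughly as follows. This is only an illustration of "eight 8-byte copies per loop iteration", not newlib's assembler; copy_unrolled64 is a made-up name, the pointers are assumed to be 8-byte aligned, and whether a compiler turns the 64-bit moves into ldrd/strd is not guaranteed.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch: both pointers are assumed to be 8-byte aligned. */
void *copy_unrolled64(void *dst, const void *src, size_t n) {
    uint64_t *d = dst;
    const uint64_t *s = src;

    /* Main loop: 8 x 8 bytes = 64 bytes per iteration. */
    while (n >= 64) {
        d[0] = s[0]; d[1] = s[1]; d[2] = s[2]; d[3] = s[3];
        d[4] = s[4]; d[5] = s[5]; d[6] = s[6]; d[7] = s[7];
        d += 8;
        s += 8;
        n -= 64;
    }

    /* Tail: whatever is left (fewer than 64 bytes), one byte at a time. */
    unsigned char *db = (unsigned char *)d;
    const unsigned char *sb = (const unsigned char *)s;
    while (n--) {
        *db++ = *sb++;
    }
    return dst;
}
```

A 1024-byte buffer takes 16 trips through the main loop and no tail, so loop overhead is paid once per 64 bytes rather than once per word.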
Code:
test                 rate (mbs)  time (us)  notes
loop set                 148.95         55  my own C tests, bytes
loop copy                148.95         55  my own C tests, bytes
word loop copy           744.73         11  my own C tests, words (32 bits)
memset                   819.20         10  firmware
memcpy                   512.00         16  firmware, word-aligned
unaligned memcpy         167.02         49  firmware
memcpy from flash        455.11         18  firmware, word-aligned
memcpys                 1365.33          6  unrolled loop, newlib, best case
DMA                     1024.00          8  memory-to-memory, 32-bit words
DMA from flash           819.20         10  0x08004000
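The rate column follows from the buffer size and the elapsed time: 1024 bytes is 8192 bits, and bits per microsecond is the same thing as megabits per second. A small sketch of that arithmetic (rate_mbs is a hypothetical helper name, not something in the firmware):

```c
/* Hypothetical helper: megabits/second from a byte count and a time in us.
 * bits / microseconds == megabits / second. Returns 0 for a zero time. */
double rate_mbs(unsigned bytes, unsigned us) {
    if (us == 0) {
        return 0.0;             /* avoid dividing by zero on a too-fast run */
    }
    return (double)bytes * 8.0 / (double)us;
}
```

For example, rate_mbs(1024, 55) gives about 148.95 and rate_mbs(1024, 8) gives 1024. Since the timer only resolves whole microseconds, the shortest runs are coarse: 8192/6 is about 1365 mbs, while 8192/7 is about 1170 mbs.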
I also added a test of DMA memory-to-memory transfers and confirmed that the source was copied to the destination. Code below (edited to pass WORDS instead of BYTES to HAL_DMA_Start()).
Code:
// mem2mem using pyboard DMA2, 32-bit words
// Built inside the firmware, so the STM32 HAL headers and
// sys_tick_get_microseconds() are assumed to be in scope already.
#include <stdio.h>
#include <string.h>

#define micros sys_tick_get_microseconds
#define BYTES 1024
#define WORDS (BYTES / sizeof(int))   // parenthesized so it expands safely in expressions

char src[BYTES] __attribute__ ((aligned (8)));
char dst[BYTES] __attribute__ ((aligned (8)));
int *srcw = (int *)src, *dstw = (int *)dst;
unsigned int t1;

// Print a rate in megabits/second (bits per microsecond) and the elapsed time.
void prmbs(char *lbl, int us, int bits) {
    float mbs = (float)bits / us;
    printf("%s %.2f mbs %d us\n", lbl, (double)mbs, us);
}

void tom() {
    int i;
    DMA_HandleTypeDef DMA_Handle;

    __DMA2_CLK_ENABLE();
    DMA_Handle.Instance = DMA2_Stream0;

    // Need to deinit DMA first
    DMA_Handle.State = HAL_DMA_STATE_READY;
    HAL_DMA_DeInit(&DMA_Handle);

    DMA_Handle.Init.Channel = DMA_CHANNEL_0;
    DMA_Handle.Init.Direction = DMA_MEMORY_TO_MEMORY;
    DMA_Handle.Init.PeriphInc = DMA_PINC_ENABLE;      // source address increments
    DMA_Handle.Init.MemInc = DMA_MINC_ENABLE;         // destination address increments
    DMA_Handle.Init.PeriphDataAlignment = DMA_PDATAALIGN_WORD;
    DMA_Handle.Init.MemDataAlignment = DMA_MDATAALIGN_WORD;
    DMA_Handle.Init.Mode = DMA_NORMAL;
    DMA_Handle.Init.Priority = DMA_PRIORITY_HIGH;
    DMA_Handle.Init.FIFOMode = DMA_FIFOMODE_DISABLE;
    DMA_Handle.Init.FIFOThreshold = DMA_FIFO_THRESHOLD_HALFFULL;
    // DMA_Handle.Init.MemBurst = DMA_MBURST_SINGLE;
    // DMA_Handle.Init.PeriphBurst = DMA_PBURST_SINGLE;
    DMA_Handle.Init.MemBurst = DMA_MBURST_INC4;
    DMA_Handle.Init.PeriphBurst = DMA_PBURST_INC4;
    HAL_DMA_Init(&DMA_Handle);

    // Clear the destination and fill the source with a known pattern.
    int errs = 0;
    memset(dst, 0, BYTES);
    for (i = 0; i < BYTES; i++) src[i] = i;

    // Time the transfer; the length is in 32-bit WORDS, not bytes.
    t1 = micros();
    HAL_DMA_Start(&DMA_Handle, (uint32_t)srcw, (uint32_t)dstw, WORDS);
    HAL_DMA_PollForTransfer(&DMA_Handle, HAL_DMA_FULL_TRANSFER, 2000);
    t1 = micros() - t1;
    prmbs("dma", t1, BYTES * 8);

    // Verify the copy; compare as unsigned so the check does not depend on char signedness.
    for (i = 0; i < BYTES; i++) if ((unsigned char)dst[i] != i % 256) errs++;
    printf("errs %d\n", errs);
}
EDIT: Corrected the DMA code. HAL_DMA_Start() needs the transfer length in WORDS, not BYTES. Result table updated.
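The WORDS-versus-BYTES mix-up comes from the DMA length being a count of data items of the configured width rather than a byte count. A small sketch of a guard for that conversion, under stated assumptions: dma_item_count is a hypothetical helper (not a HAL function), and the 65535 limit reflects the 16-bit item counter on the STM32F4 DMA streams.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed limit: the STM32F4 stream item counter is 16 bits wide. */
#define DMA_MAX_ITEMS 65535u

/* Hypothetical helper: convert a byte length into a DMA item count.
 * Returns 0 if the length is not a whole number of items or the
 * count would not fit in one transfer. */
uint32_t dma_item_count(size_t bytes, size_t item_width) {
    if (item_width == 0 || bytes % item_width != 0) {
        return 0;
    }
    size_t items = bytes / item_width;
    if (items > DMA_MAX_ITEMS) {
        return 0;
    }
    return (uint32_t)items;
}
```

With word-wide transfers, dma_item_count(1024, 4) is 256. Passing 1024 instead would ask the controller to move 1024 words, four times the size of the buffers.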
Earlier results for the Due, Maple, and Teensy 3 are at
https://github.com/manitou48/DUEZoo/blo ... em2mem.txt