Advice needed - optimize performance: converting dictionary to string (csv)

dukeduck
Posts: 22
Joined: Thu Aug 29, 2019 2:06 pm

Post by dukeduck » Tue Jul 28, 2020 3:12 pm

Hi,

I'm trying to log data from some sensors into a dictionary, and then convert the dictionary to a string so that I can save it as a .csv file on the SD card.

The dictionary has 13 items, each holding a list of about 60-80 data points (floats). I generated mock data with the code below:

Code:

dataset = {
    'item0': list(range(60)),
    'item1': list(range(60)),
    'item2': list(range(60)),
    'item3': list(range(60)),
    'item4': list(range(60)),
    'item5': list(range(60)),
    'item6': list(range(60)),
    'item7': list(range(60)),
    'item8': list(range(60)),
    'item9': list(range(60)),
    'item10': list(range(60)),
    'item11': list(range(60)),
    'item12': [a % 2 == 0 for a in range(60)]
}
Below is the code I used to convert the above dictionary to a CSV-formatted string:

Code:

def dict_to_csv(the_dict, omit_keys=False, delimiter=',', end='\r\n', sort_keys=True):
    """
    Convert dictionary to string in csv format
    The delimiter will be stripped from the content to avoid format error.
    :param the_dict: dictionary to be converted
    :param delimiter: string, default to be comma
    :param end: string, line-break, default to be '\r\n'
    :param sort_keys: boolean, whether or not sort the keys, in order to align
            multiple dictionaries which share the same keys
    :param omit_keys: boolean, whether or not omit keys. in the case of combining
            multiple dictionaries which share the same keys, only the 1st dict needs
            to show the keys.
    :return: String
    """

    def cleared_value(the_value, the_delimiter):
        if the_value is None:
            the_value = ""
        if the_value is True:
            the_value = "True"
        if the_value is False:
            the_value = "False"
        return str(the_value).strip(the_delimiter)

    data_len = 0
    for data_list in the_dict.values():
        data_len = len(data_list)
        break
    csv_content = ''
    key_list = [key for key in the_dict.keys()]
    if sort_keys:
        key_list.sort()
    if not omit_keys:
        for key in key_list:
            key = cleared_value(key, delimiter)
            csv_content += (key + delimiter)
        csv_content = csv_content.rstrip(delimiter)
        csv_content += end
    for i in range(data_len):
        for key in key_list:
            value = cleared_value(the_dict[key][i], delimiter)
            csv_content += (value + delimiter)
        csv_content = csv_content.rstrip(delimiter)
        csv_content += end
    return csv_content
I tested the performance with the code below:

Code:

import utime

def run_test():
    # Time a single conversion of the mock dataset.
    start = utime.ticks_ms()
    content = dict_to_csv(dataset)
    convert_time = utime.ticks_diff(utime.ticks_ms(), start)
    print('conversion took ' + str(convert_time) + ' ms')
The conversion took at least 2500 ms, which was far longer than I expected.

Any suggestions on how I can refine the code to improve performance?

The board I'm using is an ESP32 WROVER, and the firmware is the latest unstable build.

Thanks in advance.

Kaiyuan

jimmo
Posts: 2754
Joined: Tue Aug 08, 2017 1:57 am
Location: Sydney, Australia

Re: Advice needed - optimize performance: converting dictionary to string (csv))

Post by jimmo » Tue Jul 28, 2020 11:49 pm

One thing that would probably help is to write out the csv file directly rather than appending strings together.
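A minimal sketch of that first suggestion, assuming the data has the shape shown in the original post (a dict mapping column names to equal-length lists). The function name `write_csv` and the file path mentioned in the comment are illustrative assumptions, not something from this thread. Each row is written to the stream as soon as it is formatted, so no large string is built up by repeated concatenation:

```python
import io  # named uio on older MicroPython builds


def write_csv(the_dict, stream, delimiter=',', end='\r\n'):
    """Write the_dict (column name -> list of values) to stream, row by row."""
    keys = sorted(the_dict)
    stream.write(delimiter.join(keys) + end)
    columns = [the_dict[key] for key in keys]
    # zip(*columns) yields one tuple per row across all columns.
    for row in zip(*columns):
        stream.write(delimiter.join([str(value) for value in row]) + end)


# Usage with an in-memory stream. On the board this would be an open file,
# e.g. open('/sd/log.csv', 'w') (the path is an assumption).
buf = io.StringIO()
write_csv({'b': [1.5, 2.5], 'a': [True, False]}, buf)
result = buf.getvalue()
```

Whether this is actually faster on the ESP32 would need to be measured, since many small writes to an SD card carry their own overhead.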

However, I'd probably suggest writing the file out as JSON instead (ujson.dump(obj, file), which is a native built-in and so will be much faster). If you still need CSV, you can convert it when you extract the file from the device (i.e. on your PC).
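As a rough sketch of the JSON route: dump the dictionary on the device, then rebuild the CSV later on the PC. The helper names `save_json` and `json_to_csv` are hypothetical, and the example uses an in-memory stream in place of a file on the SD card. The import falls back to the standard `json` module so that the same code runs under CPython:

```python
import io

try:
    import ujson as json  # MicroPython
except ImportError:
    import json  # CPython, e.g. on the PC


def save_json(the_dict, stream):
    """Device side: serialize the whole dictionary in one call."""
    json.dump(the_dict, stream)


def json_to_csv(json_text, delimiter=',', end='\r\n'):
    """PC side: turn the dumped JSON back into CSV text."""
    data = json.loads(json_text)
    keys = sorted(data)
    lines = [delimiter.join(keys)]
    for row in zip(*[data[key] for key in keys]):
        lines.append(delimiter.join([str(value) for value in row]))
    return end.join(lines) + end


buf = io.StringIO()  # stands in for a file opened on the SD card
save_json({'item0': [0, 1], 'item1': [True, False]}, buf)
csv_text = json_to_csv(buf.getvalue())
```

Doing the CSV conversion off the device keeps the slow, pure-Python formatting loop away from the microcontroller.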

dukeduck
Posts: 22
Joined: Thu Aug 29, 2019 2:06 pm

Re: Advice needed - optimize performance: converting dictionary to string (csv))

Post by dukeduck » Thu Jul 30, 2020 6:56 am

Thanks.

I tried writing JSON out to the SD card with the built-in ujson.dump; it took about 50 ms. Writing a CSV-formatted string to the same SD card took about 30 ms. I think I need to reorganize my code to concatenate the sensor data directly into a string rather than storing it in a dictionary.
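One way that reorganization could look, as a sketch only: format each set of readings as a finished CSV line when it arrives, so nothing needs converting at save time. The class `CsvLineBuffer` and its method names are hypothetical and not taken from the project described here.

```python
import io  # named uio on older MicroPython builds


class CsvLineBuffer:
    """Collects readings as ready-made CSV lines instead of per-key lists."""

    def __init__(self, keys, delimiter=',', end='\r\n'):
        self.delimiter = delimiter
        self.end = end
        self.header = delimiter.join(keys) + end
        self.lines = []

    def add(self, values):
        # Format one row immediately, while the readings are at hand.
        row = self.delimiter.join([str(value) for value in values])
        self.lines.append(row + self.end)

    def flush(self, stream, with_header=True):
        # Write everything collected so far, then start over.
        if with_header:
            stream.write(self.header)
        for line in self.lines:
            stream.write(line)
        self.lines = []


log = CsvLineBuffer(['temp', 'ok'])
log.add((21.5, True))
log.add((22.0, False))
buf = io.StringIO()  # stands in for a file opened on the SD card
log.flush(buf)
```

This spreads the formatting cost across the sampling loop instead of paying it all at once when the file is written.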

dukeduck
Posts: 22
Joined: Thu Aug 29, 2019 2:06 pm

Re: Advice needed - optimize performance: converting dictionary to string (csv))

Post by dukeduck » Tue Dec 08, 2020 3:32 pm

Well, I found a quicker way to do the conversion. It may not be a nice way, but it does help speed things up.

First, I do the transpose with:

Code:

key_list = []
item_list = []
for key, item in the_dict.items():
    key_list.append(key)
    item_list.append(item)

# Each tuple produced by zip holds one row across all columns.
transposed = zip(*item_list)
rather than with multiple iterations.

Then I turn each tuple directly into a string, and remove the round brackets as well as the spaces:

Code:

for line in transposed:
    csv_content += str(line).strip('()').replace(' ', '') + end
Finally, in case the desired delimiter for the CSV is not a comma, I just replace it with the desired one.

Code:

if delimiter != ',':
    csv_content = csv_content.replace(',', delimiter)
Now the whole process takes under 500 ms, versus 2500 ms previously, with the same amount of raw data.

I won't say it's a universal solution, but for my project, as I know what raw data I'm getting, it works for me ;-)
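For reference, the three steps above can be put together as one function. This is a sketch under assumptions: the name `dict_to_csv_fast` is made up, a header line is added, and the values are assumed to be plain numbers or booleans in at least two columns. The column order follows the dictionary's iteration order, which MicroPython may not keep as insertion order.

```python
def dict_to_csv_fast(the_dict, delimiter=',', end='\r\n'):
    """Convert a dict of equal-length lists to CSV via str() on each row."""
    key_list = []
    item_list = []
    for key, item in the_dict.items():
        key_list.append(key)
        item_list.append(item)

    csv_content = ','.join(key_list) + end
    for line in zip(*item_list):
        # str((0, 1.5, True)) is '(0, 1.5, True)'; strip the brackets and
        # spaces to leave '0,1.5,True'.
        csv_content += str(line).strip('()').replace(' ', '') + end

    if delimiter != ',':
        csv_content = csv_content.replace(',', delimiter)
    return csv_content


sample = {'item0': [0, 1], 'item1': [1.5, 2.5], 'item2': [True, False]}
csv_text = dict_to_csv_fast(sample)
```

String values would not survive this approach intact: str() of a tuple puts quotes around them, and any spaces or commas inside them would be removed or replaced.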

dukeduck
Posts: 22
Joined: Thu Aug 29, 2019 2:06 pm

Re: Advice needed - optimize performance: converting dictionary to string (csv))

Post by dukeduck » Tue Dec 08, 2020 4:35 pm

This is even faster:

Code:

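key_list = []
item_list = []
for key, item in the_dict.items():
    key_list.append(key)
    item_list.append(item)

# One tuple per row across all columns.
transposed = list(zip(*item_list))

if not omit_keys:
    csv_title = delimiter.join(key_list) + end
else:
    csv_title = ''

# Convert the whole list of tuples in one str() call, then turn the
# '),(' separators between rows into line breaks.
csv_body = str(transposed).strip('[()]').replace(' ', '').replace('),(', end)
if delimiter != ',':
    # The body is comma-separated at this point; match the title's delimiter.
    csv_body = csv_body.replace(',', delimiter)

csv_content = csv_title + csv_body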
Now the conversion itself is cut down to 45 ms; the bottleneck is the speed of writing the file to disk.
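To make the string manipulation easier to follow, here is a step-by-step trace of the same idea on two rows. The function name `rows_to_csv_body` is illustrative, and the comments show the intermediate strings for the sample input only.

```python
def rows_to_csv_body(rows, delimiter=',', end='\r\n'):
    """Turn a list of row tuples into CSV text with a single str() call."""
    text = str(rows)                 # '[(0, 1.5, True), (1, 2.5, False)]'
    text = text.strip('[()]')        # '0, 1.5, True), (1, 2.5, False'
    text = text.replace(' ', '')     # '0,1.5,True),(1,2.5,False'
    text = text.replace('),(', end)  # row separators become line breaks
    if delimiter != ',':
        text = text.replace(',', delimiter)
    return text


rows = [(0, 1.5, True), (1, 2.5, False)]
body = rows_to_csv_body(rows)
```

Two limits are worth keeping in mind. The result has no line break after the last row, and a table with a single column leaves a trailing comma on each line, because Python writes a one-element tuple as `(0,)`.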
