url decode

General discussions and questions about development of code with MicroPython that is not hardware-specific.
Target audience: MicroPython Users.
devnull
Posts: 473
Joined: Sat Jan 07, 2017 1:52 am
Location: Singapore / Cornwall

url decode

Post by devnull » Sun Feb 26, 2017 4:11 am

How can I decode a URL-encoded string? ujson can't parse it as-is, and urllib does not appear to have a parse() method.

Code:

%7B%22_mode%22:%22func%22,%22_name%22:%22download%22,%22file_path%22:%22main.py%22,%22update_host%22:%22192.168.0.35%22,%22update_port%22:3001%7D
Converts to:

Code:

{"_mode":"func","_name":"download","file_path":"main.py","update_host":"192.168.0.35","update_port":3001}
Assuming there is no built-in library for this, is this the most efficient alternative?

Code:

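def urldecode(s):
    # Naive lookup-table approach: only the (uppercase) escapes listed here are decoded.
    dic = {"%20":" ","%21":"!","%22":'"',"%23":"#","%24":"$","%26":"&","%27":"'","%28":"(","%29":")","%2A":"*","%2B":"+","%2C":",","%2F":"/","%3A":":","%3B":";","%3D":"=","%3F":"?","%40":"@","%5B":"[","%5D":"]","%7B":"{","%7D":"}"}
    for k, v in dic.items():
        s = s.replace(k, v)
    # "%25" ("%") goes last so that its result is not decoded a second time.
    return s.replace("%25", "%")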
 

SpotlightKid
Posts: 463
Joined: Wed Apr 08, 2015 5:19 am

Re: url decode

Post by SpotlightKid » Mon Feb 27, 2017 4:26 pm

Here's the implementation from the CPython standard library (urllib.parse), which also works with MicroPython. In CPython the function is called unquote_to_bytes and is used internally by unquote. I renamed it and cleaned it up very slightly.

Code:

_hexdig = '0123456789ABCDEFabcdef'
_hextobyte = None


def unquote(string):
    """unquote('abc%20def') -> b'abc def'."""
    global _hextobyte

    # Note: strings are encoded as UTF-8. This is only an issue if it contains
    # unescaped non-ASCII characters, which URIs should not.
    if not string:
        return b''

    if isinstance(string, str):
        string = string.encode('utf-8')

    bits = string.split(b'%')
    if len(bits) == 1:
        return string

    res = [bits[0]]
    append = res.append

    # Delay the initialization of the table to not waste memory
    # if the function is never called
    if _hextobyte is None:
        _hextobyte = {(a + b).encode(): bytes([int(a + b, 16)])
                      for a in _hexdig for b in _hexdig}

    for item in bits[1:]:
        try:
            append(_hextobyte[item[:2]])
            append(item[2:])
        except KeyError:
            append(b'%')
            append(item)

    return b''.join(res)
This implementation isn't very memory efficient, though. The first time it is used, it builds a 484-entry dictionary mapping every two-digit hex code (upper- and lowercase) to its byte. That's quite a big chunk of memory.
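To see where the 484 comes from, the sketch below rebuilds the same table and counts it: there are 22 * 22 hex-digit pairs (upper- and lowercase), but they map to only 256 distinct byte values, because e.g. b'2f' and b'2F' both map to b'/'. On a board, calling gc.mem_free() before and after the first call should give a rough idea of the actual cost.

```python
_hexdig = '0123456789ABCDEFabcdef'

# Same construction as in the function above.
table = {(a + b).encode(): bytes([int(a + b, 16)])
         for a in _hexdig for b in _hexdig}

print(len(table))                    # 484 keys: 22 * 22 digit pairs
print(len(set(table.values())))      # 256 distinct byte values
print(table[b'2f'] == table[b'2F'])  # True: case variants share a value
```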

Here's a variant of this function that doesn't build the whole mapping up front but only caches the codes that are actually used. This saves memory at the cost of a minimal performance loss when the passed URLs contain a lot of escape codes. The mapping is still stored in a global variable, though, and will grow over time as URLs with different escape codes are passed. If you don't want that, make the _hextobyte_cache variable local and initialize it to an empty dictionary on each call, or leave it out altogether and do the hex string to byte conversion each time.

Code:

_hexdig = b'0123456789ABCDEFabcdef'
_hextobyte_cache = None

def unquote(string):
    """unquote('abc%20def') -> b'abc def'."""
    global _hextobyte_cache

    # Note: strings are encoded as UTF-8. This is only an issue if it contains
    # unescaped non-ASCII characters, which URIs should not.
    if not string:
        return b''

    if isinstance(string, str):
        string = string.encode('utf-8')

    bits = string.split(b'%')
    if len(bits) == 1:
        return string

    res = [bits[0]]
    append = res.append

    # Build cache for hex to char mapping on-the-fly only for codes
    # that are actually used
    if _hextobyte_cache is None:
        _hextobyte_cache = {}

    for item in bits[1:]:
        code = item[:2]
        char = _hextobyte_cache.get(code)
        if char is None:
            # int(code, 16) alone would also accept e.g. b'A' or b'+1' and
            # raises ValueError (not KeyError) for invalid codes, so check
            # for exactly two hex digits before converting.
            if len(code) == 2 and code[:1] in _hexdig and code[1:] in _hexdig:
                char = _hextobyte_cache[code] = bytes([int(code, 16)])
            else:
                # Invalid escape sequence: keep it unchanged.
                append(b'%')
                append(item)
                continue
        append(char)
        append(item[2:])

    return b''.join(res)
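For the per-call option mentioned above, a minimal sketch: the cache is a local dict created on every call, so a code that repeats within one URL is converted only once, but nothing accumulates between calls. Invalid escapes are kept unchanged, as in the CPython version. The name unquote_local_cache is hypothetical, and this is an illustration rather than tuned code.

```python
_hexdig = b'0123456789ABCDEFabcdef'


def unquote_local_cache(string):
    """Like unquote() above, but the hex-code cache only lives for one call."""
    if isinstance(string, str):
        string = string.encode('utf-8')

    bits = string.split(b'%')
    if len(bits) == 1:
        return string

    cache = {}  # discarded when the function returns
    res = [bits[0]]
    for item in bits[1:]:
        code = item[:2]
        char = cache.get(code)
        if char is None:
            if len(code) == 2 and code[:1] in _hexdig and code[1:] in _hexdig:
                char = cache[code] = bytes([int(code, 16)])
            else:
                # Not a valid %XX escape: keep the text unchanged.
                res.append(b'%')
                res.append(item)
                continue
        res.append(char)
        res.append(item[2:])
    return b''.join(res)
```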

tylersuard
Posts: 9
Joined: Mon Jan 21, 2019 4:09 pm

Re: url decode

Post by tylersuard » Fri Apr 26, 2019 12:41 am

Thank you!!!! You wrote an entire script just to answer one question. You, sir, are among the finest.

VladVons
Posts: 60
Joined: Sun Feb 12, 2017 6:49 pm
Location: Ukraine

Re: url decode

Post by VladVons » Wed Feb 03, 2021 8:59 am

Code:

def UrlPercent(aData: bytearray) -> str:
    Bits = aData.split(b'%')
    # '+' must also be replaced in the text before the first escape
    Arr = [Bits[0].replace(b'+', b' ')]
    for Item in Bits[1:]:
        Code = Item[:2]
        Char = bytes([int(Code, 16)])
        Arr.append(Char)
        Arr.append(Item[2:].replace(b'+', b' '))
    Res = b''.join(Arr)
    return Res.decode('utf-8')

see also https://github.com/micropython/micropyt ... b/parse.py

SpotlightKid
Posts: 463
Joined: Wed Apr 08, 2015 5:19 am

Re: url decode

Post by SpotlightKid » Wed Feb 03, 2021 11:13 am

VladVons wrote:
Wed Feb 03, 2021 8:59 am

Code:

def UrlPercent(aData: bytearray) -> str:
...
That will also work and is a bit more terse, but behaves differently:

a) doesn't take str
b) returns str not bytes
c) does not handle invalid escape sequences
d) additionally does '+' replacement.

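It is also slower, even if the '+' replacement and the final conversion to str are removed. The '+' replacement shouldn't be done in the loop anyway, but once on the whole input before decoding (doing it on the final result would also turn a decoded %2B into a space).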
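For form-encoded data, where '+' stands for a space, CPython's urllib.parse.unquote_plus replaces '+' with a space once on the input and only then decodes escapes, so an escaped plus (%2B) still comes out as '+'. A minimal sketch of that order, with a compact hypothetical unquote standing in for any of the variants in this thread:

```python
_hexdig = b'0123456789ABCDEFabcdef'


def unquote(data):
    """Decode %XX escapes in bytes; invalid escapes are kept unchanged."""
    bits = data.split(b'%')
    res = bytearray(bits[0])
    for item in bits[1:]:
        code = item[:2]
        if len(code) == 2 and code[:1] in _hexdig and code[1:] in _hexdig:
            res.append(int(code, 16))
            res.extend(item[2:])
        else:
            res.extend(b'%')
            res.extend(item)
    return bytes(res)


def unquote_plus(data):
    # Replace '+' first: a literal '+' sent as %2B must survive decoding.
    return unquote(data.replace(b'+', b' '))


print(unquote_plus(b'a+b%2Bc'))                      # b'a b+c'
print(unquote(b'a+b%2Bc').replace(b'+', b' '))       # b'a b c' (wrong order)
```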

If you want a faster version without a cache, try this one:

Code:

_hexdig = b'0123456789ABCDEFabcdef'


def unquote(string):
    """unquote('abc%20def') -> b'abc def'.

    Note: if the input is a str instance it is encoded as UTF-8.
    This is only an issue if it contains unescaped non-ASCII characters,
    which URIs should not.
    """
    if not string:
        return b''

    if isinstance(string, str):
        string = string.encode('utf-8')

    bits = string.split(b'%')
    if len(bits) == 1:
        return string

    res = bytearray(bits[0])
    append = res.append
    extend = res.extend

    for item in bits[1:]:
        code = item[:2]
        # Require exactly two hex digits: int(code, 16) alone would also
        # accept b'A' or b'+1'. Invalid escapes are kept unchanged.
        if len(code) == 2 and code[:1] in _hexdig and code[1:] in _hexdig:
            append(int(code, 16))
            extend(item[2:])
        else:
            extend(b'%')
            extend(item)

    return bytes(res)
As for the parse.py linked above: that's basically just a copy of the CPython stdlib code and doesn't work on bare-metal MicroPython, only on the unix port.
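On desktop CPython (or wherever urllib.parse is available), a quick way to check any of these variants is to compare them with urllib.parse.unquote_to_bytes on a few edge cases. The sketch below does that for a deliberately naive decoder, naive_unquote (hypothetical), which trusts int(code, 16) alone to validate escapes. Since int() also accepts a single hex digit or a leading sign, it wrongly decodes inputs such as %A or %+1.

```python
from urllib.parse import unquote_to_bytes

CASES = [b'abc%20def', b'%2f%2F', b'100%', b'%zz', b'%A', b'%+1', b'']


def naive_unquote(data):
    """Hypothetical decoder that relies on int() alone to validate escapes."""
    bits = data.split(b'%')
    res = bytearray(bits[0])
    for item in bits[1:]:
        try:
            res.append(int(item[:2], 16))
            res.extend(item[2:])
        except ValueError:
            res.extend(b'%')
            res.extend(item)
    return bytes(res)


def mismatches(func):
    """Return the cases where func disagrees with CPython's reference."""
    return [c for c in CASES if func(c) != unquote_to_bytes(c)]


print(mismatches(naive_unquote))  # [b'%A', b'%+1']
```

Any variant from this thread can be passed to mismatches() the same way; an empty list means it agrees with CPython on these cases.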
