Here's the implementation from the CPython standard library (urllib.parse), which also works with MicroPython. The function is originally called unquote_to_bytes and is used internally by unquote. I renamed it and cleaned it up very slightly. Note that, like the original unquote_to_bytes, it returns bytes, not a str.
Code:
_hexdig = '0123456789ABCDEFabcdef'
_hextobyte = None

def unquote(string):
    """unquote('abc%20def') -> b'abc def'."""
    global _hextobyte

    # Note: strings are encoded as UTF-8. This is only an issue if it contains
    # unescaped non-ASCII characters, which URIs should not.
    if not string:
        return b''

    if isinstance(string, str):
        string = string.encode('utf-8')

    bits = string.split(b'%')
    if len(bits) == 1:
        return string

    res = [bits[0]]
    append = res.append

    # Delay the initialization of the table to not waste memory
    # if the function is never called
    if _hextobyte is None:
        _hextobyte = {(a + b).encode(): bytes([int(a + b, 16)])
                      for a in _hexdig for b in _hexdig}

    for item in bits[1:]:
        try:
            append(_hextobyte[item[:2]])
            append(item[2:])
        except KeyError:
            append(b'%')
            append(item)

    return b''.join(res)
This implementation isn't very memory efficient, though. The first time it is used, it builds a dictionary with 484 entries (22 * 22, because both upper- and lowercase hex digits are included) mapping two-character hex codes to single bytes. That's quite a big chunk of memory.
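To make the size of that table concrete, here is a small sketch that builds the same mapping on its own and counts the entries. The variable name table is just for illustration. On a MicroPython board you could additionally compare gc.mem_free() before and after building it to see the actual cost on your port.

```python
# Same construction as the lazily built lookup table in the function above
_hexdig = '0123456789ABCDEFabcdef'

table = {(a + b).encode(): bytes([int(a + b, 16)])
         for a in _hexdig for b in _hexdig}

# One entry per combination of two hex digits, upper- and lowercase
print(len(table))

# Many keys map to the same byte (b'2f', b'2F', ...), so there are far
# fewer distinct values than keys
print(len(set(table.values())))
```

The first number is the 484 mentioned above; the second is 256, one per possible byte value.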
Here's a variant of this function that doesn't build the whole mapping upfront, but only caches the codes that are actually used. It trades a minimal performance loss (noticeable only if the passed URLs contain a lot of escape codes) for lower memory use. The mapping is still stored in a global variable and will grow over time as URLs with different escape codes are passed. If you don't want that, just make the _hextobyte_cache variable local and initialize it to an empty dictionary on each call, or leave it out altogether and do the hex string to byte conversion each time.
Code:
_hexdig = b'0123456789ABCDEFabcdef'
_hextobyte_cache = None

def unquote(string):
    """unquote('abc%20def') -> b'abc def'."""
    global _hextobyte_cache

    # Note: strings are encoded as UTF-8. This is only an issue if it contains
    # unescaped non-ASCII characters, which URIs should not.
    if not string:
        return b''

    if isinstance(string, str):
        string = string.encode('utf-8')

    bits = string.split(b'%')
    if len(bits) == 1:
        return string

    res = [bits[0]]
    append = res.append

    # Build cache for hex to char mapping on-the-fly only for codes
    # that are actually used
    if _hextobyte_cache is None:
        _hextobyte_cache = {}

    for item in bits[1:]:
        code = item[:2]
        char = _hextobyte_cache.get(code)
        if char is None:
            # int() raises ValueError, not KeyError, on bad input, so
            # validate explicitly: only exactly two hex digits form an
            # escape. Anything else (e.g. '%zz' or a trailing '%4') is
            # passed through unchanged, as in the first version.
            if len(code) == 2 and code[:1] in _hexdig and code[1:] in _hexdig:
                char = _hextobyte_cache[code] = bytes([int(code, 16)])
            else:
                append(b'%')
                append(item)
                continue
        append(char)
        append(item[2:])

    return b''.join(res)
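For completeness, the last option mentioned above (no cache at all, converting each escape code on every call) could look roughly like the sketch below. The names unquote_nocache and _HEXDIG are made up for this example, and I have only reasoned through it rather than benchmarked it, so treat it as a starting point.

```python
_HEXDIG = b'0123456789ABCDEFabcdef'

def unquote_nocache(string):
    """unquote_nocache('abc%20def') -> b'abc def'."""
    if not string:
        return b''

    if isinstance(string, str):
        string = string.encode('utf-8')

    bits = string.split(b'%')
    if len(bits) == 1:
        return string

    res = [bits[0]]
    for item in bits[1:]:
        code = item[:2]
        # A valid escape is exactly two hex digits; everything else is
        # copied to the output unchanged, including the '%'
        if len(code) == 2 and code[:1] in _HEXDIG and code[1:] in _HEXDIG:
            res.append(bytes([int(code, 16)]))
            res.append(item[2:])
        else:
            res.append(b'%')
            res.append(item)

    return b''.join(res)


# The result is bytes; decode it if you need a str
print(unquote_nocache('abc%20def'))
print(unquote_nocache('K%C3%B6ln').decode('utf-8'))
print(unquote_nocache('100%'))
```

This keeps no state between calls, so memory use stays flat no matter how many different escape codes show up, at the cost of one int() call per escape.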