HTTPS stream halts before finished

cefn
Posts: 230
Joined: Tue Aug 09, 2016 10:58 am

HTTPS stream halts before finished

Post by cefn » Sat Feb 03, 2018 11:25 pm

I have an HTTPS stream negotiated with Twitter which stops at an arbitrary point (but the same point every time), while the identical code in CPython can consume the stream to completion.

The issue is present on the unix and esp32 ports running 1.9.3, so it's not just triggered by the special TLS limitations of the ESP8266. As far as I can see, there are no notes suggesting this kind of failure is expected. For example, http://docs.micropython.org/en/latest/e ... imitations recognises that records can be too long for micros to handle, but I don't think that explains the form of failure experienced.

The reference code for stream creation I am using can be found mostly in https://github.com/ShrimpingIt/medea/bl ... a/https.py but the test example I've been using to prove the issue is documented by https://github.com/ShrimpingIt/medea/bl ... DumpAll.py

The example accesses a single recent Trump tweet from the Twitter API, which serves a total of 3881 bytes of JSON cruft (including the tweet id and tweet text I actually need for my application). Running in CPython, the full complement of JSON bytes is successfully dumped to the console. However, running in MicroPython, the stream mystifyingly halts at 3584 bytes every time.

I have been invoking it from the medea repository root folder like...

Code: Select all

python3 -m examples.scripts.twitterDumpAll

Code: Select all

micropython -m examples.scripts.twitterDumpAll
If anyone wants to test out the routine, they will need to insert a Twitter Bearer Token secret in the medea/auth.py file to get Twitter to stream the bytes. I personally requested mine via https://www.npmjs.com/package/get-twitter-bearer-token after getting a Twitter developer account set up.

I'm posting in case there is something blindingly obvious or well-known in terms of limitations on HTTPS-style socket use across platforms in micropython which would trivially cause this issue.

cefn
Posts: 230
Joined: Tue Aug 09, 2016 10:58 am

Re: HTTPS stream halts before finished

Post by cefn » Mon Feb 05, 2018 3:58 pm

OK, from more experimentation this seems to be to do with socket/ssl-wrapped-socket stream flushing in MicroPython.

The remaining bytes when it halts seem to be a function of buffer size, (always less than the buffer size, as if it is halted waiting for a full 512 bytes to fill the buffer when only 227 are available, for example). I am surprised because I understood the following...
"Read up to size bytes from the socket."
...from http://docs.micropython.org/en/latest/w ... ocket.html to mean that if size bytes were not available, it would read a lesser number of bytes, and I assumed that ssl's readinto() function would be calling this. My expectation was reinforced by there being a return value in the CPython socket#read equivalent, meaning it could notify the caller of the actual number of bytes read if the full amount wasn't available. This was further reinforced by the fact that the library as a whole functions perfectly with CPython, so there must be something right about the invocations I am making.

MicroPython must behave differently. This phrase from the documentation may be revealing...
"This function tries to read as much data as requested (no “short reads”)."
...sounds a lot like it might hang waiting if the number of bytes isn't available. However, it is hard to see how you can ever read all the available bytes if reading into a fixed-size buffer hangs whenever the stream length is not an exact multiple of that buffer size.

As a workaround, actually making the buffer just one byte long causes the stream to complete, since there is never a partial buffer - either a failure to read or a complete buffer.

If you choose any other buffer size, it leaves the last buffer (which would be partially filled) unfilled while it blocks, apparently waiting for enough to fill the buffer.

Not sure if there's any configuration option of the underlying socket or ssl wrapper which will change this behaviour.
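
As a rough illustration of the arithmetic behind this, here is a minimal sketch of how far a reader gets when every read must fill the whole buffer. It assumes the 3881-byte total and 512-byte buffer mentioned in this thread all pass through the same loop; the function name is hypothetical and the code models the behaviour only, it does not touch a socket.

```python
def exact_read_progress(total_bytes, buf_size):
    """Model a reader whose read(buf_size) never returns a short read.

    Returns (delivered, stranded): the bytes handed to the caller before
    the next read blocks, and the bytes left waiting for a full buffer.
    """
    full_reads = total_bytes // buf_size
    delivered = full_reads * buf_size
    stranded = total_bytes - delivered
    return delivered, stranded


# 7 full 512-byte reads succeed, then the 8th read waits for bytes
# that the server will never send.
print(exact_read_progress(3881, 512))  # (3584, 297)

# With a one-byte buffer there is never a partial buffer to wait for.
print(exact_read_progress(3881, 1))    # (3881, 0)
```

The first result matches the 3584-byte halting point reported in the first post, which is consistent with the "no short reads" explanation.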

cefn
Posts: 230
Joined: Tue Aug 09, 2016 10:58 am

Re: HTTPS stream halts before finished

Post by cefn » Mon Feb 05, 2018 5:06 pm

To simplify for those who want to see the behaviour, I have put together the below reference HTTPS request based on pfalcon's original example, (at https://github.com/micropython/micropyt ... ent_ssl.py ) but which includes a pre-crafted Twitter URL for testing, and a Twitter Bearer Token intended for experimental work which I will revoke soon.

I suppose if thousands of people test-run this code, then the rate limit for my application could kick in, but I doubt there will be that many experimenters using it.

If you run this with python3, you will see that all of the data arrives, but it hangs at the end because Twitter leaves the connection open rather than forcing an EOF. This is fine in the case of HTTP because the content-length property can be used to tell that all the bytes have arrived and the client can take responsibility for closing the connection.

However if you run this with micropython, you will see that the final part-buffer-worth of data does not arrive meaning the content-length is never reached. It hangs at read before receiving all the data.

Code: Select all

try:
    import usocket as _socket
except ImportError:
    import _socket
try:
    import ussl as ssl
except ImportError:
    import ssl


def main():
    sock = _socket.socket()

    ai = _socket.getaddrinfo("api.twitter.com", 443)
    print("Address infos:", ai)
    addr = ai[0][-1]

    print("Connect address:", addr)
    sock.connect(addr)

    sslsock = ssl.wrap_socket(sock)
    print(sslsock)
    
    sslsock.write(b"GET /1.1/statuses/user_timeline.json?screen_name=realDonaldTrump&count=1&include_rts=false HTTP/1.1\r\nHost: api.twitter.com\r\n")
    sslsock.write(b"Authorization: Bearer AAAAAAAAAAAAAAAAAAAAAJKj1gAAAAAAS7Lo%2BpzmCRW%2FqpZjN2yOzmZyKVE%3DHBc42VkI2zpJ2v3z7zcwVErFQqG1IJT1LbFvXpMmwNg7B3fhDY\r\n\r\n")
    
    try:
        while True:
            data = sslsock.read(512)
            if len(data) > 0:
                print(data)
            else:
                break
    finally:
        sock.close()

main()


Damien
Site Admin
Posts: 647
Joined: Mon Dec 09, 2013 5:02 pm

Re: HTTPS stream halts before finished

Post by Damien » Wed Feb 07, 2018 2:03 am

MicroPython's ssl.read(n) method will try to read exactly n bytes so will block forever if the server doesn't send enough.

I think the fix simply comes down to the fact that you need to parse the "content-length" header and then only request exactly that many bytes of data after the headers are finished.

The following script works correctly on CPython and MicroPython (unix version with axtls) and runs to completion, including closing the socket:

Code: Select all

try:
    import usocket as _socket
except ImportError:
    import _socket
try:
    import ussl as ssl
except ImportError:
    import ssl


def main():
    sock = _socket.socket()

    ai = _socket.getaddrinfo("api.twitter.com", 443)
    print("Address infos:", ai) 
    addr = ai[0][-1]

    print("Connect address:", addr)
    sock.connect(addr)

    sslsock = ssl.wrap_socket(sock)
    print(sslsock)
    
    sslsock.write(b"GET /1.1/statuses/user_timeline.json?screen_name=realDonaldTrump&count=1&include_rts=false HTTP/1.1\r\nHost: api.twitter.com\r\n")
    sslsock.write(b"Authorization: Bearer AAAAAAAAAAAAAAAAAAAAAJKj1gAAAAAAS7Lo%2BpzmCRW%2FqpZjN2yOzmZyKVE%3DHBc42VkI2zpJ2v3z7zcwVErFQqG1IJT1LbFvXpMmwNg7B3fhDY\r\n\r\n")

    # MicroPython's ssl has readline() already but CPython needs makefile
    try:
        sslsock = sslsock.makefile('rwb')
    except:
        pass

    # Read headers
    while True:
        line = sslsock.readline()
        if line == b'\r\n':
            break
        print(line)
        if line.startswith(b'content-length: '):
            content_length = int(line.split(b': ')[1].strip())

    # Read content_length amount of data
    data = b''
    while len(data) < content_length:
        print(len(data), content_length)
        # The "min 512" is not needed, it's just to show that you can chunk the data
        data += sslsock.read(min(512, content_length - len(data)))

    # Close the socket only once the whole body has been read
    sock.close()
    print(data)

main()
In testing the above code I did run into cases where even CPython would hang on the sslsock.read(512) call because it was waiting for more data. That's why the call appears as sslsock.read(min(512, content_length - len(data))), so it never tries to read too much data.
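
For anyone adapting this, here is a minimal sketch of the same content-length strategy pulled out into a helper that can be tried without a network connection. The name read_http_body and the io.BytesIO stand-in for the SSL stream are assumptions for illustration, not part of the script above. It also matches the header name case-insensitively, on the assumption that other servers may send "Content-Length" rather than "content-length".

```python
import io


def read_http_body(stream, chunk_size=512):
    """Read headers from a file-like stream, then exactly content-length bytes."""
    content_length = None
    while True:
        line = stream.readline()
        if line in (b"\r\n", b""):
            break  # blank line ends the headers; b"" means the stream closed
        name, _, value = line.partition(b":")
        if name.strip().lower() == b"content-length":
            content_length = int(value.strip())
    if content_length is None:
        raise ValueError("no content-length header in response")

    body = b""
    while len(body) < content_length:
        # Never ask for more than is still owed, so an exact-read
        # implementation has nothing to block on.
        chunk = stream.read(min(chunk_size, content_length - len(body)))
        if not chunk:
            break  # peer closed the connection early
        body += chunk
    return body


# Stand-in for an SSL stream: 5 body bytes followed by unrelated data.
fake = io.BytesIO(b"HTTP/1.1 200 OK\r\nContent-Length: 5\r\n\r\nhelloEXTRA")
print(read_http_body(fake))  # b'hello'
```

Raising when the header is missing avoids the NameError the inline version would hit if the server used chunked transfer encoding instead.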

cefn
Posts: 230
Joined: Tue Aug 09, 2016 10:58 am

Re: HTTPS stream halts before finished

Post by cefn » Wed Feb 07, 2018 8:58 am

Sorry, I had attempted this, but using readinto() and a memoryview with a reduced size for the last chunk (to avoid buffer allocations and heap fragmentation), and I still seemed to experience the 'hang'. I will go back to that code and figure out what I did wrong to get a false negative on that strategy when testing.

cefn
Posts: 230
Joined: Tue Aug 09, 2016 10:58 am

Re: HTTPS stream halts before finished

Post by cefn » Wed Feb 14, 2018 11:52 pm

Damien, thanks again for your effort implementing a working HTTPS routine!

I have translated this strategy of calculating the remaining bytes and calling readinto with a truncated memoryview of the buffer, into my JSON API parsing library at https://github.com/ShrimpingIt/medea
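
For readers who want the shape of that strategy without digging through the repository, here is a minimal sketch. It is not medea's actual code: the function name, the handle_chunk callback and the io.BytesIO stand-in for the SSL stream are all assumptions for illustration.

```python
import io


def read_body_into(stream, buf, content_length, handle_chunk):
    """Read content_length bytes through one reusable buffer.

    The final read uses a shortened memoryview so that an exact-read
    readinto() is never asked for more bytes than remain.
    Returns the number of bytes actually read.
    """
    view = memoryview(buf)
    remaining = content_length
    while remaining > 0:
        want = min(len(buf), remaining)
        got = stream.readinto(view[:want])  # slice of the view, no new buffer
        if not got:
            break  # stream ended early
        handle_chunk(view[:got])
        remaining -= got
    return content_length - remaining


chunks = []
source = io.BytesIO(b"x" * 1300 + b"trailing")
count = read_body_into(source, bytearray(512), 1300,
                       lambda chunk: chunks.append(len(chunk)))
print(count, chunks)  # 1300 [512, 512, 276]
```

Slicing the memoryview keeps every read inside the one preallocated bytearray, which is the point of the approach on a heap-constrained board.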

It is now functioning with live Twitter API JSON test cases working in ESP32 and ESP8266. See e.g. https://github.com/ShrimpingIt/medea/bl ... lFields.py

I thought I should record that there was a hard reset behaviour from wrap_socket() as it seemed a little surprising (compared to e.g. getting an out of memory warning). When originally running my code in ESP8266 there was a graceless hard reset when calling the line

Code: Select all

sslsock = ssl.wrap_socket(sock)
I worked back from my complex JSON parsing code (which dies on ESP8266) to your more minimal code (which runs successfully on ESP8266) to find that the available memory was the key factor, since medea's JSON generator and parsing logic uses quite a bit. I can therefore recreate the hard reset at wrap_socket() on ESP8266 (NodeMCUv2) by prefixing your code example with the line...

Code: Select all

buf = bytearray(14000)
The wrap_socket() fails when there is around 11k memory or less remaining.
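
One way to turn the hard reset into an ordinary error is to check the free heap before calling wrap_socket(). The sketch below assumes a threshold a little above the roughly 11k figure observed here; the constant and the function name are hypothetical. gc.mem_free() exists on MicroPython but not on CPython, so the check is skipped when it is missing.

```python
import gc

# Assumption: a little headroom above the ~11k failure point seen on ESP8266.
TLS_MIN_FREE = 12 * 1024


def heap_ok_for_tls(min_free=TLS_MIN_FREE):
    """Return True if there is probably enough heap for a TLS handshake."""
    gc.collect()  # reclaim garbage first so the reading is meaningful
    mem_free = getattr(gc, "mem_free", None)
    if mem_free is None:
        return True  # CPython: no fixed heap to check
    return mem_free() >= min_free


if not heap_ok_for_tls():
    raise MemoryError("not enough free heap to call ssl.wrap_socket()")
```

The right threshold will depend on the port and firmware build, so treat the number as a starting point to tune rather than a known limit.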

If I put the medea source files in a bespoke image as frozen modules (placing function definitions, byte strings, etc. outside of RAM), that gives me enough available memory to resolve the issue on ESP8266 as well.

By default the working ESP8266 version (with frozen modules) is limited to 1 tweet because of the 5k TLS buffer size and no support for RFC 6066 (see https://github.com/micropython/micropyt ... -361776049), although it is possible to get a larger TLS buffer by modifying values in ports/esp8266/Makefile as per https://github.com/micropython/micropyt ... e00207c47f.

For example, I have confirmed that increasing this value to 8192 enables the ESP8266 to retrieve 10 tweets at a time. It currently takes around 15 seconds to process them, but it does at least work.

nic
Posts: 6
Joined: Tue Dec 18, 2018 12:08 pm

Re: HTTPS stream halts before finished

Post by nic » Thu Jan 31, 2019 7:15 pm

Thank you for the code, Damien. It works well on my WiPy board to connect to an HTTPS server. However, I need to integrate this approach into an asynchronous system (asyncio). Is it possible to somehow "await" reading the server's response from the socket? I tried to use the stream classes from asyncio, which create a non-blocking socket, but when I translate your code to the following approach, nothing happens at reader.readline() and the system halts there:

Code: Select all

import asyncio
import urllib.parse
import sys

async def print_http_headers(url):
    url = urllib.parse.urlsplit(url)
    if url.scheme == 'https':
        reader, writer = await asyncio.open_connection(
            url.hostname, 443, ssl=True)
    else:
        reader, writer = await asyncio.open_connection(
            url.hostname, 80)

    query = (
        f"HEAD {url.path or '/'} HTTP/1.0\r\n"
        f"Host: {url.hostname}\r\n"
        f"\r\n"
    )

    writer.write(query.encode('latin-1'))
    while True:
        line = await reader.readline()
        if not line:
            break

        line = line.decode('latin1').rstrip()
        if line:
            print(f'HTTP header> {line}')

    # Ignore the body, close the socket
    writer.close()

url = sys.argv[1]
asyncio.run(print_http_headers(url))
Thanks in advance!
