Okay, I would not call this a minimal repro but this is the best thing I can come up with for now.
See attachment.
- The folder utrio includes a WIP port from python trio async await lib. Docs can be found here
https://trio.readthedocs.io/en/stable/.
- uamqtt is an async rewrite of the mqtt lib from
https://github.com/micropython/micropyt ... mple/umqtt beeing async.
- run main.py
I could reproduce the error on the newest firmware esp32spiram-idf4-20200531-unstable-v1.12-483-g22806ed5d.bin
Sorry, this is a very noisy application with a lot of output...
In order to reproduce:
- edit the wlan ap credentials in main.py
- let it run.
- Switch the ap on and off at will
- Suddenly the worker thread won't run anymore -> observed by seeing, that the queue gets longer and longer and not debug output from the worker thread does appear anymore. Be patient, sometimes this takes minutes ...
Usually the worker thread not responding coincides with an event from the wlan stack such as STA_DISCONNECTED which will trigger exceptions in "subtask_connect" task. However I also see, that the "Crash Task" crashing can coincide with the worker thread stalling. The trio nursery will catch the exception of the first task which throws an exception. Inject a Canceled exception to the other tasks in that nursery and finally raise an exception which will propagate "upstream".
I don't have insight into the internals of exception handling. Might the cause of locking the thread originate from there?
Example log output below (watch out for "from thread" messages suddenly disappearing).
Length of Thread queue is now 1
from thread length of queue is: 1
from thread <function> (0.1,) <Task Object: sync_sleep, CPU: 0.001423109>
publishing, 44
from thread length of queue is: 0
publishing, 45
sync sleep
schedule <function> in thread from task <Task Object: sync_sleep, CPU: 0.001415517>
Length of Thread queue is now 1
from thread length of queue is: 1
from thread <function> (0.1,) <Task Object: sync_sleep, CPU: 0.001469956>
publishing, 46
from thread length of queue is: 0
publishing, 47
sync sleep
schedule <function> in thread from task <Task Object: sync_sleep, CPU: 0.001462102>
Length of Thread queue is now 1
I (260937) wifi: state: run -> init (3c0)
I (260937) wifi: pm stop, total sleep time: 28803713 us / 78385947 us
I (260937) wifi: new:<6,0>, old:<6,0>, ap:<255,255>, sta:<6,0>, prof:1
I (260947) wifi: STA_DISCONNECTED, reason:3
<Task Object: subtask_connect, CPU: 0.008480228> received exception
Traceback (most recent call last):
File "utrio/__init__.py", line 120, in resume
File "main.py", line 30, in subtask_connect
File "uamqtt.py", line 165, in publish
File "uamqtt.py", line 51, in _send_str
File "utrio/socket.py", line 185, in send_all
OSError: [Errno 113] EHOSTUNREACH
<Nursery object at 3fa9c0d0> get canceled with
Traceback (most recent call last):
File "utrio/__init__.py", line 120, in resume
File "main.py", line 30, in subtask_connect
File "uamqtt.py", line 165, in publish
File "uamqtt.py", line 51, in _send_str
File "utrio/socket.py", line 185, in send_all
OSError: [Errno 113] EHOSTUNREACH
<Task Object: subtask_connect, CPU: 0.008478289> cancel called
<Task Object: sync_sleep, CPU: 0.001515745> cancel called
<Task Object: crash, CPU: 3.876495e-06> cancel called
<Task Object: subtask_connect, CPU: 0.008477763> exit
<Task Object: sync_sleep, CPU: 0.001515639> received exception
Traceback (most recent call last):
File "utrio/__init__.py", line 120, in resume
File "main.py", line 37, in sync_sleep
File "utrio/to_thread.py", line 72, in run_sync
File "utrio/to_thread.py", line 33, in schedule_in_thread
File "utrio/__init__.py", line 185, in wait_task_rescheduled
Canceled:
<Nursery object at 3fa9c0d0> get canceled with
Traceback (most recent call last):
File "utrio/__init__.py", line 120, in resume
File "main.py", line 37, in sync_sleep
File "utrio/to_thread.py", line 72, in run_sync
File "utrio/to_thread.py", line 33, in schedule_in_thread
File "utrio/__init__.py", line 185, in wait_task_rescheduled
Canceled:
<Task Object: sync_sleep, CPU: 0.001515287> cancel called
<Task Object: crash, CPU: 3.875323e-06> cancel called
<Task Object: sync_sleep, CPU: 0.001515222> exit
<Task Object: crash, CPU: 3.875158e-06> received exception
Traceback (most recent call last):
File "utrio/__init__.py", line 120, in resume
File "main.py", line 42, in crash
File "utrio/_time.py", line 55, in sleep
File "utrio/_time.py", line 34, in delay_task
File "utrio/__init__.py", line 185, in wait_task_rescheduled
Canceled:
<Nursery object at 3fa9c0d0> get canceled with
Traceback (most recent call last):
File "utrio/__init__.py", line 120, in resume
File "main.py", line 42, in crash
File "utrio/_time.py", line 55, in sleep
File "utrio/_time.py", line 34, in delay_task
File "utrio/__init__.py", line 185, in wait_task_rescheduled
Canceled:
<Task Object: crash, CPU: 3.874302e-06> cancel called
<Task Object: crash, CPU: 3.874212e-06> exit
Canceled:
restart
try exit but not allowed
schedule <function> in thread from task <Task Object: sync_sleep, CPU: 0.0>
Length of Thread queue is now 2
I (263017) wifi: STA_DISCONNECTED, reason:201
no AP found
I (265087) wifi: STA_DISCONNECTED, reason:201
no AP found
I (267147) wifi: STA_DISCONNECTED, reason:201
no AP found
I (269217) wifi: STA_DISCONNECTED, reason:201
no AP found
I (271287) wifi: STA_DISCONNECTED, reason:201
no AP found
I (273347) wifi: STA_DISCONNECTED, reason:201
no AP found
I (275417) wifi: STA_DISCONNECTED, reason:201
no AP found