How can I tell what caused an ESP32 board to restart?

timg11 · Post by **timg11** » Sun Jun 27, 2021 8:19 pm

I have an ESP32 running MicroPython with a simple embedded application. It is in a location where it is impractical to connect a computer to the USB port to access the serial terminal. In operation, the program normally logs events over Wi-Fi, and I'm seeing unexpected startup messages in the log.

I have a general Exception handler that attempts to send out the cause over Wi-Fi. This has been helpful in identifying code bugs, but now I'm only seeing startup events with no Exception messages preceding them, so the ESP32 is resetting without the Exception handler running to completion.
I understand that some error situations may never reach the Exception handler, or that Wi-Fi itself may have failed.

Code: Select all

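import sys
import time
from io import BytesIO
from machine import reset

try:
    main()  # placeholder name for the application's main loop
except Exception as e:
    print("Caught Exception!", e)
    b = BytesIO()
    sys.print_exception(e, b)  # write the traceback into the buffer
    msg = b.getvalue().decode()
    SendWiFi("Caught Exception! " + str(e) + " msg: " + msg)
    time.sleep(10)
    reset()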

At startup, I check machine.reset_cause(). It can return one of 5 values:

machine.PWRON_RESET
machine.HARD_RESET
machine.WDT_RESET
machine.DEEPSLEEP_RESET
machine.SOFT_RESET

Is there any other information that is stored in the ESP32 that survives a reset and can be accessed from MicroPython during startup? That could provide more details to identify the cause of the last reset with more precision. A traceback like what is provided by the REPL would be handy.

marcidy · Post by **marcidy** » Sun Jun 27, 2021 9:40 pm

You can inspect the reset source register as noted in the technical manual:
https://www.espressif.com/sites/default ... ual_en.pdf

Specifically RTC_CNTL_RESET_STATE_REG. The interpretation is given in the Reset and Clock section of the manual.

It looks like MicroPython maps brown-out to power-on reset, so the register may give you slightly more granular data on the reset cause.

timg11 · Post by **timg11** » Sun Jun 27, 2021 10:08 pm

Thanks for the reply, @marcidy. That register (Register 31.12. RTC_CNTL_RESET_STATE_REG (0x3FF48034)) does look useful for getting more information.
Can you point me towards a tutorial on how to access registers from MicroPython?
The term "register" is not found on the "Getting started with MicroPython on the ESP32" page.

marcidy · Post by **marcidy** » Sun Jun 27, 2021 10:16 pm

machine.mem32[addr] will do it.

Code: Select all

>>> machine.mem32[0x3FF48034]
13185
>>> print("{:032b}".format(machine.mem32[0x3FF48034]))
00000000000000000011001110000001

And compare that with the bits mentioned in the technical manual.
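
For reference, a minimal sketch of pulling the two reset-cause fields out of that value. The field positions (bits 5-0 for PRO_CPU, bits 11-6 for APP_CPU) come from the register description in the technical manual; the function name `split_reset_state` is made up for this example.

```python
def split_reset_state(value):
    """Return (pro_cpu_cause, app_cpu_cause) from RTC_CNTL_RESET_STATE_REG."""
    pro = value & 0x3F          # bits 5-0
    app = (value >> 6) & 0x3F   # bits 11-6
    return pro, app

# On the board the value would come from machine.mem32[0x3FF48034].
pro, app = split_reset_state(13185)   # value from the REPL session above
print(hex(pro), hex(app))             # prints: 0x1 0xe
```

The two 6-bit codes can then be looked up in the reset table in the Reset and Clock section of the manual.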

FYI, you can't get much info about a reset. When the chip is forced into reset, there's no time/power to store more information than a few bits, which is what this register is. Knowing it was a brown-out will tell you that you aren't supplying enough power. That's probably the case if you are getting power-on resets.

These aren't like software exceptions. If the hardware is going into reset, the problem is at too low a level for the interpreter to catch. It's possible it's a firmware bug, but there's not much you can do about that without additional debugging.

timg11 · Post by **timg11** » Sun Jun 27, 2021 10:19 pm

Also, reset_cause() is sometimes returning a 6th value, 5. The 5 constants defined in machine only correspond to 0 to 4, right?

I.e., in the following, resetstr ends up as "Unknown - 5"

Code: Select all

    MyResetCause = reset_cause()
    resetstr = "Unknown - "+str(MyResetCause)
    if ( MyResetCause == machine.PWRON_RESET ): resetstr = "PWRON_RESET"
    if ( MyResetCause == machine.HARD_RESET ): resetstr = "HARD_RESET"
    if ( MyResetCause == machine.WDT_RESET ): resetstr = "WDT_RESET"
    if ( MyResetCause == machine.DEEPSLEEP_RESET ): resetstr = "DEEPSLEEP_RESET"
    if ( MyResetCause == machine.SOFT_RESET ): resetstr = "SOFT_RESET"

marcidy · Post by **marcidy** » Sun Jun 27, 2021 10:27 pm

You can inspect the value of the constants themselves before assuming they start at any particular number.

MicroPython defines them in an enum starting at 1:

Code: Select all

typedef enum {
    MP_PWRON_RESET = 1,
    MP_HARD_RESET,
    MP_WDT_RESET,
    MP_DEEPSLEEP_RESET,
    MP_SOFT_RESET
} reset_reason_t;

Code: Select all

>>> machine.PWRON_RESET
1
>>> machine.HARD_RESET
2
>>> machine.WDT_RESET
3
>>> machine.DEEPSLEEP_RESET
4
>>> machine.SOFT_RESET
5
>>>

5 is soft reset.

Don't know why your code is not setting it right, but 5 is fine.
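
One way to avoid assuming any particular numbering is to build the lookup from the constants themselves. This is only a sketch: `build_table`, `reset_cause_name` and the `FakeMachine` stand-in are names invented here, and the stand-in just copies the values printed in the REPL session above so the code also runs off the board.

```python
RESET_NAMES = ("PWRON_RESET", "HARD_RESET", "WDT_RESET",
               "DEEPSLEEP_RESET", "SOFT_RESET")

def build_table(mod):
    """Map each constant's actual value to its name, whatever the values are."""
    return {getattr(mod, name): name for name in RESET_NAMES}

def reset_cause_name(cause, table):
    """Return the constant's name, or a marker for values with no constant."""
    return table.get(cause, "Unknown - " + str(cause))

class FakeMachine:
    # Stand-in for the machine module, using the values shown above.
    PWRON_RESET, HARD_RESET, WDT_RESET, DEEPSLEEP_RESET, SOFT_RESET = 1, 2, 3, 4, 5

table = build_table(FakeMachine)      # on the board: build_table(machine)
print(reset_cause_name(5, table))     # prints: SOFT_RESET
print(reset_cause_name(0, table))     # prints: Unknown - 0
```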

marcidy · Post by **marcidy** » Sun Jun 27, 2021 10:39 pm

FYI, "0" is set for an unknown value, so there are actually 6 total. Technically it's possible to get a 0, but there would be a bug in the esp-idf if that ever happened.

Code: Select all

STATIC mp_obj_t machine_reset_cause(size_t n_args, const mp_obj_t *pos_args, mp_map_t *kw_args) {
    if (is_soft_reset) {
        return MP_OBJ_NEW_SMALL_INT(MP_SOFT_RESET);
    }
    switch (esp_reset_reason()) {
        case ESP_RST_POWERON:
        case ESP_RST_BROWNOUT:
            return MP_OBJ_NEW_SMALL_INT(MP_PWRON_RESET);
            break;

        case ESP_RST_INT_WDT:
        case ESP_RST_TASK_WDT:
        case ESP_RST_WDT:
            return MP_OBJ_NEW_SMALL_INT(MP_WDT_RESET);
            break;

        case ESP_RST_DEEPSLEEP:
            return MP_OBJ_NEW_SMALL_INT(MP_DEEPSLEEP_RESET);
            break;

        case ESP_RST_SW:
        case ESP_RST_PANIC:
        case ESP_RST_EXT: // Comment in ESP-IDF: "For ESP32, ESP_RST_EXT is never returned"
            return MP_OBJ_NEW_SMALL_INT(MP_HARD_RESET);
            break;

        case ESP_RST_SDIO:
        case ESP_RST_UNKNOWN:
        default:
            return MP_OBJ_NEW_SMALL_INT(0);
            break;
    }
}

Note the case for ESP_RST_UNKNOWN returning 0.
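
Read as a table, the switch above collapses several ESP-IDF reasons into each MicroPython constant. The sketch below restates that mapping in Python. It is transcribed by hand from the C code above, and the dictionary and function names are invented for this illustration.

```python
# ESP-IDF reset reason -> what machine.reset_cause() reports for it.
IDF_TO_MICROPYTHON = {
    "ESP_RST_POWERON": "PWRON_RESET",
    "ESP_RST_BROWNOUT": "PWRON_RESET",
    "ESP_RST_INT_WDT": "WDT_RESET",
    "ESP_RST_TASK_WDT": "WDT_RESET",
    "ESP_RST_WDT": "WDT_RESET",
    "ESP_RST_DEEPSLEEP": "DEEPSLEEP_RESET",
    "ESP_RST_SW": "HARD_RESET",
    "ESP_RST_PANIC": "HARD_RESET",
    "ESP_RST_EXT": "HARD_RESET",
    "ESP_RST_SDIO": "0 (unknown)",
    "ESP_RST_UNKNOWN": "0 (unknown)",
}

def possible_idf_reasons(mp_name):
    """List the ESP-IDF reasons that are reported as the given constant."""
    return sorted(k for k, v in IDF_TO_MICROPYTHON.items() if v == mp_name)

print(possible_idf_reasons("PWRON_RESET"))
# prints: ['ESP_RST_BROWNOUT', 'ESP_RST_POWERON']
```

So a PWRON_RESET may hide a brown-out, and HARD_RESET covers software resets and panics. SOFT_RESET is not in the table because it comes from the is_soft_reset flag rather than from esp_reset_reason().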

timg11 · Post by **timg11** » Sun Jun 27, 2021 10:48 pm

marcidy wrote: ↑
Sun Jun 27, 2021 10:27 pm
Don't know why your code is not setting it right, but 5 is fine.

Yep, it was a bug in the code elsewhere. 5 is indeed valid.

timg11 · Post by **timg11** » Fri Jul 02, 2021 10:54 pm

I have updated the ESP32 code so it reports the reset cause when starting up.
I've seen some spontaneous resets, and have the log results.

The "reset_cause()" function returns PWRON_RESET
The check of reset_state_reg = machine.mem32[0x3FF48034] returns 0x330C

According to the documentation:

Code: Select all

    # RTC_CNTL_RESET_STATE_REG
    # Register 31.12. RTC_CNTL_RESET_STATE_REG (0x3FF48034)
    # RTC_CNTL_RESET_CAUSE_APPCPU  Reset cause for APP_CPU. (RO)   bits 11-6
    # RTC_CNTL_RESET_CAUSE_PROCPU  Reset cause for PRO_CPU. (RO)   bits 5-0
    #
    #
    # PRO   APP   Source                  Reset Type    Note
    # 0x01  0x01  Chip Power On Reset     System Reset  -
    # 0x10  0x10  RWDT System Reset       System Reset  See WDT Chapter.
    # 0x0F  0x0F  Brown Out Reset         System Reset  See Power Management Chapter.
    # 0x03  0x03  Software System Reset   Core Reset    Configure RTC_CNTL_SW_SYS_RST register.
    # 0x05  0x05  Deep Sleep Reset        Core Reset    See Power Management Chapter.
    # 0x07  0x07  MWDT0 Global Reset      Core Reset    See WDT Chapter.
    # 0x08  0x08  MWDT1 Global Reset      Core Reset    See WDT Chapter.
    # 0x09  0x09  RWDT Core Reset         Core Reset    See WDT Chapter.
    # 0x0B  -     MWDT0 CPU Reset         CPU Reset     See WDT Chapter.
    # 0x0C  -     Software CPU Reset      CPU Reset     Configure RTC_CNTL_SW_APPCPU_RST register.
    # -     0x0B  MWDT1 CPU Reset         CPU Reset     See WDT Chapter.
    # -     0x0C  Software CPU Reset      CPU Reset     Configure RTC_CNTL_SW_APPCPU_RST register.
    # 0x0D  0x0D  RWDT CPU Reset          CPU Reset     See WDT Chapter.
    # -     0x0E  PRO CPU Reset           CPU Reset     Indicates that the PRO CPU has independently reset the APP CPU by configuring the DPORT_APPCPU_RESETTING register

The register value returned is 0x330C, which is 0011001100001100 in binary.

Breaking that into the bit groups:
0011 APP and PRO_CPU state vector selection (?)
001100 APP_CPU 0x0C Software CPU Reset Configure RTC_CNTL_SW_APPCPU_RST register.
001100 PRO_CPU 0x0C Software CPU Reset Configure RTC_CNTL_SW_APPCPU_RST register.
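
The same breakdown can be done in code, which saves repeating the bit counting for every logged value. This is a sketch only: the names are transcribed by hand from the table quoted above, and `SHARED`, `PRO_ONLY`, `APP_ONLY` and `describe` are identifiers invented for this example.

```python
# Codes that mean the same thing for both CPUs.
SHARED = {
    0x01: "Chip Power On Reset",   0x03: "Software System Reset",
    0x05: "Deep Sleep Reset",      0x07: "MWDT0 Global Reset",
    0x08: "MWDT1 Global Reset",    0x09: "RWDT Core Reset",
    0x0C: "Software CPU Reset",    0x0D: "RWDT CPU Reset",
    0x0F: "Brown Out Reset",       0x10: "RWDT System Reset",
}
# Codes whose meaning depends on which CPU field they appear in.
PRO_ONLY = {0x0B: "MWDT0 CPU Reset"}
APP_ONLY = {0x0B: "MWDT1 CPU Reset", 0x0E: "PRO CPU Reset"}

def describe(value):
    """Return ((pro_code, pro_name), (app_code, app_name)) for a register value."""
    pro = value & 0x3F          # bits 5-0
    app = (value >> 6) & 0x3F   # bits 11-6
    pro_name = PRO_ONLY.get(pro) or SHARED.get(pro, "unknown")
    app_name = APP_ONLY.get(app) or SHARED.get(app, "unknown")
    return (pro, pro_name), (app, app_name)

print(describe(0x330C))
# prints: ((12, 'Software CPU Reset'), (12, 'Software CPU Reset'))
```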

That seems different from the value Python returns, which is PWRON_RESET. PWRON_RESET cannot be correct, because the ESP32 is not being powered down during these resets. The power supply is stable, regulated from 12V to 5V DC on the board holding the ESP32 module, with plenty of decoupling capacitors.

I searched for "RTC_CNTL_SW_APPCPU_RS" in the ESP32 reference manual, but didn't find any details on what it means or how to set it.

I believe something in the software is causing the reset - how can I learn more about the source of the reset?

marcidy · Post by **marcidy** » Fri Jul 02, 2021 11:47 pm

You're going to have to debug what's going on. Collect data to see whether it's reproducible, and under what conditions.

If you can run it via serial, you can log the terminal output and collect more data; there may be an error that gets printed.

Otherwise you'll have to log to flash and narrow it down by finding the last successful operation and working out what happens after it, while making sure the log doesn't fill up flash.
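
A minimal sketch of such a bounded log, assuming a filesystem is mounted. The file names and the size cap are arbitrary choices for illustration, not anything from the original code.

```python
import os

LOG = "events.log"    # hypothetical file names
OLD = "events.old"
MAX_BYTES = 4096      # arbitrary cap per file

def log_line(text, path=LOG, old=OLD, max_bytes=MAX_BYTES):
    """Append one line; once the file reaches max_bytes, keep it as `old` and start over."""
    try:
        size = os.stat(path)[6]   # index 6 of the stat tuple is the size in bytes
    except OSError:
        size = 0                  # file does not exist yet
    if size >= max_bytes:
        try:
            os.remove(old)        # drop the previous generation, if any
        except OSError:
            pass
        os.rename(path, old)
    with open(path, "a") as f:
        f.write(text + "\n")

# Example: call log_line("reading sensor") before each step of the main loop.
```

Flash use stays at roughly two files of `max_bytes` each, and after an unexplained reset the last line of the current file shows the last step that was reached. Frequent writes do wear flash, so it is worth logging only at the points that matter.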

You can also turn on extra debugging info via compile-time flags. I don't know them off-hand, and that assumes you can deploy the base firmware.

There's esp.osdebug(), but if you don't have access to the shell, you won't see the debug data.

It takes significant investigation. Keep a log of what you've tried, and keep narrowing down the code until you can trigger the crash intentionally, or at least isolate which parts crash and which don't. Start commenting out stuff to see whether the crash still happens.

There really is no systematic way to prevent or capture crash data. Crashes are the hardest bugs to debug, and doing it while the device is in the field is much harder still. Ideally you have a duplicate setup on the bench that you can run in parallel, to see whether it crashes at the same time or under the same conditions and gain some insight.

Start looking at the code and see whether you can force a crash via unexpected output.

Try logging the exception to a file, resetting, and then sending the file/data. Empty the file after a successful send.
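
A rough sketch of that save-then-send pattern. The file name and both function names are invented here, and `send` stands for whatever function delivers a string over the network (SendWiFi in the original code). The CPython branch is only there so the sketch can be tried on a PC.

```python
import os
import sys

CRASH_FILE = "crash.txt"   # hypothetical file name

def save_exception(exc, path=CRASH_FILE):
    """Write the traceback to flash; nothing here depends on the network."""
    with open(path, "w") as f:
        if hasattr(sys, "print_exception"):    # MicroPython
            sys.print_exception(exc, f)
        else:                                  # CPython, for bench testing
            import traceback
            traceback.print_exception(type(exc), exc, exc.__traceback__, file=f)

def flush_saved_exception(send, path=CRASH_FILE):
    """At startup: send a saved traceback and delete it only if send() succeeded."""
    try:
        with open(path) as f:
            text = f.read()
    except OSError:
        return False        # nothing was saved
    try:
        send("Previous crash: " + text)
    except Exception:
        return False        # keep the file and retry on the next boot
    os.remove(path)
    return True
```

The exception handler then only needs to call save_exception(e) and reset(), and the startup code calls flush_saved_exception(SendWiFi) once Wi-Fi is up.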

Errors can happen anywhere. Maybe your SendWiFi function is the problem. Maybe the network goes down, throws an error, and your system reboots before networking is back up.

Making that exception handler do less might also help. Expecting the SendWiFi function to work while handling every exception may not be robust.

Good luck, it's not easy.

MicroPython Forum (Archive)
