catch fatal errors during lockup

27 Sep 2011

I have a program that I prototyped on the mbed and then moved to LPC1769 hardware for production. It occasionally (but consistently) locks up on both platforms, most likely because of something to do with serial interrupts. It locks solid, stops responding, and needs a full power cycle to restart.

I want to figure out what condition it's getting into when it faults. I know that with non-mbed devices I could put debug printf messages in all the fault handlers and start debugging from there. How do I get there from here?

I guess this would be similar to printing a message instead of showing the blue lights of death?

27 Sep 2011

Okay, so I have now put the following in the program to catch the fatal error:

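extern "C" void HardFault_Handler() {
    // pc is the Serial object for the USB serial port, defined elsewhere.
    pc.printf("Hard Fault!\n");
    // Uncomment to restart the system after reporting the fault.
    //NVIC_SystemReset();
}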

If I get the lockup, it should print "Hard Fault!" on the USB serial. I will test and report. I also stumbled across NVIC_SystemReset() and have confirmed that it does in fact cause a system reset (the same as hitting the reset button) and restarts the system. I have not actually tried resetting from the HardFault_Handler yet, but it seems like it might be a way to make the system reset when it gets locked up, like a watchdog reset.

My big problem is finding out why it's locking up in the first place... more to come.

27 Sep 2011

Okay, so the HardFault_Handler doesn't get fired when the system locks up. When it locks up it stops reading from and writing to the serial port, but it doesn't go to the HardFault handler. What other handler would it be going to?

28 Sep 2011

I managed to get the system to stop locking up (or so it seems) by playing with the serial interrupts and the dreaded printf's in the main loop. It looks like the printf's in main() might not have been what was actually killing it, though. I originally thought they were and tried disabling IRQs around the printf's in main().

What looks like it was killing me was a serial buffer that was being modified in the interrupt handler and then parsed in main() using string methods that also modified the buffer. I created a copy of the buffer and used that for parsing, and the lockups seem to have stopped.

comments?

29 Sep 2011

I have added the following handlers to try to catch what state the system gets into when it locks up. So far, when it does lock up, it never fires any of these handlers... where is it going?

//            Reset_Handler
//            NMI_Handler
//            HardFault_Handler
//            MemManage_Handler
//            BusFault_Handler
//            UsageFault_Handler
extern "C" void HardFault_Handler() {
    pc.printf("Hard Fault!\n");
    //NVIC_SystemReset();
}
extern "C" void NMI_Handler() {
    pc.printf("NMI Fault!\n");
    //NVIC_SystemReset();
}
extern "C" void MemManage_Handler() {
    pc.printf("MemManage Fault!\n");
    //NVIC_SystemReset();
}
extern "C" void BusFault_Handler() {
    pc.printf("BusFault Fault!\n");
    //NVIC_SystemReset();
}
extern "C" void UsageFault_Handler() {
    pc.printf("UsageFault Fault!\n");
    //NVIC_SystemReset();
}

03 Oct 2011

Are there any other handlers that I can check to see if they are being called? I don't know where the device goes when it locks up and stops responding, but it's clearly not getting into the ones above.

03 Oct 2011

I have hit a similar issue before, and it was caused by an interrupt handler getting stuck in an infinite loop. I added an InterruptIn object with pull-up that would printf() some text (the contents of the stack in my case) on a falling edge and then call the mbed error() macro to make the blue lights of death blink. If I connected a wire from this pin to ground, it would break into the running program, display the stack contents, and then blink the LEDs. When everything was running normally, I could force this interrupt and get the dump, but when I hit the hang, not even this code would fire. The code wasn't hitting a fault; it was just stuck in some interrupt handling code at priority level 0, so my interrupt couldn't fire either.

You could try lowering the priority level of the interrupts that you know you are using (with the NVIC_SetPriority() API) and then try the InterruptIn hack, leaving that interrupt at priority level 0, to see if it can break in when your program hangs. If printf() calls work from your device in this state, you can dump state about your application to try to figure out what is leading to the hang.
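
To make the priority reasoning concrete, here is a toy model of the preemption rule, not target code. It assumes the simplified Cortex-M behaviour where a lower number means higher priority and a pending interrupt only preempts a running handler with a strictly higher number; it ignores priority grouping and PRIMASK. The names can_preempt and NO_ACTIVE_HANDLER are made up for this sketch. The commented-out target calls assume the serial port is on UART0 and that the InterruptIn pin is serviced through EINT3, which you should check for your own setup.

```cpp
// Sentinel meaning "no handler is running" (normal main() code).
const int NO_ACTIVE_HANDLER = 256;

// Simplified rule: a pending interrupt can preempt only if its priority
// number is strictly lower than that of whatever is currently running.
bool can_preempt(int pending_priority, int active_priority) {
    return pending_priority < active_priority;
}

// On the target, the change being suggested would look roughly like:
//   NVIC_SetPriority(UART0_IRQn, 1);   // assumed: serial handler on UART0
//   // leave the InterruptIn interrupt (assumed EINT3_IRQn) at level 0
```

With everything left at level 0, can_preempt(0, 0) is false, which matches the hang where the break-in interrupt never fired. Once the serial interrupt is moved to level 1, can_preempt(0, 1) is true and the break-in handler should be able to run even while the serial handler is stuck.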