11 years, 1 month ago.

EthernetInterface does not appear to be robust

Hi

I have an application that continually uses socket connections, both to scrape a webpage and to update a server.

I put the scraping in one thread and the updating in another thread. After a few hours the mbed would lock up.

I have reproduced the problem using the code below in a single thread (the main one). After just under 4 hours of activity a lock-up occurred. I suppose the timing is random, so it could take more or less time. I will run the code again to see how much it varies.

The point is that the EthernetInterface is unusable like this. Is it possible that this could be reproduced (and then debugged) by the mbed team?

Thanks Daniel

PS This is based on the sample code as a starting point. I have updated all the libraries to the latest versions.

Import program: TCPSocket_HelloWorldTest

Test to demonstrate that TCP sockets lock up


After a reset, the code ran for just over 9 hours before locking up. I've restarted it again this morning.

posted by Daniel Peter 27 Feb 2013

After another reset, it ran for just over 6 hours. I've restarted it again.

posted by Daniel Peter 27 Feb 2013

4 Answers

11 years, 1 month ago.

Hi Daniel, for the networking libraries we are not yet in the robustness testing phase. There is still a small queue of bug fixes to be applied to the Ethernet driver from NXP and we still have to add support for certain features.

For example, in the last couple of days (apart from releasing support for the new Freedom board from Freescale) I have been working on adding IGMP support for multicast join requests, where I hit an lwIP bug: https://savannah.nongnu.org/bugs/?38165

Anyway, it is coming along and slowly improving. If you want to help, all our source code is open and released under an Apache v2 license. Pull requests are welcome: https://github.com/mbedmicro/mbed

Cheers, Emilio

11 years, 1 month ago.

Hi Emilio

Thanks for the response. I'm keen to help make the stack robust. I think this would really benefit people like myself who want to develop applications with long uptimes, rather than demos, and value availability over throughput.

Is there a way to find out what the known issues/bugs are? For example, the queue of fixes to the NXP driver. Last time I helped debug an LPC driver (not the NXP one), it suspiciously locked up after a few hours as well.

Also, how are you capturing test cases? I'd like to write some for the robustness testing phase, so networking can become as trusted as other parts of the SDK.
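As a starting point, a robustness test case could be as simple as a loop that repeats one socket cycle and records how many iterations completed before the first error. The sketch below is only an illustration: SoakResult and run_soak are hypothetical names, not part of the mbed SDK, and it assumes a C++11 compiler. The callable stands in for one connect/send/receive/close cycle.

```cpp
#include <cstdint>
#include <functional>

// Hypothetical result record for one soak run; not part of the mbed SDK.
struct SoakResult {
    std::uint32_t completed;  // iterations that succeeded before the run ended
    bool          failed;     // true if an iteration reported an error
};

// Runs `iteration` up to `max_iterations` times and stops at the first failure.
// `iteration` stands in for one connect/send/receive/close cycle and must
// return true on success. A hard lock-up inside `iteration` never returns, so
// on real hardware this would have to be paired with a watchdog timer.
SoakResult run_soak(const std::function<bool()>& iteration,
                    std::uint32_t max_iterations)
{
    SoakResult result = {0, false};
    for (std::uint32_t i = 0; i < max_iterations; ++i) {
        if (!iteration()) {
            result.failed = true;
            break;
        }
        ++result.completed;
    }
    return result;
}
```

This only catches failures that return an error code. The lock-ups described in this thread would need an external observer (a watchdog or a host-side timeout) to be counted.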

Regards Daniel

> I'm keen to help make the stack robust.

Thanks Daniel.

> Is there a way to find out what the known issues/bugs are?

At the moment, the closest thing we have to a public issue tracker is this forum: http://mbed.org/forum/bugs-suggestions

That is clearly a suboptimal solution. The main problems with it are:

  • There is no way to get a clear list of open issues
  • There is no way to filter the issues by component (rtos, tcp/ip, USB device, DSP, NXP drivers, Freescale drivers, etc). There are tags, but a tag search returns results from the whole site, not only from the "Bugs & Suggestions" forum.

Of course, we do have an internal issue tracker where we try to mirror the forum posts in actual tickets, but we have no way to provide a public view of it.

The mbed web team has been busy adding a lot of other features, and the addition of a proper issue tracker has always been postponed.

Now that the mbed SDK is completely open source, having a public issue tracker is growing in importance. I hope the mbed web team will manage to get its development scheduled in.

Perhaps, as a temporary solution, we could actually use the mbed github issue tracker: https://github.com/mbedmicro/mbed/issues?state=open

> For example, the queue of fixes to the NXP driver.
> Last time I helped debug an LPC driver (not the NXP one),
> it suspiciously locked up after a few hours as well.

I summarized the sources of the code used in our TCP/IP stack in this document: https://github.com/mbedmicro/mbed/blob/master/libraries/doc/net/source.txt

The NXP LPC repository is under maintenance (no web view), but it is still possible to clone it: http://sw.lpcware.com/ (git clone http://git.lpcware.com/lwip_lpc.git)

There were a lot of changes affecting example programs, but in the end there was only one fix related to the Ethernet driver: https://github.com/mbedmicro/mbed/commit/a80058dc5fac3c1d7569209cc0d441eef4bdebc3#L2R401

I summarized the architecture of the three main TCP/IP threads here: https://github.com/mbedmicro/mbed/blob/master/libraries/doc/net/doc.txt

Cheers, Emilio

posted by Emilio Monti 03 Mar 2013
11 years, 1 month ago.

Daniel, if you want some more help, let me know. I could try building your sample with the offline compiler and send you some binaries and/or sources that can be built offline. If the problem reproduces with these binaries, we could connect GDB, the GNU Debugger, and take a peek at what is going on when the program stops running.

-Adam

Hi Adam

It would be good to try something offline with the debugger. I did think about this, read a little bit, and saw a few weeks ago that you were working on getting the new EthernetInterface sample code to compile offline.

I'll revisit my test code, check that Emilio's latest changes are included, then send you a link.

Thanks Daniel

posted by Daniel Peter 28 Mar 2013
11 years ago.

ping <ip> -s 4800 locks up my mbed in just 3 packets. Are there any rules or filters that can be put into the Ethernet MAC to drop packets over a certain length? Also, holding down the reload button in my browser with the IP of the mbed loaded locks up my device.

Maybe for reliability reasons it would be best to use an external chip to handle all the networking, like the WIZ5200 or something similar. For my remote real-time AC line monitor, lock-ups are not an option.

Just my 2c.

I did some testing with the W5200 and it locks up within 4-6 hours without doing a thing. I have to reset it. When it does work, it also locks up if I hold F5 to reload the page, even if it's not serving a thing. Large ping packets lock it up as well. I wonder where the part of the code is that responds to ICMP requests. I should start by deleting that or limiting the payload.
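For what it's worth, assuming a standard 1500-byte Ethernet MTU, a 4800-byte ping arrives as several IP fragments rather than one oversized frame, so a frame-length filter in the MAC would not see anything unusual. Limiting the payload would have to happen at the IP level instead. Below is a minimal sketch of such a check. The names should_drop_icmp and kMaxIcmpPacketBytes are made up for illustration, the 128-byte limit is arbitrary, and where this would hook into the driver or lwIP is not shown.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical cap on the IPv4 total length accepted for ICMP; tune per application.
const std::size_t kMaxIcmpPacketBytes = 128;

// Returns true if an IPv4 packet should be dropped before the stack handles it.
// `ip` points at the first byte of the IPv4 header; `len` is the number of
// bytes available from that point.
bool should_drop_icmp(const std::uint8_t* ip, std::size_t len)
{
    const std::size_t kMinIpv4Header = 20;
    if (ip == nullptr || len < kMinIpv4Header) {
        return true;   // too short to hold a valid IPv4 header
    }
    if ((ip[0] >> 4) != 4) {
        return false;  // not IPv4: leave the decision to the stack
    }
    const std::uint8_t kProtoIcmp = 1;
    if (ip[9] != kProtoIcmp) {
        return false;  // only ICMP is filtered here
    }
    // Bytes 6-7: three flag bits followed by a 13-bit fragment offset.
    const std::uint16_t frag = static_cast<std::uint16_t>((ip[6] << 8) | ip[7]);
    const bool more_fragments = (frag & 0x2000) != 0;
    const bool has_offset     = (frag & 0x1FFF) != 0;
    if (more_fragments || has_offset) {
        return true;   // fragmented ICMP, e.g. from an oversized ping
    }
    // Bytes 2-3: total length of the IP packet (header plus payload).
    const std::size_t total = (static_cast<std::size_t>(ip[2]) << 8) | ip[3];
    return total > kMaxIcmpPacketBytes;
}
```

A default ping (84-byte IP packet on most Linux systems) passes this check, while any fragment of a 4800-byte ping is dropped. This would not help with the browser-reload lock-up, which is TCP traffic.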

posted by Hardcore Developer 04 Apr 2013