Unable to send second message on single connection. EthernetInterface broken?

03 Aug 2012

It seems that send_all/send can be called only once on a single TCPSocketConnection. The second time I send something, it does not appear on the network and the endpoint never receives it. See the example below.

#include "mbed.h"
#include "EthernetInterface.h"

Serial pc(USBTX, USBRX);

int main() {
    pc.baud(115200);

    EthernetInterface eth;
    eth.init(); //Use DHCP
    eth.connect();
    
    printf("IP Address is %s\n", eth.getIPAddress());
    
    TCPSocketConnection sock;
    sock.connect("192.168.1.113", 6000);
    
    // Writable arrays: string literals are const in C++
    char msg1[] = "Hello world!\n";
    char msg2[] = "Hello mbed!\n";

    int bytes;
    bytes = sock.send_all(msg1, strlen(msg1));
    printf("Bytes sent: %d\n",bytes);                   // Output: Bytes sent: 13
    bytes = sock.send_all(msg2, strlen(msg2));
    printf("Bytes sent: %d\n",bytes);                   // Output: Bytes sent: 12        !!!! TCP packet not sent
    
      
    sock.close();
    
    eth.disconnect();
    
    while(1) {}
}

04 Aug 2012

Hi Sheldon,

Sheldon Cooper wrote:

It seems that send_all/send can be called only once on a single TCPSocketConnection. The second time I send something, it does not appear on the network and the endpoint never receives it. See the example below.

You may be doing something wrong at the server side. Here is an example server communicating with the mbed client above where the results are fine:

import socket

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.bind(('', 6000))  # listen on all interfaces, port 6000
s.listen(1)

while True:
    conn, addr = s.accept()
    print('Connected by', addr)

    # Print every chunk until the client closes the connection
    while True:
        data = conn.recv(1024)
        if not data: break

        print(repr(data.decode('ascii', 'replace')))

    print('closing client connection')
    conn.close()

This is the output I received:

Connected by ('192.168.0.50', 49153)
'Hello world!\n'
'Hello mbed!\n'
closing client connection

A useful tool for diagnosing problems is a network protocol analyzer such as Wireshark; it might help identify at which end the problem lies and what it is. Hope this helps you trace the problem. Please post back with your results, especially if you continue to have problems.

Thanks, Emilio

04 Aug 2012

Hi Emilio,

Emilio Monti wrote:

I am afraid you are doing something wrong at the server side.

The mbed client above is working fine.

No, I'm not. :) I did not modify the server. It was a working server with a working mbed app. All I did was migrate the mbed app from the Netservices API to RTOS and the new EthernetInterface API, and it suddenly stopped working. Then I verified it with the simple program above.

Emilio Monti wrote:

In the future, before assuming that the problem is with the mbed code, could you please try to use a free network protocol analyzer like Wireshark, to understand at which end the problem is?

I did that already. I always use Wireshark to debug network errors. This time I see that only the first packet reaches the server.

After the "Hello world!" string is sent, the server replies with an ACK packet and that's all. No packet containing "Hello mbed!" is received by the server. I attached the Wireshark output: /media/uploads/csorbag/mbed_packets.txt

I can reproduce the problem with standard tools like netcat for Windows (http://joncraton.org/blog/46/netcat-for-windows). I get the same result: it only displays the first message.
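
For a cross-check from the PC side, here is a minimal sketch of a desktop client that performs the same two back-to-back sends as the mbed program. It only shows what the server should see from a desktop TCP stack; send_two_messages is a hypothetical helper name, and the default messages are the ones from the example above.

```python
import socket

def send_two_messages(host, port,
                      messages=(b"Hello world!\n", b"Hello mbed!\n")):
    """Connect, send each message back to back, then close.

    Returns the byte count handed to the OS for each message.
    """
    sent = []
    with socket.create_connection((host, port), timeout=5) as sock:
        for msg in messages:
            sock.sendall(msg)      # returns None on success, raises on error
            sent.append(len(msg))
    return sent
```

Against the server from this thread it would be called as send_two_messages("192.168.1.113", 6000).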

04 Aug 2012

All I can assume now is that the new network library sets invalid TCP headers that some routers discard, which would explain why the packets are not received on the server side, while other routers are happy with them.

04 Aug 2012

Sheldon Cooper wrote:

All I can assume now is that the new network library sets invalid TCP headers that some routers discard, which would explain why the packets are not received on the server side, while other routers are happy with them.

Could you describe your network setup, including each type of switch/router that is dispatching your network traffic? It would be interesting to investigate it further, but we need to be able to reproduce the problem. On our network it is working fine.

Cheers, Emilio

04 Aug 2012

Emilio Monti wrote:

Could you describe your network settings, trying to describe each type of switch/router that is dispatching your network traffic?

It's really simple. The mbed is connected to a Linksys WRT45G router running DD-WRT v23 SP2 firmware. The server is also connected directly to the router, by either Ethernet or Wi-Fi; I tried both. The server is running Win7 64 bit. I also tried running the server on a different machine (with the same router), but it still does not work. There is no VLAN, NAT or any other layer between the server and the mbed. They're in the same LAN.

Now I'm trying either to capture packets on the router or to connect the mbed directly to the server PC.

04 Aug 2012

I managed to install tcpdump on the router. I attached the output of command:

tcpdump -nnvvXSs dst 192.168.1.116 or src 192.168.1.116

/media/uploads/csorbag/wrt_packets.txt

I see the same thing as with Wireshark on the server: the mbed sends only the first packet, so the second one clearly never reaches the router.

What I found strange in the dump is that after the first packet is sent, the mbed sends a DHCP Release message every time.

From that I figured out the problem: after sending the second message I close the socket immediately. The close happens before the ACK for the first message arrives, and because the first message has not been acknowledged, the second is never sent. It is then discarded for good, because the socket is already closed when the ACK finally arrives.

By inserting a Thread::wait(1000) before closing the socket, the server finally receives the second packet as well. I don't think that's normal, is it? I think Socket.close() should wait until there are no more unsent/unacknowledged/pending packets.

Inserting the wait() fixes the simplified test application, but it does not explain why my complex application fails. There I never close the socket on the client side first; that's the responsibility of the server. So I have to investigate the problem further.
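
For comparison, on a desktop BSD-style socket API the usual way to make sure queued data is delivered before tearing the connection down is to half-close the sending side and wait for the peer to close. A minimal sketch follows, in Python rather than the mbed API; close_gracefully is a hypothetical helper, and nothing in this thread establishes whether TCPSocketConnection offers an equivalent.

```python
import socket

def close_gracefully(sock, timeout=5.0):
    """Half-close the socket, then wait until the peer closes its end.

    shutdown(SHUT_WR) queues a FIN behind any data still unsent.
    recv() returning b"" means the peer has closed its end, which a
    read-until-EOF server does only after it has read everything.
    Returns True if the peer closed within the timeout.
    """
    sock.shutdown(socket.SHUT_WR)      # no more sends from this side
    sock.settimeout(timeout)
    try:
        while sock.recv(1024):         # drain anything the peer still sends
            pass
        return True
    except socket.timeout:
        return False
    finally:
        sock.close()
```

The design choice here is to wait for an observable event (the peer's close) instead of a fixed delay such as Thread::wait(1000).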

04 Aug 2012

Thank you, Sheldon, for reporting this issue.

There must be something wrong in our lwIP configuration.

I will investigate it first thing Monday morning.

Cheers, Emilio

04 Aug 2012

I have to correct myself. The problem is not closing the socket immediately, but calling EthernetInterface.disconnect() immediately after that. But I still think that Socket.close() should not return until pending packets are sent.

04 Aug 2012

Another, related problem: calling EthernetInterface.disconnect() immediately after Socket.close() causes the mbed not to send FIN packets, so the server thinks the connections are still alive. Maybe the cause is the same, since it will not send FIN packets until the previous packets are ACKed.
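
While the FIN is missing, a server can at least protect itself from connections that never close by giving each connection a read timeout. A minimal sketch, assuming a hypothetical serve_one helper and an arbitrary timeout value; it is a server-side mitigation only, not a fix for the mbed behaviour described here.

```python
import socket

def serve_one(listener, idle_timeout=30.0):
    """Accept one client and read until it closes or goes silent.

    Returns (data, reason): reason is "closed" when the peer sent a FIN
    and "timeout" when nothing arrived for idle_timeout seconds.
    """
    conn, _addr = listener.accept()
    conn.settimeout(idle_timeout)
    chunks = []
    try:
        while True:
            data = conn.recv(1024)
            if not data:                   # orderly close: FIN received
                return b"".join(chunks), "closed"
            chunks.append(data)
    except socket.timeout:                 # peer went quiet without a FIN
        return b"".join(chunks), "timeout"
    finally:
        conn.close()
```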

04 Aug 2012

Additional information: The same problem occurs when there are no Socket.close() or EthernetInterface.disconnect() calls in the application, but the app sends messages really fast. If I send a message before the previous packet has been ACKed the second message is discarded.

I was only able to fix my application by inserting a Thread::wait(100) after each Socket.send_all().

So either send_all() should wait until ACKed or the second call to send_all should block until previous packets are ACKed. I think the second one would give better application performance.
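
TCP acknowledgements are not visible to the application through the socket API, so one way to approximate the "wait until ACKed" behaviour is an application-level acknowledgement: the receiver answers each message with a short reply and the sender waits for it before sending the next. A minimal sketch under that assumption, in Python for illustration; the one-byte ACK value, the newline framing and both helper names are hypothetical, and the server would have to be changed to cooperate.

```python
import socket

ACK = b"\x06"   # hypothetical one-byte application-level acknowledgement

def send_with_ack(sock, messages, timeout=2.0):
    """Send newline-terminated messages one at a time (stop-and-wait).

    Returns the number of messages the peer acknowledged.
    """
    sock.settimeout(timeout)
    acked = 0
    for msg in messages:
        sock.sendall(msg)
        try:
            if sock.recv(1) != ACK:    # b"" (peer closed) or unexpected byte
                break
        except socket.timeout:
            break
        acked += 1
    return acked

def ack_each_line(conn):
    """Receiver side: reply with ACK for every complete line received."""
    lines, buf = [], b""
    while True:
        data = conn.recv(1024)
        if not data:                   # peer closed the connection
            return lines
        buf += data
        while b"\n" in buf:
            line, buf = buf.split(b"\n", 1)
            lines.append(line + b"\n")
            conn.sendall(ACK)
```

This trades throughput for certainty: each message costs one round trip, but the sender knows how many messages actually arrived.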

04 Aug 2012

Sheldon, may I ask you a favour? Since you have this "lucky" environment that exhibits the issue, could you try changing line 54 of "EthernetInterface.cpp"? Replace this:

netif_add(&lpcNetif, ipaddr, netmask, gw, NULL, lpc_enetif_init, ethernet_input);

With this:

netif_add(&lpcNetif, ipaddr, netmask, gw, NULL, lpc_enetif_init, tcp_input);

I still have to investigate the consequences. I thought that it would be a better initialization, but before changing it, I was looking for a test case proving that it is the right thing to do.

Cheers, Emilio

04 Aug 2012

Emilio Monti wrote:

I still have to investigate the consequences

OK, digging further into the lwIP documentation I found this:

The final parameter input is the function that a driver will call when it has received a new packet. This parameter typically takes one of the following values:

  • ethernet_input: If you are not using a threaded environment and the driver should use ARP (such as for an Ethernet device), the driver will call this function which permits ARP packets to be handled, as well as IP packets.
  • ip_input: If you are not using a threaded environment and the interface is not an Ethernet device, the driver will directly call the IP stack.
  • tcpip_ethinput: If you are using the tcpip application thread (see lwIP and threads), the driver uses ARP, and has defined the ETHARP_TCPIP_ETHINPUT lwIP option. This function is used for drivers that pass all IP and ARP packets to the input function.
  • tcpip_input: If you are using the tcpip application thread and have defined ETHARP_TCPIP_INPUT option. This function is used for drivers that pass only IP packets to the input function. (The driver probably separates out ARP packets and passes these directly to the ARP module). (Someone please recheck this: in lwip 1.4.1 there is no tcpip_ethinput() ; tcpip_input() handles ARP packets as well).

I am now sure we will have to change this. It would be interesting to know if it fixes your issue.

Cheers, Emilio

04 Aug 2012

Emilio Monti wrote:

Sheldon, may I ask you a favour? Since you have this "lucky" environment that exhibits the issue, could you try changing line 54 of "EthernetInterface.cpp"? Replace this:

netif_add(&lpcNetif, ipaddr, netmask, gw, NULL, lpc_enetif_init, ethernet_input);

With this:

netif_add(&lpcNetif, ipaddr, netmask, gw, NULL, lpc_enetif_init, tcp_input);

I did that. (I guess "tcp_input" was a typo, I changed it to "tcpip_input"). I don't see any change.

Here are the test cases and results I tried:

Test case #1

Pseudo code:

  1. socket.send_all(msg1)
  2. socket.send_all(msg2)
  3. socket.close()
  4. ethernetinterface.disconnect()

Result:

  • server receives msg1
  • connection on server remains alive forever (no FIN packet was captured)

Test case #2

Pseudo code:

  1. socket.send_all(msg1)
  2. socket.send_all(msg2)
  3. wait(500)
  4. socket.close()
  5. ethernetinterface.disconnect()

Result:

  • server receives msg1 and msg2
  • connection closed normally on server (FIN received)
  • everything works as expected

Test case #3

Pseudo code:

  1. socket.send_all(msg1)
  2. socket.send_all(msg2)
  3. wait(500)
  4. socket.send_all(msg3)
  5. socket.send_all(msg4)
  6. socket.close()
  7. ethernetinterface.disconnect()

Result:

  • server receives msg1, msg2 and msg3
  • msg4 lost
  • connection on server remains alive forever (no FIN packet was captured)

Test case #4

Pseudo code:

  1. socket.send_all(msg1)
  2. socket.send_all(msg2)
  3. socket.send_all(msg3)
  4. socket.send_all(msg4)
  5. socket.close()
  6. wait(2500)
  7. ethernetinterface.disconnect()

Result:

  • server receives msg1
  • msg2, msg3, msg4 lost
  • connection on server remains alive forever (no FIN packet was captured)

Test case #5

Pseudo code:

  1. socket.send_all(msg1)
  2. socket.send_all(msg2)
  3. socket.send_all(msg3)
  4. socket.send_all(msg4)
  5. wait(2500)
  6. socket.close()
  7. ethernetinterface.disconnect()

Result:

  • server receives msg1
  • msg2, msg3, msg4 lost
  • connection on server remains alive forever (no FIN packet was captured)

Test case #6

Pseudo code:

  1. socket.send_all(msg1)
  2. wait(500)
  3. socket.send_all(msg2)
  4. wait(500)
  5. socket.send_all(msg3)
  6. wait(500)
  7. socket.send_all(msg4)
  8. wait(500)
  9. socket.close()
  10. ethernetinterface.disconnect()

Result:

  • server receives all messages
  • connection closed normally on server (FIN received)
  • everything works as expected

05 Aug 2012

Thank you, Sheldon, I'll investigate it tomorrow morning.

Cheers, Emilio

10 Aug 2012

Hi Emilio,

Do you have an update on this issue? I've seen new commits to EthernetInterface. Are any of those related?

10 Aug 2012

Hi Sheldon,

Sheldon Cooper wrote:

Do you have an update on this issue? I've seen new commits to EthernetInterface. Are any of those related?

Not much. We tested with many different routers, but we did not manage to reproduce this issue.

We wanted to buy the specific router model you reported, but it appears to be too old and we could not find any place in the UK selling it.

Not being able to reproduce the issue, it is difficult to tackle it.

Cheers, Emilio

10 Aug 2012

Thanks for the effort. I'll be thinking of how you could reproduce it without special HW.
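
One hardware-independent starting point is to express the test cases above as data and replay them against any TCP stack, which gives a baseline for what a conforming client should produce. A minimal sketch, run from a PC over loopback rather than on the mbed, so it shows the expected result (every message delivered) and not the failure; run_scenario, the step encoding and the message contents are hypothetical.

```python
import socket
import threading
import time

def run_scenario(steps):
    """Replay ("send", bytes) / ("wait", ms) / ("close",) steps over loopback.

    Returns the bytes the server received before the connection ended.
    """
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))            # any free port
    listener.listen(1)
    received = []

    def server():
        conn, _ = listener.accept()
        conn.settimeout(5)
        try:
            while True:
                data = conn.recv(1024)
                if not data:                    # client closed the connection
                    break
                received.append(data)
        except socket.timeout:
            pass                                # give up on a silent client
        finally:
            conn.close()

    thread = threading.Thread(target=server)
    thread.start()
    client = socket.create_connection(listener.getsockname())
    for step in steps:
        if step[0] == "send":
            client.sendall(step[1])
        elif step[0] == "wait":
            time.sleep(step[1] / 1000.0)
        elif step[0] == "close":
            client.close()
    client.close()                              # no-op if already closed
    thread.join()
    listener.close()
    return b"".join(received)

# Test case #3 from the list above, with placeholder message contents.
CASE_3 = [("send", b"msg1\n"), ("send", b"msg2\n"), ("wait", 500),
          ("send", b"msg3\n"), ("send", b"msg4\n"), ("close",)]
```

On a desktop stack run_scenario(CASE_3) should return all four messages, whereas the mbed result reported for the same sequence lost msg4.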

22 Nov 2012

I have been kicking exactly this same problem about for the last three weeks. My hardware is a standard BT Home Hub and a Netgear GS116 port switch if this helps.

21 Dec 2012

Hi Sheldon, I have the same problem, but on my mbed I have a server running.

I tried this:

1) send(msg1); 2) receive(msg2); 3) send(msg3);

but my client only receives msg1 and finishes, and my server never receives msg2.

Do you think the wait_ms(500) will help?

Have you found any solution for this issue?

Greetings

08 Jul 2013

Did anyone find a solution for this? I have been experiencing the same problem: if I call send more than twice in a row, the other end only receives one of the messages, but if I put a delay of at least 60 ms between the sends, all the messages arrive fine.
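
The delay workaround reported in this thread can be wrapped in a small helper. A sketch in Python for illustration only; on the mbed the equivalent would be a wait between send_all() calls. send_paced is a hypothetical name, and the 60 ms default is just the figure quoted above, not a value known to be safe in general.

```python
import time

def send_paced(sock, messages, delay_s=0.06):
    """Send each message with sendall(), sleeping delay_s between sends.

    The pause is intended to let the peer ACK the previous segment
    before the next one is queued. Returns the total bytes sent.
    """
    total = 0
    for i, msg in enumerate(messages):
        if i:                          # no pause before the first message
            time.sleep(delay_s)
        sock.sendall(msg)
        total += len(msg)
    return total
```

A fixed delay is a workaround, not a fix: it hides the symptom and limits throughput to one message per delay interval.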