Discussion:
Help with TCP Server.
(too old to reply)
Michael Ober
2009-12-12 22:16:20 UTC
Permalink
The code below appears to work, but it eventually freezes and I have to
reboot the server. Killing and restarting the program itself doesn't work
as the listener socket is still bound. I have put debugging statements in
to verify that the ThreadPool threads are being released and the most thread
pool threads that this code used was 3, even under heavy load. The problem
appears to be when there are no clients for an extended period and then a
client attempts to connect. At that point, the application freezes. I have
left out the BankInfo object which contains the bank name, address, phone
number, and routing number as well as Library Routines such as WriteLog and
SMTPMail.<method>. The library routines have been in use in other
applications for several years now and are thoroughly debugged.

The server code itself is a MustInherit (virtual) class that the inheriter
must provide the business logic to handle the individual commands from the
client. The server and its associated client class handles the buffering,
blocking, and deblocking of inbound and outbound communications. This is a
Windows Forms application and the lbClients item is a standard ListBox.

''''''''' Start of code.

Option Compare Text
Option Strict On
Option Explicit On
Option Infer Off

Imports System.Net.Sockets
Imports System.Net
Imports System.Threading
Imports System.Text.ASCIIEncoding
Imports System.IO

Public Module Data
Public RoutingNumbers As New BankTable
Public UpdatedRoutingNumbers As New BankTable
End Module

Public Class BankTable
Inherits Dictionary(Of String, BankInfo)

End Class

Public Class frmBankRoutingNumbers
Public WithEvents Clients As ClientConnections

Private Sub frmBankRoutingNumbers_FormClosing(ByVal sender As Object,
ByVal e As System.Windows.Forms.FormClosingEventArgs) Handles Me.FormClosing
On Error Resume Next
Clients.Dispose()
End Sub

Private Sub frmBankRoutingNumbers_Load(ByVal sender As System.Object,
ByVal e As System.EventArgs) Handles MyBase.Load
Me.Text = AppName()
Me.Show()

Dim bi As BankInfo
Dim lastpercent As String = ""
Dim currentpercent As String = ""

''' Initialize server data structures clipped out.

WriteLog("Starting Listener")
Me.Text = AppName("Starting Listener")
Me.lbClients.Items.Clear()
Clients = New ClientConnections()
Me.Text = AppName("Listening")
End Sub

#Region "IP Server Connection Display Interface"
Delegate Sub AddClientCallBack(ByVal connection As
IPServer.ConnectionInformation)
Delegate Sub RemoveClientCallBack(ByVal connection As
IPServer.ConnectionInformation)
Private Sub AddClient(ByVal connection As
IPServer.ConnectionInformation)
On Error Resume Next
Me.lbClients.Items.Add(connection.ToString())
Me.Text = AppName(Me.lbClients.Items.Count)
End Sub
Private Sub RemoveClient(ByVal connection As
IPServer.ConnectionInformation)
On Error Resume Next
Me.lbClients.Items.Remove(connection.ToString())
Me.Text = AppName(Me.lbClients.Items.Count)
End Sub

Private Sub Clients_NewConnection(ByRef connection As
IPServer.ConnectionInformation, ByVal Count As Integer) Handles
Clients.NewConnection
Dim UpdateClientList As New AddClientCallBack(AddressOf AddClient)
Me.Invoke(UpdateClientList, connection)
End Sub
Private Sub Clients_ConnectionClosed(ByRef connection As
IPServer.ConnectionInformation, ByVal Count As Integer) Handles
Clients.ConnectionClosed
Dim UpdateClientList As New RemoveClientCallBack(AddressOf
RemoveClient)
Me.Invoke(UpdateClientList, connection)
End Sub
#End Region
End Class

Public Class ClientConnections
Inherits IPServer.IPSocketHandler

Private so As New ReaderWriterLockSlim

Public Sub New()
MyBase.New(OSInterface.iniWrapper.ReadInt("ABADatabase", "Port",
"wakefield.ini"), _
New TimeSpan(0,
OSInterface.iniWrapper.ReadInt("ABADatabase", "ClientTTL", "wakefield.ini",
15), 0))
End Sub

Public Overrides Function ProcessMessage(ByVal message As String) As
String
Try
Dim cmd As String = ""
Dim i As Integer = InStr(message, ":")
If i > 0 Then
cmd = Left$(message, i - 1)
message = Mid$(message, i + 1)
End If

Select Case
CType([Enum].Parse(GetType(BankInfo.ClientServerCommandStrings), cmd),
BankInfo.ClientServerCommandStrings)
Case BankInfo.ClientServerCommandStrings.GetRoutingNumber
Try
so.EnterReadLock()
Return RoutingNumbers(message).ToString()

Catch ex As Exception
Dim bi As New BankInfo(message)
bi.ExceptionMessage = "Unknown Routing Number"
Return bi.ToString()
End Try

Case BankInfo.ClientServerCommandStrings.UpdateBankInfo
Try
Dim bi As New BankInfo(New csvLine(message,
BankInfo.CSVHeader))
bi.HasBeenUpdated = True ' Just in case this is a
new bank

so.EnterWriteLock()
bi = RoutingNumbers.Add(bi)
If bi.HasBeenUpdated Then
UpdatedRoutingNumbers.Add(bi)

Using fOut As New StreamWriter(ABAUpdates, False,
System.Text.Encoding.ASCII)
fOut.WriteLine(BankInfo.strHeader)
For Each bi In From ChangedBankInfo As BankInfo
In UpdatedRoutingNumbers.Values Order By ChangedBankInfo.BankName
WriteLog(bi.ToString())
fOut.WriteLine(bi.ToString())
Next
End Using
Return bi.ToString()

Catch ex As Exception
WriteLog(ex)
Return ""
End Try

Case Else
Dim bi As New BankInfo()
bi.ExceptionMessage = "Unknown Request"
Return bi.ToString()
End Select

Catch ex As Exception
WriteLog(ex)
SMTPMail.SendMessage("mis", AppName() & " - Client Handler
Crash", ex.ToString())
Return ""

Finally
Do While so.IsReadLockHeld
so.ExitReadLock()
Loop
Do While so.IsWriteLockHeld
so.ExitWriteLock()
Loop
End Try
End Function

End Class

Namespace IPServer
Module ClientCollection
Public Clients As New ClientList
End Module
Public Class ClientList
Inherits SynchronizedCollection(Of ConnectionInformation)

Public Overrides Function ToString() As String
Dim aCI() As ConnectionInformation = MyBase.ToArray()
Dim s As String = "Connections: " & aCI.Count.ToString("#,##0")
Try
For Each ci As ConnectionInformation In aCI
s &= vbNewLine & ci.ToString()
Next
Catch ex As Exception
End Try
Return s
End Function
End Class

Public Class ConnectionInformation
Implements IDisposable

Public MessageIn As String = ""

' Who's talking to me?
Private _ClientName As String = ""
Private _ClientPort As Integer = 0

' Private structures and storage for sending and receiving data
Private _sock As Socket = Nothing
Private Const _bufSize As Integer = 1500
Private _bufIn(_bufSize) As Byte
Private _ReceiveResults As IAsyncResult = Nothing
Private _SendResults As IAsyncResult = Nothing
Private _bufOut() As Byte

' TimeOut support; Async sockets don't directly support timeouts
Private _InactivityTimer As Timer = Nothing
Private _Activity As Boolean = True ' All socket activity must
set this value to True

Public Sub New(ByVal sock As Socket, ByVal SocketTimeout As
TimeSpan)
Me.New(SocketTimeout)
Me.sock = sock
End Sub
Public Sub New(ByVal SocketTimeout As TimeSpan)
If SocketTimeout <> Nothing Then _InactivityTimer = New
Timer(AddressOf TimeOutHandler, Me, SocketTimeout, SocketTimeout)
Clients.Add(Me)
End Sub

Private Sub TimeOutHandler(ByVal objConnectionInformation As Object)
Dim ci As ConnectionInformation =
CType(objConnectionInformation, ConnectionInformation)
If ci._Activity Then
ci._Activity = False
Else
WriteLog(ci.ToString() & " Socket Timeout")
ci.Close()
End If
End Sub

Public Function BeginReceive(ByVal callback As AsyncCallback) As
IAsyncResult
_ReceiveResults = _sock.BeginReceive(_bufIn, 0, _bufSize,
SocketFlags.None, callback, Me)
_Activity = True
Return _ReceiveResults
End Function
Public Function EndReceive(ByVal asyncResult As IAsyncResult) As
Integer
Dim ci As ConnectionInformation = CType(asyncResult.AsyncState,
ConnectionInformation)
Try
If ci._sock Is Nothing Then Return 0

ci._Activity = True
Dim bytesReceived As Integer =
ci._sock.EndReceive(asyncResult)
ci.MessageIn &= ASCII.GetString(ci._bufIn, 0, bytesReceived)
Return bytesReceived

Catch exSock As SocketException
Dim sockErr As SocketError = exSock.SocketErrorCode
Dim msg As String = [Enum].GetName(GetType(SocketError),
sockErr)
WriteLog("Socket Exception: " & msg, exSock)
Return 0
Catch ex As Exception
WriteLog(ex)
Return 0
Finally
ci._Activity = True
End Try

Return 0 ' Forces the socket to close elsewhere in code
End Function

Public Sub Send(ByVal MessageOut As String)
Try
_Activity = True
Do Until _SendResults Is Nothing OrElse _bufOut.Length = 0
_SendResults.AsyncWaitHandle.WaitOne()
Loop

_bufOut = System.Text.Encoding.ASCII.GetBytes(MessageOut)
_SendResults = _sock.BeginSend(_bufOut, 0, _bufOut.Length,
SocketFlags.None, AddressOf OnSend, Me)

Catch ex As Exception
WriteLog(ex)
Finally
_Activity = True
End Try
End Sub

Private Sub OnSend(ByVal ar As IAsyncResult)
Dim ci As ConnectionInformation = CType(ar.AsyncState,
ConnectionInformation)
ci._Activity = True
Try
Dim bytesSent As Integer = ci._sock.EndSend(ar)

' Did we send the entire buffer?
If bytesSent >= _bufOut.Length Then
Array.Resize(_bufOut, 0)

Else
' No; reset the buffer to contain only the bytes needed
to be sent
Dim tBuf(_bufOut.Length - bytesSent - 1) As Byte
Array.Copy(_bufOut, bytesSent, tBuf, 0, tBuf.Length)
_bufOut = tBuf
' Send them
_SendResults = ci._sock.BeginSend(_bufOut, 0,
_bufOut.Length, SocketFlags.None, AddressOf OnSend, ci)
End If
ci._Activity = True
Catch ex As Exception
WriteLog(ci.ToString() & vbNewLine & ex.Message, True)
Finally
ci._Activity = True
End Try
End Sub

Public WriteOnly Property sock() As Socket
Set(ByVal value As Socket)
_sock = value

If _sock Is Nothing Then
_ClientName = ""
_ClientPort = 0
_Activity = False
Else
Dim ClientEndPoint As IPEndPoint =
CType(_sock.RemoteEndPoint, IPEndPoint)
_ClientName =
Dns.GetHostEntry(ClientEndPoint.Address.ToString()).HostName
_ClientPort = ClientEndPoint.Port
_Activity = True
End If
End Set
End Property

Public ReadOnly Property ClientName() As String
Get
Return _ClientName
End Get
End Property
Public ReadOnly Property ClientPort() As Integer
Get
Return _ClientPort
End Get
End Property

Public Shadows Function ToString() As String
Return _ClientName & ":" & _ClientPort.ToString()
End Function

Public ReadOnly Property ClientCount() As Integer
Get
Return Clients.Count
End Get
End Property

Public Function Close() As Integer
On Error Resume Next
_InactivityTimer.Dispose()
WriteLog("Closing Client Connection: " & Me.ToString())
If _sock IsNot Nothing Then
_sock.Shutdown(SocketShutdown.Both)
_sock.Close()
_sock = Nothing
End If
Clients.Remove(Me)
Return Clients.Count
End Function

#Region " IDisposable Support "
' This code added by Visual Basic to correctly implement the
disposable pattern.
Protected Overridable Sub Dispose(ByVal disposing As Boolean)
Close()
End Sub

Public Sub Dispose() Implements IDisposable.Dispose
' Do not change this code. Put cleanup code in Dispose(ByVal
disposing As Boolean) above.
Dispose(True)
GC.SuppressFinalize(Me)
End Sub
#End Region

End Class

Public MustInherit Class IPSocketHandler
Implements IDisposable

Public Event NewConnection(ByRef connection As
ConnectionInformation, ByVal Count As Integer)
Public Event ConnectionClosed(ByRef connection As
ConnectionInformation, ByVal Count As Integer)
Public MustOverride Function ProcessMessage(ByVal message As String)
As String

Private _tcpServer As Socket = Nothing
Private ReadOnly _CommandTermination As String = BEL
Protected ReadOnly _SocketTimeout As TimeSpan

Public Sub New(ByVal Port As Integer)
Me.New(Port, BEL, Nothing)
End Sub
Public Sub New(ByVal Port As Integer, ByVal SocketTimeout As
TimeSpan)
Me.New(Port, BEL, SocketTimeout)
End Sub
Public Sub New(ByVal Port As Integer, ByVal CommandTermination As
String)
Me.New(Port, CommandTermination, Nothing)
End Sub
Public Sub New(ByVal Port As Integer, ByVal CommandTermination As
String, ByVal SocketTimeout As TimeSpan)
' save the termination and TTL before doing anything else to
avoid threading issues
_CommandTermination = CommandTermination
_SocketTimeout = SocketTimeout

Try
Dim myAddr As IPAddress =
Dns.GetHostEntry(My.Computer.Name).AddressList(0)
Dim sEndPoint As String = My.Computer.Name & "(" &
myAddr.ToString() & "):" & Port.ToString()
WriteLog("Configure Listener on " & sEndPoint)

_tcpServer = New Socket(AddressFamily.InterNetwork,
SocketType.Stream, ProtocolType.Tcp)
Dim LocalEP As IPEndPoint = New IPEndPoint(myAddr, Port)
_tcpServer.Bind(LocalEP)
_tcpServer.Listen(5) ' 5 is the standard backlog
value

_tcpServer.BeginAccept(AddressOf OnAccept, Nothing)
WriteLog("Listening for connections on " & sEndPoint)
Catch ex As Exception
WriteLog(ex)
SMTPMail.SendMessage("mis", AppName() & ": Unable to create
Listener", ex.Message())
Throw ex
End Try
End Sub

'Handle connection requests
Private Sub OnAccept(ByVal ar As System.IAsyncResult)
Dim ci As New ConnectionInformation(_SocketTimeout)
Try
ci.sock = _tcpServer.EndAccept(ar)

' Start listening for the next connection
_tcpServer.BeginAccept(AddressOf OnAccept, Nothing)

' Wait for data
ci.BeginReceive(AddressOf OnReceive)
RaiseEvent NewConnection(ci, ci.ClientCount)

Catch ex As Exception
WriteLog(ci.ToString(), ex)
RaiseEvent ConnectionClosed(ci, ci.Close())
Finally
WriteLog(Clients.ToString(), True)
End Try
End Sub

Private Sub OnReceive(ByVal ar As IAsyncResult)
Dim ci As ConnectionInformation = CType(ar.AsyncState,
ConnectionInformation)
Try
Dim bytesReceived As Integer = ci.EndReceive(ar)

Select Case bytesReceived
Case 0
RaiseEvent ConnectionClosed(ci, ci.Close())
WriteLog(Clients.ToString())

Case Else
Dim i As Integer = InStr(ci.MessageIn,
_CommandTermination)
Do While i > 0
' Process msg
Dim msg As String = Left$(ci.MessageIn, i - 1)
WriteLog(ci.ToString() & " => " & msg)
Dim msgOut As String = ProcessMessage(msg)
If msgOut <> "" Then
ci.Send(msgOut & _CommandTermination)
WriteLog(ci.ToString() & " <= " & msgOut)
End If

ci.MessageIn = Mid$(ci.MessageIn, i +
_CommandTermination.Length)
i = InStr(ci.MessageIn, _CommandTermination)
Loop

' Finally, read the next chunk of data
ci.BeginReceive(AddressOf OnReceive)
End Select

Catch ex As Exception
WriteLog(ex)
RaiseEvent ConnectionClosed(ci, ci.Close())
WriteLog(Clients.ToString())
End Try
End Sub

Public Sub Close()
On Error Resume Next
WriteLog("Server Shutdown Starting")

' Close the listener
WriteLog("Listener being closed")
_tcpServer.Close()
_tcpServer = Nothing

' Stop the clients; Don't use "for each" as closing a connection
removes it from clients
' Closing a client connection has the side effect of removing
the connection from the clients collection
WriteLog("Closing Clients: " & Clients.ToString())
Do While Clients.Count > 0
Clients(0).Close()
Loop
End Sub

#Region " IDisposable Support "
' This code added by Visual Basic to correctly implement the
disposable pattern.

Protected Overridable Sub Dispose(ByVal disposing As Boolean)
Close()
End Sub

Public Sub Dispose() Implements IDisposable.Dispose
' Do not change this code. Put cleanup code in Dispose(ByVal
disposing As Boolean) above.
Dispose(True)
GC.SuppressFinalize(Me)
End Sub
#End Region

End Class
End Namespace

'''' I added the following, but these events don't appear to getting
executed.

Namespace My

' The following events are available for MyApplication:
'
' Startup: Raised when the application starts, before the startup form
is created.
' Shutdown: Raised after all application forms are closed. This event
is not raised if the application terminates abnormally.
' UnhandledException: Raised if the application encounters an unhandled
exception.
' StartupNextInstance: Raised when launching a single-instance
application and the application is already active.
' NetworkAvailabilityChanged: Raised when the network connection is
connected or disconnected.
Partial Friend Class MyApplication

Private Sub MyApplication_Shutdown(ByVal sender As Object, ByVal e
As System.EventArgs) Handles Me.Shutdown
On Error Resume Next
frmBankRoutingNumbers.Clients.Dispose()
End Sub

Private Sub MyApplication_UnhandledException(ByVal sender As Object,
ByVal e As
Microsoft.VisualBasic.ApplicationServices.UnhandledExceptionEventArgs)
Handles Me.UnhandledException
On Error Resume Next
SMTPMail.SendMessage("mis", My.Application.Info.AssemblyName &
" - CRASH", e.ToString())
frmBankRoutingNumbers.Clients.Dispose()
logs.WriteLog(e.ToString())
End Sub
End Class

End Namespace
Armin Zingler
2009-12-13 00:35:17 UTC
Permalink
The code below appears to work, but it eventually freezes [...]
I am not able to analyze the whole code, therefore I give only a general
answer. If an application freezes for an unknown reason, I use to press Ctrl+Break
and look at the callstacks of all threads to see what it's currently doing.
Have you tried it already? Dead locks can be one reason.
--
Armin
Peter Duniho
2009-12-13 01:07:48 UTC
Permalink
Post by Michael Ober
The code below appears to work, but it eventually freezes and I have to
reboot the server. Killing and restarting the program itself doesn't
work as the listener socket is still bound.
For what it's worth, if you wait for the TCP CLOSE_WAIT to finish,
you'll be able to restart the server. While it's possible to do so, you
don't want to bypass the CLOSE_WAIT, because there is theoretically the
possibility of network messages still in transit that, if received after
you've restarted the server, could corrupt the state of your server.
Post by Michael Ober
I have put debugging
statements in to verify that the ThreadPool threads are being released
and the most thread pool threads that this code used was 3, even under
heavy load. The problem appears to be when there are no clients for an
extended period and then a client attempts to connect. At that point,
the application freezes.
When you break in the debugger, where are all the threads? What are
they doing?
Post by Michael Ober
I have left out the BankInfo object which
contains the bank name, address, phone number, and routing number as
well as Library Routines such as WriteLog and SMTPMail.<method>. The
library routines have been in use in other applications for several
years now and are thoroughly debugged.
Unfortunately, your code example is far from concise, nor is it complete
(in particular, lacking the entire other half of the network
connection). Without any way for anyone else to run your program and
see it fail, there's no way to know for sure what's wrong.

That said, here are some highlights of what appear to me to be problem
areas. Some may even be related to the problem you're seeing:

-- Use of "On Error Resume Next". Don't do this. You are
basically just ignoring exceptions. Bad. Even in VB.NET, it's bad.

-- Use of ReaderWriterLockSlim.IsReadLockHeld and .IsWriteLockHeld.
It's bad enough that, contrary to the documentation's instructions,
you are using the property to control program flow. But to do it in a loop?

-- Unsynchronized access to the _ReceiveResults and _SendResults
members. You should at a minimum be using Thread.VolatileRead() and
.VolatileWrite(). Even better would be to just use SyncLock or similar.

-- That said, the mere presence of _ReceiveResults and _SendResults
is a red flag to me. It's possible that with a purely transactional
connection, this could work, and maybe that's what you're dealing with.
But it's bad form in any case, and if you have _any_ possibility of
overlapping send or receive operations, you've got big trouble. This
design is a maintenance problem, and potentially a serious one.

-- The code potentially raises a ConnectionClosed event for a
connection for which a NewConnection event was never raised. If nothing
else, this is IMHO poor design. If you fix the blank-check error
handling, it will likely be a real problem.

There may be other problems I didn't notice. Between the fact that I
hardly ever use VB.NET and the fact that the code example is not a
concise-but-complete one, I admit there may be things I overlooked.
But, at the very least, there are some things in the code that I
consider to be serious problems, and it's possible that fixing those
aspects of the code will make the problem go away (I'm particularly
concerned about the lack of synchronized access to fields that are used
to coordinate between threads and operations).

Pete
Michael Ober
2009-12-13 03:07:14 UTC
Permalink
Post by Michael Ober
The code below appears to work, but it eventually freezes and I have to
reboot the server. Killing and restarting the program itself doesn't
work as the listener socket is still bound.
For what it's worth, if you wait for the TCP CLOSE_WAIT to finish, you'll
be able to restart the server. While it's possible to do so, you don't
want to bypass the CLOSE_WAIT, because there is theoretically the
possibility of network messages still in transit that, if received after
you've restarted the server, could corrupt the state of your server.
The IPServer class is actually used in two different server applications.
In this particular instance, which runs on Windows Server 2003 R2, I can
indeed wait for the TCP CLOSE_WAIT - it's faster to reboot. In the other
application, which runs on Windows XP, it appears that XP never closes the
listening socket. I suspect the XP behavior has more to do with another
server that must run on that particular box for which I have no source code
than it does with XP itself.
Post by Michael Ober
I have put debugging statements in to verify that the ThreadPool threads
are being released and the most thread pool threads that this code used
was 3, even under heavy load. The problem appears to be when there are
no clients for an extended period and then a client attempts to connect.
At that point, the application freezes.
When you break in the debugger, where are all the threads? What are they
doing?
Monday I'll modify our network to use my workstation as the server for this
particular application and I'll start a VS 2008 session. On Tuesday I'll be
able to see what happens. In the meantime I'm hoping for feedback on the
code to try to correct the problems it has. What's really strange is that
this application had been running fine for several months until I had to
rebuild the server after a disk crash.
Post by Michael Ober
I have left out the BankInfo object which contains the bank name,
address, phone number, and routing number as well as Library Routines
such as WriteLog and SMTPMail.<method>. The library routines have been
in use in other applications for several years now and are thoroughly
debugged.
Unfortunately, your code example is far from concise, nor is it complete
(in particular, lacking the entire other half of the network connection).
Without any way for anyone else to run your program and see it fail,
there's no way to know for sure what's wrong.
The other end is a currently a standard TCP client. It sends a request and
waits for the response. The client applications do have to be coded so
that if the server timeout expires they detect the dropped connection and
reconnect if they need to send again.
That said, here are some highlights of what appear to me to be problem
-- Use of "On Error Resume Next". Don't do this. You are basically
just ignoring exceptions. Bad. Even in VB.NET, it's bad.
I only use this where I don't care about the error but do want each
instruction to attempt execution. Is there a clean way of handling this
with try ... catch ... end try? In this particular case, removing the On
Error Resume Next statement was easily handled by checking for Nothing
(null) and collection.Contains() before doing the work.
-- Use of ReaderWriterLockSlim.IsReadLockHeld and .IsWriteLockHeld.
It's bad enough that, contrary to the documentation's instructions, you
are using the property to control program flow. But to do it in a loop?
After static analisys of the code, a finally clause on each of the two try
blocks was sufficient to handle this. Since OnReceive would be called in
different threads and not be reentrant, the loops can safely be removed.
The purpose of this was to ensure I didn't have a ReaderWriterLockSlim
causing a connection to stall.
-- Unsynchronized access to the _ReceiveResults and _SendResults
members. You should at a minimum be using Thread.VolatileRead() and
.VolatileWrite(). Even better would be to just use SyncLock or similar.
-- That said, the mere presence of _ReceiveResults and _SendResults is
a red flag to me. It's possible that with a purely transactional
connection, this could work, and maybe that's what you're dealing with.
But it's bad form in any case, and if you have _any_ possibility of
overlapping send or receive operations, you've got big trouble. This
design is a maintenance problem, and potentially a serious one.
Do you have a suggestion for a better design? I'm trying to keep this as
asynchronous as possible. The design concept was that if the routine
MustOverride routine has to do a lot of work and send back large numbers of
results, it can call the underlying send method multiple times before
returning, although looking at the code, I don't provide that particular
routine the ability to access the Send method.

Like I said in my original post, this program seems to lock up when there
are long periods (measured in hours) of no clients. If access to these two
variables was a problem, I would expect problems to show up during heavy
usage, not during idle periods. This server class is also used in another
application running on an XP SP3 system that is under significantly heavier
load with more multithreading going on than this particular application.
-- The code potentially raises a ConnectionClosed event for a
connection for which a NewConnection event was never raised. If nothing
else, this is IMHO poor design. If you fix the blank-check error
handling, it will likely be a real problem.
In my logs, I have never seen the exception handler for the OnAccept(ar as
IAsyncResults) subroutine called. I understand your comment, however. The
RaiseEvent ConnectionClosed event was actually added as part of the
troubleshooting so removing it isn't an issue.
There may be other problems I didn't notice. Between the fact that I
hardly ever use VB.NET and the fact that the code example is not a
concise-but-complete one, I admit there may be things I overlooked. But,
at the very least, there are some things in the code that I consider to be
serious problems, and it's possible that fixing those aspects of the code
will make the problem go away (I'm particularly concerned about the lack
of synchronized access to fields that are used to coordinate between
threads and operations).
Pete
Pete, I definitely thank you for the feedback.

Mike.
Tom Shelton
2009-12-13 03:20:46 UTC
Permalink
Post by Michael Ober
Post by Michael Ober
The code below appears to work, but it eventually freezes and I have to
reboot the server. Killing and restarting the program itself doesn't
work as the listener socket is still bound.
For what it's worth, if you wait for the TCP CLOSE_WAIT to finish, you'll
be able to restart the server. While it's possible to do so, you don't
want to bypass the CLOSE_WAIT, because there is theoretically the
possibility of network messages still in transit that, if received after
you've restarted the server, could corrupt the state of your server.
The IPServer class is actually used in two different server applications.
In this particular instance, which runs on Windows Server 2003 R2, I can
indeed wait for the TCP CLOSE_WAIT - it's faster to reboot. In the other
application, which runs on Windows XP, it appears that XP never closes the
listening socket. I suspect the XP behavior has more to do with another
server that must run on that particular box for which I have no source code
than it does with XP itself.
Post by Michael Ober
I have put debugging statements in to verify that the ThreadPool threads
are being released and the most thread pool threads that this code used
was 3, even under heavy load. The problem appears to be when there are
no clients for an extended period and then a client attempts to connect.
At that point, the application freezes.
When you break in the debugger, where are all the threads? What are they
doing?
Monday I'll modify our network to use my workstation as the server for this
particular application and I'll start a VS 2008 session. On Tuesday I'll be
able to see what happens. In the meantime I'm hoping for feedback on the
code to try to correct the problems it has. What's really strange is that
this application had been running fine for several months until I had to
rebuild the server after a disk crash.
Post by Michael Ober
I have left out the BankInfo object which contains the bank name,
address, phone number, and routing number as well as Library Routines
such as WriteLog and SMTPMail.<method>. The library routines have been
in use in other applications for several years now and are thoroughly
debugged.
Unfortunately, your code example is far from concise, nor is it complete
(in particular, lacking the entire other half of the network connection).
Without any way for anyone else to run your program and see it fail,
there's no way to know for sure what's wrong.
The other end is a currently a standard TCP client. It sends a request and
waits for the response. The client applications do have to be coded so
that if the server timeout expires they detect the dropped connection and
reconnect if they need to send again.
That said, here are some highlights of what appear to me to be problem
-- Use of "On Error Resume Next". Don't do this. You are basically
just ignoring exceptions. Bad. Even in VB.NET, it's bad.
I only use this where I don't care about the error but do want each
instruction to attempt execution. Is there a clean way of handling this
with try ... catch ... end try? In this particular case, removing the On
Error Resume Next statement was easily handled by checking for Nothing
(null) and collection.Contains() before doing the work.
-- Use of ReaderWriterLockSlim.IsReadLockHeld and .IsWriteLockHeld.
It's bad enough that, contrary to the documentation's instructions, you
are using the property to control program flow. But to do it in a loop?
After static analisys of the code, a finally clause on each of the two try
blocks was sufficient to handle this. Since OnReceive would be called in
different threads and not be reentrant, the loops can safely be removed.
The purpose of this was to ensure I didn't have a ReaderWriterLockSlim
causing a connection to stall.
-- Unsynchronized access to the _ReceiveResults and _SendResults
members. You should at a minimum be using Thread.VolatileRead() and
.VolatileWrite(). Even better would be to just use SyncLock or similar.
-- That said, the mere presence of _ReceiveResults and _SendResults is
a red flag to me. It's possible that with a purely transactional
connection, this could work, and maybe that's what you're dealing with.
But it's bad form in any case, and if you have _any_ possibility of
overlapping send or receive operations, you've got big trouble. This
design is a maintenance problem, and potentially a serious one.
Do you have a suggestion for a better design? I'm trying to keep this as
asynchronous as possible. The design concept was that if the routine
MustOverride routine has to do a lot of work and send back large numbers of
results, it can call the underlying send method multiple times before
returning, although looking at the code, I don't provide that particular
routine the ability to access the Send method.
Like I said in my original post, this program seems to lock up when there
are long periods (measured in hours) of no clients. If access to these two
variables was a problem, I would expect problems to show up during heavy
usage, not during idle periods. This server class is also used in another
application running on an XP SP3 system that is under significantly heavier
load with more multithreading going on than this particular application.
-- The code potentially raises a ConnectionClosed event for a
connection for which a NewConnection event was never raised. If nothing
else, this is IMHO poor design. If you fix the blank-check error
handling, it will likely be a real problem.
In my logs, I have never seen the exception handler for the OnAccept(ar as
IAsyncResults) subroutine called. I understand your comment, however. The
RaiseEvent ConnectionClosed event was actually added as part of the
troubleshooting so removing it isn't an issue.
There may be other problems I didn't notice. Between the fact that I
hardly ever use VB.NET and the fact that the code example is not a
concise-but-complete one, I admit there may be things I overlooked. But,
at the very least, there are some things in the code that I consider to be
serious problems, and it's possible that fixing those aspects of the code
will make the problem go away (I'm particularly concerned about the lack
of synchronized access to fields that are used to coordinate between
threads and operations).
Pete
Pete, I definitely thank you for the feedback.
Mike.
Mike... You mention that it seems to happen after long periods of
inactivity. I remember at a former employer having a similar issue with a
server - it had to do with the network card drivers power settings.
Basically, the card was being shutdown and not waking up properly when a
connection was made.... Just a thought :)
--
Tom Shelton
Michael Ober
2009-12-13 04:22:24 UTC
Permalink
Post by Tom Shelton
Post by Michael Ober
Post by Michael Ober
The code below appears to work, but it eventually freezes and I have to
reboot the server. Killing and restarting the program itself doesn't
work as the listener socket is still bound.
For what it's worth, if you wait for the TCP CLOSE_WAIT to finish, you'll
be able to restart the server. While it's possible to do so, you don't
want to bypass the CLOSE_WAIT, because there is theoretically the
possibility of network messages still in transit that, if received after
you've restarted the server, could corrupt the state of your server.
The IPServer class is actually used in two different server applications.
In this particular instance, which runs on Windows Server 2003 R2, I can
indeed wait for the TCP CLOSE_WAIT - it's faster to reboot. In the other
application, which runs on Windows XP, it appears that XP never closes the
listening socket. I suspect the XP behavior has more to do with another
server that must run on that particular box for which I have no source code
than it does with XP itself.
Post by Michael Ober
I have put debugging statements in to verify that the ThreadPool threads
are being released and the most thread pool threads that this code used
was 3, even under heavy load. The problem appears to be when there are
no clients for an extended period and then a client attempts to connect.
At that point, the application freezes.
When you break in the debugger, where are all the threads? What are they
doing?
Monday I'll modify our network to use my workstation as the server for this
particular application and I'll start a VS 2008 session. On Tuesday I'll be
able to see what happens. In the meantime I'm hoping for feedback on the
code to try to correct the problems it has. What's really strange is that
this application had been running fine for several months until I had to
rebuild the server after a disk crash.
Post by Michael Ober
I have left out the BankInfo object which contains the bank name,
address, phone number, and routing number as well as Library Routines
such as WriteLog and SMTPMail.<method>. The library routines have been
in use in other applications for several years now and are thoroughly
debugged.
Unfortunately, your code example is far from concise, nor is it complete
(in particular, lacking the entire other half of the network
connection).
Without any way for anyone else to run your program and see it fail,
there's no way to know for sure what's wrong.
The other end is a currently a standard TCP client. It sends a request and
waits for the response. The client applications do have to be coded so
that if the server timeout expires they detect the dropped connection and
reconnect if they need to send again.
That said, here are some highlights of what appear to me to be problem
-- Use of "On Error Resume Next". Don't do this. You are basically
just ignoring exceptions. Bad. Even in VB.NET, it's bad.
I only use this where I don't care about the error but do want each
instruction to attempt execution. Is there a clean way of handling this
with try ... catch ... end try? In this particular case, removing the On
Error Resume Next statement was easily handled by checking for Nothing
(null) and collection.Contains() before doing the work.
-- Use of ReaderWriterLockSlim.IsReadLockHeld and .IsWriteLockHeld.
It's bad enough that, contrary to the documentation's instructions, you
are using the property to control program flow. But to do it in a loop?
After static analisys of the code, a finally clause on each of the two try
blocks was sufficient to handle this. Since OnReceive would be called in
different threads and not be reentrant, the loops can safely be removed.
The purpose of this was to ensure I didn't have a ReaderWriterLockSlim
causing a connection to stall.
-- Unsynchronized access to the _ReceiveResults and _SendResults
members. You should at a minimum be using Thread.VolatileRead() and
.VolatileWrite(). Even better would be to just use SyncLock or similar.
-- That said, the mere presence of _ReceiveResults and _SendResults is
a red flag to me. It's possible that with a purely transactional
connection, this could work, and maybe that's what you're dealing with.
But it's bad form in any case, and if you have _any_ possibility of
overlapping send or receive operations, you've got big trouble. This
design is a maintenance problem, and potentially a serious one.
Do you have a suggestion for a better design? I'm trying to keep this as
asynchronous as possible. The design concept was that if the routine
MustOverride routine has to do a lot of work and send back large numbers of
results, it can call the underlying send method multiple times before
returning, although looking at the code, I don't provide that particular
routine the ability to access the Send method.
Like I said in my original post, this program seems to lock up when there
are long periods (measured in hours) of no clients. If access to these two
variables was a problem, I would expect problems to show up during heavy
usage, not during idle periods. This server class is also used in another
application running on an XP SP3 system that is under significantly heavier
load with more multithreading going on than this particular application.
-- The code potentially raises a ConnectionClosed event for a
connection for which a NewConnection event was never raised. If nothing
else, this is IMHO poor design. If you fix the blank-check error
handling, it will likely be a real problem.
In my logs, I have never seen the exception handler for the OnAccept(ar as
IAsyncResults) subroutine called. I understand your comment, however.
The
RaiseEvent ConnectionClosed event was actually added as part of the
troubleshooting so removing it isn't an issue.
There may be other problems I didn't notice. Between the fact that I
hardly ever use VB.NET and the fact that the code example is not a
concise-but-complete one, I admit there may be things I overlooked. But,
at the very least, there are some things in the code that I consider to be
serious problems, and it's possible that fixing those aspects of the code
will make the problem go away (I'm particularly concerned about the lack
of synchronized access to fields that are used to coordinate between
threads and operations).
Pete
Pete, I definitely thank you for the feedback.
Mike.
Mike... You mention that it seems to happen after long periods of
inactivity. I remember at a former employer having a similar issue with a
server - it had to do with the network card drivers power settings.
Basically, the card was being shutdown and not waking up properly when a
connection was made.... Just a thought :)
--
Tom Shelton
It's a VMWare guest. I just checked the guest OS from home and I'll check
the host on Monday. Thanks for the idea.

Mike.
Michael Ober
2009-12-13 05:20:08 UTC
Permalink
Let me clarify - in one post I said it was a disk crash and here I say it's
a VMWare guest. Both statements are true. We had a series of power hits
close enough together that our UPS systems couldn't get clean shutdowns on
the final power hit. The result was that the host file system was corrupted
which led to the VMWare files that comprise the guest OS filesystem being
damaged. The end result was that we lost two of our virtual servers to what
were essentially disk crashes. I had to rebuild and then restore services
from tape both servers. Fortunately the actual disks weren't physically
damaged - just the logical file system on the disk was damaged. Needless to
say it wasn't a fun weekend to recover. The application I posted the code
from has been running since last winter with no problems. I recompiled and
released and have been having problems since then.
Post by Michael Ober
It's a VMWare guest. I just checked the guest OS from home and I'll check
the host on Monday. Thanks for the idea.
Mike.
Peter Duniho
2009-12-13 06:54:46 UTC
Permalink
[...] The application I posted the code from has been running
since last winter with no problems. I recompiled and released and have
been having problems since then.
What happens when you go back to the previous version of the
application? What changed? If nothing, why recompile at all?

If the previous version, which was working fine for nearly a year,
exhibits the same failures as the new version, perhaps you simply need
to do a fresh install of the OS and VM guest.

I have one other quick comment in reply to your other post...I'll post a
separate message.

Pete
Peter Duniho
2009-12-13 07:32:06 UTC
Permalink
Post by Michael Ober
[...]
The IPServer class is actually used in two different server
applications. In this particular instance, which runs on Windows Server
2003 R2, I can indeed wait for the TCP CLOSE_WAIT - it's faster to
reboot.
Note: rebooting is bypassing the CLOSE_WAIT. The same issues exist when
you do that.
Post by Michael Ober
In the other application, which runs on Windows XP, it appears
that XP never closes the listening socket. I suspect the XP behavior
has more to do with another server that must run on that particular box
for which I have no source code than it does with XP itself.
Not sure what "the other application" is. But it is strange if a socket
remains in CLOSE_WAIT indefinitely, for an application that has crashed
(CLOSE_WAIT for a socket owned by a process that hasn't exited is
another matter entirely).
Post by Michael Ober
[...]
The other end is a currently a standard TCP client. It sends a request
and waits for the response. The client applications do have to be
coded so that if the server timeout expires they detect the dropped
connection and reconnect if they need to send again.
A code example that can be used to reproduce the problem _must_ include
the client side. Saying it's "a standard TCP client" doesn't provide
the code needed to exercise the server. In addition, with any networked
application, it is possible that a problem is a consequence of the
interaction between client and server. There needs to be code for the
client side that is known to reproduce the same problem when used with
the server side.
Post by Michael Ober
Post by Peter Duniho
That said, here are some highlights of what appear to me to be problem
-- Use of "On Error Resume Next". Don't do this. You are
basically just ignoring exceptions. Bad. Even in VB.NET, it's bad.
I only use this where I don't care about the error but do want each
instruction to attempt execution. Is there a clean way of handling this
with try ... catch ... end try?
You should only catch errors that you know you can do something with.
Sometimes, at the very highest level in your code, it may make sense to
catch any error and report it to the user, explaining that the operation
couldn't complete.

Even in that case, you need to ensure at all points in the call stack
that you haven't left any indeterminate state, and of course at the
lower levels in your code (deeper in the call stack), it is appropriate
to catch and handle exceptions only if there is a known reasonable,
correct algorithm for safely handling the exception and proceeding
without propagating the exception further up.

Catching and ignoring errors can leave your code open to any number of
corrupted data scenarios, because you obviously can't anticipate every
single error that might occur. And an error you haven't anticipated is
best handled by either not handling it at all (letting your program
crash, however painful that may be), or at least handling it only at the
top-most level and discarding any work that might have taken place.
Post by Michael Ober
[...]
Post by Peter Duniho
-- Use of ReaderWriterLockSlim.IsReadLockHeld and
.IsWriteLockHeld. It's bad enough that, contrary to the
documentation's instructions, you are using the property to control
program flow. But to do it in a loop?
After static analisys of the code, a finally clause on each of the two
try blocks was sufficient to handle this. Since OnReceive would be
called in different threads and not be reentrant, the loops can safely
be removed. The purpose of this was to ensure I didn't have a
ReaderWriterLockSlim causing a connection to stall.
I inferred the purpose. But, if there's a perceived need to write the
loop, that implies some re-entrancy. And if there's some re-entrancy,
then having one level in your call stack remove _all_ the lock
acquisitions that thread obtained means that the higher levels in your
call stack no longer own the lock and are no longer protected.

Of course, in your case you have asserted there is no re-entrancy, and I
believe that is the case (based on the code posted), but that obviously
means that the loop is not helpful even in a broken way.

As you have probably done now, the proper way is to acquire the lock,
then release it explicitly. There should never be any need to check to
see whether you have the lock. The flow of your code should provide
that guarantee. At worst, you may have a local variable that records
whether you've successfully acquired the lock, and which is checked
later in the method before releasing.
Post by Michael Ober
Post by Peter Duniho
-- Unsynchronized access to the _ReceiveResults and _SendResults
members. You should at a minimum be using Thread.VolatileRead() and
.VolatileWrite(). Even better would be to just use SyncLock or similar.
-- That said, the mere presence of _ReceiveResults and
_SendResults is a red flag to me. It's possible that with a purely
transactional connection, this could work, and maybe that's what
you're dealing with. But it's bad form in any case, and if you have
_any_ possibility of overlapping send or receive operations, you've
got big trouble. This design is a maintenance problem, and
potentially a serious one.
Do you have a suggestion for a better design?
Sure. Normally, asynchronous code just keeps the state information with
the operation. That is, each operation creates its own data structure
related to the operation. Typically, you only ever see this data
structure as something passed around (e.g. to a callback), and of course
that's the kind of data structure you have here. Your callback is
passed the data structure; there should be no need to store it within
the class itself.
Post by Michael Ober
I'm trying to keep this
as asynchronous as possible. The design concept was that if the routine
MustOverride routine has to do a lot of work and send back large numbers
of results, it can call the underlying send method multiple times before
returning, although looking at the code, I don't provide that particular
routine the ability to access the Send method.
Like I said in my original post, this program seems to lock up when
there are long periods (measured in hours) of no clients. If access to
these two variables was a problem, I would expect problems to show up
during heavy usage, not during idle periods. This server class is also
used in another application running on an XP SP3 system that is under
significantly heavier load with more multithreading going on than this
particular application.
Based on the code you posted, one possible scenario for the application
getting stuck is that the variable containing the IAsyncResult for the
send is somehow optimized such that the assignment is never seen by the
thread waiting for it to become null.

Upon closer examination, it turns out that the variable that's
potentially the problem here is actually the _bufOut variable, not the
results variables (those are problematic for other reasons, but I don't
think they would lead to the code getting stuck). With no
synchronization for the array, you may set its length to 0 in one
thread, but have another thread never see the length as 0, and so get
stuck waiting for it to be 0.

Is this actually your problem? I don't know. But I do know that you've
got asynchronous code, where different threads may be sharing a
variable, and sharing a variable between threads without any form of
synchronization can lead to incorrect behavior in the code.

Pete
Michael Ober
2009-12-13 17:57:07 UTC
Permalink
Post by Peter Duniho
Post by Michael Ober
[...]
The IPServer class is actually used in two different server applications.
In this particular instance, which runs on Windows Server 2003 R2, I can
indeed wait for the TCP CLOSE_WAIT - it's faster to reboot.
Note: rebooting is bypassing the CLOSE_WAIT. The same issues exist when
you do that.
Yes it does, but since each client query is stateless, this turns out to be
a non-issue.
Post by Peter Duniho
Post by Michael Ober
In the other application, which runs on Windows XP, it appears that XP
never closes the listening socket. I suspect the XP behavior has more to
do with another server that must run on that particular box for which I
have no source code than it does with XP itself.
Not sure what "the other application" is. But it is strange if a socket
remains in CLOSE_WAIT indefinitely, for an application that has crashed
(CLOSE_WAIT for a socket owned by a process that hasn't exited is another
matter entirely).
I have actually watched the XP box's socket status and the CLOSE_WAIT never
clears. I have strong suspicions that the other application is mucking with
the IP stack on that machine.
Post by Peter Duniho
Post by Michael Ober
[...]
The other end is a currently a standard TCP client. It sends a request
and waits for the response. The client applications do have to be coded
so that if the server timeout expires they detect the dropped connection
and reconnect if they need to send again.
A code example that can be used to reproduce the problem _must_ include
the client side. Saying it's "a standard TCP client" doesn't provide the
code needed to exercise the server. In addition, with any networked
application, it is possible that a problem is a consequence of the
interaction between client and server. There needs to be code for the
client side that is known to reproduce the same problem when used with the
server side.
I can duplicate this with a simple connect/disconnect. The problem occurs
on the connection.
Post by Peter Duniho
Post by Michael Ober
Post by Peter Duniho
That said, here are some highlights of what appear to me to be problem
-- Use of "On Error Resume Next". Don't do this. You are basically
just ignoring exceptions. Bad. Even in VB.NET, it's bad.
I only use this where I don't care about the error but do want each
instruction to attempt execution. Is there a clean way of handling this
with try ... catch ... end try?
You should only catch errors that you know you can do something with.
Sometimes, at the very highest level in your code, it may make sense to
catch any error and report it to the user, explaining that the operation
couldn't complete.
Even in that case, you need to ensure at all points in the call stack that
you haven't left any indeterminate state, and of course at the lower
levels in your code (deeper in the call stack), it is appropriate to catch
and handle exceptions only if there is a known reasonable, correct
algorithm for safely handling the exception and proceeding without
propagating the exception further up.
Catching and ignoring errors can leave your code open to any number of
corrupted data scenarios, because you obviously can't anticipate every
single error that might occur. And an error you haven't anticipated is
best handled by either not handling it at all (letting your program crash,
however painful that may be), or at least handling it only at the top-most
level and discarding any work that might have taken place.
Post by Michael Ober
[...]
Post by Peter Duniho
-- Use of ReaderWriterLockSlim.IsReadLockHeld and
.IsWriteLockHeld. It's bad enough that, contrary to the documentation's
instructions, you are using the property to control program flow. But
to do it in a loop?
After static analisys of the code, a finally clause on each of the two
try blocks was sufficient to handle this. Since OnReceive would be
called in different threads and not be reentrant, the loops can safely be
removed. The purpose of this was to ensure I didn't have a
ReaderWriterLockSlim causing a connection to stall.
I inferred the purpose. But, if there's a perceived need to write the
loop, that implies some re-entrancy. And if there's some re-entrancy,
then having one level in your call stack remove _all_ the lock
acquisitions that thread obtained means that the higher levels in your
call stack no longer own the lock and are no longer protected.
Of course, in your case you have asserted there is no re-entrancy, and I
believe that is the case (based on the code posted), but that obviously
means that the loop is not helpful even in a broken way.
As you have probably done now, the proper way is to acquire the lock, then
release it explicitly. There should never be any need to check to see
whether you have the lock. The flow of your code should provide that
guarantee. At worst, you may have a local variable that records whether
you've successfully acquired the lock, and which is checked later in the
method before releasing.
Actually - the examples with the documentation shows the following

Try
Acquire lock
catch exception
finally
Release lock
end try

I have switched the code to this structure for the two locks.
Post by Peter Duniho
Post by Michael Ober
Post by Peter Duniho
-- Unsynchronized access to the _ReceiveResults and _SendResults
members. You should at a minimum be using Thread.VolatileRead() and
.VolatileWrite(). Even better would be to just use SyncLock or similar.
-- That said, the mere presence of _ReceiveResults and _SendResults
is a red flag to me. It's possible that with a purely transactional
connection, this could work, and maybe that's what you're dealing with.
But it's bad form in any case, and if you have _any_ possibility of
overlapping send or receive operations, you've got big trouble. This
design is a maintenance problem, and potentially a serious one.
Do you have a suggestion for a better design?
Sure. Normally, asynchronous code just keeps the state information with
the operation. That is, each operation creates its own data structure
related to the operation. Typically, you only ever see this data
structure as something passed around (e.g. to a callback), and of course
that's the kind of data structure you have here. Your callback is passed
the data structure; there should be no need to store it within the class
itself.
Let me see if I understand. Dump the two class level variables and pass the
class (Me) through the Begin... method. Then pull the class out of the
AsyncState property for access to the buffers.
Post by Peter Duniho
Post by Michael Ober
I'm trying to keep this as asynchronous as possible. The design concept
was that if the routine MustOverride routine has to do a lot of work and
send back large numbers of results, it can call the underlying send
method multiple times before returning, although looking at the code, I
don't provide that particular routine the ability to access the Send
method.
Like I said in my original post, this program seems to lock up when there
are long periods (measured in hours) of no clients. If access to these
two variables was a problem, I would expect problems to show up during
heavy usage, not during idle periods. This server class is also used in
another application running on an XP SP3 system that is under
significantly heavier load with more multithreading going on than this
particular application.
Based on the code you posted, one possible scenario for the application
getting stuck is that the variable containing the IAsyncResult for the
send is somehow optimized such that the assignment is never seen by the
thread waiting for it to become null.
I'll take a look at that.
Post by Peter Duniho
Upon closer examination, it turns out that the variable that's potentially
the problem here is actually the _bufOut variable, not the results
variables (those are problematic for other reasons, but I don't think they
would lead to the code getting stuck). With no synchronization for the
array, you may set its length to 0 in one thread, but have another thread
never see the length as 0, and so get stuck waiting for it to be 0.
I have made several changes based on your comments. I haven't touched the
_bufOut or the _SendResults variables, but the _ReceiveResults variable was
actually not used anywhere and is now gone. I've started the program inside
the VS debugger. I've confirmed that it is accepting connections and
returning results today and tomorrow morning I'll see what happens when I
try to connect.
Post by Peter Duniho
Is this actually your problem? I don't know. But I do know that you've
got asynchronous code, where different threads may be sharing a variable,
and sharing a variable between threads without any form of synchronization
can lead to incorrect behavior in the code.
Thanks for that observation.
Post by Peter Duniho
Pete
I appreciate you taking the time to help.

Mike.
Peter Duniho
2009-12-14 07:40:23 UTC
Permalink
Post by Michael Ober
[...]
I can duplicate this with a simple connect/disconnect. The problem
occurs on the connection.
According to your original problem statement, the problem occurs _after_
there has been some activity, followed by a period of idle time, and
then a new connection. Until you've completely isolated the problem,
it's not possible to rule out potential dependencies on the client with
respect to the problem.

If you _can_ in fact reproduce the problem with a simple connect,
perhaps after some specific period of being idle, even so...you need to
provide the code for that. If you expect people to reproduce the
problem, you can't expect them to write new code to do that.

If you are satisfied with conjecture, then don't provide a
concise-but-complete code example.
Post by Michael Ober
[...]
Let me see if I understand. Dump the two class level variables and pass
the class (Me) through the Begin... method. Then pull the class out of
the AsyncState property for access to the buffers.
I'm not sure I understand the above. Generally, an i/o system will
track two kinds of data structures: "per-connection", and
"per-operation". You seem to have a per-connection structure now. That
structure would not contain any per-operation data (i.e. buffers).
Instead, you'd have a separate data structure for that data.
Post by Michael Ober
[...]
I have made several changes based on your comments. I haven't touched
the _bufOut or the _SendResults variables, but the _ReceiveResults
variable was actually not used anywhere and is now gone. I've started
the program inside the VS debugger. I've confirmed that it is accepting
connections and returning results today and tomorrow morning I'll see
what happens when I try to connect.
Just to be clear: I have not identified any "smoking gun" in the code
you posted. There are a number of things that I feel are potential
issues, at least with respect to maintainability if not correctness.
But none are obviously causing problems, never mind the problem you
describe.

I remain curious about the answers to the questions I asked in my other
reply, with respect to your comments that the problem started occurring
after you had other problems with the computer and had recompiled the
code that had been working fine for nearly a year.

In spite of all the "fishy" things about the code, it's entirely
possible that there's nothing wrong with the code that would lead to
incorrect behavior, and that the problem lies elsewhere.

Pete
Michael Ober
2009-12-15 02:09:39 UTC
Permalink
Peter,

I think I found it this morning. It actually wasn't in the IPServer
namespace at all. Using Remote Desktop, I started the server inside the
debugger on my workstation yesterday and checked it a couple of times during
the day several hours apart. No problems - making the "long idle" period
unlikely. This morning I put a breakpoint on the OnAccept method and
connected, single stepping through it. Inside this method there is a call
to WriteLog, which creates consistent log files across multiple Windows
development environments (not dotNet logging) and during the change of day
processing an infinite recursive loop occurred resulting in a stack
overflow. I fixed that problem and tested by changing the clock on my
system. I'll check tomorrow morning to verify. What baffles me is why I
didn't get a Stack Overflow error on the console, even in a Debug release.
The My Namespace event handlers were a late addition to the application and
I would have expected a Stack Overflow error to be reported prior to my
adding them. That would have made it a lot quicker to find and fix.

I think I corrected all but the _SendResults and _bufOut issues that you
pointed out and I'm simply not sure how to fix that one, short of using
SyncLock. Assuming the server is working tomorrow morning, I'll post the
updated code tomorrow evening.

I also realized that creating a disconnect timer for each connection was
extremely inefficient so I recoded this feature to use a single timer in the
IPServer class and a decrement counter in each ClientInformation object.
Since timeouts are always measured in minutes and the exact timeout period
isn't critical, this change too hard to do. The decrement counter is set to
the max inactivity time passed to the IPServer.New method and decremented
once a minute by a timertick handler in the IPServer class itself. The Send
and Receive methods of the ClientInformation class reset this counter to the
time out value. Yes, there can be a race condition here, but if it occurs,
one of two things will happen. Either the server will break down the
connection, which still complies with the server's contract with its
clients, or the connection timeout will be reset, which will simply keep the
connection alive longer. In either case, if the client disconnects, the
EndRecieve method of the socket will return 0 bytes and the connection will
be shutdown by the server.

The events being raised were really for user feedback, so I combined them
into a new Event that better describes what it's for. This broke the
interface "contract" presented by the IPServer class to the application, but
as there were only two applications using this class, that wasn't a big
deal. If this had been for external use or there had been a lot of
applications using this interface, I would have added the new event and left
the old ones there.

The new code actually hides the Clients List and the ConnectionInformation
class by making them private to the IPServer class as well as flattening the
NameSpace. The original desired design was to inherit this class and
override the ProcessMessage method for the business logic, hiding all the
complexity of handling the TCP communications. There wasn't supposed to be
external access to the client connections. I was close previously, but not
quite there. This has the additional benefit of allowing multiple IPServers
to exist in the same application without the possibility of them interfering
with each other. Each IPServer override could then implement different
business logic, simplifying the ProcessMessage override. In the environment
I work in, this is useful as I have a server that collects information from
multiple sources and updates multiple external applications. This
particular application is the other one that uses this class. It uses the
IPServer class and the MSMQ interface to collect data and it updates three
different systems, providing an integrated view of the data in a near
real-time environment. The vendor of the external system would like to do
away with the MSMQ interface and go strictly TCP for updates.

As for the recompile - I'm actually surprised this application worked at all
during date changes. After the rebuild I did restore from tape and the
restored executable also crashed. The WriteLog stack overflow problem has
been in the code since it was ported from VC6 and VB6 to VB2005 several
years ago. This bug doesn't exist in the VC6 version as I have another
application server, writtin in VC++/MFC, running on the same server that
doesn't exhibit this problem. I don't know if it exists in the VB6 version
of the class and will probably never find out as all new development is in
C# and VB 2008. In my environment, there are no extended execution TCP
servers written in VB6. I used VC++/MFC for those as it was much easier for
me to handle TCP messages in VC++/MFC than in VB6.

Once again, thanks for all your feedback.

Mike Ober.
Post by Peter Duniho
Post by Michael Ober
[...]
I can duplicate this with a simple connect/disconnect. The problem
occurs on the connection.
According to your original problem statement, the problem occurs _after_
there has been some activity, followed by a period of idle time, and then
a new connection. Until you've completely isolated the problem, it's not
possible to rule out potential dependencies on the client with respect to
the problem.
If you _can_ in fact reproduce the problem with a simple connect, perhaps
after some specific period of being idle, even so...you need to provide
the code for that. If you expect people to reproduce the problem, you
can't expect them to write new code to do that.
If you are satisfied with conjecture, then don't provide a
concise-but-complete code example.
Post by Michael Ober
[...]
Let me see if I understand. Dump the two class level variables and pass
the class (Me) through the Begin... method. Then pull the class out of
the AsyncState property for access to the buffers.
I'm not sure I understand the above. Generally, an i/o system will track
two kinds of data structures: "per-connection", and "per-operation". You
seem to have a per-connection structure now. That structure would not
contain any per-operation data (i.e. buffers). Instead, you'd have a
separate data structure for that data.
Post by Michael Ober
[...]
I have made several changes based on your comments. I haven't touched
the _bufOut or the _SendResults variables, but the _ReceiveResults
variable was actually not used anywhere and is now gone. I've started
the program inside the VS debugger. I've confirmed that it is accepting
connections and returning results today and tomorrow morning I'll see
what happens when I try to connect.
Just to be clear: I have not identified any "smoking gun" in the code you
posted. There are a number of things that I feel are potential issues, at
least with respect to maintainability if not correctness. But none are
obviously causing problems, never mind the problem you describe.
I remain curious about the answers to the questions I asked in my other
reply, with respect to your comments that the problem started occurring
after you had other problems with the computer and had recompiled the code
that had been working fine for nearly a year.
In spite of all the "fishy" things about the code, it's entirely possible
that there's nothing wrong with the code that would lead to incorrect
behavior, and that the problem lies elsewhere.
Pete
Peter Duniho
2009-12-15 02:48:48 UTC
Permalink
Post by Michael Ober
Peter,
I think I found it this morning. It actually wasn't in the IPServer
namespace at all.
Just as I suggested might be the case.

I'm glad you were able to find the problem. But I hope you can see,
based on this experience, why a concise-but-complete code example is so
important in terms of obtaining help. Without one, all anyone can do is
make random guesses and suggestions, none of which may have anything at
all to do with the problem.

While comments you received to your question might have been helpful in
a general sense, I doubt any of them were instrumental in leading you to
a solution to the specific problem you were asking about.
Post by Michael Ober
[...]
I think I corrected all but the _SendResults and _bufOut issues that you
pointed out and I'm simply not sure how to fix that one, short of using
SyncLock.
Nothing wrong with a SyncLock, IMHO. It's always most important to make
sure the code works right, then worry about whether there's a better,
more efficient solution. And .NET's Monitor class is reasonably efficient.
Post by Michael Ober
[...]
I also realized that creating a disconnect timer for each connection was
extremely inefficient so I recoded this feature to use a single timer in
the IPServer class and a decrement counter in each ClientInformation
object.
If I were to code the timeout/disconnect logic from scratch, I might use
a sorted event queue, sorted by the event time and with a single thread,
and a call to Thread.Sleep() to delay until the next event. Add the
various logic to deal with resetting timers, adding new ones with a
future time earlier than the earliest current one in the queue, etc. and
you're done.

Fortunately, I believe that's basically what the System.Timers.Timer
timer does. It's specifically designed for server-related timers (or at
least, so the documentation says :) ), and I'm pretty sure one of its
main features is the ability to track a large number of timed events as
timers, without putting a significant load on the system.
Post by Michael Ober
Since timeouts are always measured in minutes and the exact
timeout period isn't critical, this change too hard to do. The
decrement counter is set to the max inactivity time passed to the
IPServer.New method and decremented once a minute by a timertick handler
in the IPServer class itself. The Send and Receive methods of the
ClientInformation class reset this counter to the time out value.
Advantage to your approach: reset of a timer is simply a new counter
value. Disadvantage to your approach: on every tick of the timer, you
have to inspect every connected client's data structure.

One downside to the sorted list of future events is the cost of the
insertion of a new element (either O(n) or O(log n), depending on the
implementation, which I don't know), borne every time you add a timer or
reset one. And of course, for large numbers of clients, this can be
large too.

Of course, if all the connections use the same timeout (which seems
likely), than any reset timer always winds up at the very end of the
sorted list, and the cost of doing that can be almost as cheap as
resetting a counter, because you already know where the new position is
in the list rather than having to search for it.

Obviously it depends somewhat on the specifics. But based on what I
think is likely about your implementation (specifically, every
connection uses the same timeout), you may find it preferable to
actually just create your own single-threaded event queue for the
timing, taking advantage of the fact that you know every new event added
(whether a new connection or one for which i/o operation happened and
thus needs a reset timer) goes to the end of the list.

I don't know for sure, but it's possible System.Timers.Timer includes an
optimization to check for end-of-list insertions, and so could in fact
perform basically as well as a custom implementation would. (I took a
quick look in Reflector, but unfortunately, the meat of the
implementation seems to be in native code, which doesn't show up in
Reflector; I can, however, verify that System.Timers.Timer just uses
System.Threading.Timer underneath, so whatever characteristics one has,
the other is pretty much the same :) ).
Post by Michael Ober
Yes,
there can be a race condition here, but if it occurs, one of two things
will happen. Either the server will break down the connection, which
still complies with the server's contract with its clients, or the
connection timeout will be reset, which will simply keep the connection
alive longer. In either case, if the client disconnects, the EndRecieve
method of the socket will return 0 bytes and the connection will be
shutdown by the server. [...]
Right. Timeouts always involve race conditions. The best you can do is
just make sure you have a way of knowing who won, and then get that part
right without corrupting your data. :)

Pete
Michael Ober
2009-12-18 23:48:07 UTC
Permalink
Pete and all,

Here's the final code. I actually got this working yesterday but ran into
another problem with cross thread errors inside the TCPServer class itself.
After googling for the specific error message and finding multiple
references as well as a bug report to MS that was closed because MS couldn't
duplicate the bug, I realized that all the bug reports were from VB
developers and none from C# developers and that the bug first appeared in VB
2005 RTM. VB 2005 and 2008 have a "My" class that provides several
shortcuts into the framework, which are really useful to someone just
starting in VB simply because of the sheer size of the framework. Reading
the My class documentation, almost all these classes eventually inherit from
WinForms when used in a WinForms application. Cross thread operations
against WinForms has long been documented, even in the Win32 API, as a major
don't do. Now when I start maintainence on a program, I search the entire
solution for "My." and recode them to use standard framework classes. This
fixed the problem and also probably explains why MS couldn't recreate it -
they were testing using the framework only.

Once again, thanks to everyone, especially Pete, who spent the time to
provide feedback.

Mike.

'============================================
Option Compare Text
Option Strict On
Option Explicit On
Option Infer Off

Imports System.Net.Sockets
Imports System.Net
Imports System.Threading
Imports System.Text.ASCIIEncoding
Imports System.IO

Public MustInherit Class IPServer
Implements IDisposable

Public Event ConnectionList(ByVal sConnection As List(Of String))
Public MustOverride Function ProcessMessage(ByVal message As String) As
String

' My list of clients
Private Clients As New ClientList

Private _tcpServer As Socket = Nothing
Private ReadOnly _CommandTermination As String = BEL

' Timeout support; Asynchronous sockets don't directly support timers.
Protected ReadOnly _SocketTimeout As TimeSpan
Protected ReadOnly _timer As Timer = Nothing

Public Sub New(ByVal Port As Integer)
Me.New(Port, BEL, Nothing)
End Sub
Public Sub New(ByVal Port As Integer, ByVal SocketTimeout As TimeSpan)
Me.New(Port, BEL, SocketTimeout)
End Sub
Public Sub New(ByVal Port As Integer, ByVal CommandTermination As
String)
Me.New(Port, CommandTermination, Nothing)
End Sub
Public Sub New(ByVal Port As Integer, ByVal CommandTermination As
String, ByVal SocketTimeout As TimeSpan)
' save the termination and TTL before doing anything else to avoid
threading issues
_CommandTermination = CommandTermination
_SocketTimeout = SocketTimeout

Try
Dim myAddr As IPAddress =
Dns.GetHostEntry(Environment.MachineName).AddressList(0)
Dim sEndPoint As String = Environment.MachineName & "(" &
myAddr.ToString() & "):" & Port.ToString()
WriteLog("Configure Listener on " & sEndPoint)

_tcpServer = New Socket(AddressFamily.InterNetwork,
SocketType.Stream, ProtocolType.Tcp)
Dim LocalEP As IPEndPoint = New IPEndPoint(myAddr, Port)
_tcpServer.Bind(LocalEP)
_tcpServer.Listen(5) ' 5 is the standard backlog value

_tcpServer.BeginAccept(AddressOf OnAccept, Nothing)
WriteLog("Listening for connections on " & sEndPoint)

If _SocketTimeout <> Nothing Then
Dim Interval As New TimeSpan(0, 1, 0)
_timer = New Threading.Timer(AddressOf TimerTick, Nothing,
Interval, Interval)
WriteLog("Socket Timer Started")
End If
Catch ex As Exception
WriteLog(ex)
SMTPMail.SendMessage("mis", AppName() & ": Unable to create
Listener", ex.Message())
Throw ex
End Try
End Sub

'Handle connection requests
Private Sub OnAccept(ByVal ar As System.IAsyncResult)
Dim ci As New ConnectionInformation(Me, _SocketTimeout)
Try
ci.sock = _tcpServer.EndAccept(ar)

' Start listening for the next connection
_tcpServer.BeginAccept(AddressOf OnAccept, Nothing)

' Wait for data
ci.BeginReceive(AddressOf OnReceive)

Catch ex As Exception
WriteLog(ci.ToString(), ex)
Finally
WriteLog(Clients.ToString(), True)
RaiseEvent ConnectionList(Clients.ToList())
End Try
End Sub

Private Sub OnReceive(ByVal ar As IAsyncResult)
Dim ci As ConnectionInformation = CType(ar.AsyncState,
ConnectionInformation)
Try
Dim bytesReceived As Integer = ci.EndReceive(ar)

Select Case bytesReceived
Case 0
ci.Close("Client Closed Connection")

Case Else
Dim i As Integer = InStr(ci.MessageIn,
_CommandTermination)
Do While i > 0
' Process msg
Dim msg As String = Left$(ci.MessageIn, i - 1)
WriteLog(ci.ToString() & " => " & msg)
Dim msgOut As String = ProcessMessage(msg)
If msgOut <> "" Then
ci.Send(msgOut & _CommandTermination)
WriteLog(ci.ToString() & " <= " & msgOut)
End If

ci.MessageIn = Mid$(ci.MessageIn, i +
_CommandTermination.Length)
i = InStr(ci.MessageIn, _CommandTermination)
Loop

' Finally, read the next chunk of data
ci.BeginReceive(AddressOf OnReceive)
End Select

Catch ex As Exception
WriteLog(ex)
Finally
WriteLog(Clients.ToString())
RaiseEvent ConnectionList(Clients.ToList())
End Try
End Sub

Private Sub TimerTick(ByVal o As Object)
' loop by descending index to ensure no connections are missed
' Cannot use a for each loop as ConnectionInformation.Close()
removes the connection from the clients list
For i As Integer = Clients.Count - 1 To 0 Step -1
Dim ci As ConnectionInformation = Clients(i)
Threading.Interlocked.Decrement(ci.Activity)
If ci.Activity <= 0 Then ci.Close("Socket Inactivity Timeout")
Next
RaiseEvent ConnectionList(Clients.ToList())
End Sub

Public Sub Close()
WriteLog("Server Shutdown Starting")

' Close the listener
WriteLog("Listener being closed")
If _tcpServer IsNot Nothing Then
_tcpServer.Close()
_tcpServer = Nothing
End If

' Stop the socket timer
If _timer IsNot Nothing Then _timer.Dispose()

' Stop the clients; Don't use "for each" as closing a connection
removes it from clients
' Closing a client connection has the side effect of removing the
connection from the clients collection
WriteLog("Closing Clients: " & Clients.ToString())
Do While Clients.Count > 0
Clients(0).Close("Server Shutting Down")
Loop
End Sub

#Region " IDisposable Support "
' This code added by Visual Basic to correctly implement the disposable
pattern.

Protected Overridable Sub Dispose(ByVal disposing As Boolean)
Close()
End Sub

Public Sub Dispose() Implements IDisposable.Dispose
' Do not change this code. Put cleanup code in Dispose(ByVal
disposing As Boolean) above.
Dispose(True)
GC.SuppressFinalize(Me)
End Sub
#End Region

Public Class ConnectionInformation
Implements IDisposable

Public MessageIn As String = ""

' Who's talking to me?
Private _ClientName As String = ""
Private _ClientPort As Integer = 0

' Who am I
Private _server As IPServer = Nothing

' Private structures and storage for sending and receiving data
Private _sock As Socket = Nothing
Private Const _bufSize As Integer = 1500
Private _bufIn(_bufSize) As Byte

' These two variables could be the source of a possible, but never
seen, race condition
Private _SendResults As IAsyncResult = Nothing
Private _bufOut() As Byte

' TimeOut support; Async sockets don't directly support timeouts
Public Activity As Integer = 0
Private _Activity As Integer = 0

Public Sub New(ByVal server As IPServer, ByVal sock As Socket, ByVal
SocketTimeout As TimeSpan)
Me.New(server, SocketTimeout)
Me.sock = sock
End Sub
Public Sub New(ByVal server As IPServer, ByVal SocketTimeout As
TimeSpan)
_server = server
If SocketTimeout <> Nothing Then _Activity =
SocketTimeout.Minutes
Activity = _Activity
_server.Clients.Add(Me)
End Sub

Public Sub BeginReceive(ByVal callback As AsyncCallback)
_sock.BeginReceive(_bufIn, 0, _bufSize, SocketFlags.None,
callback, Me)
Activity = _Activity
End Sub
Public Function EndReceive(ByVal asyncResult As IAsyncResult) As
Integer
Dim ci As ConnectionInformation = CType(asyncResult.AsyncState,
ConnectionInformation)
Try
If ci._sock Is Nothing Then Return 0

ci.Activity = ci._Activity
Dim bytesReceived As Integer =
ci._sock.EndReceive(asyncResult)
ci.MessageIn &= ASCII.GetString(ci._bufIn, 0, bytesReceived)
Return bytesReceived

Catch exSock As SocketException
Dim sockErr As SocketError = exSock.SocketErrorCode
Dim msg As String = [Enum].GetName(GetType(SocketError),
sockErr)
WriteLog("Socket Exception: " & msg, exSock)
Return 0
Catch ex As Exception
WriteLog(ex)
Return 0
Finally
ci.Activity = ci._Activity
End Try

Return 0 ' Forces the socket to close elsewhere in code
End Function

Public Sub Send(ByVal MessageOut As String)
Try
Activity = _Activity
Do Until _SendResults Is Nothing OrElse _bufOut.Length = 0
_SendResults.AsyncWaitHandle.WaitOne()
Loop

_bufOut = System.Text.Encoding.ASCII.GetBytes(MessageOut)
_SendResults = _sock.BeginSend(_bufOut, 0, _bufOut.Length,
SocketFlags.None, AddressOf OnSend, Me)

Catch ex As Exception
WriteLog(ex)
Finally
Activity = _Activity
End Try
End Sub

Private Sub OnSend(ByVal ar As IAsyncResult)
Dim ci As ConnectionInformation = CType(ar.AsyncState,
ConnectionInformation)
ci.Activity = ci._Activity
Try
Dim bytesSent As Integer = ci._sock.EndSend(ar)

' Did we send the entire buffer?
If bytesSent >= _bufOut.Length Then
Array.Resize(_bufOut, 0)

Else
' No; reset the buffer to contain only the bytes needed
to be sent
Dim tBuf(_bufOut.Length - bytesSent - 1) As Byte
Array.Copy(_bufOut, bytesSent, tBuf, 0, tBuf.Length)
_bufOut = tBuf
' Send them
_SendResults = ci._sock.BeginSend(_bufOut, 0,
_bufOut.Length, SocketFlags.None, AddressOf OnSend, ci)
End If
ci.Activity = ci._Activity
Catch ex As Exception
WriteLog(ci.ToString() & vbNewLine & ex.Message, True)
Finally
ci.Activity = ci._Activity
End Try
End Sub

Public WriteOnly Property sock() As Socket
Set(ByVal value As Socket)
_sock = value

If _sock Is Nothing Then
_ClientName = ""
_ClientPort = 0
Activity = 0
Else
Dim ClientEndPoint As IPEndPoint =
CType(_sock.RemoteEndPoint, IPEndPoint)
_ClientName =
Dns.GetHostEntry(ClientEndPoint.Address.ToString()).HostName
_ClientPort = ClientEndPoint.Port
Activity = _Activity
End If
End Set
End Property

Public ReadOnly Property ClientName() As String
Get
Return _ClientName
End Get
End Property
Public ReadOnly Property ClientPort() As Integer
Get
Return _ClientPort
End Get
End Property

Public Shadows Function ToString() As String
Return _ClientName & ":" & _ClientPort.ToString()
End Function

Public ReadOnly Property ClientCount() As Integer
Get
Return _server.Clients.Count
End Get
End Property

Public Function Close(ByVal Reason As String) As Integer
WriteLog(RemoveSpaces(Reason & ": " & Me.ToString()))
If _sock IsNot Nothing Then
_sock.Shutdown(SocketShutdown.Both)
_sock.Close()
_sock = Nothing
End If
If _server.Clients.Contains(Me) Then _server.Clients.Remove(Me)
Return _server.Clients.Count
End Function

#Region " IDisposable Support "
' This code added by Visual Basic to correctly implement the
disposable pattern.
Protected Overridable Sub Dispose(ByVal disposing As Boolean)
Close("Server Disposing Connection")
End Sub

Public Sub Dispose() Implements IDisposable.Dispose
' Do not change this code. Put cleanup code in Dispose(ByVal
disposing As Boolean) above.
Dispose(True)
GC.SuppressFinalize(Me)
End Sub
#End Region

End Class

Private Class ClientList
Inherits SynchronizedCollection(Of ConnectionInformation)

Public Function ToList() As List(Of String)
Dim sa As New List(Of String)
For Each ci As ConnectionInformation In MyBase.Items.ToArray()
sa.Add(ci.ToString() & "/" & ci.Activity.ToString("#,##0"))
Next
Return sa
End Function

Public Overloads Function ToString() As String
Dim sa As List(Of String) = Me.ToList()
Dim s As String = "Connections: " & sa.Count.ToString("#,##0")
For Each li As String In sa
s &= vbNewLine & li
Next
Return s
End Function
End Class
End Class
Peter Duniho
2009-12-19 00:45:55 UTC
Permalink
Post by Michael Ober
[...]
Once again, thanks to everyone, especially Pete, who spent the time to
provide feedback.
I have more. :)

Here are some things that I noticed looking through the code you posted:

-- There's a fair amount of redundant code, particularly with respect
to the "Activity = _Activity" statement that appears a lot. Not that
big of a deal, but you'll wish you'd made the "Activity" member a
property when you see the next item.

-- You have a threading bug in the handling of the "Activity" field.
In particular, while you do use the Interlocked class to decrement the
value, none of the assignments are protected in any way. So you have no
guarantee that when you decrement the value, it's actually the correct,
most recent value. Instead, any line that reads "Activity = _Activity"
should read "Thread.VolatileWrite(Activity, _Activity)". This will
ensure volatile semantics for the assignment. (Had you made "Activity"
a property, you could've fixed it in one place, rather than having to go
visit each and every statement in the code where "Activity" is used).

-- It also appears to me that you have a numerical bug in the
initialization of the "_Activity" field. In particular, you are using
TimeSpan.Minutes, while I believe you really want TimeSpan.TotalMinutes
(truncated to an integer, of course). As the code is now, if someone
specifies a 90-minute timeout, you'll only wait 30 minutes before
expiring the connection.

-- You also have managed to incorrectly implement the ToString()
method, in two completely different ways. :) You should make the
method an "Overrides", but you have one place where you've specified
"Shadows", and another place where you've specified "Overloads".

-- In your ConnectionInformation.Close() method, there is no point in
calling Contains(). It is not an error to call Remove() with an element
not actually in the collection, and since half of the work of the
Remove() method is scanning the list to find the element, which is the
same work the Contains() method does, you effectively do that twice for
no benefit to the code.

-- You have left in the behavior I commented on before, with respect
to having the per-connection "_SendResults" member, and a per-connection
buffer. If done correctly, this would at worst be a potential
performance issue, but because there's no synchronization around either
the _SendResults member, nor the _bufOut member, you have a potential
data corruption bug. You should either fix the design so that you have
a per-i/o-operation data structure that isolates each operation from any
other operation, or synchronize the code where you use the shared member
variables.

On that last point, note that just adding synchronization is probably
fine. Generalizing the data structures so you have a per-i/o structure
would make the code more flexible, but then you'd have to worry about
keeping overlapped sends in the correct order, which is just one more
complication you'll then need to code for. :)

That's all I see for now. :)

Pete
Michael Ober
2009-12-19 02:58:28 UTC
Permalink
Post by Peter Duniho
Post by Michael Ober
[...]
Once again, thanks to everyone, especially Pete, who spent the time to
provide feedback.
I have more. :)
-- There's a fair amount of redundant code, particularly with respect to
the "Activity = _Activity" statement that appears a lot. Not that big of
a deal, but you'll wish you'd made the "Activity" member a property when
you see the next item.
-- You have a threading bug in the handling of the "Activity" field. In
particular, while you do use the Interlocked class to decrement the value,
none of the assignments are protected in any way. So you have no
guarantee that when you decrement the value, it's actually the correct,
most recent value. Instead, any line that reads "Activity = _Activity"
should read "Thread.VolatileWrite(Activity, _Activity)". This will ensure
volatile semantics for the assignment. (Had you made "Activity" a
property, you could've fixed it in one place, rather than having to go
visit each and every statement in the code where "Activity" is used).
The _Activity field is part of the timeout system. If it gets an
inconsistent value, I don't think it's an an issue. It will either be
decremented early, which will cause an early timeout, or reset to the
initial value late, which will keep an inactive client connected longer. I
had actually used Interlocked.Decrement originally in the TimerTick method,
but couldn't find an equivalent for resetting this value. I didn't know
about the VolatileWrite method. That said - here's the new code for setting
and resetting Activity. I'm using

Private _TimeRemaining As Integer = 0
Public Property Activity() As Integer
Get
Thread.VolitileRead(_TimeRemaining)
End Get
Set(ByVal value As Integer)
Thread.VolitileWrite(_TimeRemainaing, value)
End Set
End Property

C#'s volitile keyword would have made this easier. One question - how do
the VolitileRead and VolitileWrite methods interact with the
Interlocked.Decrement method used in the IPServer.TimerTick callback?
Post by Peter Duniho
-- It also appears to me that you have a numerical bug in the
initialization of the "_Activity" field. In particular, you are using
TimeSpan.Minutes, while I believe you really want TimeSpan.TotalMinutes
(truncated to an integer, of course). As the code is now, if someone
specifies a 90-minute timeout, you'll only wait 30 minutes before expiring
the connection.
I hadn't even thought of this scenerio. The longest timeout I had used was
15 minutes. Fixed. Thanks.
Post by Peter Duniho
-- You also have managed to incorrectly implement the ToString() method,
in two completely different ways. :) You should make the method an
"Overrides", but you have one place where you've specified "Shadows", and
another place where you've specified "Overloads".
This is unfortunately one of the things that VB allows. There is definitely
a weaknesses here in the compiler itself. Another one that I find
troublesome is that properties and methods can be referenced with or without
the (), unlike C# that strictly enforces this. Fixed.
Post by Peter Duniho
-- In your ConnectionInformation.Close() method, there is no point in
calling Contains(). It is not an error to call Remove() with an element
not actually in the collection, and since half of the work of the Remove()
method is scanning the list to find the element, which is the same work
the Contains() method does, you effectively do that twice for no benefit
to the code.
Updated. I was expecting an exception if the object wasn't already in the
collection. RTM (Read The Manual).
Post by Peter Duniho
-- You have left in the behavior I commented on before, with respect to
having the per-connection "_SendResults" member, and a per-connection
buffer. If done correctly, this would at worst be a potential performance
issue, but because there's no synchronization around either the
_SendResults member, nor the _bufOut member, you have a potential data
corruption bug. You should either fix the design so that you have a
per-i/o-operation data structure that isolates each operation from any
other operation, or synchronize the code where you use the shared member
variables.
On that last point, note that just adding synchronization is probably
fine. Generalizing the data structures so you have a per-i/o structure
would make the code more flexible, but then you'd have to worry about
keeping overlapped sends in the correct order, which is just one more
complication you'll then need to code for. :)
SyncLock _outBuf
' All operations on _SendResults and _outBuf in the method.
End SyncLock

Since I can't send the next message until the previous one completes, any
performance hit would already have been experienced simply by the wait loops
that are in the code.
Post by Peter Duniho
That's all I see for now. :)
Pete
Peter Duniho
2009-12-19 08:28:22 UTC
Permalink
Post by Michael Ober
The _Activity field is part of the timeout system. If it gets an
inconsistent value, I don't think it's an an issue.
Without volatile semantics, in theory writes to the field might _never_
be visible to other threads.

On x86, I believe this would happen only due to compiler optimizations,
and given that it's a member field of a class, I'd guess that wouldn't
happen. But I prefer code that's provably correct, rather than
"probably won't break" in a specific environment. :)
Post by Michael Ober
[...]
C#'s volitile keyword would have made this easier. One question - how
do the VolitileRead and VolitileWrite methods interact with the
Interlocked.Decrement method used in the IPServer.TimerTick callback?
I think it should be fine.

You'll notice that Volatile...() methods only support data types that
can be handled atomically. The Interlocked methods all do atomic
operations as well.

If you'd rather stick only with the Interlocked class, there is the
Interlocked.Exchange() method which you can use to write to a variable,
and the Interlocked.Read() method. But my recollection is that you have
an atomicity guarantee for 32-bit values in .NET, so other than the race
conditions that, as you say, aren't of concern, I don't see a problem
mixing the APIs.

By the way, as I scan through the code again, I see at least one more
problem: you haven't synchronized the ClientInformation.Close() method.
You could get an ObjectDisposedException if two different threads try
to close the same object at the same time, as one reaches the
_sock.Close() before the other reaches the _sock.Shutdown(). (I don't
recall off the top of my head, but it's possible you can't even call
Shutdown() twice with the SocketShutdown.Both value...but for sure,
trying to call Shutdown() on a disposed/closed socket is not good).

You should definitely put a lock around the Socket shutdown/closure
code, so that only one caller to Close() ever finds the _sock field
non-null.

Related to this is the fact that the Close() method returns the new
client count. I didn't bother to look at how this return value is used,
but you should review that to make sure that having two different calls
to the Close() method returning the same value is okay, and that having
any given call to Close() return a value that is more than one less than
what the client count was before the Close() call is also okay (the
latter could happen if one client is removed in one thread, and another
is removed by a different thread at the same time).

Pete
Michael Ober
2009-12-19 14:24:35 UTC
Permalink
Post by Michael Ober
The _Activity field is part of the timeout system. If it gets an
inconsistent value, I don't think it's an an issue.
Without volatile semantics, in theory writes to the field might _never_ be
visible to other threads.
On x86, I believe this would happen only due to compiler optimizations,
and given that it's a member field of a class, I'd guess that wouldn't
happen. But I prefer code that's provably correct, rather than "probably
won't break" in a specific environment. :)
Post by Michael Ober
[...]
C#'s volitile keyword would have made this easier. One question - how do
the VolitileRead and VolitileWrite methods interact with the
Interlocked.Decrement method used in the IPServer.TimerTick callback?
I think it should be fine.
You'll notice that Volatile...() methods only support data types that can
be handled atomically. The Interlocked methods all do atomic operations
as well.
If you'd rather stick only with the Interlocked class, there is the
Interlocked.Exchange() method which you can use to write to a variable,
and the Interlocked.Read() method. But my recollection is that you have
an atomicity guarantee for 32-bit values in .NET, so other than the race
conditions that, as you say, aren't of concern, I don't see a problem
mixing the APIs.
By the way, as I scan through the code again, I see at least one more
problem: you haven't synchronized the ClientInformation.Close() method.
You could get an ObjectDisposedException if two different threads try to
close the same object at the same time, as one reaches the _sock.Close()
before the other reaches the _sock.Shutdown(). (I don't recall off the
top of my head, but it's possible you can't even call Shutdown() twice
with the SocketShutdown.Both value...but for sure, trying to call
Shutdown() on a disposed/closed socket is not good).
In my logs, I actually see the ObjectDisposedException a few times a day.
Hadn't had time to track it down, so thanks. I'll put a SyncLock Me before
the test If _sock IsNot Nothing. Locking the whole object here is safer
than locking just the _sock object since the _sock object may be nothing,
which I don't think will support a lock.
You should definitely put a lock around the Socket shutdown/closure code,
so that only one caller to Close() ever finds the _sock field non-null.
Related to this is the fact that the Close() method returns the new client
count. I didn't bother to look at how this return value is used, but you
should review that to make sure that having two different calls to the
Close() method returning the same value is okay, and that having any given
call to Close() return a value that is more than one less than what the
client count was before the Close() call is also okay (the latter could
happen if one client is removed in one thread, and another is removed by a
different thread at the same time).
The return value from Close isn't used anymore. If you still have the
earlier code, it was used as the second argument to the NewConnection and
ClosedConnection events. Removing it isn't an issue.
Pete
Mike.

Goran Sliskovic
2009-12-13 23:37:27 UTC
Permalink
Post by Michael Ober
The code below appears to work, but it eventually freezes and I have to
reboot the server. Killing and restarting the program itself doesn't
work as the listener socket is still bound.
For what it's worth, if you wait for the TCP CLOSE_WAIT to finish, you'll
be able to restart the server. While it's possible to do so, you don't
want to bypass the CLOSE_WAIT, because there is theoretically the
possibility of network messages still in transit that, if received after
you've restarted the server, could corrupt the state of your server.
...

CLOSE_WAIT means that the TCP connection has been terminated, but the socket
descriptor has not been closed (by calling close by application, of course).
There is no possibility that there are messages still in transit (at least
messages that could do any damage). TIME_WAIT state is there to ensure this.

The socket is bound in OP's case because it is not closed, for whatever
reason. Connection is closed, however socket is not.

Regards,
Goran
Peter Duniho
2009-12-14 07:43:48 UTC
Permalink
Post by Goran Sliskovic
CLOSE_WAIT means that the TCP connection has been terminated, but the
socket descriptor has not been closed (by calling close by application,
of course). There is no possibility that there are messages still in
transit (at least messages that could do any damage). TIME_WAIT state is
there to ensure this.
Sorry…I have confused the two. Thanks.
Post by Goran Sliskovic
The socket is bound in OP's case because it is not closed, for whatever
reason. Connection is closed, however socket is not.
The OP mentions another case where the connection indefinitely remains
in CLOSE_WAIT even after the process exits. Can you think of a reason
that might happen? I couldn't.

Pete
Goran Sliskovic
2009-12-14 22:20:25 UTC
Permalink
"Peter Duniho" <***@no.nwlink.spam.com> wrote in message news:***@TK2MSFTNGP02.phx.gbl...
...
The OP mentions another case where the connection indefinitely remains in
CLOSE_WAIT even after the process exits. Can you think of a reason that
might happen? I couldn't.
Pete
Once I saw this with WebRequest class. Either .NET uses some other process
that cleans up the socket (or some other class) or process is not shown in
task manager but is still somehow active.But this would be internals of
windows/.net that I'm not familiar with. I'm pretty sure that exiting the
process should close all sockets.
The OP problem looks to me as a case of deadlock, possibly in the finalizer.
The only way to see what is going on is to attach debugger (eg. Windbg with
SOS.dll).

Regards,
Goran
Michael Ober
2009-12-15 01:34:39 UTC
Permalink
Post by Goran Sliskovic
...
The OP mentions another case where the connection indefinitely remains in
CLOSE_WAIT even after the process exits. Can you think of a reason that
might happen? I couldn't.
Pete
Once I saw this with WebRequest class. Either .NET uses some other process
that cleans up the socket (or some other class) or process is not shown in
task manager but is still somehow active.But this would be internals of
windows/.net that I'm not familiar with. I'm pretty sure that exiting the
process should close all sockets.
The OP problem looks to me as a case of deadlock, possibly in the
finalizer. The only way to see what is going on is to attach debugger (eg.
Windbg with SOS.dll).
Regards,
Goran
Unfortunately I can't attach a debugger to the XP system, but I'll try the
updated server code on Wednesday to see if the problem is corrected.

Mike.
Loading...