Reference : http://msdn2.microsoft.com/en-us/magazine/cc163352.aspx
.NET Matters
Deadlock monitor
Stephen ToubCode download available at: NETMatters2007_10.exe (156 KB) Browse the Code Online [ http://msdn2.microsoft.com/en-us/magazine/cc164433(printer).aspx ]
Q I'm using locks in my application to synchronize work on a bunch of threads. Unfortunately, I'm doing something incorrectly and my threads seem to just stop sometimes. I think I'm running into deadlocks, but I'm not sure how to find them. Is there any way I can do so programmatically? I'd like an exception to be thrown when a deadlock is encountered.
Q I'm using locks in my application to synchronize work on a bunch of threads. Unfortunately, I'm doing something incorrectly and my threads seem to just stop sometimes. I think I'm running into deadlocks, but I'm not sure how to find them. Is there any way I can do so programmatically? I'd like an exception to be thrown when a deadlock is encountered.
A First, it's important to understand what a deadlock among threads is and the conditions that lead to one. Threads deadlock when waiting on each other to release some resources, but by performing that blocking wait, they're not releasing the resources the other threads need in order to unblock. The threads can't make any progress until the resources are released, but because they're not making progress, the resources will never be released, the threads are locked up, and thus "deadlock." Many OS course textbooks will cite the four conditions necessary for a deadlock to occur:
A First, it's important to understand what a deadlock among threads is and the conditions that lead to one. Threads deadlock when waiting on each other to release some resources, but by performing that blocking wait, they're not releasing the resources the other threads need in order to unblock. The threads can't make any progress until the resources are released, but because they're not making progress, the resources will never be released, the threads are locked up, and thus "deadlock." Many OS course textbooks will cite the four conditions necessary for a deadlock to occur:
A limited number of a particular resource. In the case of a monitor in C# (what you use when you employ the lock keyword), this limited number is one, since a monitor is a mutual-exclusion lock (meaning only one thread can own a monitor at a time).
The ability to hold one resource and request another. In C#, this is akin to locking on one object and then locking on another before releasing the first lock, for example:
Copy Codelock(a)
{
lock(b)
{
...
}
}
No preemption capability. In C#, this means that one thread can't force another thread to release a lock.
A circular wait condition. This means that there is a cycle of threads, each of which is waiting for the next to release a resource before it can continue.
If any one of these conditions is not met, deadlock is not possible. The first condition is inherent to what a monitor is, so if you're using monitors, this one is set in stone. The second condition could be avoided by ensuring that you only ever lock one object at a time, but that's frequently not a feasible requirement in a large software project. The third condition could possibly be avoided in the Microsoft® .NET Framework by aborting or interrupting the thread holding the resource your thread requires, but a) that would require knowing which thread owned the resource, and b) that's an inherently dangerous operation (for the many reasons why, see msdn.microsoft.com/msdnmag/issues/05/10/Reliability). Thus, the way to avoid deadlocks is to avoid (or thwart) condition four.
In his article in the April 2006 issue (available at msdn.microsoft.com/msdnmag/issues/06/04/Deadlocks), Joe Duffy discusses several techniques for avoiding and detecting deadlocks, including one known as lock leveling. In lock leveling, locks are assigned numerical values, and threads must only acquire locks that have higher numbers than locks they have already acquired. This prevents the possibility of a cycle. It's also frequently difficult to do well in a typical software application today, and a failure to follow lock leveling on every lock acquisition invites deadlock.
Rather than prevent deadlocks wholesale, many systems attempt to detect them and then eliminate them once found. For example, SQL Server® can detect when deadlocks occur and abort one of the tasks involved in the cycle, thereby removing the deadlock. In his article, Joe builds a common language runtime (CLR) host that is capable of this form of deadlock detection in .NET applications that use monitors, a very cool feat. Unfortunately, using a custom CLR host isn't always practical for many .NET apps, including those that already have a custom host, like ASP.NET applications. As such, it would be beneficial to be able to utilize similar deadlock detection capabilities, without the need for a custom CLR host, such that these types of deadlocks could be detected at run time. This would be beneficial during development and testing phases, to identify coding errors where locks are being used incorrectly. It could also be used in production to detect and eliminate deadlocks as they occur (preventing a thread from causing one by blocking it from attempting the critical wait that would complete a cycle), however typical deadlock detection algorithms are costly and may not be appropriate in production systems for performance reasons. (I'll have a few comments on performance at the end of this column.)
To address this need, I've built a sample wrapper for the .NET System.Threading.Monitor class that includes deadlock detection capabilities. As with Monitor, my DdMonitor class provides Enter and Exit methods, and under the covers it delegates to the equivalent methods on Monitor. However, it also tracks monitor usage and throws exceptions when the attempted acquisition of a lock will complete a cycle, which would result in deadlock. For the rest of this column, I'll detail the implementation of this deadlock monitor class and provide additional information on its usage as well as on the advantages and disadvantages of its deployment.
The outline for the DdMonitor class is shown in Figure 1. From the start, note that DdMonitor does not completely mimic the public interface of System.Threading.Monitor. It provides the same public static TryEnter, Enter, and Exit methods, but it does not provide the Wait, Pulse, and PulseAll static public methods of its counterpart.
Figure 1 DdMonitor Class
Copy Codeclass DdMonitor
{
public static IDisposable Lock(object monitor) { ... }
public static void Enter(object monitor) { ... }
public static bool TryEnter(object monitor) { ... }
public static bool TryEnter(
object monitor, TimeSpan timeout) { ... }
public static bool TryEnter(
object monitor, int millisecondsTimeout) { ... }
public static void Exit(object monitor) { ... }
}
DdMonitor also provides a public static Lock method. Since the C# lock keyword provides a nice abstraction over Monitor, it's beneficial to provide a similar one for DdMonitor. Without the ability to modify the C# language, it's impossible for us to obtain identical syntax, but we can get close:
Copy Code// with Monitor
lock(obj) { ... }
// with DdMonitor
using(DdMonitor.Lock(obj)) { ... }
This syntactical feat is accomplished by implementing Lock, as shown in Figure 2. The Lock method delegates to DdMonitor.Enter to acquire the lock. However, it also instantiates a DdMonitorCookie object that implements IDisposable. This object's Dispose method will call DdMonitor.Exit to release the lock; this enables the developer to wrap DdMonitor.Lock in a C# using statement (Using in Visual Basic® or stack-allocation in C++/CLI) as shown previously.
Figure 2 Implementing the Lock Method
Copy Codepublic static IDisposable Lock(object monitor)
{
if (monitor == null) throw new ArgumentNullException("monitor");
IDisposable cookie = new DdMonitorCookie(monitor);
Enter(monitor);
return cookie;
}
private class DdMonitorCookie : IDisposable
{
private object _monitor;
public DdMonitorCookie(object obj) { _monitor = obj; }
public void Dispose()
{
if (_monitor != null)
{
DdMonitor.Exit(_monitor);
_monitor = null;
}
}
}
The Enter method and two of the three TryEnter methods are simple wrappers for the third TryEnter method, as shown in Figure 3. These implementations are meant to follow the same specification as implemented by Monitor's Enter and TryEnter method. Calling Enter will block until the lock is acquired (we differ slightly by design in that our Enter method will throw an exception if a deadlock is detected, rather than blocking forever), and thus delegates to TryEnter with a Timeout.Infinite timeout. Note that Timeout.In- finite equals -1, which is a special value that signals TryEnter to block until lock acquisition, rather than failing out after the specific time. In contrast, the overload of TryEnter that doesn't accept a time value defaults to using a timeout of 0, meaning that it will return false if the lock cannot be acquired immediately (again, our implementation will also throw an exception if acquiring the lock would cause a deadlock; this decision is a debatable point with TryEnter, however, so you may choose to modify the implementation accordingly). Note that TryEnter will return true if the lock was acquired or false otherwise. The second overload of TryEnter simply converts the supplied TimeSpan timeout value into a number of milliseconds, validates the argument, and delegates to the final TryEnter overload, which is where all the real work happens.
Figure 3 Implementing Enter
Copy Codepublic static void Enter(object monitor)
{
TryEnter(monitor, Timeout.Infinite);
}
public static bool TryEnter(object monitor)
{
return TryEnter(monitor, 0);
}
public static bool TryEnter(object monitor, TimeSpan timeout)
{
long totalMilliseconds = (long)timeout.TotalMilliseconds;
if (totalMilliseconds < -1
totalMilliseconds > Int32.MaxValue)
throw new ArgumentOutOfRangeException("timeout");
return TryEnter(monitor, (int)totalMilliseconds);
}
Our design is relatively straightforward, though it does involve a few interesting implementation details. The implementation begins by validating the supplied parameters, ensuring that an object was actually supplied on which to be locked and that a valid timeout was supplied (meaning either -1 or a non-zero number of milliseconds).
At this point, DdMonitor needs to access some shared state. Since there's no supported way for DdMonitor to interrogate the CLR for information about a monitor (including which thread may own it and which threads may be waiting on it), DdMonitor has to track this information manually. It does this by storing a static table of data structures that contain all of the relevant information about acquired monitors in the system, and every time an Enter or Exit operation is performed on a monitor, it updates this shared state.
Note that this has a few implications. First, while we'll see that DdMonitor does end up delegating to Monitor with the same monitor object that was supplied by the user, DdMonitor knows nothing about direct calls to Monitor's methods, and thus it's not able to update its internal state information based on any such calls that are made. This means that deadlock detection will not function properly if you mix usage of DdMonitor and Monitor (or the lock keyword) in your app, causing false negatives and possibly not preventing some deadlocks.
Another implication of this implementation is that this static table is AppDomain-specific. This shouldn't be a problem, since objects you lock on should also only live in one AppDomain. However, there are several special kinds of objects (such as Type and String) that are able to cross AppDomain boundaries, and if you lock on one of these domain-agile objects with DdMonitor from multiple domains, the static table maintained by DdMonitor in each domain will only contain a subset of the relevant information, again possibly leading to false negatives.
A third issue with this approach has to do with reliability. A lock statement in C# expands to a call to Monitor.Enter followed immediately by a try block, whose body is equivalent to the body of the lock statement and whose finally block releases the monitor. The CLR, through a bit of JIT hackery, ensures that the try block is entered if the call to Monitor.Enter succeeds; this ensures that the finally block will also be executed if Monitor.Enter succeeds, which is important in the face of asynchronous exceptions that could otherwise occur between the call to Monitor.Enter and entering the try block (for more information, see msdn.microsoft.com/msdnmag/issues/05/10/Reliability and www.bluebytesoftware.com/blog/2007/01/30/MonitorEnterThreadAbortsAndOrphanedLocks.aspx). As you'll see in our DdMonitor implementation, however, there is code that comes after the actual underlying call to Monitor.Enter and before the try block that comes after the call to DdMonitor.Enter; as such, these reliability guarantees are lost with our implementation.
You should keep all of these issues in mind in case any is dangerous to your scenarios.
Back to our implementation of TryEnter, which is shown in Figure 4; it continues by obtaining its internal lock for this shared state. Once acquired, it accesses the dictionary to find any previous data on the monitor being entered. If no such data exists, that means that no threads currently own or are waiting on the monitor. In such a case, TryEnter initializes a new MonitorState object to track this monitor and puts it back into the table.
Figure 4 Implementing TryEnter
Copy Codeprivate static Dictionary
No comments:
Post a Comment