Why Thread Locking Is Worthless in an ASCOM Driver

Tags: ASCOM, development, developers, software, software-engineering, reactive-extensions, rx, open-source, free software, patterns

In the ASCOM developer help for the serial port component, it says:

The key question to ask yourself is what will happen if two or more of your properties or methods are called at the same time? For each of these you should be clear about whether it can function independently or whether it has dependencies. In many cases the communication channel to the hardware that you are driving will be such a dependency and this will definitely be the case if you use a serial port as these can generally do just one thing at a time!

The ASCOM Serial component has a measure of protection built in which prevents multiple simultaneous use of the serial port by multi-threaded applications when using its transmit and receive methods. This however is unlikely to be sufficient for your needs as in many cases you will need to transmit something to your device and then wait for its reply while blocking any attempts by other threads to interrupt your "transmit - receive" transaction.

In the .NET world there are a variety of mechanics including Mutexes, Monitors and Semaphores each with its own pros and cons. Mutexes are a good place to start if you are unsure. In other languages you will need to check out what synchronisation features are provided.

It then goes on to provide an example of how to use a Mutex to protect the serial port from simultaneous overlapping access. However, there is a critical flaw in all of this: the Single Threaded Apartment (STA).

A COM Single Threaded Apartment is a threading model that only ever has one thread (the STA thread). Everything happens within that single thread. Think about that for a moment: everything in a single thread. What does a thread locking primitive protect you from? Other threads! Since an STA only ever has one thread, you can already see the root of the problem: a thread lock cannot protect an STA thread from itself. Many ASCOM drivers run in STA threads, either explicitly or accidentally, so the conclusion is that thread locking primitives are worthless in an ASCOM driver. They only lull you into a false sense of security, and the reentrancy bugs you will encounter as a result are confusing and difficult to debug.
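The same-thread behaviour is easy to check in isolation. The following is a minimal sketch using only System.Threading; the class and method names are illustrative and have nothing to do with ASCOM itself. It shows that a .NET Mutex is granted again to the thread that already owns it, while a different thread is refused.

```csharp
using System;
using System.Threading;

public static class SameThreadLockDemo
{
    // Counts how many times the calling thread can acquire the same Mutex
    // without releasing it in between.
    public static int AcquireTwiceOnSameThread()
    {
        using (var mutex = new Mutex())
        {
            int acquired = 0;
            // First acquisition: the mutex is free, so this succeeds at once.
            if (mutex.WaitOne(TimeSpan.Zero)) acquired++;
            // Second acquisition on the SAME thread: also succeeds at once,
            // because the owning thread is allowed to re-enter.
            if (mutex.WaitOne(TimeSpan.Zero)) acquired++;
            // Release once per successful acquisition.
            for (int i = 0; i < acquired; i++) mutex.ReleaseMutex();
            return acquired;
        }
    }

    // Holds the mutex on the calling thread and reports whether a second
    // thread can acquire it with a zero timeout.
    public static bool OtherThreadCanAcquireWhileHeld()
    {
        using (var mutex = new Mutex())
        {
            mutex.WaitOne();
            bool otherGotIt = false;
            var other = new Thread(() =>
            {
                otherGotIt = mutex.WaitOne(TimeSpan.Zero);
                if (otherGotIt) mutex.ReleaseMutex();
            });
            other.Start();
            other.Join();
            mutex.ReleaseMutex();
            return otherGotIt;
        }
    }

    public static void Main()
    {
        Console.WriteLine(AcquireTwiceOnSameThread());       // prints 2
        Console.WriteLine(OtherThreadCanAcquireWhileHeld()); // prints False
    }
}
```

The lock does its job against the second thread and does nothing at all against the first one, which is exactly the situation inside an STA.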

So why would an ASCOM driver use an STA thread? Well, that depends on the type of driver. For a DLL-based in-process driver, you never know what your driver will get loaded into. You might be lucky and get loaded into an MTA thread, but don't count on it. An executable LocalServer has a GUI, which by definition always runs in the STA thread (the GUI thread). Look at the Main() method in the LocalServer.cs file:

    // ==================
    // ==================
    [STAThread]
    private static void Main(string[] args)

See, right there, it is explicitly declared as running in an [STAThread]. So a LocalServer driver will run in an STA thread unless some action is taken to prevent this.
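If you are unsure which apartment your code ends up in, you can ask the runtime. This is a small sketch, assuming Windows (apartment states are a COM concept, and setting one on other platforms throws PlatformNotSupportedException); the ApartmentCheck class is a hypothetical name used only for illustration.

```csharp
using System;
using System.Threading;

public static class ApartmentCheck
{
    // Starts a thread in the requested apartment and reports the apartment
    // state that the thread observes for itself.
    public static ApartmentState StateOfNewThread(ApartmentState requested)
    {
        ApartmentState observed = ApartmentState.Unknown;
        var thread = new Thread(() => observed = Thread.CurrentThread.GetApartmentState());
        thread.SetApartmentState(requested); // must be set before the thread starts
        thread.Start();
        thread.Join();
        return observed;
    }

    [STAThread] // same attribute as on the LocalServer entry point
    public static void Main()
    {
        // On Windows, this is expected to report STA for the main thread
        // and MTA for the worker.
        Console.WriteLine("Main thread: " + Thread.CurrentThread.GetApartmentState());
        Console.WriteLine("Worker:      " + StateOfNewThread(ApartmentState.MTA));
    }
}
```

Logging the apartment state when the driver is created is a cheap way to find out what an in-process driver has actually been loaded into.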

Helper Countermeasures

The ASCOM serial helper takes one step towards addressing this issue by using a variant of the Actor Pattern. All requests are handed off to a worker thread which runs in a Multi-Threaded Apartment (MTA) and uses thread locking to ensure strict serialization. However, this is at best a partial solution because it has no concept of a transaction. There's no way to do a send and receive in one single operation. You can send, and you can receive, but not both as one indivisible operation. So what the serial helper does is serialize all the sends, and all the receives. But it does nothing to prevent a second send from happening before the first receive has happened. To be fair, it can't do that because that would require advance knowledge of the device protocol being used. The documentation recognizes that this protection is insufficient:

This however is unlikely to be sufficient for your needs as in many cases you will need to transmit something to your device and then wait for its reply while blocking any attempts by other threads to interrupt your "transmit - receive" transaction.

The documentation implicitly recognizes the need for transactions but the helper does not provide them. Instead, the developer is encouraged to use a Mutex around the send-receive operation. And here, the trap is sprung. If your driver is running in a Single Threaded Apartment, the Mutex will not protect you!

Developers who follow the advice in the documentation will fall foul of reentrancy. To understand the failure mode, a diversion is needed into how asynchronous code is possible within an STA thread.

Asynchronicity vs. Parallelism

Many developers confuse these two terms. Asynchronous code is simply code that is queued up for execution and will provide a result at some future time without any further intervention. The key here is that all calls into an STA thread from another thread or process are asynchronous but not parallel. They will sit in a queue waiting to get control of the single thread when it becomes idle and the message loop pumps messages.
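To make the distinction concrete, here is a minimal sketch of "asynchronous but not parallel". It is a stand-in, not real COM: a BlockingCollection plays the part of the message queue and one dedicated thread plays the part of the STA thread. All the names are illustrative.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading;

public static class SingleThreadQueueDemo
{
    // Posts one work item from each of `callers` separate threads and returns
    // the set of thread IDs on which the work items actually executed.
    public static HashSet<int> Run(int callers)
    {
        var queue = new BlockingCollection<Action>();
        var ranOn = new HashSet<int>();

        // The one and only thread that executes queued work.
        var pump = new Thread(() =>
        {
            foreach (var work in queue.GetConsumingEnumerable())
                work();
        });
        pump.Start();

        // Each caller thread only queues a request; it never runs it.
        var callerThreads = new List<Thread>();
        for (int i = 0; i < callers; i++)
        {
            var caller = new Thread(() =>
                queue.Add(() => ranOn.Add(Thread.CurrentThread.ManagedThreadId)));
            callerThreads.Add(caller);
            caller.Start();
        }
        foreach (var started in callerThreads) started.Join();

        queue.CompleteAdding(); // let the pump drain the queue and exit
        pump.Join();
        return ranOn;
    }

    public static void Main()
    {
        // Five calling threads, but every request ran on the same single thread.
        Console.WriteLine(Run(5).Count); // prints 1
    }
}
```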

Let's consider what happens when a multi-threaded application (e.g. Sequence Generator Pro) calls into your driver, which is a LocalServer running in an STA thread.

No application is allowed to access any other process directly. Therefore, the client application cannot directly call your driver. When the driver is loaded, the LocalServer process starts, spins up the STA thread and loads your driver into it. Then the client application receives a COM proxy object that looks identical to your driver, but is in fact a fake object provided by the COM runtime. This proxy object uses Inter-Process Communication (IPC) to "post" operations to your driver's process. The COM runtime receives this message and marshals it to the STA thread by adding it to the Windows Message Queue for the thread, where it will stay until your driver's process is ready to handle it. Messages are "pumped" (taken off the queue and processed) in several situations:

  • When your driver process is idle, it is actually running something called the Application Message Loop which continually pumps messages.
  • Perhaps surprisingly: When your STA thread is blocked, for example by a Mutex. Most of the managed thread synchronization primitives actually pump messages!

The second situation above is perhaps unexpected, but there is a Prime Directive of STA Threads: Thou shalt not block the STA Thread, ever. It is sometimes heard in another form, "Do not block the UI thread" (which is really the same thing, because the UI thread is always the STA thread). Many ASCOM drivers blatantly ignore this directive, but Microsoft is so serious about it that they actually try hard to prevent you from ever really blocking it! So blocking constructs such as Mutex actually pump messages while they are supposedly blocked.

So now we know everything needed to understand why a Mutex (or any other thread synchronization primitive) will not protect your driver from multi-threaded applications.

When is a Lock Not a Lock?

Answer: when it's on the same thread. The STA thread. Let's consider what can happen when a multi-threaded client application calls into a LocalServer driver running in an STA thread.

  1. Application calls driver proxy from Thread A.
  2. Thread A blocks awaiting response.
  3. COM proxy sends an IPC request to driver process.
  4. COM runtime marshals the request to the STA thread by posting it onto the message queue.
  5. Some time later, driver processes the message and dispatches the call to one of your driver's methods.
  6. Driver acquires a Mutex (which is immediately granted because the mutex is free) and sends a command to the serial port.
  7. Driver calls ReceiveTerminated() and "blocks" waiting for response.
  8. Application calls driver proxy from Thread B.
  9. Thread B blocks (repeat steps 3, 4, 5). The dispatch in step 5 happens while the driver is still "blocked" in step 7, because the wait pumps messages.
  10. Driver acquires the Mutex again. It is on the same STA thread as in step 6, so the request is immediately granted as a recursive lock, and the driver sends a second command to the serial port.
  11. Driver calls ReceiveTerminated() and "blocks" awaiting a response.
  12. At this point, we have two pending commands that have been sent, but no responses have been received.
  13. Response arrives at the serial port. But which response was it? Does it belong to the first command or the second? Will the second response even arrive? Roll the dice...
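The sequence above can be reproduced in a few lines without any COM at all. This is a simulation under stated assumptions: a plain Queue stands in for the STA message queue, PumpingWait() stands in for a wait that pumps messages, and a List stands in for the serial port. None of these names come from ASCOM.

```csharp
using System;
using System.Collections.Generic;

public class ReentrancySimulation
{
    private readonly object portLock = new object();
    // Stand-in for the STA thread's message queue.
    private readonly Queue<Action> messageQueue = new Queue<Action>();
    // Records what happens "on the wire", in order.
    public readonly List<string> Log = new List<string>();

    // Stand-in for a COM call arriving from another client thread.
    public void Post(Action call)
    {
        messageQueue.Enqueue(call);
    }

    // Stand-in for a blocking wait that pumps messages while it waits.
    private void PumpingWait()
    {
        while (messageQueue.Count > 0)
            messageQueue.Dequeue()(); // re-enters the driver on the same thread
    }

    public void SendAndReceive(string command)
    {
        lock (portLock) // granted immediately on re-entry: this thread already owns it
        {
            Log.Add("send " + command);
            PumpingWait();
            Log.Add("receive for " + command);
        }
    }

    public static void Main()
    {
        var driver = new ReentrancySimulation();
        // The call from client thread B is already queued when A's call starts waiting.
        driver.Post(() => driver.SendAndReceive("B"));
        driver.SendAndReceive("A");
        // prints: send A, send B, receive for B, receive for A
        Console.WriteLine(string.Join(", ", driver.Log));
    }
}
```

Both commands go out before either response is read, even though every send and receive sits inside the lock.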

Introducing the Reactive Communications for ASCOM Library

There are solutions to this highly technical and nontrivial problem, but the answers are in the domain of the driver, not within ASCOM. ASCOM has no device specific knowledge so cannot solve this in a way that will work for everyone. So the ASCOM documentation is correct when it places the onus on the driver developer to solve this problem, but the suggested approach will not work reliably, for the reasons presented above.

What is needed is the concept of a Transaction - a linked command and response that should be indivisible, whether by another thread or by reentrancy in an STA thread. Implementing this correctly will require proficiency in multi-threaded programming, which is hard. Damn hard. A few quotes from the trenches:

The main drawback with [multi-threading] is that humans can't reliably write free-threaded code. [Chris Brumme, MSDN Blogs]

Anybody who says "I can write correct multi threaded code" probably should be saying "I don't test my multi-threaded code". It is very difficult to write correct multi-threaded code. [Mike Stall, MSFT]

Why is multi-threading so damn hard? [Various, Quora]

The chances are very high that even an experienced developer will not get this right. There are so many subtleties and corner cases that it takes a very special kind of mind to write correct multi-threaded code. If you think you have what it takes, then maybe look up the Dunning-Kruger effect before you start!

I did not trust myself to get it right. The solution I went for was to delegate the difficult multi-threading and synchronization code to some very smart developers with a lot more experience and resources than me. That's why we built our Reactive Communications Library for ASCOM on top of the Reactive Extensions for .NET.

The Reactive Extensions (Rx) is a library for composing asynchronous and event-based programs using observable sequences and LINQ-style query operators. Using Rx, developers represent asynchronous data streams with Observables, query asynchronous data streams using LINQ operators, and parameterize the concurrency in the asynchronous data streams using Schedulers. Simply put, Rx = Observables + LINQ + Schedulers.

Critically, Rx means you can use a declarative approach to asynchronicity and parallelism - i.e. you tell the system what you want, not how to do it. Then you leave the hard stuff to Rx.

Rx provides the sequencing and thread-safety guarantees I needed but which I didn't trust myself to write correctly and exhaustively test. To that, we have added the following:

  • The DeviceTransaction class which models command/response type protocols and serves as the base class for custom transactions that the driver developer will write.
  • An abstraction called ICommunicationsChannel which relieves the driver developer of having to ever interact directly with a serial port. Instead, the developer just deals with sequences of data and it doesn't matter where the data came from.
  • A TransactionObserver class, which observes a sequence of transactions and processes them in strict order of submission. This is the beating heart of the Rx Comms library and is where all the sequencing and thread safety guarantees happen.
  • A set of helpers, glue logic and utilities to assist in using the above and creating your own transaction types, plus a few ready-made transaction types to serve as examples.
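To illustrate the idea of processing transactions in strict order of submission, here is a rough sketch that uses only the .NET base class library. It is not the Rx Comms library and does not use its API: SketchTransaction and SketchTransactionProcessor are hypothetical names invented for this example, and the sendAndReceive delegate is an assumed stand-in for whatever talks to the device.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical type: one command plus the slot where its response is delivered.
public sealed class SketchTransaction
{
    public string Command { get; }
    public TaskCompletionSource<string> Response { get; } =
        new TaskCompletionSource<string>(TaskCreationOptions.RunContinuationsAsynchronously);
    public SketchTransaction(string command) { Command = command; }
}

// Hypothetical type: a single worker thread that owns the channel and
// completes one whole transaction before starting the next.
public sealed class SketchTransactionProcessor : IDisposable
{
    private readonly BlockingCollection<SketchTransaction> pending =
        new BlockingCollection<SketchTransaction>();
    private readonly Func<string, string> sendAndReceive;
    private readonly Thread worker;

    public SketchTransactionProcessor(Func<string, string> sendAndReceive)
    {
        this.sendAndReceive = sendAndReceive;
        worker = new Thread(ProcessLoop) { IsBackground = true };
        worker.Start();
    }

    // Callable from any thread, re-entrantly or not: it only enqueues.
    public Task<string> Submit(string command)
    {
        var transaction = new SketchTransaction(command);
        pending.Add(transaction);
        return transaction.Response.Task;
    }

    private void ProcessLoop()
    {
        foreach (var transaction in pending.GetConsumingEnumerable())
        {
            try { transaction.Response.SetResult(sendAndReceive(transaction.Command)); }
            catch (Exception ex) { transaction.Response.SetException(ex); }
        }
    }

    public void Dispose()
    {
        pending.CompleteAdding();
        worker.Join();
        pending.Dispose();
    }
}

public static class SketchProgram
{
    public static void Main()
    {
        var wire = new List<string>(); // what the fake device sees, in order
        Func<string, string> fakeDevice = cmd =>
        {
            wire.Add(">" + cmd); // send
            wire.Add("<" + cmd); // receive
            return "ok:" + cmd;
        };
        using (var processor = new SketchTransactionProcessor(fakeDevice))
        {
            Task<string> a = processor.Submit("A");
            Task<string> b = processor.Submit("B");
            Task.WaitAll(a, b);
            Console.WriteLine(string.Join(" ", wire)); // prints: >A <A >B <B
        }
    }
}
```

Whatever the calling threads do while they wait, each send and receive pair stays together on the wire, because only the worker thread ever touches the channel.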

We have used the Rx Comms library in more than 8 commercial ASCOM drivers with great success. We've been able to work with Command-Response type protocols, highly asynchronous protocols, and hybrids that are mostly Command-Response but with the occasional unsolicited "event" message. We have never had a single threading issue, thanks to the rock solid Reactive Extensions for .NET.

The Reactive Communications for ASCOM library requires a slightly different approach to handling your communications channel, but it is not difficult. I have recorded a quick-start tutorial video to show how easy it is to get started and demonstrate some of the concepts.

Further Reading

  • Understanding COM Apartments, Part 1, Part 2

  • When is a Lock Not a Lock? [Tigra Astronomy]

  • Introduction to Reactive Communications for ASCOM [Blog: The Well Travelled Photon]

  • Why I Created the ASCOM Reactive Communications Library [ASCOM-Talk Yahoo group] (A discussion thread setting out some of the reasons why I thought this library was needed and including a log capture showing the reentrancy problem happening in a production driver)
