Inevitable Choices

By Douglas Merrill

The way software addresses concurrency fundamentally shapes its architecture.

“Concurrency is everywhere in modern programming, whether we like it or not,” reads an MIT introduction to the subject. In a typical computing environment, machines and software contend with concurrency on numerous levels: from multiple processors on the same chip, through multiple pieces of software running on a single workstation, up to multiple (sometimes multitudes of) computers running on one network.

Developers manage concurrency with mechanisms such as separate processes, threads, time-slicing, shared memory and message-passing, and they must guard against problems such as race conditions, in which the outcome of a computation varies with the unpredictable order in which its operations are executed.
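To make the idea of a race condition concrete, here is a minimal Python sketch. It is an illustration only and is not drawn from any particular product; the function name `run_counter` and its parameters are assumptions made for this example. Several threads increment one shared counter, once with a lock and once without.

```python
import threading


def run_counter(num_threads: int, increments: int, use_lock: bool) -> int:
    """Increment a shared counter from several threads and return the final value."""
    counter = 0
    lock = threading.Lock()

    def worker() -> None:
        nonlocal counter
        for _ in range(increments):
            if use_lock:
                # The lock turns read-modify-write into one indivisible step.
                with lock:
                    counter += 1
            else:
                # Unprotected read-modify-write: another thread may update
                # the counter between the read and the write, and that
                # update is then overwritten (a "lost update").
                current = counter
                counter = current + 1

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter


if __name__ == "__main__":
    print("with lock:   ", run_counter(4, 100_000, use_lock=True))
    print("without lock:", run_counter(4, 100_000, use_lock=False))
```

With the lock, the result is always the number of threads times the number of increments. Without it, the result can come out lower, and whether it does on a given run depends on the interpreter and on timing, which is exactly what makes such bugs hard to reproduce.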

For a Statistical Computing Environment in a carefully regulated sector like pharma development, keeping track of concurrent operations is crucial. Regulatory submissions, and ultimately the safety of new medications, will depend on the integrity of the data being submitted. That, in turn, depends on how well the software can handle demands from many different users.

“Concurrency is all over the place, no matter how you look at it,” says Alexander Lüders, a senior software engineer at entimo. “You can work with a single shared repository, or you can check out sections of data into a sandbox, but the issue remains because you eventually have to merge the checkout back to the repository.”

As a result, entimo has chosen to work directly with the questions of concurrency. “A sandbox still has the problem of how to resolve conflicts,” adds Marc Jantke, one of entimo’s board members. “You can have situations where more than one user wants to check something back into the master branch, and what do you do then? You still have to work through that question.”
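The check-in conflict Jantke describes can be sketched in a few lines of Python. This is a generic illustration of version-based conflict detection, not a description of how entimo's software or any specific sandbox product works; `Repository`, `ConflictError`, `check_out` and `check_in` are hypothetical names chosen for the example.

```python
from dataclasses import dataclass
from typing import Tuple


class ConflictError(Exception):
    """Raised when a check-in is based on an outdated version."""


@dataclass
class Repository:
    """Hypothetical single-item repository with a version counter."""
    content: str = ""
    version: int = 0

    def check_out(self) -> Tuple[str, int]:
        # Hand out a copy together with the version it was taken from.
        return self.content, self.version

    def check_in(self, new_content: str, base_version: int) -> int:
        # Accept the change only if nobody else checked in since the
        # copy was taken; otherwise the caller has to resolve a conflict.
        if base_version != self.version:
            raise ConflictError(
                f"based on version {base_version}, "
                f"repository is at version {self.version}"
            )
        self.content = new_content
        self.version += 1
        return self.version


if __name__ == "__main__":
    repo = Repository(content="dose=10")
    _, base_a = repo.check_out()  # user A takes a copy
    _, base_b = repo.check_out()  # user B takes a copy of the same version
    repo.check_in("dose=20", base_a)  # A checks in first and succeeds
    try:
        repo.check_in("dose=30", base_b)  # B's copy is now stale
    except ConflictError as exc:
        print("conflict:", exc)
```

The sketch only detects the conflict. Deciding what happens next (merge, reject, or ask the users) is the question that, as the quote notes, still has to be worked through.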

Jantke points to two further challenges for systems built on check-outs and redundant copies. First, creating and retaining those copies consumes resources and time, especially in environments where validation may require keeping an entire audit trail of data versions. The data volumes in clinical studies can also be large enough that a check-out approach slows the computing environment down. With infinite resources that would not be an issue, but every company faces limits of some kind, and a sandbox/check-out architecture may reach them sooner than an approach that works with concurrency directly. Second, the computation needed to support check-outs can itself become a performance concern, particularly given the size and interrelationships of the data in modern pharmaceutical studies. Again, this is a practical problem rather than a theoretical one, given that even the largest development programs have to work within budgets. “Those are non-trivial challenges,” says Jantke.

“A check-out approach faces the problem of making a consistent copy. That can lead to a lot of blockages for other users or would force us to have everything under version control,” adds Lüders. On a large study or in an analysis involving many users, that could slow such a system significantly.

In short, you can’t get around concurrency, so you might as well deal with it directly. That’s the business case entimo has made, and it has proven to work: our customers are able to collaborate efficiently even with hundreds of users working concurrently on a shared repository.