How PARLANSE delivers symmetric multiprocessing parallelism for DMS

PARLANSE is an inherently parallel LISP-like programming language for multicore workstations, designed to support computationally expensive symbolic applications requiring medium grain irregular parallelism (~100 to ~1,000 machine instructions) while providing robustness through efficient error handling.

DMS needs computational power for large-scale irregular inferences

Our DMS Software Reengineering Toolkit is intended to provide compiler-like parsing, analysis, and code transformation capabilities for a wide variety of computer languages, applied to large source code bases spanning multiple languages and millions of lines of code. The scale of these artifacts drove us to insist that DMS have parallel foundations to help us get answers sooner.

The availability of symmetric multiprocessing (SMP) in workstations suggested using SMP parallelism to get those answers. PARLANSE was designed explicitly to support computations of the type that DMS does at massive scale, and DMS is implemented as a set of DSLs on top of PARLANSE.

PARLANSE provides efficient execution for:

  • irregularly sized (even very small) grains of parallel computation
  • static pure parallelism
  • static partial-order parallelism
  • dynamic teams-of-grains parallelism
  • event-driven parallelism

We chose these features because we believe that program analysis tasks are inherently irregular in size. This is because:

  • the code chunks being analyzed are irregularly sized
  • symbolic analysis algorithms collect information by navigating many complex data structures
  • the amount of inference per code chunk is consequently irregular.

A data parallel model simply does not work for these kinds of computation.

While DMS itself might be an ideal tool to analyze its own code to find parallelism, in general this is an extremely hard task to do reliably. We chose to avoid this problem by insisting that the programmer find the parallelism (she often knows where it is anyway) and simply tell the PARLANSE compiler about it. (In the long run, DMS will be able to help out).

We code algorithms in parallel where it is obvious to do so, and sometimes where it is not so obvious.

Pure Data Parallelism

Parallel Parsing and Reporting

Some DMS activities can be treated as data parallel and are thus easy to parallelize. In particular, DMS is often asked to parse a large set of files, or to generate a report (for example, an HTML report) for each file it has processed. This is easy to parallelize using a PARLANSE dynamic team by simply forking a new grain for each file. For these specific activities we don't get spectacular speedups, because the disk can act as a bottleneck, but we do get "free" overlap of disk I/O and computation. We note that the combined duration of parsing and disk I/O makes the runtimes of these grains highly variable, so dynamically reordering their execution to take advantage of available processors is effective.
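
To illustrate the shape of this (a minimal C++ sketch rather than PARLANSE, with a hypothetical parseFile function standing in for the real DMS parser), forking one task per file and letting the runtime spread them over the available cores looks roughly like:

    #include <future>
    #include <iostream>
    #include <string>
    #include <vector>

    // Hypothetical stand-in for the DMS per-file parser.
    struct ParseResult { std::string file; bool ok; };

    ParseResult parseFile(const std::string& path) {
        // ... read the file, lex, parse, build the AST ...
        return ParseResult{path, true};
    }

    int main() {
        std::vector<std::string> files = {"a.java", "b.java", "c.java"};

        // "Fork a grain" per file: each parse is an independent task, so the
        // disk I/O of one file overlaps with the parsing of another.
        std::vector<std::future<ParseResult>> team;
        for (const auto& f : files)
            team.push_back(std::async(std::launch::async, parseFile, f));

        for (auto& fut : team) {             // collect results as grains finish
            ParseResult r = fut.get();
            std::cout << r.file << (r.ok ? " parsed\n" : " failed\n");
        }
    }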

Parallelism available by processing separate AST fragments for CloneDR

A more interesting data parallel opportunity is in a special analysis built into our duplicate code detection tool, CloneDR. This finds small seeds for duplicated code by hashing, and then "grows" the seeds into maximally sized duplicated code chunks. The seed-growing step is completely parallel. The number of such activities is entirely dynamic, because the amount and size of cloned code depends entirely on the code base being analyzed. The activities themselves are of radically uneven duration, and may themselves propose new seeds. PARLANSE's dynamic teams are a perfect fit for this, too.
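
The flavor of such a dynamic team, again sketched in C++ rather than PARLANSE (Seed and growSeed are hypothetical stand-ins), is a pool of workers pulling from a work queue into which the workers themselves may push newly proposed seeds:

    #include <condition_variable>
    #include <mutex>
    #include <queue>
    #include <thread>
    #include <vector>

    // Hypothetical: a seed is a pair of AST locations suspected to be clones.
    struct Seed { int siteA, siteB; };

    std::mutex m;
    std::condition_variable cv;
    std::queue<Seed> work;       // seeds waiting to be grown
    int active = 0;              // grains currently growing a seed
    bool done = false;

    // Hypothetical: grow one seed into a maximal clone; may discover new seeds.
    std::vector<Seed> growSeed(const Seed& s) { return {}; }

    void worker() {
        for (;;) {
            Seed s;
            {
                std::unique_lock<std::mutex> lk(m);
                cv.wait(lk, [] { return !work.empty() || done; });
                if (work.empty()) return;          // nothing left and all done
                s = work.front(); work.pop();
                ++active;
            }
            auto newSeeds = growSeed(s);           // radically uneven work here
            {
                std::lock_guard<std::mutex> lk(m);
                for (auto& n : newSeeds) work.push(n);  // grains propose seeds
                --active;
                if (work.empty() && active == 0) done = true;
                cv.notify_all();
            }
        }
    }

    int main() {
        work.push({1, 2});       // initial hash-derived seeds would go here
        std::vector<std::thread> pool;
        for (int i = 0; i < 4; ++i) pool.emplace_back(worker);
        for (auto& t : pool) t.join();
    }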

Pipeline Parallelism

It is easy to build a parallel pipeline, using a ring buffer and a resource lock, between the language lexers that DMS uses and the parsing engine that DMS provides. This pipeline parallelism can interleave with the team-based parsing parallelism.
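
A minimal sketch of such a pipeline stage, in C++ rather than PARLANSE (Token is a hypothetical stand-in for the lexer's output), pairs a fixed-size ring buffer with a single lock and two condition variables; the lexer grain calls produce while the parser grain calls consume:

    #include <array>
    #include <condition_variable>
    #include <mutex>

    struct Token { int kind; };                 // hypothetical token type

    // A small bounded ring buffer guarded by one lock (the "resource lock").
    constexpr std::size_t N = 256;
    std::array<Token, N> ring;
    std::size_t head = 0, tail = 0, count = 0;
    std::mutex m;
    std::condition_variable notFull, notEmpty;

    void produce(Token t) {                     // called by the lexer grain
        std::unique_lock<std::mutex> lk(m);
        notFull.wait(lk, [] { return count < N; });
        ring[tail] = t; tail = (tail + 1) % N; ++count;
        notEmpty.notify_one();
    }

    Token consume() {                           // called by the parser grain
        std::unique_lock<std::mutex> lk(m);
        notEmpty.wait(lk, [] { return count > 0; });
        Token t = ring[head]; head = (head + 1) % N; --count;
        notFull.notify_one();
        return t;
    }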

Harnessing obvious parallel algorithm opportunities

There are many opportunities to go parallel with divide and conquer algorithms. Our sorting libraries are divide and conquer; these are really easy to implement in parallel by simply forking additional work at the divide point. This manufactures a great deal of parallelism, because divide and conquer is recursive: at recursion depth N you get on the order of 2^N parallelism opportunities.
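
For example (a C++ sketch rather than our PARLANSE library code), a parallel merge sort forks the left half as a new task at each divide point, falling back to a sequential sort for grains too small to be worth forking:

    #include <algorithm>
    #include <future>
    #include <vector>

    // Parallel merge sort: fork the left half as a new task at each divide point.
    void psort(std::vector<int>& v, std::size_t lo, std::size_t hi) {
        if (hi - lo < 2048) {                       // tiny grains: stay sequential
            std::sort(v.begin() + lo, v.begin() + hi);
            return;
        }
        std::size_t mid = lo + (hi - lo) / 2;
        auto left = std::async(std::launch::async, psort, std::ref(v), lo, mid);
        psort(v, mid, hi);                          // right half in this grain
        left.get();                                 // wait for the forked grain
        std::inplace_merge(v.begin() + lo, v.begin() + mid, v.begin() + hi);
    }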

Parallelism in Divide and Conquer on ASTs

Divide-and-conquer parallelism opportunities commonly arise when processing ASTs. The CloneDR initial seed-finding process works by parallel divide and conquer over the set of ASTs when computing the initial hash sets over subtrees.
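
A rough C++ sketch of that tree-shaped divide and conquer follows (Node and the hash combination are hypothetical; the real CloneDR hashes are structure-aware):

    #include <functional>
    #include <future>
    #include <memory>
    #include <vector>

    // Hypothetical AST node carrying a per-subtree hash.
    struct Node {
        int kind = 0;
        std::vector<std::unique_ptr<Node>> children;
        std::size_t hash = 0;
    };

    // Compute subtree hashes bottom-up, forking a grain per child subtree.
    std::size_t hashSubtree(Node& n) {
        std::vector<std::future<std::size_t>> kids;
        for (auto& c : n.children)
            kids.push_back(std::async(std::launch::async,
                                      hashSubtree, std::ref(*c)));
        std::size_t h = std::hash<int>{}(n.kind);
        for (auto& k : kids)
            h = h * 31 + k.get();               // combine child subtree hashes
        n.hash = h;
        return h;
    }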

Divide and conquer parallelism for Attribute Grammar Evaluation

A very interesting example is in DMS's attribute grammar evaluators (AGEs). These are a framework for building analysis tools that process ASTs by traversing them, collecting information during the traversal, propagating information both up and down the tree, and combining it in various ways. Attribute grammars are in fact extremely powerful ways to compute analyses over trees.

DMS's AGEs are biased toward functionally-computed attributes, which means they are generally parallelizable, but they also allow optional side effects; it is pretty useful to traverse an AST and build a symbol table by such side effects. Each grammar rule allows one to define how to compose values from inherited and synthesized attributes in a mostly functional way. DMS's AGE generator (DMS is a set of metaprogramming tools) analyzes the information flows across all the computations for each grammar rule and generates a native PARLANSE parallel partial order construct. This partial order forces synthesized attributes to be computed before they are used, and forces side-effecting operations to occur in the right place with respect to both the updates and the consumers of the side-effected result. The consequence: one gets divide-and-conquer parallelism in evaluating each AST, governed by the complex partial orders. The parallelism this derives is pretty hard for a human to follow, but the scale of the ASTs means we win very nicely.
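
The flavor of such a generated partial order can be suggested by a hedged C++ sketch for a single (hypothetical) grammar rule: steps a and b depend only on an inherited attribute, c depends on a, and the synthesized result needs both b and c; anything not ordered is free to run in parallel:

    #include <future>

    // Hypothetical attribute computations for one grammar rule. The partial
    // order lets a and b overlap; c waits only for a; the final combination
    // waits for b and c.
    int evaluateRule(int inherited) {
        auto a = std::async(std::launch::async,
                            [inherited] { return inherited + 1; });
        std::shared_future<int> aReady = a.share();

        auto b = std::async(std::launch::async,
                            [inherited] { return inherited * 2; });
        auto c = std::async(std::launch::async,
                            [aReady] { return aReady.get() + 10; });

        return b.get() + c.get();               // synthesized attribute
    }

In the generated DMS evaluators the same wait-before-use ordering also covers the side-effecting steps, so, for example, a symbol-table update is forced to happen before anything that reads it.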

Parallelism in Symbol Table Construction

A third interesting example involves PARLANSE futures (we call them events, having named them that way before the term "futures" became popular). These are very handy when complex side effects are involved yet there is clearly available parallelism. DMS's Java front end uses these to make parsing and name resolution of large systems of Java source code efficient.

We start with an identified set of Java source files; these are parsed using a parallel dynamic team as outlined above. As a file completes parsing, we invoke an attribute grammar evaluator on the tree to build a symbol table (which gets us partial order divide and conquer parallelism as outlined above, running in parallel with the parsing). There is a natural opportunity to go parallel in the symbol table construction process: one can form scopes independently, and fill them in later in parallel.

In our name resolver, we make scopes and symbol table entries into futures; this means a computation grain of the analysis trying to follow a qualified path can simply walk down the symbol table/entry chains looking for the leaf of the qualified path. If any element along that path hasn't yet been computed, that grain is put to sleep on the corresponding future, and wakes when the symbol scope and/or the entry of interest exists. Often, in doing name resolution on a Java source file, one encounters imports. Oversimplifying, this creates a need for the imported Java files (source or class, depending on what is available in the file system) to be parsed and name resolved. This is handled by tossing these files back into the parsing/name-resolving team and waiting on the future for the outer scope of the import to become ready. So the parallelism involved here is stunning: dynamic teams, partial orders, and futures, all dynamically scheduled and distributed across the available cores.
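
A hedged C++ sketch of the future-valued scope idea (Scope, resolve, and the publishing pattern are hypothetical stand-ins for the DMS Java name resolver):

    #include <future>
    #include <map>
    #include <memory>
    #include <string>
    #include <vector>

    // A scope maps names to *futures of* nested scopes, so an entry can be
    // looked up before the grain that builds it has finished.
    struct Scope;
    using ScopeFuture = std::shared_future<std::shared_ptr<Scope>>;
    struct Scope { std::map<std::string, ScopeFuture> entries; };

    // A resolver grain walking a qualified path such as pkg.Cls.member simply
    // blocks on any entry not yet published, and resumes when it appears.
    std::shared_ptr<Scope> resolve(const Scope& root,
                                   const std::vector<std::string>& path) {
        const Scope* cur = &root;
        std::shared_ptr<Scope> hop;
        for (const auto& name : path) {
            hop = cur->entries.at(name).get();   // sleeps until published
            cur = hop.get();
        }
        return hop;
    }

    // Elsewhere, the grain that builds a scope publishes it via a promise:
    //   std::promise<std::shared_ptr<Scope>> p;
    //   parent.entries["pkg"] = p.get_future().share();
    //   ...build the scope (possibly parsing an imported file first)...
    //   p.set_value(builtScope);   // wakes every resolver sleeping on it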

Parallelism in Control and Dataflow Analysis

Without going into detail, you can run attribute grammars to build control flow graphs, and you can then run parallel, iterative data-flow analysis algorithms guided by the control flow. We look for parallelism where we can find it.
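
A hedged sketch of one way to do that (C++, with a toy bit-set "union" lattice; the real analyses are richer): a round-based iteration in which, within each round, every node's transfer function is evaluated in parallel from the previous round's facts, repeating until nothing changes:

    #include <future>
    #include <vector>

    // preds[n] lists the CFG predecessors of node n; gen[n] is the node's own
    // contribution. Facts are toy bit sets represented as unsigned ints.
    std::vector<unsigned> solve(const std::vector<std::vector<int>>& preds,
                                const std::vector<unsigned>& gen) {
        std::vector<unsigned> out = gen;
        bool changed = true;
        while (changed) {
            changed = false;
            std::vector<std::future<unsigned>> round;
            for (std::size_t n = 0; n < preds.size(); ++n)
                round.push_back(std::async(std::launch::async, [&, n] {
                    unsigned meet = 0;
                    for (int p : preds[n]) meet |= out[p];  // union over preds
                    return meet | gen[n];                   // transfer function
                }));
            std::vector<unsigned> next(out.size());
            for (std::size_t n = 0; n < next.size(); ++n)
                next[n] = round[n].get();       // wait for the whole round
            for (std::size_t n = 0; n < next.size(); ++n)
                if (next[n] != out[n]) { out[n] = next[n]; changed = true; }
        }
        return out;
    }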

Parallelism in general inference

We expect to build many other inference procedures into DMS, and we believe we can make many of them go parallel. While not yet available, we expect to implement a parallel Datalog for DMS, to allow posing sophisticated queries across large bodies of code. We think this will integrate nicely with the kinds of parallelism PARLANSE provides.

Parallelism in Program Transformation

DMS is intended not only for program analysis, but also for actual code modification. Often one can identify opportunities to apply program transformations at many independent places in a large code base. An obvious example is the application of optimizing transforms on a per-procedure basis; this is data parallel and thus easily implemented.
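
A hedged C++ sketch of that per-procedure data parallelism (ProcedureAst and optimizeProcedure are hypothetical stand-ins for DMS's transform machinery):

    #include <future>
    #include <vector>

    struct ProcedureAst { /* ... */ };          // hypothetical per-procedure AST

    // Hypothetical transform applied independently to each procedure.
    void optimizeProcedure(ProcedureAst& p) { /* rewrite the AST in place */ }

    void optimizeAll(std::vector<ProcedureAst>& procs) {
        std::vector<std::future<void>> tasks;
        for (auto& p : procs)                   // one grain per procedure
            tasks.push_back(std::async(std::launch::async,
                                       optimizeProcedure, std::ref(p)));
        for (auto& t : tasks) t.get();          // wait for all rewrites
    }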

In our experience using DMS for code modification tasks, one often alternates analysis with modification. All the parallelism capabilities outlined above should generally compose.

We're pretty happy with how PARLANSE has delivered parallel capability for DMS. We don't use enough of it yet (we're struggling with key bottlenecks in parallel points-to analysis), but at least we are in a position to struggle with it and presumably eventually win. Tools without this parallelism capability can't even try the experiment.

You can't add Parallelism to a big, complex application after the fact

What is often rediscovered is that it is extremely hard to make a sequentially-designed application go parallel after the fact. This gets even harder when the application is complex, and harder yet when the parallelism is extremely dynamic and the computation units are both irregular in size and often pretty small. If it isn't parallel from the start, it is extremely tough to get there later. Program analysis frameworks are big, complicated beasts by the time they reach maturity (just think about parsing C++). It is our expectation that existing program analysis infrastructures won't ever go parallel in practical ways (with the possible exception of large data parallel chunks, such as processing compilation units).

Functional Languages aren't obviously the Answer

We often hear that functional languages make this easy. In particular, the Haskell folks claim that being purely functional with lazy evaluation means they can go parallel easily. We have seen parallelism in Haskell; so far it doesn't look all that pretty, and we are unconvinced they can reproduce the richness of what we have done with DMS, at scale, with what they have so far. Gentlemen, place your bets.

Summary: DMS will continue to grow in parallel ways

What being parallel from the beginning means for DMS is that it has a lot of parallel machinery now, and a style consistent with being parallel. This means we can continue to add parallel machinery as we encounter the opportunities, without fear of breaking anything already running. We see a future bright with many available CPUs per programmer.

For more information: info@semanticdesigns.com    Follow us at Twitter: @SemanticDesigns

