PARLANSE Parallelism Primitives

PARLANSE is an parallel programming language designed to support computationally expensive symbolic applications requiring small to medium grain irregular parallelism (~100 to ~1,000 machine instructions) while providing robustness through efficient error handling. We call such small units of parallel computations grains; they are not OS processes or traditional threads, although they are implemented with those mechanisms. PARLANSE uses Lisp syntax, but is not Lisp.

How does one specify parallelism to PARLANSE?

Identifying parallelism automatically hard. Programmers are smart, and are supposed to know what they are doing. We insist that the programmer find the parallelism (she often knows where it is anyway) and simply tell the PARLANSE compiler about it. (Someday a DMS-based PARLANSE compiler will analyze explicitly specified parallelism for data races, and will find additional parallelism over supposedly sequential constructs).

To encourage the programmers to do this, we made specifying parellelism really easy.

Initial Grain

All PARLANSE programs start by executing main, with a single grain. That grain may invoke others, which may transitively in turn invoke more, based on algorithms or data structures whose execution/access can be parallelized.

Functional Parallelism

All arguments of an action (or function call)

(f A B C ... Z)

are evaluated, by definition, in parallel; this includes built-in functions for scalar operations such as add, subtract, etc. There is nothing extra to write to obtain this parallelism beyond the usual Lisp-style syntax. If some argument must be evaluated before others, the programmer can specify a partial order (see below) over the argument evaluation to ensure safe evaluation.

Forced Serialization

Sometimes in a parallel program one wants a set of actions

A1, ..., Ak, ... An

to be carried out sequentially. This could be accomplished with a very constrained partial order. We provide a convenience notation

(;; A1 ... Ak ... An )

to specify this. The phrase (;; ... ) is pronounced "sequential". This notation is intentionally a little bit inconvenient, to encourage the programmer to think about the order of the actions and choose a parallelism operator instead.

Pure (potential) Parallelism

If there are several (side-effecting) actions (potential grains)

A1, ..., Ak, ... An

which are safely parallel (including appropriate locked interactions with other grains), a PARLANSE programmer succinctly writes just

(|| A1 ... Ak ... An )

to specify this. (||...) is pronounced "parallel".

Partial Order Parallelism

If there is a partial order (some computations A2 must precede others e.g., A3 and A4), the programmer can label the actions and annotate the labels with required time << or >> ordering over action execution. An example:

(|; a1 (<< a2) A1
    a2 A2         
    a3 (<< a4) A3
    a4 (>> a1) A4
)|;

The phrase (|;...) is pronounced "partial order"; note the mixture of | (parallel) and ; (sequential) syntax.

z (<< x) Z

means "action Z labelled z happens earlier in time than the action labelled X". This example specifies the smallest parallel construct which is not pure-parallel.

Dynamically generated parallelism in Teams

If there are a set of dynamically generated parallel computations organized to achieve a specific purpose (perhaps interacting through locks) Ai, one can group them into a team T, representing that purpose. One can then add new actions as grains to the team by

(draft T Ai)

Grains in the team run when processors are available, and may themselves add additional grains to the team. A grain can even start another team. A parent grain can wait for the team to complete (all team member grains finished), or it can abort the team, allowing speculative computation.

Traditional dynamic parallelism, aka "spawn"

If there is no useful hierarchy, PARLANSE programs can simply

(spawn A)

additional parallel grains to execute a supplied function with supplied specific arguments. One can wait for spawned grains to complete, or such grains can be marked (doom) for auto-destruction on completion so no wait is required.

Events ("futures")

An event may be declared to hold a future computed result. Such an event may be used as in an expression; the accessing grain will be frozen until the event value appears, and then will continue with the event value. The overhead for this access when the event is completed is a only a few machine instructions. An event may be assigned; that stores the event value, and releases all waiting grains.

Locks and critical sections

Locks are built into the language, and can operate as locks or as resource allocators. Critical sections controlled by an implicit lock can be declared to protect blocks of code from being executed by two grains simultaneously.

Cactus Stack

PARLANSE routines have scoped access to lexically containing parent scopes. To allow arbitrary forking and nesting, the stack for a grain is actually in an individual activation record per function allocated from a heap. Since a function may fork multiple grains, and each grain also gets its own stack, the result is a cactus stack, with more lexically deeply nested code running a grain pointing upward to lexically enclosing parent code blocks. Heap allocation of activation records also ensures that one can recurse arbitrarily deeply; this help when processing huge data structures (trees and graphs).

Scheduling of grains

PARLANSE does all the work of mapping the conceptually parallel grains onto physical CPUs, and even provides work-stealing to distribute the computational load across those CPUs. The PARLANSE compiler looks at the apparant computational size of identified parallel actions, and generates corrsponding grains if it believes the work/overhead is reasonable; otherwise it generates sequential code. So you can write two trivial assignments in parallel and the execution times will still be excellent; the identified parallelism will be optimized away.

PARLANSE throttles a work-forking process when there is already lots of available, ready-to-run grains. This way the machine doesn't get overwhelmed trying to manage a billion tiny grains.

Execution of the program continues until all ready-to-run grains are exhausted. This allows the main program to spawn helper grains; the helpers can run to completion safely even after the main program grain terminates.

Examples

Here you can see some simple PARLANSE Examples.