DMS PrettyPrinters (aka "Trees to Text")
The DMS Software Reengineering Toolkit is designed to allow the "domain" (language) engineer specify those languages quickly and accurately, so that she may spend most of her attention on the actual program analysis or transformation of interest.
At DMS Domains, we give some background on the necessary elements to build a DMS language processing domain. We have already described how to easily specify a grammar to DMS, automatically acquiring a parser and syntax tree builder. Other parts of DMS allow the engineer to transform the tree... at which point there is a need to regenerate (prettyprint) valid, compilable source text from that modified tree.
On this page, we show how prettyprinting is specified and executed by DMS.
We use Nicholas Wirth's Oberon language as an example, as it is a real, practical language yet simple enough so the entire parser and prettyprinter definitions can be easily be displayed and understood here.
PrettyPrinting Concepts [1]
The problem of converting an AST back into source code is generally called "prettyprinting". There are two subtle variations: regenerating the text matching the original as much as possible (fidelity printing), and explicitly-controlled(nice prettyprinting), which generates nicely formatted text. How text is printed depends on whether the original programmres will be working on the regenerated code (they often want fidelity printing) or not (at which point any nice prettyprinting is good for the next person who must read the code).
To do prettyprinting well requires usually more information than a classic parser collects. Parsers that collect enough information to do this well (as DMS does) are re-engineering parsers; virtually none of the classic parser generators collect or store the necessary information for good prettyprintig.
Tree Visitor Paradigm
The fundamental way prettyprinting is accomplished when there is an AST, is by walking the AST (classically called Visitor pattern) and generating text based on the AST node content begin visited. The basic approach node visitor is: call children nodes left-to-right (assuming that's the order of the original text) to generate the text they represent, interspersing additional text as appropriate for this AST node type. To prettyprint a curly-braced block of statements requires the following PrettyPrintBlock visitor, and a visitor for the statements that are part of the block:
PrettyPrintBlock(Node): Print("{"}; PrintNewline(); Call PrettyPrint(Node.children[1]); // prints out statements in block Print("}"); PrintNewline(); return; PrettyPrintStatements(Node): do i=1,number_of_children // print one statement: Call PrettyPrint(Node.children[i]); Print(";"); PrintNewline(); endo return;
Note that this spits out text on the as the tree is visited in-order.
Printing terminals
There's a number of details that need to be managed. A key problem is that for any artifact to be printed, there may be multiple valid ways to print it. The prettyprinter needs to choose the best way, balancing readability and preservation of formats. For end-users to accept prettyprinted results, the tool may have to preserve some properties of the source text that one wouldn't normally think to store. DMS does this by capturing (in ways defined by the domain engineer in the lexer), format (or shape) properties of the original lexeme, such as character case, number radix, etc. and preserving this formatting information in the constructed (or modified!) AST; this formatting information is then used to shape the prettyprinted result. (DMS can resort to the original text in the original file if necessary, but that's very expensive). Here are some specific issues:
- Keywords: Sometimes keywords have multiple spellings or abbreviations, or allow different alphabetic case. One may be able to choose a canonical spelling/case, or, depending on the audience, one may have to preserve information to enable regeneration of the original keyword.
- Literal values: For AST nodes carrying literal values, one has to regenerate the
literal value. This is harder than it looks if the answer is to be
accurate.
- Numbers: Printing floating point numbers without losing any precision is difficult, and scientists hate it when the value of Pi gets damaged even in the last digit. Often one must regenerate the radix of the number; coders having entered a number as a hex literal are not happy when you regenerate the decimal equivalent even though it means exactly the same thing. Even leading digits and the presence of an a egregious + may be necessary to preserve.
- Strings: For string literals, one must often regenerate the original quotes and the string literal content, including escape sequences for characters that have to be escaped. (For PHP, which quote was used, and determines which characters in the string literal has to be escaped; PHP strings have lot of other interesting formatting quirks because they are really best treated as implicit string concatenation expressions).
- Identifiers: Modern languages such as Java and C# allow arbitrary Unicode identifiers. When prettyprinting such identifiers to non-Unicode encodings, they must be properly escaped. Case differences in variable names may matter or not, depending on the type of the variable PHP allows funny characters in different type of identifiers (e.g., "$") but it isn't always there (see $ variables embedded in literal strings).
- Spacing: some languages require whitespace in critical places. The two tokens ABC17 42 better not be printed as ABC1742, but it is generally ok for the tokens ABC17 ( 42 ) to be printed as ABC17(42). One way to solve this problem is to put a space wherever it is legal, but recipients of such code often don't like the result: too much whitespace. This is not an issue if all one is going to do with the code is compile it.
- Newlines: languages that allow arbitrary whitespace can technically be regenerated as a single line of text. People hate this, even if you are going to compile the result; sometimes you have to look at the generated code and this makes it impossible. So you need a way to introduce newlines for AST nodes representing major language elements (statements, blocks, methods, classes, etc.). This isn't usually hard; when visiting a node representing such a construct, print out the construct and append a newline.
- Comments: Most standard parsers throw comments away completely. This is unacceptabel for a prettyprinted answer of transformed code. DMS captures comments in the AST, so that AST transformations can inspect/generate comments too.
- Fidelity Data: Often people want the original layout formatting of code where possible. To do this the tool must store column-number information for concrete tokens, and have prettyprinting rules about when to use that column-number data to position prettyprinted text where in the same column when possible, and what to do if the so-far-prettyprinted line is filled past that column.
A more principled approach distinguishes prettyprinting whose purpose is nice formatting, from fidelity printing whose purpose is to regenerate the text to match the original source to a maximal extent. It should be clear that at the level of the terminals, you pretty much want fidelity printing. Depending on your purpose, you can pretty print with nice formatting, or fidelity printing. The strategy DMS uses as default is to do fidelity printing where the AST hasn't been changed, and prettyprinting where it has (because often the change machinery doesn't have any information about column numbers or number radixes, etc.). The transformations stamp the AST nodes that are newly generated as "no fidelity data present".
Printing Code Blocks as Boxes
An organized approach to prettyprinting nicely is to understand that virtually all text-based programming language are rendered nicely in terms of rectangular blocks of text. (Knuth's TeX document generator has this idea, too). If you have some set of text boxes representing pieces of the regenerated code (e.g., primitive boxes generated directly for the terminal tokens), you can then imagine operators for composing those boxes: Horizontal composition (stack one box to the right of another), Vertical (stack boxes on top of each other; this in effect replaces printing newlines), Indent (Horizontal composition with a box of blanks), etc. Then you can construct your prettyprinter by building and composing text boxes:
PrettyPrintBlock: Box1=PrimitiveBox("{"); Box2=PrimitiveBox("}"); ChildBox=PrettyPrint(Node.children[1]); // gets box for statements in block ResultBox=VerticalBox(Box1,Indent(3,ChildBox),Box2); return ResultBox; PrettyPrintStatements: ResultBox=EmptyBox(); do i=1,number_of_children ResultBox= VerticalBox(ResultBox, HorizontalBox(PrettyPrint(Node.children[i])),PrimitiveBox(";")); enddo return;
The real value in this is any node can compose the text boxes produced by its children in arbitrary order with arbitrary intervening text. You can rearrange huge blocks of text this way (imagine VBox'ing the methods of class in method-name order). No text is spit out as encountered; only when the root is reached, or some AST node where it is known that all the children boxes have been generated correctly.
The DMS PrettyPrinter Box Language
Our DMS Software Reengineering Toolkit uses this approach to prettyprint all the languages it can parse (including PHP, Java, C#, etc.). Instead of attaching the box computations to AST nodes via visitors, we use box computations in a domain-specific text-box notation:
- H(...) for Horizontal boxes (with H* meaning "applied to all RHS elements")
- V(....) for vertical boxes (with V* meaning "applied to all RHS elements")
- I(...) for indented boxes)
block = '{' statements '}' ; -- grammar rule to recognize block of statements <<PrettyPrinter>>: { V('{',I(statements),'}'); };
You can see a bigger example of how this is done at Simple Language Example using DMS showing how grammar rules and prettyprinting rules are combined. The PHP Front looks like this but its a lot bigger, obviously. A more complex way to do prettyprinting is to build a syntax-directed translator (means, walk the tree and build text or other data structures in tree-visted order) to produce text-boxes in a special text-box AST. The text-box AST is then prettyprinted by another tree walk, but the actions for it are basically trivial: print the text boxes. See this technical paper: Pretty-printing for software reengineering
An additional point: you can of course go build all this machinery yourself. But the same reason that you choose to use a parser generator (its a lot of work to make one, and that work doesn't contribute to your goal in an interesting way) is the same reason you want to choose an off-the-shelf prettyprinter generator. There are lots of parser generators around. Not many prettyprinter generators. [DMS is one of the few that has both built in.]
A Grammar for Wirth's Oberon
with PrettyPrinter declarations
Here is the complete grammar for the Oberon language, covering the original design by Wirth and the 2007 dialect. The standard DMS approach is to builds parser and prettyprinter at the same time, weaving the parsing and prettying rules. The grammar show previously left these prettyprinting rules out.
Explanation of the prettyprinter aspects can be found below.
-- Oberon.atg: DMS Grammar Definition for Oberon -- Copyright (C) 2014-2015 Semantic Designs, Inc.; All Rights Reserved -- PRETTYPRINTER { -- Do not put spaces between these sets of symbols while emitting the refactored code. separator { string, ident, integer, hexint, real, 'NIL' } none { '(', ')' }; optional separator { '(', ')' } none { string, ident, integer, hexint, real, 'NIL' }; separator * none { ',', '.', ';' }; separator { '.' } none *; #IF Oberon separator { charconst } none { '(', ')' }; separator { '(', ')' } none { charconst }; #ELSIF Oberon07 separator { 'TRUE', 'FALSE' } none { '(', ')' }; separator { '(', ')' } none { 'TRUE', 'FALSE' }; #ELSE #FAIL #ENDIF -- Hex integers require custom formatting. lexeme hexint function "FormatOberonHexNatural"; } ------------------------------------------------------------------------------------------ ---(1)------------ Module ------------------------------------------------------------------------------------------ module = 'MODULE' ident ';' DeclarationSequence 'END' ident '.' ; <<PrettyPrinter>>: { V(H('MODULE', ident, ';'), DeclarationSequence, H('END', ident, '.')); } module = 'MODULE' ident ';' DeclarationSequence ProgramBody ident '.' ; <<PrettyPrinter>>: { V(H('MODULE', ident, ';'), DeclarationSequence, H(ProgramBody, ident, '.')); } module = 'MODULE' ident ';' 'IMPORT' ImportList ';' DeclarationSequence 'END' ident '.' ; <<PrettyPrinter>>: { V(H('MODULE', ident, ';'), H('IMPORT', ImportList, ';'), DeclarationSequence, H('END', ident, '.')); } module = 'MODULE' ident ';' 'IMPORT' ImportList ';' DeclarationSequence ProgramBody ident '.' ; <<PrettyPrinter>>: { V(H('MODULE', ident, ';'), H('IMPORT', ImportList, ';'), DeclarationSequence, H(ProgramBody, ident, '.')); } ProgramBody = 'BEGIN' StatementSequence 'END' ; <<PrettyPrinter>>: { V(NL<1>, 'BEGIN', I(StatementSequence), 'END'); } ImportList = import ; ImportList = ImportList ',' import ; <<PrettyPrinter>>: { H*; } import = ident ; import = ident ':=' ident ; <<PrettyPrinter>>: { H*; } #IF Oberon -- Original version of the language as it was introduced in 1986. DeclarationSequence = ConstTypeVarDeclSequence ProcedureDeclSequence; <<PrettyPrinter>>: { V*; } ConstTypeVarDeclSequence = ; ConstTypeVarDeclSequence = ConstTypeVarDeclSequence 'CONST' ConstantDeclarationList ; <<PrettyPrinter>>: { V(ConstTypeVarDeclSequence, 'CONST', I(ConstantDeclarationList)); } ConstTypeVarDeclSequence = ConstTypeVarDeclSequence 'TYPE' TypeDeclarationList ; <<PrettyPrinter>>: { V(ConstTypeVarDeclSequence, 'TYPE', I(TypeDeclarationList)); } ConstTypeVarDeclSequence = ConstTypeVarDeclSequence 'VAR' VariableDeclarationList ; <<PrettyPrinter>>: { V(ConstTypeVarDeclSequence, 'VAR', I(VariableDeclarationList)); } ProcedureDeclSequence = ; ProcedureDeclSequence = ProcedureDeclSequence ForwardDeclaration ';' ; <<PrettyPrinter>>: { V(ProcedureDeclSequence, NL<1>, H(ForwardDeclaration, ';')); } ProcedureDeclSequence = ProcedureDeclSequence ProcedureDeclaration ';' ; <<PrettyPrinter>>: { V(ProcedureDeclSequence, NL<1>, H(ProcedureDeclaration, ';')); } #ELSIF Oberon07 -- Dialect of the original language. Revision 1.10.2013/10.3.2014. DeclarationSequence = ; DeclarationSequence = ConstTypeVarDeclaration ; DeclarationSequence = DeclarationSequence ProcedureDeclaration ';' ; <<PrettyPrinter>>: { V(DeclarationSequence, NL<1>, H(ProcedureDeclaration, ';')); } ConstTypeVarDeclaration = 'VAR' VariableDeclarationList ; <<PrettyPrinter>>: { V('VAR', I(VariableDeclarationList)); } ConstTypeVarDeclaration = 'TYPE' TypeDeclarationList ; <<PrettyPrinter>>: { V('TYPE', I(TypeDeclarationList)); } ConstTypeVarDeclaration = 'TYPE' TypeDeclarationList 'VAR' VariableDeclarationList ; <<PrettyPrinter>>: { V('TYPE', I(TypeDeclarationList), 'VAR', I(VariableDeclarationList)); } ConstTypeVarDeclaration = 'CONST' ConstantDeclarationList ; <<PrettyPrinter>>: { V('CONST', I(ConstantDeclarationList)); } ConstTypeVarDeclaration = 'CONST' ConstantDeclarationList 'VAR' VariableDeclarationList ; <<PrettyPrinter>>: { V('CONST', I(ConstantDeclarationList), 'VAR', I(VariableDeclarationList)); } ConstTypeVarDeclaration = 'CONST' ConstantDeclarationList 'TYPE' TypeDeclarationList ; <<PrettyPrinter>>: { V('CONST', I(ConstantDeclarationList), 'TYPE', I(TypeDeclarationList)); } ConstTypeVarDeclaration = 'CONST' ConstantDeclarationList 'TYPE' TypeDeclarationList 'VAR' VariableDeclarationList ; <<PrettyPrinter>>: { V('CONST', I(ConstantDeclarationList), 'TYPE', I(TypeDeclarationList), 'VAR', I(VariableDeclarationList)); } #ELSE #FAIL #ENDIF ------------------------------------------------------------------------------------------ ---(2)------------ Declarations ------------------------------------------------------------------------------------------ ConstantDeclarationList = ; ConstantDeclarationList = ConstantDeclarationList IdentDef '=' ConstantExpression ';' ; <<PrettyPrinter>>: { V(ConstantDeclarationList, H(IdentDef, '=', ConstantExpression, ';')); } #IF Oberon TypeDeclarationList = ; TypeDeclarationList = TypeDeclarationList IdentDef '=' Type ';' ; <<PrettyPrinter>>: { V(TypeDeclarationList, H(IdentDef, '=', Type, ';')); } #ELSIF Oberon07 TypeDeclarationList = ; TypeDeclarationList = TypeDeclarationList IdentDef '=' StructType ';' ; <<PrettyPrinter>>: { V(TypeDeclarationList, H(IdentDef, '=', StructType, ';')); } #ELSE #FAIL #ENDIF Type = QualIdent ; Type = StructType ; StructType = ArrayType ; StructType = RecordType ; StructType = PointerType ; StructType = ProcedureType ; ArrayType = 'ARRAY' LengthList 'OF' Type ; <<PrettyPrinter>>: { H*; } LengthList = ConstantExpression ; LengthList = LengthList ',' ConstantExpression ; <<PrettyPrinter>>: { H*; } RecordType = 'RECORD' 'END' ; <<PrettyPrinter>>: { H*; } RecordType = 'RECORD' FieldListSequence 'END' ; <<PrettyPrinter>>: { V('RECORD', I(FieldListSequence), 'END'); } RecordType = 'RECORD' '(' BaseType ')' 'END' ; <<PrettyPrinter>>: { H('RECORD', SP<1>, '(', BaseType, ')', 'END'); } RecordType = 'RECORD' '(' BaseType ')' FieldListSequence 'END' ; <<PrettyPrinter>>: { V(H('RECORD', SP<1>, '(', BaseType, ')'), I(FieldListSequence), 'END'); } BaseType = QualIdent ; FieldListSequence = FieldList ; FieldListSequence = FieldListSequence ';' FieldList ; <<PrettyPrinter>>: { V(H(FieldListSequence, ';'), FieldList); } FieldList = FieldNameList ':' Type ; <<PrettyPrinter>>: { H*; } FieldNameList = IdentDef ; FieldNameList = FieldNameList ',' IdentDef ; <<PrettyPrinter>>: { H*; } PointerType = 'POINTER' 'TO' Type ; <<PrettyPrinter>>: { H*; } ProcedureType = 'PROCEDURE' ; ProcedureType = 'PROCEDURE' FormalParameters ; <<PrettyPrinter>>: { H*; } VariableDeclarationList = ; VariableDeclarationList = VariableDeclarationList VariableDefList ':' Type ';' ; <<PrettyPrinter>>: { V(VariableDeclarationList, H(VariableDefList, ':', Type, ';')); } VariableDefList = IdentDef ; VariableDefList = VariableDefList ',' IdentDef ; <<PrettyPrinter>>: { H*; } QualIdent = ident ; QualIdent = ident '.' ident ; <<PrettyPrinter>>: { H*; } IdentDef = ident ; IdentDef = ident '*' ; <<PrettyPrinter>>: { H*; } ------------------------------------------------------------------------------------------ ---(3)------------ Procedures ------------------------------------------------------------------------------------------ ProcedureDeclaration = ProcedureHeading ';' ProcedureBody ident ; <<PrettyPrinter>>: { V(H(ProcedureHeading, ';'), H(ProcedureBody, ident)); } ProcedureHeading = 'PROCEDURE' IdentDef ; <<PrettyPrinter>>: { H*; } ProcedureHeading = 'PROCEDURE' IdentDef FormalParameters ; <<PrettyPrinter>>: { H*; } #IF Oberon ProcedureHeading = 'PROCEDURE' '*' IdentDef ; <<PrettyPrinter>>: { H*; } ProcedureHeading = 'PROCEDURE' '*' IdentDef FormalParameters ; <<PrettyPrinter>>: { H*; } #ELSIF Oberon07 #ELSE #FAIL #ENDIF FormalParameters = '(' ')' ; <<PrettyPrinter>>: { H*; } FormalParameters = '(' ')' ':' QualIdent ; <<PrettyPrinter>>: { H*; } FormalParameters = '(' FormalParameterList ')' ; <<PrettyPrinter>>: { H*; } FormalParameters = '(' FormalParameterList ')' ':' QualIdent ; <<PrettyPrinter>>: { H*; } FormalParameterList = FormalParameterSection ; FormalParameterList = FormalParameterList ';' FormalParameterSection ; <<PrettyPrinter>>: { H*; } FormalParameterSection = FormalParameterNameList ':' FormalParameterType ; <<PrettyPrinter>>: { H*; } FormalParameterSection = 'VAR' FormalParameterNameList ':' FormalParameterType ; <<PrettyPrinter>>: { H*; } FormalParameterNameList = ident ; FormalParameterNameList = FormalParameterNameList ',' ident ; <<PrettyPrinter>>: { H*; } FormalParameterType = QualIdent ; FormalParameterType = ArrayOfSequence QualIdent ; <<PrettyPrinter>>: { H*; } #IF Oberon FormalParameterType = ProcedureType ; FormalParameterType = ArrayOfSequence ProcedureType ; <<PrettyPrinter>>: { H*; } #ELSIF Oberon07 #ELSE #FAIL #ENDIF ArrayOfSequence = 'ARRAY' 'OF' ; <<PrettyPrinter>>: { H*; } ArrayOfSequence = ArrayOfSequence 'ARRAY' 'OF' ; <<PrettyPrinter>>: { H*; } ProcedureBody = DeclarationSequence 'END' ; <<PrettyPrinter>>: { V(DeclarationSequence, 'END'); } ProcedureBody = DeclarationSequence 'BEGIN' StatementSequence 'END' ; <<PrettyPrinter>>: { V(DeclarationSequence, 'BEGIN', I(StatementSequence), 'END'); } #IF Oberon ForwardDeclaration = 'PROCEDURE' '^' ident ; <<PrettyPrinter>>: { H*; } ForwardDeclaration = 'PROCEDURE' '^' ident FormalParameters ; <<PrettyPrinter>>: { H*; } ForwardDeclaration = 'PROCEDURE' '^' ident '*' ; <<PrettyPrinter>>: { H*; } ForwardDeclaration = 'PROCEDURE' '^' ident '*' FormalParameters ; <<PrettyPrinter>>: { H*; } #ELSIF Oberon07 ProcedureBody = DeclarationSequence 'RETURN' Expression 'END' ; <<PrettyPrinter>>: { V(DeclarationSequence, H('RETURN', Expression), 'END'); } ProcedureBody = DeclarationSequence 'BEGIN' StatementSequence 'RETURN' Expression 'END' ; <<PrettyPrinter>>: { V(DeclarationSequence, 'BEGIN', I(StatementSequence), H('RETURN', Expression), 'END'); } #ELSE #FAIL #ENDIF ------------------------------------------------------------------------------------------ ---(4)------------ Statements ------------------------------------------------------------------------------------------ StatementSequence = Statement ; StatementSequence = StatementSequence ';' Statement ; <<PrettyPrinter>>: { V(H(StatementSequence, ';'), Statement); } Statement = ; Statement = Assignment ; Statement = ProcedureCall ; Statement = IfStatement ; Statement = CaseStatement ; Statement = WhileStatement ; Statement = RepeatStatement ; #IF Oberon Statement = LoopStatement ; Statement = WithStatement ; Statement = 'EXIT' ; Statement = 'RETURN' ; Statement = 'RETURN' Expression ; <<PrettyPrinter>>: { H*; } #ELSIF Oberon07 Statement = ForStatement ; #ELSE #FAIL #ENDIF Assignment = designator ':=' Expression ; <<PrettyPrinter>>: { H*; } ProcedureCall = designator ; ProcedureCall = designator ActualParameters ; <<PrettyPrinter>>: { H*; } IfStatement = 'IF' Expression 'THEN' StatementSequence 'END' ; <<PrettyPrinter>>: { V(H('IF', Expression, 'THEN'), I(StatementSequence), 'END'); } IfStatement = 'IF' Expression 'THEN' StatementSequence 'ELSE' StatementSequence 'END' ; <<PrettyPrinter>>: { V(H('IF', Expression, 'THEN'), I(StatementSequence), 'ELSE', I(StatementSequence), 'END'); } IfStatement = 'IF' Expression 'THEN' StatementSequence ElseIfSequence 'END' ; <<PrettyPrinter>>: { V(H('IF', Expression, 'THEN'), I(StatementSequence), ElseIfSequence, 'END'); } IfStatement = 'IF' Expression 'THEN' StatementSequence ElseIfSequence 'ELSE' StatementSequence 'END' ; <<PrettyPrinter>>: { V(H('IF', Expression, 'THEN'), I(StatementSequence), ElseIfSequence, 'ELSE', I(StatementSequence), 'END'); } ElseIfSequence = 'ELSIF' Expression 'THEN' StatementSequence ; <<PrettyPrinter>>: { V(H('ELSIF', Expression, 'THEN'), I(StatementSequence)); } ElseIfSequence = ElseIfSequence 'ELSIF' Expression 'THEN' StatementSequence ; <<PrettyPrinter>>: { V(ElseIfSequence, H('ELSIF', Expression, 'THEN'), I(StatementSequence)); } CaseStatement = 'CASE' Expression 'OF' CaseSequence 'END' ; <<PrettyPrinter>>: { V(H('CASE', Expression, 'OF'), I(CaseSequence), 'END'); } #IF Oberon CaseStatement = 'CASE' Expression 'OF' CaseSequence 'ELSE' StatementSequence 'END' ; <<PrettyPrinter>>: { V(H('CASE', Expression, 'OF'), I(CaseSequence), 'ELSE', I(StatementSequence), 'END'); } #ELSIF Oberon07 #ELSE #FAIL #ENDIF CaseSequence = CaseCase ; CaseSequence = CaseSequence '|' CaseCase ; <<PrettyPrinter>>: { V(CaseSequence, H('|', CaseCase)); } CaseCase = ; CaseCase = CaseLabelList ':' StatementSequence ; <<PrettyPrinter>>: { V(H(CaseLabelList, ':'), I(StatementSequence)); } CaseLabelList = CaseLabels ; CaseLabelList = CaseLabels ',' CaseLabels ; <<PrettyPrinter>>: { H*; } CaseLabels = label ; CaseLabels = label '..' label ; <<PrettyPrinter>>: { H*; } #IF Oberon label = ConstantExpression ; #ELSIF Oberon07 label = integer ; label = hexint ; label = string ; label = ident ; #ELSE #FAIL #ENDIF WhileStatement = 'WHILE' Expression 'DO' StatementSequence 'END' ; <<PrettyPrinter>>: { V(H('WHILE', Expression, 'DO'), I(StatementSequence), 'END'); } #IF Oberon #ELSIF Oberon07 WhileStatement = 'WHILE' Expression 'DO' StatementSequence ElseIfWhileSequence 'END' ; <<PrettyPrinter>>: { V(H('WHILE', Expression, 'DO'), I(StatementSequence), ElseIfWhileSequence, 'END'); } ElseIfWhileSequence = 'ELSIF' Expression 'DO' StatementSequence ; <<PrettyPrinter>>: { V(H('ELSIF', Expression, 'DO'), I(StatementSequence)); } ElseIfWhileSequence = ElseIfWhileSequence 'ELSIF' Expression 'DO' StatementSequence ; <<PrettyPrinter>>: { V(ElseIfWhileSequence, H('ELSIF', Expression, 'DO'), I(StatementSequence)); } #ELSE #FAIL #ENDIF RepeatStatement = 'REPEAT' StatementSequence 'UNTIL' Expression ; <<PrettyPrinter>>: { V('REPEAT', I(StatementSequence), H('UNTIL', Expression)); } #IF Oberon LoopStatement = 'LOOP' StatementSequence 'END' ; <<PrettyPrinter>>: { V('LOOP', I(StatementSequence), 'END'); } WithStatement = 'WITH' QualIdent ':' QualIdent 'DO' StatementSequence 'END' ; <<PrettyPrinter>>: { V(H('WITH', QualIdent, ':', QualIdent, 'DO'), I(StatementSequence), 'END'); } #ELSIF Oberon07 ForStatement = 'FOR' ident ':=' Expression 'TO' Expression 'DO' StatementSequence 'END' ; <<PrettyPrinter>>: { V(H('FOR', ident, ':=', Expression, 'TO', Expression, 'DO'), I(StatementSequence), 'END'); } ForStatement = 'FOR' ident ':=' Expression 'TO' Expression 'BY' ConstantExpression 'DO' StatementSequence 'END' ; <<PrettyPrinter>>: { V(H('FOR', ident, ':=', Expression, 'TO', Expression, 'BY', ConstantExpression, 'DO'), I(StatementSequence), 'END'); } #ELSE #FAIL #ENDIF ------------------------------------------------------------------------------------------ ---(5)------------ Expressions ------------------------------------------------------------------------------------------ Expression = SimpleExpression ; Expression = Expression RelationOperator SimpleExpression ; <<PrettyPrinter>>: { H*; } ConstantExpression = Expression ; RelationOperator = '=' ; RelationOperator = '#' ; RelationOperator = '<' ; RelationOperator = '<=' ; RelationOperator = '>' ; RelationOperator = '>=' ; RelationOperator = 'IN' ; RelationOperator = 'IS' ; SimpleExpression = term ; SimpleExpression = '+' term ; <<PrettyPrinter>>: { H*; } SimpleExpression = '-' term ; <<PrettyPrinter>>: { H*; } SimpleExpression = SimpleExpression AddOperator term ; <<PrettyPrinter>>: { H*; } AddOperator = '+' ; AddOperator = '-' ; AddOperator = 'OR' ; term = factor ; term = term MulOperator factor ; <<PrettyPrinter>>: { H*; } MulOperator = '*' ; MulOperator = '/' ; MulOperator = 'MOD' ; MulOperator = 'DIV' ; MulOperator = '&' ; factor = number ; factor = string ; factor = 'NIL' ; #IF Oberon factor = charconst ; #ELSIF Oberon07 factor = 'TRUE' ; factor = 'FALSE' ; #ELSE #FAIL #ENDIF factor = Set ; factor = designator ; factor = designator ActualParameters ; <<PrettyPrinter>>: { H*; } factor = '(' Expression ')' ; <<PrettyPrinter>>: { H*; } factor = '~' factor ; <<PrettyPrinter>>: { H*; } number = integer ; number = hexint ; number = real ; Set = '{' '}' ; <<PrettyPrinter>>: { H*; } Set = '{' SetElementList '}' ; <<PrettyPrinter>>: { H*; } SetElementList = SetElement ; SetElementList = SetElementList ',' SetElement ; <<PrettyPrinter>>: { H*; } SetElement = Expression ; SetElement = Expression '..' Expression ; <<PrettyPrinter>>: { H*; } designator = QualIdent ; designator = designator '.' ident ; <<PrettyPrinter>>: { H*; } designator = designator '[' IndexList ']' ; <<PrettyPrinter>>: { H*; } designator = designator '(' QualIdent ')' ; <<PrettyPrinter>>: { H*; } designator = designator '^' ; <<PrettyPrinter>>: { H*; } IndexList = Expression ; IndexList = IndexList ',' Expression ; <<PrettyPrinter>>: { H*; } ActualParameters = '(' ')' ; <<PrettyPrinter>>: { H*; } ActualParameters = '(' ExpressionList ')' ; <<PrettyPrinter>>: { H*; } ExpressionList = Expression ; ExpressionList = ExpressionList ',' Expression ; <<PrettyPrinter>>: { H*; }
Description of the Oberon PrettyPrinting Grammar
The grammar rules here are identical to those bare DMS grammar for Oberon. For each grammar rule, a prettyprinting rule <<PrettyPrinter:>> { ... } has been added where needed (unit productions don't generally need rules; they print the same box as their right hand side).
The PRETTYPRINTER declaration at the top of the grammar file tells the prettyprinter whether to generate whitespace between various terminals of the grammar. (Note that grammar preprocessor conditionals can apply to prettyprinter declarations.)
It is instructive to examine the prettyprinting rule for while in detail. This rule combines the while and do keyword with the condition in one line, indents the body of the loop, and vertially alignes the while clause with the end keyword.
PrettyPrinting Terminals
A lexer converts characters into tokens. A prettyprinter, in converting trees back into text, must convert leaf nodes into text. Non-value carrying leaves (produced from #token 'foo' ... have a text equivalent of foo; the DMS PrettyPrinting engineautomatically handles such cases as a default.
Where the spelling of the token might be different than its quoted name, or where the token carries some kind of (binary) literal value, the corresponding string must be regenerated. In the same way that DMS Lexers use custom code to convert recognized text to a binary value, DMS PrettyPrinters use custom code convert binary values carried on leaves back to text. In general, one may write an arbitrary procedure. DMS provides, through a prettyprinter library, standard procedures to convert standard binary data types with format information collected by the lexer, to produce a nicely formatted string.
The DMS lexer for Oberon collects hexadecimal numbers, and converts them to binary. The custom code for the Oberon prettyprinter converts the binary number back to string using this custom code:
(define HexadecimalToString (lambda Literal-prettyprinter-function-type (let (;; (= [literal Graph:HGHandling:NaturalAndFormat] (Graph:HGHandling:GetNaturalAndFormat token)) (= [format LiteralFormat:NaturalLiteralFormat] (LiteralFormat:ExpandCompactToNaturalLiteralFormat literal:Format)) (= [result string] `') );; (value (;; (= result (PrettyPrinterFormattingSupport:FormatNaturalToString (. format) literal:Value PrettyPrinterFormattingSupport:NotACharacter PrettyPrinterFormattingSupport:NotACharacter))= (ifthen (== format:Radix 16) (;; (ifthen (~ (&& (>= (ordinal result:1) "0") (<= (ordinal result:1) "9"))) (= result (concatenate `0') result) )ifthen (return (append result "H")) );; )ifthen );; result )case )value )let )lambda )define
Example: PrettyPrinting Oberon Source
DMS can parse the following badly formatted Oberon source file, using the Oberon DMS Parser, and produce an internal AST.MODULE Buffer; CONST N = 100; VAR nonempty*, nonfull* : BOOLEAN; in, out, n: INTEGER; buf: ARRAY N OF INTEGER; PROCEDURE Put(x : INTEGER); BEGIN (* Procedure Body *) IF n < N THEN buf[in] := x; in := (in+1) MOD N; INC(n); nonfull := n < N; nonempty := TRUE END END put; (* End of the procedure *) PROCEDURE Get*(VAR x: INTEGER); BEGIN IF n > 0 THEN x := buf[out]; out := (out+1) MOD N; DEC(n); nonempty := n > 0; nonfull := TRUE END END get; BEGIN n := 0; in := 0; out := 0; nonempty := FALSE; nonfull := TRUE END Buffer.
DMS can prettyprint that AST using the Oberon PrettyPrinter to produce the following legal Oberon output: This is essentially the same case as prettyprinting an AST that has been generated or transformed ASTs, and contains no useful positioning information.
MODULE Buffer; CONST N = 100; VAR nonempty *, nonfull * : BOOLEAN; in, out, n : INTEGER; buf : ARRAY N OF INTEGER; PROCEDURE Put(x : INTEGER); BEGIN (* Procedure Body *) IF n < N THEN buf [ in ] := x; in := (in + 1) MOD N; INC(n); nonfull := n < N; nonempty := TRUE END END put; (* End of the procedure *) PROCEDURE Get * ( VAR x : INTEGER); BEGIN IF n > 0 THEN x := buf [ out ]; out := (out + 1) MOD N; DEC(n); nonempty := n > 0; nonfull := TRUE END END get; BEGIN n := 0; in := 0; out := 0; nonempty := FALSE; nonfull := TRUE END Buffer.
Next: DMS Attribute Grammars
Note: text in this section is an enhanced version of that originally published by Semantic Designs/Ira Baxter at StackOverflow. SD claim copyrights, and thus the right to use it here with arbitrary changes of our choice.