Extended codeblock processing
DITA-OT provides additional processing support beyond that which is mandated by the DITA specification. These extensions can be used to define character encodings or line ranges for code references, normalize indentation, add line numbers or display whitespace characters in code blocks.
Character set definition
For <coderef>
elements, DITA-OT supports defining the code reference target file
encoding using the @format
attribute. The supported format is:
format (";" space* "charset=" charset)?
If a character set is not defined, the system default character set will be used. If the character set is not recognized or supported, the DOTJ052E error is thrown and the system default character set is used as a fallback.
<coderef href="unicode.txt" format="txt; charset=UTF-8"/>
As of DITA-OT 3.3, the default character set for code references can be changed by adding the default.coderef-charset key to the configuration.properties file:
default.coderef-charset = ISO-8859-1
The character set values are those supported by the Java Charset class.
Line range extraction
Code references can be limited to extract only a specified line range by defining the
line-range
pointer in the URI fragment. The format is:
uri ("#line-range(" start ("," end)? ")" )?
Start and end line numbers start from 1 and are inclusive. If the end range is omitted, the range ends on the last line of the file.
<coderef href="Parser.scala#line-range(5,10)" format="scala"/>
Only lines from 5 to 10 will be included in the output.
RFC 5147
DITA-OT also supports the line position and range syntax from RFC 5147. The format for line range is:
uri ("#line=" start? "," end? )?
Start and end line numbers start from 0 and are inclusive and exclusive, respectively. If the start range is omitted, the range starts from the first line; if the end range is omitted, the range ends on the last line of the file. The format for line position is:
uri ("#line=" position )?
The position line number starts from 0.
<coderef href="Parser.scala#line=4,10" format="scala"/>
Only lines from 5 to 10 will be included in the output.
Line range by content
Instead of specifying line numbers, you can also select lines to include in the code reference by specifying keywords (or “tokens”) that appear in the referenced file.
DITA-OT supports the token
pointer in the URI fragment to extract a line range based on the
file content. The format for referencing a range of lines by content is:
uri ("#token=" start? ("," end)? )?
Lines identified using start and end tokens are exclusive: the lines that contain the start token and end token will be not be included. If the start token is omitted, the range starts from the first line in the file; if the end token is omitted, the range ends on the last line of the file.
Given a Haskell source file named fact.hs with the following content,
-- START-FACT
fact :: Int -> Int
fact 0 = 1
fact n = n * fact (n-1)
-- END-FACT
main = print $ fact 7
a range of lines can be referenced as:
<coderef href="fact.hs#token=START-FACT,END-FACT"/>
to include the range of lines that follows the START-FACT
token on Line 1, up to (but not
including) the line that contains the END-FACT
token (Line 5). The resulting
<codeblock>
would contain lines 2–4:
fact :: Int -> Int
fact 0 = 1
fact n = n * fact (n-1)
token
keywords are preserved when the referenced
resource is modified.Whitespace normalization
DITA-OT can adjust the leading whitespace in code blocks to remove excess indentation and keep lines short. Given an XML snippet in a codeblock with lines that all begin with spaces (indicated here as dots “·”),
··<subjectdef keys="audience">
····<subjectdef keys="novice"/>
····<subjectdef keys="expert"/>
··</subjectdef>
DITA-OT can remove the leading whitespace that is common to all lines in the code block. To trim the excess
space, set the @outputclass
attribute on the <codeblock>
element to
include the normalize-space
keyword.
In this case, two spaces (“··”) would be removed from the beginning of each line, shifting content to the left by two characters, while preserving the indentation of lines that contain additional whitespace (beyond the common indent):
<subjectdef keys="audience">
··<subjectdef keys="novice"/>
··<subjectdef keys="expert"/>
</subjectdef>
Whitespace visualization (PDF)
DITA-OT can be set to display the whitespace characters in code blocks to visualize indentation in PDF output.
To enable this feature, set the @outputclass
attribute on the
<codeblock>
element to include the show-whitespace
keyword.
When PDF output is generated, space characters in the code will be replaced with a middle dot or “interpunct”
character ( ·
); tab characters are replaced with a rightwards arrow and three spaces
( →
).
Line numbering (PDF)
DITA-OT can be set to add line numbers to code blocks to make it easier to distinguish specific lines.
To enable this feature, set the @outputclass
attribute on the
<codeblock>
element to include the show-line-numbers
keyword.