4.  Using Configuration Files

4.1. Configuration File Syntax
4.2. Formatting Options

An xmlformat configuration file specifies formatting options to be associated with particular elements in XML documents. For example, you can format <itemizedlist> elements differently than <orderedlist> elements. (However, you cannot format <listitem> elements differently depending on the type of list in which they occur.) You can also specify options for a "pseudo-element" named *DEFAULT. These options are applied to any element for which the options are not specified explicitly.

The following sections describe the general syntax of configuration files, then discuss the allowable formatting options that can be assigned to elements.

4.1.  Configuration File Syntax

A configuration file consists of sections. Each section begins with a line that names one or more elements. (Element names do not include the "<" and ">" angle brackets.) The element line is followed by option lines that each name a formatting option and its value. Each option is applied to every element named on its preceding element line.

Element lines and option lines are distinguished based on leading whitespace (space or tab characters):

  • Element lines have no leading whitespace.

  • Option lines begin with at least one whitespace character.

On element lines that name multiple elements, the names should be separated by spaces or commas. These are legal element lines:

para title
para,title
para, title

On option lines, the option name and value should be separated by whitespace and/or an equal sign. These are legal option lines:

normalize yes
normalize=yes
normalize = yes

Blank lines are ignored.

Lines whose first non-whitespace character is "#" are treated as comments and ignored. A comment beginning with "#" may also follow the last element name on an element line or the option value on an option line.

Example configuration file:

para
  format        block
  entry-break   1
  exit-break    1
  normalize     yes
  wrap-length   72

literal replaceable userinput command option emphasis
  format        inline

programlisting
  format        verbatim
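The syntax rules above can be sketched as a small parser. This is only an illustration in Python, not xmlformat's own code: the function name parse_config is hypothetical, option lines are assumed to be well formed (a name followed by a value), and everything after a "#" is treated as a comment, which is a simplification.

```python
import re

def parse_config(text):
    """Parse xmlformat-style configuration text into {element: {option: value}}."""
    options = {}   # element name -> {option name: value}
    current = []   # elements named on the most recent element line
    for raw in text.splitlines():
        # Drop a trailing comment (simplification: any "#" starts a comment).
        line = raw.split("#", 1)[0].rstrip()
        if not line.strip():
            continue   # blank or comment-only line
        if line[0] in " \t":
            # Option line: name and value separated by whitespace and/or "=".
            name, value = re.split(r"\s*=\s*|\s+", line.strip(), maxsplit=1)
            for element in current:
                options.setdefault(element, {})[name] = value
        else:
            # Element line: names separated by spaces and/or commas.
            current = [n for n in re.split(r"[\s,]+", line) if n]
            for element in current:
                options.setdefault(element, {})
    return options

config = parse_config("""\
para
  format        block
  wrap-length   72

literal replaceable   # inline markup
  format = inline
""")
```

Here config maps para to its two options, and both literal and replaceable to a format of inline.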

It is not necessary to specify all of an element's options at the same time. Thus, this configuration file:

para, title
  format block
  normalize yes
title
  wrap-length 50
para
  wrap-length 72

is equivalent to this configuration file:

para
  format block
  normalize yes
  wrap-length 72
title
  format block
  normalize yes
  wrap-length 50

If an option is specified multiple times for an element, the last value is used. For the following configuration file, para ends up with a wrap-length value of 68:

para
  format        block
  wrap-length   60
  wrap-length   72
para
  wrap-length   68
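One way to picture both rules (options accumulate across sections, and the last value wins) is as a series of dictionary updates applied in file order. The sketch below is a Python illustration under that assumption; merge_sections and its input layout are hypothetical and are not part of xmlformat.

```python
def merge_sections(sections):
    """Fold (element_names, option_pairs) sections, in file order, into one dict per element."""
    merged = {}
    for names, option_pairs in sections:
        for name in names:
            # update() applies pairs in order, so a later value replaces an earlier one.
            merged.setdefault(name, {}).update(option_pairs)
    return merged

# The two equivalent files shown earlier in this section.
split_form = [
    (["para", "title"], [("format", "block"), ("normalize", "yes")]),
    (["title"], [("wrap-length", "50")]),
    (["para"], [("wrap-length", "72")]),
]
combined_form = [
    (["para"], [("format", "block"), ("normalize", "yes"), ("wrap-length", "72")]),
    (["title"], [("format", "block"), ("normalize", "yes"), ("wrap-length", "50")]),
]

# The file in which wrap-length is given three times for para.
repeated = [
    (["para"], [("format", "block"), ("wrap-length", "60"), ("wrap-length", "72")]),
    (["para"], [("wrap-length", "68")]),
]
```

Under this model, merge_sections(split_form) and merge_sections(combined_form) produce the same result, and merge_sections(repeated) leaves para with a wrap-length of 68.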

To continue an element line onto the next line, end it with a backslash character. xmlformat will interpret the next line as containing more element names for the current section:

chapter appendix article \
section simplesection \
sect1 sect2 sect3 \
sect4 sect5
  format        block
  entry-break   1
  element-break 2
  exit-break    1
  normalize     no
  subindent     0

Continuation can be useful when you want to apply a set of formatting options to a large number of elements. Continuation lines may begin with whitespace, although doing so can make them look like option lines at a glance.

Continuation is not allowed for option lines.
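As a rough illustration of the continuation rule, the following Python sketch joins continued element lines before any further parsing. The helper name join_continuations is hypothetical, comments are not handled, and xmlformat itself may process continuation differently.

```python
def join_continuations(lines):
    """Join element lines that end with a backslash to the line(s) that follow."""
    joined = []
    pending = None   # element line still being continued, if any
    for line in lines:
        if pending is not None:
            # A continuation line may begin with whitespace; it is still element names.
            line = pending + " " + line.strip()
            pending = None
        is_element_line = line[:1] not in ("", " ", "\t")
        if is_element_line and line.rstrip().endswith("\\"):
            pending = line.rstrip()[:-1].rstrip()   # drop the backslash, keep collecting
        else:
            joined.append(line)   # option lines are never continued
    if pending is not None:
        joined.append(pending)    # backslash on the very last line
    return joined

lines = [
    "chapter appendix article \\",
    "section simplesection \\",
    "  sect1 sect2",
    "  format        block",
]
result = join_continuations(lines)
```

For this input, result holds one element line naming all seven elements, followed by the unchanged option line.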

A configuration file may contain options for two special "pseudo-element" names: *DOCUMENT and *DEFAULT. (The names begin with a "*" character so as not to conflict with valid element names.)

*DEFAULT options apply to any element that appears in the input document but that was not configured explicitly in the configuration file.

*DOCUMENT options are used primarily to control line breaking between top-level nodes of the document, such as the XML declaration, the DOCTYPE declaration, the root element, and any comments or processing instructions that occur outside the root element.

It's common to supply *DEFAULT options in a configuration file to override the built-in values. However, it's normally best to leave the *DOCUMENT options alone, except possibly to change the element-break value.

Before reading the input document, xmlformat sets up formatting options as follows:

  1. It initializes the built-in *DOCUMENT and *DEFAULT options.

  2. It reads the contents of the configuration file, assigning formatting options to elements as listed in the file.

    Note that although *DOCUMENT and *DEFAULT have built-in default values, those defaults may be overridden in the configuration file.

  3. After reading the configuration file, it fills in any missing formatting options for each element using the options from the *DEFAULT pseudo-element. For example, if para is defined as a block element but no subindent value is defined, para "inherits" the subindent value from the *DEFAULT settings.

Missing options are filled in from the *DEFAULT options only after reading the entire configuration file. For the settings below, *DEFAULT has a subindent value of 2 (not 0) after the file has been read. Thus, para also is assigned a subindent value of 2.

*DEFAULT
  subindent 0
para
  format block
  normalize yes
*DEFAULT
  subindent 2
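The two-pass behavior can be sketched in Python as follows. This is an illustration only: resolve_options and the builtin_default parameter are hypothetical names, the actual built-in values are not reproduced here, and leaving pseudo-elements untouched in the second pass is an assumption.

```python
def resolve_options(sections, builtin_default=None):
    """Merge sections in file order, then fill gaps from the final *DEFAULT values.

    sections: list of (element_name, {option: value}) pairs in file order.
    builtin_default: hypothetical stand-in for the built-in *DEFAULT options.
    """
    options = {"*DEFAULT": dict(builtin_default or {})}
    # Pass 1: read the whole file; later settings overwrite earlier ones.
    for element, opts in sections:
        options.setdefault(element, {}).update(opts)
    # Pass 2: only now copy *DEFAULT values into elements that lack them.
    defaults = options["*DEFAULT"]
    for element, opts in options.items():
        if not element.startswith("*"):   # assumption: pseudo-elements are left alone
            for name, value in defaults.items():
                opts.setdefault(name, value)
    return options

sections = [
    ("*DEFAULT", {"subindent": "0"}),
    ("para", {"format": "block", "normalize": "yes"}),
    ("*DEFAULT", {"subindent": "2"}),
]
resolved = resolve_options(sections)
```

Because the fill-in happens only after the second *DEFAULT section has been read, resolved gives para a subindent of 2 rather than 0.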

4.2.  Formatting Options

The allowable formatting options are as follows:

format {block | inline | verbatim}
entry-break n
element-break n
exit-break n
subindent n
normalize {no | yes}
wrap-length n

A value list shown as { value1 | value2 | ... } indicates that the option must take one of the values in the list. A value shown as n indicates that the option must have a numeric value.

Details for each of the formatting options follow.

  • format {block | inline | verbatim}

    This option is the most important, because it determines the general way in which the element is formatted, and it determines whether the other formatting options are used or ignored:

    • For block elements, all other formatting options are significant.

    • For inline elements, all other formatting options are ignored. Inline elements are normalized, wrapped, and indented according to the formatting options of the enclosing block element.

    • For verbatim elements, all other formatting options are ignored. The element content is written out verbatim (literally), without change, even if it contains other sub-elements. This means no normalization of the contents, no indenting, and no line-wrapping. Nor are any breaks added within the element.

    A configuration file may specify any option for elements of any type, but xmlformat will ignore inapplicable options. One reason for this is to allow you to experiment with changing an element's format type without having to disable other options.

    If you use the --show-config command-line option to see the configuration that xmlformat will use for processing a document, it displays only the applicable options for each element.

  • entry-break n

    element-break n

    exit-break n

    These options indicate the number of newlines (line breaks) to write after the element opening tag, between child sub-elements, and before the element closing tag. They apply only to block elements.

    A value of 0 means "no break". A value of 1 means one newline, which causes the next thing to appear on the next line with no intervening blank line. A value n greater than 1 produces n-1 intervening blank lines. Some examples:

    • An entry-break value of 0 means the next token will appear on the same line immediately after the opening tag.

    • An exit-break value of 0 means the closing tag will appear on the same line immediately after the preceding token.

  • subindent n

    This option indicates the number of spaces by which to indent child sub-elements, relative to the indent of the enclosing parent. It applies only to block elements. The value may be 0 to suppress indenting, or a number n greater than 0 to produce indenting.

    This option does not affect the indenting of the element itself. That is determined by the subindent value of the element's own parent.

    Note: subindent does not apply to text nodes in non-normalized blocks, which are written as is without reformatting. subindent also does not apply to verbatim elements or to the following non-element constructs, all of which are written with no indent:

    • Processing instructions

    • Comments

    • DOCTYPE declarations

    • CDATA sections

  • normalize {no | yes}

    This option indicates whether or not to perform whitespace normalization in text. This option is used for block elements, but it also affects inline elements because their content is normalized the same way as their enclosing block element.

    If the value is no, whitespace-only text nodes are not considered significant and are discarded, possibly to be replaced with line breaks and indentation.

    If the value is yes, normalization causes removal of leading and trailing whitespace within the element, and conversion of runs of whitespace characters (including line-ending characters) to single spaces.

    Text normalization is discussed in more detail in Section 3.3, “Text Handling”.

  • wrap-length n

    Line-wrapping length. This option is used only for block elements and line-wrapping occurs only if normalization is enabled. The option affects inline elements because they are line-wrapped the same way as their enclosing block element.

    Setting the wrap-length option to 0 disables wrapping. Setting it to a value n greater than 0 enables wrapping to lines at most n characters long. (Exception: If a word longer than n characters occurs in text to be wrapped, it is placed on a line by itself. A word will never be broken into pieces.) The line length is adjusted by the current indent when wrapping is performed to keep the right margin of wrapped text constant. For example, if the wrap-length value is 60 and the current indent is 10, lines are wrapped to a maximum of 50 characters.

    Any prevailing indent is added to the beginning of each line, unless the text will be written immediately following a tag on the same line. This can occur if the text occurs after the opening tag of the block and the entry-break is 0, or the text occurs after the closing tag of a sub-element and the element-break is 0.
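The interaction of normalize, wrap-length, and the current indent can be approximated with Python's standard textwrap module. The sketch below is not xmlformat's algorithm: the function names are hypothetical, and it always adds the indent, ignoring the case described above in which text follows a tag on the same line.

```python
import textwrap

def normalize(text):
    """Strip leading/trailing whitespace and collapse whitespace runs to single spaces."""
    return " ".join(text.split())

def wrap_block_text(text, wrap_length, indent):
    """Wrap normalized text so that indent plus text stays within wrap_length columns."""
    text = normalize(text)
    if wrap_length == 0:
        return [" " * indent + text]        # a wrap-length of 0 disables wrapping
    width = max(wrap_length - indent, 1)    # e.g. 60 - 10 = 50 characters of text per line
    lines = textwrap.wrap(
        text,
        width=width,
        break_long_words=False,   # an over-long word gets a line to itself
        break_on_hyphens=False,   # never split a word into pieces
    )
    return [" " * indent + line for line in lines]

# With wrap-length 20 and indent 10, only 10 characters of text fit per line,
# so the 30-character word is placed on a line by itself and left unbroken.
wrapped = wrap_block_text("aa " + "x" * 30 + " bb", 20, 10)
```

Subtracting the indent from the wrap length before wrapping is what keeps the right margin fixed as nesting depth grows.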