TeXML is an XML syntax for TeX. A processor translates TeXML source into TeX.
The Document Type Definition (DTD) for TeXML can be found in a TeXML distribution package.
TeXML
cmd
env
group
math and dmath
ctrl
spec
pdf
TeXML
<?xml version="1.0" encoding="..."?> <TeXML> ... your content here ... </TeXML>
The root element of a TeXML document is the element TeXML.
cmd
TeXML: <cmd name="documentclass"> <opt>12pt</opt> <parm>letter</parm> </cmd>
TeX: \documentclass[12pt]{letter}
The TeXML cmd element encodes TeX commands.
Options are given as opt children of the cmd element. The processor places opt children within square brackets, as LaTeX-style options.
Parameters are given as parm children of the cmd element. The processor places parm children within TeX groups, that is, curly braces.
A cmd element can have several parm or opt elements.
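As a rough illustration of how opt and parm children map to brackets and braces, here is a minimal Python sketch. The function `render_cmd` and its `(kind, text)` input format are hypothetical and are not part of the TeXML processor; the sketch performs no escaping and no whitespace handling.

```python
def render_cmd(name, children=()):
    """Render a TeX command from (kind, text) pairs, kind being 'opt' or 'parm'.

    Hypothetical helper for illustration only; no escaping is performed.
    """
    out = ["\\" + name]
    for kind, text in children:
        if kind == "opt":
            out.append("[" + text + "]")  # LaTeX-style option
        elif kind == "parm":
            out.append("{" + text + "}")  # TeX group
        else:
            raise ValueError("unexpected child: " + kind)
    return "".join(out)


print(render_cmd("documentclass", [("opt", "12pt"), ("parm", "letter")]))
```

The children are emitted in the order given, so a command may mix several options and parameters.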
env
TeXML: <env name="document"> ... </env>
TeX: \begin{document} ... \end{document}
The element env is a convenience for expressing LaTeX environments.
group
TeXML: <group><cmd name="it"/>italics</group>
TeX: {\it italics}
The group element is a convenience for encoding groups. The processor will supply an opening brace at the beginning, and a closing brace at the end of the group.
math and dmath
TeXML: <math>a+b</math> <dmath><cmd name="sqrt"><parm>2</parm></cmd></dmath>
TeX: $a+b$ $$\sqrt{2}$$
The elements math and dmath are conveniences for encoding math groups. The processor inserts the appropriate math shift symbol at the beginning and end of the group and also switches the mode to math inside the group.
ctrl
TeXML: line1<ctrl ch="\"/>line2
TeX: line1\\line2
The ch attribute of the ctrl element encodes a control symbol.
spec
TeXML: <spec cat="vert"/>l<spec cat="vert"/>
TeX: |l|
The attribute cat of the element spec creates the corresponding symbol verbatim, without escaping.
description | cat value | output |
---|---|---|
escape character | esc | \ |
begin group | bg | { |
end group | eg | } |
math shift | mshift | $ |
alignment tab | align | & |
parameter | parm | # |
superscript | sup | ^ |
subscript | sub | _ |
tilde | tilde | ~ |
comment | comment | % |
vertical line | vert | | |
less than | lt | < |
greater than | gt | > |
pdf
TeXML: <pdf>τεχ</pdf>
TeX: \003\304\003\265\003\307
The content of the pdf element is converted to UTF-16BE encoding and represented using escaped octal codes. The result is a PDF Unicode string.
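The conversion can be sketched in a few lines of Python. The function name `pdf_unicode_string` is hypothetical. The sketch escapes every byte as three octal digits, which reproduces the example above; whether the real processor also escapes bytes in the ASCII range this way is an assumption.

```python
def pdf_unicode_string(text):
    """Encode text as UTF-16BE and write each byte as a three-digit octal escape."""
    data = text.encode("utf-16-be")
    return "".join("\\%03o" % byte for byte in data)


# U+03C4 U+03B5 U+03C7 are the Greek letters tau, epsilon, chi.
print(pdf_unicode_string("\u03c4\u03b5\u03c7"))
```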
Characters are processed as follows:
To leave specials as is, without escaping, use the escape attribute of the TeXML element:
<TeXML escape="0">...</TeXML>
symbol | text mode | math mode |
---|---|---|
\ | \textbackslash{} | \backslash{} |
{ | \{ | \{ |
} | \} | \} |
$ | \textdollar{} | \$ |
& | \& | \& |
# | \# | \# |
^ | \^{} | \^{} |
_ | \_ | \_ |
~ | \textasciitilde{} | \~{} |
% | \% | \% |
| | \textbar{} | | |
< | \textless{} | < |
> | \textgreater{} | > |
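The table above can be read as a lookup keyed by character and mode. The following Python sketch shows that reading; `ESCAPES` and `escape_specials` are hypothetical names, and the `escape` flag only mimics the effect of the escape attribute, it is not the processor's actual code.

```python
# (text mode, math mode) replacements copied from the table above.
ESCAPES = {
    "\\": ("\\textbackslash{}", "\\backslash{}"),
    "{": ("\\{", "\\{"),
    "}": ("\\}", "\\}"),
    "$": ("\\textdollar{}", "\\$"),
    "&": ("\\&", "\\&"),
    "#": ("\\#", "\\#"),
    "^": ("\\^{}", "\\^{}"),
    "_": ("\\_", "\\_"),
    "~": ("\\textasciitilde{}", "\\~{}"),
    "%": ("\\%", "\\%"),
    "|": ("\\textbar{}", "|"),
    "<": ("\\textless{}", "<"),
    ">": ("\\textgreater{}", ">"),
}


def escape_specials(text, mode="text", escape=True):
    """Escape TeX specials for the given mode ('text' or 'math')."""
    if not escape:
        return text  # behaves like escape="0": specials are left as is
    column = {"text": 0, "math": 1}[mode]
    return "".join(ESCAPES[ch][column] if ch in ESCAPES else ch for ch in text)


print(escape_specials("50% & more"))
print(escape_specials("a<b", mode="math"))
```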
The LaTeX mapping table for Unicode characters is automatically generated from the file unicode.xml. This file is an appendix to the W3C MathML specification.
If the replacement for a Unicode character is valid only in math mode and the current mode is text, the replacement is wrapped in the command “\ensuremath”. Likewise, if a replacement is valid only in text mode and the current mode is math, the wrapper “\ensuretext” is used.
LaTeX does not provide the command “\ensuretext”, so you must define it yourself. One approach is:
\def\ensuretext{\textrm}
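The wrapping rule can be expressed as a small decision function. In this Python sketch, `wrap_replacement` and the `valid_in` values are hypothetical, and placing the replacement in braces after the wrapper command is an assumption about the output format.

```python
def wrap_replacement(replacement, valid_in, current_mode):
    """Wrap a replacement when it is not valid in the current mode.

    valid_in is 'math', 'text', or 'both'; current_mode is 'math' or 'text'.
    """
    if valid_in == "math" and current_mode == "text":
        return "\\ensuremath{" + replacement + "}"
    if valid_in == "text" and current_mode == "math":
        return "\\ensuretext{" + replacement + "}"
    return replacement  # valid as is in the current mode


print(wrap_replacement("\\alpha", "math", "text"))
```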
Empty lines have a special meaning for TeX: they cause automatic generation of the TeX command \par. To avoid this, the processor outputs a line containing only the symbol % (TeX comment) instead of an empty line.
To leave empty lines as is, use the emptylines attribute of the TeXML element:
<TeXML emptylines="1">...</TeXML>
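A minimal Python sketch of the empty-line replacement follows. The function `protect_empty_lines` is hypothetical, and treating whitespace-only lines as empty is an assumption. The sketch joins lines with "\n" and does not preserve a trailing newline.

```python
def protect_empty_lines(tex, emptylines=False):
    """Replace empty lines with a lone '%' so TeX does not insert \\par."""
    if emptylines:
        return tex  # behaves like emptylines="1": empty lines are left as is
    lines = tex.splitlines()
    return "\n".join("%" if not line.strip() else line for line in lines)


print(protect_empty_lines("first\n\nsecond"))
```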
The TeXML processor disconnects well-known ligatures “--”, “---”, “``”, “''”, “!`” and “?`”. These ligatures are converted into “-{}-”, “-{}-{}-”, “`{}`”, “'{}'”, “!{}`”, and “?{}`” respectively.
To leave ligatures as is, use the ligatures attribute of the TeXML element:
<TeXML ligatures="1">...</TeXML>
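One way to reproduce the conversions listed above is to insert an empty group after the first character of each ligature pair. The Python sketch below does this with a regular expression; `break_ligatures` is a hypothetical name and the approach is an illustration, not the processor's actual implementation.

```python
import re

# One alternative per ligature listed above: "--" and "---", "``", "''", "!`", "?`".
# Each alternative matches the first character of a pair; the lookahead leaves
# the second character available to start another pair (as in "---").
_LIGATURE_START = re.compile(r"-(?=-)|`(?=`)|'(?=')|[!?](?=`)")


def break_ligatures(text, ligatures=False):
    """Insert '{}' inside known ligatures so TeX typesets the characters separately."""
    if ligatures:
        return text  # behaves like ligatures="1": ligatures are left as is
    return _LIGATURE_START.sub(lambda m: m.group(0) + "{}", text)


print(break_ligatures("pages 1--2"))
```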
There are two modes: text and math. Modes only affect the translation of characters.
The default mode is text. In order to change mode, use the mode attribute of the element TeXML. The possible values for this attribute are math and text. If the element TeXML is used without the attribute mode, then the mode is not changed.
<TeXML mode="math"> ... math mode here ... <TeXML mode="text">... text mode here ...</TeXML> </TeXML>
The elements math and dmath also change the mode to math.
The TeXML processor performs advanced whitespace processing. The program
If you find that something goes wrong, you can switch off whitespace elimination using the ws attribute of the TeXML element.
<TeXML ws="1"> ... whitespace is verbatim here ... </TeXML>
If the TeXML elements ctrl or spec have any content (including whitespace) then the TeXML processor reports an error.
The program deletes any whitespace that is located directly in the TeXML element cmd.
Insignificant whitespace is whitespace around any opening or closing tag, for example, whitespace around “... <TeXML> ...” and “... </TeXML> ...”. The XML reader converts insignificant whitespace into the weak space.
Another source of weak spaces is TeX commands. When the processor converts “<cmd name="it"/>” into “\it ”, the space after “\it” is a weak space.
The TeX writer processes weak spaces in the following manner:
The resulting documents are usually very good, but after some tuning they can be even better. This section describes how whitespace is handled and gives some hints for making the resulting documents look as good as handcrafted ones.
If a command has no parameters or options, the TeXML processor adds an empty group “{}” after the command name: “\smth{}”. Without the empty group, TeX ignores the whitespace that follows the command name, but sometimes that is exactly what you need. In this case, set the attribute “gr” (short for “group”) to “0”.
TeXML: <cmd name="it"/> once, <cmd name="it" gr="0"/> twice
TeX: \it{} once, \it twice
It's difficult to work with documents that are one long line as a result of transformation, so the TeXML processor performs automatic line breaking.
By default “far enough” is 62. You can set another value by using command line parameter “-w” or “--width”. This setting is not strict: a line can be much longer than a specified width, if there are no spaces in it.
The attributes nl1 and nl2 can be used to force a new line before (nl1) or after (nl2) a TeX command.
The TeXML processor automatically creates new lines around the beginning and the end of an environment. You can change this behaviour using four attributes: nl1 (before the beginning), nl2 (after the beginning), nl3 (before the end) and nl4 (after the end).
You can affect whitespace output by using special categories of the element spec: nl, nl?, space and nil.
The TeXML namespace is http://getfo.sourceforge.net/texml/ns1.
<TeXML xmlns="http://getfo.sourceforge.net/texml/ns1"> ... </TeXML>
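If you generate TeXML programmatically, the namespace can be declared once as the default namespace. The following Python sketch uses the standard library module xml.etree.ElementTree; the helper `q` and the choice to build a documentclass command are illustrative assumptions, not requirements of TeXML.

```python
import xml.etree.ElementTree as ET

TEXML_NS = "http://getfo.sourceforge.net/texml/ns1"

# Serialize the TeXML namespace as the default namespace (no prefix).
ET.register_namespace("", TEXML_NS)


def q(tag):
    """Return the namespace-qualified name that ElementTree expects."""
    return "{%s}%s" % (TEXML_NS, tag)


root = ET.Element(q("TeXML"))
cmd = ET.SubElement(root, q("cmd"), {"name": "documentclass"})
ET.SubElement(cmd, q("opt")).text = "12pt"
ET.SubElement(cmd, q("parm")).text = "letter"

xml_text = ET.tostring(root, encoding="unicode")
print(xml_text)
```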
In the ConTeXt mode, the element env creates ConTeXt environments.
TeXML: <env name="document"> ... </env>
TeX: \startdocument ... \stopdocument
To activate ConTeXt mode, give the command line option -c or --context to the TeXML processor.