Referring to custom glyphs/chars from MathML

Imsieke, Gerrit, le-tex
tl;dr

– For TEI projects that contain (many and/or extensive) formulas with
custom mathematical symbols, we recommend declaring these symbols using
the TEI elements <char> or <glyph>, even when the symbols only appear
within MathML.
– For establishing the connection between the MathML symbols in formulas
and their declarations, we recommend using @xlink:href (for MathML 2)
or @href (for MathML 3).
– If the symbols in the formulas have content (private use area
character or image), add Schematron rules to ensure that these
representations correspond to the respective representations in the
linked declarations.

What follows is more like a comment than a question ;-) We found few
references to MathML in the list archives, which tells us that encoding
mathematical formulas does not seem to be a key component of most
digital humanities projects. But when it is, it might well involve
encoding symbols for which no Unicode representation exists.

We (textloop and le-tex) want to share which encoding recommendations we
arrived at for a current project. And of course we are soliciting
feedback from the list.

So here we go:

We are converting math-heavy volumes of a text-critical edition of
Leibniz’s works from LaTeX to TEI. We intend to encode the formulas as
(presentational) MathML:

Unless another vocabulary seems more appropriate (for example for
encoding equations that can be consumed by computer algebra systems), we
recommend using presentational MathML as formula notation. One part of
this consideration is that we want to have an XML representation (and
not an unparsed string-only format such as TeX/LaTeX/AMSTeX) so that we
are able to actually link from a single symbol in a formula to its
declaration. The case for presentational MathML is the tool support and
its unchallenged role as the go-to math format in publishing. Compared
to other XML vocabulary candidates such as, say, SVG, presentational
MathML conveys at least some mathematical meaning.

Leibniz invented some mathematical operators that have not made it into
Unicode yet. The typesetters used custom LaTeX macros for each of these
symbols, and the macros ultimately resolve to including images.

We are now thinking about how to encode these symbols in a TEI
customization that incorporates MathML. The use of these symbols is
confined to contexts in which we can use MathML exclusively, instead of
TEI-native vocabulary. On the other hand, the formulas can be so complex
that they cannot be appropriately encoded with TEI-native markup.

So in principle we can use MathML’s mglyph element that, by means of its
@src attribute, will refer to the corresponding image. mglyph’s alt
attribute may contain the LaTeX macro name so this mapping information
will still be available when converting the TEI XML to LaTeX in the
future. (We will most likely invert the production process for future
volumes, going from TEI to LaTeX rather than the other way round).

However, we think it is potentially more expressive, more flexible, and
less redundant to use TEI’s <glyph> or <char> elements to declare these
symbols in a central place. Then the question arises how we can point to
the declarations, given that MathML elements such as <mo> don’t have a
dedicated TEI-pointer-like attribute such as g/@ref.

Candidates are @xlink:href and @xref. The latter is rarely used. It was
designed to link between presentational and content MathML elements in
parallel markup. It is declared as an IDREF in the schema which makes it
difficult to point to a declaration that is stored in a different file.
@xlink:href, on the other hand, is declared to be able to hold arbitrary
content, in particular URLs that can point to the glyph definitions.

So an empty <mo> element with @xlink:href pointing to the glyph
declaration would be a good candidate.

The glyph could be declared as:

<glyph xml:id="pleibvdash">
   <glyphName>pleibvdash</glyphName>
   <desc>a dagger &#x2020; with a horizontal line on the left-hand side
of its stem, or a double dagger &#x2021; without the lower right-hand
horizontal line.</desc>
   <mapping type="PUA">&#xE212;</mapping>
   <mapping type="tex">\pleibvdash</mapping>
   <graphic url="pm.pdf"/>
   <graphic url="pm.svg"/>
</glyph>


There are two concerns though. The first is that we are considering
using MathML 3 instead of MathML 2. In MathML 3, the attribute is called
@href instead of @xlink:href, and its semantics seem to have shifted
towards actual hyperlinking (instead of unspecified linking mechanisms
as in MathML 2). This seems to be a minor concern. I don’t think that
Leibniz or the critical edition editors will start using hyperlinks on
math symbols any time soon. And if they do, they will be able to use the
<maction> element in order to make their hyperlinking intent
unambiguous. If we document the use and rendering expectation of @href
on <mo> and <mi> in our encoding description, everything should be fine.


An obvious TEI-centric solution would be to allow <g>’s tei.pointer
attribute @ref also on <mo> in our customization. We cannot pursue this
approach though because the resulting XML needs to validate against
tei_allPlus.rng, too (or against an otherwise unaltered tei_allPlus-like
customization that includes MathML 3 instead of MathML 2). This has been
stipulated by the editor/publisher.


The second concern is about renderability of the custom symbols in TEI
viewers and MathML editing tools. The issue is that MathML renderers and
equation editors won’t be able to properly display an <mo> that has no
content, but only a custom link to a TEI element instead. (It would at
best provide a hyperlink that may or may not take you to the declaration
in the TEI file.)

For the purpose of HTML or LaTeX→PDF renderings, we can always look up
the appropriate image URL or LaTeX macros in the char/glyph declaration
and transform the source MathML to another MathML that contains an
<mglyph src="…"/> element or to LaTeX code, as described below in
greater detail. (Yes, the detail will become even greater further down
this posting, dear reader.)
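
A minimal sketch of that lookup-and-replace step, written in Python with
the standard library rather than XSLT. It assumes that the declarations
and the formulas live in the same document, that links are same-document
fragments such as href="#pleibvdash", and that the SVG <graphic> is the
one wanted for HTML output. The function name resolve_to_mglyph is made
up for this illustration.

```python
import xml.etree.ElementTree as ET

TEI = "http://www.tei-c.org/ns/1.0"
MML = "http://www.w3.org/1998/Math/MathML"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def resolve_to_mglyph(root):
    """Swap each @href link on <mo>/<mi> for an <mglyph> from the declaration.

    Modifies the tree in place and returns the number of elements rewritten.
    """
    decls = {d.get(XML_ID): d for d in root.iter(f"{{{TEI}}}glyph")}
    token_tags = {f"{{{MML}}}mo", f"{{{MML}}}mi"}
    rewritten = 0
    # list(...) so that the <mglyph> elements we add are not visited again.
    for el in list(root.iter()):
        href = el.get("href", "")
        if el.tag not in token_tags or not href.startswith("#"):
            continue
        decl = decls.get(href[1:])
        if decl is None:
            continue
        # Assumption: pick the first <graphic> whose URL ends in ".svg".
        svg = next((g.get("url") for g in decl.iter(f"{{{TEI}}}graphic")
                    if g.get("url", "").endswith(".svg")), None)
        if svg is None:
            continue
        tex = decl.findtext(f"{{{TEI}}}mapping[@type='tex']")
        # Drop any existing content and the link, then insert the image.
        for child in list(el):
            el.remove(child)
        el.text = None
        del el.attrib["href"]
        mglyph = ET.SubElement(el, f"{{{MML}}}mglyph", {"src": svg})
        if tex:
            # Keep the LaTeX macro name in @alt so the mapping survives.
            mglyph.set("alt", tex.strip())
        rewritten += 1
    return rewritten
```

For MathML 2 input, the same sketch would have to read
{http://www.w3.org/1999/xlink}href instead of href.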

However, when we switch to a TEI-first workflow in the future, someone
needs to type the equations, probably not as raw XML, but with a visual
MathML editor. (Although it is possible that the formulas will be
written in LaTeX and converted to MathML using LaTeXML, as we are doing
now.) Ideally this editor will provide a customizable symbol palette or
a toolbar that can hold more complex MathML expressions. In any case,
without a string value or an image to represent the symbol, it won’t
display in the formulas that contain it.

So maybe instead of, or in addition to, linking to the glyph definition,
we might give the <mo> element content, like this:

<mo>&#xE212;</mo>

or

<mo href="#pleibvdash">&#xE212;</mo>

(In order to be able to actually see the symbols, we’d need to patch the
math font that the equation editor uses.)

The second variant is only supported by an equation editor whose toolbar
can hold arbitrary MathML expressions, not just custom symbol
characters. (Examples of such editors are MathType and Wiris Editor.)
Such an equation editor is most probably able to insert an @href-only,
otherwise empty, <mo>, although it might not offer a recognizable visual
representation for it.

Alternatively, this visual representation can be achieved by including
an <mglyph src="pm.svg"/> in the <mo>. So this would be a third variant:

<mo href="#pleibvdash"><mglyph src="pm.svg"/></mo>

Of course the second and third variants are a bit redundant. You could
look up the <glyph> by the string value only, provided that it contains
<mapping type="PUA">&#xE212;</mapping> and that no other <glyph> or
<char> contains the same PUA mapping. Or you can look up the <glyph> by
the image file name.

However, linking by href is more explicit than matching by string value
or image name, and therefore, despite the redundancy, we think that
content should always be accompanied by an @href (@xlink:href for MathML
2) connection.

Therefore, if an equation editor or a TEI viewer for proofreading must
have content in <mo> in order to display the symbol in formulas, we will
accept this redundancy.

It is then prudent to add these Schematron checks to the customization:
– Does the @href of an <mo> point to a <glyph> or <char> declaration?
– Is lookup by string content or image file name unambiguous?
– Does the looked-up declaration contain the same PUA string
representation (or, in the case of images, does it contain a <graphic>
whose @url matches the @src attribute of an <mglyph>)?
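
The three checks can be sketched as follows. This is a Python stand-in
using only the standard library, not Schematron itself, meant to make
the logic of the rules concrete. It assumes same-document fragment links
and accepts either @href or @xlink:href. The function name
check_symbol_links and the wording of the messages are invented for this
illustration.

```python
import xml.etree.ElementTree as ET

TEI = "http://www.tei-c.org/ns/1.0"
MML = "http://www.w3.org/1998/Math/MathML"
XLINK = "http://www.w3.org/1999/xlink"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def check_symbol_links(root):
    """Return a list of problem messages for linked <mo> elements."""
    # Index all <glyph>/<char> declarations by xml:id.
    decls = {}
    for tag in ("glyph", "char"):
        for d in root.iter(f"{{{TEI}}}{tag}"):
            if d.get(XML_ID):
                decls[d.get(XML_ID)] = d

    def puas(decl):
        found = decl.findall(f"{{{TEI}}}mapping[@type='PUA']")
        return {(m.text or "").strip() for m in found}

    # Check 2: lookup by PUA string must be unambiguous.
    count = {}
    for d in decls.values():
        for p in puas(d):
            count[p] = count.get(p, 0) + 1
    problems = [f"PUA mapping {p!r} is declared {n} times"
                for p, n in count.items() if n > 1]

    for mo in root.iter(f"{{{MML}}}mo"):
        href = mo.get("href") or mo.get(f"{{{XLINK}}}href")
        if href is None:
            continue
        # Check 1: the link must resolve to a <glyph> or <char>.
        decl = decls.get(href[1:]) if href.startswith("#") else None
        if decl is None:
            problems.append(f"{href!r} does not point to a <glyph> or <char>")
            continue
        # Check 3a: string content must match a PUA mapping.
        text = (mo.text or "").strip()
        if text and text not in puas(decl):
            problems.append(f"content of <mo> differs from PUA mapping of {href!r}")
        # Check 3b: an <mglyph> image must match a declared <graphic>.
        urls = {g.get("url") for g in decl.iter(f"{{{TEI}}}graphic")}
        for mg in mo.findall(f"{{{MML}}}mglyph"):
            if mg.get("src") not in urls:
                problems.append(f"mglyph src {mg.get('src')!r} not declared in {href!r}")
    return problems
```

An empty result list means that all linked <mo> elements passed. In a
real customization the same conditions would be phrased as Schematron
assertions.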

If the equation editor is only able to insert single-character strings
(with some default <mo> or <mi> markup around them), the project should
provide an XSLT transformation or an XML refactoring action that
replaces this element with a properly @hrefed one.
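
A sketch of such a refactoring step, again in Python with the standard
library instead of XSLT. It assumes that the editor emits bare
<mo>/<mi> elements containing just the PUA character, that declarations
sit in the same document, and that ambiguous PUA mappings should be left
alone rather than guessed. The function name add_hrefs is hypothetical.

```python
import xml.etree.ElementTree as ET

TEI = "http://www.tei-c.org/ns/1.0"
MML = "http://www.w3.org/1998/Math/MathML"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def add_hrefs(root):
    """Add @href to bare <mo>/<mi> whose text is a declared PUA character.

    Modifies the tree in place and returns the number of links added.
    """
    pua_to_id = {}
    ambiguous = set()
    for tag in ("glyph", "char"):
        for decl in root.iter(f"{{{TEI}}}{tag}"):
            decl_id = decl.get(XML_ID)
            if not decl_id:
                continue  # cannot link to a declaration without xml:id
            for m in decl.findall(f"{{{TEI}}}mapping[@type='PUA']"):
                key = (m.text or "").strip()
                if not key:
                    continue
                if key in pua_to_id:
                    ambiguous.add(key)
                pua_to_id[key] = decl_id

    token_tags = {f"{{{MML}}}mo", f"{{{MML}}}mi"}
    added = 0
    for el in root.iter():
        # Only touch token elements without a link and without children.
        if el.tag not in token_tags or "href" in el.attrib or len(el):
            continue
        key = (el.text or "").strip()
        if key in pua_to_id and key not in ambiguous:
            el.set("href", "#" + pua_to_id[key])
            added += 1
    return added
```

Elements that already carry an @href are skipped, so the step can be
run repeatedly on the same file.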


There is another concern that is specific to <mo> elements (in contrast
to <mi> elements). In MathML, operators may have properties, such as
spacing to the left and to the right, or the ability to stretch so that
their height matches the height of a mathematical term that they
enclose/precede/follow. These properties are usually not spelled out as
XML attributes (although <mo> accepts lspace, rspace, and stretchy);
rather, their defaults come from an operator dictionary that is
maintained by the MathML renderer. Lookup of the dictionary entries is
by an <mo>’s string content and its position (infix, postfix, prefix)
relative to the surrounding content, as determined by the MathML
renderer. So if we want to be able to use this lookup mechanism, the
<mo>s need to have content, rather than being empty elements that point
to a declaration.

However, in practice, there is no way to inform a MathML renderer that
there are new operator dictionary entries for the newly introduced
symbols. We can nevertheless encode the spacing etc. values that should
go into the operator dictionary, using TEI vocabulary within <glyph> or
<char>:

   <charProp>
     <localName>mathOperatorInfixLeftSpace</localName>
     <value>mediummathspace</value>
   </charProp>
   <charProp>
     <localName>mathOperatorInfixRightSpace</localName>
     <value>mediummathspace</value>
   </charProp>
   <charProp>
     <localName>mathOperatorPrefixLeftSpace</localName>
     <value>0em</value>
   </charProp>
   <charProp>
     <localName>mathOperatorPrefixRightSpace</localName>
     <value>veryverythinmathspace</value>
   </charProp>

(these are the operator dictionary lspace/rspace values for common
operators such as '+', '±', and '−', as recommended in
https://www.w3.org/TR/MathML3/appendixc.html#oper-dict.entries-table).

It is expected that for HTML renderings, the MathML formulas will be
slightly transformed so that the @href linking will be replaced with the
SVG representation that is taken from the linked declaration. This
transformation process might then, after analyzing whether the operator
is used as a prefix, an infix or a postfix, insert explicit <mspace
width="mediummathspace"/> spacers around <mo><mglyph src="pm.svg"/></mo>
if the default spacing is not satisfactory. Likewise, required
stretchiness of a custom fence operator might be achieved by scaling the
SVG content to match the box size of the MathML expression that it
delimits (we have not yet tried how to make this work in practice).
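
The spacing part of this transformation could look roughly like the
following Python sketch. It reads the <charProp> values shown above and
uses a simplified version of MathML's default form rule (first child of
a multi-child <mrow> is a prefix, last is a postfix, everything else an
infix), without skipping space-like siblings as the full rule would. It
only handles operators that are direct children of an explicit <mrow>,
and it leaves the @href in place so that it can run before the image
substitution. All function names are invented for this illustration.

```python
import xml.etree.ElementTree as ET

TEI = "http://www.tei-c.org/ns/1.0"
MML = "http://www.w3.org/1998/Math/MathML"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"
MO, MROW, MSPACE = (f"{{{MML}}}{n}" for n in ("mo", "mrow", "mspace"))


def operator_form(index, count):
    """Simplified MathML form rule for child `index` of `count` siblings."""
    if count > 1 and index == 0:
        return "Prefix"
    if count > 1 and index == count - 1:
        return "Postfix"
    return "Infix"


def char_props(decl):
    """Collect <charProp> localName/value pairs of a declaration."""
    props = {}
    for cp in decl.iter(f"{{{TEI}}}charProp"):
        name = cp.findtext(f"{{{TEI}}}localName")
        value = cp.findtext(f"{{{TEI}}}value")
        if name and value:
            props[name.strip()] = value.strip()
    return props


def add_spacing(root):
    """Insert <mspace> around linked <mo>s; return how many were processed."""
    decls = {d.get(XML_ID): d for d in root.iter(f"{{{TEI}}}glyph")}
    processed = 0
    for row in list(root.iter(MROW)):
        children = list(row)
        rebuilt = []
        for i, child in enumerate(children):
            href = child.get("href", "") if child.tag == MO else ""
            decl = decls.get(href[1:]) if href.startswith("#") else None
            if decl is None:
                rebuilt.append(child)
                continue
            props = char_props(decl)
            form = operator_form(i, len(children))
            left = props.get(f"mathOperator{form}LeftSpace")
            right = props.get(f"mathOperator{form}RightSpace")
            # A zero width needs no spacer element.
            if left and left != "0em":
                rebuilt.append(ET.Element(MSPACE, {"width": left}))
            rebuilt.append(child)
            if right and right != "0em":
                rebuilt.append(ET.Element(MSPACE, {"width": right}))
            processed += 1
        row[:] = rebuilt
    return processed
```

With the four <charProp> entries above, a <b>a ⊢ b</b>-style infix use gets
a mediummathspace spacer on both sides, while a prefix use only gets a
veryverythinmathspace spacer on the right.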

For PDF generation through LaTeX, we’d look up the LaTeX macros in the
<glyph> declarations, and leave any spacing issues to the math operator
declaration in the TeX styles.


This is our treatise on how to refer to custom symbols from MathML. Do
you share the conclusions that we arrived at, or would you pursue a
different approach?

Gerrit

--
Gerrit Imsieke
Geschäftsführer / Managing Director
le-tex publishing services GmbH
Weissenfelser Str. 84, 04229 Leipzig, Germany
Phone +49 341 355356 110, Fax +49 341 355356 510
[hidden email], http://www.le-tex.de

Registergericht / Commercial Register: Amtsgericht Leipzig
Registernummer / Registration Number: HRB 24930

Geschäftsführer: Gerrit Imsieke, Svea Jelonek, Thomas Schmidt