tl;dr
– For TEI projects that contain (many and/or extensive) formulas with custom mathematical symbols, we recommend declaring these symbols using the TEI elements <char> or <glyph>, even when the symbols only appear within MathML.
– To establish the connection between the MathML symbols in formulas and their declarations, we recommend using @xlink:href (for MathML 2) or @href (for MathML 3).
– If the symbols in the formulas have content (a private use area character or an image), add Schematron rules to ensure that these representations correspond to the respective representations in the linked declarations.

What follows is more of a comment than a question ;-)

We found few references to MathML in the list archives, which suggests that encoding mathematical formulas is not a key component of most digital humanities projects. But when it is, it may well involve encoding symbols for which no Unicode representation exists. We (textloop and le-tex) want to share the encoding recommendations we arrived at for a current project. And of course we are soliciting feedback from the list. So here we go:

We are converting math-heavy volumes of a text-critical edition of Leibniz’s works from LaTeX to TEI. We intend to encode the formulas as (presentational) MathML: unless another vocabulary seems more appropriate (for example, for encoding equations that can be consumed by computer algebra systems), we recommend using presentational MathML as the formula notation. One part of this consideration is that we want an XML representation (rather than an unparsed string-only format such as TeX/LaTeX/AMSTeX) so that we are able to actually link from a single symbol in a formula to its declaration. The case for presentational MathML rests on its tool support and its unchallenged role as the go-to math format in publishing. Compared to other XML vocabulary candidates such as, say, SVG, presentational MathML conveys at least some mathematical meaning.
Leibniz invented some mathematical operators that have not made it into Unicode yet. The typesetters used a custom LaTeX macro for each of these symbols, and the macros ultimately resolve to including images. We are now thinking about how to encode these symbols in a TEI customization that incorporates MathML. The use of these symbols is confined to contexts in which we can use MathML exclusively, instead of TEI-native vocabulary. Conversely, the formulas can be so complex that they cannot be appropriately encoded with TEI-native markup.

So in principle we can use MathML’s <mglyph> element, which refers to the corresponding image by means of its @src attribute. The @alt attribute of <mglyph> may contain the LaTeX macro name, so that this mapping information will still be available when converting the TEI XML to LaTeX in the future. (We will most likely invert the production process for future volumes, going from TEI to LaTeX rather than the other way round.)

However, we think it is potentially more expressive, more flexible, and less redundant to use TEI’s <glyph> or <char> elements to declare these symbols in a central place. Then the question arises how we can point to the declarations, given that MathML elements such as <mo> don’t have a dedicated TEI-pointer-like attribute such as g/@ref.

Candidates are @xlink:href and @xref. The latter is rarely used. It was designed to link between presentational and content MathML elements in parallel markup. It is declared as an IDREF in the schema, which makes it difficult to point to a declaration that is stored in a different file. @xlink:href, on the other hand, is declared to be able to hold arbitrary content, in particular URLs that can point to the glyph definitions. So an empty <mo> element with @xlink:href pointing to the glyph declaration would be a good candidate.
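To illustrate why a URL-valued link is more convenient than an IDREF here: a URL can carry both a file name and a fragment identifier, so the declaration may live in a separate file. The following is only a minimal Python sketch (standard library only). The file name symbols.xml and the helper names link_of and split_link are assumptions made up for this illustration, not part of our setup.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urldefrag

MATHML_NS = "http://www.w3.org/1998/Math/MathML"
XLINK_NS = "http://www.w3.org/1999/xlink"


def link_of(token):
    """Return the link on a MathML token element, or None if it has none.

    Looks for @xlink:href (MathML 2) first, then for plain @href (MathML 3).
    """
    return token.get(f"{{{XLINK_NS}}}href") or token.get("href")


def split_link(link):
    """Split 'file.xml#id' into (file URL, id); the URL is '' for same-file links."""
    url, fragment = urldefrag(link)
    return url, fragment


# 'symbols.xml' is an assumed file name, used only for this illustration.
SAMPLE = f"""
<math xmlns="{MATHML_NS}" xmlns:xlink="{XLINK_NS}">
  <mi>a</mi>
  <mo xlink:href="symbols.xml#pleibvdash"/>
  <mi>b</mi>
</math>
"""

math = ET.fromstring(SAMPLE.strip())
mo = math.find(f"{{{MATHML_NS}}}mo")
doc_url, glyph_id = split_link(link_of(mo))
print(doc_url, glyph_id)
```

An @xref value, being an IDREF, could only ever carry the bare identifier, never the file part.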
The glyph could be declared as:

  <glyph xml:id="pleibvdash">
    <glyphName>pleibvdash</glyphName>
    <desc>a dagger † with a horizontal line on the left-hand side of its stem, or a double dagger ‡ without the lower right-hand horizontal line.</desc>
    <mapping type="PUA"></mapping>
    <mapping type="tex">\pleibvdash</mapping>
    <graphic url="pm.pdf"/>
    <graphic url="pm.svg"/>
  </glyph>

There are two concerns though.

The first is that we are considering using MathML 3 instead of MathML 2. In MathML 3, the attribute is called @href instead of @xlink:href, and its semantics seem to have shifted towards actual hyperlinking (instead of the unspecified linking mechanisms of MathML 2). This seems to be a minor concern. I don’t think that Leibniz or the critical edition editors will start using hyperlinks on math symbols any time soon. And if they do, they will be able to use the <maction> element in order to make their hyperlinking intent unambiguous. If we document the use and rendering expectation of @href on <mo> and <mi> in our encoding description, everything should be fine.

An obvious TEI-centric solution would be to allow <g>’s tei.pointer attribute @ref on <mo> as well in our customization. We cannot pursue this approach, though, because the resulting XML also needs to validate against tei_allPlus.rng (or against an otherwise unaltered tei_allPlus-like customization that includes MathML 3 instead of MathML 2). This has been stipulated by the editor/publisher.

The second concern is about the renderability of the custom symbols in TEI viewers and MathML editing tools. The issue is that MathML renderers and equation editors won’t be able to properly display an <mo> that has no content but only a custom link to a TEI element. (At best, they would provide a hyperlink that may or may not take you to the declaration in the TEI file.)
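Whatever a viewer does with such a link, resolving it ourselves is straightforward. The following is a minimal Python sketch under some assumptions: the declarations sit in a <charDecl> in the TEI namespace, the link is a same-file reference such as #pleibvdash, and the helper names (find_declaration, tex_macro, svg_url) are invented for this illustration. The PUA mapping is left out of the sample on purpose.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"
TEI = {"tei": TEI_NS}
DECLARATION_TAGS = (f"{{{TEI_NS}}}glyph", f"{{{TEI_NS}}}char")

# Declaration modelled on the <glyph> above; the PUA mapping is omitted here.
CHAR_DECL = f"""
<charDecl xmlns="{TEI_NS}">
  <glyph xml:id="pleibvdash">
    <mapping type="tex">\\pleibvdash</mapping>
    <graphic url="pm.pdf"/>
    <graphic url="pm.svg"/>
  </glyph>
</charDecl>
"""


def find_declaration(char_decl, href):
    """Return the <glyph> or <char> whose xml:id matches the fragment of href."""
    target = href.rpartition("#")[2]
    for element in char_decl.iter():
        if element.tag in DECLARATION_TAGS and element.get(XML_ID) == target:
            return element
    return None


def tex_macro(declaration):
    """Return the text of <mapping type="tex">, or None if there is none."""
    mapping = declaration.find("tei:mapping[@type='tex']", TEI)
    return mapping.text if mapping is not None else None


def svg_url(declaration):
    """Return the @url of the first <graphic> that points to an SVG file."""
    for graphic in declaration.findall("tei:graphic", TEI):
        if graphic.get("url", "").endswith(".svg"):
            return graphic.get("url")
    return None


char_decl = ET.fromstring(CHAR_DECL.strip())
declaration = find_declaration(char_decl, "#pleibvdash")
print(tex_macro(declaration), svg_url(declaration))
```

In a real pipeline this lookup would more likely be an XSLT key; the Python version is only meant to show how little information the link needs to carry.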
For the purpose of HTML or LaTeX→PDF renderings, we can always look up the appropriate image URL or LaTeX macro in the char/glyph declaration and transform the source MathML either to another MathML that contains an <mglyph src="…"/> element or to LaTeX code, as described below in greater detail. (Yes, the detail will become even greater further down this posting, dear reader.)

However, when we switch to a TEI-first workflow in the future, someone needs to type the equations, probably not as raw XML but with a visual MathML editor. (Although it is possible that the formulas will be written in LaTeX and converted to MathML using LaTeXML, as we are doing now.) Ideally this editor will provide a customizable symbol palette or a toolbar that can hold more complex MathML expressions. In any case, without a string value or an image to represent it, the symbol won’t display in the formulas that contain it.

So maybe instead of, or in addition to, linking to the glyph definition, we might give the <mo> element content, like this:

  <mo></mo>

or

  <mo href="#pleibvdash"></mo>

(In order to be able to actually see the symbols, we’d need to patch the math font that the equation editor uses.)

The second variant is only supported by an equation editor whose toolbar can hold arbitrary MathML expressions, not just custom symbol characters. (Examples of such editors are MathType and Wiris Editor.) Such an equation editor is most probably able to insert an <mo> that carries only @href and is otherwise empty, although it might not offer a recognizable visual representation for it. Alternatively, this visual representation can be achieved by including an <mglyph src="pm.svg"/> in the <mo>. So this would be a third variant:

  <mo href="#pleibvdash"><mglyph src="pm.svg"/></mo>

Of course the second and third variants are a bit redundant. You could look up the <glyph> by the string value alone, provided that it contains <mapping type="PUA"></mapping> and that no other <glyph> or <char> contains the same PUA mapping.
Or you can look up the <glyph/> by the image file name. However, linking by href is more explicit than matching by string value or image name, and therefore, despite the redundancy, we think that content should always be accompanied by an @href (@xlink:href for MathML 2) connection.

Therefore, if an equation editor or a TEI viewer for proofreading must have content in <mo> in order to display the symbol in formulas, we will accept this redundancy. It is then prudent to add these Schematron checks to the customization:

– Does the @href of an <mo> point to a <glyph> or <char> declaration?
– Is lookup by string content or image file name unambiguous?
– Does the looked-up declaration contain the same PUA string representation (or, in the case of images, does it contain a <graphic> whose @url matches the @src attribute of an <mglyph>)?

If the equation editor is only able to insert single-character strings (with some default <mo> or <mi> markup around them), the project should provide an XSLT transformation or an XML refactoring action that replaces this element with one that carries the proper @href.

There is another concern that is specific to <mo> elements (in contrast to <mi> elements). In MathML, operators may have properties, such as spacing to the left and to the right, or the ability to stretch so that their height matches the height of a mathematical term that they enclose/precede/follow. These properties are not expressed as XML attributes; rather, they are included in an operator dictionary that is maintained by the MathML renderer. Dictionary entries are looked up by an <mo>’s string content and by its form (infix, postfix, prefix), which the MathML renderer determines from the operator’s position relative to the surrounding content. So if we want to be able to use this lookup mechanism, the <mo>s need to have content, rather than being empty elements that point to a declaration.
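Returning briefly to the three checks listed above: before committing them to Schematron, their logic can be prototyped in a few lines. The following Python sketch is not Schematron and not our actual rule set; it only handles MathML 3 @href on <mo>, assumes that declarations and formulas live in the same document, and uses U+E000 as a purely hypothetical PUA code point. The function name check_operators is invented for this illustration.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
MML_NS = "http://www.w3.org/1998/Math/MathML"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"
NS = {"tei": TEI_NS, "m": MML_NS}
DECLARATION_TAGS = (f"{{{TEI_NS}}}glyph", f"{{{TEI_NS}}}char")


def check_operators(doc):
    """Return a list of problem messages for the linked <mo> elements in doc."""
    decls = {d.get(XML_ID): d for d in doc.iter() if d.tag in DECLARATION_TAGS}
    problems = []
    for mo in doc.iter(f"{{{MML_NS}}}mo"):
        href = mo.get("href")
        if href is None:
            continue
        # Check 1: the link must resolve to a <glyph> or <char> declaration.
        decl = decls.get(href.rpartition("#")[2])
        if decl is None:
            problems.append(f"{href}: no <glyph> or <char> declaration")
            continue
        # Check 3: content of the <mo> must agree with the declaration.
        pua = {m.text for m in decl.findall("tei:mapping[@type='PUA']", NS)}
        urls = {g.get("url") for g in decl.findall("tei:graphic", NS)}
        text = (mo.text or "").strip()
        if text and text not in pua:
            problems.append(f"{href}: string content differs from PUA mapping")
        for mglyph in mo.findall("m:mglyph", NS):
            if mglyph.get("src") not in urls:
                problems.append(f"{href}: {mglyph.get('src')} is not a declared graphic")
    # Check 2: each PUA string or image URL may belong to one declaration only.
    owners = {}
    for ident, decl in decls.items():
        reps = [m.text for m in decl.findall("tei:mapping[@type='PUA']", NS)]
        reps += [g.get("url") for g in decl.findall("tei:graphic", NS)]
        for rep in filter(None, reps):
            owners.setdefault(rep, []).append(ident)
    for rep, ids in owners.items():
        if len(ids) > 1:
            problems.append(f"{rep!r} is shared by declarations {ids}")
    return problems


PUA = "\ue000"  # hypothetical code point, for illustration only

SAMPLE = f"""
<TEI xmlns="{TEI_NS}" xmlns:m="{MML_NS}">
  <glyph xml:id="pleibvdash">
    <mapping type="PUA">{PUA}</mapping>
    <graphic url="pm.svg"/>
  </glyph>
  <m:math>
    <m:mo href="#pleibvdash">{PUA}</m:mo>
    <m:mo href="#pleibvdash"><m:mglyph src="other.svg"/></m:mo>
    <m:mo href="#missing"/>
  </m:math>
</TEI>
"""

problems = check_operators(ET.fromstring(SAMPLE.strip()))
for message in problems:
    print(message)
```

In the sample, the first <mo> passes, the second refers to an image that the declaration does not list, and the third points to a declaration that does not exist.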
In practice, however, there is no way to inform a MathML renderer that there are new operator dictionary entries for the newly introduced symbols, so the dictionary lookup does not help with custom operators. We can nevertheless encode the spacing etc. values that should go into the operator dictionary, using TEI vocabulary within <glyph> or <char>:

  <charProp>
    <localName>mathOperatorInfixLeftSpace</localName>
    <value>mediummathspace</value>
  </charProp>
  <charProp>
    <localName>mathOperatorInfixRightSpace</localName>
    <value>mediummathspace</value>
  </charProp>
  <charProp>
    <localName>mathOperatorPrefixLeftSpace</localName>
    <value>0em</value>
  </charProp>
  <charProp>
    <localName>mathOperatorPrefixRightSpace</localName>
    <value>veryverythinmathspace</value>
  </charProp>

(These are the operator dictionary lspace/rspace values for common operators such as '+', '±', and '−', as recommended in https://www.w3.org/TR/MathML3/appendixc.html#oper-dict.entries-table.)

We expect that for HTML renderings, the MathML formulas will be slightly transformed so that the @href linking is replaced with the SVG representation taken from the linked declaration. After analyzing whether the operator is used as a prefix, an infix, or a postfix, this transformation process might then insert explicit <mspace width="mediummathspace"/> spacers around <mo><mglyph src="pm.svg"/></mo> if the default spacing is not satisfactory. Likewise, the required stretchiness of a custom fence operator might be achieved by scaling the SVG content to match the box size of the MathML expression that it delimits (although we haven’t tried yet how to make this work in practice).

For PDF generation through LaTeX, we’d look up the LaTeX macros in the <glyph> declarations and leave any spacing issues to the math operator declaration in the TeX styles.

This is our treatise on how to refer to custom symbols from MathML. Do you share the conclusions that we arrived at, or would you pursue a different approach?
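P.S. To make the HTML transformation described above a little more concrete, here is a minimal Python sketch of the spacer insertion. It is an illustration under assumptions, not our production code: the function names are invented, the form detection is a simplified version of the MathML 3 default rule (first child of a multi-child parent is a prefix, last is a postfix, everything else an infix) that ignores space-like siblings, and the glyph is assumed to declare an SVG graphic.

```python
import xml.etree.ElementTree as ET

MML = "http://www.w3.org/1998/Math/MathML"
TEI = {"tei": "http://www.tei-c.org/ns/1.0"}

GLYPH = """
<glyph xmlns="http://www.tei-c.org/ns/1.0" xml:id="pleibvdash">
  <charProp><localName>mathOperatorInfixLeftSpace</localName><value>mediummathspace</value></charProp>
  <charProp><localName>mathOperatorInfixRightSpace</localName><value>mediummathspace</value></charProp>
  <graphic url="pm.pdf"/>
  <graphic url="pm.svg"/>
</glyph>
"""

FORMULA = f'<mrow xmlns="{MML}"><mi>a</mi><mo href="#pleibvdash"/><mi>b</mi></mrow>'


def char_props(glyph):
    """Map each <charProp> localName to its value."""
    return {
        prop.findtext("tei:localName", namespaces=TEI): prop.findtext("tei:value", namespaces=TEI)
        for prop in glyph.findall("tei:charProp", TEI)
    }


def svg_url(glyph):
    """Return the @url of the first <graphic> that points to an SVG file."""
    for graphic in glyph.findall("tei:graphic", TEI):
        if graphic.get("url", "").endswith(".svg"):
            return graphic.get("url")
    return None


def operator_form(parent, index):
    """Guess the operator form from the position among its siblings (simplified)."""
    if len(parent) > 1 and index == 0:
        return "prefix"
    if len(parent) > 1 and index == len(parent) - 1:
        return "postfix"
    return "infix"


def mspace(width):
    return ET.Element(f"{{{MML}}}mspace", {"width": width})


def expand_operator(parent, index, glyph):
    """Replace the link on parent[index] by an <mglyph> and add declared spacers."""
    props = char_props(glyph)
    form = operator_form(parent, index).capitalize()
    mo = parent[index]
    mo.attrib.pop("href", None)
    ET.SubElement(mo, f"{{{MML}}}mglyph", {"src": svg_url(glyph)})
    left = props.get(f"mathOperator{form}LeftSpace")
    right = props.get(f"mathOperator{form}RightSpace")
    # Insert the right spacer first so that 'index' still addresses the <mo>.
    if right:
        parent.insert(index + 1, mspace(right))
    if left:
        parent.insert(index, mspace(left))


glyph = ET.fromstring(GLYPH.strip())
mrow = ET.fromstring(FORMULA)
expand_operator(mrow, 1, glyph)
print([child.tag.split("}")[1] for child in mrow])
```

If no spacing property is declared for the detected form, the sketch simply adds no spacer and leaves the renderer's default in place.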
Gerrit

--
Gerrit Imsieke
Geschäftsführer / Managing Director
le-tex publishing services GmbH
Weissenfelser Str. 84, 04229 Leipzig, Germany
Phone +49 341 355356 110, Fax +49 341 355356 510
[hidden email], http://www.le-tex.de
Registergericht / Commercial Register: Amtsgericht Leipzig
Registernummer / Registration Number: HRB 24930
Geschäftsführer / Managing Directors: Gerrit Imsieke, Svea Jelonek, Thomas Schmidt