stripping text in angle brackets


Michael Sperberg-McQueen
Christopher Currie asks for the name of an editor which can delete all
the text from one angle bracket to the next, as I claimed any editor
'worth its salt' could do.  He also asks how to tell the difference
between angle brackets that delimit markup and angle brackets that
belong in the text.  Fair enough.

One question at a time.  First:  the editor I use most under MS-DOS,
Kedit, does the deletion of all text between angle brackets this way:

    set arbchar on $
    change /<$>// * *

which works in all of my files because I don't ever use line ends within
markup.  If I did, I would use the commands above and then either handle
the rest semi-manually (a macro would help) or try once more (and
possibly succeed) to decipher the manual's discussion of changing the
editor settings to ignore line ends within search strings.
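For readers without Kedit, the same single-line deletion can be approximated with a regular expression.  What follows is only a sketch in Python, not a transcription of Kedit's matching rules:  the names TAG_ON_ONE_LINE and strip_tags are invented for illustration, and the pattern deliberately refuses to cross a line end, so any markup split over two lines is left for manual handling, as described above.

```python
import re

# One '<...>' span that opens and closes on the same line.
# Excluding '<', '>' and newline keeps each match to a single tag.
TAG_ON_ONE_LINE = re.compile(r"<[^<>\n]*>")


def strip_tags(text: str) -> str:
    """Delete every '<...>' span that does not cross a line end."""
    return TAG_ON_ONE_LINE.sub("", text)


if __name__ == "__main__":
    sample = "<p>Some <emph>marked-up</emph> text.</p>\n<note\nplace=foot>left alone"
    # The first line loses its tags; the tag broken across lines 2-3 is untouched.
    print(strip_tags(sample))
```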

Handling angle brackets which do not delimit markup is a slightly
trickier question, which I skated over quickly in my previous posting;
it's a lot like handling backslashes which don't introduce keywords in
TeX, or dots in column one which don't introduce control words in Script
or troff:  in principle not too difficult though apt to produce
occasional surprises.

The quickest answer is to say there need not be any angle brackets in
the file except those which delimit markup (and that is the situation I
was quietly assuming, for simplicity, in my note).  SGML allows a simple
method for masking non-delimiter angle brackets:  the entities &lt; and
&gt; will be interpreted as less-than and greater-than signs.
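If the file follows that convention, stripping is a two-step affair:  delete the tags first, and only then turn the entity references back into literal signs (doing it the other way round would create new angle brackets for the deletion to eat).  A rough Python sketch, assuming the only entities of interest are &lt; and &gt; and that every remaining angle bracket delimits markup; the function name strip_markup is made up for the example.

```python
import re

# Assumption: every literal angle bracket in the input delimits markup,
# because angle brackets in the text were entered as &lt; and &gt;.
TAG = re.compile(r"<[^<>]*>")


def strip_markup(text: str) -> str:
    """Remove tags, then restore masked less-than and greater-than signs."""
    without_tags = TAG.sub("", text)
    # Entities are expanded only after the tags are gone, so the restored
    # signs can never be mistaken for markup delimiters.
    return without_tags.replace("&lt;", "<").replace("&gt;", ">")


if __name__ == "__main__":
    print(strip_markup("<p>if a &lt; b and b &gt; c</p>"))
```

Other entity references are passed through untouched in this sketch.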

Undeniably, the matter becomes somewhat more complicated if we are
unable to ensure that the only angle brackets in our file are those
around markup.  It is *not* impossible to do it mechanically, though a
full exposition of the rules for SGML parsing is beyond the scope of
this note.  For any file of fewer than about twenty typed pages in
length, I personally would use a simple change-with-confirmation command
to ensure I didn't accidentally trash any document content, unless
I had access to an SGML processor, in which case I would use the
native capabilities of that processor to strip out the markup.  (If,
that is, I were for some reason determined not to process the file
in some interesting way but merely to strip out all the markup.  I
find it hard to become really involved in the problem of stripping
out all markup partly because I find it so hard to imagine a situation
in which that would be my goal.)
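The change-with-confirmation approach can be imitated outside an editor as well.  The Python sketch below is an illustration only:  strip_with_confirmation and its confirm parameter are hypothetical names, and the pattern merely proposes candidates; it makes no attempt at real SGML parsing.  Each candidate span is shown to a caller-supplied function, which decides whether it is markup to delete or document content to keep.

```python
import re
from typing import Callable

# Candidate spans only; whether each one really is markup is decided
# by the confirm function, not by the pattern.
CANDIDATE = re.compile(r"<[^<>]*>")


def strip_with_confirmation(text: str, confirm: Callable[[str], bool]) -> str:
    """Delete each '<...>' span only if confirm(span) returns True."""

    def decide(match: re.Match) -> str:
        span = match.group(0)
        return "" if confirm(span) else span

    return CANDIDATE.sub(decide, text)


if __name__ == "__main__":
    # Stand-in for a human answering yes or no at each stop.
    known_tags = {"<p>", "</p>"}
    print(strip_with_confirmation("<p>a < b, so b > a</p>", lambda s: s in known_tags))
```

For interactive use, the confirm argument could instead prompt at the terminal, for example with input().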

Michael Sperberg-McQueen