Eric Lease Morgan-2
I have written a little hack I call tei2json.pl, which allows one to summarize the structure of Early English poetry and prose (TEI documents). From the blog posting: [1]

  Some of my work is really an effort to reverse engineer good work
  done by the late Sebastian Rahtz. For example, Mr. Rahtz cached a
  version of the TCP corpus, transformed each item into a number of
  different formats, and put the whole thing on GitHub. As a
  part of this project, he created metadata files enumerating what
  TEI elements were in each file and what attributes were
  associated with each element. The result was an HTML display
  allowing the reader to quickly see how many bibliographies an
  item may have, what languages may be present, how long the
  document was measured in page breaks, etc. One of my goals is/was
  to do something very similar.

For example:

  * http://dh.crc.nd.edu/tmp/early-print/A00002.json
  * http://dh.crc.nd.edu/tmp/early-print/A00002.htm

  * http://dh.crc.nd.edu/tmp/early-print/A00395.json
  * http://dh.crc.nd.edu/tmp/early-print/A00395.htm

One can temporarily see additional sample input and output files online. [2]
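
For readers who want a feel for the kind of summary described in the excerpt above, here is a minimal sketch written in Python rather than Perl. It is not tei2json.pl itself. The function names (summarize_tei, local_name), the layout of the output, and the sample fragment are all assumptions made for illustration. The sketch simply counts how often each element occurs and which attributes appear on it, then prints the result as JSON.

```python
import json
import xml.etree.ElementTree as ET
from collections import Counter, defaultdict


def local_name(tag):
    """Drop a '{namespace}' prefix, if any, from an element or attribute name."""
    return tag.rsplit("}", 1)[-1]


def summarize_tei(xml_text):
    """Count each element and the attributes seen on it (hypothetical helper)."""
    elements = Counter()
    attributes = defaultdict(Counter)
    for node in ET.fromstring(xml_text).iter():
        name = local_name(node.tag)
        elements[name] += 1
        for attr in node.attrib:
            attributes[name][local_name(attr)] += 1
    return {
        "elements": dict(elements),
        "attributes": {name: dict(counts) for name, counts in attributes.items()},
    }


# A tiny made-up TEI fragment, for illustration only (not from the TCP corpus).
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text xml:lang="eng">
    <body>
      <pb n="1"/>
      <lg><l>first line</l><l>second line</l></lg>
      <pb n="2"/>
    </body>
  </text>
</TEI>"""

summary = summarize_tei(SAMPLE)
print(json.dumps(summary, indent=2, sort_keys=True))
```

In this sketch the count for the pb element stands in for "how long the document was measured in page breaks", and the lang attribute recorded on text hints at what languages may be present. A fuller version would also record attribute values, not just attribute names.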

[1] blog posting - http://blogs.nd.edu/emorgan/2017/01/tei2json/
[2] temporary input & output - http://dh.crc.nd.edu/tmp/early-print/

Eric Lease Morgan
University of Notre Dame