The CSL are truly electronic publishers: not only do we produce material online, but
our whole publishing system is an electronic publishing system, designed to maximize
the information we hold, and allow our products to be recast to any existing (and
future) platform with ease. The CSL create, edit and publish our texts using the
gold standard of electronic publishing: TEI-conformant XML. Whereas other
publishers — who may claim to be electronic publishers — work within traditional
formats such as word-processing applications right through to print, and then
produce electronic versions as a by-product of print, we produce a single-source
XML representation of the text.
XML is a platform-neutral mark-up language that is used to encode a text according
to the information in the text, rather than the formatting of the text. Thus, whilst a
word-processing file may mark some text as bold, some as italic and some
as oblique, the CSL would by comparison mark text as a commentary lemma, a
section of foreign text, or a pun translation. Formatting information is only added at
the point of delivery, of which more below. The principle of keeping information
separate from formatting has long been recognised as a key to true
single-source publishing. Although, for the ease of authors, we produce
stylized proofs at regular points during text preparation, the source remains independent
of these proofs.
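The separation of information from formatting can be illustrated with a minimal Python sketch. The element names (`lemma`, `foreign`, `pun`) and the `STYLES` table are hypothetical, chosen for illustration only; they are not the CSL's actual TEI schema or stylesheet.

```python
import xml.etree.ElementTree as ET

# Hypothetical source fragment: elements say what the text *is*, not how it looks.
SOURCE = (
    "<p>The word <lemma>dharma</lemma> glosses "
    "<foreign>dharmaḥ</foreign> as <pun>duty / nature</pun>.</p>"
)

# Hypothetical delivery-time stylesheet: formatting is decided only here.
STYLES = {
    "lemma": ("<b>", "</b>"),
    "foreign": ("<i>", "</i>"),
    "pun": ("<u>", "</u>"),
}

def render(elem, styles):
    """Recursively render an element, wrapping it as the stylesheet dictates."""
    open_tag, close_tag = styles.get(elem.tag, ("", ""))
    parts = [open_tag, elem.text or ""]
    for child in elem:
        parts.append(render(child, styles))
        parts.append(child.tail or "")  # text following the child element
    parts.append(close_tag)
    return "".join(parts)

root = ET.fromstring(SOURCE)
html = render(root, STYLES)
print(html)
```

Swapping `STYLES` for a different table changes the appearance of the output without touching the source, which is the point of the principle.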
The information encoded is extensive, and covers more than may be visible in the
traditional publishing format — the book. In this, we adhere to another principle of
electronic publishing: maximal informativeness. This is particularly important in the
case of bilingual (or multilingual) texts such as we handle: whilst a book may align the languages
at the page- or paragraph-level, the CSL’s XML source aligns the languages at a much
more detailed level: usually at the level of the verse, but occasionally right down to the
clause or even a couple of words. Most CSL texts will be produced with three different
top-level texts being aligned: the English translation, the Sanskrit in Roman, and the
Sanskrit in Devanāgarī. However, drama texts will contain additional alignments for chāyā,
and all texts may contain alignments for śleṣa or for variant readings. The first Sanskrit text
to be handled in the XML is the Romanized version because this adheres to the principle
of maximal informativeness: the Romanized version is marked with sandhi disambiguation
(including some innovations of the CSL) and samāsa divisions. The Romanized version is
then converted automatically to Devanāgarī, and proofed for any specific quirks that may
have escaped the conversion. An additional, editorial benefit of using XML and this level of
mark-up is that we have a very high level of automated editorial checks including stanza
numbering, quotation mark balancing (which is very important in Sanskrit texts like the
Mahābhārata or Pañcatantra with many nested narratives) and so on.
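Two of the automated checks mentioned above, stanza numbering and quotation mark balancing, might be sketched in Python as follows. This is an illustrative sketch, not the CSL's own tooling: the function names are hypothetical, and the quote check makes the simplifying assumption that every ’ is a closing quotation mark, so apostrophes would need separate handling in real text.

```python
# Opening quotation marks mapped to their closing partners.
PAIRS = {"“": "”", "‘": "’"}
CLOSERS = set(PAIRS.values())

def check_stanza_numbers(numbers):
    """Return (position, previous, found) for each break in consecutive numbering."""
    problems = []
    for i in range(1, len(numbers)):
        if numbers[i] != numbers[i - 1] + 1:
            problems.append((i, numbers[i - 1], numbers[i]))
    return problems

def check_quotes(text):
    """Return a list of problems; empty if quotes open and close in nested order."""
    stack = []      # currently open quotes as (character, position)
    problems = []
    for pos, ch in enumerate(text):
        if ch in PAIRS:
            stack.append((ch, pos))
        elif ch in CLOSERS:
            if stack and PAIRS[stack[-1][0]] == ch:
                stack.pop()  # correctly closes the innermost open quote
            else:
                problems.append(f"unexpected {ch} at {pos}")
    # Anything still on the stack was never closed.
    problems.extend(f"unclosed {ch} at {pos}" for ch, pos in stack)
    return problems

nested = "“The jackal said: ‘Listen.’ Then he left.”"
print(check_stanza_numbers([1, 2, 4, 5]))
print(check_quotes(nested))
```

A stack suits nested narratives because a story within a story must close its inner quotation before the outer one can close.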
Once a text has been fully prepared in XML, we then produce output using a range of
scripts. At present, the major outputs are TeX for print and a concordancing
indexer/display engine for the online corpus. As styles are applied at this point only, and
automatically by a computer, the layout of any of our outputs is always consistent: there
is no danger of editorial lapses resulting in inconsistently formatted text.
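As a rough sketch of single-source output, the Python below derives both a TeX rendering and a small concordance index from one XML fragment. The `verse` and `l` elements, the `\verse` macro and both function names are assumptions made for illustration; the actual CSL scripts and markup are not shown in this article.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical single source; both outputs below are derived from it.
SOURCE = (
    "<text>"
    '<verse n="1.1"><l>the king spoke to the sage</l></verse>'
    '<verse n="1.2"><l>the sage answered the king</l></verse>'
    "</text>"
)

def to_tex(root):
    """Emit one line per verse using a hypothetical two-argument verse macro."""
    return "\n".join(
        "\\verse{%s}{%s}" % (v.get("n"), v.findtext("l"))
        for v in root.iter("verse")
    )

def to_concordance(root):
    """Map each word to the list of verse numbers in which it occurs."""
    index = defaultdict(list)
    for v in root.iter("verse"):
        for word in v.findtext("l").split():
            if v.get("n") not in index[word]:  # record each verse once per word
                index[word].append(v.get("n"))
    return dict(index)

root = ET.fromstring(SOURCE)
tex = to_tex(root)
concordance = to_concordance(root)
print(tex)
print(concordance["sage"])
```

Because every verse passes through the same function, each output is formatted identically by construction.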
About the Author
Stuart Brown is the director of OxfordML, a digital publishing and XML consultancy. He is responsible for developing an XML-based editorial and publishing system as well as the corpus search facility on this site.