The CSL is a truly electronic publisher: not only do we produce material online, but our whole publishing system is electronic, designed to maximize the information we hold and to allow our products to be recast for any existing (or future) platform with ease. We create, edit and publish our texts using the gold standard of electronic publishing: TEI-conformant XML. Other publishers, including some who claim to be electronic publishers, work within traditional formats such as word-processing applications right through to print, and then produce electronic versions as a by-product of print. We instead produce a single-source XML representation of the text.

XML is a platform-neutral mark-up language that is used to encode a text according to the information in the text, rather than the formatting of the text. Thus, whilst a word-processing file may mark some text as bold, some as italic and some as oblique, the CSL by contrast marks text as a commentary lemma, a section of foreign text, or a pun translation. Formatting information is only added at the point of delivery, of which more below. The principle of keeping information separate from formatting has long been recognised as key to true single-source publishing. Although, for the ease of authors, we produce stylized proofs at regular points during text preparation, the source remains independent of these proofs.
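As a minimal sketch of this distinction, the Python fragment below parses a small piece of semantically marked-up XML and lists each span by what it is rather than by how it looks. The element and attribute names (`seg`, `foreign`, `type="lemma"`, `type="pun-translation"`) are illustrative assumptions in the spirit of TEI, not the CSL's actual schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: the tag and attribute names are assumptions
# made for illustration, not the CSL's actual schema.
SOURCE = """
<p>
  <seg type="lemma">dharma</seg> is glossed here, alongside
  <foreign xml:lang="sa">dharma-kṣetre</foreign> and
  <seg type="pun-translation">field of duty</seg>.
</p>
"""

def semantic_spans(xml_text):
    """Return (category, text) pairs for every marked-up span, in document order."""
    root = ET.fromstring(xml_text)
    spans = []
    for el in root.iter():
        if el.tag == "seg":
            # The category comes from the type attribute, not from any styling.
            spans.append((el.get("type"), el.text))
        elif el.tag == "foreign":
            spans.append(("foreign", el.text))
    return spans

spans = semantic_spans(SOURCE)
for category, text in spans:
    print(f"{category}: {text}")
```

Nothing in the fragment says whether a lemma is bold or a foreign phrase is italic; that decision can be made later, and differently, for each output.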

The information encoded is extensive, and covers more than may be visible in the traditional publishing format, the book. In this, we adhere to another principle of electronic publishing: maximal informativeness. This is particularly important in the case of bilingual (or multilingual) texts such as we handle: whilst a book may align the languages at the page or paragraph level, the CSL’s XML source aligns the languages at a much more detailed level, usually that of the verse, but occasionally right down to the clause or even a couple of words. Most CSL texts will be produced with three different top-level texts aligned: the English translation, the Sanskrit in Roman, and the Sanskrit in Devanāgarī. However, drama texts will contain additional alignments for chāyā, and all texts may contain alignments for śleṣa or for variant readings. The first Sanskrit text to be handled in the XML is the Romanized version, because this adheres to the principle of maximal informativeness: the Romanized version is marked with sandhi disambiguation (including some innovations of the CSL) and samāsa divisions. The Romanized version is then converted automatically to Devanāgarī, and proofed for any specific quirks that may have escaped the conversion. An additional, editorial benefit of using XML and this level of mark-up is that we can run a very high level of automated editorial checks, including stanza numbering, quotation mark balancing (which is very important in Sanskrit texts like the Mahābhārata or Pañcatantra, with their many nested narratives) and so on.
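The two checks named above, stanza numbering and quotation mark balancing, can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the CSL's own tooling: the `<lg n="...">` stanza encoding and the function names are hypothetical, quotations are checked within a single stanza only (a real check would need to follow speech across stanzas), and the sample avoids apostrophes, which share a character with the closing single quotation mark.

```python
import xml.etree.ElementTree as ET

# Hypothetical source: <lg n="..."> for a numbered stanza is an assumed encoding.
SOURCE = """
<div>
  <lg n="1"><l>He said: “Listen to what the crow said: ‘Beware.’”</l></lg>
  <lg n="2"><l>She replied: “Then the jackal said: ‘Wait.</l></lg>
  <lg n="4"><l>A stanza with no speech at all.</l></lg>
</div>
"""

OPENERS = {"“": "”", "‘": "’"}   # opening mark -> the closing mark it requires
CLOSERS = set(OPENERS.values())

def unbalanced_quotes(text):
    """Return True if the curly quotation marks in text are not properly nested."""
    stack = []
    for ch in text:
        if ch in OPENERS:
            stack.append(OPENERS[ch])
        elif ch in CLOSERS:
            # A closer must match the most recently opened quotation.
            if not stack or stack.pop() != ch:
                return True
    return bool(stack)  # anything left open is also an error

def check(xml_text):
    """Collect numbering gaps and quotation problems, stanza by stanza."""
    problems = []
    expected = 1
    for lg in ET.fromstring(xml_text).iter("lg"):
        n = int(lg.get("n"))
        if n != expected:
            problems.append(f"stanza {n}: expected number {expected}")
        expected = n + 1
        if unbalanced_quotes("".join(lg.itertext())):
            problems.append(f"stanza {n}: unbalanced quotation marks")
    return problems

problems = check(SOURCE)
for problem in problems:
    print(problem)
```

The stack is what makes nested narratives tractable: each closing mark is compared with the innermost quotation still open, so a story within a story within a story is checked the same way as a single speech.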

Once a text has been fully prepared in XML, we then produce output using a range of scripts. At present, the main output targets are TeX for print and a concordancing indexer/display engine for the online corpus. As styles are applied at this point only, and automatically by a computer, the layout of any of our outputs is always consistent: there is no danger of editorial lapses resulting in inconsistently formatted text.
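A hedged sketch of how styles might be applied only at the point of output: the Python fragment below renders one XML source line to both TeX and HTML by swapping a style table. The element names, the style tables and the `render` function are assumptions for illustration, not the CSL's actual scripts, and escaping of characters that are special in TeX or HTML is omitted for brevity.

```python
import xml.etree.ElementTree as ET

# Hypothetical single-source line; the element names are assumptions.
SOURCE = "<l>The word <foreign>dharma</foreign> recurs in <title>Mahābhārata</title>.</l>"

# One style table per output. Formatting lives here, not in the source.
TEX_STYLES = {
    "foreign": ("\\textit{", "}"),
    "title": ("\\textsc{", "}"),
}
HTML_STYLES = {
    "foreign": ('<i class="foreign">', "</i>"),
    "title": ("<cite>", "</cite>"),
}

def render(el, styles):
    """Recursively serialise el, wrapping each known element in its style."""
    before, after = styles.get(el.tag, ("", ""))
    inner = el.text or ""
    for child in el:
        # A child's tail is the text that follows it inside the parent.
        inner += render(child, styles) + (child.tail or "")
    return before + inner + after

root = ET.fromstring(SOURCE)
tex = render(root, TEX_STYLES)
html = render(root, HTML_STYLES)
print(tex)
print(html)
```

Because every foreign phrase passes through the same table entry, changing a style means editing one line of the table, and the change reaches every occurrence in that output.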


About the Author
Stuart Brown is the director of OxfordML, a digital publishing and XML consultancy. He is responsible for developing an XML-based editorial and publishing system as well as the corpus search facility on this site.