Next: TEX Interface Up: idoc.itx (ITRANS doc) Previous: itrans Mechanism


Input Format

itrans makes use of an IFM (Indian language Font Metric) file, an ASCII file that describes how to generate the Indian language characters from the basic characters available in the font.

Sidenote: The IFM file format is an itrans-specific concept. It allows all character composition directives to be loaded at runtime, making it easy to support many different Indian languages. All IFM files end in the suffix .ifm.

itrans scans through the input text and copies everything to the output unchanged, except for portions between marker words, such as #marathi and #endmarathi. Some eight to ten different marker words are available; see section 2.4 for more information. All English text between these words is mapped into Indian language characters, based on the transliteration map in the IFM file.

At the beginning of the input file, the user has to specify the IFM file and the name of the TeX, PostScript, or HTML command that changes the font to the Indian language font. For example, if the IFM file is named dvnc.ifm, and the font is available through the \devnf TeX command, the following two lines should be present in the input file:

#marathiifm=dvnc

#marathifont=\devnf

This also assumes the user will be using the markers #marathi and #endmarathi; see section 2.4 for all the other language markers and commands.

Once the above initialization is made, the #marathi marker specifies the beginning of the Marathi transliterated text, and makes use of the specified IFM file (dvnc.ifm). At that point, itrans also outputs the command (\devnf) specified in the #marathifont directive. This command should change the font to the Devanagari font, and may do other things, such as change the baselineskip length.
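The flow described above can be sketched in a few lines of Python. This is only an illustration, not how itrans is implemented: the names process and toy_transliterate are hypothetical, the bracket-wrapping stand-in replaces the real IFM-driven mapping, and it is an assumption that the directive lines are consumed rather than copied to the output.

```python
import re

def toy_transliterate(text):
    """Stand-in for the IFM-driven mapping; the real map lives in the .ifm file."""
    return "[" + text.strip() + "]"

def process(source, lang="marathi"):
    """Copy text through unchanged, except between #<lang> and #end<lang>."""
    settings = {}

    # Collect '#<lang>ifm=' and '#<lang>font=' directives.
    # Assumption: the directive lines themselves do not reach the output.
    def grab(match):
        settings[match.group(1)] = match.group(2)
        return ""

    directive = re.compile(r"#%s(ifm|font)=(\S+)[ \t]*\n?" % re.escape(lang))
    body = directive.sub(grab, source)

    # At each start marker, emit the font command, then the mapped text.
    # The separating space is an assumption made for readability.
    def render(match):
        return settings.get("font", "") + " " + toy_transliterate(match.group(1))

    block = re.compile(r"#%s\b(.*?)#end%s\b" % (lang, lang), re.DOTALL)
    return block.sub(render, body), settings
```

Calling process on an input that starts with the two directive lines shown earlier returns the surrounding text untouched, with each marked region replaced by the font command followed by the stand-in output.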

Note that both the TeX interface and the Direct Text interface (both PostScript and direct Text HTML modes) follow identical input text requirements. For further examples, see the sample documents provided. All transliterated files have been given the file extension .itx. (Older ITRANS versions also used the .ips extension for direct PostScript output, but since ITRANS version 5.0 the #output command is available, which allows specifying the output mode in the input file itself.)

The ``#include='' command

Itrans accepts an ``include'' filename command in the input. Syntax:

#include=<filename>

This command can appear anywhere in the input document, and Itrans behaves as if the contents of that file were actually present at that point in the document.

This ``include'' command can be nested across multiple files (up to a compiled-in maximum).
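As a rough sketch of the behavior described above, nested inclusion with a depth limit could look like the following Python. The names expand_includes, read_file, and MAX_DEPTH are hypothetical, the limit of 10 is a placeholder (the real maximum is compiled into itrans and is not stated here), and treating the filename as a whitespace-delimited token is an assumption.

```python
import re

MAX_DEPTH = 10  # hypothetical value; the real limit is compiled into itrans

# Assumption: the filename is the run of non-space characters after '='.
INCLUDE = re.compile(r"#include=(\S+)")

def expand_includes(text, read_file, depth=0):
    """Replace each '#include=<filename>' with that file's expanded contents."""
    if depth > MAX_DEPTH:
        raise RecursionError("#include nesting exceeds the maximum depth")

    def splice(match):
        # Included files may contain further #include commands.
        return expand_includes(read_file(match.group(1)), read_file, depth + 1)

    return INCLUDE.sub(splice, text)
```

Here read_file is any callable that maps a filename to its text, so the sketch can be tried with a dictionary of strings instead of real files.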

The ``#output='' command

This command should be the first ITRANS command in any input file, that is, it should appear before any other #<command>. It directs ITRANS to produce a particular kind of output: TeX, PostScript, or direct Text HTML output. So, instead of passing arguments to ITRANS (such as -P or -7 or -8 or -U), users can include this command in the input file itself, making it clear what the input file is to be used for. The valid options for this command are:
#output=HTML_7
#output=HTML_8
#output=UTF_8
#output=PostScript
#output=TeX

The ``#endfont='' command

This command is generally of use in HTML output modes only, though if needed, it could be used in any mode.

This command allows the user to specify a string that will be echoed to the output file whenever any #end<language> is seen.

For example, this is useful in HTML documents, where the end of every Indic script passage needs a closing </FONT> tag. This can be done automatically by specifying:

#endfont=</FONT>

This, in conjunction with something like:

#hindifont=<FONT FACE=name SIZE=size>

makes it easy to use ITRANS in HTML output mode.

Note that there is a single #endfont command, and it applies to all language markers.
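To illustrate how the font command and the #endfont string bracket each marked region, here is a small Python sketch. It is not the itrans implementation: render_html is a hypothetical name, the Indic text is deliberately left untransliterated, and it is an assumption that each directive takes the rest of its line as its value.

```python
import re

def render_html(source, lang="hindi"):
    """Swap #<lang> for the font string and #end<lang> for the #endfont string."""
    font, endfont, kept = "", "", []
    for line in source.splitlines():
        if line.startswith("#%sfont=" % lang):
            font = line.split("=", 1)[1]      # value is the rest of the line
        elif line.startswith("#endfont="):
            endfont = line.split("=", 1)[1]
        else:
            kept.append(line)
    body = "\n".join(kept)
    # Function replacements avoid any special handling of backslashes.
    body = re.sub(r"#end%s\b" % lang, lambda m: endfont, body)
    body = re.sub(r"#%s\b" % lang, lambda m: font, body)
    return body
```

With the two directives from the example above, a region written as #hindi ... #endhindi comes out wrapped in the opening FONT tag and the closing </FONT>.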


#<lang>, #end<lang>, and ## markers

You can use any of these marker sets to delimit the Indian language text. The marker names indian/marathi/hindi/tamil do not do anything by themselves; each one only selects which #<name>ifm and #<name>font settings are used to load the IFM file and output the font-changing command string. So use whichever set suits your needs best: any set can be used for any language supported by ITRANS, because the marker only enters ITRANS mode and the actual language is determined by the IFM file. Since the marker words are all long, a shorter marker (##) is also available. Short markers are enabled by default, though it is possible to turn them off.

  1. indian marker.

    To set the IFM file name: #indianifm=XXX.ifm

    To set the font command name: #indianfont=YYY

    Start Marker: #indian

    End Marker: #endindian

  2. hindi marker.

    To set the IFM file name: #hindiifm=XXX.ifm

    To set the font command name: #hindifont=YYY

    Start Marker: #hindi

    End Marker: #endhindi

  3. sanskrit, marathi, tamil, telugu, bengali, gujarati, roman, kannada, gurmukhi.

    Just as for hindi and indian, there are markers for all these languages. Follow the above examples, replacing indian with <language> as required.

  4. ## short marker.

    These markers are activated by default. To turn them off, use the #ignoreshortmarkers command.


``#useshortmarkers'' and ``#ignoreshortmarkers''

The short marker is a toggle marker. When scanning text in non-ITRANS mode (non-Indic text), a short marker resumes ITRANS processing, using whichever language marker was last encountered. The next short marker then ends processing of the Indian language text.

Thus, if the input text has the following input:


#hindi <some text> #endhindi

<some more text>

## <short marker text> ##


then the first short marker seen above will be considered equivalent to #hindi, since that was the last ITRANS marker seen in the text at that point.

If the short marker is the first marker seen in the text, i.e., no other ITRANS marker was seen up to that point, then the ## short marker will be taken to be equivalent to #indian.
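The toggle rule can be sketched as a small state machine in Python. This is an illustration under stated assumptions, not the itrans source: segments is a hypothetical name, the language list is taken from the marker names above, a ## seen inside an open #<lang> block is assumed to close it, and #ignoreshortmarkers is not modelled.

```python
import re

LANGS = ["indian", "hindi", "marathi", "sanskrit", "tamil", "telugu",
         "bengali", "gujarati", "roman", "kannada", "gurmukhi"]
_ALT = "|".join(LANGS)
TOKEN = re.compile(r"(##|#end(?:%s)\b|#(?:%s)\b)" % (_ALT, _ALT))

def segments(text):
    """Return (language, chunk) pairs; language is None outside ITRANS mode."""
    current = None        # language of the open block, or None
    last_lang = "indian"  # meaning of '##' before any long marker is seen
    result = []
    for piece in TOKEN.split(text):
        if piece == "##":
            # Toggle: open with the last language seen, or close the block.
            current = None if current else last_lang
        elif piece.startswith("#end") and piece[4:] in LANGS:
            current = None
        elif piece.startswith("#") and piece[1:] in LANGS:
            current = last_lang = piece[1:]
        elif piece.strip():
            result.append((current, piece.strip()))
    return result
```

Run on the example above, the text between the two ## markers is tagged with hindi, because #hindi was the last long marker seen. Run on input whose first marker is ##, the text is tagged with indian.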

``#usecsx''

When this command is used, the CS/CSX input encoding is accepted along with the ITRANS encoding. See the icsx.itx document for more information on CS/CSX support in ITRANS.

``#endwordvowel=''

See section 9.5 for information on this command.



2009-12-04
ITRANS Home Page: http://www.aczoom.com/itrans/