As humans, we prefer to view and edit text files on a line-by-line basis. Once we think a line of text is long enough we hit “return” (in the text editor) to signal the end of that line. Behind the scenes, your text editor will interpret that as an instruction to add a newline character at the point where you decide to break a line.
If only it were that simple: the trouble is that different operating systems have differing notions of what constitutes a newline character. To make matters worse, Windows treats newline characters differently depending on whether a file is opened in so-called binary mode or text mode. The result is that, depending on the host operating system, lines in a text file can be terminated by varying combinations of two characters: carriage return (ASCII/Unicode character 13) and line feed (ASCII/Unicode character 10), denoted by \r and \n respectively.
Clearly, to be system-independent, TeX needs a way to deal with the vagaries introduced by the different characters used to terminate lines in the text files it has to read and process.
You may, or may not, be surprised to learn that TeX engines (including LuaTeX and XeTeX) read input files a line at a time: they don’t read the entire text file into memory. Even though most text files processed by TeX engines are minuscule compared to the available memory on modern devices, each line in the file is individually read and stored in a small internal buffer. But, of course, TeX’s process of reading and storing a line has some additional twists.
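\endlinechar command
When TeX reads another line of text from an input file it performs two “housekeeping tasks”: (1) it strips off the platform-dependent line ending, together with any trailing space characters; (2) it appends its own end-of-line character, as determined by \endlinechar.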
These two steps happen before TeX starts to scan the characters contained in the line itself: think of them as preparation for the next stage of processing (scanning). By the end of this initial stage of the line-reading process TeX has stripped off all platform-dependent line endings (and any trailing whitespace), so how will TeX detect where that line ends? It has one more “trick” up its sleeve: the \endlinechar command.
To avoid the problem of platform-dependent newline characters, TeX introduces \endlinechar, a user-definable integer parameter: TeX appends the character with that code to the very end of each line of text it has just read from a file. Note again that this happens before TeX starts scanning the characters: it is the final step in TeX’s “housekeeping” before it is ready to start reading (scanning) the actual characters contained in the line.
TeX will use the value stored in \endlinechar to add its own end-of-line terminator if, and only if, \endlinechar is appropriately defined: in Knuth’s TeX that means it has to have a value in the range 0 to 255 (greater than -1 and less than 256). Typically, \endlinechar is assigned the value 13: the carriage return character, usually denoted by \r in programming literature.
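To check these defaults for yourself, the following minimal sketch typesets the current value of \endlinechar and the category code of character 13. It assumes a standard LaTeX format in which neither setting has been changed; the surrounding text is purely illustrative.

```latex
\documentclass{article}
\begin{document}
% \the typesets the current value of an integer parameter.
Value of endlinechar: \the\endlinechar

% \catcode13 is the category code currently assigned to character 13.
Category code of character 13: \the\catcode13
\end{document}
```

Under those assumptions the first line should show 13 and the second should show 5, the "end of line" category code.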
But if you write \endlinechar=-1 somewhere within your input then, from the next line TeX reads from a file onwards, it will not add any terminator to the end of a line. Consequently, your input will be treated as one long continuous string of text until you reset \endlinechar to an appropriate value, typically 13 (\r):
\endlinechar=13
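As a small illustration of the effect, the sketch below typesets the same three words with and without TeX's end-of-line terminator. The words themselves are placeholders. One detail is worth noting: the line that restores \endlinechar is itself read without a terminator, so there is nothing at the end of that line to stop TeX scanning the number. The sketch therefore adds \relax, a common way to end a number explicitly.

```latex
\documentclass{article}
\begin{document}
% Default setting: each line end becomes a space.
one
two
three

% Switch the terminator off. This line was read with the old setting,
% so the space produced by its line end still terminates the number -1.
\endlinechar=-1
one
two
three
% Restore the default; \relax terminates the number.
\endlinechar=13\relax

four
five
\end{document}
```

The first paragraph reads "one two three", the second reads "onetwothree" because its lines were joined with nothing between them, and the third reads "four five" once the default has been restored.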
One of TeX’s 16 category codes (value 5) is reserved to identify the “end of line” character. Usually that is the character inserted by \endlinechar, which happens if (and only if) \endlinechar is set to an appropriate value.
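The special treatment of line ends comes from the category code of the inserted character, not from \endlinechar itself. A minimal sketch of this: setting \endlinechar to 32 makes TeX append an ordinary space (category code 10) instead of character 13. The group and the sample words are illustrative choices only.

```latex
\documentclass{article}
\begin{document}
\begingroup
\endlinechar=32 % append a space (category code 10) instead of character 13
alpha
beta

gamma
\endgroup
\end{document}
```

Here the three words are set as a single paragraph, "alpha beta gamma". The blank line contains only the appended space, which TeX skips at the start of a line, so no paragraph break is produced. Leaving the group restores the previous value of \endlinechar.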
Although these details are quite low-level they will be of interest to anyone who wants to explore writing macros which deal with reading lines of text.
In more detail, the two housekeeping steps are:
(1) TeX strips off any newline characters (\r and \n) added by your text editor when the file was saved. In addition, it removes any space characters at the end of the line.
Aside: One of LuaTeX’s source code files, the one which has code to perform this stripping of spaces, contains the following note:
(Cited in the file luatex.c) “David Fuchs mentions that this [space] stripping was done to ensure portability of TeX documents given the padding with spaces on fixed-record "lines" on some systems of the time, e.g., IBM VM/CMS and OS/360.”
(2) TeX appends the character whose code is stored in \endlinechar. Usually \endlinechar is set to the carriage return character (\r), which means that the character added in step (2) is usually character 13 (\r); but, of course, you can set \endlinechar to another value to achieve special effects via macro programming.
When TeX later encounters \r (character code 13) at the end of its internal buffer it will, as usual, check its category code in order to decide what to do with it. Normally TeX converts the end-of-line character (\r, category code 5) into a space character: this is how end-of-line characters become spaces. If the line is blank, so that the end-of-line character is the first thing TeX scans on that line, it is converted into a \par token instead.
The following graphic gives a visual summary of steps (1) and (2): stripping newline characters and trailing space characters and inserting \endlinechar ready for the task of scanning the input.
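The way a line end turns into a space can be seen in a short sketch like the one below, which uses placeholder text. Ending a line with a comment character hides the end-of-line character from TeX, so no space is produced at that point.

```latex
\documentclass{article}
\begin{document}
% A single line end becomes one space:
Hello
world.

% A trailing % comments out the end-of-line character, so no space appears:
Hel%
lo world.
\end{document}
```

Both paragraphs are typeset as "Hello world." and the blank line between them produces the paragraph break.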