Introduction

Perhaps you’ve heard of, or read about, something called “TeX”, “LaTeX” or “pdfLaTeX”—or any one of the multitude of similar-sounding terms—but you aren’t quite sure what they actually mean? If so, then this article is for you: a non-technical background to explain the many variations of TeX-based software—what they mean and why they exist. For the moment we’ll simplify the discussion by using the generic term “TeX” but later we’ll give the context and meaning of its many derivatives and variations: LaTeX, pdfTeX, pdfLaTeX, XeTeX, XeLaTeX, LuaTeX, and LuaLaTeX. You might have seen some of those terms within Overleaf’s menu where it allows you to select your preferred “Compiler”:

Choosing a LaTeX compiler in Overleaf

Unless you are a seasoned TeX user, or familiar with its ecosystem, the many “varieties of TeX” can be confusing; however, by the end of this article you should feel much more informed and at ease when engaging with any colleagues, authors or journal editors who are conversant with TeX-based terminology.

The context: 40 years of development

TeX’s roots date back to the late 1970s and the decades following its creation have seen the development of numerous TeX-based typesetting programs which provide considerable enhancements and additional functionality compared to the original TeX program. Those new to STM publishing, or considering it as a career, may be surprised to learn that typesetting software whose origins date back some 40 years is still in widespread use by technical authors—and forms a critical component of many modern publishing workflows through services such as Overleaf.

TeX is not just for mathematics

It is a common, though understandable, misconception that the use of TeX is restricted to scientific and technical disciplines; specifically, the typesetting of complex mathematics. Whilst it finds most users in those domains, TeX-based software is widely used for the production of non-mathematical content—because of its high quality of output and incredible versatility. In addition to typesetting mathematics, the latest versions of TeX (called XeTeX and LuaTeX) support modern font technologies (OpenType), Unicode-based text input, OpenType-based math fonts (as pioneered by Microsoft Word), multilingual typesetting (including Arabic and other complex scripts), direct output to PDF and much more besides. For example, here is a demonstration of XeTeX’s multilingual typesetting with complex-script languages, including Arabic, Sanskrit, Hindi, Chinese, Japanese, Korean, Greek and Thai. Or, if you are interested in cooking, how about producing a recipe booklet?

The genesis of TeX: A brief history

The American historian Daniel J. Boorstin once noted that:

“Trying to plan for the future without a sense of history is like trying to plant cut flowers.”

In keeping with the ethos of that quote we’ll start with a brief history of TeX: where did it come from, who created it—and why?

On 30 March 1977 the diary of Professor Donald Knuth, a computer scientist at Stanford University, recorded the following note to express his dissatisfaction with the quality of typeset proofs he’d just received for volume 2 of his book series The Art of Computer Programming:

“Galley proofs for vol. 2 finally arrive, they look awful... (typographically). I decide I have to solve the problem myself.”

The quote above is from page 482 of Digital Typography by Donald E. Knuth. That small entry in Professor Knuth’s diary marked the catalyst for a programming journey which lasted many years and resulted in the creation of typesetting software capable of producing exquisitely typeset mathematics and, of course, beautifully typeset text: a program that Knuth named TeX.

Knuth is a brilliant computer scientist and whilst developing TeX he, and his colleagues, designed new and sophisticated algorithms to solve some very complex typesetting problems: including automatic line breaking, hyphenation and, of course, mathematical typesetting. As part of TeX’s development Knuth needed fonts for use with his typesetting software so he developed his own font technology called MetaFont—although we won’t discuss that in any detail here.

TeX was a huge success

A number of reasons contributed to TeX’s popularity, including:

  • High quality typesetting: In addition to the sophisticated algorithms built into TeX, Knuth’s extreme attention to typographic detail resulted in TeX’s ability to produce very high quality typeset mathematics and text.
  • TeX is programmable: Knuth gave TeX its own programming language. Users could write “TeX macros” (a collection of TeX commands) which gave them a great deal of control over TeX’s typesetting process. The programmability of TeX is an important topic and one we’ll discuss in more detail below.
  • TeX is free: Knuth made TeX available at no cost—including its source code (i.e., program code).
  • Portability: Knuth engineered TeX’s internal design to ensure that it was highly portable and could run on many different computer systems. Given the same input, TeX would produce identical output, irrespective of the system on which it was running—that included producing the same line breaks and page breaks.

Authors were delighted because TeX enabled mathematicians, physicists, computer scientists, and others, to have precise control over the typesetting and visual appearance of their work. Authors could use TeX to write their articles or books and submit their manuscripts (TeX files) to publishers, feeling somewhat more confident that their proofs would not suffer the same fate as Knuth’s did in 1977.

Knuth still maintains TeX, but new “versions” have evolved

During the 1980s Knuth decided to freeze active development of TeX because he wanted to ensure the long-term stability of his software: he decided that no new features would be added to TeX. In 1989 Knuth was persuaded to make one last set of changes to TeX—primarily to move from 7-bit to 8-bit character sets. In 1990 Knuth published an article called The Future of TeX and MetaFont in which he stated that his development of TeX (and related software) was at an end but that others were free to build on the work he had done.

Today, some four decades after that fateful genesis of TeX, Knuth still makes periodic bug fixes to the master source code of TeX, which is available from CTAN (Comprehensive TeX Archive Network). Those updates take place every few years, with the latest being The TeX tuneup of 2014 as reported in the TeX journal TUGboat; the next tuneup is scheduled for 2021. During those tuneups Knuth does not add new features to TeX: they really are just bug fixes, although TeX is considered by many to be the most bug-free program in the world.

A note on “versions” of TeX: When writing about TeX it is extremely important to emphasize that, strictly speaking, there is only one definitive version of “TeX”: the one that Knuth wrote and maintains. In fact “TeX” (denoted by its typeset logo) is a trademark of the American Mathematical Society. Knuth did not exclude or prevent others from using his code to develop software based on TeX—extending Knuth’s software to add features and functionality beyond those that Knuth had chosen to implement. However, Knuth did, as is his absolute right, make one strong stipulation, which can be found in the source code to TeX:

If this program is changed, the resulting system should not be called TeX; the official name TeX by itself is reserved for software systems that are fully compatible with each other.

Consequently, it is not wholly accurate to refer to programs derived from Knuth’s source code as “versions” of TeX. Strictly speaking, TeX-based software derived from TeX’s source code should be referred to as “adaptations” or “derivatives” but for simplicity we’ll continue to make use of the term “versions”, mindful of the caveats noted here.

Despite Knuth freezing development there was still a strong desire for new TeX features, or improvements to existing ones, and over the years there have been various attempts to develop “the next generation of TeX”: some have been very successful, others have not. It is an interesting history but not one we can pursue here; the intrepid reader can find a much more complete account in an article by Frank Mittelbach: TUGboat, Volume 34 (2013), No. 1.

During the 1990s some parts of TeX were showing their age—including its font handling and the file format used for TeX’s output: the so-called DeVice Independent format, or DVI. Most users converted TeX’s output into PostScript but by the mid-1990s PostScript was being eclipsed by the rise of PDF as the preferred output file format. And, of course, there was TeX’s place in a world which now had the internet. However, despite these shortcomings, many of TeX’s core algorithms—line breaking and justification, hyphenation and mathematical typesetting—were still unsurpassed. Developers wanted to build on TeX’s strengths but update those areas where the world had moved on and TeX really did need to catch up.

What’s in a name?

It has become a convention that software which is derived from TeX receives its name by adding a prefix to the word “TeX”: giving program names such as pdfTeX, XeTeX and LuaTeX. Although these programs are derived from Knuth’s original TeX software they contain features and functionality which aren’t available in Knuth’s original version. Collectively, these executable programs are often referred to as TeX engines—think of them as the software which drives the typesetting process. A brief description of pdfTeX, XeTeX and LuaTeX is provided at the end of this article.

LaTeX: a set of macros, not a TeX engine

We’ve mentioned that TeX-based programs derived from Knuth’s software have names such as pdfTeX or XeTeX; naturally, you might think that LaTeX is just another version of Knuth’s software. Sadly, it’s not quite that straightforward. LaTeX is not a version of the executable TeX typesetting program: it is a collection of so-called TeX macros, a topic we will discuss in more detail below. The macros which comprise LaTeX were written in the mid-1980s by Leslie Lamport—who gave that package its name. Like TeX engines themselves, the LaTeX macro package is still actively developed and the interested reader can find out more on the LaTeX project website.

So, what does TeX actually do?

As noted, TeX is a typesetting program but if you envisage a slick graphical user interface (GUI), such as Adobe InDesign, then think again. At the time of TeX’s genesis (the late 1970s) today’s sophisticated graphical interfaces and operating systems were still some way into the future and TeX’s modus operandi still reflects its heritage, even for the modern variants of TeX.

Those accustomed to using modern page layout applications, such as Adobe InDesign, may be surprised to see how TeX works. Suppose someone gives you a copy of some TeX software (but no fancy text editors) and you decide to run it and see what happens: what would you see? In truth, not a lot! TeX uses a so-called command-line interface: it has no fancy graphical screen into which you type your text to be typeset, or in which you point, click or tap to set options or configurations. If you were to run one of the TeX programs (engines) you would see a simple screen with a blinking cursor; for example, running LuaTeX on a local machine (luatex.exe on Windows):

Running LuaTeX under Windows

Of course, those who use TeX-based software through Overleaf are presented with a much more convenient and author-friendly interface.

Understanding TeX’s programmability

Clearly, if you want a piece of software to typeset something you will need to provide it with some form of input (material to typeset) and then give it some “instructions” telling it what you want to achieve—such as which fonts to use and the page size of the final document, among many other details. If you’re using a tool such as Adobe InDesign you can choose from various menus, screens and dialog boxes to set the parameters which give you some influence and control over the behaviour of the software. However, what if no such screen exists and all you have is a blank screen and flashing cursor? This is where TeX’s programmability comes into play.

TeX the program and TeX the programming language

Because TeX does not have a built-in graphical user interface through which you can control and direct its behaviour, you have to provide it with explicit written instructions to guide it through the typesetting process. You create a text file containing not only the text of your work but also the explicit typesetting instructions (or commands) that tell TeX what you want it to do. Once you have written your TeX file called, for example, mybook.tex, you then tell TeX to process it and, if all is well, you receive a beautifully typeset document “mybook.pdf” as the output.
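As a rough illustration of such a file, here is a minimal sketch of what mybook.tex might contain. It assumes the LaTeX macro package (introduced above) rather than the commands of Knuth's original plain TeX, and the chapter title and text are placeholders invented for this example.

```latex
% mybook.tex: a hypothetical minimal input file (contents invented for illustration)
\documentclass{book}          % instruction: use LaTeX's standard "book" layout

\begin{document}              % everything after this line is typeset

\chapter{A first chapter}     % instruction: start a new, numbered chapter

This sentence is ordinary text to be typeset. The next word is set in
\emph{italics} because of an explicit instruction, and the formula
$a^2 + b^2 = c^2$ is typeset as mathematics.

\end{document}                % instruction: stop processing here
```

Processing this file with an engine that writes PDF directly, for example by running pdflatex mybook.tex on a local installation, should produce mybook.pdf.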

Those “typesetting instructions” used to control TeX’s behaviour are actually written in a programming language—one specifically designed by Knuth to provide TeX users with a great deal of control over his sophisticated typesetting program. It is this typesetting programming language that gives TeX its incredible power and flexibility.

We can now start to see that TeX is, in fact, a piece of typesetting software that users can control by providing it with instructions written in a special programming language. You should think of “TeX” as an executable program (typesetting engine) which can be controlled by your instructions written in the TeX typesetting language. Of course, because TeX is controlled by a programming language there is always the possibility of making mistakes: bugs in your TeX file which TeX cannot understand, or which simply do not give the results you were expecting. It’s an everyday “joy” that users of TeX-related software are all too familiar with.

Understanding that TeX engines are programmable is the key to truly appreciating the differences between LaTeX, pdfTeX, pdfLaTeX, XeTeX, LuaTeX and so forth. Each TeX engine (program) understands hundreds of so-called primitive commands. Primitive in this sense does not mean “simple” or “unsophisticated”; it means that they are the fundamental building blocks of the TeX language. A simple, though not wholly accurate, analogy is the alphabet of a particular language: the individual characters of the alphabet cannot be reduced to simpler entities; they are the fundamental building blocks from which words, sentences etc. are constructed.

And finally: from TeX to pdfTeX, XeTeX and LuaTeX

Just to recap. When Knuth wrote the original version of TeX he provided it with the features and capabilities that he thought were sufficient to meet the needs of sophisticated text and mathematical typesetting based on the technology environment of that time—including processing power and memory of computers, font technologies and output devices. Knuth’s specification of TeX included its internal/programming design (and typesetting algorithms) plus, of course, defining the TeX language used to “mark up” the material to be typeset. What we mean by “defining the TeX language” is defining the set of several hundred primitive commands that the TeX engine can understand—and the action taken by the TeX engine whenever it encounters one of those primitives during the processing of your input text.

Naturally, technology environments evolve: computers become faster and gain more storage and memory, new font technologies are released (Type 1, TrueType, OpenType), file output formats change (e.g., the move from PostScript to PDF) and Unicode has become the dominant way to encode text. TeX users wanted those new technologies to be supported, along with new features and functionality not present in Knuth’s original TeX program.

As noted earlier, in the 1980s Knuth decided to freeze his development of TeX: no more new features in his version. With the genuine need to update/modernize Knuth’s original software, TeX programming experts have taken Knuth’s original source code and enhanced it to add new features and provide support for modern typesetting technologies. Those new versions of TeX not only provide additional features (e.g., outputting direct to PDF, supporting OpenType fonts) but also extend and adapt the TeX language: new primitives are added to Knuth’s original set, thus providing users with greater programming power and flexibility to control and take advantage of the additional functionality built into new TeX-based typesetting engines.

Each new TeX engine is given its own name to distinguish it from Knuth’s original software: hence you now have pdfTeX, XeTeX and LuaTeX. Those three TeX engines are not 100% compatible with each other and it is quite possible to prepare input that can be processed with one TeX engine but fails to work with others, simply because a particular TeX engine may support primitive commands that the others do not. But all is not lost: enter the world of TeX macros!

Primitives are not the whole story: TeX macros

We have mentioned that each TeX engine supports a particular set of low-level commands called primitives, but this is not the full story. Of course, many of the same primitives are supported by all engines but some are specific to a particular engine. TeX achieves its true power and sophistication through so-called TeX macros. The primitive commands of an engine’s TeX language can be combined together to define new commands (called macros) which are built from combinations of low-level primitive instructions and/or other macros. TeX macros allow users to define new commands that are capable of performing complex typesetting operations, saving a great deal of time and typing and reducing programming errors.

In addition, TeX engines provide primitives that make it possible to determine which TeX engine is being used to typeset a document, so that macros can, on-the-fly, adapt their behaviour depending on whether or not the engine supports a particular primitive. If a certain primitive is not supported directly but can be “mimicked” (using combinations of other primitives) then all is usually well, but if the chosen TeX engine really cannot cope with a particular primitive then typesetting will fail and an error will be reported. The TeX language is, after all, a programming language, albeit one designed to solve typesetting problems; but as a programming language TeX is extremely arcane and works very differently to most programming languages you are likely to encounter today.
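To make the idea of building macros from primitives a little more concrete, here is a small sketch written as a LaTeX document. The macro names \gapline and \fillin are invented for this example; \hskip, \vrule and \relax are TeX primitives, while \newcommand is itself a macro supplied by LaTeX for defining new macros.

```latex
\documentclass{article}

% \gapline (hypothetical name) is built directly from primitives:
% a space, a thin horizontal rule 3em long, then another space.
% The closing \relax stops TeX from reading any following text
% as part of the \hskip specification.
\newcommand{\gapline}{\hskip 1em\vrule width 3em height 0.4pt depth 0pt\hskip 1em\relax}

% \fillin (hypothetical name) is built from its argument plus the macro above.
\newcommand{\fillin}[1]{#1\gapline}

\begin{document}
\fillin{Name:} \fillin{Date:}
\end{document}
```

Once defined, \fillin can be used as often as needed, and changing the definition of \gapline in one place changes every blank line in the document.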

So, finally, what is LaTeX?

We’ve talked about various versions of the TeX engine, from Knuth’s original TeX to its descendants pdfTeX, XeTeX and LuaTeX, and briefly discussed TeX as a typesetting language: primitives, programming and the ability to write macros. Finally, we are in a position to discuss LaTeX. The logical extension to writing individual TeX macros for your own personal use is to prepare a collection of macros that others can also use: a package of macros which provides some useful tools and commands from which other (La)TeX users can benefit. And that is precisely what LaTeX is: a very large collection of complex and sophisticated macros designed to help you typeset books, journal papers and so forth. It provides a wealth of features to control things like page layout, fonts and a myriad of other typesetting details. Not only that, but LaTeX was designed to be extensible: you can plug in additional, more specialist, macro packages written to solve specific typesetting problems, e.g., producing nicely typeset tables, typesetting particularly complex forms of mathematics, chemical diagrams and so forth. If you visit the Comprehensive TeX Archive Network you can choose from hundreds, if not thousands, of macro packages that have been written and contributed by users worldwide.
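As a hedged sketch of what plugging in extra macro packages looks like in practice, the document below loads two widely used packages from CTAN, amsmath and booktabs. The choice of packages and the sample content are assumptions made purely for illustration.

```latex
\documentclass{article}
\usepackage{amsmath}    % adds environments for multi-line mathematics
\usepackage{booktabs}   % adds \toprule, \midrule and \bottomrule for tables

\begin{document}

% align* is provided by amsmath
\begin{align*}
  (a + b)^2 &= (a + b)(a + b) \\
            &= a^2 + 2ab + b^2
\end{align*}

% the three rule commands are provided by booktabs
\begin{tabular}{ll}
  \toprule
  Package  & Used here for        \\
  \midrule
  amsmath  & aligned mathematics  \\
  booktabs & table rules          \\
  \bottomrule
\end{tabular}

\end{document}
```

Each \usepackage line simply reads in another collection of macros, which is why new capabilities can be added without changing the TeX engine itself.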

So, if someone says they are typesetting their work with LaTeX then they are telling you only part of the story. What they really mean is that they are using the LaTeX macro package with a particular TeX engine: usually pdfTeX but maybe XeTeX (for multilingual work) or LuaTeX (perhaps for advanced customized document production). Frequently you will see terms such as pdfLaTeX, XeLaTeX or LuaLaTeX, but these are not actually the names of TeX engines; all they signify is which TeX engine is being used to run the LaTeX macro collection:

  • pdfLaTeX means using the LaTeX macro package with the pdfTeX engine
  • XeLaTeX means using the LaTeX macro package with the XeTeX engine
  • LuaLaTeX means using the LaTeX macro package with the LuaTeX engine

For example, to say “I am using pdfLaTeX” means “I am preparing my typeset document using the LaTeX macro package and processing it with the pdfTeX engine”. Equally, if anyone says to you that they are “using TeX” then you should now see that statement probably does not tell you the whole story—that is, unless they are using Knuth’s original version of TeX, which is quite unlikely these days.
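Because the same LaTeX source can be run through different engines, a document can check which engine is processing it. The sketch below assumes the iftex package is available (it is included in current TeX distributions); the macro \enginename is invented for this example.

```latex
\documentclass{article}
\usepackage{iftex}   % provides the tests \ifPDFTeX, \ifXeTeX and \ifLuaTeX

% \enginename is a hypothetical helper defined only for this example
\newcommand{\enginename}{an engine this example does not test for}
\ifPDFTeX \renewcommand{\enginename}{pdfTeX} \fi
\ifXeTeX  \renewcommand{\enginename}{XeTeX}  \fi
\ifLuaTeX \renewcommand{\enginename}{LuaTeX} \fi

\begin{document}
This document was typeset using \enginename.
\end{document}
```

Compiling the same file with pdfLaTeX, XeLaTeX and LuaLaTeX in turn should change only the engine name printed in the output, which is a compact way of seeing that the LaTeX macros stay the same while the engine underneath varies.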

From TeX engines to TeX installations

We’ve briefly explored the history of TeX and seen that its modern derivatives (pdfTeX, XeTeX and LuaTeX) have added many new features and functionality to Knuth’s original software. To round off our discussion we’ll take a quick look at the three most popular TeX engines and briefly review TeX installations.

Key features of pdfTeX, XeTeX and LuaTeX

Here is a summary of some key features provided by the three most popular TeX engines:

  • pdfTeX: As its name suggests, it provides the ability to output directly to PDF, saving users from having to convert TeX’s native DVI format to PostScript and then convert that to PDF via Ghostscript or Acrobat Distiller (NB: some users also go from DVI to PDF via tools such as dvipdf). pdfTeX also introduced refinements to TeX’s typesetting, such as margin kerning (character protrusion). pdfTeX was developed by Hàn Thế Thành and its implementation details formed the basis for his PhD dissertation Micro-typographic extensions to the TeX typesetting system.
  • XeTeX: It introduced the ability to directly read/input TeX files saved or created in UTF-8 encoding and added sophisticated handling of multilingual typesetting, including complex scripts such as Arabic. One particularly useful feature is that XeTeX enabled very easy and convenient use of OpenType fonts, and later versions added OpenType-based typesetting of mathematics. XeTeX was developed by Jonathan Kew, although subsequent development has been led by other members of the TeX community.
    • First release date (Wikipedia): Initially Mac OS X only, April 2004
    • Further information: http://tug.org/xetex
  • LuaTeX: Arguably the most powerful and versatile of all the TeX engines, LuaTeX is derived from pdfTeX (in addition to many other sources/libraries) and provides significant additional functionality. The key innovation is the addition of the Lua scripting language, allowing very sophisticated control of the TeX engine through an easy-to-use scripting language. It also supports UTF-8 text encoding, OpenType-based mathematical typesetting and very advanced use of OpenType fonts for text typesetting—although the mechanism is different to the one deployed by XeTeX. LuaTeX also integrates the MetaPost graphics language, allowing users to fully exploit MetaPost’s sophisticated drawing capabilities. In addition to the preparation of books and journal papers, LuaTeX is ideal for advanced or customized document engineering—a powerful feature is LuaTeX’s extensibility through “plugins” written in C/C++ and loaded as a .DLL (Windows) or .so (on Linux). LuaTeX is developed by a team that includes Hans Hagen, Taco Hoekwater, Luigi Scarso and others.
    • First release date: Development work started around 2006 with numerous beta releases culminating in a version 1.0 release in September 2016. It is still undergoing very active development.
    • Further information: www.luatex.org
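As a small illustration of the Lua scripting described in the LuaTeX entry above: \directlua is a LuaTeX primitive that hands its argument to the built-in Lua interpreter, and tex.print is a function LuaTeX provides for passing text back to the typesetter. The calculation is an arbitrary example, and the file only works when compiled with LuaLaTeX.

```latex
% Compile with LuaLaTeX: \directlua is not defined in pdfTeX or XeTeX
\documentclass{article}
\begin{document}
The square root of 2, as computed by Lua, is
\directlua{tex.print(math.sqrt(2))}.
\end{document}
```

The number in the output is calculated during typesetting rather than typed by the author, which hints at how Lua code can be used to generate or transform document content.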

TeX installations: TeX Live

You may be wondering how users gain access to the various TeX-based typesetting programs and the associated LaTeX macro packages. The answer is to use a so-called TeX distribution which users can download and install; modern TeX installations now contain far more than just the TeX-based typesetting engines. Over the years, TeX users around the world have developed and contributed an astonishing array of TeX-related tools and software in addition to hundreds of fonts and, of course, a vast number of specialist LaTeX packages. This enormous collection of software is managed and updated by leading members of the TeX community and culminates in yearly releases of a distribution called TeX Live, which also contains the latest stable releases of TeX engines for each supported platform (Windows, Linux etc.). Windows-based users frequently use another distribution called MiKTeX.

Overleaf: Supporting the LaTeX ecosystem

The context of today’s research landscape is, of course, a highly interconnected and collaborative environment, which includes working together to write and prepare papers for publication. Distributing and sharing LaTeX-based papers by e-mail, including any associated graphics or data, can be frustrating: there is not only the problem of version control (and file sizes) but also the real possibility that one or more co-authors has a LaTeX installation which cannot process the LaTeX file, for example due to missing fonts, variations in package availability or out-of-date LaTeX releases. A co-author may be travelling or temporarily based at a location without access to LaTeX. It all adds up to a potentially frustrating scenario, especially when close to a submission deadline!

Academic institutions or commercial enterprises wishing to provide their staff or teams with access to LaTeX might need to install and then maintain, update and support a comprehensive, enterprise-wide TeX system. It can be a complex undertaking, perhaps requiring specialist expertise which might reside in a single individual. If your LaTeX installation expert leaves for another job it can be a challenge to replace them.

TeX installations have to be actively maintained because the TeX world is not static and your installation can soon become out of date, much to the annoyance of your users who might need to take advantage of newer or more advanced TeX-based tools. New and updated LaTeX packages are continually released, as are additional fonts and TeX-related software tools. Additionally, TeX engines, especially LuaTeX, continue to undergo development.

Overleaf: LaTeX solutions for authors and institutions

Overleaf provides authors and their institutions with a cloud-based LaTeX authorship and project management system—backed by powerful servers equipped with a state-of-the-art TeX installation.

Overleaf for authors

Using Overleaf’s browser-based LaTeX editor, authors can create, share, collaborate and manage their LaTeX-based projects from wherever they happen to be working. All you need is internet access and a device with a modern browser.

Overleaf provides a very convenient way to use LaTeX, which includes:

  • No more e-mailing LaTeX files and huge figures—just send your colleagues a link to your project on Overleaf to start collaborating and sharing.
  • Outstanding technical support: contact us at any time with your questions about using LaTeX.
  • Co-authors share the same LaTeX installation—no need to rely on local installations or be limited by an out-of-date LaTeX system.
  • You can make your documents stand out—choose from a wide range of ready-to-use modern OpenType fonts, or upload additional fonts to your project. Simple and easy to use with the fontspec package.
  • No need for you to run the TeX engines—Overleaf does it for you. Save time and take advantage of the fast, real-time previewing of your typeset LaTeX document or switch to a manual refresh if you prefer.
  • Submit your paper directly to participating journals and preprint services or download your entire LaTeX project into a single ZIP file for onward transmission to a journal of your choice.
  • Access to a fully-equipped Linux server: the tools and utilities you might need for graphics and text processing—TeX’s \write18 has never been so happy!
  • Select the TeX engine to process your LaTeX code or let Overleaf detect and use the engine most suited to processing your document. Overleaf supports LaTeX processing with pdfTeX, XeTeX, LuaTeX and dvipdf.
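For readers curious about the fontspec package mentioned in the list above, a minimal sketch follows. It assumes the document is compiled with XeLaTeX or LuaLaTeX (fontspec does not work with pdfLaTeX) and that a font named TeX Gyre Termes is installed where the engine can find it. The font choice is only an example, and any installed or uploaded OpenType font could be named instead.

```latex
% Compile with XeLaTeX or LuaLaTeX
\documentclass{article}
\usepackage{fontspec}
\setmainfont{TeX Gyre Termes}  % assumed to be installed; substitute any available font
\begin{document}
This paragraph is typeset in the OpenType font selected above.
\end{document}
```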

Overleaf for institutions and enterprises

Provide your communities and teams with access to a state-of-the-art LaTeX installation, but without any of the management overhead. No more technical support calls about LaTeX installations, no need to worry about Windows, Linux or Mac OS: take advantage of Overleaf’s technical infrastructure. Your community of LaTeX users will value the suite of features Overleaf provides to manage their LaTeX projects, from uploading files and graphics through to sharing project links that enable teams to work together on the same paper. Deliver an outstanding service to researchers and foster collaborations to support the very best research.

Thank you for reading this article; we hope you found something of interest. If you have any questions about Overleaf, please feel free to contact us. The Overleaf team look forward to hearing from you.

Happy (La)TeXing!
