\documentclass{article}
\usepackage{xcolor}
\usepackage{graphicx}
\usepackage[breaklinks]{hyperref}
\usepackage{luaotfload}
\title{Complex-script shaping using \texttt{luaotfload} and HarfBuzz}
\date{\today}
\author{Graham Douglas}
\begin{document}
\maketitle
\def\artext{بِسْمِ ٱللَّٰهِ ٱلرَّحْمَٰنِ ٱلرَّحِيمِ}
\def\diacritics{حَّ}
\def\fatha{َ}
\def\shadda{ّ}
\def\unshaped{{\ScheherazadeNoShape\artext}}
\def\testfont#1{\par{\pardir TRT\textdir TRT\parindent=0pt#1\artext}}
\font\ruqqaHarf=[ArefRuqaa-Bold.ttf]:mode=harf;script=arab; at 20pt
\font\ruqqaNode=[ArefRuqaa-Bold.ttf]:mode=node;script=arab; at 20pt
\font\ScheherazadeHarf =
{name:Scheherazade:mode=harf;script=arab} at 40pt
\font\ScheherazadeNode = {name:Scheherazade:mode=node;script=arab} at 40pt
\font\ScheherazadeNoShape = {name:Scheherazade:mode=node} at 40pt
\font\AmiriHarf = {name:Amiri:mode=harf;script=arab} at 12pt
\font\AmiriNode = {name:Amiri:mode=node;script=arab} at 12pt
\font\AmiriHarfBIG = {name:Amiri:mode=harf;script=arab} at 20pt
\font\AmiriNodeBIG = {name:Amiri:mode=node;script=arab} at 20pt
\section*{Introduction}
\noindent This project is processed by LuaHB\TeX{} to demonstrate the text-shaping capability of the \texttt{luaotfload} Lua library, comparing the results of \texttt{mode=harf} with those of \texttt{mode=node}. We use three Arabic typefaces (Scheherazade, Amiri and Aref Ruqaa) and define several \texttt{\string\font} ``versions'' of them by passing different settings to \texttt{luaotfload}. In these examples we use the primitive commands \texttt{\string\pardir} and \texttt{\string\textdir} to set the correct paragraph and text directions. The Amiri and Aref Ruqaa fonts demonstrate the improved shaping obtained from the HarfBuzz-based shaper via \texttt{mode=harf}.\vskip10pt
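\noindent For reference, the Scheherazade samples in this document are typeset by the helper macro \verb|\testfont| from the preamble. It sets the paragraph and text directions to right-to-left and then typesets the sample text stored in \verb|\artext| in the font passed as its argument. The definition is laid out over two lines here for readability:
\begin{verbatim}
% #1 is a font selector such as \ScheherazadeHarf
\def\testfont#1{\par{\pardir TRT\textdir TRT
  \parindent=0pt #1\artext}}
\end{verbatim}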
\section*{The \textit{unshaped} input: Unicode text}
\noindent First, here are the input characters: \textit{unshaped} Unicode Arabic characters displayed in the order in which they appear in the Overleaf \texttt{main.tex} file. This is the so-called \textit{logical order}, the sequence in which the characters are read in by the LuaHB\TeX{} engine.\vskip10pt
\noindent\unshaped\vskip10pt
Because the Arabic script is read right-to-left, the reading order of the typeset glyphs (also called visual or display order) is the reverse of the sequence in which the unshaped characters are stored in the text file and read in by the \TeX{} engine (or any other software).
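
\noindent The unshaped sample above comes from the preamble macro \verb|\unshaped|. It selects \verb|\ScheherazadeNoShape|, a font instance defined without a \texttt{script} option, and does not change the text direction:
\begin{verbatim}
\font\ScheherazadeNoShape =
  {name:Scheherazade:mode=node} at 40pt
\def\unshaped{{\ScheherazadeNoShape\artext}}
\end{verbatim}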
\section*{The \textit{shaped} result}
Here, we'll use the \texttt{Scheherazade} font, which is included with \TeX{} Live. We'll create two instances of Scheherazade:
\begin{itemize}
\item\verb|\ScheherazadeNode|, which uses \texttt{luaotfload}'s node-based shaping (\texttt{mode=node}):
\begin{verbatim}
\font\ScheherazadeNode =
{name:Scheherazade:mode=node;script=arab} at 40pt
\end{verbatim}
\item\verb|\ScheherazadeHarf|, which uses
\texttt{luaotfload}'s HarfBuzz-based shaping (\texttt{mode=harf}):
\begin{verbatim}
\font\ScheherazadeHarf =
{name:Scheherazade:mode=harf;script=arab} at 40pt
\end{verbatim}
\end{itemize}
\subsection*{Using \texttt{\string\ScheherazadeHarf} (\texttt{mode=harf})}
\vskip10pt\noindent Here, the Unicode text has been \textit{shaped} using \verb|\ScheherazadeHarf|---defined using the \texttt{Scheherazade} font with shaping applied by HarfBuzz:\vskip5pt
\testfont\ScheherazadeHarf
\subsection*{Using \texttt{\string\ScheherazadeNode} (\texttt{mode=node})}
The next example uses the same Unicode text but it is \textit{shaped} using the font \verb|\ScheherazadeNode|---which was defined using the \texttt{Scheherazade} font with shaping applied by \texttt{luaotfload}'s built-in shaper written in Lua (node-based shaping):\vskip-5pt
\testfont\ScheherazadeNode
\vskip15pt
Note that, for the \texttt{Scheherazade} font and this sample text, HarfBuzz-based shaping and \texttt{luaotfload}'s \textit{built-in} shaping produce equally good results.
However, for other fonts and/or scripts, HarfBuzz-based shaping can produce superior results. As of March 2021, page 13 of the \href{http://mirrors.ctan.org/macros/luatex/generic/luaotfload/luaotfload-latex.pdf}{\color{blue}\texttt{luaotfload} documentation} (version \texttt{3.17}) advises ``\texttt{harf} \textit{mode improves greatly the rendering of indic and arabic scripts and is highly recommended for such scripts}''.\vskip10pt
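\noindent Documents that load fonts through the \texttt{fontspec} package, rather than through the \verb|\font| primitive used here, can request the same two shapers with its \texttt{Renderer} option. The following is only a sketch under stated assumptions: \texttt{fontspec} is not used in this document, the command name \verb|\amiriharf| is made up for illustration, and the example assumes a \texttt{fontspec} release recent enough to support \texttt{Renderer=HarfBuzz} as well as an installed Amiri font:
\begin{verbatim}
% Assumption: not part of this document's preamble
\usepackage{fontspec}
\newfontfamily\amiriharf{Amiri}[Script=Arabic,
  Renderer=HarfBuzz] % Renderer=Node selects the Lua shaper
\end{verbatim}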
\section*{Forgetting to set the \texttt{script}}
\noindent Here, we defined the font \verb|\ScheherazadeNoShape| but deliberately omitted the \texttt{script} setting:
\begin{verbatim}
\font\ScheherazadeNoShape =
{name:Scheherazade:mode=node} at 40pt
\end{verbatim}
In this case, the text is unshaped because the options passed to \texttt{luaotfload} do not include \texttt{script=arab}:\vskip10pt
\testfont\ScheherazadeNoShape\vskip10pt
\noindent Note that the unshaped characters are shown in their visual (display) order.
\section*{Examples where HarfBuzz gives better results}
The following examples demonstrate fonts for which HarfBuzz, via \texttt{luaotfload}'s \texttt{mode=harf}, improves the shaping.
\subsection*{Amiri: \texttt{luaotfload}'s \texttt{mode=harf} vs \texttt{mode=node}}
The following example uses the \href{https://fonts.google.com/specimen/Amiri?preview.text_type=custom}{\color{blue}Amiri} font.
\par\noindent\hbox to 150pt{\texttt{luaotfload} with \texttt{mode=harf}:\hfil} {\textdir TRT \AmiriHarfBIG\artext}
\par\noindent\hbox to 150pt{\texttt{luaotfload} with \texttt{mode=node}:\hfil} {\textdir TRT \AmiriNodeBIG\artext}
\par\vspace{10pt}\noindent With \texttt{mode=node} you can see ``collisions'' in some diacritics, which are perhaps easier to spot when the affected cluster is shown on its own: HarfBuzz (\texttt{mode=harf}) {\textdir TRT \AmiriHarfBIG\diacritics} vs \texttt{luaotfload}'s built-in shaper (\texttt{mode=node}) {\textdir TRT \AmiriNodeBIG\diacritics}. With \texttt{mode=node}, the ``fatha'' {\textdir TRT \AmiriNodeBIG\fatha{}} and ``shadda'' {\textdir TRT \AmiriNodeBIG\shadda{}} marks overlap, like this {\textdir TRT \AmiriNodeBIG\fatha\shadda}, rather than being stacked one above the other, like this: {\textdir TRT \AmiriHarfBIG\fatha\shadda}.
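\par\noindent The comparison above relies on two Amiri instances from the preamble that differ only in \texttt{mode}, so any difference in mark placement comes from the shaper rather than from the font or its size. The macro \verb|\diacritics| holds a base letter followed by both marks:
\begin{verbatim}
\font\AmiriHarfBIG = {name:Amiri:mode=harf;script=arab} at 20pt
\font\AmiriNodeBIG = {name:Amiri:mode=node;script=arab} at 20pt
{\textdir TRT \AmiriHarfBIG\diacritics} % HarfBuzz
{\textdir TRT \AmiriNodeBIG\diacritics} % Lua shaper
\end{verbatim}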
\subsection*{Aref Ruqaa: \texttt{luaotfload}'s \texttt{mode=harf} vs \texttt{mode=node}}
The following text uses the \href{https://fonts.google.com/specimen/Aref+Ruqaa?preview.text_type=custom}{\color{blue}Aref Ruqaa font}, and is based on the example shown in the TUGboat article \href{https://www.tug.org/TUGboat/tb40-1/tb124hosny-harfbuzz.pdf}{\color{blue}Bringing world scripts to Lua\TeX{}: The HarfBuzz experiment}, written by Khaled Hosny.
\par\noindent\hbox to 150pt{\texttt{luaotfload} with \texttt{mode=harf}:\hfil} {\textdir TRT \ruqqaHarf\artext}
\par\noindent\hbox to 150pt{\texttt{luaotfload} with \texttt{mode=node}:\hfil} {\textdir TRT \ruqqaNode\artext}
\par\vspace{10pt}\noindent Here, HarfBuzz produces clearly superior results because the glyphs are stacked correctly, in accordance with the \href{https://en.wikipedia.org/wiki/Ruq%CA%BFah_script}{\color{blue}Ruqqa style}.
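\par\noindent The two Aref Ruqaa instances are loaded by file name rather than by font name, which is why the preamble uses \texttt{luaotfload}'s square-bracket syntax. Again, only \texttt{mode} differs between them. The definitions are laid out over two lines each here for readability:
\begin{verbatim}
\font\ruqqaHarf =
  [ArefRuqaa-Bold.ttf]:mode=harf;script=arab; at 20pt
\font\ruqqaNode =
  [ArefRuqaa-Bold.ttf]:mode=node;script=arab; at 20pt
\end{verbatim}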
\end{document}