TeX
After an initial implementation in SAIL, Donald Knuth spent the next few years (1979-1982) rewriting and improving TeX in WEB: a literate programming language that interleaves prose with Pascal code.
Running tangle on the tex.web source extracts the Pascal code from the interleaved documentation (the companion program, weave, produces the typeset documentation instead). The code was written in the dialect of Pascal accepted by Pascal-H, a compiler that was available at the time.
For portability, tangle can merge a change file (.ch) into the tex.web source, replacing selected lines without modifying the original file. The same mechanism allowed e-TeX to extend TeX, adding support for bidirectional text and additional primitives.
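As a rough illustration of how a change file works, the sketch below parses `@x` ... `@y` ... `@z` groups (lines to match, then their replacement) and applies them to a source text in order. It is a simplification under stated assumptions: the names `parseChangeFile`, `applyChanges` and `findMatch` are hypothetical, and the real tangle has stricter matching and error-reporting rules than are modelled here.

```typescript
interface Change {
  match: string[];       // lines between @x and @y
  replacement: string[]; // lines between @y and @z
}

// Collect the @x ... @y ... @z groups; text outside a group is ignored.
function parseChangeFile(text: string): Change[] {
  const changes: Change[] = [];
  let current: Change | null = null;
  let target: string[] | null = null;
  for (const line of text.split("\n")) {
    const tag = line.slice(0, 2).toLowerCase();
    if (tag === "@x") {
      current = { match: [], replacement: [] };
      target = current.match;
    } else if (tag === "@y" && current) {
      target = current.replacement;
    } else if (tag === "@z" && current) {
      changes.push(current);
      current = null;
      target = null;
    } else if (target) {
      target.push(line);
    }
  }
  return changes;
}

// Find the first position at or after `from` where every match line agrees
// with the source, ignoring trailing whitespace.
function findMatch(lines: string[], match: string[], from: number): number {
  const strip = (s: string) => s.replace(/\s+$/, "");
  for (let i = from; i + match.length <= lines.length; i++) {
    if (match.every((m, j) => strip(m) === strip(lines[i + j]))) return i;
  }
  return -1;
}

// Apply the changes in order; each match must occur after the previous one.
function applyChanges(source: string, changes: Change[]): string {
  const lines = source.split("\n");
  const out: string[] = [];
  let pos = 0;
  for (const change of changes) {
    const at = findMatch(lines, change.match, pos);
    if (at < 0) throw new Error(`change did not match: ${change.match[0]}`);
    out.push(...lines.slice(pos, at), ...change.replacement);
    pos = at + change.match.length;
  }
  out.push(...lines.slice(pos));
  return out.join("\n");
}
```

Because the original tex.web is never edited, a port can keep its system-dependent adjustments in one place and reapply them when the base source changes.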
TeX is based on macro expansion. It provides ~300 built-in primitive commands, and these can be extended using additional sets of macros, such as Plain TeX and LaTeX. For efficiency, TeX can be run in INITEX mode, where it loads plain.tex or latex.ltx then dumps the internal state to a .fmt file which can be imported on the next run for quick initialisation.
TeX was designed to run under the constraints of its era, particularly limited memory and Pascal's lack of support for strings. The state is all global, the code uses goto for flow control, and it relies on Pascal features such as value semantics (records and arrays are copied on assignment) and returning a value by assigning to the function's own name.
However, the Pascal data types map quite well to TypeScript, which makes a direct port seem feasible as the Pascal and TypeScript code can be directly compared.
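To make the comparison concrete, here is a minimal sketch of how those Pascal idioms might be carried over so that the two versions can still be read side by side. The Pascal routine in the comment is hypothetical (it is not taken from tex.web), and the names `find`, `table`, `Rec` and `copyRec` are illustrative assumptions.

```typescript
// A Pascal record becomes an interface with the same fields.
interface Rec {
  width: number;
  height: number;
}

// Global state stays global, as in the Pascal original.
const table: number[] = [7, 3, 9, 3];

// Hypothetical Pascal original:
//
//   function find(x: integer): integer;
//   label 10, 40;
//   var i: integer;
//   begin
//     find := -1;
//     for i := 0 to 3 do if table[i] = x then goto 40;
//     goto 10;
//   40: find := i;
//   10: end;
function find(x: number): number {
  let result = -1; // stands in for assignments to the function name
  let i = 0;
  found: {
    for (i = 0; i <= 3; i++) {
      if (table[i] === x) break found; // goto 40
    }
    return result; // goto 10
  }
  result = i; // label 40
  return result; // label 10
}

// Pascal's `b := a` copies a record, so the port must copy explicitly;
// a plain JavaScript assignment would alias the same object.
function copyRec(a: Rec): Rec {
  return { ...a };
}
```

A labelled block with `break` covers forward jumps like this one. Backward jumps would need a labelled loop with `continue` instead, which is one reason a function-by-function translation is easier to check than a restructured one.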
OpenAI GPT (Codex)
In the first half of 2025, it became apparent that the latest LLMs (such as OpenAI's o3) understood the TeX source code well enough to explain it in some detail. This opened the possibility that they might be able to create a reliable port of TeX from Pascal to TypeScript.
One possibility is to target modern web browsers directly, making use of the latest JavaScript features and removing many of TeX's constraints. The difficulty with this approach is that if the first pass isn’t 100% correct, it’s hard to compare the TypeScript code with the Pascal original to find the discrepancies.
The other possibility is a direct translation of each Pascal function to TypeScript. By 2026, with the release of OpenAI's GPT 5.3 Codex, the large context window, increased reasoning and agentic processes (checklists, edit-test cycles, etc) have reached the point where the agent is able to complete this task.
It’s difficult for LLMs to maintain consistency across a reasonably large codebase (the TeX Pascal source is ~20,000 lines), but they’re at least now able to successfully refactor code across a codebase using semantic codemods, which (as a post-processing step) gets the output to a reasonably clean state.
To ensure that the conversion is correct, the LLM must concurrently produce a test suite that verifies the behaviour of each function in detail, against a gold standard produced by the original code.
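A minimal sketch of what such a gold-standard check could look like, assuming the reference values have been recorded from the original program as a list of calls and results. The `GoldCase` shape and the `port` and `runGold` names are hypothetical, and `half` is a small rounding routine in the style of TeX's arithmetic helpers, used only as a stand-in for a ported function.

```typescript
interface GoldCase {
  fn: string;       // name of the ported function
  args: number[];   // arguments recorded from the reference run
  expected: number; // result recorded from the reference run
}

// Pascal's `div` truncates toward zero, so the port uses Math.trunc
// rather than Math.floor, which would differ for negative values.
function half(x: number): number {
  return x % 2 !== 0 ? Math.trunc((x + 1) / 2) : Math.trunc(x / 2);
}

// Registry of ported functions, looked up by name.
const port: Record<string, (...args: number[]) => number> = { half };

// Run every recorded case and return a description of each mismatch.
function runGold(cases: GoldCase[]): string[] {
  const failures: string[] = [];
  for (const c of cases) {
    const fn = port[c.fn];
    if (!fn) {
      failures.push(`${c.fn}: not implemented`);
      continue;
    }
    const actual = fn(...c.args);
    if (actual !== c.expected) {
      failures.push(
        `${c.fn}(${c.args.join(", ")}): expected ${c.expected}, got ${actual}`
      );
    }
  }
  return failures;
}
```

Negative and boundary arguments are worth recording deliberately, since integer division and rounding are where Pascal and JavaScript arithmetic are most likely to disagree.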
Demo
This demo compiles Plain TeX source to DVI, using a TypeScript port of TeX.
It then uses a WASM build of dvisvgm to convert that to SVG for display (you could instead use (x)dvipdfm(x) to convert the output to PDF).
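For readers who want to check the DVI bytes before handing them to a converter, the sketch below reads the preamble that every DVI file starts with: the `pre` opcode (247), a format id byte, three big-endian 32-bit values (numerator, denominator, magnification) and a length-prefixed comment. The function and interface names are hypothetical and are not part of the demo's code.

```typescript
interface DviPreamble {
  id: number;      // DVI format identifier
  num: number;     // numerator of the unit fraction
  den: number;     // denominator of the unit fraction
  mag: number;     // magnification, 1000 meaning unscaled
  comment: string; // free-form comment written by the producer
}

function readDviPreamble(bytes: Uint8Array): DviPreamble {
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  if (view.getUint8(0) !== 247) {
    throw new Error("not a DVI file: missing pre opcode");
  }
  const id = view.getUint8(1);
  const num = view.getUint32(2); // DataView reads big-endian by default
  const den = view.getUint32(6);
  const mag = view.getUint32(10);
  const k = view.getUint8(14);   // comment length in bytes
  let comment = "";
  for (let i = 0; i < k; i++) {
    comment += String.fromCharCode(view.getUint8(15 + i));
  }
  return { id, num, den, mag, comment };
}
```

A quick check like this can catch truncated or non-DVI output early, before a failure in the conversion step makes the cause harder to see.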
Source
The tex-codex repository contains the code produced by OpenAI's GPT 5.3 Codex.
Future
Later TeX engines also incorporate the e-TeX extensions, and are usually built from the TeX Live source directory:
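- pdfTeX (adds PDF output) is written in WEB, like TeX itself, and is translated to C by Web2C.
- XeTeX (adds Unicode and system fonts) is also written in WEB, with supporting C and C++ code.
- LuaTeX (modern, extensible) moved from Pascal WEB to CWEB, and is now written in C with an embedded Lua interpreter.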
There have been manual ports of TeX to other languages, including Rust, C++ and C.
There is of course Web2C, which translates Knuth's TeX system from WEB to C.
There are ports of TeX engines to WebAssembly, most notably BusyTeX and jsTeX.
There are also re-implementations of TeX, like JSBox (C), TeX++ (C++, based on CommonTeX), star-tex (Go), XymosTeX (Rust), Tectonic (Rust, based on XeTeX), and others.