From PUB to pubpub: a history of rich text editing

code as (hyper)text, (hyper)text as code


Ever since the first computers, people have needed to write code telling the computers what to do. They've also needed to write text telling the people how to use the computers (and how to write code to tell the computers what to do).

Once it became possible to point and click at locations in the document (with a light pen, or a mouse), people used that to edit and navigate the document.

At first, code and text editors were the same thing.

Later, editors added understanding of the content and specialised for separate domains.

In code, editors automatically recognise the semantics of the document and create bi-directional links between locations.

In text, people add links from text to other sections or locations in the text.

A link might point to a single anchor, or to a choice of anchors from which the user can choose.

Where a phrase can be stressed in different ways, the text can be marked up with bold or italic, or with emojis. Bold and italic come from print and, like emojis, are tricky to map to vocal inflections and gestures.

Where parts of text can be identified as keywords, these can be marked as entities, perhaps of a specific type.

Code written in a human-writable (and ideally readable) language is compiled for specific output devices (rendered to screen, compiled to machine code).

Text written with markup/formatting commands is compiled to device-specific output (e.g. for screen or printer), and may have positioning/layout commands or may have styles applied (e.g. CSS).

Sometimes code and text (and images, etc!) are mixed together in the same document.



: SIMSCRIPT (Harry Markowitz, Bernard Hausner) (RAND)

SIMSCRIPT is a free-form, English-like general-purpose simulation language conceived by Harry Markowitz and Bernard Hausner at the RAND Corporation in 1962. It influenced Simula.

: Simula

Simula has been used in a wide range of applications such as simulating very-large-scale integration (VLSI) designs, process modeling, communication protocols, algorithms, and other applications such as typesetting, computer graphics, and education. The influence of Simula is often understated, and Simula-type objects are reimplemented in C++, Object Pascal, Java, C#, and many other languages. Computer scientists such as Bjarne Stroustrup, creator of C++, and James Gosling, creator of Java, have acknowledged Simula as a major influence.

   Begin
      Class Glyph;
         Virtual: Procedure print Is Procedure print;;
      Begin
      End;
      Glyph Class Char (c);
         Character c;
      Begin
         Procedure print;
            OutChar (c);
      End;
      Glyph Class Line (elements);
         Ref (Glyph) Array elements;
      Begin
         Procedure print;
         Begin
            Integer i;
            For i:= 1 Step 1 Until UpperBound (elements, 1) Do
               elements (i).print;
            OutImage;
         End;
      End;
      Ref (Glyph) rg;
      Ref (Glyph) Array rgs (1 : 4);
      ! Main program;
      rgs (1):- New Char ('A');
      rgs (2):- New Char ('b');
      rgs (3):- New Char ('b');
      rgs (4):- New Char ('a');
      rg:- New Line (rgs);
      rg.print;
   End;

: TECO (Dan Murphy) (PDP-1)

Text/Tape Editor & Corrector is both a character-oriented text editor and a programming language

TECO is not only an editor but also an interpreted programming language for text manipulation. Arbitrary programs (called "macros") for searching and modifying text give it great power.

Emacs was originally implemented in TECO macros

: Sketchpad (Sutherland)

With Sketchpad (Sutherland, 1963) graphic images can be rotated, replicated, replaced, revised, and reflected upon by merely pushing buttons and pointing a light pen

: Justify, TJ-1 -> TJ-2 (Peter Samson) (PDP-1)

Taking English text as input, TJ-2 aligns left and right margins, justifying the output using white space and word hyphenation
Text is marked-up with single lowercase characters combined with the PDP-1's overline character, carriage returns, and internal concise codes.
Although it lacks page numbering, page headers and footers, TJ-2 is the first word processor to provide a number of essential typographic alignment and automatic typesetting features
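
As a rough illustration of what aligning both margins involves, here is a minimal Python sketch. It is not TJ-2's algorithm (TJ-2 also hyphenated words, which this omits); the names justify and fill are made up for the example, and the line-breaking rule is a simple greedy one chosen for brevity.

```python
def justify(words, width):
    """Pad the gaps between words so the line is exactly `width` columns."""
    if len(words) == 1:
        return words[0]  # a single word cannot be stretched
    gaps = len(words) - 1
    spaces = width - sum(len(w) for w in words)
    if spaces < gaps:
        raise ValueError("words do not fit in the requested width")
    # Every gap gets `base` spaces; the leftmost `extra` gaps get one more.
    base, extra = divmod(spaces, gaps)
    parts = []
    for i, word in enumerate(words[:-1]):
        parts.append(word + " " * (base + (1 if i < extra else 0)))
    parts.append(words[-1])
    return "".join(parts)


def fill(text, width):
    """Greedy line breaking, justifying every line except the last."""
    lines, current = [], []
    for word in text.split():
        if current and len(" ".join(current + [word])) > width:
            lines.append(justify(current, width))
            current = []
        current.append(word)
    if current:
        lines.append(" ".join(current))  # last line stays ragged
    return lines


for line in fill("the quick brown fox jumps", 11):
    print(repr(line))
```

Putting the leftover spaces in the leftmost gaps is an arbitrary choice; formatters have used various strategies to avoid visible rivers of white space.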

Text processors for the PDP-1 included the Colossal Typewriter (by John McCarthy and Roland Silver), Expensive Typewriter (by Steve Piner, and extended by Peter Deutsch), and the first TECO (written by Dan Murphy in 1962). The text formatter "Justify" by Peter Samson was followed by TJ-2 (all of which fed into the CTSS TYPSET and RUNOFF programs).

TJ stood for "Type Justifier."
(lots of good stuff in here)

Normal Mode (paragraphs); Quote Mode (verbatim); Centering Mode (centered); Figure Mode (n blank lines for a figure, optionally floating to the next page); Inverted Indenting Mode (list?)

TJ-2 lacked any kind of text attributes or text emphasis (not even underlining).


These two commands provide an alternative to the MEMO, MODIFY, and DITTO commands, and are intended to provide experience with a different approach to editing symbolic files.
High-Speed Input Mode (typing) / Edit Mode (commands)
RUNOFF is a command used to type out memorandum files of English text in manuscript format
Input generally consists of English text, 36 or fewer characters to a line. Control words must begin a new line, and begin with a period so that they may be distinguished from other text.
Note that RUNOFF does not recognize any equivalent of the following DITTO control words: .FOOTNOTE .END FOOTNOTE .COMMENT .CHANGE TYPE BALL .END COMMENT
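
The dot convention described above (control words start a new line and begin with a period) makes the input easy to split mechanically. A minimal Python sketch follows, assuming nothing about what each control word means. The function name split_control_words is hypothetical, and the control words in the sample are only illustrative.

```python
def split_control_words(source):
    """Separate dot-prefixed control lines from running text.

    A line whose first character is a period is treated as a control
    word; every other non-blank line is text for the formatter to fill.
    """
    items = []
    for line in source.splitlines():
        if line.startswith("."):
            # Keep the whole control line, since some control words
            # are more than one word long.
            items.append(("control", line[1:].strip().lower()))
        elif line.strip():
            items.append(("text", line.strip()))
    return items


sample = """.center
Order Form
.break
When you're ready to order,
call us at our toll free number:"""

for item in split_control_words(sample):
    print(item)
```

Because a control word must start a line, the formatter never has to scan inside a line of text for commands, which is part of why the convention survived into SCRIPT and roff.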

When you're ready to order,
call us at our toll free number:
Your order will be processed
within two working days and shipped

: TVEDIT (Douglas Engelbart) (Stanford) (PDP-1)

In 1962 at the Stanford Research Institute, Engelbart proposed, and later implemented, a word processor with automatic word wrap, search and replace, user-definable macros, scrolling text, and commands to move, copy, and delete characters, words, or blocks of text.

: qed (Butler Lampson, L. Peter Deutsch) (SDS 940)

QED (for "quick editor") addressed teleprinter usage, but systems "for CRT displays were not considered, since many of their design considerations were quite different."

: GenCode (Tunnicliffe)

(separation of formatting from content)
In the 1970s, Tunnicliffe led the development of a standard called GenCode for the publishing industry and later was the first chairman of the International Organization for Standardization committee that created SGML

: HES Hypertext Editing System

The Hypertext Editing System, or HES, was an early hypertext research project conducted at Brown University in 1967 by Andries van Dam, Ted Nelson, and several Brown students. It was the first hypertext system available on commercial equipment that novices could use.
The program was used by NASA's Houston Manned Spacecraft Center for documentation on the Apollo space program.

The user cannot format text elaborately for display within the current Hypertext Editing System since our display unit has only upper case letters and since the aspect ratio and center separation of the characters differ so widely from those of the line printer's print chain. We therefore only show indents (both regular and hanging), paragraphs, lines skipped, and left justification (the display is formatted as ragged right without truncation). However, the user can format text for elegant printout via IBM's TEXT360 program. The TEXT360 program provides printout with such formatting as indentations, capitalization, underscoring, special characters, centering and margin justification, page-numbering and tables of contents.

The assigned format codes are not distinguished one from another on the screen. This would require complex display conventions which could only result in a grotesque appearance. We have chosen rather to display all format codes as the hatch symbol (#), and allow the user to proofread formatting specifications by inquiring with an INQUIRE button as to the individual meaning of each.

The display of hatches may be suppressed with a function key. Also, the user may flip back and forth between the editing and formatting phases using a single function key. Thus formatting is in actuality a special case of insertion in which the format codes are actually inserted into the data structure.

: BCPL (Martin Richards) (University of Cambridge)

BCPL ("Basic Combined Programming Language") is a procedural, imperative, and structured programming language. Originally intended for writing compilers for other languages, BCPL is no longer in common use. However, its influence is still felt because a stripped down and syntactically changed version of BCPL, called B, was the language on which the C programming language was based. BCPL introduced several features of many modern programming languages, including using curly braces to delimit code blocks.

: NLS (Douglas Engelbart)

"oN-Line System" (as opposed to FLS, "oFf-Line System")
NLS, or the "oN-Line System", was a revolutionary computer collaboration system developed in the 1960s. Designed by Douglas Engelbart and implemented by researchers at the Augmentation Research Center (ARC) at the Stanford Research Institute (SRI), the NLS system was the first to employ the practical use of hypertext links, the mouse, raster-scan video monitors, information organized by relevance, screen windowing, presentation programs, and other modern computing concepts. It was funded by ARPA (the predecessor to Defense Advanced Research Projects Agency), NASA, and the US Air Force.
Of particular significance is the ability of NLS to display a file from many different points of view. For example, the hierarchical outline structure of a text--the various headings--may be stored as part of the data structure, and one may ask to see, for example, only section 3.1.4, or all sections down to two levels of subsectioning, or the first line in each of the subsections on the fourth level. The text is thus viewed as a collection of sections (called "statements") with a tree structure superimposed on this basic data structure. Each statement (less than 3000 characters) is meant to contain a single complete thought or idea, but may have substatements down to an arbitrary number of levels (see Figure 6). Most standard tree manipulations are allowed at a given level in the tree; e.g., locating or deleting the next node or the previous one, locating the first subnode, rearranging neighboring nodes, etc. Note that this hierarchical approach to files, in contrast to the continuous string approach of HES and FRESS, is useful for documents as well as programs.
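
The idea of viewing one file at different depths can be sketched in a few lines of Python. This is only an illustration of the tree-of-statements model described above, not NLS's data structure. The Statement class, the view function, and the dotted numbering scheme are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Statement:
    """One statement with any number of substatements."""
    text: str
    children: list = field(default_factory=list)


def view(statement, max_depth, first_line_only=False, number="1", depth=1):
    """Return (number, text) rows for all statements down to max_depth."""
    text = statement.text.split("\n", 1)[0] if first_line_only else statement.text
    rows = [(number, text)]
    if depth < max_depth:
        for i, child in enumerate(statement.children, start=1):
            rows += view(child, max_depth, first_line_only,
                         f"{number}.{i}", depth + 1)
    return rows


doc = Statement("Introduction\nSecond line of the introduction", [
    Statement("Background", [Statement("Earlier editors")]),
    Statement("Method"),
])

# Two levels only, first line of each statement.
for number, text in view(doc, max_depth=2, first_line_only=True):
    print(number, text)
```

The stored tree never changes; only the view parameters do, which is the point of keeping the outline as part of the data structure.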

Statements may contain such embedded

: RUNOFF -> SCRIPT (Stuart Madnick)

SCRIPT is a procedural markup language. Inline commands called control words, indicated by a period in the first column of a logical line, describe the desired appearance of the formatted text.

: FRESS File Retrieval and Editing System (Andries van Dam, Bob Wallace) (Brown University) (PDS-1)

FRESS allowed multiple users to collaborate on a set of documents, which could be of arbitrary size, and (unlike prior systems) were not laid out in lines until the moment of display. FRESS users could insert a marker at any location within a text document and link the marked selection to any other point either in the same document or a different document. This was much like the World Wide Web of today, but without the need for the anchor hyperlinks that HTML requires. Links were also bi-directional, unlike in today's web.
FRESS had two types of links: tags and "jumps". Tags were links to information such as references or footnotes, while "jumps" were links that could take the user through many separate but related documents.
FRESS was essentially a text-based system and editing links was a fairly complex task unless you had access to the PDS-1 terminal, in which case you could select each end with the lightpen and create a link with a couple of keystrokes.
FRESS was for many years the word processor of choice at Brown and a small number of other sites. It was used for typesetting many books, including those by Roderick Chisholm, Robert Coover and Rosmarie Waldrop.
Van Dam is perhaps most known as the co-designer, along with Ted Nelson, of the first hypertext system, HES, in the late 1960s. With it and its immediate successor, FRESS, he was an early proponent of the use of hypertext in the humanities and in pedagogy. The term hypertext was coined by Ted Nelson, who was working with him at the time. Van Dam's continued interest in hypertext was crucial to the development of modern markup and browsing technology, and several of his students were instrumental in the origin of XML, XSLT, and related Web standards.
The editing function is selected by pressing the appropriately labeled function key. The portion(s) of text to which the function applies are then indicated by pointing at the text with the lightpen.
An off-line computer typesetting program, IBM's TEXT 360 program, is used for final hard-copy printing
These areas may be interlinked and cross-referenced in any manner so as to form a directed graph of text segments (the vertices of the graph) and their cross references (the edges)
"branches" are unconditional jumps between two fragments that the user may encounter in the text (typically in a "menu") and which force him to lightpen a choice in order to proceed, while "links" are conditional jumps, which the reader may bypass or lightpen. The link, in effect, is an on-line generalization of the manuscript footnote principle that invites digressions, ancillary explanations, and browsing.
In 1967, van Dam co-founded ACM SICGRAPH, the precursor of today's ACM SIGGRAPH
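
The bi-directional links described above can be modelled by indexing every link from both ends. The Python sketch below is a guess at the general shape, not FRESS's implementation. The LinkStore class and the (document, offset) anchor format are hypothetical.

```python
from collections import defaultdict


class LinkStore:
    """Links between anchors, indexed from both ends."""

    def __init__(self):
        self._outgoing = defaultdict(set)
        self._incoming = defaultdict(set)

    def link(self, source, target):
        # Recording the link at both ends is what makes it followable
        # in either direction without searching every document.
        self._outgoing[source].add(target)
        self._incoming[target].add(source)

    def links_from(self, anchor):
        return sorted(self._outgoing.get(anchor, ()))

    def links_to(self, anchor):
        return sorted(self._incoming.get(anchor, ()))


store = LinkStore()
store.link(("poem", 120), ("notes", 4))
store.link(("essay", 7), ("notes", 4))

print(store.links_from(("poem", 120)))
print(store.links_to(("notes", 4)))
```

On the web, by contrast, a link lives only in the source page, so the target has no record of what points at it.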

Document structure and markup

FRESS had considerable support for document structuring and markup, affording separation of structure from formatting and hypertext semantics.

: qed -> ed

(In)famous for its terseness, ed gives almost no visual feedback, and has been called (by Peter H. Salus) "the most user-hostile editor ever created", even when compared to the contemporary (and notoriously complex) TECO.

: SYNTEXT? (Wilfred J. Hansen)

: Smalltalk (Alan Kay, Adele Goldberg, Dan Ingalls)

: GML (IBM, Charles Goldfarb, Edward Mosher, Raymond Lorie)

GML was first publicly disclosed in 1973
Generalized Markup Language (GML) is a set of macros that implement intent-based (procedural) markup tags for the IBM text formatter, SCRIPT.
SCRIPT/VS is the main component of IBM's Document Composition Facility (DCF). A starter set of tags in GML is provided with the DCF product.

: RUNOFF -> roff

roff is the first Unix text-formatting computer program, and a predecessor of the nroff and troff document processing systems.
(from the same source, nroff outputs for terminals and troff outputs for typesetting systems)

: PUB (Larry Tesler, Les Earnest)

Larry Tesler took a short-term project offered by Les Earnest from SAIL to write a "document compiler", a means to easily produce printable manuals from simple text files.
PUB is an advanced text justifier and page formatter intended primarily for use by programmers. It can automatically number pages, sections, figures, footnotes, etc. and can print their numbers in roman numerals as well as in digit or letter form. It can generate cross references, tables of contents, and indexes. Page layout is flexible, and allows multiple column output. Line formatting includes tabs, underlining, superscripts, subscripts, centering, and justification. Macros programmed in a SAIL-like string-processing language can generate text to be printed in the document. The output of the compiler is a file which can be printed on the terminal, on the line printer, or on microfilm.
Before GML, most markup was specific, e.g., indent the next line 3 spaces. In GML, which was conceived in 1969 and realized in 1971, all markup was general, e.g., make the next line a heading. A document thus prepared could be formatted for different page sizes, different output devices, etc., without altering the markup.
PUB, SCRIPT, and other scripted markup languages provided a form of generic markup using macros. In PUB, new elements and new behavior could, in many cases, be defined within the PUB language itself. As stated earlier, Brian Reid's SCRIBE-a complete markup language-was originally implemented in PUB. Similarly, IBM's first version of GML was implemented in SCRIPT.
PUB had no element equivalent to HTML's table. Instead, it provided two poorly documented methods of generating a table.
To generate a table row by row, the PUB author set tabs stops and inserted tab characters between the cells in each row.
That trick gave me the idea of using cut and paste, not only to position blocks of lines from a galley proof onto a page, but also to move text around within a manuscript. I got to implement cut and paste in Gypsy at PARC in 1974.
The dot convention originated in RUNOFF. Successors such as SCRIPT, GML, and troff followed suit. Commands had to start on a new line.
TeX (1978) and SGML (1986) dropped the dot convention completely, and allowed markup to appear anywhere.
PUB's reliance on non-ASCII SAIL characters like β as language delimiters was a barrier to widespread adoption.

: Emily (Wilfred J. Hansen)

Unlike other text editors, the text is not stored and manipulated as a linear string of characters. Instead, operations are performed in terms of the tree structure imposed on the program by the syntax of the programming language.

: On-line text editing: A survey

Andries van Dam and David E. Rice, ‘On-line text editing: A survey’, ACM Computing Surveys, 3(3), 93–114, (September 1971).

The Hard-Copy Formatter (which may in fact be largely merged with the Display Generator) is used in the case of free-form text (as opposed to program text) to convert the storage structure of the file to conventional hard copy for output on typewriter terminals, high-speed line printers, or even photo-composition devices. In his text, the user may specify "format codes" (typesetting codes) that determine margins, headings, running heads, paragraphs, left and/or right margin justification, indents, centering, underscores, type-face changes, etc. These codes are frequently stored in-line with the text itself and may be treated as indistinguishable from text for editing purposes. Sophisticated formatters may produce other byproducts useful for hard copy: footnotes, hyphenation, tables of contents, lists of figures, indexes, spelling checks, etc. Very readable accounts of the recent advances in computer-assisted typesetting and printing are contained in [11] and [25].

Program Editors, Free Form Text Editors

: troff / nroff / eqn

nroff (short for "new roff") is a text-formatting program on Unix and Unix-like operating systems. It produces output suitable for simple fixed-width printers and terminal windows. It is an integral part of the Unix help system, being used to format man pages for display.

: As We Will Think (Ted Nelson)

: ed -> grep

: ed -> sed

sed is a Unix utility that parses and transforms text, using a simple, compact programming language

: [Xerox Alto]

: Bravo (Butler Lampson, Charles Simonyi) (Xerox PARC) (Xerox Alto)

Bravo was the first WYSIWYG document preparation program.
Bravo was a modal editor—characters typed on the keyboard were usually commands to Bravo, except when in "insert" or "append" mode, in which case they were entered into the character buffer.
Bravo made extensive use of the mouse for marking locations in the text, as well as selecting areas of the text, but it was not used for command entry.
In addition to a long list of commands for controlling the formatting of the text (e.g. the ability to adjust left and right margins for sections of text, select fonts, etc.) Bravo also supported use of multiple buffers (i.e. files), and also multiple windows.


SGML specified a syntax for including the markup in documents, as well as one for separately describing what tags were allowed, and where (the Document Type Definition (DTD), later known as a schema).
SGML provides an abstract syntax that can be implemented in many different types of concrete syntax. Although the markup norm is using angle brackets as start- and end-tag delimiters in an SGML document (per the standard-defined reference concrete syntax), it is possible to use other characters, provided a suitable concrete syntax is defined in the document's SGML declaration. For example, an SGML interpreter might be programmed to parse GML, wherein the tags are delimited with a left colon and a right full stop, and an :e prefix denotes an end tag: :xmp.Hello, world:exmp.. According to the reference syntax, letter-case (upper- or lower-) is not distinguished in tag names, thus the three tags: (i) <QUOTE>, (ii) <quote>, and (iii) <quoTe> are equivalent. (NOTE: A concrete syntax might change this rule via the NAMECASE NAMING declarations).
Charles F. Goldfarb is known as the father of Standard Generalized Markup Language (SGML) and grandfather of HTML and the World Wide Web. He co-invented the concept of markup languages.
In 1974, he designed SGML and subsequently wrote the first SGML parser, ARCSGML. Goldfarb went on working to turn SGML into the ISO 8879 standard, and served as its editor in the standardization committee.

SGML is the International Standard (ISO 8879) language for structured data and document representation, the basis of HTML and XML and many others.

In the meantime, though I didn't know about it, the roots of generalized markup were being planted. Historically, electronic manuscripts contained control codes or macros that caused the document to be formatted in a particular way ("specific coding"). In contrast, generic coding, which began in the late 1960s, uses descriptive tags (for example, "heading", rather than "format-17").
Many credit the start of the generic coding movement to a presentation made by William Tunnicliffe, chairman of the Graphic Communications Association (GCA) Composition Committee, during a meeting at the Canadian Government Printing Office in September 1967: his topic -- the separation of information content of documents from their format.
The product was officially called the "Document Composition Facility" (DCF), but everyone called it "Script". It was derived from the language, designed by Stuart Madnick in the late 1960s, that was used in the Integrated Text Processing project.
GML support was added to Script. Geoff Bartlett developed a macro language with built-in SGML functions, including controls for delimiter assignment and association of element types with processing procedures.
Peter Huckle, DCF's Chief Programmer, designed and implemented a notable "starter set" application, the precursor of the "General Document" in ISO 8879.

:h1.Chapter 1:  Introduction
:p.GML supported hierarchical containers, such as
:li.Ordered lists (like this one),
:li.Unordered lists, and
:li.Definition lists
as well as simple structures.
:p.Markup minimization (later generalized and formalized in SGML),
allowed the end-tags to be omitted for the "h1" and "p" elements.

Both :cit.An End Of Spring:ecit. and
:cit.A Hall Of Mirrors:ecit. were
first novels.
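
To make the delimiter convention in these examples concrete, here is a small Python sketch that rewrites colon-and-period GML tags into angle-bracket form. It is an illustration only: the function gml_to_angle is hypothetical, tag attributes are ignored, and the caller has to supply the set of known element names so that the :e prefix can be told apart from an element whose name happens to start with "e".

```python
import re

# A tag is a colon, a name, and a closing period, e.g. ":cit." or ":ecit."
TAG = re.compile(r":([A-Za-z][A-Za-z0-9]*)\.")


def gml_to_angle(text, elements):
    """Rewrite :tag. and :etag. delimiters as <tag> and </tag>."""
    known = {name.lower() for name in elements}

    def replace(match):
        name = match.group(1).lower()
        if name in known:
            return f"<{name}>"
        if name.startswith("e") and name[1:] in known:
            return f"</{name[1:]}>"
        return match.group(0)  # not a tag we know; leave the text alone

    return TAG.sub(replace, text)


print(gml_to_angle(":cit.An End Of Spring:ecit. and", {"cit"}))
print(gml_to_angle(":h1.Chapter 1:  Introduction", {"h1", "p", "li"}))
```

Nothing here supplies the omitted end tags. Inferring them from the document type is the job of markup minimization, which needs a description of which elements may contain which.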

The :BACKM tag identifies a major element of a document that contains

The DCF GML User's Guide (IBM SH20-9160), which I wrote in 1978, includes the first published formal document type "descriptions" (DTDs), for this "General Document" and also for a "GML Markup Guide" document type. The General Document example, except for the delimiter strings, should look very familiar. It was not only the source for the homonymous DTD in ISO 8879, but also, thanks to Anders Berglund's championing of DCF at CERN, it was the source for the World Wide Web's HTML document type as well. The User's Guide itself became the first working paper of the ANSI SGML committee (X3J6/78/33-01).
Brian Reid's Scribe system, for example, begun at Carnegie-Mellon in 1976, had independently arrived at several of the key concepts of SGML, though many years later. Brian, however, personally influenced SGML by encouraging me to write "A Generalized Approach to Document Markup" for SIGPLAN Notices in June 1981. That paper eventually became -- after a global change from "GML" to "SGML" -- Annex A of ISO 8879.

: Bravo -> Gypsy (Larry Tesler, Timothy Mott) (Xerox PARC)

Gypsy was the first document preparation system based on a mouse and graphical user interface to take advantage of those technologies to virtually eliminate modes.

It was the second WYSIWYG document preparation program, a successor to the ground-breaking Bravo on the seminal Xerox Alto personal computer.

The code was built on Bravo as a base and the developers of Bravo, including Tom Malloy, Butler Lampson and Charles Simonyi provided technical support to the effort.

Although similar in capabilities to the then-current version of Bravo, the user interface of Gypsy was radically different from that of Bravo. In both Bravo and Gypsy, a command operated on the current selection. But Bravo had modes and Gypsy didn't. In Bravo, the effect of pressing a character key depended on the current mode, while in Gypsy, pressing a character key by itself always typed the character.

Drag-through selection, double-click and cut-copy-paste were quickly adopted by Dan Ingalls for Smalltalk, beginning with Smalltalk-76

Gypsy is the "typescript" component of the Ginn Publishing System being developed at PARC. It is oriented towards the preparation of the content of a book with little concern for its format. A second component of the system will be called a "pagescript" system and will deal with the makeup of pages for photocomposition.

The primary responsibility of the operator is to key in an author's manuscript as accurately as possible, using boldface, italics, and underlining where appropriate, and noting other formatting instructions in asides typed in a special "remark font". When version one of the typescript is completed and stored, the system produces hard copy for editorial scrutiny.

At this point the editor may work on line to make revisions or may mark up the printout and ask an operator to make the revisions. In either case, the editing facilities of the system are used to alter the stored text producing version two.

Printouts have in the left margin a bar (change marker) next to each paragraph which contains a change from the preceding version.
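
A change marker of this kind can be approximated by comparing two versions paragraph by paragraph. The Python sketch below uses the standard difflib module. It is not how Gypsy did it, and the function name change_bars and the "|" marker are choices made for the example.

```python
import difflib


def change_bars(old_paragraphs, new_paragraphs, bar="|"):
    """Prefix each new paragraph with a bar if it differs from the old version."""
    matcher = difflib.SequenceMatcher(
        a=old_paragraphs, b=new_paragraphs, autojunk=False)
    changed = set()
    for tag, _, _, j1, j2 in matcher.get_opcodes():
        # "replace" and "insert" cover paragraphs that are new or altered;
        # deleted paragraphs leave nothing in the new version to mark.
        if tag in ("replace", "insert"):
            changed.update(range(j1, j2))
    return [f"{bar if i in changed else ' '} {paragraph}"
            for i, paragraph in enumerate(new_paragraphs)]


version_one = ["One.", "Two.", "Three."]
version_two = ["One.", "Two, revised.", "Three.", "Four."]

for line in change_bars(version_one, version_two):
    print(line)
```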

The software is programmed in BCPL for the Alto computer.

: EasyScript

EasyScript is a set of macro definitions and profiles included with Script/370 Version 3 that implements a primitive version of GML.

.ez on
&P.This is a paragraph.
&N1.First item
&N2.First subitem
&N2.Second subitem
&N1.Second item

: TECO -> Emacs (David A. Moon, Guy L. Steele Jr.), (Richard Stallman)

Editing MACroS
The early screen modes of Emacs, for example, were directly inspired by WAITS' "E" editor -- one of a family of editors that were the first to do real-time editing
The most popular, and most ported, version of Emacs is GNU Emacs, which was created by Richard Stallman for the GNU Project
Steele also designed the original command set of Emacs and was the first to port TeX (from WAITS to ITS)
Steele has served on accredited technical standards committees, including: Ecma International (formerly European Computer Manufacturers Association (ECMA)) TC39 (for the language ECMAScript, for which he was editor of the first edition)

: Mesa (Xerox PARC) (Alto)

Mesa is an ALGOL-like language with strong support for modular programming. Every library module has at least two source files: a definitions file specifying the library's interface plus one or more program files specifying the implementation of the procedures in the interface. To use a library, a program or higher-level library must "import" the definitions. The Mesa compiler type-checks all uses of imported entities; this combination of separate compilation with type-checking was unusual at the time.

: qed -> ed -> QUIDS (George Coulouris)

QUIDS represented the document as a sequence of paragraphs, not as a sequence of lines as do general-purpose editors. QUIDS included specific commands to get particular effects, such as paragraphs that were not indented, headings, etc. QUIDS also included the ability to specify limited symbolic referencing, providing a specification of a string that replaced corresponding symbolic references in the text. With the exception of symbolic referencing, the representation of the text is a linear sequence of paragraphs. In particular there is no notion of hierarchical relationships among document objects.

: ed -> em -> ex -> vi

The original Unix editor, distributed with the Bell Labs versions of the operating system in the 1970s, was the rather user-unfriendly ed. George Coulouris of Queen Mary College, London, which had installed Unix in 1973, developed an improved version called em in 1975 that could take advantage of video terminals. While visiting Berkeley, Coulouris presented his program to Bill Joy, who modified it to be less demanding on the processor; Joy's version became ex and got included in the Berkeley Software Distribution.
The name "vi" is derived from the shortest unambiguous abbreviation for the ex command visual, which switches the ex line editor to visual mode.
According to Bill Joy, inspiration for vi's visual mode came from the Bravo editor, which was a bimodal editor.

: grep -> awk

: personal dynamic media (Alan Kay, Adele Goldberg) (Xerox PARC)

Several years ago, we crystallized our dreams into a design idea for a personal dynamic medium the size of a notebook (the Dynabook) which could be owned by everyone and could have the power to handle virtually all of its owner's information-related needs. Towards this goal we have designed and built a communications system: the Smalltalk language, implemented on small computers we refer to as "interim Dynabooks."

We are exploring the use of this system as a programming and problem solving tool; as an interactive memory for the storage and manipulation of data; as a text editor; and as a medium for expression through drawing, painting, animating pictures, and composing and generating music.

Editing. Every description or object in the Dynabook can be displayed and edited. Text, both sequential and structured, can easily be manipulated by combining pointing and a simple "menu" for commands, thus allowing deletion, transposition, and structuring. Multiple windows, as shown in Figure 8, allow a document (composed of text, pictures, musical notation) to be created and viewed simultaneously at several levels of refinement. Editing operations on other viewable objects (such as pictures and fonts) are handled in analogous ways.

Filing. The multiple-window display capability of Smalltalk has inspired the notion of a dynamic document. A document is a collection of objects that have a sensory display and have something to do with each other; it is a way to store and retrieve related information. Each subpart of the document, or frame, has its own editor which is automatically invoked when pointed at by the "mouse." These frames may be related sequentially, as with ordinary paper usage, or inverted with respect to properties, as in cross-indexed file systems. Sets which can automatically map their contents to secondary storage with the ability to form unions, negations, and intersections are part of this system, as is a "modeless" text editor with automatic right justification. The current version of the system is able to automatically cross-file several thousand multifield records (with formats chosen by the user), which include ordinary textual documents indexed by content, the Smalltalk system, personal files, diagrams, and so on.

: PUB -> TeX (Don Knuth)

Don Knuth developed TeX for authors of mathematical texts.
The first version of TeX, called TeX78, was written in the SAIL programming language to run on a PDP-10 under Stanford's WAITS operating system.
For later versions of TeX, Knuth invented the concept of literate programming, a way of producing compilable source code and cross-linked documentation typeset in TeX from the same original file. The language used is called WEB and produces programs in DEC PDP-10 Pascal.

: ? (Sheldon Borkin, John Prager)

Some Issues in the Design of an Editor-Formatter for Structured Documents. Technical Report, IBM Cambridge Scientific Center, November, 1978

: WordMaster, WordStar

WordStar was the first microcomputer word processor to offer mail merge and textual WYSIWYG.
Commands to enable bold or italics, printing, blocking text to copy or delete, saving or retrieving files from disk, etc. were typically a short sequence of keystrokes, such as Ctrl-P-B for bold, or Ctrl-K-S to save a file. Formatting codes would appear on screen, such as ^B for bold, ^Y for italics, and ^S for underscoring.
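The on/off behaviour of these on-screen codes can be sketched as follows. This is a minimal sketch in Python, assuming the caret notation quoted above; the function name and the HTML-like output tags are hypothetical and chosen only for illustration, not taken from WordStar.

```python
# Hypothetical sketch: WordStar-style toggle codes, written in the caret
# notation quoted above (^B bold, ^Y italics, ^S underscore).
TOGGLES = {"B": "b", "Y": "i", "S": "u"}  # code letter -> illustrative tag


def render_toggles(text):
    """Turn each ^X code into an opening or closing tag, alternating."""
    active = set()  # codes that are currently switched on
    out = []
    i = 0
    while i < len(text):
        if text[i] == "^" and i + 1 < len(text) and text[i + 1] in TOGGLES:
            code = text[i + 1]
            tag = TOGGLES[code]
            if code in active:
                active.remove(code)      # second occurrence switches it off
                out.append(f"</{tag}>")
            else:
                active.add(code)         # first occurrence switches it on
                out.append(f"<{tag}>")
            i += 2
        else:
            out.append(text[i])
            i += 1
    return "".join(out)


print(render_toggles("A ^Bbold^B word and an ^Yitalic^Y one."))
```

The same code both opens and closes a run, so the renderer has to remember which codes are currently on.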

: Bravo -> BravoX

BravoX was "modeless".

: TECO -> doc, zed (Vaughan Pratt)

We show only that for j &> k, a_(j+i) &< a_k, as proved by @N.
@Berlitz.    For let $D be the class of solutions and $E its closure;
then $D$E is of degree 2^(2^(2^i)), with residue &r(%an^(1+log_23)).

Each word preceded by ` (backquote) will be printed in italics. For boldface use @ instead. To print several words in a row in italics precede each of them with ` and similarly for boldface.

Subscripts are obtained using _ and superscripts using ^. Only the letter following the _ or ^ is affected, unless that letter is left parenthesis, in which case everything from that parenthesis to the matching right parenthesis is affected, the parentheses themselves being removed.

Greek and script letters are obtained using % and $ respectively. The rules for which characters are affected are as for super- and subscripts rather than as for italics and boldface. Thus to get whole Greek words type %(kata piotin). Of course, %k%a%t%a %p%i%o%t%i%n will also work.
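The scoping rules above can be sketched as a small converter. This is a minimal sketch in Python, not Pratt's implementation: the function names and the HTML-like output tags (including the made-up greek and scr tags) are assumptions for illustration, and other codes that appear in the sample, such as &, are passed through untouched.

```python
# Hypothetical sketch of the scoping rules described above; the output
# tags are illustrative assumptions, not what the original system emitted.
WORD_TAGS = {"`": "i", "@": "b"}  # affect the whole following word
CHAR_TAGS = {"_": "sub", "^": "sup", "%": "greek", "$": "scr"}


def take_group(text, i):
    """Return the single character at i, or a parenthesised group without
    its outer parentheses, together with the index just past it."""
    if text[i] != "(":
        return text[i], i + 1
    depth = 0
    for j in range(i, len(text)):
        if text[j] == "(":
            depth += 1
        elif text[j] == ")":
            depth -= 1
            if depth == 0:
                return text[i + 1:j], j + 1
    raise ValueError("unmatched parenthesis")


def convert(text):
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch in CHAR_TAGS and i + 1 < len(text):
            group, i = take_group(text, i + 1)
            tag = CHAR_TAGS[ch]
            # Groups may contain further markup, so convert recursively.
            out.append(f"<{tag}>{convert(group)}</{tag}>")
        elif ch in WORD_TAGS and i + 1 < len(text) and not text[i + 1].isspace():
            j = i + 1
            while j < len(text) and not text[j].isspace():
                j += 1
            tag = WORD_TAGS[ch]
            out.append(f"<{tag}>{text[i + 1:j]}</{tag}>")
            i = j
        else:
            out.append(ch)
            i += 1
    return "".join(out)


print(convert("degree 2^(2^i) with a_k"))
```

Note the two different scopes: italics and boldface extend to the end of the word, while the other four codes cover one character or one parenthesised group.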

: PUB -> Scribe (Brian Reid)

Brian Reid developed Scribe for nontechnical users. He implemented the first version entirely in PUB.

The Scribe markup language defined the words, lines, pages, spacing, headings, footings, footnotes, numbering, tables of contents, etc. in a way similar to HTML.

@Heading(The Beginning)
    Let's start at the very beginning, a very good place to start

@MakeSection(tag=beginning, title="The Beginning")

Typically, large documents were composed of Chapters, with each chapter in a separate file. These files were then referenced by a master document file, thereby concatenating numerous components into a single large source document.
The master file typically also defined styles (such as fonts and margins) and declared macros like MakeSection shown above; macros had limited programmatic features. From that single concatenated source, Scribe computed chapter numbers, page numbers, and cross-references.
These processes anticipate features of later markup languages like HTML. Placing styles in a separate file gave some of the same advantages as Cascading Style Sheets, and programmed macros presaged the document manipulation aspects of JavaScript.
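The numbering and cross-referencing step described above can be sketched as a two-pass process. This is a minimal sketch in Python, not Scribe's implementation: it reuses the MakeSection notation shown earlier, while the @Ref(tag) reference syntax, the function name, and the output format are assumptions made for illustration.

```python
import re

# Section declarations in the notation shown above.
SECTION = re.compile(r'@MakeSection\(tag=(\w+), title="([^"]*)"\)')
# Assumed reference syntax, used here only for illustration.
REF = re.compile(r"@Ref\((\w+)\)")


def resolve(chapters):
    """chapters: list of source strings, one per chapter file."""
    source = "\n".join(chapters)  # what the master file's includes amount to
    numbers = {}
    # Pass 1: number the sections in order of appearance.
    for n, match in enumerate(SECTION.finditer(source), start=1):
        numbers[match.group(1)] = n
    # Pass 2: replace declarations and references with computed numbers.
    source = SECTION.sub(
        lambda m: f"{numbers[m.group(1)]}. {m.group(2)}", source
    )
    return REF.sub(lambda m: str(numbers[m.group(1)]), source)


print(resolve([
    '@MakeSection(tag=beginning, title="The Beginning")\nSee section @Ref(end).',
    '@MakeSection(tag=end, title="The End")',
]))
```

Two passes are needed because a reference may point forward to a section that has not been numbered yet.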

: Enquire

While consulting for CERN from June to December 1980, Tim Berners-Lee wrote a notebook program, "Enquire-Within-Upon-Everything", which allowed links to be made between arbitrary nodes. Each node had a title, a type, and a list of bidirectional typed links.
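A node with bidirectional typed links can be sketched as a small data structure. This is a minimal sketch in Python; the class, the function, and the table of inverse relations are hypothetical and are not taken from Enquire itself.

```python
from dataclasses import dataclass, field

# Hypothetical table pairing each relation with its inverse.
INVERSE = {
    "includes": "is part of",
    "is part of": "includes",
    "uses": "is used by",
    "is used by": "uses",
}


@dataclass(eq=False)  # nodes are compared by identity, not by content
class Node:
    title: str
    type: str
    links: list = field(default_factory=list)  # (relation, target) pairs


def link(source, relation, target):
    """Record the link on both nodes so it can be followed either way."""
    source.links.append((relation, target))
    target.links.append((INVERSE[relation], source))


manual = Node("User manual", "document")
program = Node("Notebook program", "program")
link(program, "uses", manual)

for relation, other in manual.links:
    print(f"{manual.title} {relation} {other.title}")
```

Storing the inverse on the target is what makes the link bidirectional: the reader can start from either node and find the other.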

: DocBook (SGML)

: PEN ()

: WEB (Donald Knuth)

: EZ (Andrew Toolkit)

Ez is the simplest, most general AUIS application possible. It loads a document (or creates a new one) and displays it in a window. That's all. Everything else that happens is controlled by the document itself. For example, if you are editing a text document, you get all the text-editing commands. If you're editing a picture, you get picture-editing commands. If you're editing a text document with pictures, you get both, depending on which piece you are working on at the time.
It is important to understand that the word processing files created by ez have their own distinctive format, just like your favorite commercial word processors. This format was conveniently designed from the start to allow ez documents to be sent via conventional electronic mail systems.

: Bravo -> Word

Microsoft Word began life as Multi-Tool Word for Xenix in 1983. It was renamed Microsoft Word and ported to MS-DOS in 1983, the Macintosh in 1985, and Windows in 1989.
Word was rooted in Bravo, the GUI word processor created at Xerox PARC; Microsoft hired Charles Simonyi, Bravo’s “father”, in 1981.

: PostScript (John Gaffney, John Warnock)


From 1983 to 1987, the Association of American Publishers (AAP), a coalition of book and journal publishers in North America, sponsored the Electronic Manuscript Project, the earliest effort to develop a commercial SGML application. The project sought to create an SGML standard for book, journal, and article creation.
The AAP DTDs were ratified in 1988 as the American National Standards Institute's Electronic Manuscript Preparation and Markup (ANSI/NISO Z39.59) standard.
Unlike the DTDs that ANSI/NISO Z39.59 specifies for books, serials and articles, the markup recommended for mathematics and tables is not part of the standard. As the standard is based on ASCII character encoding, it includes a large set of entity definitions for special characters.
The AAP and the European Physical Society further collaborated on a standard method for marking up mathematical notation and tables in scientific documents. Building on this work, Eric van Herwijnen, then head of the text processing section at CERN, edited the specification for adoption by the International Organization for Standardization as ISO 12083, which was first published in 1993, revised in 1994 and last reconfirmed in 2016. ISO 12083 specifies four DTDs: Article, Book, Serial, and Math.
In 1995 ANSI/NISO Z39.59:1988 was superseded by ISO 12083, which was adopted as U.S. standard ANSI/NISO/ISO 12083-1995 (R2009) Electronic Manuscript Preparation and Markup. This U.S. standard was withdrawn in 2016.
The AAP DTD also informed other SGML applications, such as CERN's SGMLguid, the Elsevier Science Article DTD, and EWS MAJOUR, a DTD developed between 1989 and 1991 in an effort led by the publishing houses Elsevier, Wolters Kluwer, and Springer.

: Waterloo SCRIPT GML + SGML -> SGMLguid in CERNDOC

CERNDOC supported two markup systems: a GML application named CERNPAPER, developed locally in 1985, and an SGML application created in 1986 by Anders Berglund, who was at the time responsible for text processing in the CERN data handling division. Berglund mapped a Waterloo SCRIPT macro set onto SGML, basing his application on the document type defined in Annex E of ISO 8879 and on the AAP DTD, the Association of American Publishers' document type. Prior art also includes the IBM GML starter set. The application features an extensive tag set for preparing foils, memos, letters, scientific papers, and manuals, amongst other use cases.
Tim Berners-Lee, who was working as a CERN contractor when he created the Web, encountered SGMLguid in October 1987, when CERN's Online Computing Group started to maintain its documentation in CERNDOC. Berners-Lee found its hierarchical structure highly limiting.
For HTML, Berners-Lee adopted SGML syntax and a subset of the tags specified in CERN's SGMLguid.

: LaTeX (Leslie Lamport)

The writer uses markup tagging conventions to define the general structure of a document (such as article, book, and letter), to stylise text throughout a document (such as bold and italics), and to add citations and cross-references.

: NoteCards (Randall Trigg, Frank Halasz, Thomas Moran)

One interesting feature of NoteCards is that authors may use LISP commands to customize or create entirely new node types. The powerful programming language allows almost complete customization of the entire NoteCards work environment.

: Intermedia (Norman Meyrowitz)

: HyperCard (Bill Atkinson)

HyperCard had a significant impact on the web as it inspired the creation of both HTTP (through its influence on Tim Berners-Lee's colleague Robert Cailliau), and JavaScript (whose creator, Brendan Eich, was inspired by HyperTalk). It was also a key inspiration for ViolaWWW, an early web browser.

According to Ward Cunningham, the inventor of Wiki, the wiki concept can be traced back to a HyperCard stack he wrote in the late 1980s.

: Mathcad

Mathcad's central interface is an interactive notebook in which equations and expressions are created and manipulated in the same graphical format in which they are presented (WYSIWYG).

: Scribe -> Texinfo (Richard Stallman)

Stallman's GPL Texinfo is "loosely based on Brian Reid's Scribe and other formatting languages of the time".
It is a markup language for text that is intended to be read both online and as printed hard copy.

: awk, sed -> Perl

: Markup Systems and the Future of Scholarly Text Processing (James Coombs, Allen Renear, Steven DeRose)


When the Text Encoding Initiative (TEI) was originally established, scholarly projects and libraries attempting to take advantage of digital technology seemed to be faced with an overwhelming obstacle to creating sustainable and shareable archives and tools: the proliferating systems for representing textual material. These systems seemed almost always to be incompatible, often poorly designed, and multiplying at nearly the same rapid rate as the electronic text projects themselves. This situation was inhibiting the development of the full potential of computers to support humanistic inquiry by erecting barriers to access, creating new problems for preservation, making the sharing of data (and theories) difficult, and making the development of common tools impractical.


CALS includes standards for electronic data interchange, electronic technical documentation, and guidelines for process improvement.
The CALS Table Model is a DTD standard for representing tables in SGML/XML (see also DocBook). The CALS Raster file format was developed in the mid-1980s to standardize graphics data interchange for electronic publishing for the federal government.



: Wolfram Mathematica (Stephen Wolfram, Theodore Gray)

: CALS Table Model

: Ness (Wilfred J. Hansen)

: DynaText (Louis Reynolds, Steven DeRose, Jeffrey Vogel, Andries van Dam) (Electronic Book Technologies)

DynaText is an SGML publishing tool.

DynaText stands in the long tradition of hypermedia at Brown University, and adopted many features pioneered by FRESS, such as unlimited document sizes, dynamically-controllable styles and views, and reader-created links and trails.

: HyTime (Charles Goldfarb, Steven Newcomb)

: HTML Tags (note no formatting, only links)

: HTML (Tim Berners-Lee, Dan Connolly) (SGML)

The first publicly available description of HTML was a document called "HTML Tags", first mentioned on the Internet by Tim Berners-Lee in late 1991. It describes 18 elements comprising the initial, relatively simple design of HTML. Except for the hyperlink tag, these were strongly influenced by SGMLguid, an in-house Standard Generalized Markup Language (SGML)-based documentation format at CERN. Eleven of these elements still exist in HTML 4.
SGML is a system for defining markup languages. Authors mark up their documents by representing structural, presentational, and semantic information alongside content. HTML is one example of a markup language. Here is an example of an HTML document:

      <TITLE>My first HTML document</TITLE>
      <P>Hello world!

: Wiki (Ward Cunningham)



: Internet Explorer v4 with designMode

: IPython / Jupyter Notebook

: JSON (Douglas Crockford)

Crockford first specified and popularized the JSON format.


: FCKeditor / CKEditor (Frederico Caldeira Knabben)

: Markdown (John Gruber, Aaron Swartz)

: TiddlyWiki (Jeremy Ruston)

: Gmail (Paul Buchheit)

: Writely / Google Docs (Sam Schillace, Steve Newman, Claudia Carpenter)

: HAML (Hampton Catlin)

  %h1= post.title
  %h2= post.subtitle
  %div
    = post.content

: CodeMirror

The first version of the editor was written in early 2007, for the console in the Eloquent JavaScript website. The code was first packaged up and released under the name CodeMirror in May 2007. This version was based on the contentEditable feature of browsers.

In late 2010, the Ace project, another JavaScript-based code editor, pioneered new implementation techniques and demonstrated that it is possible, even in JavaScript, to handle documents with many thousands of lines without degraded performance. This prompted a rewrite of CodeMirror along the same principles. The result was version 2, which no longer relied on contentEditable and significantly improved performance.

: Bespin

: Ace

: Jade / 2016: Pug

Jade, renamed Pug in 2016, is a high-performance template engine heavily influenced by Haml and implemented in JavaScript for Node.js and browsers.

    title= title
    h1= message

: JSON5 (Aseem Kishore, Michael Bolin)

: CommonMark (Jeff Atwood, John MacFarlane)

: Medium

: ProseMirror (Marijn Haverbeke)

: pubpub

: Observable (Melody Meckfessel, Mike Bostock)

Uncategorised notes

Interactive editing systems

Document formatting systems

General-purpose editors

General-purpose editors permit editing of a wide range of objects by reducing all to a

Syntax-directed program editors

Presentational markup
The kind of markup used by traditional word-processing systems: binary codes embedded within document text that produce the WYSIWYG effect.
Procedural markup
Markup is embedded in text which provides instructions for programs to process the text. Well-known examples include troff, TeX, and PostScript. It is expected that the processor will run through the text from beginning to end, following the instructions as encountered. Text with such markup is often edited with the markup visible and directly manipulated by the author. Popular procedural markup systems usually include programming constructs, and macros or subroutines are commonly defined so that complex sets of instructions can be invoked by a simple name (and perhaps a few parameters).
Descriptive markup
Markup is specifically used to label parts of the document for what they are, rather than how they should be processed. Well-known systems that provide many such labels include LaTeX, HTML, and XML. The objective is to decouple the structure of the document from any particular treatment or rendition of it. Such markup is often described as "semantic".
"Document structure and modularity in Mentor"
The current development of Mentor reflects our belief that a major component of programming is the maintenance of large documents of a varied nature: specifications, programs, manuals, progress reports, documentation, etc... In addition, information of various kinds, and in different languages, is often mixed in a single document, and one may have to extract this information selectively upon request (e.g. text, examples and formal specification in a manual, or instructions, comments and assertions in a program).


A structure editor, also structured editor or projectional editor, is any document editor that is cognizant of the document's underlying structure. Structure editors can be used to edit hierarchical or marked up text, computer programs, diagrams, chemical formulas, and any other type of content with clear and well-defined structure. In contrast, a text editor is any document editor used for editing plain text files.

It is common for a language sensitive editor to represent a document as a parse tree with respect to language's grammar, or as an abstract syntax tree (AST). For example, a DOM tree is essentially an AST with respect to a given DTD.
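The tree representation mentioned above can be sketched with a few lines of Python. This is a minimal sketch using the standard library's html.parser; the TreeBuilder class, the parse function, and the (tag, children) tuple shape are assumptions for illustration, and the sketch assumes every element is explicitly closed and ignores attributes.

```python
from html.parser import HTMLParser


class TreeBuilder(HTMLParser):
    """Build a nested (tag, children) tree from well-nested markup."""

    def __init__(self):
        super().__init__()
        self.root = ("#document", [])
        self.stack = [self.root]  # path from the root to the open element

    def handle_starttag(self, tag, attrs):
        node = (tag, [])
        self.stack[-1][1].append(node)  # attach to the current parent
        self.stack.append(node)         # and descend into it

    def handle_endtag(self, tag):
        if len(self.stack) > 1:         # never pop the document root
            self.stack.pop()

    def handle_data(self, data):
        if data.strip():                # skip whitespace-only text
            self.stack[-1][1].append(data.strip())


def parse(markup):
    builder = TreeBuilder()
    builder.feed(markup)
    builder.close()
    return builder.root


print(parse("<section><h1>Title</h1><p>Some <em>text</em></p></section>"))
```

A structure editor would operate on a tree like this one rather than on the flat character sequence a text editor sees.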

Reveal Codes

A feature of WordPerfect wherein at the stroke of the Alt-F3 key combination, the document is displayed in a panel as a text document with visible markup, markup which is editable. Turning on "Reveal Codes" makes finding and correcting tricky formatting problems in WordPerfect a breeze (at least for those who know how to use the feature).

                                 Colossal Typewriter
                           by John McCarthy and Roland
                               Silver for the PDP-1  |   Photon typesetter
                                ?                    |   editors by Michael
                                ?                     \  Barnett & Kalon
             Expensive Typewriter        CREATE/EDIT   \  Kelley for      TECO
           for PDP-1 by Steve Piner       for CTSS      \ IBM 704     for PDP-1
                   /         |           /     |  \      \__  \   by Dan Murphy
                  /          |          /      |   \        \  \            |
  * Expensive Typewriter  editors      EDITS   | MEMO/MODIFY |  |   VEDIT   |
    for PDP-1, improved    for       by Arthur | by Leslie   |  | by Victor |
    by Peter Deutsch      PDP-4,       Samuel  |   Lowry    /   |  Yngve    |
            |            PDP-5/8      for CTSS  \ for CTSS /  _/  for     PDP-6
            |                            ?   \_  \    |   / _/   CTSS  TECO by
           QED                           ?     \_ \   |  / /   /    Greenblatt,
    for Berkeley SDS-940                 ?       \ \  | | |  /       Holloway,
       by Deutsch and                  LINED    TYPSET for CTSS     and Nelson
       Butler Lampson                for PDP-6  by Jerry Saltzer     ?  |   |
         /           \                  |          |          |     ?   |   |
        /             \                 |      PDP-7/9 editor |    ?  DEC   |
      QED,          * QED               |                     |   ?  TECO   |
  as published      for CTSS            |     ????????????????|???          |
    in CACM     by Ken Thompson         |   ??                |            ITS
        |               |               |  ?      ED (and EDL, EDA, EDB)  TECO
        |               |             STOPGAP            for CTSS        /  |
        |              QED           for PDP-10              |          /  /
        |           for Multics    by Bill Weiher            |         /  |
        |       by Ken Thompson          |                  edit      |   |
        |         /         \           SOS            for Multics    |   |
        |        /           \       for PDP-10    by Charles Garman  |   |
        |      qedx          QED       by Steve           |       ___/    |
        |  for Multics     for GCOS      Savitzky         |      /     EMACS
        |               by Dennis Ritchie  ?            edm     /    in TECO
        |                    |             ?       for Multics /  by RMS et al.
      QUIDS                  |             ?                  /   /  |      |
by George Coulouris       * ed             ?            _____/   /   |      |
      et al.            for PDP-7 Unix     ?           /        /    |      |
        |              by Ken Thompson     ?          |        /     |      |
        |                   |              ?     ZED/DOC      /    Multics  |
        |                   |              ?   by Vaughan    /      EMACS   |
        |                  ed              ?    Pratt       /   by Bernard  |
        |               for Unix           ?   in TECO     /    Greenberg   |
        |           (various versions)     ?     |        /         |      /
        |                 |                ?     |       /           \    /
        |                 |               ?      |      /           GNU Emacs
         \               ed              ?      /      /
          \         for Unix v6         ?      /      /
           \      /     |     \        ?      /      /
            \   /   other eds  |      ?      /      /
             em      (UCLA?)   |     ?      /      /
Unix ed with additions    |    |    ?      |      /
 from George Coulouris     \   |   ?       |     /
   |      |          \__    |  |  ?        |    /
   |      |             \__ |  | ?         |   /
other     |                ex (v1)         |  /
 em       |      Unix ed with additions   /  /
variants  |      by Bill Joy and Charles /  /
         DED                Haley       /  /
   by Richard Bornat,         |        /  /
   Harold Thimbleby         ex (v2)   /  /
                    Unix ed with additions
                          by Bill Joy
                             ex/vi (v3)
                     extended by Mark Horton

GML provided a sound expression of the conceptual foundation for the systematic use of descriptive markup. Unlike ad hoc macro packages, GML is a descriptive language generally implemented on top of a clearly distinct, user-accessible procedural language. In addition, GML contributed “attributes” to descriptive markup languages, providing markup support for such essential functions as cross-references (which are automatically resolved by applications).

Another influential system, Scribe, enforces the use of descriptive markup by eliminating procedural markup from the author’s normal access to the system. Instead of tuning procedural markup to control the processing of descriptive markup, authors select “document format definitions” for various types of documents.

The Scribe approach has been widely emulated recently, but with moderate success at best. LaTeX, for example, provides a high-level interface with TeX, which is designed to provide low-level typesetting control. Unfortunately, even the beginning LaTeX user must think in terms of low-level markup.

Similarly, a number of word processors (Microsoft Word, XyWrite, Nota Bene) have recently adopted Scribe’s document format definitions under the metaphor of electronic “style sheets.”

More sophisticated systems often process electronic markup and then disguise it behind a special character. XyWrite and Nota Bene, for example, display a “delta” so that authors can locate and edit markup. Such systems usually have the capacity to expose the markup as well. Other systems (Xerox Bravo and Star, MacWrite) conceal electronic markup entirely. One system (Janus ll) exposes descriptive markup on one monitor and conceals it on the other.

Finally, systems have recently begun to display electronic markup; that is, a specially formatted representation of the markup in the source file is displayed along with the text. Etude and Interleaf, for example, format text for editing, but display descriptive markup in a margin at the left of the editing window.

When FRESS (File Retrieval and Editing System) users at Brown University learned FRESS would no longer be supported, authors either spent hours converting their files to the new format (Waterloo SCRIPT)

Steven J. DeRose, David G. Durand, Elli Mylonas, and Allen H. Renear. 1997. What is text, really? SIGDOC Asterisk J. Comput. Doc. 21, 3 (August 1997), 1–24.

Style Sheet Languages for Hypertext

: The Dexter hypertext reference model (Frank Halasz, Mayer Schwartz)

: A Network-Based Approach to Text Handling for the Online Scientific Community (Randall Trigg) (University of Maryland)

XHTML(2) died because it wasn't backwards-compatible. But HTML can't be validated with XML tools until it's been run through an HTML parser. So JATS XML is used instead (the NLM DTD was introduced in the same year).

Although PMC is also used to store articles based on research funded with NIH grants as part of the NIH Public Access project, the original intent of the project was to take fulltext article submissions from publishers and make them available through the database. The only technical requirement at the time was that the publisher had to supply the articles in some SGML or XML format and include all images so that the articles could be displayed at PMC.

The PMC workflow was changed so that content arriving in different publishers’ article formats was converted into a common format (Figure 2). The PubMed Central DTD was written based on the two article models that were being submitted to PMC at the time: the keton SGML DTD and the BioMed Central XML DTD.

In 2001, the Harvard University Library E-Journal Archiving Project (using funds from the Mellon Foundation) commissioned a study into the feasibility of having one DTD that could be used to archive all electronic journals.

Research on WYSIWYG mathematical systems supporting mixed text and calculations with a document metaphor began to be published in 1987: Ron Avitzur's Milo, William Schelter's INFOR, Xerox PARC's Tioga and CaminoReal.