A teletype no more
Incredible effort has been put into interaction design recently. Unfortunately, all that effort has been targeted towards inexperienced users, pretty much ignoring those of us who use computers for hard work, who run the show behind the scenes. Sure, in our spare time we watch videos of Maru like everyone else, but our working day involves nitty-gritty computation. And the innovation in user interfaces for power users just isn’t there.
I’m talking about programmers and sysadmins, so ignored by designers that our primary interface emulates a DEC VT100 terminal from 1978. We are the forgotten few, left interacting with the world through skeuomorphic terminal emulators that mimic graphical applications by streaming reams of ANSI escape codes to a virtual device pretending to be a teletype. And we, the users, play along, pretending our machine is a video terminal presenting a grid of ASCII characters in all of 256 colours. This is ridiculous.
One can understand how this happened: terminals really are far more powerful for most of the work we do than some user-obsequious GUI. The problem is we have not developed and improved terminals in step with the advancements in hardware. TermKit attempted to address this, creating a graphical terminal based on WebKit, and although I appreciate the attempt to change things, to be honest it does not seem like an environment I could spend eight hours in without getting a headache. Widgets and gradients and icons are fetishes of the WIMP interface, and I get the feeling that they are there because “all GUIs have them.”
Here I will discuss my vision of the ‘next generation’ terminal, and what would make it a truly effective interface, a terminal which embraces the past 30 years of technological progress, but doesn’t try to hide, as though ashamed, that it is indeed a tool “for the few of us.”
First we need to work out why command-line interfaces are so useful to us in the first place. There are many reasons why power users prefer a shell over graphical interfaces, but in my opinion they boil down to four main points: commands are…
Composed. Programs can form a pipeline of text streams, Doug McIlroy’s ‘universal interface’. This means you can, with a simple interactive programming language, write commands which achieve complex tasks the authors of these tools did not need to consider.
Remote. One can use a terminal to log in to a server via SSH, assuming control of that machine as though it were one’s own. A command-line interface results in an almost transparent approach to system administration, rather than having to use some disjointed remote desktop.
Momentary. Most commands are transient in nature. Instead of having to wait for an application to load before you can do anything, you just type what you want to do and it happens, often instantly. There is absolutely no context switch involved, and less ‘state’ to keep in mind.
Automated. Once a command has been written, however complex, it can be run again and again by shell script or cron job. This frees up the user to deal with creative, non-repetitive tasks.
In addition, the keyboard is often more effective than the mouse for our work, since instead of floundering around in nested menus we can just type what we want. However, it’s worth noting that we don’t avoid the mouse because it is slow — if one wants to move the cursor to an arbitrary location elsewhere on the screen, one can often do so faster with a mouse than a keyboard. The problem is the transition from the keyboard to the mouse. It’s an expensive context switch, which should not be done lightly.
The state of the art
The Bourne shell and its descendants are clearly more than the sum of their parts: their features are relatively limited, but having a large set of utilities to hand makes them a very pragmatic option. However, other command-line interfaces have appeared in settings other than Unix shells, often bringing with them some fresh ideas, such as…
Visualisation. Theodore Gray’s interface for Mathematica, the ‘Notebook’, presents interactive visualisations of users’ equations, more in the style of Edward Tufte than traditional GUIs, which seems to me far more useful than merely widgeting up the interface à la TermKit.
Hypermedia. The Common Lisp Interface Manager (CLIM) provided a command-line interface with rich text formatting and hyperlinks, so manual pages would properly cross-reference one another, for instance. Search engines like Google and DuckDuckGo are hypermedia command-lines of a sort. I also believe that typography is the future, especially for a distraction-free terminal.
Data structure. Windows PowerShell had a chance to redesign the terminal from scratch, but defaulted to the same old grid of ASCII. One innovative thing they did do was add structure to their data, piping .NET objects instead of raw text, allowing the user to select fields by name instead of writing elaborate AWK scripts. The shell for the research OS Famke does a similar thing for higher-order functions.
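To make the contrast concrete, here is a minimal sketch in Python rather than PowerShell or AWK. The listing, the records, and the helper names `where` and `select` are all invented for illustration; they are not PowerShell's API, only an approximation of selecting fields by name.

```python
from typing import Any, Callable, Dict, List

Record = Dict[str, Any]

# Text pipeline: the consumer has to know that the size is the fifth column.
# (A made-up, simplified listing, not real `ls` output.)
LISTING = """\
-rw-r--r-- 1 alice staff 1204 notes.txt
-rw-r--r-- 1 alice staff 88213 report.pdf"""

sizes_from_text = [int(line.split()[4]) for line in LISTING.splitlines()]

# Structured pipeline: each stage receives records and names the fields it wants.
RECORDS: List[Record] = [
    {"name": "notes.txt", "owner": "alice", "size": 1204},
    {"name": "report.pdf", "owner": "alice", "size": 88213},
]


def where(records: List[Record], predicate: Callable[[Record], bool]) -> List[Record]:
    """Keep only the records for which the predicate holds."""
    return [r for r in records if predicate(r)]


def select(records: List[Record], *fields: str) -> List[Record]:
    """Project each record onto the named fields."""
    return [{f: r[f] for f in fields} for r in records]


large = select(where(RECORDS, lambda r: r["size"] > 2000), "name", "size")
print(sizes_from_text)
print(large)
```

The text version breaks as soon as a column is added or a filename contains a space; the record version only breaks if a field is renamed.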
These ideas are all variations on a single theme: adding some kind of metadata to the traditional command-line, be it presentational (in the case of visualisations), structural (data types), or a compromise between the two (hypermedia). Shells can therefore be placed on a ‘metadata spectrum’ from raw binary data to serialised typed objects, with annotated binary data somewhere in the middle.
In my experience strongly-typed command-lines require as much work, if not more, to cast the output from one command into input for another. It may be possible to build a weakly-typed system which implicitly coerces types for you, but doing so in a generic way (such as ad-hoc polymorphism) would require vast amounts of groundwork, and having such complicated metadata for each type would probably lead to the same morass as other component models (cf. Bonobo, COM, etc.).
I believe the solution lies closer to the middle of the spectrum. If we were to define an efficient markup which allows us to annotate data with arbitrary metadata, certain tools could make use of this without having to worry about the type or structure of the underlying data. Grep, for instance, would ignore annotations when matching, and insert match annotations itself. The terminal could then, reading the output, highlight or underline matches: content and presentation are separate.[1]
The problem with markup languages like XML is that since they rely on tag characters (e.g., >), all other instances of those characters must be escaped (&gt;). This is very inefficient for command-line utilities, since it would mean having to look at every byte of each read or write, whereas tools like cat currently deal with entire blocks of data. Since our markup does not need to be written in ASCII (it is being produced by Unicode-aware programs), we can look further afield: data streams would be raw binary with annotations using the ‘tag’ characters in the U+E00xx Unicode block, which were deprecated for their original purpose in Unicode 5.1. Only a subset of utilities would have to understand that these annotations exist, and would be able to ignore unknown annotations.
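One way this could look is sketched below in Python. The encoding is my own assumption, not a defined format: a label is written by shifting each printable ASCII character into the tag range (U+E0020 to U+E007E) and is terminated with U+E007F, and the label names `match` and `/match` are invented. The toy `grep` matches against the visible text only, keeps whatever annotations were already there, and adds its own around the first match.

```python
import re
from typing import List, Optional

TAG_BASE = 0xE0000         # U+E0020..U+E007E mirror printable ASCII
CANCEL_TAG = "\U000E007F"  # assumption: used here to terminate one annotation
TAG_RUN = re.compile("[\U000E0020-\U000E007F]+")


def is_tag(ch: str) -> bool:
    return 0xE0020 <= ord(ch) <= 0xE007F


def annotate(label: str) -> str:
    """Encode a printable-ASCII label as a run of tag characters."""
    if not all(0x20 <= ord(c) <= 0x7E for c in label):
        raise ValueError("label must be printable ASCII")
    return "".join(chr(TAG_BASE + ord(c)) for c in label) + CANCEL_TAG


def strip_annotations(text: str) -> str:
    """What an annotation-unaware reader should see."""
    return TAG_RUN.sub("", text)


def read_annotations(text: str) -> List[str]:
    """Decode every annotation label, in order of appearance."""
    labels = []
    for run in TAG_RUN.findall(text):
        for part in run.split(CANCEL_TAG):
            if part:
                labels.append("".join(chr(ord(c) - TAG_BASE) for c in part))
    return labels


def grep(pattern: str, line: str) -> Optional[str]:
    """Match on visible text only; wrap the first match in match annotations."""
    visible = [i for i, c in enumerate(line) if not is_tag(c)]
    plain = "".join(line[i] for i in visible)
    m = re.search(pattern, plain)
    if m is None:
        return None
    if m.start() == m.end():  # empty match: nothing to mark
        return line
    start, end = visible[m.start()], visible[m.end() - 1] + 1
    return (line[:start] + annotate("match") + line[start:end]
            + annotate("/match") + line[end:])


line = "error: " + annotate("file=log.txt") + "disk full"
marked = grep("disk", line)
print(read_annotations(marked))
```

A terminal would read the `match` labels and decide how to present them, while a tool that knows nothing about annotations could pass the bytes through untouched.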
Ultimately, a terminal is a special case of a text editor: you edit text, just like normal, except that when you finish a command it is immediately executed by the shell. As such, the interface for your terminal needs to be the same as that of your editor, else you’re left editing text in a poor surrogate editor where everything acts differently and none of your normal features, settings, or macros are available.[2] The same is true for anywhere you edit text: anything without the capabilities of your text editor is inadequate and inconsistent.
I believe that the ‘next generation’ terminal must be built with the Model–View–Controller approach pioneered by Smalltalk. The terminal would display the contents of the shell, and the user at the terminal could edit the buffer using their standard text editing tools, because the terminal and editor are one and the same. Meanwhile, the shell would read the user’s input and once a command had been completed would spawn the appropriate program, which would perform operations on a separate filesystem model.
The terminal is therefore both a view and a controller, the shell is a model, and the commands it spawns are controllers. File editing would be done the same way but with filesystem-backed models instead of a shell. The components would communicate over a shared protocol, which could be tunnelled through SSL over the Internet, so we gain remote editing (and hence remote shells) for free. Indeed, a user could use a terminal on a thin client to use a shell on a CPU server to modify files on a separate file server. And since the components are modular anyone could use the front-end most suited to them.
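The division of labour described above might be sketched like this in Python. Every class and function name here is hypothetical, commands are plain functions rather than spawned processes, and the shared protocol is reduced to direct method calls, so this shows only the shape of the design.

```python
from typing import Callable, Dict, List


class FileSystemModel:
    """Stand-in for the separate filesystem model: path -> contents."""

    def __init__(self) -> None:
        self.files: Dict[str, str] = {}


# A command is a controller: it acts on the filesystem model and returns output.
Command = Callable[[FileSystemModel, List[str]], str]


def write(fs: FileSystemModel, args: List[str]) -> str:
    fs.files[args[0]] = " ".join(args[1:])
    return ""


def cat(fs: FileSystemModel, args: List[str]) -> str:
    return "\n".join(fs.files.get(p, f"cat: {p}: no such file") for p in args)


class ShellModel:
    """The model: owns the buffer and notifies whoever is watching it."""

    def __init__(self, fs: FileSystemModel, commands: Dict[str, Command]) -> None:
        self.fs = fs
        self.commands = commands
        self.buffer: List[str] = []
        self.observers: List[Callable[[List[str]], None]] = []

    def submit(self, line: str) -> None:
        words = line.split()
        if not words:
            return
        self.buffer.append("$ " + line)
        name, args = words[0], words[1:]
        command = self.commands.get(name)
        output = command(self.fs, args) if command else f"{name}: command not found"
        self.buffer.extend(output.splitlines())
        for notify in self.observers:
            notify(self.buffer)


class Terminal:
    """View and controller: renders the shell's buffer and forwards input."""

    def __init__(self, shell: ShellModel) -> None:
        self.shell = shell
        self.screen: List[str] = []
        shell.observers.append(self.render)

    def render(self, buffer: List[str]) -> None:
        self.screen = list(buffer)

    def enter(self, line: str) -> None:
        self.shell.submit(line)


fs = FileSystemModel()
shell = ShellModel(fs, {"write": write, "cat": cat})
first, second = Terminal(shell), Terminal(shell)
first.enter("write notes.txt hello world")
second.enter("cat notes.txt")
print("\n".join(first.screen))
```

Because both terminals observe the same shell model, either front-end can drive it and both stay in step, which is the property that would let anyone use the front-end most suited to them.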
As soon as we bridge the gap between terminals and editors everything just falls into place. We take the benefits of the conventional Unix command-line and build them into an editor, which gives us much more powerful line editing and rids us of SSH lag, a relic of ANSI escape codes. Opening a man page would scroll gently to the top of the page, letting you scroll down and read, or search through it as you would any text; but you can choose to type a new command at any time and the terminal would just scroll to your prompt, or perhaps you fork the shell into a separate terminal with the page open to the side for reference.
We then add syntax highlighting and hyperlinks, so you can easily navigate between man pages, or click on a grep result to visit that line in a file. Clicking on a hyperlinked directory in a file listing would reveal the contents of that directory in a nested list, slightly indented; clicking on a file would open it in a new tab. This is one of the situations in which our metadata introduces subtle but powerful gains: one could have a whole chain of filters, sorting and sedding, and clicking on the result will still take you to the correct location.
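As a rough illustration of why the metadata matters here, the Python sketch below carries an origin (file and line number) alongside each line through a chain of filters. The `Line` type and the `grep`, `sort_lines`, `sed`, and `click` helpers are hypothetical stand-ins, and the origin is held in a record rather than in-band annotations to keep the example short.

```python
import re
from dataclasses import dataclass, replace
from typing import List


@dataclass(frozen=True)
class Line:
    text: str    # what the user sees
    path: str    # metadata: where the text came from
    lineno: int


def grep(pattern: str, path: str, contents: str) -> List[Line]:
    """Emit matching lines, each tagged with its origin."""
    return [Line(text, path, n)
            for n, text in enumerate(contents.splitlines(), start=1)
            if re.search(pattern, text)]


def sort_lines(lines: List[Line]) -> List[Line]:
    """Reorder by visible text; the origin travels with each line."""
    return sorted(lines, key=lambda line: line.text)


def sed(pattern: str, repl: str, lines: List[Line]) -> List[Line]:
    """Rewrite the visible text and leave the origin untouched."""
    return [replace(line, text=re.sub(pattern, repl, line.text)) for line in lines]


def click(line: Line) -> str:
    """Where the terminal would jump to when the line is clicked."""
    return f"{line.path}:{line.lineno}"


contents = "zeta TODO\nalpha\nbeta TODO\n"
result = sed("TODO", "DONE", sort_lines(grep("TODO", "notes.txt", contents)))
for line in result:
    print(line.text, "->", click(line))
```

After sorting and substitution the visible text no longer matches anything in the file, yet each result still resolves to the line it came from.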
Finally we add visualisations so you can view plots of lines of code, etc., without having to context-switch. But these do not appear unduly (when formatting or a hyperlink would do), and instead of being widgets they would be simple representations of information, so they do not distract you from your work. We could for example present to the user their git repository as a minimalistic graph of commits.
This also makes the shell extremely well-suited to the blind, since a terminal could interpret these annotations when translating into speech synthesis.
I’ve always found those find–replace dialogs ridiculous. We’re using a fully-featured text editor, and then we enter this tiny one-line entry box where we have to escape newlines (\n) and tabs (\t), with none of our standard text editing features available. How does that make sense? We’re in a text editor! Give (incremental) search mode a proper text buffer of its own; and use multiple concurrent selections to make the substitution, à la Sublime Text.
Thanks to Robert Ransom for reading drafts and giving suggestions.