The Linguistic Interface

I’ve recently been thinking more about command line interfaces, and what it is that truly sets them apart from WIMP interfaces — beyond sh, what is a Shell as a Platonic ideal?

The loosest description would be that a shell is a linguistic interface: while WIMP is largely about literal interaction with objects (e.g., picking up a pencil and drawing a cat), shells have a sophisticated syntactic structure that lets them describe complicated concepts, even those with no physical representation. As concepts become more complicated, WIMP scales poorly. A button labelled “Open…” is followed by tedious rummaging through pictures of files, where a linguistic command could simply say “open all files containing the word ‘kittens’.”

Concepts like this cannot be expressed without language; WIMP interfaces must restrict the user to a handful of possibilities, or else attempt to mimic language with an arrangement of pictographs, with all the complexities of language but none of the familiarity of one’s native tongue. While recent software like Siri and Android’s voice assistant has been relatively successful, natural language parsing remains very much unsolved. The trouble is the sheer ambiguity of speech, which we handle very well but (existing) computers flounder at. So, as a compromise, we create artificial languages with unambiguous syntax, such as the Bourne shell: open `grep -Rl kittens .`. It’s not pretty, but we can’t do much better right now. Lojban might work too, but no-one speaks it.

(As an aside, while Lojban is syntactically unambiguous, it is not semantically unambiguous. If you were to say sarcastically, “oh sure, and just delete all my files while you’re at it,” you’d better hope it’s smart enough not to. There is no sarcastic rm -fr /.)
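One wrinkle in the Bourne shell example: handing grep’s output to open through an unquoted command substitution splits it on whitespace, so a file called “my kittens.txt” would reach open as two separate arguments. Here is a minimal sketch of a sturdier phrasing of the same sentence. The function name open_matching is hypothetical. It assumes a grep that accepts -w and --null (GNU and BSD grep both do) and an xargs that accepts -0. The opener is passed in as a command, e.g. open on macOS.

```sh
# Hypothetical helper: "open all files containing the word WORD under DIR".
# Usage: open_matching WORD DIR COMMAND [ARGS...]
open_matching() {
    word=$1
    dir=$2
    shift 2
    # -R: recurse into DIR; -l: print only the names of matching files;
    # -w: match WORD as a whole word; --null: end each name with a NUL byte
    # instead of a newline, so names containing spaces survive intact.
    # xargs -0 splits its input on NUL bytes and runs COMMAND on the names.
    grep -Rlw --null -- "$word" "$dir" | xargs -0 "$@"
}

# On macOS, for example:
#   open_matching kittens . open
```

Note that GNU xargs still runs the command once, with no file names, if nothing matches. Its -r flag suppresses that.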

The nature of a shell’s input and output depends on what it can do. For example, a shell that only needs to set reminders may only need a microphone and a speaker, all IO being vocal. On the other hand, a shell may need to present the user with images and so on, in which case it would need (in terms of current technology) an LCD screen. But I’d like to explore operating system shells more specifically, so let’s assume our shell is used for general desktop and server computing.

There is of course nothing limiting the shell to a raw teletype-style command line. Computer algebra systems like Mathematica incorporate diagrams and typeset equations into their ‘notebooks’, which play the same role as Unix terminals. What sets shells apart is WIMP’s application-centricity: when dealing with an application, one must first construct an environment in which the desired set of actions is available. In order to view an image we must first get out the image viewer; in order to spell check we must first get out the word processor. These are all self-contained environments whose features are not available to any other program in the system. In order to do anything you must first ready the thing-doer. These thing-doers are isolated from one another, so you cannot possibly combine their functionality. We live in a Kingdom of Nouns.

Shells don’t construct these isolated application environments; they embrace verbs (as well as the other parts of speech). You can say “compile this; spell check that.” This is closer to how we think: we work with tasks and tools, not application environments. Admittedly, learning a language is harder (“less intuitive”) than gesturing at pictographs, but the added complexity buys vastly greater expressiveness. Learners may well struggle at first, but simpler languages and a little help on the side could mitigate this. (One can easily imagine a Unix terminal sidebar that acts as apropos and man as you type.)

There does come a time when all you want to do is pick up a pencil and draw a cat. But we must remember that we aren’t using an application in which one draws cats; we’re simply acknowledging that paper is a thing we can draw on. There is still no application harness set up to isolate us from the rest of the world, and the pencil is not inextricably bound to the paper. The terminal (a record of the conversation we’ve been having with the shell) happens to be one thing to look at, but even as we scribble over the page we can still talk to the shell, and it can do things to the drawing just as it can to anything else. “Now add to this all the pictures I drew of kittens. All of them.”