A recent story airing on NPR brought to mind how the advance of technology in modern life can be helped, or thwarted, by the degree to which customs, emotional traits, mannerisms and simple politeness are encoded in that technology.
In his list of 10 usability heuristics, Jakob Nielsen outlines how "system[s] should speak the users' language, with words, phrases and concepts familiar to the user..." Machines which "speak" and "behave" like people are more personable; we're hard-wired to relate effectively to other things which act the way we act.
It shouldn't be surprising then to find that programs, websites and products which emulate the mannerisms of people find greater acceptance than those which may offer greater functionality, but offer poor interaction. Or, more to the point: manners matter.
The story on NPR is of an elderly woman who has trouble remembering to take all her medications. The solution comes in the form of an automated pill dispensing machine which, once programmed (presumably not by her), provides gentle reminders of dosage times, dispenses all (and only) the necessary medications into a cup, and politely thanks her for taking her pills. (Brilliantly thought through, the machine also calls her son if she doesn't take her required pills within a certain period of time.)
Here is a situation, literally a matter of life and death (missing medications or "noncompliance," as it is somewhat ruthlessly referred to in the medical profession), turned from a difficult chore into a delightful experience. The woman even extols the machine's virtues by exclaiming, "isn't that just so polite - it says 'thank you' when I take my pills!"
Why should such a little thing - two syllables, a brief staccato of grunts - make such a difference? Pre-recorded and emanating from a small, low-fidelity speaker whenever a pre-defined series of events occurs, the "thank you" this machine utters isn't very personal, so why is it perceived as such? And why is it important that it be so?
Consider the facts biology presents: two million years of human evolution have produced an organ - the brain - which can recognize at considerable distance an old friend who is approaching to greet us with joy, or an enemy who is charging at us with murderous intent. Both may be approaching quickly, arms raised, teeth bared - yet we can easily and instantly distinguish between the two, partly by considering factors like gait, stature, the position of the person's hands, and even the direction of their gaze. The ability to observe, process and draw conclusions from such an assortment of behaviors also allows us to spot inconsistencies in those behaviors: that person is smiling, but his eyes are darting about the room - "something's wrong," we may think.
So too with machines. I remember the first time I started up a Macintosh - it was a Performa 636 - and before the monitor even came on, I heard a gentle, satisfying chord ring out. It set me immediately at ease and the next thing I saw - a smiley face on a small icon of a computer - instantly quelled any remaining anxiety. Compare this to the short, perfunctory "burp" my current Windows laptop makes when I wake it from its sleep. "How dare you," it seems to be saying, "I was HI-ber-na-ting." At the very outset of these respective events, the tenor of the experience to come is shaded by the literal tone of the machine.
Humanizing machines is no simple feat, because the trouble comes not in encoding the behaviors (it's easy to make a recording of a voice saying "thank you") but in choosing when to exhibit these behaviors. Real human behavior - except in cases of brain dysfunction - is highly contextual. Encoding context into a machine's behavior is difficult, because the inputs a computer or other device has at its disposal are limited. I can tell a search engine what I'm looking for, but I can't tell the search engine how urgent my request is. Even the most dimwitted person could probably tell from the expression on my face whether I need to know what time of day I should take my medication, or whether to call the emergency room. But lacking that input, even Google is oblivious to my real needs.
Some people, many of whom work for Google, Yahoo and Amazon, will tell you that understanding the context of a request is achievable through profiling, personalization, and collaborative filtering. And they're right, to some extent. But context is more than the words and phrases I use to describe a need; it's also the look on my face, the tone of my voice, the force with which I hit each key, and it is probably linked in some way to what I ate for breakfast.
When a person starts a computer, especially for the first time, it's important to that person to know that everything is OK. So the Mac says "I'm OK, I'm getting ready for you," as soon as it can.
When that elderly woman is supposed to take her medication, the machine says, "It’s time for your medication!" Its tone is chirpy, upbeat, and polite. After the woman takes her medication, the machine says "thank you!" Notice that it does not say, in a resonating, metallic voice: "Medication consumed. End transaction. Resetting timer. Awaiting next trigger event." That may be what it's thinking, but that's not what it says.
In each of these situations, a program is delivering a "humanizing" response. It is humanizing to the device, and to the person receiving the message, because it is the right behavior delivered at the right moment. Delightful user experiences are made, in part, by ensuring that these two factors, the behavior and its timing, are understood as important, and linked.