“I’LL be back.” “Hasta la vista, baby.” Arnold Schwarzenegger’s Teutonic drone in the “Terminator” films is world-famous. But in this instance film-makers looking into the future were overly pessimistic. Some applications do still feature a monotonous “robot voice”, but that is changing fast.
Heteronyms require special care. How should a computer pronounce a word like “lead”, which can be a present-tense verb or a noun for a heavy metal, pronounced quite differently? Once again a language model can make accurate guesses: “Lead us not into temptation” can be parsed for its syntax, and once the software has worked out that the first word is almost certainly a verb, it can cause it to be pronounced to rhyme with “reed”, not “red”.
Traditionally, text-to-speech models have been “concatenative”, consisting of very short segments recorded by a human and then strung together as in the acoustic model described above. More recently, “parametric” models have been generating raw audio without the need to record a human voice, which makes
Machine translation: Beyond Babel
Computer translations have got strikingly better, but still need human input
IN “STAR TREK” it was a hand-held Universal Translator; in “The Hitchhiker’s Guide to the Galaxy” it was the Babel Fish popped conveniently into the ear. In science fiction, the meeting of distant civilisations generally requires some kind of device to allow them to talk. High-quality automated translation seems even more magical than other kinds of language technology because many humans struggle to speak more than one language, let alone translate from one to another.
The idea has been around since the 1950s, and computerised translation is still known by the quaint moniker “machine translation” (MT). It goes back to the early days of the cold war, when American scientists were trying to get computers to translate from Russian. They were inspired by the code-breaking successes of the second world war, which had led to the development of computers in the first place. To them, a scramble of Cyrillic letters on a page of Russian text was just a coded version of English, and turning it into English was just a question of breaking the code.
Scientists at IBM and Georgetown University were among those who thought that the problem would be cracked quickly. Having programmed just six rules and a vocabulary of 250 words into a computer, they gave a demonstration in New York on January 7th 1954 and proudly produced 60 automated translations, including that of “Mi pyeryedayem mislyi posryedstvom ryechyi,” which came out correctly as “We transmit thoughts by means of speech.” Leon Dostert of Georgetown, the lead scientist, breezily predicted that fully realised MT would be “an accomplished fact” in three to five years.
Instead, after more than a decade of work, the report in 1966 by a committee chaired by John Pierce, mentioned in the introduction to this report, recorded bitter disappointment with the results and urged researchers to focus on narrow, achievable goals such as automated dictionaries. Government-sponsored work on MT went into near-hibernation for two decades. What little was done was carried out by private companies. The most notable of them was Systran, which provided rough translations, mostly to America’s armed forces.
La plume de mon ordinateur
The scientists got bogged down by their rules-based approach. Having done relatively well with their six-rule system, they came to believe that if they programmed in more rules, the system would become more sophisticated and subtle. Instead, it became more likely to produce nonsense. Adding extra rules, in the modern parlance of software developers, did not “scale”.
Besides the difficulty of programming grammar’s many rules and exceptions, some early observers noted a conceptual problem. The meaning of a word often depends not just on its dictionary definition and the grammatical context but the meaning of the rest of the sentence. Yehoshua Bar-Hillel, an Israeli MT pioneer, realised that “the pen is in the box” and “the box is in the pen” would require different translations for “pen”: any pen big enough to hold a box would have to be an animal enclosure, not a writing instrument.
How could machines be taught enough rules to make this kind of distinction? They would have to be provided with some knowledge of the real world, a task far beyond the machines or their programmers at the time. Two decades later, IBM stumbled on an approach that would revive optimism about MT. Its Candide system was the first serious attempt to use statistical probabilities rather than rules devised by humans for translation. Statistical, “phrase-based” machine translation, like speech recognition, needed training data to learn from. Candide used Canada’s Hansard, which publishes that country’s parliamentary debates in French and English, providing a huge amount of data for that time. The phrase-based approach would ensure that the translation of a word would take the surrounding words properly into account.
But quality did not take a leap until Google, which had set itself the goal of indexing the entire internet, decided to use those data to train its translation engines; in 2007 it switched from a rules-based engine (provided by Systran) to its own statistics-based system. To build it, Google trawled about a trillion web pages, looking for any text that seemed to be a translation of another—for example, pages designed identically but with different words, and perhaps a hint such as the address of one page ending in /en and the other ending in /fr. According to Macduff Hughes, chief engineer on Google Translate, a simple approach using vast amounts of data seemed more promising than a clever one with fewer data.
Training on parallel texts (which linguists call corpora, the plural of corpus) creates a “translation model” that generates not one but a series of possible translations in the target language. The next step is running these possibilities through a monolingual language model in the target language. This is, in effect, a set of expectations about what a well-formed and typical sentence in the target language is likely to be. Single-language models are not too hard to build. (Parallel human-translated corpora are hard to come by; large amounts of monolingual training data are not.) As with the translation model, the language model uses a brute-force statistical approach to learn from the training data, then ranks the outputs from the translation model in order of plausibility.
Statistical machine translation rekindled optimism in the field. Internet users quickly discovered that Google Translate was far better than the rules-based online engines they had used before, such as BabelFish. Such systems still make mistakes—sometimes minor, sometimes hilarious, sometimes so serious or so many as to make nonsense of the result. And language pairs like Chinese-English, which are unrelated and structurally quite different, make accurate translation harder than pairs of related languages like English and German. But more often than not, Google Translate and its free online competitors, such as Microsoft’s Bing Translator, offer a usable approximation.
Such systems are set to get better, again with the help of deep learning from digital neural networks. The Association for Computational Linguistics has been holding workshops on MT every summer since 2006. One of the events is a competition between MT engines turned loose on a collection of news text. In August 2016, in Berlin, neural-net-based MT systems were the top performers (out of 102), a first.
Now Google has released its own neural-net-based engine for eight language pairs, closing much of the quality gap between its old system and a human translator. This is especially true for closely related languages (like the big European ones) with lots of available training data. The results are still distinctly imperfect, but far smoother and more accurate than before. Translations between English and (say) Chinese and Korean are not as good yet, but the neural system has brought a clear improvement here too.
The Coca-Cola factor
Neural-network-based translation actually uses two networks. One is an encoder. Each word of an input sentence is converted into a multidimensional vector (a series of numerical values), and the encoding of each new word takes into account what has happened earlier in the sentence. Marcello Federico of Italy’s Fondazione Bruno Kessler, a private research organisation, uses an intriguing analogy to compare neural-net translation with the phrase-based kind. The latter, he says, is like describing Coca-Cola in terms of sugar, water, caffeine and other ingredients. By contrast, the former encodes features such as liquidness, darkness, sweetness and fizziness.
Once the source sentence is encoded, a decoder network generates a word-for-word translation, once again taking account of the immediately preceding word. This can cause problems when the meaning of words such as pronouns depends on words mentioned much earlier in a long sentence. This problem is mitigated by an “attention model”, which helps maintain focus on other words in the sentence outside the immediate context.
Neural-network translation requires heavy-duty computing power, both for the original training of the system and in use. The heart of such a system can be the GPUs that made the deep-learning revolution possible, or specialised hardware like Google’s Tensor Processing Units (TPUs). Smaller translation companies and researchers usually rent this kind of processing power in the cloud. But the data sets used in neural-network training do not need to be as extensive as those for phrase-based systems, which should give smaller outfits a chance to compete with giants like Google.
Fully automated, high-quality machine translation is still a long way off. For now, several problems remain. All current machine translations proceed sentence by sentence. If the translation of such a sentence depends on the meaning of earlier ones, automated systems will make mistakes. Long sentences, despite tricks like the attention model, can be hard to translate. And neural-net-based systems in particular struggle with rare words.
Training data, too, are scarce for many language pairs. They are plentiful between European languages, since the European Union’s institutions churn out vast amounts of material translated by humans between the EU’s 24 official languages. But for smaller languages such resources are thin on the ground. For example, there are few Greek-Urdu parallel texts available on which to train a translation engine. So a system that claims to offer such translation is in fact usually running it through a bridging language, nearly always English. That involves two translations rather than one, multiplying the chance of errors.
Even if machine translation is not yet perfect, technology can already help humans translate much more quickly and accurately. “Translation memories”, software that stores already translated words and segments, first came into use as early as the 1980s. For someone who frequently translates the same kind of material (such as instruction manuals), they serve up the bits that have already been translated, saving lots of duplication and time.
A similar trick is to train MT engines on text dealing with a narrow real-world domain, such as medicine or the law. As software techniques are refined and computers get faster, training becomes easier and quicker. Free software such as Moses, developed with the support of the EU and used by some of its in-house translators, can be trained by anyone with parallel corpora to hand. A specialist in medical translation, for instance, can train the system on medical translations only, which makes them far more accurate.
At the other end of linguistic sophistication, an MT engine can be optimised for the shorter and simpler language people use in speech to spew out rough but near-instantaneous speech-to-speech translations. This is what Microsoft’s Skype Translator does. Its quality is improved by being trained on speech (things like film subtitles and common spoken phrases) rather than the kind of parallel text produced by the European Parliament.
Translation management has also benefited from innovation, with clever software allowing companies quickly to combine the best of MT, translation memory, customisation by the individual translator and so on. Translation-management software aims to cut out the agencies that have been acting as middlemen between clients and an army of freelance translators. Jack Welde, the founder of Smartling, an industry favourite, says that in future translation customers will choose how much human intervention is needed for a translation. A quick automated one will do for low-stakes content with a short life, but the most important content will still require a fully hand-crafted and edited version. Noting that MT has both determined boosters and committed detractors, Mr Welde says he is neither: “If you take a dogmatic stance, you’re not optimised for the needs of the customer.”
Translation software will go on getting better. Not only will engineers keep tweaking their statistical models and neural networks, but users themselves will make improvements to their own systems. For example, a small but much-admired startup, Lilt, uses phrase-based MT as the basis for a translation, but an easy-to-use interface allows the translator to correct and improve the MT system’s output. Every time this is done, the corrections are fed back into the translation engine, which learns and improves in real time. Users can build several different memories—a medical one, a financial one and so on—which will help with future translations in that specialist field.
TAUS, an industry group, recently issued a report on the state of the translation industry saying that “in the past few years the translation industry has burst with new tools, platforms and solutions.” Last year Jaap van der Meer, TAUS’s founder and director, wrote a provocative blogpost entitled “The Future Does Not Need Translators”, arguing that the quality of MT will keep improving, and that for many applications less-than-perfect translation will be good enough.
The “translator” of the future is likely to be more like a quality-control expert, deciding which texts need the most attention to detail and editing the output of MT software. That may be necessary because computers, no matter how sophisticated they have become, cannot yet truly grasp what a text means.
Meaning and machine intelligence: What are you talking about?
Machines cannot conduct proper conversations with humans because they do not understand the world
IN “BLACK MIRROR”, a British science-fiction satire series set in a dystopian near future, a young woman loses her boyfriend in a car accident. A friend offers to help her deal with her grief. The dead man was a keen social-media user, and his archived accounts can be used to recreate his personality. Before long she is messaging with a facsimile, then speaking to one. As the system learns to mimic him ever better, he becomes increasingly real.
This is not quite as bizarre as it sounds. Computers today can already produce an eerie echo of human language if fed with the appropriate material. What they cannot yet do is have true conversations. Truly robust interaction between man and machine would require a broad understanding of the world. In the absence of that, computers are not able to talk about a wide range of topics, follow long conversations or handle surprises.
Machines trained to do a narrow range of tasks, though, can perform surprisingly well. The most obvious examples are the digital assistants created by the technology giants. Users can ask them questions in a variety of natural ways: “What’s the temperature in London?” “How’s the weather outside?” “Is it going to be cold today?” The assistants know a few things about users, such as where they live and who their family are, so they can be personal, too: “How’s my commute looking?” “Text my wife I’ll be home in 15 minutes.”
And they get better with time. Apple’s Siri receives 2bn requests per week, which (after being anonymised) are used for further teaching. For example, Apple says Siri knows every possible way that users ask about a sports score. She also has a delightful answer for children who ask about Father Christmas. Microsoft learned from some of its previous natural-language platforms that about 10% of human interactions were “chitchat”, from “tell me a joke” to “who’s your daddy?”, and used such chat to teach its digital assistant, Cortana.
The writing team for Cortana includes two playwrights, a poet, a screenwriter and a novelist. Google hired writers from Pixar, an animated-film studio, and The Onion, a satirical newspaper, to make its new Google Assistant funnier. No wonder people often thank their digital helpers for a job well done. The assistants’ replies range from “My pleasure, as always” to “You don’t need to thank me.”
Good at grammar
How do natural-language platforms know what people want? They not only recognise the words a person uses, but break down speech for both grammar and meaning. Grammar parsing is relatively advanced; it is the domain of the well-established field of “natural-language processing”. But meaning comes under the heading of “natural-language understanding”, which is far harder.
First, parsing. Most people are not very good at analysing the syntax of sentences, but computers have become quite adept at it, even though most sentences are ambiguous in ways humans are rarely aware of. Take a sign on a public fountain that says, “This is not drinking water.” Humans understand it to mean that the water (“this”) is not a certain kind of water (“drinking water”). But a computer might just as easily parse it to say that “this” (the fountain) is not at present doing something (“drinking water”).
As sentences get longer, the number of grammatically possible but nonsensical options multiplies exponentially. How can a machine parser know which is the right one? It helps for it to know that some combinations of words are more common than others: the phrase “drinking water” is widely used, so parsers trained on large volumes of English will rate those two words as likely to be joined in a noun phrase. And some structures are more common than others: “noun verb noun noun” may be much more common than “noun noun verb noun”. A machine parser can compute the overall probability of all combinations and pick the likeliest.
A “lexicalised” parser might do even better. Take the Groucho Marx joke, “One morning I shot an elephant in my pyjamas. How he got in my pyjamas, I’ll never know.” The first sentence is ambiguous (which makes the joke)—grammatically both “I” and “an elephant” can attach to the prepositional phrase “in my pyjamas”. But a lexicalised parser would recognise that “I [verb phrase] in my pyjamas” is far more common than “elephant in my pyjamas”, and so assign that parse a higher probability.
But meaning is harder to pin down than syntax. “The boy kicked the ball” and “The ball was kicked by the boy” have the same meaning but a different structure. “Time flies like an arrow” can mean either that time flies in the way that an arrow flies, or that insects called “time flies” are fond of an arrow.
“Who plays Thor in ‘Thor’?” Your correspondent could not remember the beefy Australian who played the eponymous Norse god in the Marvel superhero film. But when he asked his iPhone, Siri came up with an unexpected reply: “I don’t see any movies matching ‘Thor’ playing in Thor, IA, US, today.” Thor, Iowa, with a population of 184, was thousands of miles away, and “Thor”, the film, has been out of cinemas for years. Siri parsed the question perfectly properly, but the reply was absurd, violating the rules of what linguists call pragmatics: the shared knowledge and understanding that people use to make sense of the often messy human language they hear. “Can you reach the salt?” is not a request for information but for salt. Natural-language systems have to be manually programmed to handle such requests as humans expect them, and not literally.
Shared information is also built up over the course of a conversation, which is why digital assistants can struggle with twists and turns in conversations. Tell an assistant, “I’d like to go to an Italian restaurant with my wife,” and it might suggest a restaurant. But then ask, “is it close to her office?”, and the assistant must grasp the meanings of “it” (the restaurant) and “her” (the wife), which it will find surprisingly tricky. Nuance, the language-technology firm, which provides natural-language platforms to many other companies, is working on a “concierge” that can handle this type of challenge, but it is still a prototype.
Such a concierge must also offer only restaurants that are open. Linking requests to common sense (knowing that no one wants to be sent to a closed restaurant), as well as a knowledge of the real world (knowing which restaurants are closed), is one of the most difficult challenges for language technologies.
Common sense, an old observation goes, is uncommon enough in humans. Programming it into computers is harder still. Fernando Pereira of Google points out why. Automated speech recognition and machine translation have something in common: there are huge stores of data (recordings and transcripts for speech recognition, parallel corpora for translation) that can be used to train machines. But there are no training data for common sense.
Brain scan: Terry Winograd
The Winograd Schema tests computers’ “understanding” of the real world
THE Turing Test was conceived as a way to judge whether true artificial intelligence has been achieved. If a computer can fool humans into thinking it is human, there is no reason, say its fans, to say the machine is not truly intelligent.
Few giants in computing stand with Turing in fame, but one has given his name to a similar challenge: Terry Winograd, a computer scientist at Stanford. In his doctoral dissertation Mr Winograd posed a riddle for computers: “The city councilmen refused the demonstrators a permit because they feared violence. Who feared violence?”
It is a perfect illustration of a well-recognised point: many things that are easy for humans are crushingly difficult for computers. Mr Winograd went into AI research in the 1960s and 1970s and developed an early natural-language program called SHRDLU that could take commands and answer questions about a group of shapes it could manipulate: “Find a block which is taller than the one you are holding and put it into the box.” This work brought a jolt of optimism to the AI crowd, but Mr Winograd later fell out with them, devoting himself not to making machines intelligent but to making them better at helping human beings. (These camps are sharply divided by philosophy and academic pride.) He taught Larry Page at Stanford, and after Mr Page went on to co-found Google, Mr Winograd became a guest researcher at the company, helping to build Gmail.
In 2011 Hector Levesque of the University of Toronto became annoyed by systems that “passed” the Turing Test by joking and avoiding direct answers. He later asked to borrow Mr Winograd’s name and the format of his dissertation’s puzzle to pose a more genuine test of machine “understanding”: the Winograd Schema. The answers to its battery of questions were obvious to humans but would require computers to have some reasoning ability and some knowledge of the real world. The first official Winograd Schema Challenge was held this year, with a $25,000 prize offered by Nuance, the language-software company, for a program that could answer more than 90% of the questions correctly. The best of them got just 58% right.
Though officially retired, Mr Winograd continues writing and researching. One of his students is working on an application for Google Glass, a computer with a display mounted on eyeglasses. The app would help people with autism by reading the facial expressions of conversation partners and giving the wearer information about their emotional state. It would allow him to integrate linguistic and non-linguistic information in a way that people with autism find difficult, as do computers.
Asked to trick some of the latest digital assistants, like Siri and Alexa, he asks them things like “Where can I find a nightclub my Methodist uncle would like?”, which requires knowledge about both nightclubs (which such systems have) and Methodist uncles (which they don’t). When he tried “Where did I leave my glasses?”, one of them came up with a link to a book of that name. None offered the obvious answer: “How would I know?”
Knowledge of the real world is another matter. AI has helped data-rich companies such as America’s West-Coast tech giants organise much of the world’s information into interactive databases such as Google’s Knowledge Graph. Some of the content of that appears in a box to the right of a Google page of search results for a famous figure or thing. It knows that Jacob Bernoulli studied at the University of Basel (as did other people, linked to Bernoulli through this node in the Graph) and wrote “On the Law of Large Numbers” (which it knows is a book).
Organising information this way is not difficult for a company with lots of data and good AI capabilities, but linking information to language is hard. Google touts its assistant’s ability to answer questions like “Who was president when the Rangers won the World Series?” But Mr Pereira concedes that this was the result of explicit training. Another such complex query—“What was the population of London when Samuel Johnson wrote his dictionary?”—would flummox the assistant, even though the Graph knows about things like the historical population of London and the date of Johnson’s dictionary. IBM’s Watson system, which in 2011 beat two human champions at the quiz show “Jeopardy!”, succeeded mainly by calculating huge numbers of potential answers based on key words by probability, not by a human-like understanding of the question.
Making real-world information computable is challenging, but it has inspired some creative approaches. Cortical.io, a Vienna-based startup, took hundreds of Wikipedia articles, cut them into thousands of small snippets of information and ran an “unsupervised” machine-learning algorithm over it that required the computer not to look for anything in particular but to find patterns. These patterns were then represented as a visual “semantic fingerprint” on a grid of 128×128 pixels. Clumps of pixels in similar places represented semantic similarity. This method can be used to disambiguate words with multiple meanings: the fingerprint of “organ” shares features with both “liver” and “piano” (because the word occurs with both in different parts of the training data). This might allow a natural-language system to distinguish between pianos and church organs on one hand, and livers and other internal organs on the other.
Proper conversation between humans and machines can be seen as a series of linked challenges: speech recognition, speech synthesis, syntactic analysis, semantic analysis, pragmatic understanding, dialogue, common sense and real-world knowledge. Because all the technologies have to work together, the chain as a whole is only as strong as its weakest link, and the first few of these are far better developed than the last few.
The hardest part is linking them together. Scientists do not know how the human brain draws on so many different kinds of knowledge at the same time. Programming a machine to replicate that feat is very much a work in progress.
Looking ahead: For my next trick
Talking machines are the new must-haves
IN “WALL-E”, an animated children’s film set in the future, all humankind lives on a spaceship after the Earth’s environment has been trashed. The humans are whisked around in intelligent hovering chairs; machines take care of their every need, so they are all morbidly obese. Even the ship’s captain is not really in charge; the actual pilot is an intelligent and malevolent talking robot, Auto, and like so many talking machines in science fiction, he eventually makes a grab for power.
Speech is quintessentially human, so it is hard to imagine machines that can truly speak conversationally as humans do without also imagining them to be superintelligent. And if they are super intelligent, with none of humans’ flaws, it is hard to imagine them not wanting to take over, not only for their good but for that of humanity. Even in a fairly benevolent future like “WALL-E’s”, where the machines are doing all the work, it is easy to see that the lack of anything challenging to do would be harmful to people.
Fortunately, the tasks that talking machines can take off humans’ to-do lists are the sort that many would happily give up. Machines are increasingly able to handle difficult but well-defined jobs. Soon all that their users will have to do is pipe up and ask them, using a naturally phrased voice command. Once upon a time, just one tinkerer in a given family knew how to work the computer or the video recorder. Then graphical interfaces (icons and a mouse) and touchscreens made such technology accessible to everyone. Frank Chen of Andreessen Horowitz, a venture-capital firm, sees natural-language interfaces between humans and machines as just another step in making information and services available to all. Silicon Valley, he says, is enjoying a golden age of AI technologies. Just as in the early 1990s companies were piling online and building websites without quite knowing why, now everyone is going for natural language. Yet, he adds, “we’re in 1994 for voice.”
1995 will soon come. This does not mean that people will communicate with their computers exclusively by talking to them. Websites did not make the telephone obsolete, and mobile devices did not make desktop computers obsolete. In the same way, people will continue to have a choice between voice and text when interacting with their machines.
Not all will choose voice. For example, in Japan yammering into a phone is not done in public, whether the interlocutor is a human or a digital assistant, so usage of Siri is low during business hours but high in the evening and at the weekend. For others, voice-enabled technology is an obvious boon. It allows dyslexic people to write without typing, and the very elderly may find it easier to talk than to type on a tiny keyboard. The very young, some of whom today learn to type before they can write, may soon learn to talk to machines before they can type.
Those with injuries or disabilities that make it hard for them to write will also benefit. Microsoft is justifiably proud of a new device that will allow people with amyotrophic lateral sclerosis (ALS), which immobilises nearly all of the body but leaves the mind working, to speak by using their eyes to pick letters on a screen. The critical part is predictive text, which improves as it gets used to a particular individual. An experienced user will be able to “speak” at around 15 words per minute.
People may even turn to machines for company. Microsoft’s Xiaoice, a chatbot launched in China, learns to come up with the responses that will keep a conversation going longest. Nobody would think it was human, but it does make users open up in surprising ways. Jibo, a new “social robot”, is intended to tell children stories, help far-flung relatives stay in touch and the like.
Another group that may benefit from technology is smaller language communities. Networked computers can encourage a winner-take-all effect: if there is a lot of good software and content in English and Chinese, smaller languages become less valuable online. If they are really tiny, their very survival may be at stake. But Ross Perlin of the Endangered Languages Alliance notes that new software allows researchers to document small languages more quickly than ever. With enough data comes the possibility of developing resources—from speech recognition to interfaces with software—for smaller and smaller languages. The Silicon Valley giants already localise their services in dozens of languages; neural networks and other software allow new versions to be generated faster and more efficiently than ever.
There are two big downsides to the rise in natural-language technologies: the implications for privacy, and the disruption it will bring to many jobs.
Increasingly, devices are always listening. Digital assistants like Alexa, Cortana, Siri and Google Assistant are programmed to wait for a prompt, such as “Hey, Siri” or “OK, Google”, to activate them. But allowing always-on microphones into people’s pockets and homes amounts to a further erosion of traditional expectations of privacy. The same might be said for all the ways in which language software improves by training on a single user’s voice, vocabulary, written documents and habits.
All the big companies’ location-based services—even the accelerometers in phones that detect small movements—are making ever-improving guesses about users’ wants and needs. The moment when a digital assistant surprises a user with “The chemist is nearby—do you want to buy more haemorrhoid cream, Steve?” could be when many may choose to reassess the trade-off between amazing new services and old-fashioned privacy. The tech companies can help by giving users more choice; the latest iPhone will not be activated when it is laid face down on a table. But hackers will inevitably find ways to get at some of these data.
The other big concern is for jobs. To the extent that they are routine, they face being automated away. A good example is customer support. When people contact a company for help, the initial encounter is usually highly scripted. A company employee will verify a customer’s identity and follow a decision-tree. Language technology is now mature enough to take on many of these tasks.
For a long transition period humans will still be needed, but the work they do will become less routine. Nuance, which sells lots of automated online and phone-based help systems, is bullish on voice biometrics (customers identifying themselves by saying “my voice is my password”). Using around 200 parameters for identifying a speaker, it is probably more secure than a fingerprint, says Brett Beranek, a senior manager at the company. It will also eliminate the tedium, for both customers and support workers, of going through multi-step identification procedures with PINs, passwords and security questions. When Barclays, a British bank, offered it to frequent users of customer-support services, 84% signed up within five months.
Digital assistants on personal smartphones can get away with mistakes, but for some business applications the tolerance for error is close to zero, notes Nikita Ivanov. His company, Datalingvo, a Silicon Valley startup, answers questions phrased in natural language about a company’s business data. If a user wants to know which online ads resulted in the most sales in California last month, the software automatically translates his typed question into a database query. But behind the scenes a human working for Datalingvo vets the query to make sure it is correct. This is because the stakes are high: the technology is bound to make mistakes in its early days, and users could make decisions based on bad data.
This process can work the other way round, too: rather than natural-language input producing data, data can produce language. Arria, a company based in London, makes software into which a spreadsheet full of data can be dragged and dropped, to be turned automatically into a written description of the contents, complete with trends. Matt Gould, the company’s chief strategy officer, likes to think that this will free chief financial officers from having to write up the same old routine analyses for the board, giving them time to develop more creative approaches.
Carl Benedikt Frey, an economist at Oxford University, has researched the likely effect of artificial intelligence on the labour market and concluded that the jobs most likely to remain immune include those requiring creativity and skill at complex social interactions. But not every human has those traits. Call centres may need fewer people as more routine work is handled by automated systems, but the trickier inquiries will still go to humans.
Much of this seems familiar. When Google search first became available, it turned up documents in seconds that would have taken a human operator hours, days or years to find. This removed much of the drudgery from being a researcher, librarian or journalist. More recently, young lawyers and paralegals have taken to using e-discovery. These innovations have not destroyed the professions concerned but merely reshaped them.
Machines that relieve drudgery and allow people to do more interesting jobs are a fine thing. In net terms they may even create extra jobs. But any big adjustment is most painful for those least able to adapt. Upheavals brought about by social changes—like the emancipation of women or the globalisation of labour markets—are already hard for some people to bear. When those changes are wrought by machines, they become even harder, and all the more so when those machines seem to behave more and more like humans. People already treat inanimate objects as if they were alive: who has never shouted at a computer in frustration? The more that machines talk, and the more that they seem to understand people, the more their users will be tempted to attribute human traits to them.
That raises questions about what it means to be human. Language is widely seen as humankind’s most distinguishing trait. AI researchers insist that their machines do not think like people, but if they can listen and talk like humans, what does that make them? As humans teach ever more capable machines to use language, the once-obvious line between them will blur.
Source: The Economist, Technology Quarterly, Jan 7th, 2017