Chetan Dube and Christopher Manning on AI’s challenges and potential

Humans are very good at “thinking about thinking.” Such pondering may have offered some diversion and stimulation, but for most of history, it’s offered little in the way of practicality. Fast forward to 2018, and studying the nature of intelligence and devising ways to replicate it isn’t only practical; it’s fundamentally reinventing how the world does business.

As we at IPsoft developed and refined our industry-leading digital colleague, Amelia, we tapped the insight of the renowned and influential Stanford University Professor of Linguistics and Computer Science Christopher Manning. Every week for the last several years, Manning has conferred with our engineers to influence and shape the DNA that goes into making Amelia the most human-like AI you can find on the market.

At the recent Digital Workforce Summit in New York, IPsoft CEO Chetan Dube sat down with Professor Manning to talk about his ground-breaking research, how software can learn to replicate humans, and what the world should expect to see from AI in the “exponential future.”

The following transcript has been edited for brevity and clarity.

Chetan Dube: I want to start by asking how your research on embedded language model vectors has allowed technologies like Amelia to develop contextual awareness. Could you explain what it takes for a machine to understand context?

Christopher Manning: In traditional NLP [natural language processing], words were just words. But we’ve found that we can get a lot more power representing words as vectors of numbers. That sounds kind of weird when you first hear it. Why does a word become a vector of numbers? It gives us a powerful way to represent meaning, so that we can understand word similarities and phrase similarities in very flexible ways.
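To make the idea of words as vectors of numbers concrete, here is a minimal Python sketch that compares word vectors by cosine similarity, one common way of measuring how close two vectors point. The three-dimensional vectors and the names WORD_VECTORS and most_similar are hypothetical, picked by hand for illustration; real systems learn vectors with hundreds of dimensions from large amounts of text, and this sketch does not describe how Amelia itself is built.

```python
import math

# Hypothetical, hand-picked 3-dimensional vectors for illustration only.
# Real learned word vectors have hundreds of dimensions and come from training on text.
WORD_VECTORS = {
    "contract": [0.9, 0.1, 0.3],
    "agreement": [0.8, 0.2, 0.4],
    "banana": [0.1, 0.9, 0.0],
}


def cosine_similarity(u, v):
    """Cosine of the angle between u and v: 1.0 means they point the same way."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def most_similar(word, vectors):
    """Return the other word whose vector is closest to the given word's vector."""
    target = vectors[word]
    candidates = (w for w in vectors if w != word)
    return max(candidates, key=lambda w: cosine_similarity(target, vectors[w]))


if __name__ == "__main__":
    print(most_similar("contract", WORD_VECTORS))
```

With these toy numbers the script prints agreement: its vector scores about 0.98 against "contract", while "banana" scores about 0.21. Similarity falls out of geometry rather than from hand-written synonym lists.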

Most words don’t have just one meaning. For example, a word like “contract” has one meaning in a legal context, but it has a totally different meaning when you’re watching a gangster movie. So we have to understand the meaning of words in context. New machine-learning methods involving vectors are giving us much better ways to do that.

Christopher Manning, Stanford Professor, in discussion with Chetan Dube, IPsoft CEO

Dube: Extending that concept to learning, your other pioneering work on contextual long short-term memory [LSTM] networks and deep neural networks has allowed Amelia to benefit from volumes and volumes of existing chat data. Large corporations have assimilated copious amounts of data and calls into their AI. Tell us how organizations can benefit from the billions of records of past conversations that they hold.

Manning: Many organizations have handled hundreds of thousands of support calls, and from those you can extract various kinds of knowledge bases. AI can automatically extract knowledge from your past recordings and chat logs, which are a gold mine of information.

Typically, these interactions are full of examples of a support person helping a user solve a problem or answer a question. Now, with modern deep learning models, we can take a new customer’s problem, find someone who had the same problem in the past, and give the new customer the answer that person received, assuming the problem was successfully solved the last time. This method allows us to route information to the new person easily, so they can get an answer immediately, without having to spend 20 minutes on hold.
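The matching step described above can be sketched as nearest-neighbour retrieval over resolved past cases. This is a minimal Python sketch under loud assumptions: it swaps the deep learning models Manning mentions for a much cruder bag-of-words cosine similarity, and PAST_CASES, suggest_answer and the min_score threshold of 0.2 are all hypothetical names and values chosen for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical log of past support interactions. Only resolved cases are reused,
# mirroring the "assuming the problem was successfully solved" caveat.
PAST_CASES = [
    {"problem": "I forgot my online banking password",
     "answer": "Use the Reset password link on the login page.", "resolved": True},
    {"problem": "My card was declined at a store",
     "answer": "Check whether the card has been temporarily blocked.", "resolved": True},
    {"problem": "How do I reset my password?",
     "answer": None, "resolved": False},
]


def bag_of_words(text):
    """Lowercased word counts: a crude stand-in for a learned sentence vector."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(count * b[word] for word, count in a.items())  # Counter gives 0 for missing words
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def suggest_answer(new_problem, cases, min_score=0.2):
    """Return the answer from the most similar resolved case, or None if nothing is close."""
    query = bag_of_words(new_problem)
    scored = [(cosine(query, bag_of_words(c["problem"])), c) for c in cases if c["resolved"]]
    if not scored:
        return None
    best_score, best_case = max(scored, key=lambda pair: pair[0])
    return best_case["answer"] if best_score >= min_score else None


if __name__ == "__main__":
    print(suggest_answer("I can't remember my password", PAST_CASES))
```

Here the new question shares "I", "my" and "password" with the first resolved case, so the script prints its answer; the unresolved third case is never offered. The threshold lets the system fall back to a human agent instead of forcing a poor match; a production system would replace bag_of_words with learned representations that also match paraphrases sharing no words.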

Dube: That is absolutely brilliant, and we are indebted to you for sharing the research that has allowed Amelia to learn from transcripts and other accumulated data, exactly as you pointed out.

Let’s talk about some of the differences in systems like, say, Google Duplex that are domain-restricted, where the intent is known. There is a difference between a user telling an AI, “I want to make an appointment at the salon,” and a call coming in to MetLife where the person asks, “Hey, I’m moving from Ohio to New Jersey, and I want to get similar insurance there.” In the latter example, the AI needs to diagnose the intent before leading the person to a solution. The difference is a domain-restricted problem versus an unbounded one. Could you comment on that for us, Professor?

Manning: I think developing this unbounded-domain capability, which we’ve really been working hard on for Amelia, is absolutely essential. I was thinking about this last week, actually, when I headed to the CVS to pick up a prescription. Now, initially I was annoyed because there were three people in line in front of me, but I decided to be a good scientist and get some data here on customer interactions.

What was really remarkable is that each of those three people in front of me wanted to do something else at the same time. They didn’t just want to pick up their prescription. The first one wanted to talk about details of taking the medicine, like what to do if they missed a dose. The second one had heard this CVS was going to be moving to a new location and wanted to know where it was. The third one just wanted to talk about the baseball results.

In all three cases, they were interweaving the transaction of picking up stuff at the pharmacy with another conversation going on at the same time, and that’s just so typical of human interactions. Humans can seamlessly interweave topics, and that’s an absolutely essential capability if you’re going to have a successful cognitive agent.

Dube: Again, that example perfectly highlights the fact that the biggest challenge in deep contextual understanding is understanding the intent in order to navigate towards a solution. Continuing with that concept, we’ve learned a lot from your research about how to disambiguate what humans are saying. Your technologies allow us to generate questions, so that Amelia is able to ask for elaboration and clarification as she disambiguates between the choices that she has in her semantic memory, which typical chatbot systems don’t have the benefit of.

Christopher Manning, Stanford Professor, and Chetan Dube, IPsoft CEO

Manning: Chatbots aren’t much more than a script to go through to answer some questions and collect some information. One of the first big differences is underlying knowledge and reasoning. We have ontologies underlying Amelia. We have process models related to the kinds of things that you can do at a bank, an insurance company, whatever it is. But that means we have to constantly manage the mapping from the surface form of natural language to places in the ontology. That’s a hard thing to do. It’s not something we can do with 100% accuracy yet, but Chetan is confident that we will be able to by 2025. We’ve got to keep working hard!

But again, this is a place where it’s important to detect intents, which includes understanding similarities between meanings and disambiguating them. Often what human beings say is somewhat general, unclear, or open to interpretation. But humans can succeed in conversing naturally, because when they’re not sure of something, they ask for clarification. So what we’re also aiming to do is teach Amelia to ask further clarifying questions, in a natural way, to make sure that we’re doing the right thing.
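One simple way to decide when to ask a clarifying question is to compare the scores of the top two candidate intents and ask only when they are too close to call. The Python sketch below illustrates that idea under stated assumptions: the intent names echo the MetLife example from earlier but are hypothetical, the scores are assumed to come from some intent classifier, and choose_action with its margin of 0.15 is an illustrative rule, not the mechanism Amelia uses.

```python
def choose_action(intent_scores, margin=0.15):
    """Return ("act", intent) when one intent clearly wins, else ("clarify", question).

    intent_scores maps hypothetical intent names to scores from some classifier;
    margin is an assumed threshold for how far ahead the top intent must be.
    """
    if not intent_scores:
        return ("clarify", "Could you tell me a little more about what you need?")
    ranked = sorted(intent_scores.items(), key=lambda kv: kv[1], reverse=True)
    if len(ranked) == 1 or ranked[0][1] - ranked[1][1] >= margin:
        return ("act", ranked[0][0])
    first, second = ranked[0][0], ranked[1][0]
    return ("clarify", f"Just to check: do you want to {first} or {second}?")


if __name__ == "__main__":
    scores = {
        "transfer your existing policy": 0.46,
        "buy a new policy": 0.41,
        "cancel your policy": 0.13,
    }
    print(choose_action(scores))
```

Because the top two intents here differ by only 0.05, the sketch returns a clarifying question naming both options instead of guessing; with a clear winner it returns ("act", intent) and proceeds.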

Dube: I remember one call where we talked about the exponential times we are living in. Research is moving forward at a very fast pace. So, tell us about the future.

Manning: We’re in a period of amazing progress in AI. A lot of the initial progress was on sensory tasks. There was amazing progress in computer vision, which has led to developments that make things like autonomous driving seem possible. There was amazing progress in speech recognition, which means that now, when you talk to your phone, it understands brilliantly what you’re trying to say. What we’re trying to do now is push towards higher levels of cognition, to actually have some kind of models of thoughts.

Deep learning is a way of moving up to deeper levels of abstraction, and our goal is that, if we can keep moving to deeper levels of abstraction, we’ll have a way of representing thoughts that are somewhat like human thoughts. On the surface, the world is messy, right? The signals that come out of my mouth are completely messy sound waveforms if you look at them in a spectrogram. But you never notice those messy sound signals, because after they go into your ear, your brain starts decoding them and moving them up levels of abstraction. So you sort of hear the words, but mainly you hear the thoughts. What we want to do is the same kind of disentanglement, so that our machines can understand thoughts. That’s where we’re trying to push deep-learning technology today.
