Natural language processing (NLP) is the use of software to communicate in ordinary human languages, such as English and Chinese. It falls under the broad area of artificial intelligence. Computer pioneer Alan Turing believed that for a computer to “think,” it would have to be able to accept natural-language text. Many chatbots rely on NLP to process user input.
The term NLU (natural language understanding) is sometimes used instead to emphasize the goal of understanding the user’s main intent. NLP in the broader sense is concerned with processing natural language: sentence structure, word forms, and semantics. For the sentence “I am hungry,” NLP may determine that “am” (a form of “to be”) is the main verb, that it is in the present tense, and a dozen other data points. NLU aims to reduce the sentence to a structured, computer-readable format, usually by selecting the most suitable intent from a finite list of options. For example, a restaurant chatbot might treat “I am super hungry” as a request to show the menu.
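To make the NLU idea concrete, here is a minimal Python sketch of picking one intent from a finite list. It assumes a hypothetical set of intent names (`show_menu`, `place_order`, `check_status`) and scores each by simple keyword overlap; real systems typically use trained classifiers rather than hand-written keyword lists.

```python
import re

# Hypothetical intents for a restaurant chatbot, each with a few trigger words.
INTENT_KEYWORDS = {
    "show_menu": {"hungry", "menu", "eat", "food"},
    "place_order": {"order", "buy"},
    "check_status": {"where", "status", "late"},
}


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def classify_intent(text: str, fallback: str = "unknown") -> str:
    """Return the intent whose keywords overlap most with the input."""
    words = set(tokenize(text))
    scores = {intent: len(words & keywords) for intent, keywords in INTENT_KEYWORDS.items()}
    best = max(scores, key=scores.get)
    # If no keyword matched at all, admit we did not understand.
    return best if scores[best] > 0 else fallback


print(classify_intent("I am super hungry"))  # show_menu
```

Ties go to whichever intent appears first in the dictionary, and input with no matching keyword falls back to `unknown`, which gives the bot a chance to ask the user to rephrase.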
NLP focuses on understanding unrestricted text input. Producing output is usually a simpler job, since the software can work from templates. The two big issues in NLP are syntax and semantics: first the software has to recognize the grammatical structure of a sentence, and then it has to extract the meaning.
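Why output is easier can be illustrated with Python’s standard `string.Template`: once the software has decided what to say, it only has to fill slots in a prewritten sentence. The template names and slot names below are hypothetical.

```python
from string import Template

# Hypothetical response templates, keyed by the kind of reply the bot wants to give.
RESPONSES = {
    "order_confirmed": Template("Your order of $quantity $item has been placed."),
    "item_unavailable": Template("Sorry, we are out of $item today."),
}


def render(response: str, **slots: object) -> str:
    """Fill a response template; raises KeyError if a slot is missing."""
    return RESPONSES[response].substitute(**slots)


print(render("order_confirmed", quantity=2, item="pizzas"))
# Your order of 2 pizzas has been placed.
```

Contrast this with input handling, where the same request can arrive in countless phrasings that no fixed template could anticipate.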
With spoken input, there is an additional step at the beginning. The software has to identify the sounds of the user’s voice and convert them into words. This stage often includes machine learning to adapt to a particular user’s voice.
The difficulties in NLP
Natural-language statements are often ambiguous. Suppose that a customer says to a purchasing system, “I want to buy another one.” Does that mean “I want to buy one more” or “I want to buy a different one instead”? Software needs to recognize ambiguities, pick the most likely interpretation based on the context, and backtrack if the user objects.
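One way to picture “pick the most likely interpretation, then backtrack if the user objects” is a ranked list of candidate readings. In this sketch the base scores and context boosts are made-up numbers supplied by hand; a real system would get them from a statistical model of the conversation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Interpretation:
    meaning: str
    score: float  # hypothetical base likelihood, e.g. from a language model


class Disambiguator:
    """Keeps candidate readings in order of likelihood and backtracks on objection."""

    def __init__(self, candidates: list[Interpretation], context_boosts: dict[str, float]):
        # Context (e.g. what the user just said) nudges some readings up.
        self._ranked = sorted(
            candidates,
            key=lambda c: c.score + context_boosts.get(c.meaning, 0.0),
            reverse=True,
        )
        self._index = 0

    def current(self) -> str:
        return self._ranked[self._index].meaning

    def reject(self) -> Optional[str]:
        """The user objected: move to the next most likely reading, if any remain."""
        if self._index + 1 < len(self._ranked):
            self._index += 1
            return self.current()
        return None


candidates = [
    Interpretation("buy one more", 0.6),
    Interpretation("buy a different one instead", 0.4),
]
# Suppose the user has just complained about the item they bought.
d = Disambiguator(candidates, {"buy a different one instead": 0.3})
print(d.current())  # buy a different one instead
print(d.reject())   # buy one more
```

Without the context boost, “buy one more” would rank first; the complaint tips the balance toward the other reading, and an objection from the user simply moves down the list.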
In an extended interaction, statements will refer back to previous ones, and the software needs to resolve each pronoun to the right referent. What does “it” in “I would like to order it” mean in the context of the earlier discussion?
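A very rough way to resolve “it” is a recency heuristic: remember what has been mentioned and pick the most recent entity that fits the pronoun. The sketch below assumes the earlier turns have already been tagged with entities (the `Entity` fields are hypothetical); real coreference resolution weighs many more cues, such as grammar, meaning, and world knowledge.

```python
from dataclasses import dataclass, field
from typing import Optional


# Hypothetical entity record; assumes an earlier step has tagged each mention.
@dataclass
class Entity:
    name: str
    is_person: bool = False
    plural: bool = False


@dataclass
class Discourse:
    mentions: list[Entity] = field(default_factory=list)

    def mention(self, entity: Entity) -> None:
        self.mentions.append(entity)

    def resolve(self, pronoun: str) -> Optional[Entity]:
        """Return the most recently mentioned entity compatible with the pronoun."""
        p = pronoun.lower()
        for entity in reversed(self.mentions):
            if p == "it" and not entity.is_person and not entity.plural:
                return entity
            if p in ("they", "them") and entity.plural:
                return entity
        return None  # no compatible antecedent: the bot should ask for clarification


talk = Discourse()
talk.mention(Entity("blue jacket"))
talk.mention(Entity("sales assistant", is_person=True))
talk.mention(Entity("red jacket"))
talk.mention(Entity("running shoes", plural=True))
print(talk.resolve("it").name)    # red jacket
print(talk.resolve("them").name)  # running shoes
```

Recency alone gets many everyday cases right, but it fails whenever the intended referent was mentioned earlier than a compatible distractor, which is why production systems do not stop here.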
An inexact science
Human input isn’t always precisely correct. Even when people know they’re communicating with a computer, they use sentence fragments, incorrect grammar, and misspelled words. NLP software needs a reasonably high tolerance for such errors, or it will constantly fail to handle input. Much of NLP is based on probabilities: the software decides which of several possible interpretations is most likely.
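Tolerance for misspellings and choosing the most likely interpretation come together in a simple spelling corrector. This sketch follows the general approach of Peter Norvig’s well-known spelling-corrector essay, in simplified form: it generates every string one edit away from an unknown word and picks the known candidate with the highest estimated probability. The word counts are invented for illustration; in practice they would come from a large text corpus.

```python
from collections import Counter

# Invented word counts; a real system would estimate these from a large corpus.
WORD_COUNTS = Counter({"order": 120, "older": 15, "menu": 80, "pizza": 60, "hungry": 40})
TOTAL = sum(WORD_COUNTS.values())
LETTERS = "abcdefghijklmnopqrstuvwxyz"


def edits1(word: str) -> set[str]:
    """All strings one deletion, transposition, substitution, or insertion away."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [left + right[1:] for left, right in splits if right]
    transposes = [left + right[1] + right[0] + right[2:] for left, right in splits if len(right) > 1]
    replaces = [left + c + right[1:] for left, right in splits if right for c in LETTERS]
    inserts = [left + c + right for left, right in splits for c in LETTERS]
    return set(deletes + transposes + replaces + inserts)


def probability(word: str) -> float:
    """Estimated probability of a word, from its relative frequency."""
    return WORD_COUNTS[word] / TOTAL


def correct(word: str) -> str:
    """Return the most probable known word within one edit, or the word unchanged."""
    w = word.lower()
    if w in WORD_COUNTS:
        return w
    candidates = [c for c in edits1(w) if c in WORD_COUNTS]
    if not candidates:
        return word  # too far from anything known; don't guess
    return max(candidates, key=probability)


print(correct("oder"))  # order (beats "older", which is also one edit away)
print(correct("piza"))  # pizza
```

Treating every single edit as equally likely is a deliberate simplification; fuller models also estimate how likely each kind of typo is.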
NLP is constantly improving but will never be a perfect science. More formal input methods are necessary when precision is important, but natural-language input is a user-friendly and convenient method for many situations.