Topic 01: Introduction to Natural Language Processing
NLP concepts and workflow
What Is NLP?
Everything we express (either verbally or in writing) carries a huge amount of information. The topic we choose, our tone, our selection of words: everything adds some type of information that can be interpreted and from which value can be extracted. In theory, we can understand and even predict human behaviour using that information.
For centuries, large amounts of data have been stored in books and manuscripts. Data is a boon in today's world, especially in the IT sector, so putting this data to use can help make our technology more innovative.
However, it is not as simple as it seems. Handling this type of data raises a few issues:
Data is unstructured: Conversations, declarations, and even tweets are examples of unstructured data. Unstructured data doesn't fit neatly into the traditional row-and-column structure of relational databases, yet it represents the vast majority of data available in the real world. It is messy and hard to manipulate.
A computer only understands numeric data: Data generated from books, speech, text, etc. is not in numeric form, so it is difficult for a computer to interpret.
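To make the second point concrete, one common way to turn text into numbers is a bag-of-words representation: every distinct word gets an integer ID, and a document becomes a vector of word counts. The sketch below is minimal; whitespace tokenization and the helper names `build_vocab` and `vectorize` are illustrative assumptions, not part of the original notes.

```python
from collections import Counter


def build_vocab(docs):
    """Assign an integer ID to every distinct lowercase token, in order of first appearance."""
    vocab = {}
    for doc in docs:
        for token in doc.lower().split():
            if token not in vocab:
                vocab[token] = len(vocab)
    return vocab


def vectorize(doc, vocab):
    """Bag-of-words count vector: position i holds the count of the word whose ID is i."""
    counts = Counter(doc.lower().split())
    # Dicts keep insertion order, so iterating vocab visits IDs 0, 1, 2, ... in order.
    return [counts.get(word, 0) for word in vocab]


if __name__ == "__main__":
    vocab = build_vocab(["the cat sat", "the dog sat down"])
    print(vocab)                                    # {'the': 0, 'cat': 1, 'sat': 2, 'dog': 3, 'down': 4}
    print(vectorize("the cat saw the dog", vocab))  # [2, 1, 0, 1, 0]
```

Words never seen while building the vocabulary (like "saw" above) are simply dropped in this sketch; real pipelines usually map them to an explicit unknown-word token instead.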
This is where Natural Language Processing comes to the rescue.
A natural language is a language that we humans use and understand, e.g. Hindi, English, or Spanish. Computers do not understand these languages directly; to work with them, a computer must process the text and convert it into a form it can interpret. Hence the term Natural Language Processing was coined.
How is NLP applied in today's industries?
There is a collection of fundamental tasks that appear frequently across various NLP projects.
1).Language modeling
This is the task of predicting what the next word in a sentence will be based on the history of previous words. The goal of this task is to learn the probability of a sequence of words appearing in a given language. Language modeling is useful for building solutions for a wide variety of problems, such as speech recognition, optical character recognition, handwriting recognition, machine translation, and spelling correction.
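As a rough illustration, the simplest statistical language model is a bigram model, which approximates the history of previous words by only the single previous word and estimates probabilities from counts. The sketch below uses maximum-likelihood estimates without smoothing; the `<s>`/`</s>` sentence-boundary markers and the function names are assumptions made for this example.

```python
from collections import Counter, defaultdict


def train_bigram_model(sentences):
    """Count how often each word follows each previous word."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        for prev, curr in zip(tokens, tokens[1:]):
            counts[prev][curr] += 1
    return counts


def next_word_probability(counts, prev, word):
    """Maximum-likelihood estimate of P(word | prev)."""
    total = sum(counts[prev].values())
    return counts[prev][word] / total if total else 0.0


def sentence_probability(counts, sentence):
    """Approximate P(sentence) as the product of its bigram probabilities."""
    tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
    prob = 1.0
    for prev, curr in zip(tokens, tokens[1:]):
        prob *= next_word_probability(counts, prev, curr)
    return prob


def predict_next(counts, prev):
    """Most frequent word seen after `prev`, or None if `prev` was never seen."""
    return counts[prev].most_common(1)[0][0] if counts[prev] else None


if __name__ == "__main__":
    model = train_bigram_model(["I like tea", "I like coffee", "you like tea"])
    print(predict_next(model, "like"))                # tea
    print(sentence_probability(model, "I like tea"))  # 4/9, about 0.444
```

Without smoothing, any unseen word pair gets probability zero, which is why practical n-gram models add techniques such as add-one or Kneser-Ney smoothing.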
2).Text classification
This is the task of bucketing the text into a known set of categories based on its content. Text classification is by far the most popular task in NLP and is used in a variety of tools, from email spam identification to sentiment analysis.
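One classic way to approach text classification is a multinomial Naive Bayes classifier over word counts. The following is a minimal sketch, assuming whitespace tokenization, add-one (Laplace) smoothing, and a toy spam/ham dataset invented for illustration; the class name `NaiveBayesClassifier` is hypothetical and not taken from any library.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesClassifier:
    """Multinomial Naive Bayes over bag-of-words features with add-one smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = text.lower().split()
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, text):
        tokens = text.lower().split()
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, doc_count in self.class_counts.items():
            # log P(label) + sum over tokens of log P(token | label)
            score = math.log(doc_count / total_docs)
            label_total = sum(self.word_counts[label].values())
            for token in tokens:
                if token not in self.vocab:
                    continue  # ignore words never seen in training
                count = self.word_counts[label][token]
                score += math.log((count + 1) / (label_total + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


if __name__ == "__main__":
    texts = ["win a free prize now", "free money click now",
             "meeting at noon tomorrow", "see you at the meeting"]
    labels = ["spam", "spam", "ham", "ham"]
    clf = NaiveBayesClassifier().fit(texts, labels)
    print(clf.predict("free prize"))        # spam
    print(clf.predict("meeting tomorrow"))  # ham
```

Working in log space avoids numeric underflow when many small probabilities are multiplied together.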
3).Information extraction
As the name indicates, this is the task of extracting relevant information from text, such as calendar events from emails or the names of people mentioned in a social media post.
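A very simple, rule-based form of information extraction uses regular expressions to pull out fields with a predictable shape. The sketch below assumes dates are written as ISO `YYYY-MM-DD` and times as 24-hour `HH:MM`; those formats, the pattern names, and the sample email are assumptions for illustration. Extracting people's names usually requires a statistical named-entity recognizer rather than a regex.

```python
import re

# Assumed formats: ISO dates (2024-03-15) and 24-hour times (14:30).
DATE_PATTERN = re.compile(r"\b\d{4}-\d{2}-\d{2}\b")
TIME_PATTERN = re.compile(r"\b(?:[01]\d|2[0-3]):[0-5]\d\b")


def extract_event_details(text):
    """Return every ISO-formatted date and 24-hour time found in free text."""
    return {
        "dates": DATE_PATTERN.findall(text),
        "times": TIME_PATTERN.findall(text),
    }


if __name__ == "__main__":
    email = ("Hi team, the review is on 2024-03-15 at 14:30 in room 4. "
             "Backup slot: 2024-03-18 09:00.")
    print(extract_event_details(email))
    # {'dates': ['2024-03-15', '2024-03-18'], 'times': ['14:30', '09:00']}
```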
4).Information retrieval
This is the task of finding documents relevant to a user query from a large collection. Applications like Google Search are well-known use cases of information retrieval.
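A classic information retrieval technique is to represent documents and queries as TF-IDF vectors and rank documents by their cosine similarity to the query. The sketch below is a minimal pure-Python version; the IDF variant `log(N / df) + 1` is one of several in common use, the sample documents are invented, and this is not a description of how Google Search actually works.

```python
import math
from collections import Counter


def tokenize(text):
    return text.lower().split()


def build_index(docs):
    """Compute IDF weights and a sparse TF-IDF vector (dict) for each document."""
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    df = Counter(term for tokens in tokenized for term in set(tokens))
    idf = {term: math.log(n / count) + 1.0 for term, count in df.items()}
    vectors = [{t: c * idf[t] for t, c in Counter(tokens).items()} for tokens in tokenized]
    return idf, vectors


def cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search(query, docs, top_k=3):
    """Return (doc_index, score) pairs for the best-matching documents, best first."""
    idf, vectors = build_index(docs)
    q_counts = Counter(t for t in tokenize(query) if t in idf)
    q_vec = {t: c * idf[t] for t, c in q_counts.items()}
    scores = [cosine(q_vec, v) for v in vectors]
    ranked = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
    return [(i, scores[i]) for i in ranked[:top_k] if scores[i] > 0]


if __name__ == "__main__":
    docs = ["python is a programming language",
            "the python snake is large",
            "cooking pasta at home"]
    for i, score in search("python programming", docs):
        print(round(score, 3), docs[i])
```

Rare terms such as "programming" receive a higher IDF weight than terms shared by many documents, so the first document outranks the second.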
5).Conversational agent
This is the task of building dialogue systems that can converse in human languages. Alexa, Siri, etc., are some common applications of this task.
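The earliest conversational agents, such as ELIZA, were rule-based: they matched the user's input against patterns and replied with templates. The sketch below follows that idea with a few hypothetical rules; assistants like Alexa and Siri combine speech recognition, intent classification, and dialogue management that go far beyond this.

```python
import re

# Hypothetical rules: (pattern, response template). The first matching rule wins.
RULES = [
    (re.compile(r"\b(hi|hello|hey)\b", re.I), "Hello! How can I help you?"),
    (re.compile(r"\bmy name is (\w+)", re.I), "Nice to meet you, {0}!"),
    (re.compile(r"\bweather\b", re.I), "I can't check the weather yet, sorry."),
    (re.compile(r"\b(bye|goodbye)\b", re.I), "Goodbye!"),
]
FALLBACK = "Sorry, I didn't understand that."


def respond(utterance):
    """Reply using the first rule whose pattern matches; captured groups fill the template."""
    for pattern, template in RULES:
        match = pattern.search(utterance)
        if match:
            return template.format(*match.groups())
    return FALLBACK


if __name__ == "__main__":
    for line in ["hello there", "My name is Alice", "what's the weather like?", "xyz"]:
        print(f"> {line}\n{respond(line)}")
```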
6).Text summarization
This task aims to create short summaries of longer documents while retaining the core content and preserving the overall meaning of the text.
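Summarization approaches are often split into extractive (select existing sentences) and abstractive (generate new text). A minimal extractive sketch scores each sentence by the average document-wide frequency of its non-stopword words and keeps the top-scoring sentences in their original order. The tiny stopword list and the regex-based sentence splitter are simplifying assumptions.

```python
import re
from collections import Counter

# Assumed, deliberately tiny stopword list.
STOPWORDS = {"the", "a", "an", "is", "are", "of", "to", "and", "in", "it", "that", "on", "for"}


def content_tokens(text):
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS]


def summarize(text, num_sentences=2):
    """Keep the highest-scoring sentences, returned in their original order."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    freq = Counter(content_tokens(text))

    def score(sentence):
        tokens = content_tokens(sentence)
        return sum(freq[w] for w in tokens) / (len(tokens) or 1)

    top = sorted(range(len(sentences)), key=lambda i: score(sentences[i]),
                 reverse=True)[:num_sentences]
    return " ".join(sentences[i] for i in sorted(top))


if __name__ == "__main__":
    text = ("NLP helps computers process language. Computers process language using models. "
            "The weather was sunny. Language models predict words.")
    print(summarize(text))
    # NLP helps computers process language. Computers process language using models.
```

Averaging (rather than summing) the word scores keeps long sentences from winning just because they contain more words.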
7).Question answering
This is the task of building a system that can automatically answer questions posed in natural language.
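A crude baseline for question answering over a given passage is answer-sentence selection: return the sentence that shares the most content words with the question. The sketch below assumes a small hand-picked stopword list and naive sentence splitting; modern QA systems instead use trained reading-comprehension models that extract or generate a precise answer span.

```python
import re

# Assumed stopword list, including common question words.
STOPWORDS = {"what", "who", "when", "where", "is", "the", "a", "an", "of", "was", "did", "in", "does"}


def content_words(text):
    return {w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOPWORDS}


def answer(question, passage):
    """Return the passage sentence with the largest content-word overlap with the question."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", passage) if s.strip()]
    q_words = content_words(question)
    return max(sentences, key=lambda s: len(q_words & content_words(s)))


if __name__ == "__main__":
    passage = ("Paris is the capital of France. The Eiffel Tower was completed in 1889. "
               "France is in Europe.")
    print(answer("When was the Eiffel Tower completed?", passage))
    # The Eiffel Tower was completed in 1889.
```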
8).Machine translation
This is the task of converting a piece of text from one language to another. Tools like Google Translate are common applications of this task.
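The simplest possible approach, direct dictionary-based translation, replaces each word independently using a bilingual lexicon. The toy English-to-Spanish lexicon below is hypothetical, and the sketch is shown mainly to illustrate why the task is hard: it ignores word order, agreement, and context, which modern systems such as Google Translate handle with neural models.

```python
# Tiny, hypothetical English-to-Spanish lexicon for illustration only.
LEXICON = {"the": "el", "cat": "gato", "eats": "come", "fish": "pescado", "black": "negro"}


def translate_word_by_word(sentence, lexicon=LEXICON):
    """Direct translation: look up each word on its own, marking unknown words in brackets."""
    return " ".join(lexicon.get(word, f"[{word}]") for word in sentence.lower().split())


if __name__ == "__main__":
    print(translate_word_by_word("The cat eats fish"))  # el gato come pescado
    print(translate_word_by_word("the black cat"))      # el negro gato
```

The second example produces "el negro gato", whereas correct Spanish places the adjective after the noun ("el gato negro").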
9).Topic modeling
This is the task of uncovering the topical structure of a large collection of documents. Topic modeling is a common text mining tool and is used in a wide range of domains, from literature to bioinformatics.
Approaches to NLP
The different approaches used to solve NLP problems commonly fall into three categories: heuristics, machine learning, and deep learning.
Heuristic-Based NLP
Similar to other early AI systems, early attempts at designing NLP systems were based on building rules for the task at hand. This required that the developers had some expertise in the domain to formulate rules that could be incorporated into a program. Such systems also required resources like dictionaries and thesauruses, typically compiled and digitized over a period of time.
Advantages:
- Easily adaptable
- Simple to debug
- No enormous training corpus needed
- Comprehends the language
- High precision

Disadvantages:
- Proficient developers and linguists required
- Slow parser development
- Moderate recall (coverage)
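To illustrate the heuristic approach, here is a minimal rule-based sentiment classifier built from a hand-written lexicon plus one negation rule. The word lists stand in for the dictionaries mentioned above and are assumptions, not a real resource. The sketch also shows the trade-offs listed: each rule is easy to read and debug, but anything outside the lexicon is missed (limited recall).

```python
# Hypothetical hand-built lexicons standing in for real dictionaries.
POSITIVE = {"good", "great", "excellent", "love", "happy"}
NEGATIVE = {"bad", "terrible", "awful", "hate", "sad"}
NEGATIONS = {"not", "never", "no"}


def rule_based_sentiment(text):
    """Count positive minus negative words; a negation flips the very next word's polarity."""
    score = 0
    negate = False
    for token in text.lower().replace(",", " ").replace(".", " ").split():
        if token in NEGATIONS:
            negate = True
            continue
        polarity = 1 if token in POSITIVE else -1 if token in NEGATIVE else 0
        if polarity:
            score += -polarity if negate else polarity
        negate = False  # the negation rule only covers the word right after it
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    for sentence in ["The movie was great.", "The movie was not good.", "It is a table."]:
        print(sentence, "->", rule_based_sentiment(sentence))
```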
Machine Learning in NLP
Machine learning techniques are applied to textual data just as they're used on other forms of data, such as images, speech, and structured data. Supervised machine learning techniques such as classification and regression methods are heavily used for various NLP tasks. As an example, an NLP classification task would be to classify news articles into a set of news topics like sports or politics. On the other hand, regression techniques, which give a numeric prediction, can be used to estimate the price of a stock based on processing the social media discussion about that stock. Similarly, unsupervised clustering algorithms can be used to group similar text documents together.
Advantages:
- Scales easily
- Learns without explicit programming
- Quick development if a dataset is available

Disadvantages:
- Annotated training corpus needed
- Hard to debug
- No understanding of the language
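The unsupervised case mentioned above, grouping similar documents, can be sketched with k-means clustering over length-normalized bag-of-words vectors. This is a minimal pure-Python version; initializing the centroids with the first k documents is an assumption made to keep the result deterministic (libraries usually use smarter seeding such as k-means++), and the documents are toy examples.

```python
import math
from collections import Counter


def to_vector(text, vocab):
    """L2-normalized bag-of-words vector over a fixed vocabulary."""
    counts = Counter(text.lower().split())
    vec = [float(counts[w]) for w in vocab]
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(docs, k, iterations=10):
    """Return a cluster index per document; centroids start at the first k documents."""
    vocab = sorted({w for d in docs for w in d.lower().split()})
    vectors = [to_vector(d, vocab) for d in docs]
    centroids = [v[:] for v in vectors[:k]]
    assignments = [0] * len(docs)
    for _ in range(iterations):
        # Assignment step: each document joins its nearest centroid.
        assignments = [min(range(k), key=lambda c: squared_distance(v, centroids[c]))
                       for v in vectors]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [v for v, a in zip(vectors, assignments) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignments


if __name__ == "__main__":
    docs = ["football match goal", "election vote parliament", "goal scored in football",
            "parliament passed vote", "match ended with goal"]
    print(kmeans(docs, k=2))  # [0, 1, 0, 1, 0]
```

Note that the algorithm only discovers the groups; deciding that cluster 0 means "sports" and cluster 1 means "politics" is still left to a human.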
Deep Learning for NLP
In the last few years, we have seen a huge surge in using neural networks to deal with complex, unstructured data. Language is inherently complex and unstructured. Therefore, we need models with better representation and learning capability to understand and solve language tasks. Here are a few popular deep neural network architectures that have become standard in NLP.
- Recurrent Neural Networks (RNNs)
- Long Short-Term Memory (LSTM) networks
- Convolutional Neural Networks (CNNs)
- Transformers
- Autoencoders
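To give a feel for the first architecture in the list, the sketch below runs a vanilla (Elman) recurrent neural network forward over a sequence. The hidden state is updated as h_t = tanh(W_xh·x_t + W_hh·h_{t-1} + b_h), so each step carries information from earlier words. The weights and one-hot inputs are hypothetical fixed values; in practice the weights are learned with backpropagation through time using a framework such as PyTorch or TensorFlow.

```python
import math


def matvec(matrix, vector):
    """Multiply a matrix (given as a list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def rnn_forward(inputs, W_xh, W_hh, b_h):
    """Vanilla RNN: h_t = tanh(W_xh x_t + W_hh h_{t-1} + b_h), starting from h_0 = 0.

    Returns the hidden states h_1 .. h_T, one per input vector.
    """
    hidden = [0.0] * len(b_h)
    states = []
    for x in inputs:
        pre_activation = [a + b + c for a, b, c in
                          zip(matvec(W_xh, x), matvec(W_hh, hidden), b_h)]
        hidden = [math.tanh(z) for z in pre_activation]
        states.append(hidden)
    return states


if __name__ == "__main__":
    # Hypothetical weights: 3-dimensional one-hot word inputs, 2-dimensional hidden state.
    W_xh = [[0.5, 0.0, 0.0], [0.0, 0.5, 0.0]]
    W_hh = [[0.1, 0.0], [0.0, 0.1]]
    b_h = [0.0, 0.0]
    for t, h in enumerate(rnn_forward([[1, 0, 0], [0, 1, 0]], W_xh, W_hh, b_h), start=1):
        print(t, [round(v, 4) for v in h])
```

In the second step, the first hidden unit is non-zero even though its input is zero, because W_hh feeds the previous state back in; that recurrence is what lets RNNs model word order.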
With this broad overview in place, let’s start delving deeper into the world of NLP.
1. Notes are compiled from Practical Natural Language Processing: A Comprehensive Guide to Building Real-World NLP Systems and Introduction to Natural Language Processing (NLP).