Comprehensive Analysis of NLP

Ansh
5 min read · Aug 22, 2023


Hello everyone, welcome to this technical blog!

NLP stands for Natural Language Processing.

A little context about its history:

“The field of natural language processing began in the 1940s, after World War II. At this time, people recognized the importance of translation from one language to another and hoped to create a machine that could do this sort of translation automatically. However, the task was obviously not as easy as people first imagined.”

“In the 1980s, there was a shift towards statistical NLP, which uses machine learning algorithms to learn the statistical relationships between words and phrases. Statistical NLP systems are more robust and scalable than rule-based systems, and they have been used to achieve significant results in a variety of NLP tasks, such as machine translation, speech recognition, and text summarization.”

Why do we actually need NLP?

Natural language processing (NLP) helps computers communicate with humans in their own language and scales other language-related tasks. However, human speech is far more complex than most people realize. There are rules, such as spelling and grammar, but how we interpret speech and text is far less well-defined. For example, how do you know when a person is being sarcastic? In human language, words can say one thing, but the context and the tone can make those words mean something else. It takes humans half a lifetime to learn the subtle nuances of language. So NLP comes in as a lifesaver and handles this complexity gracefully. NLP enables computers to understand natural language as humans do, using AI to take real-world input and process it in a way the computer can make sense of.

Technical Things Behind NLP

NLP involves four major steps for data preprocessing:

Tokenization: In this step, the text is broken down into smaller units to work with; for example, a sentence can be tokenized into words.
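
As a quick illustration, here is a minimal tokenization sketch with NLTK; the sample sentence is my own, and it assumes the punkt tokenizer data has been downloaded:

```python
import nltk
nltk.download("punkt", quiet=True)  # tokenizer models (one-time download)
from nltk.tokenize import sent_tokenize, word_tokenize

text = "NLP breaks text into smaller units. Each unit is called a token."
print(sent_tokenize(text))  # sentence-level tokens
print(word_tokenize(text))  # word-level tokens
```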

Stop word removal: The least informative words, like "to", "for", and "and", are removed from the text.
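
A small sketch of stop-word removal using NLTK's built-in English stop-word list (the sample sentence is illustrative):

```python
import nltk
nltk.download("stopwords", quiet=True)
nltk.download("punkt", quiet=True)
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize

stop_words = set(stopwords.words("english"))
tokens = word_tokenize("This is a simple example to show stop word removal for NLP.")
# keep only tokens that are not in the stop-word list
filtered = [t for t in tokens if t.lower() not in stop_words]
print(filtered)
```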

Lemmatization and stemming: Words are reduced to their root forms for processing. For example, "caring" would return "care", and "working" would return "work".
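
Here is a minimal sketch contrasting stemming and lemmatization in NLTK, reusing the article's own examples (it assumes the wordnet data is downloaded):

```python
import nltk
nltk.download("wordnet", quiet=True)
from nltk.stem import PorterStemmer, WordNetLemmatizer

stemmer = PorterStemmer()
lemmatizer = WordNetLemmatizer()

for word in ["caring", "working"]:
    # stemming chops suffixes by rule; lemmatization looks up the dictionary form
    print(word, "->", stemmer.stem(word), "|", lemmatizer.lemmatize(word, pos="v"))
# caring -> care | care
# working -> work | work
```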

Part-of-speech tagging: Words are marked based on the part of speech they represent, such as nouns, verbs, and adjectives.
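
And a minimal POS-tagging sketch with NLTK's default tagger; the sentence is my own, and the exact tags shown are approximate:

```python
import nltk
nltk.download("punkt", quiet=True)
nltk.download("averaged_perceptron_tagger", quiet=True)
from nltk import pos_tag, word_tokenize

tokens = word_tokenize("Electric vehicles reduce noisy traffic")
print(pos_tag(tokens))
# roughly: [('Electric', 'JJ'), ('vehicles', 'NNS'), ('reduce', 'VBP'),
#           ('noisy', 'JJ'), ('traffic', 'NN')]
```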

Together, these steps give a computer program the ability to understand human language as it's spoken and written, which is the heart of NLP as a component of AI.

That covers the data preprocessing steps at a high level. Now let's jump to the algorithm part.

NLP mainly uses two kinds of approaches:

Rule-based systems: These follow dedicated, hand-written rules based on the grammar of the language.

Machine learning approach: Statistically driven methods are used here. They perform tasks based on training, just like traditional machine learning algorithms.
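
To make the contrast concrete, here is a toy sketch of both approaches on a question-vs-statement task; the rule, the tiny training set, and all names are illustrative, not from the project:

```python
import re
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Rule-based: a hand-written pattern makes the decision.
def rule_based_is_question(sentence):
    return bool(re.search(r"\?\s*$|^(who|what|when|where|why|how)\b", sentence.lower()))

# Machine learning: a classifier learns the decision from labeled examples.
train_texts = ["what is nlp", "how do chatbots work", "nlp is useful", "this blog covers nlp"]
train_labels = [1, 1, 0, 0]  # 1 = question, 0 = statement

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(train_texts)
model = MultinomialNB().fit(X, train_labels)

print(rule_based_is_question("why use nlp"))                      # True (matches the rule)
print(model.predict(vectorizer.transform(["what is a chatbot"]))) # likely [1] on this toy data
```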

Now let's walk through some code to get a better understanding.

The aim of the project was to summarize lengthy paragraphs into shorter, more relevant text.

The important libraries here are NLTK, TextBlob, spaCy, scikit-learn, and seaborn.

NLTK stands for Natural Language Toolkit; it bundles general grammatical rules and resources that help a machine make sense of human context.

I extracted the corpus using web scraping and then cleaned the text using some traditional techniques.

First I extracted a neutral article from Wikipedia, then some articles about the benefits of EVs, and finally an article about the disadvantages of EVs.
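
The post doesn't include the scraping code itself, so here is a minimal sketch of how such a step might look with requests and BeautifulSoup; the URL list and the helper name are my own assumptions, not the project's exact code:

```python
import requests
from bs4 import BeautifulSoup

def fetch_article_text(url):
    """Download a page and keep only its paragraph text."""
    html = requests.get(url, timeout=10).text
    soup = BeautifulSoup(html, "html.parser")
    return " ".join(p.get_text() for p in soup.find_all("p"))

urls = [
    "https://en.wikipedia.org/wiki/Electric_vehicle",  # illustrative neutral article
    # ...plus the pro-EV and anti-EV articles used in the project
]
combine_corpus = " ".join(fetch_article_text(u) for u in urls)
```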

Here are some of the ways I cleaned the data:

I named the combined corpus of the three articles combine_corpus and removed extra spaces and some irrelevant data.

Then I imported nltk.tokenize, together with NLTK's stop-word list, to remove the stopwords.
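
A small sketch of what this cleaning plus stop-word removal could look like; the regex patterns and sample text are assumptions, not the project's exact code:

```python
import re
import nltk
nltk.download("stopwords", quiet=True)
nltk.download("punkt", quiet=True)
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize

combine_corpus = "The   electric vehicle [1] is, in short, a car driven by motors."
cleaned = re.sub(r"\[[0-9]*\]", " ", combine_corpus)  # drop reference markers like [1]
cleaned = re.sub(r"\s+", " ", cleaned).strip()        # collapse repeated whitespace

stop_words = set(stopwords.words("english"))
tokens = [t for t in word_tokenize(cleaned)
          if t.lower() not in stop_words and t.isalpha()]
print(tokens)
```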

Then I imported the spaCy library, as it returns results as rich objects, whereas NLTK is preferred when working with plain string values.

Then, with the help of the spaCy library and a for loop, I counted the frequency of each word and added it to a word-frequency dictionary.
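
Here is a minimal sketch of that counting loop with spaCy; it assumes the small English model is installed (python -m spacy download en_core_web_sm), and the sample text is illustrative:

```python
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("Electric vehicles are quiet. Electric vehicles still need charging stations.")

# count each meaningful word, skipping stop words, punctuation, and whitespace
word_frequency = {}
for token in doc:
    if not token.is_stop and not token.is_punct and not token.is_space:
        word = token.text.lower()
        word_frequency[word] = word_frequency.get(word, 0) + 1

print(word_frequency)  # e.g. {'electric': 2, 'vehicles': 2, 'quiet': 1, ...}
```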

For the summary, I tokenized the documents into sentences.

I then calculated a score for each sentence from the frequencies of the words it contains.

Finally, I selected the top 30% of sentences by score as the summary, as shown in the sketch below.
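
Here is a hedged end-to-end sketch of the scoring and selection steps just described; the sample text and variable names are illustrative:

```python
from heapq import nlargest
import spacy

nlp = spacy.load("en_core_web_sm")
text = ("Electric vehicles produce no tailpipe emissions. "
        "They are quieter than petrol cars. "
        "However, charging infrastructure is still limited. "
        "Battery production also has an environmental cost.")
doc = nlp(text)

# 1) word frequencies (same counting loop as above)
word_frequency = {}
for token in doc:
    if not token.is_stop and not token.is_punct and not token.is_space:
        word = token.text.lower()
        word_frequency[word] = word_frequency.get(word, 0) + 1
max_freq = max(word_frequency.values())

# 2) score each sentence by summing its words' normalized frequencies
sentence_scores = {}
for sent in doc.sents:
    for token in sent:
        word = token.text.lower()
        if word in word_frequency:
            sentence_scores[sent] = sentence_scores.get(sent, 0) + word_frequency[word] / max_freq

# 3) keep the top 30% of sentences by score as the summary
select_length = max(1, int(len(list(doc.sents)) * 0.3))
summary = nlargest(select_length, sentence_scores, key=sentence_scores.get)
print(" ".join(s.text for s in summary))
```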

(Figure: word cloud extracted from the articles)
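
The original post shows the word cloud as an image; here is a minimal sketch of how one can be generated with the wordcloud library (the input text is illustrative, not the project's actual corpus):

```python
import matplotlib.pyplot as plt
from wordcloud import WordCloud

text = "electric vehicle battery charging emissions range motor energy"
cloud = WordCloud(width=800, height=400, background_color="white").generate(text)

plt.imshow(cloud, interpolation="bilinear")
plt.axis("off")
plt.show()
```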

Now let's jump to the real-life use cases of NLP.

Speech analysis: Technology that leverages artificial intelligence and natural language processing (NLP) to process and analyze customer conversations from live or recorded audio data.

Chatbots: These AI-powered chatbots use a branch of AI called natural language processing (NLP) to provide a better user experience, and are often referred to as virtual agents or intelligent virtual assistants.

Summary extraction: Uses advanced NLP techniques for language generation to understand the context and generate the summary.

Link for the same project:


Ansh

An engineer exploring his interest in writing. Loves to write finance, non-tech, and tech blogs.