A simple explanation of Sentence Segmentation and its implementation

Mohamed Bakrey Mahmoud
Feb 20, 2023


What is Sentence Segmentation?

Sentence segmentation is the task of splitting a text into its individual sentences, that is, deciding where each sentence begins and ends based on the punctuation and the context around each candidate boundary.

How is this process done?

There are two kinds of marks to handle:
Clear marks, such as the question mark (?) and the exclamation mark (!), which usually end a sentence.
Ambiguous marks, above all the period (.), which has other uses as well, for example in abbreviations such as "Dr.". In this case a machine learning (ML) model is used to decide whether a given period ends the sentence or serves another purpose. Alternatively, an algorithm that detects the end of a sentence can be written by hand.
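To see why the period is ambiguous, here is a deliberately naive splitter in plain Python that treats every period as a sentence boundary. The sample text is made up for illustration.

```python
# Naive approach: treat every period as the end of a sentence.
text = "Dr. Smith arrived late. He apologized."

# Split on ".", strip surrounding spaces, and drop empty pieces.
naive = [part.strip() for part in text.split(".") if part.strip()]
print(naive)
# ['Dr', 'Smith arrived late', 'He apologized']
```

It returns three pieces instead of two, because the period in the abbreviation "Dr." is treated as a sentence end.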

To decide whether a period marks the end of a sentence (EOS), a model or a hand-written rule can look at the case of the letters before and after the period, and at the length of the word that follows it.
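A hand-written segmenter along these lines can be sketched in a few lines of Python. This is a minimal sketch, not the author's or spaCy's method: it splits on whitespace, treats "?" and "!" as clear marks, and for a period uses only two cues, the capitalization of the next word and a small abbreviation list. The function name `split_sentences` and the `ABBREVIATIONS` set are hypothetical.

```python
from __future__ import annotations

# Hypothetical, deliberately tiny abbreviation list; a real segmenter needs a far larger one.
ABBREVIATIONS = {"dr.", "mr.", "mrs.", "ms.", "prof.", "etc.", "e.g.", "i.e."}


def split_sentences(text: str) -> list[str]:
    """Split text into sentences with simple hand-written rules (whitespace tokens)."""
    tokens = text.split()
    sentences: list[str] = []
    current: list[str] = []
    for i, token in enumerate(tokens):
        current.append(token)
        next_token = tokens[i + 1] if i + 1 < len(tokens) else ""
        if token.endswith(("?", "!")):
            # Clear marks: always treated as the end of a sentence.
            boundary = True
        elif token.endswith("."):
            # Ambiguous period: a boundary only if the token is not a known
            # abbreviation and the next word is capitalized (or the text ends here).
            is_abbreviation = token.lower() in ABBREVIATIONS
            next_is_capitalized = next_token == "" or next_token[0].isupper()
            boundary = not is_abbreviation and next_is_capitalized
        else:
            boundary = False
        if boundary:
            sentences.append(" ".join(current))
            current = []
    if current:
        # Keep any trailing text that has no final punctuation.
        sentences.append(" ".join(current))
    return sentences


print(split_sentences("Dr. Smith arrived late. He apologized."))
# ['Dr. Smith arrived late.', 'He apologized.']
```

The capitalization rule is a trade-off: it keeps "Dr. Smith" together, but a sentence that starts with a lowercase letter after a period is not detected as a new sentence.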

Implementation

Import Libraries

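import spacy

# Load the small English pipeline (install it first with: python -m spacy download en_core_web_sm)
nlp = spacy.load('en_core_web_sm')
doc1 = nlp('This is the first sentence. This is another sentence. This is the last sentence.')

# doc1.sents is a generator that yields one Span per detected sentence
doc1.sents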

Evaluating doc1.sents on its own only shows a generator object, because sents yields the detected sentences one at a time. To display the sentences spaCy found, iterate over it:

for sent in doc1.sents:
    print(sent)

Output

This is the first sentence.
This is another sentence.
This is the last sentence.

You can also check whether a specific token starts a sentence through its is_sent_start attribute:

print(doc1[6])                  # token 6 is the "This" that opens the second sentence
print(doc1[6].is_sent_start)

Output

This
True
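Conceptually, per-token sentence-start flags and the list of sentences carry the same information. The plain-Python sketch below rebuilds sentences from such flags. The helper `group_by_sent_start` is hypothetical, the tokens and flags are typed in by hand to mirror doc1 (with None for non-start tokens), and this illustrates the idea rather than how spaCy implements doc.sents.

```python
from __future__ import annotations

from typing import Optional


def group_by_sent_start(tokens: list[str], starts: list[Optional[bool]]) -> list[list[str]]:
    """Rebuild sentences from per-token sentence-start flags."""
    sentences: list[list[str]] = []
    for token, is_start in zip(tokens, starts):
        # A token flagged True opens a new sentence; the very first token always does.
        if is_start or not sentences:
            sentences.append([])
        sentences[-1].append(token)
    return sentences


# Hand-written tokens and flags for the first two sentences of doc1.
tokens = ["This", "is", "the", "first", "sentence", ".", "This", "is", "another", "sentence", "."]
starts = [True, None, None, None, None, None, True, None, None, None, None]

for sentence in group_by_sent_start(tokens, starts):
    print(" ".join(sentence))  # tokens joined with plain spaces, so "." is spaced off
```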

Each token can also be examined in turn to find which ones begin a sentence:

doc2 = nlp('This is a sentence. that is a sentence. here is a sentence.')

for token in doc2:
    print(token.is_sent_start, ' '+token.text)

Output

True  This
None is
None a
None sentence
None .
True that
None is
None a
None sentence
None .
True here
None is
None a
None sentence
None .
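For comparison, similar flags can come from a much cruder rule than spaCy's model. The sketch below uses a hypothetical helper `sent_start_flags` with a simple regular-expression tokenizer; it marks a token as a sentence start when it is the first token or follows ".", "?" or "!", and uses None for all other tokens only to mirror the output above.

```python
from __future__ import annotations

import re
from typing import Optional


def sent_start_flags(text: str) -> list[tuple[Optional[bool], str]]:
    """Flag each token: True if it starts a sentence by a simple punctuation rule, else None."""
    # Crude tokenizer: runs of word characters, or single punctuation characters.
    tokens = re.findall(r"\w+|[^\w\s]", text)
    flags: list[tuple[Optional[bool], str]] = []
    previous: Optional[str] = None
    for token in tokens:
        starts = previous is None or previous in {".", "?", "!"}
        flags.append((True if starts else None, token))
        previous = token
    return flags


for flag, token in sent_start_flags("This is a sentence. that is a sentence. here is a sentence."):
    print(flag, token)
```

On this text it flags the same three tokens ("This", "that", "here") as the output above, but only because every period here ends a sentence. The rule ignores capitalization and abbreviations, so in "Dr. Smith" it would wrongly flag "Smith" as a sentence start.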

Conclusion

This article explained what sentence segmentation is, how it works, and how to apply it in practice with spaCy.
