What Are the Matchers in NLP?

Mohamed Bakrey
Mar 4, 2023



spaCy's Matcher lets you find words and phrases using rules that describe their token attributes. Rules can refer to token annotations (such as the text or part-of-speech tags) as well as lexical attributes such as Token.is_punct. Applying the matcher to a Doc gives you access to the matched tokens in context.
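Printing each token's attributes shows what a rule can test. The sketch below is an illustration added here, not part of the original walkthrough. It assumes spaCy v3 and uses spacy.blank("en"), a tokenizer-only pipeline. That is enough because lexical attributes like LOWER and IS_PUNCT need no trained model. Part-of-speech tags would need a trained pipeline such as en_core_web_sm.

```python
import spacy

# Tokenizer-only English pipeline (assumption: spaCy v3 is installed).
# Lexical attributes such as LOWER and IS_PUNCT are available without a model.
nlp = spacy.blank("en")
doc = nlp("Solar-power cars are gaining popularity.")

# One row per token: the values a pattern key like 'LOWER' or 'IS_PUNCT' is compared with
rows = [(token.text, token.lower_, token.is_punct) for token in doc]
for row in rows:
    print(row)
```

Note that the tokenizer splits "Solar-power" into three tokens ("Solar", "-", "power"). That is why a pattern needs a separate punctuation token to match the hyphenated form.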

Implementation

Load the library and create a Matcher

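# spaCy
import spacy
from spacy.matcher import Matcher

# Load the small English pipeline (install it first with: python -m spacy download en_core_web_sm)
nlp = spacy.load('en_core_web_sm')

# Create a Matcher that shares the pipeline's vocabulary
matcher = Matcher(nlp.vocab)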

Example of using the Matcher:

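# "solarpower" written as a single token
pattern1 = [{'LOWER': 'solarpower'}]
# "solar power" as two adjacent tokens
pattern2 = [{'LOWER': 'solar'}, {'LOWER': 'power'}]
# "solar", one punctuation token (e.g. a hyphen), then "power"
pattern3 = [{'LOWER': 'solar'}, {'IS_PUNCT': True}, {'LOWER': 'power'}]

# spaCy v3 signature: the rule name, then a list of patterns
# (spaCy v2 used matcher.add(name, None, *patterns))
matcher.add('SolarPower', [pattern1, pattern2, pattern3])
doc = nlp('The Solar Power industry continues to grow as demand for solarpower increases. Solar-power cars are gaining popularity.')
found_matches = matcher(doc)
# Each match is a (match_id, start, end) tuple of token indices
for match_id, start, end in found_matches:
    print(f'Word ID {match_id} , starts at {start} & ends at {end} , and word is {doc[start:end]}')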

Output:

Word ID 8656102463236116519 , starts at 1 & ends at 3 , and word is Solar Power
Word ID 8656102463236116519 , starts at 10 & ends at 11 , and word is solarpower
Word ID 8656102463236116519 , starts at 13 & ends at 16 , and word is Solar-power
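The Word ID in this output is not a position. It is the hash of the rule name 'SolarPower'. The minimal sketch below assumes spaCy v3 and maps each hash back to its name through the vocabulary's StringStore (nlp.vocab.strings). It substitutes spacy.blank("en") for en_core_web_sm, which is enough here because the patterns only use LOWER and IS_PUNCT.

```python
import spacy
from spacy.matcher import Matcher

# Assumption: a tokenizer-only pipeline stands in for en_core_web_sm
nlp = spacy.blank("en")
matcher = Matcher(nlp.vocab)
matcher.add("SolarPower", [
    [{"LOWER": "solarpower"}],
    [{"LOWER": "solar"}, {"LOWER": "power"}],
    [{"LOWER": "solar"}, {"IS_PUNCT": True}, {"LOWER": "power"}],
])

doc = nlp("The Solar Power industry continues to grow as demand for "
          "solarpower increases. Solar-power cars are gaining popularity.")

results = []
for match_id, start, end in matcher(doc):
    # match_id is the hash of the rule name; the StringStore turns it back into text
    rule_name = nlp.vocab.strings[match_id]
    results.append((rule_name, start, end, doc[start:end].text))

for row in results:
    print(row)
```

Decoding the ID this way is mainly useful when several rules are added to one Matcher and you need to know which rule produced each match.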

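How to remove a rule from the Matcher

matcher.remove('SolarPower') deletes the rule and every pattern stored under that key. The rule can then be added back in a revised form. In the new pattern2, 'OP': '*' lets the punctuation token appear zero or more times. This single pattern therefore matches both "Solar Power" and "Solar-power" and replaces the old pattern2 and pattern3.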

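# Delete the 'SolarPower' rule and all of its patterns
matcher.remove('SolarPower')
pattern1 = [{'LOWER': 'solarpower'}]
pattern2 = [{'LOWER': 'solar'}, {'IS_PUNCT': True, 'OP': '*'}, {'LOWER': 'power'}]
# spaCy v3 signature: the rule name, then a list of patterns
matcher.add('SolarPower', [pattern1, pattern2])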
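The self-contained sketch below again assumes spaCy v3 and uses a blank English pipeline instead of en_core_web_sm. It removes the rule, checks that the key is gone, and then shows which spans the revised rule matches. The sample sentence is made up for illustration and is not from the original article.

```python
import spacy
from spacy.matcher import Matcher

nlp = spacy.blank("en")
matcher = Matcher(nlp.vocab)

# Original rule with three separate patterns
matcher.add("SolarPower", [
    [{"LOWER": "solarpower"}],
    [{"LOWER": "solar"}, {"LOWER": "power"}],
    [{"LOWER": "solar"}, {"IS_PUNCT": True}, {"LOWER": "power"}],
])

# remove() drops the key together with all patterns stored under it
matcher.remove("SolarPower")
removed = "SolarPower" not in matcher

# Revised rule: "OP": "*" lets the punctuation token repeat zero or more times
pattern1 = [{"LOWER": "solarpower"}]
pattern2 = [{"LOWER": "solar"}, {"IS_PUNCT": True, "OP": "*"}, {"LOWER": "power"}]
matcher.add("SolarPower", [pattern1, pattern2])

doc = nlp("Solar Power, solarpower, Solar-power and solar - - power.")
spans = [doc[start:end].text for _, start, end in matcher(doc)]
print(removed, spans)
```

With '*', the single pattern2 covers zero, one, or two punctuation tokens between "solar" and "power". Reproducing that with fixed patterns would take one pattern per count.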
