Celebal Technologies interview question

Stemming, lemmatization, and tokenization

Interview answer

Anonymous

Sep 14, 2022

Tokenization: the process of breaking text down into its smallest units, called tokens. Words, numbers, and punctuation marks can all be tokens.

Stemming: the process of reducing a word to its root (stem), typically by stripping suffixes with heuristic rules. The resulting stem does not have to be a real dictionary word.

Lemmatization: the process of mapping a word to its dictionary form (lemma). Unlike stemming, it relies on a vocabulary and on the word's morphology, often including its part of speech. This makes it more accurate than stemming but usually slower to compute.
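A minimal pure-Python sketch to make the three definitions concrete. The regex in `tokenize`, the suffix list in `stem`, and the tiny `LEMMA_TABLE` lexicon are hypothetical illustrations chosen for this example. They are not the Porter stemmer or a WordNet lemmatizer; libraries such as NLTK and spaCy provide real implementations.

```python
import re

# Tokenization: match numbers, word characters, or single punctuation marks.
# Whitespace matches none of the alternatives, so it is skipped.
# (Simplified illustration, not a production tokenizer.)
TOKEN_PATTERN = re.compile(r"\d+(?:\.\d+)?|\w+|[^\w\s]")


def tokenize(text: str) -> list[str]:
    """Return the word, number, and punctuation tokens in text."""
    return TOKEN_PATTERN.findall(text)


# Stemming: a toy suffix-stripping stemmer (hypothetical rules).
# Suffixes are tried in order and the first match wins, so "ies"
# is checked before "es" and "s".
SUFFIXES = ("ization", "ness", "ing", "ies", "ed", "ly", "es", "s")


def stem(word: str) -> str:
    """Strip the first matching suffix if at least 3 characters remain."""
    w = word.lower()
    for suffix in SUFFIXES:
        if w.endswith(suffix) and len(w) - len(suffix) >= 3:
            return w[: -len(suffix)]
    return w


# Lemmatization: look up the dictionary form by (word, part of speech).
# This hypothetical lexicon stands in for a real dictionary such as WordNet.
LEMMA_TABLE = {
    ("running", "VERB"): "run",
    ("ran", "VERB"): "run",
    ("studies", "VERB"): "study",
    ("studies", "NOUN"): "study",
    ("cats", "NOUN"): "cat",
    ("mice", "NOUN"): "mouse",
    ("better", "ADJ"): "good",
}


def lemmatize(word: str, pos: str = "NOUN") -> str:
    """Return the lemma for (word, pos), or the lowercased word if unknown."""
    w = word.lower()
    return LEMMA_TABLE.get((w, pos), w)


if __name__ == "__main__":
    print(tokenize("The mice were running, and 2 cats ran."))
    # ['The', 'mice', 'were', 'running', ',', 'and', '2', 'cats', 'ran', '.']
    print([stem(w) for w in ["running", "studies", "cats", "better"]])
    # ['runn', 'stud', 'cat', 'better']
    print(lemmatize("running", "VERB"), lemmatize("studies"), lemmatize("better", "ADJ"))
    # run study good
```

The output shows the contrast from the answer: the stemmer produces non-words such as "runn" and "stud" and cannot relate "better" to "good", while the lemmatizer returns dictionary forms, but only because it has a lexicon and a part-of-speech tag to consult, which is the extra work that makes lemmatization slower.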