Live Demo
What is Noisy Text?
Noisy text is text that contains errors, inaccuracies, or inconsistencies that make it
difficult to analyze or process with natural language processing (NLP) techniques. It usually
stems from user-generated content.
If you have such content, you may opt to use a text normalizer, since all basic NLP modules
operate on canonical (correct) forms. Text normalization can be considered a pre-processing
step that focuses on transforming user-generated content into its canonical counterpart.
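As a rough illustration of the idea, the toy sketch below maps a few noisy tokens to canonical forms with a lookup table and collapses exaggerated letter repetitions. The table `CANONICAL_FORMS` and the function `normalize` are hypothetical names invented for this example; a real normalizer, including the one behind this demo, would typically rely on far richer resources and models.

```python
import re

# Hypothetical lookup table of noisy tokens and their canonical forms.
CANONICAL_FORMS = {
    "u": "you",
    "gr8": "great",
    "pls": "please",
    "thx": "thanks",
}


def normalize(text):
    """Return a toy-normalized copy of `text` (illustrative only)."""
    # Collapse a letter or digit repeated three or more times: "sooooo" -> "so".
    text = re.sub(r"(\w)\1{2,}", r"\1", text)

    def fix_token(match):
        token = match.group(0)
        # Replace the token if it is in the table; otherwise keep it as-is.
        return CANONICAL_FORMS.get(token.lower(), token)

    return re.sub(r"\w+", fix_token, text)


print(normalize("thx, u are sooooo gr8"))
```

A lookup table only covers the variants it lists, which is why it serves here purely as a picture of what "transforming to canonical forms" means.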
What is Document Creation Date used for?
Your text may include temporal (time-related) expressions, either absolute (such as "Eylül
2023", i.e. September 2023) or relative (such as "geçtiğimiz yıl", i.e. last year). Relative
temporal expressions can only be normalized against a reference date, and the DCT (Document
Creation Date) serves as that reference. If the text you submit contains no relative temporal
expressions, the DCT has no effect.
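The role of the DCT can be sketched in a few lines. The example below is a minimal illustration, not the demo's actual implementation: the table `RELATIVE_YEAR_OFFSETS` and the function `resolve_year` are hypothetical, and only a handful of year-level expressions are covered.

```python
from datetime import date
from typing import Optional

# Hypothetical table: relative expression -> offset in years from the DCT.
RELATIVE_YEAR_OFFSETS = {
    "geçtiğimiz yıl": -1,  # "last year"
    "bu yıl": 0,           # "this year"
    "gelecek yıl": 1,      # "next year"
}


def resolve_year(expression: str, dct: date) -> Optional[int]:
    """Resolve a relative year expression against the DCT.

    Returns None when the expression is not a known relative expression,
    in which case the DCT plays no part.
    """
    offset = RELATIVE_YEAR_OFFSETS.get(expression.strip())
    if offset is None:
        return None
    return dct.year + offset


dct = date(2023, 9, 15)
print(resolve_year("geçtiğimiz yıl", dct))  # relative: depends on the DCT
print(resolve_year("Eylül 2023", dct))      # absolute: not resolved here
```

The same expression yields a different year when the DCT changes, whereas an absolute expression such as "Eylül 2023" does not depend on it at all.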