Tokenization

The process of splitting text into tokens (typically words, subwords, or characters) that a language model takes as input; a required preprocessing step for text analysis and generation tasks in NLP applications.
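
As a rough illustration of the definition above, the sketch below shows two simple forms of tokenization in Python: splitting on words with a regular expression, and segmenting a single word into subwords by greedy longest-match lookup. The function names, the `TOY_VOCAB` set, and the `[UNK]` placeholder are assumptions made for this example only; production tokenizers learn their vocabularies from data and differ in many details.

```python
import re

def word_tokenize(text):
    """Split text into lowercase word tokens and single punctuation tokens."""
    return re.findall(r"\w+|[^\w\s]", text.lower())

# Hypothetical toy vocabulary, hand-written for illustration only.
TOY_VOCAB = {"token", "ization", "un", "break", "able", "s"}

def subword_tokenize(word, vocab=TOY_VOCAB, unk="[UNK]"):
    """Segment one word into subwords by greedy longest-match-first lookup.

    Returns [unk] if some part of the word cannot be matched in the vocabulary.
    """
    pieces = []
    start = 0
    while start < len(word):
        end = len(word)
        # Shrink the candidate substring until it appears in the vocabulary.
        while end > start and word[start:end] not in vocab:
            end -= 1
        if end == start:
            # No vocabulary entry matches at this position.
            return [unk]
        pieces.append(word[start:end])
        start = end
    return pieces

words = word_tokenize("Tokenization matters!")
subwords = subword_tokenize("tokenization")
```

Here `words` holds the word-level tokens of the sentence and `subwords` holds the pieces of a single word. Subword segmentation lets a fixed vocabulary cover words it has never seen as a whole, at the cost of longer token sequences.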