mistokenize
English
Etymology
mis- + tokenize
Verb
mistokenize (third-person singular simple present mistokenizes, present participle mistokenizing, simple past and past participle mistokenized)
- To tokenize incorrectly.
- 1994, Mark A. Terribile, Practical C++, page 109:
- C++ considers ::*, .* and ->* each to be a single token and a single operator. Some pre-Release 2.0 implementations mistokenize expressions involving pointer-to-pointer-to-member.
- 2016, Hans-Jörg Schmid, Entrenchment and the Psychology of Language Learning, page 111:
- These were mostly proper names, such as Ronny Johnsen, or foreign language items such as ambre solaire (French) and fairie queene (Middle English), as well as a few misspelt or mistokenized items.
- 2020 July, Anupama M Nair, Anusha Aji Justus, Arjun Ramesh, Binu Rajan M.R., “Event Extraction from Emails”, in International Journal of Computer Applications, volume 176, number 41:
- The sentences were tokenized into words using the regex tokenizer which avoided the problems of mistokenizing while using the default NLTK tokenizer.
- 2022, Toni Sivula, Deep Neural Networks in Drug-Target Activity Prediction and Machine Learning Assisted Docking of Ultra-Large Compound Libraries (Master's thesis):
- Similarly, all other two-character atomic representations in SMILES are being mistokenized.
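As an illustration of the sense attested in the SMILES quotation above, the following minimal Python sketch contrasts a naive character-level tokenizer, which mistokenizes two-character atoms such as Cl and Br, with a regex-based one. The function names and the simplified pattern are assumptions made for this example; they are not taken from any of the quoted works and do not cover the full SMILES grammar.

```python
import re
from typing import List

# Hypothetical, simplified pattern: bracket atoms first, then the
# two-character halogens Cl and Br, then any single character.
_SMILES_TOKEN = re.compile(r"\[[^\]]+\]|Cl|Br|.")


def char_tokenize(smiles: str) -> List[str]:
    """Naive tokenizer: emits one token per character."""
    return list(smiles)


def regex_tokenize(smiles: str) -> List[str]:
    """Regex tokenizer: keeps Cl and Br together as single tokens."""
    return _SMILES_TOKEN.findall(smiles)


if __name__ == "__main__":
    example = "ClCCBr"
    # The character-level tokenizer splits "Cl" into "C" and "l".
    print(char_tokenize(example))
    # The regex tokenizer keeps "Cl" and "Br" whole.
    print(regex_tokenize(example))
```

Here the character-level version turns the chlorine atom into a carbon followed by a stray "l", which is the kind of error the verb describes.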