mistokenize

From Wiktionary, the free dictionary

English

Etymology

From mis- + tokenize.

Verb

mistokenize (third-person singular simple present mistokenizes, present participle mistokenizing, simple past and past participle mistokenized)

  1. To tokenize incorrectly.
    • 1994, Mark A. Terribile, Practical C++, page 109:
      C++ considers ::*, .* and ->* each to be a single token and a single operator. Some pre-Release 2.0 implementations mistokenize expressions involving pointer-to-pointer-to-member.
    • 2016, Hans-Jörg Schmid, Entrenchment and the Psychology of Language Learning, page 111:
      These were mostly proper names, such as Ronny Johnsen, or foreign language items such as ambre solaire (French) and fairie queene (Middle English), as well as a few misspelt or mistokenized items.
    • 2020 July, Anupama M Nair, Anusha Aji Justus, Arjun Ramesh, Binu Rajan M.R., “Event Extraction from Emails”, in International Journal of Computer Applications, volume 176, number 41:
      The sentences were tokenized into words using the regex tokenizer which avoided the problems of mistokenizing while using the default NLTK tokenizer.
    • 2022, Toni Sivula, Deep Neural Networks in Drug-Target Activity Prediction and Machine Learning Assisted Docking of Ultra-Large Compound Libraries (Master's thesis):
      Similarly, all other two-character atomic representations in SMILES are being mistokenized.
