Pragmatic Tokenizer
Pragmatic Tokenizer is a multilingual tokenizer that splits a string into tokens. We are looking for developers with knowledge of languages other than English to help add specs or contribute stop word and abbreviation lists for languages with poor coverage.
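As a rough illustration of the task a tokenizer performs (this is a toy sketch, not the gem's actual implementation, which handles many more cases such as abbreviations, URLs, and language-specific punctuation):

```ruby
# Toy tokenizer: split a string into downcased word and punctuation tokens.
# Real tokenizers must also handle abbreviations, contractions, URLs, etc.
def toy_tokenize(text)
  text.downcase.scan(/[[:alpha:]]+|[[:punct:]]/)
end

toy_tokenize("Hello, world!")
# => ["hello", ",", "world", "!"]
```

With the gem installed, tokenization goes through `PragmaticTokenizer::Tokenizer.new.tokenize(text)` (see the gem's README for the available options, e.g. language selection and stop word removal).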