sentencepiece
https://github.com/google/sentencepiece
C++
Unsupervised text tokenizer for Neural Network-based text generation.
Help out
- Issues
  - Doesn't seem to work with Python 3.13
  - The pip command to install the SentencePiece Python module fails.
  - Initialized number of seed sentencepieces too low
  - Update artifact actions from v3 to v4
  - Asan detects memory leak in sentencepiece/_sentencepiece.cpython-312-x86_64-linux-gnu.so+0x6f7f4
  - Bump the github-actions group across 1 directory with 3 updates
  - Bump the build-time-deps group across 1 directory with 4 updates
  - Enhancements to CI Workflows and Python Module Initialization with Minor Fixes
  - Compatibility Issue when using v0.2.0 with transformers and tensorflow
  - Crashes on out of range inputs depending on other inputs
- Docs
  - C++ not yet supported