modin
https://github.com/modin-project/modin
Python
Modin: Scale your Pandas workflows by changing a single line of code
5 Subscribers
Help out
- Issues
  - BUG: df[col].replace(dict, inplace=True) is brutally slow, while .apply which does the same is blazing fast
  - Define heuristics to automatically enable dynamic partitioning without performance penalty.
  - df.max()/min() on 1 column df leads to "could not broadcast input array from shape (6,) into shape (5,)" what from parquet loaded with ray
  - Use pre-commit library to run linters
  - Improve the user facing DataFrame and Series constructor by hide the query_compiler parameter
  - BUG: `Dataframe.astype(...)` Fails With `'bool' object has no attribute 'all'`
  - BUG: [RAY] ray initialisation sets _memory and object_store_memory to the same value, leading to crashes and less flexibility
  - to_parquet() needs option of how many files to create, or like rays implementation: num_rows_per_file
  - BUG: outofmemory read from big file and dump to a new one
  - [RAY] to_parquet() fails when spilled objects reach 64gig... Also my data is just 40gig
- Docs
  - Python not yet supported