wpull
https://github.com/archiveteam/wpull
HTML
Wget-compatible web downloader and crawler.
HTML not yet supported
7 Subscribers
Help out
- Issues
  - Handle buggy http:/// and https:/// redirects
  - youtube-dl option records multiple instances in warc
  - URL fetches are not logged in cygwin environment
  - Next warc is started on resuming, regardless --warc-max-size, when --warc-append
  - Change order of retries: retry all errors once before reattempting the remaining errors
  - wpull parsing HTMLs for links even if it doesn't have to
  - ftp crash: sre_constants.error: bad character range
  - Support text file for sitemaps or general link extraction
  - Logging error: RuntimeError: reentrant call inside _io.BufferedWriter
  - Show general progress of fetched vs todo URLs
- Docs
  - HTML not yet supported