Evopricing
Description
Evo
May 22, 2017
I developed a “smart” Scrapy spider that could be trained through a Django web GUI. I completed most of the work alone, but at times I acted as lead developer: training a team of junior contract developers, collaborating via Git, and performing code reviews.
The project lets a semi-technical user (one with XPath knowledge) configure the spider for different target sites and schedule periodic runs.
In this role, I dealt with many challenges, such as:
- Prototyping and working out the initial architecture
- Ban avoidance with custom proxy-rotation and user-agent-rotation middleware
- Data integrity and cleaning
- Writing an API in Django REST Framework (DRF) so the frontend could schedule and monitor runs
- Distributed scrapes
- Provisioning servers from scratch
- Writing custom loaders, extensions, middleware and pipelines
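
To illustrate the user-agent rotation mentioned in the list above, here is a minimal sketch of what such a middleware might look like. It is not the project's actual code: the class name `RotateUserAgentMiddleware` and its constructor arguments are hypothetical. It relies only on the Scrapy convention that a downloader middleware exposes `process_request(request, spider)` and that returning `None` lets the request continue through the chain.

```python
import random


class RotateUserAgentMiddleware:
    """Hypothetical downloader-middleware-style class.

    Picks a User-Agent at random for every outgoing request.
    """

    def __init__(self, user_agents, rng=None):
        if not user_agents:
            raise ValueError("user_agents must not be empty")
        self.user_agents = list(user_agents)
        # An injectable RNG keeps the behaviour reproducible in tests.
        self.rng = rng or random.Random()

    def process_request(self, request, spider):
        # Overwrite the header on the outgoing request.
        request.headers["User-Agent"] = self.rng.choice(self.user_agents)
        # Returning None means "carry on processing this request".
        return None
```

In a real Scrapy project a class like this would be enabled through the `DOWNLOADER_MIDDLEWARES` setting, and would typically read its list of user agents from the project settings via a `from_crawler` class method rather than taking it as a constructor argument.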