Lee Hodg

Evopricing

Description

I developed a “smart” Scrapy spider that can be trained through a Django web GUI. I did most of the work alone, but at times I acted as lead developer: training a team of junior contract developers, collaborating via Git, and performing code reviews.

The project lets a semi-technical user (one with XPath knowledge) configure the spider for different target sites and schedule periodic runs.

In this role, I dealt with many challenges, such as:

  • Prototyping and working out the initial architecture
  • Ban avoidance with custom proxy-rotation and user-agent-rotation middleware
  • Data integrity and cleaning
  • Writing an API in Django REST Framework (DRF) so the frontend can schedule and monitor runs
  • Distributed scrapes
  • Provisioning servers from scratch
  • Writing custom loaders, extensions, middleware and pipelines
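
To illustrate the ban-avoidance item above, here is a minimal sketch of a rotation middleware in the shape of a Scrapy downloader middleware. It is not the project's actual code: the class name, constructor arguments, and the FakeRequest stand-in are hypothetical and chosen only for illustration. It relies on two Scrapy conventions: downloader middlewares expose process_request(request, spider), and the built-in proxy handling reads the proxy URL from request.meta["proxy"].

```python
import random


class RotatingIdentityMiddleware:
    """Hypothetical sketch: pick a random User-Agent and proxy for each request."""

    def __init__(self, user_agents, proxies, rng=None):
        # Both pools are assumed to be supplied by the caller (e.g. from settings).
        self.user_agents = list(user_agents)
        self.proxies = list(proxies)
        self.rng = rng or random.Random()

    def process_request(self, request, spider=None):
        # Overwrite the User-Agent header with a randomly chosen one.
        if self.user_agents:
            request.headers["User-Agent"] = self.rng.choice(self.user_agents)
        # Scrapy's proxy handling uses the URL stored under meta["proxy"].
        if self.proxies:
            request.meta["proxy"] = self.rng.choice(self.proxies)
        # Returning None tells Scrapy to continue processing the request.
        return None


class FakeRequest:
    """Stand-in for scrapy.Request so the sketch runs without Scrapy installed."""

    def __init__(self):
        self.headers = {}
        self.meta = {}


if __name__ == "__main__":
    middleware = RotatingIdentityMiddleware(
        user_agents=["agent-a/1.0", "agent-b/2.0"],            # placeholder values
        proxies=["http://10.0.0.1:8080", "http://10.0.0.2:8080"],  # placeholder values
    )
    request = FakeRequest()
    middleware.process_request(request)
    print(request.headers["User-Agent"], request.meta["proxy"])
```

In a real Scrapy project, a class like this would be enabled through the DOWNLOADER_MIDDLEWARES setting and would typically load its pools from the project settings rather than from constructor arguments.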