Monday, December 17, 2012

Use Python to scrape data from Newegg and store it in an SQLite database

Check out this repo of mine on GitHub for a set of Python scripts that scrape various data from Newegg and store it in an SQLite database.

The scripts use the mobile site to retrieve the list of all the products in a category. They then use the product ID of each product in that category to fetch and parse data from the Newegg JSON API, transform it into a pandas DataFrame, and dump it into a table in the SQLite database.
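To give a rough idea of the parse-and-store half of that flow, here is a minimal sketch rather than the actual code from the repo. The sample JSON payloads, the field names (`ItemNumber`, `Title`, `FinalPrice`), the helper names, and the `desktop_cpus` table name are all made up for illustration and are not Newegg's real schema. The fetching step is left out entirely.

```python
import json
import sqlite3

import pandas as pd

# Hypothetical per-product JSON payloads; the field names are assumptions,
# not the real Newegg JSON API schema.
SAMPLE_RESPONSES = [
    '{"ItemNumber": "A-001", "Title": "Example CPU 1", "FinalPrice": "199.99"}',
    '{"ItemNumber": "A-002", "Title": "Example CPU 2", "FinalPrice": "299.99"}',
]


def parse_product(raw_json):
    """Pick the fields of interest out of one product's JSON payload."""
    data = json.loads(raw_json)
    return {
        "item_number": data["ItemNumber"],
        "title": data["Title"],
        "price": float(data["FinalPrice"]),
    }


def store_products(raw_responses, conn, table_name):
    """Parse each payload, build a DataFrame, and write it to one table."""
    frame = pd.DataFrame([parse_product(raw) for raw in raw_responses])
    # if_exists="replace" overwrites the table on every run;
    # switch to "append" to accumulate rows across runs instead.
    frame.to_sql(table_name, conn, if_exists="replace", index=False)
    return frame


# An in-memory database keeps the sketch self-contained;
# pass a file path instead to get a persistent SQLite file.
conn = sqlite3.connect(":memory:")
store_products(SAMPLE_RESPONSES, conn, "desktop_cpus")

rows = conn.execute(
    "SELECT item_number, price FROM desktop_cpus ORDER BY item_number"
).fetchall()
print(rows)
```

Swapping in a different set of fields only means changing `parse_product`, since the DataFrame picks up its columns from whatever keys that function returns.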

I use the mobile site because it is lighter weight than the desktop version of the site.

As of right now, I have scripts in the GitHub repo set up to collect data on the following:

  • Desktop CPUs
  • Desktop Memory
  • Hard Drives
  • Laptops
  • LCD/LED/Plasma TVs
  • PS3 Games
  • Xbox 360 Games

Each script dumps its data into a separate table in the SQLite database file. Feel free to tweak these scripts however you like, perhaps to retrieve different data from the Newegg JSON API for each product, or even to grab data from another set of products on Newegg. Enjoy!
