Hack 44 Get Purchase Circle Products with Screen Scraping

Purchase Circles provide a unique look at sales patterns. You can access them programmatically only with screen scraping.

Amazon's purchase circles are specialized bestseller lists broken down by geography or organization. If you visit the Friends & Favorites page, choose "Purchase Circles" from the drop-down list, and type the name of your city, chances are you'll find what's uniquely popular among your fellow residents. Amazon also lists what's popular at universities and large corporations. If everyone at Microsoft is reading about a certain technology, you may find it in the next version of Windows!

44.1 Finding Purchase Circle IDs

In fact, you can link directly to the Microsoft Corporation purchase circle:

http://www.amazon.com/exec/obidos/tg/cm/browse-communities/-/211569/

The six-digit code at the end of the URL is the Purchase Circle ID for Microsoft. Every purchase circle has a unique ID. You can find IDs by noting them from URLs as you browse circles. The purchase circles home page (http://www.amazon.com/exec/obidos/subst/community/community.html) is a good place to start.

Once you know an ID, you can link to it directly using the URL format. You can also write scripts to access the page and retrieve a list of items.

44.2 The Code

This script takes a Purchase Circle ID and returns the books listed. Create a file called get_circle.pl and add the following code:

#!/usr/bin/perl
# get_circle.pl
# A script to scrape Amazon to retrieve purchase circle products
# Usage: perl get_circle.pl <circleID>
use strict;
use warnings;
use LWP::Simple;

# Take the Purchase Circle ID from the command line
my $circleID = shift @ARGV or die "Usage: perl get_circle.pl <circleID>\n";

# Assemble the URL (the trailing /t/ requests the text-only page)
my $url = "http://amazon.com/o/tg/cm/browse-communities/-/" .
          $circleID . "/t/";

# Request the URL
my $content = get($url);
die "Could not retrieve $url\n" unless $content;
my $circle = $content;

# Print the page title
while ($circle =~ m!<title>(.*?)</title>!mgis) {
print $1 . "\n\n";
}
# Match each product cell: ASIN from the link, then title, then author
while ($circle =~ m!<td.*?<b><a.*?-/(.*?)[?/].*?>(.*?)</a></b>.*?by\s*(.*?)<br>.*?</td>!mgis) {
my($asin,$title,$author) = ($1||'',$2||'',$3||'');
#Print the results
print $title . "\n" .
"by " . $author . "\n" .
"ASIN: " . $asin .
"\n\n";
}
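
Because screen scraping depends on the page's markup, it can help to try the pattern against a saved fragment before running it on the live site. The following is a minimal sketch of that idea, not part of the original script: the sample markup is invented for illustration and may not match what Amazon actually serves, and extract_items is a hypothetical helper name. The pattern is modeled on the one in get_circle.pl, written on a single line.

```perl
use strict;
use warnings;

# Hypothetical markup, invented for this example; the live page may differ.
my $sample = <<'HTML';
<td><b><a href="/exec/obidos/tg/detail/-/0000000000/t/">Example Title</a></b><br>
by Example Author<br></td>
HTML

# Hypothetical helper: return one hash reference per product cell found.
sub extract_items {
    my ($html) = @_;
    my @items;
    while ($html =~ m!<td.*?<b><a.*?-/(.*?)[?/].*?>(.*?)</a></b>.*?by\s*(.*?)<br>.*?</td>!gis) {
        push @items, { asin => $1, title => $2, author => $3 };
    }
    return @items;
}

# Print each item in the same layout the script uses
for my $item (extract_items($sample)) {
    print "$item->{title}\nby $item->{author}\nASIN: $item->{asin}\n\n";
}
```

If the pattern stops matching after a page redesign, a saved copy of the new page can be substituted for the sample to see which part of the expression needs adjusting.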

One thing to note about get_circle.pl is that it passes the /t/ URL argument to return a text-only version of the purchase circle page. Text-only pages have less HTML, which means that fewer bytes are flying around and the page is generally easier to scrape for information.

44.3 Running the Hack

You can run this hack from the command line, providing a Purchase Circle ID:

perl get_circle.pl <purchase circle ID>

For example, to list the books in the Microsoft purchase circle mentioned earlier:

perl get_circle.pl 211569

44.4 Hacking the Hack

This script returns popular books for a given circle, but there's no reason you can't also get lists of the most popular music or movies for a circle. Add a catalog after the Purchase Circle ID to find what you're looking for. Here are the possible catalogs:

music
dvd
video
toy
ce (electronics)

So, for example, to link directly to DVDs that are popular in Sebastopol, CA, find the Purchase Circle ID, and add /dvd/ to the URL:

http://amazon.com/exec/obidos/tg/cm/browse-communities/-/216435/dvd/

If you'd like to keep it text-only as in the script, the /t/ follows the catalog:

http://amazon.com/exec/obidos/tg/cm/browse-communities/-/216435/dvd/t/
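
The URL rules above (ID first, then an optional catalog, then an optional /t/) can be sketched as a small helper. This is an illustrative sketch rather than part of the original hack: circle_url is a hypothetical name, and the numeric check on the ID is an assumption based on the six-digit IDs shown earlier.

```perl
use strict;
use warnings;

# Catalogs listed in the text; books are the default when none is given.
my %CATALOGS = map { $_ => 1 } qw(music dvd video toy ce);

# Hypothetical helper: build a purchase circle URL from an ID,
# an optional catalog, and a text-only flag.
sub circle_url {
    my ($circle_id, $catalog, $text_only) = @_;

    # Assumption: IDs are purely numeric, like the six-digit examples.
    die "Purchase Circle ID must be numeric\n"
        unless defined $circle_id && $circle_id =~ /^\d+$/;

    my $url = "http://amazon.com/exec/obidos/tg/cm/browse-communities/-/$circle_id/";

    if (defined $catalog && length $catalog) {
        die "Unknown catalog: $catalog\n" unless $CATALOGS{$catalog};
        $url .= "$catalog/";
    }

    # The text-only marker always comes last, after any catalog.
    $url .= "t/" if $text_only;
    return $url;
}

# Text-only DVD list for the Sebastopol, CA circle
print circle_url(216435, 'dvd', 1), "\n";
```

Passing the result to get() in get_circle.pl in place of the hard-coded URL would let the script take a catalog as a second command-line argument, although the matching pattern may need adjusting if non-book pages are laid out differently.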