May 08, 2005

CraigsList Bot

OK - I've been thoroughly impressed with CraigsList.

Their numbers and usefulness are astounding:

1.7 billion page views per month
7 million unique visitors per month

I have used Craigslist for the following:
- to find about 25% of my employees.
- to find a new home theater system :-)
- several times to find guys to haul trash cheaply.
- to find contractors - drywall, etc
- to find an outdoor Jacuzzi in great shape for $250.
- to find a big screen JVC HDTV for $600.

So I've been looking for some nice outdoor patio furniture and several other things - and decided to code a "CraigsList Bot" using my out-of-date coding skills. Essentially, the Bot scans CraigsList's XML feeds for keywords several times per day and emails out the results. It's very primitive (and there are a lot of cool things that could be added - like a web-based interface similar to the one news.google.com alerts has).

Here's the code. (Updated July 27th to give Jeremy Zawodny credit.)

#!/usr/bin/perl -w
#
# CraigsList Bot v.01
# (c)2005 Chris Ueland
#
# Derived from Jeremy Zawodny's Jan 17 2004 post:
# http://jeremy.zawodny.com/blog/archives/001440.html
#
# (Thanks to Bill Mitchell, http://www.billmitchell.info)
#
# Currently searches the Los Angeles craigslist for sale category
# for patio and home theater and emails the results out.
# The script can be triggered from cron. Something like this:
# 30 22 * * * perl /var/www/html/craigslist/index.pl
#
# Thanks, Ethan, for the time functions.
#
#


use XML::Simple;
use LWP::Simple;
use Data::Dumper;

my $debug = 0;

my @feeds = (
'home+theater','patio'
);

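for my $feed (@feeds)
{
    my $xml = get("http://losangeles.craigslist.org/cgi-bin/search?areaID=7&subAreaID=&type_search=1&query=$feed&cat=sss&format=rss");

    # get() returns undef when the download fails, so skip this feed.
    unless (defined $xml)
    {
        warn "Could not fetch the feed for '$feed'\n";
        next;
    }

    # Clean up the response so XML::Simple can parse it.
    $xml =~ s/^(.*)\<\?xml/\<\?xml/;    # drop anything before the XML declaration
    $xml =~ s/(.*)\n//;                 # drop the first line
    $xml =~ s/\015//g;                  # strip carriage returns
    $xml =~ s/0\n//g;
    #print $xml;

    # ForceArray keeps {item} an array reference even when only one result comes back.
    my $ref = XMLin($xml, ForceArray => ['item']);
    my $items = $ref->{item} || [];

    if ($debug)
    {
        print "$xml";
        print Data::Dumper->Dump([$items]);
        exit;
    }

    open(MAIL, "|/usr/sbin/sendmail -t -fchris\@wherever.com")
        or die "Cannot run sendmail: $!";

    print MAIL "MIME-Version: 1.0\n";
    # The body below is plain text, so declare it as such.
    print MAIL "Content-Type: text/plain\n";
    print MAIL "To: chris\@wherever.com\n";
    # Assumption: the sender is the same placeholder address used for -f above.
    print MAIL "From: Craigslist Bot <chris\@wherever.com>\n";
    print MAIL "Subject: CraigsList Search: $feed\n\n";

    for my $item (@$items)
    {
        my $title = $item->{title};
        my $url   = $item->{link};
        my $date  = $item->{'dc:date'};

        my $stamp = stamp_str2num($date);

        my $date_nice = localtime($stamp);
        # my $date_diff = sprintf("Age: %d:%02d", (time - $stamp) / 3600, (time - $stamp) / 60 % 60);
        my $date_diff = pretty_time(time - $stamp);

        # regex match goes here
        #if ($title =~ /runner/i)
        #{
        print MAIL "$title\n$date_nice ($date_diff)\n$url\n\n";
        #}
    }

    # Finish this message before moving on to the next feed.
    close(MAIL);

    # don't suck feeds too quickly
    sleep 2;
}

exit;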

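sub stamp_str2num {
    use Time::Local qw(timegm);
    my ($str) = @_;
    # Expect a W3C date-time such as 2005-05-08T17:09:00-07:00.
    return 0 if !defined($str)
        || $str !~ m{^(\d\d\d\d)-(\d\d)-(\d\d)T(\d\d):(\d\d):(\d\d)([+-])(\d\d):(\d\d)$};
    # Seconds east of UTC, taken from the trailing offset.
    my $offset = ($7 eq '-' ? -1 : 1) * ($8 * 3600 + $9 * 60);
    # Read the fields as UTC, then subtract the offset, so the result
    # does not depend on the server's own time zone.
    return timegm($6, $5, $4, $3, $2 - 1, $1) - $offset;
}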

# Turn a number of seconds into a short age string such as "1d 3h 12m".
sub pretty_time {
    my ($delta) = @_;
    my $str = '';
    if ($delta >= 86400) {
        $str .= sprintf("%dd ", $delta / 86400);
        $delta %= 86400;
    }
    if ($delta >= 3600) {
        $str .= sprintf("%dh ", $delta / 3600);
        $delta %= 3600;
    }
    $str .= sprintf("%dm", $delta / 60);
    return $str;
}
__END__
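
The script leaves its keyword filter as a commented-out placeholder ("regex match goes here"). Here is a minimal sketch of one way that filter could look. It is an assumption, not part of the original script: filter_items is a hypothetical helper name, the sample items and example.com links are made up, and the items are assumed to be hash references with title and link keys, like the ones XMLin() hands back.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not in the original script): return only the items
# whose title matches at least one keyword, ignoring case.
sub filter_items {
    my ($items, @keywords) = @_;
    return @$items unless @keywords;    # no keywords means no filtering

    # quotemeta makes each keyword match literally, so a keyword such as
    # "2.0" does not treat the dot as a wildcard.
    my $pattern = join '|', map { quotemeta($_) } @keywords;
    my $re = qr/(?:$pattern)/i;

    return grep { defined $_->{title} && $_->{title} =~ $re } @$items;
}

# Made-up sample data in the shape the script gets from the feed.
my @items = (
    { title => 'Teak patio set - $150',    link => 'http://example.com/1' },
    { title => 'Road runner cartoon DVDs', link => 'http://example.com/2' },
    { title => 'Wicker chairs',            link => 'http://example.com/3' },
);

for my $item (filter_items(\@items, 'teak', 'wicker')) {
    print "$item->{title}\n$item->{link}\n\n";
}
```

In the Bot itself, the same idea would amount to looping over filter_items($items, ...) instead of @$items before printing to MAIL. Building one compiled pattern up front keeps the per-item check to a single match.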

Posted by Chris at May 8, 2005 05:09 PM

Comments

Why not just add the RSS feeds to your favorite aggregator, then set up a rule to match on certain terms and move all those posts into a "Latest CraigsList Shit" folder? Easy and straight to desktop.

(Or, an Apple-only tip: bookmark the RSS feeds into a "Craigslist" folder in Safari, then use "View all RSS articles" on that folder when new posts appear in the count - although I prefer to use a proper RSS aggregator myself.)

Posted by: Peter Cooper at May 9, 2005 11:43 AM

Demo (your site doesn't seem to allow IMG tags):

http://img230.echo.cx/img230/9215/picture10nn.th.png

Most RSS aggregators on most platforms support basic features like this.

Posted by: Peter Cooper at May 9, 2005 11:47 AM

Eugh, ignore that one. I was trying to use ImageShack for quick image hosting for the first time *g* This is much better:

http://img230.echo.cx/my.php?image=picture10nn.png

Posted by: Peter Cooper at May 9, 2005 11:49 AM

Good ideas. I'll post once I have it set up. I don't really like Windows-based apps by default (I prefer web-based ones since I can access them anywhere) - but this seems completely justified.

Posted by: Chris Ueland at May 10, 2005 09:34 AM

I use craigslist a lot and could use an RSS-like feed to sort through items I'm looking for. However, I'm unfamiliar w/ hotscripts. What do I specifically do with the code you provided? Cut and paste to what? Thanks and you're welcome for my ignorance.

Posted by: Miles at May 14, 2005 07:13 AM

Is your script derived from Jeremy Zawodny's Jan 17 2004 post?

http://jeremy.zawodny.com/blog/archives/001440.html

Posted by: Bill Mitchell at July 27, 2005 09:51 AM

Thanks for the reminder, Bill. Jeremy is now given the credit he is due. I had originally modified his script to make it work for me, and posted it for a friend (forgetting to cite the original).

Good work and good post, Jeremy, and thanks, Bill. Feel free to use the code however it is useful to you guys.

Posted by: Chris Ueland at July 27, 2005 02:21 PM

guys, i need to implement this search bot...but me no programmer....are there instructions on how i can set it up on my machine.

best, paul.

Posted by: paul at November 7, 2005 09:15 AM
