Eli Fulkerson .com
 


Saving results from Dogpile's Search Spy

Description:

Update: Dogpile's Search Spy is dead and gone, so this script no longer works. However, I have 4,023.42 megabytes of output ;)

Dogpile uses an XML feed to supply data to its Flash-based "Search Spy" application. This script runs in the background, fetches the XML feed directly, and archives the results in a text file.

Platform:

  • Any *nix that includes bash, lynx, date and sed.

Background:

Dogpile's Search Spy is very handy if you want a general sense of what people search for on the Internet. The application itself is written in Flash and pulls its data from one of two XML feeds:

  • http://www.dogpile.com/info.dogpl/searchspy/inc/data.xml - unfiltered adult version. Warning: This feed almost *always* contains something objectionable, so peruse at your own risk.
  • http://www.dogpile.com/info.dogpl/searchspy/inc/data.xml?filter=1 - filtered version

Opening either feed URL loads the XML file in your browser, and each refresh pulls in a different set of keywords.
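
The two feeds differ only by the `?filter=1` query string. The script itself switches feeds by commenting lines in or out; as a minimal sketch of an alternative, the URL could be picked at startup from an environment variable. `DOGPILE_ADULT` and `feed_url` are hypothetical names, and the URLs no longer respond:

```shell
#!/bin/bash
# Base URL of the (now defunct) Search Spy feed.
FEED_BASE="http://www.dogpile.com/info.dogpl/searchspy/inc/data.xml"

# feed_url (hypothetical helper): print the unfiltered feed URL when the
# hypothetical DOGPILE_ADULT variable is "1"; otherwise (including when
# it is unset) print the filtered default.
feed_url() {
    if [ "${DOGPILE_ADULT:-0}" = "1" ]; then
        printf '%s\n' "$FEED_BASE"
    else
        printf '%s\n' "${FEED_BASE}?filter=1"
    fi
}

DOGPILE=$(feed_url)
echo "Using feed: $DOGPILE"
```

This keeps the unfiltered feed strictly opt-in, so forgetting the variable still gets you the safe version.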

The 'retriever' script:

This script automates scraping the XML feed, strips the extraneous XML as it goes, and appends the results to an ongoing logfile (a new file each day). It defaults to the filtered feed, but you can switch it to log unfiltered results. It runs until killed, pulling down a new copy of the feed roughly every 3 seconds, which is about as often as the Search Spy application itself contacts the feed.
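
The `sed` expression that strips the XML is the least obvious part of the retriever. The sketch below runs the same expression offline on a made-up sample: the element names are invented (the real feed's layout isn't reproduced here), `strip_tags` is a hypothetical helper name, and the `\n` in the replacement assumes GNU sed. Unlike the retriever's loop, it also deletes the blank lines the substitution leaves behind:

```shell
#!/bin/bash
# strip_tags (hypothetical helper): replace every XML tag with a newline.
# If a "<" survives the substitution (a tag split across two lines),
# N appends the next input line and //ba (the empty regex reuses the
# last one, /</) jumps back to :a to try again. The second sed drops
# the blank lines left behind. "\n" in the replacement is GNU sed only.
strip_tags() {
    sed -e :a -e 's/<[^>]*>/\n/g;/</N;//ba' \
        | sed '/^[[:space:]]*$/d'
}

# Invented sample; note the <search> tag split across two lines.
sample='<data>
<search>container gardening</search><search>hotmail</search>
<search
>real estate</search>
</data>'

printf '%s\n' "$sample" | strip_tags
```

This prints `container gardening`, `hotmail` and `real estate`, one per line.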

    #!/bin/bash
    
    # We are, of course, IE6 under Windows XP
    USERAGENT="Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322)"
    
    # Adult version
    # DOGPILE="http://www.dogpile.com/info.dogpl/searchspy/inc/data.xml"
    
    # Safe version
    DOGPILE="http://www.dogpile.com/info.dogpl/searchspy/inc/data.xml?filter=1"
    
    # default output file is 'dogpile.out'
    OUTFILE="dogpile.out"
    
    # Check to make sure that we have our necessary bits
    for tool in lynx sed date; do
        if ! command -v "$tool" > /dev/null 2>&1; then
            echo "Error, '$tool' is required." >&2
            exit 1
        fi
    done
    
    echo "Retriever:  Digging for bones.  Output appends to $OUTFILE.mm_dd_yy"
    
    # Run until killed. Each pass fetches the feed, turns every XML tag into
    # a line break (joining lines when a tag is split across two), and
    # appends the result to that day's file, e.g. dogpile.out.03_15_06.
    # ("\n" in the sed replacement requires GNU sed.)
    while true; do
        lynx -dump -useragent="$USERAGENT" "$DOGPILE" 2> /dev/null \
            | sed -e :a -e 's/<[^>]*>/\n/g;/</N;//ba' \
            >> "$OUTFILE.$(date +%m_%d_%y)"
        sleep 3
    done
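
Because each day's log is named `$OUTFILE.mm_dd_yy`, a plain glob lists the files month first and mixes years together. A minimal sketch that lists them oldest first, assuming the default `dogpile.out` base name (or any base without an underscore in it), zero-padded dates, and years that all fall in the same century; `list_logs_by_date` is a hypothetical helper name:

```shell
#!/bin/bash
# list_logs_by_date (hypothetical helper): print BASE.mm_dd_yy files in
# chronological order. Splitting a name such as dogpile.out.03_15_06 on
# "_" gives "dogpile.out.03", "15", "06", so sort by year (field 3),
# then month (field 1, compared as text; the prefix is identical and
# months are zero-padded), then day (field 2).
list_logs_by_date() {
    local base="${1:-dogpile.out}" f
    for f in "$base".[0-9][0-9]_[0-9][0-9]_[0-9][0-9]; do
        [ -e "$f" ] && printf '%s\n' "$f"
    done | sort -t _ -k3,3n -k1,1 -k2,2n
}
```

Run from the directory holding the logs, `list_logs_by_date | xargs cat > dogpile.all` would stitch them into a single file in date order.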
    

Example Output:

    container gardening
    sector zero virus
    ali frazier
    hotmail
    high school musical
    irs.gov
    best legs
    cds in microwave ovens
    cesarean
    ingredients of success
    buycom promotion code
    free psp downloads
    dating advice
    disney mermaid wand
    large pendant silver star
    real estate
    famous quotes about the holocaust
    michigan golfing
    bath room color ideas
    dewalt dw705r
    "cab by train lyrics"
    fun job opening in phoenix
    das es freud
    picture frames
    buck buchanan mount vernon texas
    employee benefit job
    crazy frog free ringtone
    ... and so on and so forth ...
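
Once a few days of logs have piled up, standard text tools give a rough picture of what people search for most. A minimal sketch, assuming it is run in the directory holding the default `dogpile.out.*` files; `top_searches` is a hypothetical helper name, surrounding whitespace is trimmed, and the blank lines left over from tag stripping are skipped:

```shell
#!/bin/bash
# top_searches (hypothetical helper): print the N most frequent search
# terms (default 10) across all per-day logs, as "count term", most
# frequent first.
top_searches() {
    local n="${1:-10}"
    cat -- dogpile.out.* 2> /dev/null \
        | sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//' \
        | grep -v '^$' \
        | sort | uniq -c | sort -rn | head -n "$n"
}
```

Counting exact lines means "real estate" and "real estate listings" are tallied separately; any grouping of similar terms would need a further step.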
    
