10.2 Web Aggregators: Introducing Meerkat

Meerkat was the first truly open-to-all, searchable, categorized aggregator. Developed by Rael Dornfest, Chair of the RSS 1.0 Working Group and researcher for O'Reilly (the publisher of this book), Meerkat (http://www.oreillynet.com/meerkat/) takes the RSS feeds from a whole range of sites, throws all of the items into a big pot, and makes the resulting mix searchable. Aggregating feeds in this way allows custom feeds to be made: "The latest news on Java," say, or "Everything containing the keyword `sausages' written in the past 30 minutes."

Meerkat is primarily usable via its web interface, shown in Figure 10-2. This introduces two main concepts for the application: profiles and mobs. To use either of these features, you must first sign up for a user ID with the O'Reilly Network (http://www.oreillynet.com).

Figure 10-2. A screenshot of Meerkat's front page

Profiles are named sets of query parameters. There are global parameters already loaded into Meerkat, based on subject matter (Apache, Perl, P2P, etc.), and you can save your own from the web interface.

Mobs, on the other hand, are collections of stories. You use a mob like a universally retrievable bookmark list. You can add stories to it via the web interface, send its URL to other people, or even query and display the mob via the Meerkat API.

The what? Well, let me tell you . . .

10.2.1 The Meerkat API

Like Syndic8, Meerkat offers an API that gives other applications access to its database. Unlike Syndic8, however, Meerkat is not a web service in the XML-RPC/SOAP mold. Rather, it follows the REST architectural style, encoding the entire query in a URL. This is very useful in the RSS world, as it allows people to swap URLs of custom feeds within the existing framework of behavior: not just by emailing or instant messaging them to friends, but also by including them in blogrolls and mySubscription lists. As such, the REST-based web aggregator provides a different extended service from that of the directory or the desktop reader.

To query the Meerkat API, you request a URL that starts with http://meerkat.oreillynet.com/? and continues with a query made up of the following parameters (any spaces in URLs are replaced by %20):

s= (Search For)

Instructs Meerkat to search for something in the item's title or description. This can be either a list of keywords separated with a plus sign (+) or a regular expression enclosed in //.

Example: http://meerkat.oreillynet.com/?s=eggs+ham returns "any stories whose title or description contains either `eggs' or `ham'."

sw= (Search What)

Ordinarily, Meerkat's s= parameter will only search through title and description elements. The sw= option instructs Meerkat to search other specified fields. Currently, Meerkat supports only the simpler Dublin Core elements, so sw= can be either blank (hence: title, description) or a combination of dc_title, dc_creator, dc_subject, dc_description, dc_publisher, dc_contributor, dc_date, dc_type, dc_format, dc_identifier, dc_source, dc_language, dc_relation, dc_coverage, or dc_rights.

Example: http://meerkat.oreillynet.com/[email protected]&sw=dc_contributor returns all the item elements for which [email protected] is listed as contributor.

c= (Channel)

This tells Meerkat to display only the requested channel. It takes the numerical channel ID.

Example: http://meerkat.oreillynet.com/?c=1243 returns only "stories from the `oreillynet.python' newsgroup."

t= (Time Period)

This controls the maximum age of stories that Meerkat displays. It takes a number, followed by: MINUTE, HOUR, DAY, or ALL. The number is optional (and meaningless) when choosing ALL. The default setting is 1HOUR, so you must set this parameter to get anything older.

Example: http://meerkat.oreillynet.com/?t=7DAY means "show me stories from the past seven days."

p= (Profile)

This displays the stories in the manner chosen by a set profile within Meerkat. You only need to pass it the numerical ID of an existing profile.

Example: http://meerkat.oreillynet.com/?p=563 shows "all stories caught by profile number 563 (the O'Reilly Network)."

m= (Mob)

Very similar to the p= parameter, but displaying stories associated with a particular mob. You pass it the numerical ID of the mob in question.

Example: http://meerkat.oreillynet.com/?m=123 gets you "stories grouped under mob number 123."

i= (ID)

This parameter displays a particular story. Each item in the Meerkat database is assigned a numerical ID. If you know the number, you can point directly at the story. To find it, go to the web interface and hold your mouse over the mob icon, a ring of dots, to see the story's ID.

Example: http://meerkat.oreillynet.com/?i=456 will display only story number 456.
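
If you would rather not assemble these query strings by hand, a few lines of code can do it. The following is only a minimal sketch in Python: the helper names build_query and time_period are our own inventions, not part of Meerkat, and we assume that the plus sign and the slash should be left unescaped so that keyword lists and //-enclosed regular expressions survive intact.

```python
from urllib.parse import quote

# Base of every Meerkat query, as given in the text.
BASE_URL = "http://meerkat.oreillynet.com/?"

# Units accepted by the t= parameter, per the description above.
TIME_UNITS = ("MINUTE", "HOUR", "DAY")


def time_period(count=None, unit="HOUR"):
    """Format a value for t=, such as 7DAY or ALL (hypothetical helper)."""
    unit = unit.upper()
    if unit == "ALL":
        return "ALL"  # the number is optional and meaningless here
    if unit not in TIME_UNITS:
        raise ValueError(f"unsupported time unit: {unit!r}")
    if count is None:
        raise ValueError("a number is required unless the unit is ALL")
    return f"{int(count)}{unit}"


def build_query(**params):
    """Join name=value pairs with & and escape spaces as %20.

    '+' and '/' are kept as-is (an assumption on our part) so that
    keyword lists and //-style regular expressions pass through.
    """
    pairs = [
        f"{name}={quote(str(value), safe='+/')}"
        for name, value in params.items()
    ]
    return BASE_URL + "&".join(pairs)


if __name__ == "__main__":
    print(build_query(s="eggs+ham"))
    print(build_query(c=1243, t=time_period(7, "day")))
```

Keyword arguments keep the order in which they are passed, so the parameters appear in the URL in the order you wrote them.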

So far, we've seen all the parameters needed to filter exactly which stories to display. If you've been adventurous, you will have found that retrieving the URL query in a browser gives you a fancy HTML result with lots of Meerkat logos and links to the rest of the O'Reilly Network site. These are pretty, and provide much good reading, but they are of little use to someone wanting to grab an RSS feed or another type of output.

Happily, Meerkat introduces the concept of flavors. By setting the parameter _fl to a certain string, you get results back in various ways:

_fl=meerkat

The default setting, providing the full bells and whistles of the Meerkat page.

_fl=tofeerkat

A lighter version of the Meerkat page.

_fl=minimal

A very light version of the Meerkat page.

_fl=rss

Provides the results as a simple RSS 0.91 feed.

_fl=rss10

Provides the results as an RSS 1.0 feed.

_fl=xml

Provides the results in a bespoke XML format.

_fl=js

Provides the results in a JavaScript file, which, when parsed, displays the results in an XHTML format.

_fl=php

Provides the results in a PHP-serialized string.
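
Since the flavor is just another parameter, turning a query you have been exploring in the browser into a feed URL is a matter of setting _fl. Here is a rough sketch of one way to do that in Python; with_flavor is a hypothetical helper of ours, and the flavor list is simply the one given above.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# The flavors listed in the text.
FLAVORS = {"meerkat", "tofeerkat", "minimal", "rss", "rss10", "xml", "js", "php"}


def with_flavor(url, flavor):
    """Return url with its _fl parameter set to flavor (hypothetical helper)."""
    if flavor not in FLAVORS:
        raise ValueError(f"unknown flavor: {flavor!r}")
    parts = urlsplit(url)
    # Keep every existing parameter except an earlier _fl, then append the new one.
    pairs = [
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if name != "_fl"
    ]
    pairs.append(("_fl", flavor))
    return urlunsplit(parts._replace(query=urlencode(pairs)))


if __name__ == "__main__":
    print(with_flavor("http://meerkat.oreillynet.com/?c=1243", "rss10"))
```

Note that urlencode writes spaces as plus signs rather than %20, so a keyword search such as s=eggs+ham comes back out in the same form it went in.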

So, now not only can we query the Meerkat database of feeds, but we can also get feeds back out again. It gets better. Meerkat offers finer control over exactly what it produces with some Boolean switches (0 = off, 1 = on) that turn various output features on or off:

_de= (Descriptions)

Turns on or off story descriptions or blurbs. You lose some of the story detail but gain a compact display for easy scanning.

Example: http://meerkat.oreillynet.com/?_de=0 means "without descriptions."

_ca= (Categories)

Meerkat places each feed into a category hierarchy of its own, and certain flavors display this. If you don't want to use these for anything, you can turn them off.

Example: http://meerkat.oreillynet.com/?_ca=0 means "no categorization."

_ch= (Channels)

Turns the channel display on or off for the flavors that care about it.

Example: http://meerkat.oreillynet.com/?_ch=0 means "turn off the display of channels."

_da= (Dates)

Turns on or off the display of the date Meerkat first saw the story.

Example: http://meerkat.oreillynet.com/?_da=0 means "dates? I don't need no dates."

_dc= (Dublin Core Metadata)

The RSS 1.0 flavor contains mod_dc information. You can remove this information with this parameter.

Example: http://meerkat.oreillynet.com/?_dc=0 means "plain and simple, DC-free is for me."
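
The five switches map neatly onto true/false values in a program. The sketch below, again in Python, translates readable names into the 0/1 parameters described above; the function switch_params and the long names (descriptions, categories, and so on) are our own labels, not anything Meerkat defines.

```python
# Readable names (our own) mapped to the switch parameters in the text.
SWITCHES = {
    "descriptions": "_de",
    "categories": "_ca",
    "channels": "_ch",
    "dates": "_da",
    "dublin_core": "_dc",
}


def switch_params(**flags):
    """Turn keyword flags into a query fragment such as _de=0&_da=1."""
    pairs = []
    for name, enabled in flags.items():
        if name not in SWITCHES:
            raise ValueError(f"unknown switch: {name!r}")
        # Meerkat expects 1 for on and 0 for off.
        pairs.append(f"{SWITCHES[name]}={1 if enabled else 0}")
    return "&".join(pairs)


if __name__ == "__main__":
    print(switch_params(descriptions=False, dublin_core=False))
```

Only the switches you name are emitted; anything you leave out is left to Meerkat's own defaults.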

So, you're asking, how do we stick all these together to make something cool? Well, we separate the parameters with an & character.

For example, a query that produces an RSS 1.0 feed of the keyword search for "Ben" for stories up to a week old looks like this:

http://meerkat.oreillynet.com/?s=Ben&_fl=rss10&t=7DAY

Whereas a query for anything on Java in the past hour, in RSS 1.0 but without Dublin Core Metadata or Categories, looks like this:

http://meerkat.oreillynet.com/?s=Java&_fl=rss10&_dc=0&_ca=0