How IdentiFight works

IdentiFight has been struggling since picking up a few prominent links this week, but it can still search several sites (the ones that don't require a login and don't limit searches). I should really add a queue so that searches run asynchronously rather than having everyone's requests pile up at once.

Anyway, a dissection of how it works:

  1. Start with an email address.
  2. Submit that email address to the search form (or API, if it's Flickr) of multiple sites in parallel (using curl_multi_exec). Log in to those sites first where that's required to search by email address.
  3. Scrape the results pages: name, username, photo, profile link and a small amount of profile information.
  4. Submit all the profile links found to Google's Social Graph API, which returns any extra URLs linked from the original set using rel="me" (FOAF support to come later, apparently).
  5. Detect the most common full name and the most common username in those results, and use them to build links for finding more accounts under those names on other sites (LinkedIn for the full name, Google for the username).
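
For the curious, here's a rough sketch of steps 2 and 5 in Python. It is not IdentiFight's actual code: a thread pool stands in for curl_multi_exec, and the per-site searcher callables, the profile field names (name, username) and the two search URL patterns are all assumptions made up for illustration.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from urllib.parse import quote_plus


def search_all(email, searchers, max_workers=8):
    """Step 2: run every site's search for `email` concurrently.

    `searchers` is a hypothetical mapping of site name -> callable that takes
    an email address and returns a list of profile dicts. A searcher that
    raises is skipped, so one broken site doesn't sink the whole lookup.
    """
    results = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {site: pool.submit(fn, email) for site, fn in searchers.items()}
        for site, future in futures.items():
            try:
                results.extend(future.result())
            except Exception:
                continue  # ignore sites that fail or time out
    return results


def most_common(values):
    """Return the most frequent non-empty value, compared case-insensitively."""
    cleaned = [v.strip() for v in values if v and v.strip()]
    if not cleaned:
        return None
    counts = Counter(v.lower() for v in cleaned)
    winner = counts.most_common(1)[0][0]
    # Hand back the first original spelling that matches the winning key.
    return next(v for v in cleaned if v.lower() == winner)


def follow_up_links(profiles):
    """Step 5: build search links from the most common name and username."""
    name = most_common(p.get("name") for p in profiles)
    username = most_common(p.get("username") for p in profiles)
    links = {}
    if name:
        # Assumed URL pattern for a LinkedIn people search.
        links["linkedin"] = (
            "https://www.linkedin.com/search/results/people/?keywords="
            + quote_plus(name)
        )
    if username:
        # Assumed URL pattern for a plain Google search.
        links["google"] = "https://www.google.com/search?q=" + quote_plus(username)
    return links
```

Each searcher would wrap one site's scraping (steps 2 and 3), so follow_up_links(search_all(email, searchers)) gives the links from step 5. Step 4, the Social Graph API lookup, is left out here.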

Because some of the searches are disabled at the moment, I asked someone with a fairly full set of online accounts if he wouldn't mind being used as an example of what the results can look like:

IdentiFight results