Gavin and I launched twitterholic close to two years ago now (2 SXSWs ago). It was pre-Twitter API (or at least, pre anything comprehensive), so I basically went cowboy on it and scraped individual HTML pages. It took an hour or so to launch, with our friend Alex Hillman helping us rip off the Twitter CSS ;).
The site's been mentioned by gosh knows how many media outlets, including the likes of the New York Times. We're the #1 ranking on Google for "twitter ranking" and a bunch of others. The site, even with its ghetto HTML and shoddy uptime, has done pretty well. We finally decided it was time to make the site, you know, actually work. Twitter was kind enough to whitelist us so we don't have to trash their servers scraping anymore (it's all API-driven now), and that means we can start doing far cooler things.
Specific updates so far:
- The site should now keep up with images, urls, descriptions, screen names and all the things you're likely to change from time to time.
- The top 1000 users by followers, updates and friends will be crawled every day around midnight. We're aiming to crawl everyone every 10 days or so (throttled out of respect to Twitter's servers).
- Rankings by location. Right now this is largely based on what the user enters for their location, with not much cleanup. This will be enhanced slowly to include common abbreviations and slang terms for major metro areas.
- Enhanced user pages. We're building out more and more stats. So far we've added user bios, how long the user has been on Twitter, their overall ranking in our database and their ranking locally. We'll be adding way, way more soon but we're keeping the details under wraps for now :)
- If someone accesses a user page (http://twitterholic.com/<username>/) that doesn't exist, we'll add it instantly. Basically, you can add yourself to our list of users to crawl if we haven't found you on the public timeline yet.
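
As a rough illustration of the location cleanup described above, here is a minimal sketch in Python. It is not the code the site runs: the function name `normalize_location` and both lookup tables are hypothetical, and a real table would need to cover every state plus far more metro slang.

```python
import re

# Hypothetical abbreviation table (assumption): a real one would list every state.
STATE_ABBREVIATIONS = {
    "ia": "iowa",
    "fl": "florida",
    "ca": "california",
    "ne": "nebraska",
}

# Hypothetical metro aliases (assumption): bare city names or slang
# mapped to one canonical "city, state" form.
METRO_ALIASES = {
    "san francisco": "san francisco, california",
    "sf": "san francisco, california",
}


def normalize_location(raw: str) -> str:
    """Return a canonical lowercase key for a free-text location."""
    # Lowercase, collapse runs of whitespace, and tidy spacing around commas.
    text = re.sub(r"\s+", " ", raw.strip().lower())
    text = re.sub(r"\s*,\s*", ", ", text)

    # Whole-string aliases first ("SF", "San Francisco").
    if text in METRO_ALIASES:
        return METRO_ALIASES[text]

    # Otherwise expand a trailing state abbreviation after the last comma.
    city, sep, region = text.rpartition(", ")
    if sep and region in STATE_ABBREVIATIONS:
        return f"{city}, {STATE_ABBREVIATIONS[region]}"
    return text
```

With these tables, `normalize_location("Iowa City, IA")` and `normalize_location("Iowa City, Iowa")` both produce `"iowa city, iowa"`, so the two spellings would land in the same ranking. Locations typed without a comma (for example "Omaha NE") pass through unchanged in this sketch and would need extra handling.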


Comments...
1. I notice that the rankings for "Iowa City, IA" are different from the rankings for "Iowa City, Iowa." Could these be consolidated?
4:45PM on Dec 5th 2008 by Jordan Running
2. Yeah, and they will. The only state that works like that right now is FL (I haven't deployed the new list yet, doing that now).
By tonight, city/state combinations should work reasonably well :)
8:12PM on Dec 6th 2008 by Alex Rudloff
3. - Adding pagination on top lists for top 1000.
- Cleaned up location listings with popular metro area translations (e.g. "San Francisco" also matches "San Francisco, CA", etc.). Will probably build this out more and more as time goes on and we see new variations, but for now, it's good enough.
- Added joined on date to user pages. Currently about 20% of the screen names we're tracking have this. Once it gets to 100%, we'll start doing more with this.
- Alex Hillman is going to refresh the CSS, which is totally awesome. Gavin and I like back end stuff. CSS? Not so much.
Much more to come. Thoughts, suggestions?
10:28PM on Dec 6th 2008 by Alex Rudloff
4. I love Twitterholic. But then again, I'm also Twitter-obsessed.
7:57PM on Dec 8th 2008 by akeorlando
5. Thanks for sharing. I'm going to check it out. I love Twitter.
4:03PM on Dec 9th 2008 by Marek Nowak
6. Is it at all possible to remove the name associated with my ranking? I am trying to maintain a semi-anonymous presence, but apparently my name is used on the twitterholic rankings.
2:26PM on Dec 13th 2008 by chickpea
7. Twitterholic uses the Twitter API. Remove your name from Twitter, and the change will show up on twitterholic eventually. We're aiming to crawl everyone every couple of weeks (and the top 2000 users or so every day).
Hope that sheds light on things :)
2:55PM on Dec 13th 2008 by Alex Rudloff
8. Thanks Alex, but my name is not on Twitter. Do you have any idea how my name was associated to my Twitter account?
Sorry for the questions, I just want to maintain some sort of semi-anonymous presence.
5:28PM on Dec 13th 2008 by chickpea
9. At some point in time, it must have been. It's simply not possible any other way. We call the API, take the XML feed and put it in the database. Not much more than a script that runs every few minutes, frankly.
It's possible you had it on there and then later removed it, and we haven't re-crawled you since the latest code changes. In that case, it'll take 10-14 days to clear out (we may have crawled you, say, last September and never again with the old code).
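A minimal sketch of that call-parse-store loop in Python, to show why a stale name lingers until the next crawl. This is not the actual Twitterholic script: the element names in the sample XML, the `users` table layout, the sample values, and the helper names `parse_user` and `upsert_user` are all assumptions made for illustration, and the HTTP call itself is left out.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical sample payload (assumption): element names and values
# are made up for illustration only.
SAMPLE_XML = """<user>
  <screen_name>example_user</screen_name>
  <name>Example Name</name>
  <location>Iowa City, IA</location>
  <followers_count>42</followers_count>
</user>"""


def parse_user(xml_text):
    """Pull the fields we store out of one user XML document."""
    root = ET.fromstring(xml_text)

    def field(tag):
        # findtext returns None for a missing element, "" for an empty one.
        return (root.findtext(tag) or "").strip()

    return {
        "screen_name": field("screen_name"),
        "name": field("name"),
        "location": field("location"),
        "followers": int(field("followers_count") or 0),
    }


def upsert_user(conn, user):
    """Insert the user, or overwrite the stored row from an earlier crawl."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS users ("
        "screen_name TEXT PRIMARY KEY, name TEXT, "
        "location TEXT, followers INTEGER)"
    )
    conn.execute(
        "INSERT OR REPLACE INTO users (screen_name, name, location, followers) "
        "VALUES (:screen_name, :name, :location, :followers)",
        user,
    )
    conn.commit()
```

Calling `upsert_user(conn, parse_user(xml_text))` once per crawl means the stored row only changes when that user is fetched again, so a name removed on Twitter stays in the database until the next crawl overwrites it with the empty value.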
5:38PM on Dec 13th 2008 by Alex Rudloff
10. Alex! I got a google alert for
Who has the most Followers on Twitter? (Top 100) (Omaha NE Edition ...
and here I am.. A very pleasant surprise...
I shall write in great depth to you, after the holidays, as I want to know about the app/plugin in greater detail..
Have a good Xmas and a successful New Year!
Rgds..
1:43PM on Feb 12th 2009 by Lakshmi Mareddy