IceDragon (icedraco) wrote,
@ 2007-12-02 20:10:00
Current location: AirForce Base 6
Current mood: annoyed
Current music: None
Entry tags:irc, thoughts

IRSeeK SeeKs to compromise our privacy
I have recently found out about a new service that seems to have started off on the wrong foot and may endanger the privacy of many IRC networks and communities. Introducing IRSeeK:

IRC (Internet Relay Chat) remains one of the most active platforms for sharing knowledge and collaborating on the Internet. www.IRSeek.com (still in Beta) strives to make this hidden gem available to the entire Internet community. By constantly archiving thousands of active, highly-focused, public chat-rooms in a wide variety of topics (e.g. Linux, soccer, Christianity, poker, business and others) then indexing, processing and publishing the content on the web using advanced Web 2.0 technologies while maintaining the privacy of the users, we are creating a knowledge base different from any other.


At first glance this seems like a nice idea, doesn't it? A search engine that digs through IRC conversations for potentially useful information. However, the methods these people use to gather that information are a far cry from those of a friendly service such as the Google search engine. This is from the Freenode staffblog:

The logging bots primarily connect through tor, seem to have no distinguishing characteristics that we can identify, and so far the company has not been willing to remove them voluntarily.

It doesn't take a rocket scientist to understand the difference between services like Google or SearchIRC and IRSeeK:
  • Google indexes content by crawling the web for publicly accessible links. It also follows the de facto standard for web crawlers by requesting a robots.txt file, which tells it where it may go and which places it must stay out of. If content you don't want indexed has already been picked up, you can still request its removal - something which didn't seem to work between Freenode and IRSeeK... Google also provides a way to clearly identify its bots through the headers they send to the web server, as do other public search engines.

  • SearchIRC is an opt-in IRC network/channel indexing service. It maintains a list of IRC networks with their publicly marked channels and various other useful information, provides statistics for those networks, etc. You can opt in and out of this service all you want. Their statistics bots identify themselves accordingly and connect from the service's official host rather than a random proxy, so we know when they connect and what they do.

  • From the little I've managed to gather about IRSeeK's activities, it has already started collecting information from certain places. According to the Freenode staffblog, their bots connect through Tor proxies and randomize their names, which makes them hard to tell apart from ordinary users and hard to remove from channels where IRSeeK's logging is not welcome. These bots cannot be banned individually if they are not welcome on the network, which makes their actions hostile! There is no opt-in or opt-out either, and you can't tell these bots where they may go or what they may index/archive - they will sneak in and attempt to log everything you say on a channel whether you want it or not. Although their activity has already started, there is absolutely no information on IRSeeK's site about the bots, how to identify them, or how to get rid of them where they are not welcome. Similar tactics are commonly used by hostile IRC users to evade bans, spy on channels or cause other sorts of trouble in places where they are not welcome.
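To make the robots.txt point from the first bullet a bit more concrete, here is a minimal Python sketch of how a well-behaved web crawler might check its permissions before fetching a page. This is only an illustration and not how Google or anyone else actually implements it: the bot name "ExampleBot", the example.com URLs and the rules in ROBOTS_TXT are all made up, and the rules are parsed from a string instead of being downloaded so the example runs without a network connection.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents, invented for this illustration.
# A real crawler would download this file from the site it wants to visit.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def may_fetch(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if the given rules allow user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # "ExampleBot" is a made-up crawler name.
    for url in (
        "http://example.com/index.html",
        "http://example.com/private/logs.html",
    ):
        verdict = "allowed" if may_fetch("ExampleBot", url) else "disallowed"
        print(url, "->", verdict)
```

The check is entirely voluntary on the crawler's side, which is exactly why it only works with bots that identify themselves and choose to play by the rules.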

This raises a question: how exactly do B & C (the company that powers IRSeeK) plan on "maintaining the privacy of the users" if what they already do violates that very privacy? After all, these bots are not going to read the rules of every single network/channel, and they will log data regardless of any "No logging" rule! And unlike on the web, there is no robots.txt on IRC - there is no standard built to keep one's privacy intact from external logging. So much for the privacy.

Overall, I think that indexing IRC conversations as a source of information/resources is not the best idea out there, simply because IRC means Internet Relay Chat - it's a medium for general conversation among people, not a content management system. Forums would be a much better source of information if you were ever to search for a solution to something. Think about it - would you rather google a certain issue and find a relevant forum thread that explains the problem in detail, or would you prefer searching through hundreds of lines of IRC logs that, besides a possible solution, also contain chatter, channel events and other random stuff people say in between? The only uses I can see for this are snooping on what people say behind someone's back, spying on the activities of certain channels from outside (good for people who are banned and can't get in to listen in themselves), surveillance, stalking and other not very appropriate things.

Channel and network administrators, including those of Freenode and our QuickFox network, have already taken steps to block IRSeeK's bots from connecting, and I suggest other networks do the same, at least until IRSeeK takes the initiative to really honor our privacy and network regulations instead of putting empty claims on its website and doing the opposite.

Relevant websites
  1. IRSeeK.com - Home of the IRSeeK service.

  2. Freenode staffblog - Related to Freenode issues with IRSeeK.

  3. Will IRSeeK have a chilling effect on IRC chat? - Article on TechCrunch about IRSeeK.


