September 2006 Archives

Classifying Web Search Results

--- Originally published on the Adaptive Path Blog ---

Search is a subject that I've always been interested in, especially internal or enterprise search within a site, not web search of the Google or Yahoo! variety. Sure, there are lots of search engine optimization (SEO) or search engine marketing (SEM) tricks you can use to improve your ranking in the web search engines, but that has never really held any fascination for me.

Enterprise search -- now that's fascinating! It's much easier to tune an enterprise search engine so that the results you want float to the top. (Assuming, of course, you have access to your IT department to make the changes you want.) Weighting metadata is a simple way to do this. Tools like Verity or Vivisimo make categorization, "best bets," and other changes to results lists easier to do. I have to admit, though, that the librarian in me is very skeptical of the promises those companies make. I don't trust their auto-classification engines to do as good a job as a person could (or to do it in the time they say it takes). And I firmly believe that having someone to care for and feed the classification/taxonomy/vocabulary/whatever-you-want-to-call-it is the best way to get good results.
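To make the metadata-weighting idea concrete, here is a minimal sketch in Python. It is not how Verity, Vivisimo, or any other product does it. The field names, the weights, and the sample documents are all made up for illustration. The idea is simply that a query term found in the title or keywords counts for more than the same term buried in the body.

```python
from collections import Counter

# Hypothetical weights: a match in the title counts more than one in the body.
FIELD_WEIGHTS = {"title": 3.0, "keywords": 2.0, "body": 1.0}

# Made-up sample documents, each with a few metadata fields.
DOCS = [
    {"id": "a", "title": "Cafeteria menu", "keywords": "food",
     "body": "The vacation policy is mentioned once here."},
    {"id": "b", "title": "Vacation policy", "keywords": "hr vacation",
     "body": "How to request time off."},
]


def score(doc, query):
    """Sum the weighted counts of each query term across the metadata fields."""
    terms = query.lower().split()
    total = 0.0
    for field, weight in FIELD_WEIGHTS.items():
        counts = Counter(doc.get(field, "").lower().split())
        total += weight * sum(counts[t] for t in terms)
    return total


def search(docs, query):
    """Return the documents that match at all, highest score first."""
    scored = [(score(d, query), d) for d in docs]
    hits = [(s, d) for s, d in scored if s > 0]
    hits.sort(key=lambda pair: pair[0], reverse=True)
    return [d for _, d in hits]


if __name__ == "__main__":
    for doc in search(DOCS, "vacation policy"):
        print(doc["id"], score(doc, "vacation policy"))
```

Both sample documents mention the query terms, but document "b" ranks first because its matches are in the heavily weighted title and keywords fields. Tuning the results is then a matter of adjusting the numbers in FIELD_WEIGHTS.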

Recently, I started looking into what is being called "vertical search." It takes the approaches traditionally used in enterprise search (like classifying results) and applies them to the web at large. Folks like Kosmix and Clusty are leading the charge. This sounds a lot like what Northern Light (remember them?) was doing back in 1999 and 2000. However, unlike Northern Light, which used people to come up with its categories (the blue folders), Kosmix and Clusty use complex algorithms to determine what web pages are about. Kosmix, for example, focuses on a subset of the web (e.g., travel, health, politics) and subdivides the results into different categories.

Just as with the enterprise search engines, I'm a bit skeptical about this approach. The classification they are doing isn't very sophisticated (they use categories like "basic information" or "blogs"), but it is certainly more helpful than a list of thousands of results à la Google. It will be interesting to see where this goes. A hybrid approach using both algorithms and human-moderated categories seems like it would give the best results, though I don't know of anyone really taking that kind of two-pronged approach. Do you?
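For what it's worth, the two-pronged idea can be sketched in a few lines of Python. This is only an illustration under loud assumptions: the curated table, the URLs, and the keyword lists are hypothetical, and simple keyword overlap stands in for the far more complex algorithms the real engines use. A human editor's category always wins. The algorithm only fills in where no person has weighed in.

```python
# Human-moderated assignments (hypothetical): an editor's decision always wins.
CURATED = {"example.com/flu-faq": "health"}

# Stand-in for the algorithmic side: hypothetical keyword lists per category.
KEYWORDS = {
    "travel": {"flight", "hotel", "itinerary"},
    "health": {"symptom", "treatment", "vaccine"},
}


def classify(url, text):
    """Return (category, source), preferring the human-curated category."""
    if url in CURATED:
        return CURATED[url], "human"
    words = set(text.lower().split())
    best, best_overlap = "uncategorized", 0
    for category, vocab in KEYWORDS.items():
        overlap = len(words & vocab)  # number of category keywords on the page
        if overlap > best_overlap:
            best, best_overlap = category, overlap
    return best, "algorithm"


if __name__ == "__main__":
    print(classify("example.com/flu-faq", "cheap flight and hotel"))
    print(classify("example.org/deals", "book a flight and hotel"))
```

Returning the source alongside the category lets a person review the "algorithm" guesses over time and promote the good ones into the curated table.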

About this Archive

This page is an archive of entries from September 2006 listed from newest to oldest.


