Classifying Web Search Results

--- Originally published on the Adaptive Path Blog ---

Search is a subject that I've always been interested in, especially internal or enterprise search within a site, not web search like Yahoo! or Google. Sure, there are lots of search engine optimization (SEO) or search engine marketing (SEM) tricks you can use to improve your ranking in the web search engines. But those have never really held any fascination for me.

Enterprise search -- now that's fascinating! It's much easier to tune an enterprise search engine to make the results you want float to the top. (Assuming, of course, you have access to your IT department to make the changes you want.) Weighting of metadata is a simple way to do this. Tools like Verity or Vivisimo make categorization, "best bets," and other changes to results lists easier to do. Though I have to admit, the librarian in me is very skeptical of the promises that those companies make. I don't trust their auto-classification engines to do as good a job as a person could (or to do it in the time they say it takes). And I firmly believe that having someone to care for and feed the classification/taxonomy/vocabulary/whatever-you-want-to-call-it is the best way to get good results.
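To make the metadata-weighting idea a bit more concrete, here is a minimal sketch in Python. It is not how Verity, Vivisimo, or any other product works; the field names, the weights, and the `score` and `search` functions are all made up for illustration. The only point is that a match in a heavily weighted field (like the title) pushes a document up the list.

```python
from collections import Counter

# Hypothetical field weights (an assumption for this sketch):
# a match in the title counts more than one in the keywords or body.
FIELD_WEIGHTS = {"title": 5.0, "keywords": 3.0, "body": 1.0}


def score(doc, query):
    """Sum weighted counts of query terms across the document's fields."""
    terms = query.lower().split()
    total = 0.0
    for field, weight in FIELD_WEIGHTS.items():
        # Naive whitespace tokenization; a real engine would do much more.
        counts = Counter(doc.get(field, "").lower().split())
        total += weight * sum(counts[term] for term in terms)
    return total


def search(docs, query):
    """Return the matching documents, highest score first."""
    scored = [(score(doc, query), doc) for doc in docs]
    hits = [pair for pair in scored if pair[0] > 0]
    hits.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in hits]


docs = [
    {"id": "a", "title": "Holiday schedule", "keywords": "hr",
     "body": "The travel policy is mentioned here once."},
    {"id": "b", "title": "Travel policy", "keywords": "travel expenses",
     "body": "How to book."},
]
results = search(docs, "travel policy")
```

Here document "b" comes out ahead of "a" because its matches are in the title and keywords rather than buried in the body. Changing the numbers in `FIELD_WEIGHTS` is the "tuning" step.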

Recently, I started looking into what is being called "vertical search." It takes the approaches traditionally used in enterprise search (like classifying results) and applies them to the web at large. Folks like Kosmix and Clusty are leading the charge. This sounds a lot like what Northern Light (remember them?) was doing back in 1999 and 2000. However, unlike Northern Light, which used people to come up with its categories (the blue folders), Kosmix and Clusty are using complex algorithms to determine what the web pages are about. Kosmix, for example, focuses on a subset of the web (e.g., travel, health, politics) and subdivides the results into different categories.

Just as with the enterprise search engines, I'm a bit skeptical about this approach. The classification that they are doing isn't very sophisticated (they use categories like "basic information" or "blogs"), but it is certainly more helpful than a list of thousands of results à la Google. It will be interesting to see where this goes. A hybrid approach using both algorithms and human-moderated categories seems like it would give the best results. Though I don't know of anyone really taking that kind of two-pronged approach. Do you?
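For what it's worth, here is roughly what I imagine a two-pronged approach could look like, as a small Python sketch. Everything in it is hypothetical: the `CURATED` table stands in for categories a person maintains, the keyword lists stand in for whatever algorithm a real engine would use, and the URLs are placeholders. A human-assigned category always wins, and the algorithm only fills in where no person has weighed in.

```python
# Hypothetical editor-maintained assignments, keyed by URL (placeholder data).
CURATED = {"http://example.com/paris-guide": "travel"}

# Hypothetical keyword lists standing in for a real classification algorithm.
CATEGORY_KEYWORDS = {
    "travel": {"flight", "hotel", "itinerary"},
    "health": {"symptom", "doctor", "treatment"},
}


def classify(url, text):
    """Return (category, source), preferring human-assigned categories."""
    # Prong one: a person has already categorized this page.
    if url in CURATED:
        return CURATED[url], "human"

    # Prong two: fall back to a crude keyword-overlap count.
    words = set(text.lower().split())
    best, best_hits = None, 0
    for category, keywords in CATEGORY_KEYWORDS.items():
        hits = len(words & keywords)
        if hits > best_hits:
            best, best_hits = category, hits

    if best is None:
        return "uncategorized", "none"
    return best, "algorithm"
```

Returning the source alongside the category means the pages that only the algorithm has touched can be queued up for a person to review later.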

About this Entry

This page contains a single entry by Chiara published on September 15, 2006 12:35 PM.
