Next generation search engines

Yesterday I attended 2006 Update, a conference on document management organised by the Belgian and Dutch document management organization.


Prof Joost Duflou from the University of Leuven gave an interesting presentation on next generation search engines. First, documents are scanned for relevant words. Then these words are stemmed: controlled, controlling, controllers, … are all converted to control. Next, a weight is applied to each stemmed term. The weight grows with the number of occurrences of the term in the document and with the number of documents in which the term does not appear, so terms that are rare across the collection count for more. Mathematically, the documents are now vectors in a vector space, with one dimension per stemmed term. Using regular vector mathematics, the distance between two documents can be calculated very easily. Documents which are "close" are also similar.
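To make the steps above concrete for myself, here is a minimal Python sketch of the idea. It is not the algorithm from the presentation: the suffix-stripping `stem` function is a crude stand-in for a real stemmer, the smoothed weighting formula is one common TF-IDF variant that I picked myself, and I use cosine similarity (1.0 means same direction, 0.0 means no shared terms) as the measure of closeness. All function names and sample sentences are my own.

```python
import math
import re
from collections import Counter


def stem(word):
    """Crude suffix stripper, a stand-in for a real stemming algorithm."""
    for suffix in ("ers", "ing", "ed", "er", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            word = word[: -len(suffix)]
            break
    # Collapse a doubled final letter: "controll" -> "control".
    if len(word) > 3 and word[-1] == word[-2]:
        word = word[:-1]
    return word


def tokenize(text):
    """Lowercase the text, keep alphabetic words, and stem each one."""
    return [stem(w) for w in re.findall(r"[a-z]+", text.lower())]


def tfidf_vectors(documents):
    """Turn each document into a sparse vector {stemmed term: weight}."""
    counts = [Counter(tokenize(d)) for d in documents]
    n = len(documents)
    # Number of documents in which each term appears at least once.
    doc_freq = Counter(term for c in counts for term in c)
    vectors = []
    for c in counts:
        vectors.append({
            # Assumed weighting: term count times a smoothed inverse
            # document frequency, so rare terms weigh more.
            term: tf * (math.log((1 + n) / (1 + doc_freq[term])) + 1.0)
            for term, tf in c.items()
        })
    return vectors


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse vectors."""
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


docs = [
    "The controller is controlling the valve",
    "Controllers control valves",
    "Champagne and music at the conference",
]
vectors = tfidf_vectors(docs)
print(cosine_similarity(vectors[0], vectors[1]))  # the two "control" texts
print(cosine_similarity(vectors[0], vectors[2]))  # unrelated texts
```

The first two sample texts should score as much closer to each other than either does to the third. A distance, if you prefer one, is simply 1 minus the similarity.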

Queries now run on the distance between vectors. The query itself is converted into a vector, and the documents closest to it are the relevant hits.
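A query can be sketched the same way: turn it into a vector and sort the documents by their distance to it. To keep this example short I made some simplifying assumptions: the vectors are plain word counts without stemming or weighting, the distance is 1 minus the cosine similarity, and the function names and the three sample documents are made up.

```python
import math
import re
from collections import Counter


def to_vector(text):
    """Plain word-count vector (no stemming or weighting in this sketch)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine_distance(a, b):
    """1 minus the cosine similarity; 1.0 means nothing in common."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return 1.0 - dot / norm if norm else 1.0


def search(query, documents, top_k=3):
    """Return the names of the documents closest to the query."""
    q = to_vector(query)
    ranked = sorted(
        documents,
        key=lambda name: cosine_distance(q, to_vector(documents[name])),
    )
    return ranked[:top_k]


documents = {
    "engines": "search engines turn documents into vectors",
    "profiles": "user profiles are built from read documents",
    "party": "champagne and music after the conference",
}
print(search("search documents", documents))
```

The text that shares both query words comes first, the one that shares a single word second, and the unrelated one last.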

Users get a profile based on the documents they have published and read. This profile is treated in the same way, so people also become vectors. With simple mathematics, documents and people become similar objects that one can search for, and the difference between tacit and explicit knowledge disappears.
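How a profile is actually computed was not spelled out, so the following is only my guess at the simplest possible version: a person's vector is the sum of the vectors of the documents they have read. The document vectors, the reading history and the names `build_profile` and `nearest` are all invented for the illustration. The point is that people and documents end up in one space and can be ranked by the same query.

```python
import math
from collections import Counter


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse term vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def build_profile(vectors):
    """Assumed profile rule: add up the vectors of the documents read."""
    profile = Counter()
    for vector in vectors:
        profile.update(vector)
    return profile


def nearest(query, space, top_k=2):
    """Names of the objects (documents or people) closest to the query."""
    ranked = sorted(
        space, key=lambda name: cosine_similarity(query, space[name]), reverse=True
    )
    return ranked[:top_k]


# Hand-made term vectors for three hypothetical documents.
doc_vectors = {
    "doc:vector-search": Counter({"vector": 2, "search": 2}),
    "doc:stemming": Counter({"stem": 3, "search": 1}),
    "doc:champagne": Counter({"champagne": 2, "music": 1}),
}
# Hypothetical reading history per person.
reading_history = {
    "person:alice": ["doc:vector-search", "doc:stemming"],
    "person:bob": ["doc:champagne"],
}

# Documents and people live in the same vector space.
space = dict(doc_vectors)
for person, read in reading_history.items():
    space[person] = build_profile(doc_vectors[name] for name in read)

query = Counter({"vector": 1, "search": 1})
print(nearest(query, space))
```

With this sample data, a query about vector search returns a document first and a person second: the same search finds both what is written down and who is likely to know about it.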


This algorithm is now used by the TINK search product of ICMS. Hans van Heghe, CEO of ICMS, wrote a book on how to deal with information.


I have not read the book yet, but I ordered a copy and will test the product one of these days.


The champagne and the music at the end of the conference were also interesting.



