Search-based Applications

From conversations I have had with IT managers, I sense a widely held assumption that enterprise search is now about as good as it is ever going to get. Although the basic principles of text search date back almost 50 years, the reality is that an immense amount of research and development in information retrieval is under way at present. For example, coming to a desktop near you before too long will be search tools based on topic modelling, which use Bayesian statistics and machine learning to infer the relationship between topics in a document. What is fascinating about this technique is that it dates back to the development of latent semantic indexing in the late 1980s, which was then refined into probabilistic latent semantic indexing a decade later. Now the buzz is about Latent Dirichlet Allocation (LDA), which itself has formed the basis for correlated topic models (CTM) and dynamic topic modelling (DTM).

These are techniques for text search, the traditional domain of enterprise search, but now search-based applications (SBAs) are being developed. SBA technology can be used to intelligently aggregate large volumes of unstructured data (such as web pages) and structured data (such as database content), and to make that data available in a highly contextual, quasi real-time manner. Some years ago I heard a memorable presentation from an IDC analyst working for Sue Feldman (a leading thinker about SBAs), which positioned enterprise search as the future information integration platform for the enterprise. SBAs will be able to fulfil that vision.

Gregory Grefenstette and Laura Wilber work for Exalead, but in Search-based Applications they have written a book that is admirably free of any sales pitch for the company. In just over 100 pages they discuss in detail the basic principles of searching databases, highlighting recent advances that make search-based applications possible. There are three good case studies at the end of the book, and a final chapter looking at the future of these applications with respect to the semantic and mobile webs. There is an excellent bibliography of over 70 references, which in itself illustrates the level of R&D in search technology and its applications.

Search technology is complex, but the authors have managed to present it in a very understandable way, so that both IT managers and business managers can appreciate the value that search-based applications can bring to their organisations. This is one of a very good series of books from Morgan & Claypool on search-related topics. I am, however, surprised that any publisher can offer a book that has no index. Overall this is an inexpensive but very valuable resource on a technology that is already available and has the potential to have a substantial impact on the organisational management of information.

Martin White