Deep Text – deep insights on text analytics from Tom Reamy

This may seem odd but it was not until I started to write a short chapter on text analytics in the 2nd Edition of Enterprise Search that I realised how little I knew about the topic. Though to be honest that is a core justification for me writing books – I find out what I don’t know. In the case of text analytics Tom Reamy (KAPS) knows all there is to know and has presented it with great clarity in his new book Deep Text, recently published by Information Today Inc. When you are an academic writing books is part of the job description and often academics take sabbaticals to do the writing. In the case of independent consultants who need to make a living writing a book gets in the way of the day job to a very significant extent. Tom has been an advocate of text analytics for years, and on numerous occasions at conferences in the USA he has tried to get me to see the light. With this book he has succeeded quite brilliantly.

The value of this book is not in the descriptions of the technology but in the detailed presentation of how the technology can be used to gain business advantage. The 420-page book has 15 chapters in five sections. The sections are:

  • Text analytics basics
  • Getting started in text analytics
  • Text analytics development
  • Text analytics applications
  • Enterprise text analytics as a platform

The writing style is very conversational, which I like because you feel Tom is alongside you trying to educate and enthuse you at the same time. The technology descriptions are quite good enough to understand how the analytics process works. That is sufficient for a book of this type, where the focus is on educating potential business users about the value of text analytics and not on acting as a reference work for computer science students. Along the way Tom tells stories about some of his clients (with due anonymity) and these help considerably in providing a context for his advice. I especially liked the two chapters on how to establish an enterprise analytics department, which mirror my own views on enterprise search departments. At the end of each chapter there are some useful references to published case studies and books. The list of text analytics vendors is a little on the short side. I'm working on a major study of the text analytics market and have already found around 150 vendors. But a list of that length also illustrates the scale of the text analytics business, with companies offering software and services often doubling their revenues each year. My only disappointment is the index, which is not up to the standard of the book.

Don’t think for one moment that because you are in ‘search’ you do not need to bother with ‘text analytics’. Wrong. The overlap is significant and getting closer each year, as I commented recently in a CMSWire column. Every search manager should have a copy of this book, because in many companies they may be the first person who gets asked about how to take advantage of text analytics. This book will give you all the answers and is the definitive book on the business possibilities of the technology. If you want to learn more about Tom and his work, he was interviewed by Stephen Arnold a couple of years ago.

Martin White

Intranet Focus – 2016 so far so good!

It’s been an interesting few months with scarcely time to breathe, and my blog and Twitter contributions have been minimal. Much of the first few months of the year was spent working on an enterprise search strategy for a global law firm. An element of that was to specify the profile for the firm’s first full-time enterprise search manager, who is now in post. During April I had the very enjoyable experience of working on a proposal with Paul Corney and Chris Collison for a KM strategy project for a large UK-based not-for-profit organisation. The brief was poor and we spent some time coming up with a workable approach to the project. On 10 June we went to present our approach to the organisation’s project team, an occasion notable for no relevant questions being asked of us. The indication was that they would select the contractor within a week or two. It was not until early August that we had a two-sentence rejection letter. I feel sorry for the contractor that won the project! As far as we know all our competitors had the same letter, so I have to wonder if the project has been cancelled, perhaps as a result of reading the proposals. Prospective clients need to be aware that the consulting business in this sector is a small world. As there is more than enough work for everyone we do tend to talk to each other!

Later in June, in the course of our holiday, I found myself negotiating with one of the world’s leading pharma companies for an intranet assessment project. The project started two days after our return from vacation with an absolute requirement to complete in mid-August. No pressure then. Luckily I’d asked Sam Marshall to work with me, and we managed to exceed the client’s expectations for quality and quantity and still meet all the milestones. But this took every working day since mid-July. A challenging project because of the deadline, but it was a pleasure to work with the client and with Sam. The Intranet Focus business year ends in September and it will be the second best since I set the business up in 1999.

Now the in-tray is clear, and I’m in catch-up mode. I have good books to review from Tom Reamy, Agnes Molnar, Michael Sampson and Ryen White, and hope to have the reviews up by the end of the week. I’m also working on a study of the global text analytics market which is giving me valuable insights into the increasingly blurred boundary between text analytics and enterprise search, but that’s a longer-term project. Although still over a month away, I’m looking forward to taking part in the Intranet Now event in London on 30 September, with 27 presentations packed into the day plus some exchange-of-experience sessions. My presentation will be a 7-minute history of intranets, which starts off in 1965! Both James Robertson and Michael Sampson are going to be in London in September and we have plans to meet up and catch up. Slightly further away, I am looking forward to the Findwise Findability Day in Stockholm on 27 October. This year I am taking a year out from speaking so that I can listen and learn.

Now then, which book do I review first?

Martin White


Intranet Now Diamond Award 2016 – open for nominations

Having just written a history of intranets I am aware of how many people have made significant contributions to the development and adoption of intranets and yet have never been recognised. If you have never heard of Jennifer Stone Gonzalez and Steve Tellen (for example) they will feature in my presentation at Intranet Now. It is a pity that they will not be in the audience. Towards the end of the conference on 30 September the winner of the Intranet Now Diamond Award will be announced, and Sam Marshall will have the pleasure of handing over a very heavy (but ever so elegant!) chunk of glass.

The Intranet Now Diamond Award is unique in that it is awarded to an individual for their remarkable contribution to the community at large. Wedge and Brian are now seeking recommendations for the 2016 Award. There are of course many intranet managers who have made a significant impact on their organisation, often as a team of one. However, they are looking for someone who is committed to raising the awareness of good intranet practice amongst the wider intranet community in the UK. It’s not as if Wedge and Brian do not know potential candidates, but they are firm believers in the wisdom of crowds and so would like to know who you respect as an intranet guru.

Both Steve and Jennifer would have been worthy winners in 1993 and 1998, but that’s a story for my presentation. So could you look through the list of the people you follow on Twitter and the blogs you monitor? But please bear in mind that Wedge and Brian are looking for someone who participates in our community and does not just observe it. To me there is one obvious candidate… I wonder if you agree? Information on how to nominate someone is on the Intranet Now site.

Martin White

Recommind finds an open door at OpenText

The announcement that Recommind had sold out to OpenText does not seem to have raised much in the way of comment. Both companies go back some way into the timeline of information discovery. OpenText started life at the University of Waterloo as the outcome of a project to digitize the Oxford English Dictionary. I worked on this project in 1984 when at Reed Publishing, as readers of the 1st edition of Enterprise Search will be aware. The company now has revenues of around $2 billion and the presentations to the May 12 Investor Day make interesting reading, if only because the word ‘search’ does not appear at all! Along the way OpenText has acquired 53 companies, many of which at the time were positioned as the next best thing to happen to the market. RedDot and Vignette come to mind. In the distant past OpenText also acquired Basis, a seriously good search application developed by a team at Battelle in the early 1970s, through its purchase of Information Dimensions.

Recommind is also the outcome of a research project, in this case the development of Probabilistic Latent Semantic Analysis (pLSA) by Thomas Hofmann, though the original paper in 1999 referred to it as Probabilistic Latent Semantic Indexing. It is one of a number of probabilistic topic approaches to information discovery. Although the technology is very different to that used by Autonomy, there is a common interest in finding patterns in text which go beyond what are often described as ‘simple keyword approaches’, even if ‘simple’ is a substantial misnomer. In the new world of open source search Recommind is just about as proprietary as you can get, and that makes for some problems in trying to optimise performance. Where Recommind has made a particular mark on e-Discovery is in the area of predictive coding for the analysis of texts submitted in legal cases. This has been widely adopted in the USA and is now recognised in the UK, which could have been a catalyst for the acquisition given the importance of the UK market to OpenText. The e-Discovery market is highly competitive, with kCura, FTI Technology, Nuix (of Panama Papers fame), Zylab and HP alongside Recommind in the Leader quadrant of the 2015 Gartner e-Discovery Magic Quadrant.
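For readers unfamiliar with the technique, pLSA factorises a document-term matrix into a small set of latent ‘topics’, so that each document is described as a mixture of topics rather than by its keywords alone. The sketch below is emphatically not Recommind’s implementation; it uses scikit-learn’s NMF with a Kullback-Leibler loss, which is known to be mathematically equivalent to pLSA, and an invented four-document corpus, purely to illustrate the idea.

```python
# pLSA factorises a document-term matrix into P(word|topic) and P(topic|doc).
# scikit-learn has no pLSA estimator, but NMF minimising a generalised
# Kullback-Leibler divergence is mathematically equivalent, so it serves here.
from sklearn.decomposition import NMF
from sklearn.feature_extraction.text import CountVectorizer

docs = [  # invented mini-corpus: two 'legal' documents, two 'search' documents
    "patent litigation discovery court filing",
    "court ruling predictive coding discovery",
    "search engine index ranking relevance",
    "relevance ranking search query terms",
]
X = CountVectorizer().fit_transform(docs)        # document-term count matrix
model = NMF(n_components=2, beta_loss="kullback-leibler",
            solver="mu", max_iter=500, random_state=0)
doc_topics = model.fit_transform(X)              # rows ~ topic mixture per document
```

On a real collection the two factor matrices give you, for every document, a weighting over topics, which is the pattern-finding that goes beyond keyword matching.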

OpenText has acquired Recommind for $163 million, which at (as a guess) 20-times earnings puts the company at $8m earnings on $80 million revenues. For a $2B company this is not a big buy. For comparison OpenText acquired Vignette for $310 million in 2009. What happens next is anyone’s guess, and that probably goes for the sales teams at both OpenText and Recommind given the OpenText track record. Because it is a small unit ($80 million/$2 billion) I can’t see it being retained as a stand-alone unit after the closure of the transaction in 2017. Just how Recommind is going to fit into OpenText is not yet easy to work out, as Recommind has a range of information governance applications as well as the Decisiv search application. The key executives will stay around because they will have earn-out agreements, but other staff may well be brushing up their CVs. This could make life difficult for ongoing support for Recommind clients as there is very little external expertise available from search system integrators. It will also be interesting to see what happens to the recent partnership between Recommind and BAInsight.

In an ideal world OpenText would be wise to capitalise on the innovations that pervade the Recommind technology and make wider use of it in other ECM applications. Somehow, based on the history of the 53 other acquisitions, I’m not going to hold my breath. Based on the Investor Day presentations ‘search’ is not a core element of OpenText strategy.

Martin White

Relevant Search – Doug Turnbull and John Berryman

The user requirement for a successful search is very easy to state: users want the items that are most relevant to their query to appear on the first page (or at worst the first two pages!) of results. Delivering this is a far greater challenge than users and search managers imagine. The very fact that Relevant Search, written by Doug Turnbull and John Berryman, runs to over 330 pages gives an immediate illustration of the scope and scale of relevance management. I often use the metaphor of looking at an automobile engine. In principle we all know how the engine works, but when it doesn’t work to perfection all we can do is look at the collection of modules and wires and wonder just what we have to do to restore the performance. That’s when an engineer with plug-in diagnostic equipment is essential. They can not only spot the problem but also know the system well enough to sort it out.

The reason for presenting this metaphor is that the authors have written this book for relevance engineers. This to me is a new job profile but one that I can immediately relate to. The book presents all that a relevance engineer requires to understand how to go about improving relevance, and this requires a good knowledge of information retrieval principles and also of how these principles are best translated into software code. I should state up front that the examples in the book show code for Elasticsearch or Solr open source software, but that should not be seen as limiting the book to open source implementations. Indeed, seeing the code will help the reader understand what is going on in any enterprise search application. After all, SharePoint 2010/2013 uses the same BM25 ranking model that is now in Lucene v6.
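For readers who want a feel for what a ranking model such as BM25 actually computes, here is a minimal Python sketch. It implements the standard BM25 formula (term frequency saturated by k1, document-length normalisation controlled by b, and a Lucene-style IDF); the toy corpus is invented for illustration, and real implementations differ in details.

```python
import math

def bm25_score(query_terms, doc, corpus, k1=1.2, b=0.75):
    """Score one tokenised document against a query using the BM25 formula."""
    N = len(corpus)
    avgdl = sum(len(d) for d in corpus) / N          # average document length
    score = 0.0
    for term in query_terms:
        df = sum(1 for d in corpus if term in d)     # docs containing the term
        idf = math.log(1 + (N - df + 0.5) / (df + 0.5))  # Lucene-style IDF
        tf = doc.count(term)                         # raw term frequency in this doc
        norm = k1 * (1 - b + b * len(doc) / avgdl)   # length normalisation factor
        score += idf * (tf * (k1 + 1)) / (tf + norm)
    return score

corpus = [
    "the quick brown fox".split(),
    "enterprise search relevance tuning".split(),
    "search engines rank documents by relevance".split(),
]
scores = [bm25_score(["search", "relevance"], d, corpus) for d in corpus]
best = max(range(len(corpus)), key=lambda i: scores[i])  # shorter matching doc wins
```

The defaults k1 = 1.2 and b = 0.75 are the values Lucene’s BM25Similarity uses; tuning them is exactly the kind of work the book assigns to the relevance engineer.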

The eleven chapters in the book cover debugging a relevance problem, understanding the role of tokens, basic multi-field search, term-centric search, shaping the relevance function, providing relevance feedback, designing a relevance-focused search application, the relevance-centered enterprise and advanced search techniques. There is no other book that I know of that manages to integrate both information retrieval and search management so successfully, with just enough IR fundamentals to show the origin of a relevance problem and the basis for a solution which can be expressed in code. I especially value the way in which the examples are based on a ‘real’ collection of information, The Movie Database. Since we all have a familiarity with movies, this for me makes the book come alive.
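As a taste of what the multi-field search chapters work through, here is a sketch of an Elasticsearch multi_match query that searches movie titles and overviews together, boosting title matches. The field names and query string are illustrative assumptions on my part, not a quotation from the book.

```python
import json

# A "best fields" multi_match query: matches in the title field count twice
# as much as matches in the overview field. Field names and the query string
# are illustrative, TMDB-style examples.
query = {
    "query": {
        "multi_match": {
            "query": "basketball with cartoon aliens",
            "fields": ["title^2", "overview"],   # ^2 boosts the title field
            "type": "best_fields",               # score by the best single field
        }
    },
    "size": 5,                                   # first page of results only
}

request_body = json.dumps(query, indent=2)       # what would be POSTed to /_search
```

Deciding on the field list, the boosts and the match type is precisely the kind of decision the book teaches the relevance engineer to make and to test.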

The quality of the content is not matched by the quality of the publishing format. This review is based on the e-book version. Although there is a list of sub-headings in the PDF version, the lack of an index makes it almost impossible to dip into the book to find an explanation of a feature or a solution to a problem. The writing style is very conversational, but this results in a lot of words with apostrophes, often where they are not needed. Overall the copy editing is patchy.

I cannot recommend this book strongly enough. It is certainly not just for ‘developers’. Search managers, and of course relevance engineers, need to appreciate the fundamentals of search technology and good practice in relevance management even if they are working with commercial applications. Students on computer and information science courses will also find it of great value and hopefully be inspired to follow a career in relevance engineering.  All I missed was a consideration of relevance management in federated search implementations, but I’m sure that the authors are saving this for the next edition.

Martin White

Intranet Content Migration – a guide to good practice

Intranets and content management software (CMS) applications both have service lifetimes of probably 4-5 years, although this can sometimes be extended with strong initial and ongoing implementation. Intranet teams will have the experience and expertise needed to develop an upgraded intranet on an existing CMS but will rarely have the experience to migrate to a new CMS, especially where there is a requirement to introduce a new information architecture, to reduce the amount and improve the quality of the content, and perhaps to implement a new search application. As a result, planning and executing an intranet content migration project becomes a very considerable challenge.

Intranet Content Migration is co-authored with David Hobbs, a leading authority on website content migration. As far as we are aware this is the first briefing paper to be published specifically on intranet migration. We have set out to present what in our experience is good practice for intranet content migration, based on some major projects we undertook individually and together in 2014 and 2015. Although the principles are similar to website content migration, there are a number of specific technical and governance challenges that need to be addressed. Particular attention is paid to the benefits of undertaking a comprehensive planning process ahead of the commencement of migration, focusing on a content inventory process that enables informed decisions to be made on the amount of content that needs to be migrated and on the extent to which this can be accomplished using content rules rather than a time-consuming inspection and migration of each content item.

Other topics covered in this Research Note are the importance of effective risk management, the need to work through the implications for the search application for the intranet, the requirement to have a well-designed and supported communications programme and the importance of deciding how the progress of migration will be reported. Appendices list a set of ten critical success factors and some additional resources on content migration.

Martin White

The Organisation in the Digital Age – 2016 survey now open

Much of my career has been in the B2B market research business, notably with International Data Corporation and then Logica. The IT sector has always been awash with research reports from vendors seeking to justify their market position and pricing, as well as many boutique companies offering high quality research in a small sector. The value of the IDC and Logica services was that each year they used the same core methodology to highlight trends in market growth over a five year period and yet included questions in the survey which took account of recent developments. It was hard work.

All the more remarkable then that this year Jane McConnell is working solo on the 10th of her annual surveys, which started out with intranets and now assess the extent to which organisations are making a commitment to working digitally. This year the survey for The Organization in the Digital Age report is in two parts. The Core part (59 questions), streamlined from previous years, takes approximately 30 minutes. The optional Extended part (37 questions) is for organizations that want to do a deeper dive into their digital transformation. All participants receive a copy of the final report The Organization in the Digital Age 2016 (Core or Extended), as well as the Scorecard for their organization, which is optional and free.

The innovations this year are a customised snapshot report and sponsorship opportunities for research supporters. The snapshot report is available to organisations that are able to arrange for six or more people to complete the survey. They receive a 3-page summary of the consolidated results providing a snapshot from different viewpoints: functions, business lines, or countries, depending on the role of the respondents. This year vendors, digital agencies, technology and service providers, and others can participate as a Research Supporter through a sponsorship package. This brings visibility in the report, and a chance to communicate their messages to a high-potential audience.

Although the benefits to organisations of having a global perspective on digital workplace adoption are significant, I know that many organisations welcome the opportunity to use the survey as a means of bringing together their digital leaders to exchange views on how adoption is taking place in specific departments and divisions. Even if the team only spends a morning together to complete the survey, the near-term and long-term benefits will be substantial. I have seen too many organisations in which digital innovations are their best-kept secrets! The publication later this year of both this survey and the Findwise Findability Survey will once again provide us with dependable insights into the level of commitment to digital working that can be used in planning for 2017 and beyond.

Martin White


Defining and managing information quality

For the last three years I have been supporting major projects that involve content migration and enterprise search. A primary objective of both migration and search is ‘to improve information quality’, but in the projects I have been involved with little attention has been paid to defining the parameters of information quality and putting in place policies and processes to improve quality. The reason for not doing so is that the staff resources required are significant, and because there is no corporate commitment by the organisation to information quality it is all but impossible to gain the support required to at least start the journey towards information quality improvement. It is indeed a journey; there are no quick fixes.

In general organisations seem unaware of the significant amount of work that has been undertaken on defining information quality standards and guidelines, dating back to pioneering work at MIT in the early 1990s that recognised information had to be fit for purpose and not just ‘accurate’. A very good resource on the development of information quality management is a book entitled The Philosophy of Information Quality, published by Springer in 2014. This book is a collection of contributions on all aspects of data and information quality, edited by Luciano Floridi and Phyllis Illari. The quality of the contributions is very high, but for some unaccountable reason there is no index to the book. Springer clearly does not have a commitment to information quality! A similar book on Data and Information Quality is about to be published by Springer, and it will be interesting to see if an index is provided. There is an earlier book on Managing Information Quality from Springer which was published in 2006.

MIT remains at the heart of information quality management. It organises an annual conference, which in 2016 takes place in Spain from 22-23 June. The papers from previous conferences can be downloaded from the conference archive. The International Association for Information and Data Quality (IAIDQ) also organises an annual conference. It should be noted that in the context of work on information quality there is no differentiation between data and information, though there are initiatives, notably around ISO 8000:2011, where the emphasis is on master data management. The Association for Computing Machinery (ACM) publishes the Journal of Data and Information Quality but access is limited to ACM members. A good overview of the challenges of managing information as an enterprise asset (pdf download) is provided by Nina Evans and James Price, based in Australia.

The purpose of this post is to summarise some of the resources that are available in the area of information quality management. As I have mentioned above there are no quick fixes but information professionals should certainly ensure that they are aware of the substantial amount of work that has been published and is currently being undertaken.

Martin White

Enterprise search management as a ‘wicked problem’

In 1973 Horst Rittel and Melvin Webber authored a paper entitled ‘Dilemmas in a General Theory of Planning’ (Policy Sciences 4 (1973), 155-169). In this paper they set out the basis for what they regarded as ‘wicked problems’, which were beyond the capacity of traditional methods to resolve. In particular wicked problems cannot be addressed by a linear project management methodology because of the multi-dimensional nature of the problems that need to be resolved. Over the last few years a design thinking approach has been used with some success. Design thinking in management is a creative process, in which after gathering information (often through ethnographic techniques) the manager approaches problems through imagining possible solutions, rather than analysing the existing issue reductively. A key element in resolving wicked problems is that the leader’s role is in asking questions in order to help define the complexity of the problem facing the organisation and create conditions for ‘collective responsibility’ in addressing it, rather than the traditional expectation that they will offer a solution.

All too often I find that organisations are treating enterprise search as a project. At the end of the project the team is dispersed, and whatever quality was there at launch gradually fades away. The complexity of the workflow between the content being indexed and then found is rarely appreciated. If it doesn’t meet requirements then it must be the technology! In my experience that is very rarely the case.

I have created a table that looks at enterprise search as a ‘wicked problem’. Looking at the 16 elements of a wicked problem shows that traditional waterfall or even agile project approaches are totally unsuited to enterprise search applications. The requirement is to work as a team across multiple elements of an enterprise search implementation, with a leader who has the experience to challenge and then work with the team to resolve each element. Even then there is a high probability that not all the elements can be resolved, which is why enterprise search applications need to be well supported by a search team after a nominal implementation. Earlier this week I was talking with Darron Chapman at CBResourcing, one of the most experienced recruitment consultants in the information and knowledge management sectors here in the UK. We agreed that the demand for experienced search managers was well in excess of supply and that salary requirements were very much on the high side. Organisations are now recognising that enterprise search is indeed a wicked problem and there are just not enough people around to solve all the problems. That raises another problem – where can people get a thorough training in enterprise search that is vendor-neutral and covers both commercial and open source applications?

Martin White


Organisation culture – what do the ‘buzz words’ actually mean?

Organisations like to embroider their internal and external communications with statements about their corporate culture and direction. “Unparalleled expertise across our wide range of solutions” comes from Gartner, just as an example to hand. So just what does ‘unparalleled expertise’ mean, and how might it translate into other languages? In French ‘une expertise inégalée’ is close but not a strict translation. Last year I was working with a company with its headquarters in London but major offices around Europe and Asia. A substantial acquisition had taken place a couple of years prior to my engagement, and now that the dust had settled the communications team had decided that it was time for a new corporate message to be promoted. The team decided that the core term was the ‘bold’ steps that the company was taking. I had occasion to speak to several senior directors in Germany who were very upset by this decision, as the English concept of bold does not have a single direct German equivalent. The German words fett, mutig, kühn, fettgedruckt, dreist, and verwegen are all close but convey slightly different concepts.

Things get more complicated in companies headquartered in countries which do not have English as the local language. Multinational companies often use English as a lingua franca (ELF) but when it comes to abstract concepts like ‘bold’, ‘leading edge’ and ‘visionary’ should the words emerge from a discussion in the HQ national language or through a discussion in ELF? I’ve just been reading a very interesting case study of how a Norwegian company set about defining its corporate values, taking into account that it had subsidiaries in 10 countries. One of these countries was China and the case study has some very interesting quotes from both Norwegian and Chinese managers about the issue of communicating corporate values.

Intranet managers in multi-national companies would do well to read this case study, as it has implications for the extent to which ELF corporate values guidelines need to be carefully translated into other languages. In the case of the Norwegian company translations were made into German and Chinese for local purposes, but not into Norwegian because the company wanted to make a statement about its adoption of ELF even in Norway. For example Norwegian managers were not allowed to exchange emails in Norwegian with Norwegian colleagues working in overseas subsidiaries.

A conclusion from the case study is that multi-national companies should not develop culture statements in English and then rely on a translation into other languages. There should be a discussion with people speaking all the national languages present in the company (many of which HQ may well not be aware of!) so that the words selected can be rendered in these languages in a way that supports, rather than possibly negates, the corporate direction. Even if there is a close translation the very fact that the decision on the values was made by people speaking English as their mother language may send the wrong signals to a linguistically diverse workforce.

Martin White