Sentiment analysis – a report on a Text Analytics Meetup, London

I’m currently working on the scope of the 2nd edition of ‘Enterprise Search’ and am very aware of the blurred boundary between search and text analytics. Sentiment analysis is a good example. A search application could locate references to specific products in a repository of call centre transcripts but, to take a deliberately simple example, could not indicate whether customers had become more positive about the product’s performance over time. This is where sentiment analysis (sometimes referred to as ‘opinion mining’) has a very important role to play.
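To make the contrast concrete, here is a minimal sketch of the lexicon-based approach to that call-centre example. Everything in it is invented for illustration (the word lists and transcripts are not from any real product); commercial tools use far richer models that handle negation, intensifiers and context.

```python
from collections import defaultdict

# Toy sentiment lexicon -- hypothetical word lists for illustration only.
POSITIVE = {"great", "reliable", "helpful", "fast", "happy"}
NEGATIVE = {"broken", "slow", "faulty", "unhappy", "poor"}

def sentiment_score(text):
    """Crude score for one transcript: +1 per positive word, -1 per negative."""
    words = text.lower().split()
    return sum((w in POSITIVE) - (w in NEGATIVE) for w in words)

def average_by_month(transcripts):
    """transcripts: list of (month, text) pairs -> {month: mean score}."""
    totals, counts = defaultdict(float), defaultdict(int)
    for month, text in transcripts:
        totals[month] += sentiment_score(text)
        counts[month] += 1
    return {m: totals[m] / counts[m] for m in totals}

calls = [
    ("2014-01", "the product is slow and faulty"),
    ("2014-01", "support was helpful but the device is broken"),
    ("2014-06", "the new firmware is fast and reliable"),
]
print(average_by_month(calls))  # → {'2014-01': -1.0, '2014-06': 2.0}
```

A search engine could only have retrieved the transcripts mentioning the product; the trend over time is what the sentiment layer adds.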

Yesterday over 80 attendees listened to two excellent presentations on sentiment analysis at a meeting of the London Text Analytics Meetup Group, sponsored by UXLabs. After fine refreshments courtesy of the Financial Times, the first speaker was Despo Georgiou, currently a Business Consultant at Atos SE. Her recent MSc dissertation at City University was an examination of the value of two commercial sentiment analysis applications (Semantria and TheySay) and two non-commercial applications (Google Prediction API and WEKA) in analysing documents from the health care sector. This was a very good general introduction to how to conduct application assessments.

She was followed by Dr. Diana Maynard, a Research Fellow in the Computer Science department of Sheffield University. Diana talked mainly about sentiment analysis applied to social media but in a concluding Q&A session proved able to provide a tremendous amount of insight into all aspects of sentiment analysis. Do take a look at Diana’s blog!

For me it was a very valuable meeting and put a lot of fragmented knowledge I have of this area into context. It was the 13th meeting of the Text Analytics Meetup but the first I had attended as a member, and I am looking forward to future meetings. If you want to know more about sentiment analysis a good place to start is to download both the short book on Sentiment Analysis and Opinion Mining by Bing Liu and a recent paper by Erik Cambria on New Avenues in Opinion Mining and Sentiment Analysis.

Martin White

At last, a UK intranet conference

Back in April I blogged about the lack of an independent intranet conference in the UK. Today I agreed to sponsor Intranet Now, styled as an un/conference for intranet and comms managers, which takes place at the Radisson Blu Hotel in Portman Square, London on 2 September. When I set up Intranet Focus Ltd in 1999 I made a conscious decision not to sponsor anything, so the very fact that I have changed my mind after 15 years of business should give you some idea of the importance I attach to the launch of this conference.

The concept was developed by Wedge Black and was almost immediately supported by Brian Lamb. They have acquired a stellar array of both sponsors and speakers and all they need now is a stellar number of delegates. The lunch sponsor is Interact Intranet, which is running its own event in October, so a special mention to Nigel and the team at Interact. Igloo are another lead sponsor. The agenda is a mix of 20-minute papers and 5-minute lightning papers together with some self-organised discussion sessions. Take a look at the agenda to get a full sense of what is on offer. It should be a memorable and stimulating day, and you should leave with a host of good ideas and a significantly larger network.

It is important to realise that Wedge and Brian are one-man businesses, so they have already made a very significant investment in the event. Now is the time for the intranet and internal comms communities to match it by turning up. This is an intranet conference developed by people who understand intranets, with speakers who have built and managed intranets and a programme full of case studies and lessons learned. What more can you ask for?

The early bird price is just £60 and that offer lasts until 25 July. After that the price is £120. You can book on Eventbrite and there is a Lanyrd page as well. Even if your organisation feels unable to find the ticket price, think of it as a self-investment in your career of just over £2 a week (£1 if you are quick!) for a year. Unbeatable value. Please make the decision to book as quickly as you can and match the commitment of Wedge and Brian to the UK intranet community.

Martin White


Gartner Magic Quadrant 2014 for Enterprise Search

For ten years in my somewhat diversified career I managed large IT market analysis teams, initially for IDC and then for Logica. Preparing vendor analyses was a core element of the work of the teams and we knew that subscribers would read every word and every line between the lines. One highlight was flying in HP’s corporate jet to present an independent view of HP’s server markets to meetings in Lausanne and then Cannes. Those were the days! So I have some idea of the challenges faced by Whit Andrews and Hanns Koehler-Kruener in developing the Gartner MQ for Enterprise Search.

Miles Kehoe has already published an assessment of the report highlighting some of its apparent inconsistencies, and I would broadly concur with his comments. Criticisms of the MQ are usually levelled both at the selection of vendors and at the comments about them. I was certainly very surprised to see IHS listed when Funnelback was relegated to a passing mention. The exclusion of SharePoint on the basis that it is not sold as an independent product is understandable but skews the analysis, as my guess would be that the combined installed base of SharePoint search makes it the most widely adopted of all search applications. However it is important to remember that Gartner clients are able to talk to Whit and Hanns about their views on the market, and that what is released is just a summary of the main outcomes of their research. It must have been a greater challenge than usual this year as open-source applications grow rapidly in sophistication and adoption.

From my perspective as an independent search consultant the MQ does enable me to have good discussions with clients, as all the major vendors are set out in the document and I can add value to the Strengths and Cautions analysis based on my own experience. It also enables me to start discussions about what clients actually want from a search application, and what trade-offs will be acceptable. Sadly the Real Story Group has discontinued its reports on search software and Forrester seems to be ignoring enterprise search, certainly from a vendor comparison viewpoint. Fortunately Dave Schubmehl and his colleagues at IDC track the search and content analytics markets, but as with Gartner the reports come with a significant price tag if you are not a subscriber. My overall view is that the search community should be grateful to Gartner for (in effect) releasing the report through some of its clients. (I downloaded it from Coveo – thank you!) It is a good starting point for discussion, and Gartner would be the first to emphasise that the published version of the MQ should not be seen as the sum of all the knowledge the consultancy has about this sector.

Martin White

Digital workplaces have suppliers and customers – building digital bridges

Much of what I read about digital workplaces seems to make the assumption that if an organisation reaches the upper levels of digital workplace maturity, especially in terms of enterprise social network adoption, then business and employee performance will be transformed. This is what I see as a “sceptred isle” approach to digital workplace strategy. In reality every digital workplace depends on building excellent relationships with suppliers and customers and facilitating the flow of information along the entire supply line. A recent column in Forbes by Rawn Shah sums up the importance of these relationships in just 500 words. The column is entitled ‘Building Bridges Beyond Your Corporate Collaboration Island’ and should be pinned to the desktop of anyone engaged in digital workplace strategy and implementation. The second section of the column considers what IBM are doing with Connections to make building these bridges far less of an IT nightmare.

There are also two related reports from Accenture on this topic. ‘Making Cross-Enterprise Collaboration Work’ was published in 2012. In the opinion of the authors of the report, to drive a new era of growth, companies will increasingly be required to collaborate with enterprises outside their corporate boundaries. They go on to say that doing so successfully requires coordinated attention to a range of human capital strategy issues covering talent, leadership, culture and organization. The requirement to work with the entire supply chain is also the topic of a recently released Accenture report on the need to link big data analytics across the supply chain.

Many (though by no means enough) organisations conduct employee engagement surveys, and increasingly use these as one of the metrics in assessing the benefits of digital workplace deployment. I would suggest that these surveys need to be extended to the supply chain to see if the organisation is one that others can do business with. It could be that the organisation is far too focused on building an internal collaborative environment which is very difficult for suppliers and customers to take full advantage of. Certainly a digital workplace strategy needs to identify core suppliers and customers and ensure that there are good channels of communication, and even joint application development, to ensure that cross-enterprise collaboration is as effective as it needs to be.

Martin White

An information security manager perspective on search

Varonis is in the business of data and information management application software and over the last few years has made a significant investment in producing surveys and briefing papers on a wide range of information management issues. The company has recently released a short but well written survey of the attitudes of the information security community to enterprise search, based on 300 responses from attendees at two information security events, one in the USA and one in the UK. Overall the level of adoption was only 30%, with a further 8% planning to make an investment in search.

The survey provides a novel security-management focus to the adoption and use of enterprise search. Nearly 70% of the respondents were concerned that users would find information that they did not have permission to see. 30% were concerned that file permissions were not adequate to support enterprise-wide search and a further 30% were not sure. I will admit to not having seen information security as a potential barrier to enterprise search adoption but the results of this survey have caused me to change my mind! Information security is rightly at the top of IT management issues and the evidence from this survey is that security concerns could trump business advantage when it comes to considering a search investment.

When asked the reasons for not having enterprise search, 28% felt it was too expensive, 15% felt it was too difficult to deploy and, amazingly, 15% felt that native search (surely not Windows?) was good enough. To me this seems to be quite a low level of both expectation and understanding of search in the information security community.

There is a fairly narrow focus to this survey, but the initiative of Varonis in undertaking and publishing the research is most welcome because of the insights it provides into search from an information security perspective.

Martin White


Information governance vision and reality at AIIM Forum London

Judging by the level of attendance (probably around 200 delegates) at the AIIM Forum in London on 25 June, information governance is a very hot topic. Indeed what was scheduled as a roundtable discussion hosted by IBM with spaces for 20 participants attracted an audience of over 70! Although attendance was free, thanks to a wide range of sponsors, delegates still had to justify a day away from their desks. In total there were over 20 papers in three tracks, with a mixture of case studies, roundtable discussions and three plenary presentations. Two of these were given by John Mancini (AIIM President) and Doug Miles (Director, AIIM Market Intelligence) on future directions in information governance and information management. The third was given by Urs Raas (HP Autonomy) who talked about the future of ECM without turning it into a sales pitch for HP. All three were very well presented.

Two other presentations stood out. The IBM “standing-up roundtable” was managed with great skill by Roger Johnston, Information Lifecycle Governance lead at IBM Europe. He was the only speaker of the day that I heard focus on the business value of information governance. In the course of the session he asked for quite a number of show-of-hands responses to core questions, and I was dismayed by the low level of adoption of information governance at an organisation-wide level even though this was the core role of most of the participants. The fundamental problem seems to be that CEOs still do not understand the importance of information governance.

I was also very impressed by the presentation on the way in which the BBC has adopted collaboration and mobile applications. This was given by Mark Kelleher, Head of Business and Mobile Systems Delivery at the BBC. He and his team had to find secure solutions to support 20,000 employees and a constantly changing group of 15,000 contractors. Clearly the BBC is well on the way to being a digital workplace with total location-independence of information access.

Overall it was a very interesting day of presentations and discussions with exhibitors. However the event did show up that information governance and information management are not really synonyms. The focus was mainly on acquisition, storage and records management, with making effective use of information assets somewhat lower on the agenda. AIIM should be congratulated on running a well-organised event at no cost to attendees and for its commitment to the cause of effective information management.

Martin White

Knowledge Quotient – a new corporate information management benchmark from IDC

Every year a great many surveys are published about various aspects of information management but most are quite small scale. Drawing generic conclusions from them can be a risky business. So when I started to read the latest report from David Schubmehl and his colleagues at IDC I was immediately impressed by the fact that the survey covered 2155 organisations in 6 countries. The report is entitled The Knowledge Quotient: Unlocking the Hidden Value of Information Using Search and Analytics. The aim of the project was to understand the current state of information access using a matrix of measurements of process capability, technology, the sharing and reuse of information and the extent to which the organisation sees information as an asset. From the survey IDC identified about 10% of the survey cohort as being in the 90th percentile, and defined these as KQ Leaders. The results of the survey are then segmented into KQ Leaders and Others.

Looking at the outcomes I was tempted to call the second group the KQ Losers, as their scores are substantially poorer, to the point where I wondered how they manage without information. To take just the metric of high satisfaction with intranet search, the Leaders were at 90% and the Others at just over 10%. I’m in the right business! Again, 72% of the Leaders (against just 25% of the Others) cited unstructured information access and analysis as very important for revenue growth. One of the themes of the report is a growing requirement for unified information access across multiple repositories and applications.

At the end of the report IDC report on the time spent by knowledge workers on gathering information. Currently this is 16% of the working week, and almost half of this work is wasted because the relevant information cannot be found. Over the years IDC have published a number of reports on this topic (directed by Sue Feldman) and looking back at the 2003 report it seems we have not made any significant improvements in finding information. This matches similar results for the period from 2008 to 2014 from the Digital Workplace Trends report. The technology has certainly improved, so in my view that only leaves organisational commitment to information management as the cause of the problem. Perhaps IDC can do some cross-tabbing to find out.

The research was sponsored by 9 technology vendors. Of these, Coveo and Lexalytics clearly wanted to be very visible in the report and in my opinion the case studies disrupt the flow of the report. Less is sometimes more. However I would also acknowledge that research on this scale is expensive and we should be grateful for the vendor support of the project. I downloaded my copy from Coveo. Overall there is a wealth of related information in the report and it will provide quantitative data for my presentations and reports for quite some time. I hope that in due course the IDC team will provide some vertical sector analyses of the survey data.

Martin White

Making metadata work – a report on an ISKOUK seminar

Making metadata work was the theme of a day of workshops and presentations organised by ISKOUK in London on 23 June with the support of the Information Retrieval Specialist Group of the BCS and the Dublin Core Metadata Initiative. I was unable to attend either of the workshops but did have the honour of starting off the afternoon session of six papers on various aspects of metadata management. My theme was the value of metadata in improving search performance, with particular reference to the importance of date metadata in enterprise search and the need to place metadata within an information management strategy. I was followed by Sean Bechhofer (University of Manchester) talking about the wonderfully named Wf4Ever project which is developing a metadata framework for ‘research objects’ arising from the process of scientific research and publication. The issue of data provenance was (deservedly) given a lot of prominence. Next up was Professor Mark Sandler (Queen Mary University of London) talking about the challenges of tagging music recordings with metadata, a project that QMUL are working on with Goldsmiths College, London.

After a break Richard Ranft (British Library) presented an overview of the work being undertaken to tag the 8 million sound recordings held in the BL’s Sound Archive. A European perspective was provided by Antoine Isaac, a member of the team developing the Europeana service which acts as a portal to a wide range of museum and other specialised collections in the EU. The final paper was given by Dr Andy MacFarlane on the Photobrief project being undertaken by the Centre for Interactive Systems Research at City University.

There is no doubt that metadata is critically important in effective information discovery, but the intellectual challenges in developing robust yet extensible schemas are immense, especially when dealing with non-text objects. One of the issues that was mentioned in passing, but probably needs a seminar of its own, was how people adding content to a repository can tag it consistently without adding significantly to the overall process time. I’ve seen many CMS products where the focus is on the end-user experience, with seemingly little concern about the way in which contributors can be supported in the tagging process. Overall it was an excellent event and ISKOUK should be congratulated on putting together a programme that attracted around 80 delegates.
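At the simplest level, the tagging-consistency problem comes down to checking contributor tags against a controlled vocabulary and suggesting the nearest approved term. Here is a hypothetical sketch of that idea; the vocabulary and function names are my own invention, not any CMS vendor’s API.

```python
import difflib

# Hypothetical controlled vocabulary -- invented for illustration only.
VOCABULARY = ["enterprise search", "sentiment analysis",
              "information governance", "metadata", "taxonomy"]

def normalise_tags(proposed):
    """Accept approved tags, fuzzy-match near misses, flag the rest."""
    accepted, unresolved = [], []
    for tag in proposed:
        tag = tag.strip().lower()
        if tag in VOCABULARY:
            accepted.append(tag)
            continue
        # Suggest the closest approved term, if one is similar enough
        match = difflib.get_close_matches(tag, VOCABULARY, n=1)
        if match:
            accepted.append(match[0])
        else:
            unresolved.append(tag)
    return accepted, unresolved

print(normalise_tags(["Metadata", "taxnomy", "blockchain"]))
# → (['metadata', 'taxonomy'], ['blockchain'])
```

Even something this basic, surfaced at the point of contribution, would catch the misspellings and case variants that quietly fragment a repository’s metadata.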

Martin White

The three clicks rule – it was about bandwidth not navigation

There are some advantages to being in at the start of the Internet and building applications based on videotex technology. Along the way I had the pleasure of meeting Dr Peter Cochrane, who from 1994-1999 was Head of Research for British Telecom (which developed the videotex service) and went on to be Chief Technologist for BT prior to his retirement in 2000. Peter was, and indeed still is, a charismatic public speaker and in the late 1990s was a very frequent speaker at conferences. I still remember having dinner with him after a conference I had organised, at which he ranged widely and with tremendous insight over every branch of technology.

So what is the connection between Peter and the ‘three clicks rule’? Today I was alerted via Twitter to a blog post about the fact that the three clicks rule is not helpful in designing web navigation. This came as no surprise to me because the three clicks rule was proposed by Peter in around 1998/1999 and had nothing whatsoever to do with navigation structure optimisation. Peter was a telecoms engineer and his concern was over bandwidth and latency as use of the Internet increased while it was still largely running over copper rather than optical fibre. It was typically taking 15-20 seconds for a page to load in response to a click, so three clicks might take a minute to complete. If you think this is unbelievable, read this paper from 2004 on tolerable waiting times for web users!

His concern was that unless there was a significant network capacity upgrade worldwide, the delay for a user needing even as few as three clicks would be such that they would give up the task, and that as a result the Web might never fulfil its potential. He wrote this around 1998:

“Who would like a three click, one second, no handbook world? Drill down to anything you want in three clicks of a mouse, and it appears on your screen in under a second! No need to read a handbook, no training – just the application of intuition – an obvious and easy to use interface for everyone. The only prospect of realising this dream relies on ‘end-to-end’ optical fibre and significant improvements in network and computer protocols, interfaces and software.”

This is just to put the origin of the three clicks rule on the current record and hopefully to stop people wasting their time trying to prove or disprove a half-remembered proposition.

Martin White

The role of computational linguistics in making search work

It is not easy to understand how search works. The usual explanation is along the lines of crawling content, creating an index and then matching the query term against the index. If only it were that simple. In reality a great deal of processing is going on in the background, much of it using techniques from computational linguistics. These techniques enable search applications to extract entities from text, interpret natural language queries, generate summaries and analyse the structure of words and sentences up to the point of semantic analysis. These are all ‘wicked problems’ but are essential to any search application. Computational linguistics emerged from early work in the 1960s on machine translation. There is a good account of the history and applications of the science in the Stanford Encyclopedia of Philosophy.
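As a reference point for that "create an index and match the query" explanation, here is a minimal inverted-index sketch. The toy documents are my own invention; real engines layer tokenisation, stemming, ranking and the linguistic processing described above on top of this skeleton.

```python
from collections import defaultdict

# Toy document collection -- invented for illustration.
docs = {
    1: "enterprise search and text analytics",
    2: "sentiment analysis of call centre transcripts",
    3: "enterprise sentiment dashboards",
}

# Build the inverted index: term -> set of document ids containing it.
index = defaultdict(set)
for doc_id, text in docs.items():
    for term in text.lower().split():
        index[term].add(doc_id)

def search(query):
    """Return ids of documents containing every query term (AND semantics)."""
    postings = [index.get(term, set()) for term in query.lower().split()]
    return set.intersection(*postings) if postings else set()

print(search("enterprise sentiment"))  # → {3}
```

The hard part, as the post argues, is everything this sketch leaves out: knowing that "transcripts" and "transcript" should match, that "call centre" is one concept, or what a natural-language query actually asks for.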

The main reason why I’m blogging on this topic is that I am often told that there are no developments taking place in search. Certainly some of the core principles date back to the 1960s and 1970s but under the surface there is a considerable amount of research being undertaken in both information retrieval and computational linguistics. Because the problems that need to be solved are complex it takes time to find solutions. Indeed sometimes a research paper can do no more than clearly state the problem. Good examples include recent papers on deciding how to manage queries that include a reference to a percentage, and on parsing models for identifying multiword phrases.

Both these papers were published in the journal Computational Linguistics. This is an open access journal, so copies of all the papers published can be downloaded at no cost. It is well worth browsing through some of the back issues to get a sense of the scope and scale of computational linguistics research, and the challenges in ‘understanding’ language in a way that it can be processed by a computer. The rapid adoption of open source search solutions is very likely to reduce the time taken from research to implementation, so some of the techniques described in the papers could be available to developers in months rather than years. Even when they do emerge onto the desktop it may be very difficult to spot exactly what has changed behind the scenes, but overall search performance and satisfaction are likely to increase quite substantially over the next few years as a result of computational linguistics.

Martin White