IPsoft recently started working with NHS Digital, the national information and technology partner to the health and social care system in England, to develop a conversational data concierge known as “ViDA” (Virtual Data Assistant). ViDA is based on our industry-leading cognitive agent, Amelia, and lets users around the world search NHS Digital data through an intuitive conversational interface that they can access directly from their web browser.
ViDA has just passed the halfway point in a 12-week beta pilot. You can find out more about our partnership here, but in this blog post we want to explore how we developed this ambitious project and how we addressed the challenges we encountered along the way.
An Always-Available Data Concierge
Data within the NHS helps to design and deliver products and services across a wide range of clinical and operational activities.
As part of their commitment to digital innovation and customer service, NHS Digital wanted to explore the use of alternative channels of interaction, such as cognitive virtual agents, to provide better and faster access to data and information on services.
One of NHS Digital’s key responsibilities is collecting, managing and disseminating data and information from the NHS in England. This includes data captured in hospitals, GP surgeries, mental health services and many other settings. More than 287 new publications are added to the NHS Digital website each year, many of them in open-access format, and they are used by researchers, journalists, NHS employees, charities and many other stakeholders.
NHS Digital operates a standard website built on top of a content management system (CMS), which allows users to independently search for data and service information. However, feedback suggests that users find it difficult to locate what they need and many people end up calling the NHS customer contact centre to resolve their inquiry. NHS Digital wanted to build a conversational assistant to make this process more efficient and the user experience more fulfilling.
Let’s review the scope of the project. ViDA helps users find data and information through an intuitive conversational interface. Specifically, it is designed to assist users in three main areas:
- Locating relevant data sets: NHS Digital publishes many data sources, but they are not always easy to find. ViDA helps users locate them quickly. For example, if a user asks “Can I see any mental health stats on hospital admissions by region?” ViDA will provide links to relevant resources such as Mental Health Services Monthly Statistics.
- Answering users’ questions: ViDA isn’t merely a conversational FAQ. When possible, it also suggests interesting key facts that may be relevant to the user’s enquiry. For example, if a user asks “How many people in England are diagnosed with dementia?” and an answer is provided, ViDA may also offer a key fact such as: “Of the 650 thousand people aged over 65 estimated to have dementia as of 28 February 2019, 67.9% have a coded dementia diagnosis [Recorded Dementia Diagnoses, 2019].”
- Learning about data access processes: ViDA can explain how customers can obtain bespoke or tailored data extracts through NHS Digital’s range of data access services. For example, if a user asks “I don’t think you publish what I need, can the analysts run a special report for me?” ViDA will guide them to the service they need.
In addition to these three main use cases, there is a range of FAQs and standard responses for situations where a user asks about other NHS Digital services, or about the NHS either in general or on specific topics.
Finding the Signal in the Noise
Intent recognition refers to ViDA’s ability to take a user’s input (e.g., “Can I update my personal NHS data here?”) and map it to the most appropriate conversational response. In this example, ViDA would map the input to a FAQ response about managing personal NHS data. This functionality is at the heart of the solution.
In most Artificial Intelligence (AI) projects, the machine learning (ML) models that power this functionality are trained on dozens, or in some cases hundreds, of sample user inputs. When it came to designing ViDA, there were two particular challenges that a traditional ML approach might not be able to address:
- A high degree of overlap and similarity between different user inputs. ViDA has to answer similar-sounding inputs with the correct, distinct responses. For example, consider the overlap between these two inquiries: “Do you have info about dementia patients on antipsychotics?” and “Do you know the number of dementia patients on antipsychotics?” Both utterances share many of the same words in much the same order, but the users are seeking different information. This makes it significantly harder to disambiguate between them using a traditional ML approach.
- An almost limitless variety of potential user inputs. Visitors looking for information through NHS Digital may use a wide range of terms in their questions, from the relatively straightforward “hospital” to complex medical terms like “decompressive craniectomy.” Training an ML model to cover such a vast array of terms would probably require tens of thousands of training examples.
To address both challenges, the development team took a slightly unconventional approach to training ViDA. Essentially, it created a training data set that focuses on the structural shape of a user’s question (addressing the first challenge). To do this, it removed all the “substance” words, which could occur in practically any intent (addressing the second challenge), and replaced them with random-character placeholder string tokens (xx). This is coupled with extensive use of key terminology data stored in structured “grammars,” which allow ViDA to fill in placeholders with actual values where appropriate.
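To make the placeholder idea concrete, here is a minimal Python sketch, assuming a toy terminology grammar and a handful of already delexicalized training templates. Everything in it is hypothetical: `GRAMMAR`, `TRAINING_TEMPLATES`, `delexicalize`, `classify` and the intent labels are illustrative names, and `difflib.SequenceMatcher` stands in for the trained ML classifier a real system like ViDA would use to match templates.

```python
import re
from difflib import SequenceMatcher

# Hypothetical terminology grammar: "substance" terms mapped to a category.
# ViDA keeps such terms in structured grammars; these entries are illustrative.
GRAMMAR = {
    "dementia": "condition",
    "antipsychotics": "treatment",
    "decompressive craniectomy": "procedure",
    "hospital admissions": "activity",
    "mental health": "topic",
}

# Hypothetical training data: substance words already replaced by "xx",
# so each example captures only the structural shape of the question.
TRAINING_TEMPLATES = {
    "do you have info about xx patients on xx": "find_dataset",
    "do you know the number of xx patients on xx": "key_fact",
    "can i see any xx stats on xx by region": "find_dataset",
    "do you have data on xx": "find_dataset",
}

# One alternation, longest terms first, so multi-word terms win over shorter ones.
_TERM_RE = re.compile(
    r"\b("
    + "|".join(re.escape(t) for t in sorted(GRAMMAR, key=len, reverse=True))
    + r")\b"
)


def delexicalize(utterance: str) -> tuple[str, list[tuple[str, str]]]:
    """Replace grammar terms with 'xx'; return the template and the removed values."""
    text = re.sub(r"[^\w\s]", "", utterance.lower())  # drop punctuation
    slots: list[tuple[str, str]] = []

    def _swap(match: re.Match) -> str:
        term = match.group(1)
        slots.append((GRAMMAR[term], term))  # remember the value for later filling
        return "xx"

    template = _TERM_RE.sub(_swap, text)
    return " ".join(template.split()), slots


def classify(utterance: str) -> tuple[str, list[tuple[str, str]]]:
    """Pick the intent of the closest training template; keep slot values for filling."""
    template, slots = delexicalize(utterance)
    best = max(TRAINING_TEMPLATES,
               key=lambda t: SequenceMatcher(None, template, t).ratio())
    return TRAINING_TEMPLATES[best], slots


if __name__ == "__main__":
    for question in (
        "Do you have info about dementia patients on antipsychotics?",
        "Do you know the number of dementia patients on antipsychotics?",
        "Do you have data on decompressive craniectomy?",
    ):
        print(question, "->", classify(question))
```

The two dementia questions collapse to different templates, so the structural difference between them survives, while a rare term such as “decompressive craniectomy” is picked up by the grammar rather than by extra training examples.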
At go-live, ViDA’s intent recognition rate was measured at over 95%. After five weeks of exposure to real-world user inputs, it is at times reaching 89%, and it is improving each week as we further refine and calibrate the model. This initial drop is a typical pattern and in line with other implementations. Our aim is to achieve 90%+ by the end of the 12-week pilot program.
Designing the User Experience
The first challenge in designing the user experience was finding appropriate ways to manage user expectations. The team quickly discerned that some users would want to use the conversational interface as a de facto search engine, others would expect the “machine” to know answers to every NHS data question under the sun, and there would always be a small group determined to use the solution to deal with their own personal health and care issues.
Part of the answer was very clear signaling, from the launch page onwards, of ViDA’s scope and limitations. This is designed to set the correct expectations up front, so that users are not unduly disappointed when ViDA cannot check the opening times of their nearest pharmacy or give them a potted history of their own medical record.
Beyond setting expectations, it was essential to build flexibility into typical user interactions. This means providing some “guiderails” to help steer the conversation using prompts, buttons and chat notes, but also allowing free form input (text or speech) throughout the solution. We used a modular, looped conversation design which provides a reasonable balance between fixed and dynamic routes.
The last major challenge with this deployment related to the quality of the existing metadata, the data about the data. ViDA requires very extensive metadata, so NHS Digital’s metadata had to be developed further to meet ViDA’s requirements. As this is a pilot implementation, designed to test acceptability and gather useful feedback, it was agreed that the CMS would not be amended at this stage.
Instead, the team created an intermediate database to define and manage the necessary metadata: topics, data types, hierarchies of terms, typical synonyms and published facts. This metadata was drawn from NHS Digital publications using web scraping tools and further curated by NHS analysts. Once the pilot is complete, the team will have valuable insight into any substantive changes that might be considered for future CMS development and for integration with ViDA.
One final data challenge, related to the metadata issue, was presenting results to users in order of priority and relevance. The current implementation is built on two core principles: 1) order the results by the similarity between the metadata tags implied by the user’s request and the structured tags stored in the database, and 2) sort the results by how closely the user’s question or request matches the stored product name and description in natural language. By combining both aspects, ViDA should be able to present the right data to NHS Digital customers in most cases.
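The two ranking principles can be sketched in Python as follows. This is one plausible reading, assuming tag similarity is the primary sort key and the natural-language match breaks ties. The Jaccard tag score, the simple word-overlap text score, the `Publication` class and the catalogue entries (including their descriptions and tags, and the invented “Example Mental Health Bulletin”) are illustrative assumptions rather than ViDA’s actual scoring. The request tags are passed in directly, since how ViDA infers them is outside this sketch.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Publication:
    """Hypothetical catalogue entry: product name, description and curated tags."""
    name: str
    description: str
    tags: frozenset[str]


def words(text: str) -> set[str]:
    """Lower-cased word tokens, punctuation ignored."""
    return set(re.findall(r"\w+", text.lower()))


def tag_similarity(request_tags: set[str], item: Publication) -> float:
    """Principle 1: Jaccard overlap between implied request tags and stored tags."""
    union = request_tags | item.tags
    return len(request_tags & item.tags) / len(union) if union else 0.0


def text_similarity(question: str, item: Publication) -> float:
    """Principle 2: share of the question's words found in the name and description."""
    q = words(question)
    doc = words(f"{item.name} {item.description}")
    return len(q & doc) / len(q) if q else 0.0


def rank(question: str, request_tags: set[str],
         catalogue: list[Publication]) -> list[Publication]:
    """Sort by tag similarity first, then by natural-language match (both descending)."""
    return sorted(
        catalogue,
        key=lambda p: (tag_similarity(request_tags, p), text_similarity(question, p)),
        reverse=True,
    )


# Illustrative catalogue; descriptions and tags are made up for the example.
CATALOGUE = [
    Publication("Recorded Dementia Diagnoses",
                "Monthly counts of patients with a coded dementia diagnosis",
                frozenset({"dementia", "statistics"})),
    Publication("Example Mental Health Bulletin",
                "Illustrative annual bulletin on mental health services",
                frozenset({"mental health", "statistics", "admissions"})),
    Publication("Mental Health Services Monthly Statistics",
                "Monthly statistics on mental health services, including hospital admissions",
                frozenset({"mental health", "statistics", "admissions"})),
]

if __name__ == "__main__":
    question = "Can I see any mental health stats on hospital admissions by region?"
    for pub in rank(question, {"mental health", "admissions", "region"}, CATALOGUE):
        print(pub.name)
```

A weighted sum of the two scores would be an equally reasonable way to combine them; a lexicographic key keeps the metadata match in charge and uses the wording of the question only to order otherwise comparable results.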
We’re very excited about our project with NHS Digital and hope that we can be a powerful bridge between users and data. The project is now live and open to all, and we welcome any feedback.