The Future of Big Data in Healthcare

[bibshow file=futurebigdatahealthcare.bib sort=firstauthor order=asc]
In the United States and other countries, healthcare service providers are rapidly adopting Electronic Health Records (EHRs). These records are real-time, digital versions of patients’ charts. They provide a variety of patient-centered information such as medical history, diagnoses, and medications. In 2014, 83% of physicians in the United States used EHRs, compared to just 42% in 2004 [bibcite key=heisey2015any]. A number of other countries have adopted the technology even more widely: as of 2012, over 90% of healthcare service providers in Australia, the Netherlands, New Zealand, Norway and the United Kingdom had adopted EHRs [bibcite key=schoen2012survey].

Additionally, through progress in healthcare and technology, and the ensuing societal changes, people are becoming more conscious of their health and actively seek to better manage their health and lifestyle [bibcite key=lymberis2003smart]. With improvements in computer technology, mobile health monitoring has become available, both through dedicated devices and through “apps” on personal mobile phones.

Most importantly, these developments are generating datasets of previously unattainable size. As a result, hospital-centered healthcare is changing to patient-centered care. Where the goal was formerly to find general treatments for ailments, it is now shifting towards early detection of risks, prevention of further complications, and improvement of treatments through patient feedback. Though reducing costs should not be a primary motivator in healthcare, it has been estimated that the use of big data methods in healthcare could reduce baseline United States healthcare costs by 12% [bibcite key=kayyali2013big]. In this essay we will look at leading-edge developments related to big data in healthcare. Recognizing those developments, we will consider which near-future accomplishments can be expected and we will reflect on the desirability thereof.

Knowledge Dissemination

Currently, the rich EHR datasets are widely used as an archive and not as a central means to improve healthcare efficiency. Healthcare researchers see the application of big data analytics as an opportunity to gain and disseminate new knowledge [bibcite key=health2015health]. Importantly, physicians indicate that they struggle to keep their knowledge of the latest developments in clinical practice up to date; the sheer number of publications makes it difficult for physicians to read all relevant studies, even within their own specializations.

Practitioners cannot keep up with the great number of research articles published

By using the EHR datasets in a manner analogous to how commercial services have used theirs (e.g., sending recommendations to customers based on comparable customers’ purchases), a likely achievement will be the development of treatment recommendation systems. With such a system, physicians could receive recommendations based on treatments employed by their colleagues for the same ailment in similar patients, thereby implicitly sharing knowledge [bibcite key=murdoch2013inevitable]. A major hurdle in developing such a system is measuring similarity between patients; it is not necessarily clear which factors influence treatment outcomes. Nonetheless, even before similarity measures have been developed thoroughly, healthcare providers could benefit greatly from such systems in the near future.
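To make the idea of recommending treatments from similar patients more concrete, the following is a minimal sketch and not a description of any existing system. It assumes each patient can be reduced to a few numeric features that are already scaled to comparable ranges, which is exactly the hard part noted above. The feature names, the records, the treatment labels and the `recommend` function are all made up for illustration.

```python
from collections import Counter
from math import sqrt

# Hypothetical, made-up records: (features, treatment, outcome_was_good).
# Features are assumed to be pre-scaled to comparable ranges.
RECORDS = [
    ({"age": 0.30, "bmi": 0.40, "diabetic": 0.0}, "treatment_a", True),
    ({"age": 0.35, "bmi": 0.45, "diabetic": 0.0}, "treatment_a", True),
    ({"age": 0.32, "bmi": 0.38, "diabetic": 0.0}, "treatment_b", False),
    ({"age": 0.80, "bmi": 0.70, "diabetic": 1.0}, "treatment_b", True),
    ({"age": 0.75, "bmi": 0.65, "diabetic": 1.0}, "treatment_b", True),
    ({"age": 0.78, "bmi": 0.72, "diabetic": 1.0}, "treatment_a", False),
]


def distance(a, b):
    """Euclidean distance between two patients over the feature names of `a`."""
    return sqrt(sum((a[k] - b[k]) ** 2 for k in a))


def recommend(patient, records, k=3):
    """Return the treatment with the most good outcomes among the k most
    similar patients, or None if none of them had a good outcome."""
    nearest = sorted(records, key=lambda r: distance(patient, r[0]))[:k]
    votes = Counter(treatment for _, treatment, good in nearest if good)
    return votes.most_common(1)[0][0] if votes else None


new_patient = {"age": 0.33, "bmi": 0.42, "diabetic": 0.0}
print(recommend(new_patient, RECORDS))  # -> treatment_a
```

The plain Euclidean distance here treats every feature as equally important. A real system would need a similarity measure that reflects which factors actually influence treatment outcomes.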

Personalized Care

Treatments for ailments are generally devised through randomized controlled trials. However, some groups of people are underrepresented in such trials [bibcite key=unger2011randomized,konrat2012underrepresentation]. Prescribing treatments based on results in similar patients is important: drugs that work for some people might be less effective or even dangerous for others. For example, the results of a treatment might differ depending on whether a woman is pregnant or not. Indiscriminately prescribing the same treatment with no regard for differences in efficacy or risk is undesirable. Yet differences in treatment efficacy go largely unnoticed when treatments are based only on randomized controlled trials, especially for minority groups. Relatedly, it is often unclear which factors put a person at risk of developing a certain ailment.

Observational data about people at risk for certain ailments and the results of performed treatments are naturally available through EHR databases. Sadly, the research by [bibcite key=neuvirth2011toward], though successful, is one of only a few examples of machine learning applications for personalized healthcare. Due to the success of such approaches and because of the spread of EHRs, it is likely that more such approaches will be developed over the coming years. Nevertheless, it is unlikely that we will see systems providing actionable results in the near future.


The Fitbit, an activity tracker

A large number of patient deaths are caused by medical errors [bibcite key=kohn2000err], and, though it is likely a slight overstatement, medical errors are sometimes cited as the fifth leading cause of death [bibcite key=hayward2001estimating]. Medical errors are often caused by drug interactions and wrong diagnoses, due to erroneous or incomplete information [bibcite key=kohn2000err]. The information required to confidently diagnose and treat a patient could be made available through monitoring devices and persistent patient medical information services (e.g., EHRs) [bibcite key=varshney2007pervasive]. Pervasive monitoring devices with persistent connections to medical facilities will allow for real-time monitoring of patients. Additionally, by employing methods such as Lock In Feedback, monitoring devices could facilitate finding optimal dosages in treatments based on rapid quantitative and qualitative measurements [bibcite key=kaptein2015lock]. When a patient unexpectedly enters a critical health condition, facilities can be notified immediately and can respond as deemed appropriate.
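To give a feel for how a dosage could be tuned from rapid repeated measurements, here is a minimal sketch of the general lock-in idea under our own simplifying assumptions: the dose is varied slightly in a regular oscillation, and the part of the measured response that moves in step with that oscillation indicates whether the response rises or falls with the dose. This is not necessarily the exact procedure of the cited work. The `lock_in_feedback` function, its parameters and the simulated `response` curve (with its optimum placed at a dose of 5) are all hypothetical.

```python
import math
import random


def lock_in_feedback(observe, x0, amplitude=1.0, period=20, rate=0.1, batches=50):
    """Estimate the input that maximizes a noisy response by oscillating it.

    observe(x) must return one noisy measurement of the response at input x.
    """
    omega = 2 * math.pi / period
    for _ in range(batches):
        total = 0.0
        for t in range(period):
            wave = math.cos(omega * t)
            y = observe(x0 + amplitude * wave)  # measure at a slightly varied input
            total += y * wave                   # keep the part in step with the wave
        # Over a full period, the mean of y * wave approximates
        # (amplitude / 2) times the local slope of the response curve.
        slope = (total / period) * 2 / amplitude
        x0 += rate * slope                      # move the center uphill
    return x0


# Simulated, made-up response curve: best at dose 5, with measurement noise.
rng = random.Random(0)


def response(dose):
    return -(dose - 5.0) ** 2 + rng.gauss(0, 0.5)


estimate = lock_in_feedback(response, x0=2.0)
print(round(estimate, 1))  # should land close to the simulated optimum of 5
```

Multiplying each measurement by the oscillation averages out noise that is unrelated to the dose, which is why many small, rapid measurements are useful here.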

Such devices already see limited use, for example as wearable blood pressure and activity monitors. It is likely that more devices will be released with greater functionality in the near future. For example, the research group My Movez is developing a mobile phone application with which they plan to monitor a broad array of children’s physical play activities.

Difficulties and Concerns

The datasets in healthcare are large and difficult to manage [bibcite key=raghupathi2014big]. A hurdle in taking advantage of EHRs is elegantly captured by the “80 percent rule”, which states that 80% of business-related information exists in unstructured form; this is most likely also true for medical information. Information about patients is often written in natural language. As such, developing natural language processing approaches to extract useful information from these documents is a necessary step towards improving system performance [bibcite key=murff2011automated]. Additionally, due to the sensitive nature of medical data, records are often partially censored, further complicating their use [bibcite key=neuvirth2011toward].
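As a small illustration of what extracting structured information from free text means, the sketch below pulls medication doses out of a made-up clinical note using a regular expression. This is a deliberately crude stand-in for real natural language processing: the note, the pattern, the list of units and the `extract_doses` function are all hypothetical, and the pattern would miss most real-world phrasing.

```python
import re

# Matches a word followed by a number and a unit, e.g. "metformin 500 mg".
# The list of units is an assumption made for this example.
DOSE_PATTERN = re.compile(
    r"\b(?P<drug>[A-Za-z]+)\s+(?P<amount>\d+(?:\.\d+)?)\s*(?P<unit>mg|mcg|g|ml)\b",
    re.IGNORECASE,
)


def extract_doses(note):
    """Return (drug, amount, unit) tuples for each dose mention in the note."""
    return [
        (m["drug"].lower(), float(m["amount"]), m["unit"].lower())
        for m in DOSE_PATTERN.finditer(note)
    ]


# Made-up note text, not taken from any real record.
NOTE = (
    "Patient reports dizziness. Started Metformin 500 mg twice daily; "
    "continue lisinopril 10mg."
)

for dose in extract_doses(NOTE):
    print(dose)
# ('metformin', 500.0, 'mg')
# ('lisinopril', 10.0, 'mg')
```

Even this toy example shows why the problem is hard: misspellings, abbreviations, negations ("stopped metformin") and context all fall outside what a simple pattern can capture.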

Medical information about people is privacy-sensitive, and the data stored by healthcare organizations are appealing targets for criminals. Data protection by design is regarded as an important part of data-driven systems, and should be seen as a system requirement [bibcite key=hoepman2014privacy]. Because the developments outlined above require several parties to have access to patient data, transmitting data is a necessity. The privacy concerns in transmitting such data are severe [bibcite key=danezis2015privacy]. For example, in the Netherlands patients could not be granted access to their personal EHRs through the internet because of inadequate protection offered by the national identity management platform, even though it was already used for providing access to other sensitive information [bibcite key=jacobs2008beveiligingseisen]. These concerns are intensified when considering health monitoring devices that provide constant supervision and tracking [bibcite key=li2015adoption].

Besides privacy concerns, the usefulness of health monitoring devices depends on the willingness of patients to use, and learn to use, those devices. As such, it is important to understand the needs of patients and to design the products with a patient-centered approach. With the elderly being perhaps most reluctant to use new technologies to improve health, it is reassuring to learn that the elderly are generally ready to use new technologies when these facilitate independent living [bibcite key=mikkonen2002user].


In this essay we have looked at recent developments in healthcare and at which developments we can expect with regard to prevention, diagnosis and treatment. The increasingly widespread EHRs facilitate many big data approaches to healthcare. We can expect the development of methods to find new knowledge in these large datasets, as well as means to disseminate knowledge more effectively to practitioners. With the development of pervasive monitoring devices, the number of medical errors can be reduced and response times to critical health conditions improved. By adopting and developing these methods to take advantage of increased data availability, physicians can expect to provide treatments for their patients more efficiently.

Above all, these developments will allow for a paradigm shift in healthcare: moving from a hospital-centered approach that develops treatments for a non-existent “average patient” to a patient-centered approach that takes personal factors into account when determining optimal treatments.
