The Future of Big Data in Healthcare

In the United States and other countries, healthcare service providers are rapidly adopting Electronic Health Records (EHRs). These records are real-time, digital versions of patients’ charts. They provide a variety of patient-centered information such as medical history, diagnoses, and medications. In 2014, 83% of physicians in the United States used EHRs, compared to just 42% in 2004 [4]. A number of other countries already have higher adoption rates, with over 90% of healthcare service providers in Australia, the Netherlands, New Zealand, Norway and the United Kingdom having adopted EHRs by 2012 [18].

Additionally, through progress in healthcare and technology, and the ensuing societal changes, people are becoming more conscious of their health and actively seek to better manage their health and lifestyle [12]. With improvements in computer technology, mobile health monitoring has become available, both through dedicated devices and through “apps” on personal mobile phones.

Most importantly, these developments are generating extensive datasets of previously unattainable proportions. As a result, hospital-centered healthcare is changing to patient-centered care. Where the goal formerly was to find general treatments for ailments, the goal is now shifting towards early detection of risks, prevention of further complications, and improvement of treatments through patient feedback. Though reducing costs should not be a primary motivator in healthcare, it has been estimated that the use of big data methods could reduce baseline healthcare costs in the United States by 12% [8]. In this essay we will look at leading-edge developments related to big data in healthcare. Recognizing those developments, we will consider which near-future accomplishments can be expected and we will reflect on the desirability thereof.

Knowledge Dissemination

Currently, the rich EHR datasets are widely used as an archive and not as a central means to improve healthcare efficiency. Healthcare researchers see that the application of big data analytics is an opportunity to gain and disseminate new knowledge [3]. Importantly, physicians indicate they struggle to keep their knowledge of the latest developments in clinical practice up to date; the great number of publications makes it difficult for physicians to read all relevant studies, even if only in their own specializations.


If EHR datasets are used in the way commercial services have used their own datasets (e.g., sending recommendations to customers based on comparable customers’ purchases), a likely achievement will be the development of treatment recommendation systems. Using such a system, physicians could receive recommendations based on treatments employed by their colleagues for the same ailment in similar patients, thereby implicitly sharing knowledge [14]. A major hurdle in developing such a system is measuring similarity between patients: it is not necessarily clear which factors influence treatment outcomes. Nonetheless, even before similarity measures have been developed thoroughly, healthcare providers could benefit greatly from such systems in the near future.
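To make the idea of a treatment recommendation system concrete, here is a minimal sketch in Python. It is not a description of any existing system: the patient records, the three features, the treatment names and the use of plain Euclidean distance are all assumptions made for illustration. As noted above, choosing a meaningful similarity measure is exactly the hard part, so the `distance` function should be read as a placeholder.

```python
from collections import Counter
from math import sqrt

# Hypothetical records: (feature vector, treatment given, outcome was good?).
# The features are illustrative only: (age / 100, BMI / 50, has_diabetes).
RECORDS = [
    ((0.65, 0.56, 1.0), "drug_A", True),
    ((0.70, 0.60, 1.0), "drug_A", True),
    ((0.68, 0.58, 1.0), "drug_B", False),
    ((0.30, 0.44, 0.0), "drug_B", True),
    ((0.25, 0.40, 0.0), "drug_B", True),
    ((0.28, 0.46, 0.0), "drug_A", False),
]


def distance(a, b):
    """Euclidean distance; a stand-in for a clinically validated similarity measure."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def recommend(patient, records, k=3):
    """Rank treatments by how often they succeeded among the k most similar patients."""
    nearest = sorted(records, key=lambda r: distance(patient, r[0]))[:k]
    successes = Counter(treatment for _, treatment, ok in nearest if ok)
    return [treatment for treatment, _ in successes.most_common()]


new_patient = (0.67, 0.57, 1.0)
print(recommend(new_patient, RECORDS))
```

The sketch only counts successful outcomes among the nearest neighbours. A real system would also have to account for confounding, sample size and missing data before its output could be shown to a physician.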

Personalized Care

Treatments for ailments are generally devised through randomized controlled trials. However, some groups of people are underrepresented in such trials [19, 10]. Prescribing treatments based on results in similar patients is important: drugs that work for some people might be less effective or even dangerous for others. For example, the results of a treatment on a woman might differ based on whether she is pregnant or not. To indiscriminately prescribe the same treatment with no regard for differences in efficacy or risk is undesirable. Yet differences in treatment efficacy go largely unnoticed when treatments are based only on randomized controlled trials, especially for minority groups. Relatedly, it is often unclear which factors put a person at risk of developing a certain ailment.

Observational data about people at risk for certain ailments and the results of performed treatments are naturally available through EHR databases. Sadly, the research in [16], though successful, is one of only a few examples of machine learning applications for personalized healthcare. Due to the success of such approaches and because of the spread of EHRs, it is likely that more such approaches will be developed over the coming years. Nevertheless, it is unlikely that we will see systems providing actionable results in the near future.



A large number of patient deaths are caused by medical errors [9], and, though this is likely a slight overstatement, medical errors are sometimes cited as the fifth leading cause of death [2]. Medical errors are often caused by drug interactions and wrong diagnoses, due to erroneous or incomplete information [9]. The information required to confidently diagnose and treat a patient could be made available through monitoring devices and persistent patient medical information services (e.g., EHRs) [20]. Pervasive monitoring devices with persistent connections to medical facilities will allow for real-time monitoring of patients. Additionally, by employing methods such as Lock in Feedback, monitoring devices could facilitate finding optimal dosages in treatments based on rapid quantitative and qualitative measurements [7]. When a patient unexpectedly enters a critical health condition, facilities can be notified immediately and can respond as deemed appropriate.
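The dosage-finding idea can be illustrated with a small simulation. The sketch below is a simplified reading of the Lock in Feedback approach in [7], not the authors' exact procedure: the dose is oscillated around a current centre, the measured response is multiplied by the same oscillation to estimate the local slope, and the centre is moved uphill. The dose-response curve, noise level, amplitude, period and step size are all hypothetical values chosen for illustration.

```python
import math
import random


def lock_in_feedback(respond, x0, amplitude=1.0, period=20, gamma=0.5, batches=30):
    """Simplified Lock-in-Feedback-style search for the input that maximizes respond()."""
    omega = 2 * math.pi / period
    t = 0
    for _ in range(batches):
        acc = 0.0
        for _ in range(period):
            # Oscillate the dose around the current centre x0.
            x = x0 + amplitude * math.cos(omega * t)
            y = respond(x)
            # Multiplying by the same cosine keeps only the part of the
            # response that varies in phase with the oscillation, which is
            # proportional to the local slope of the response curve.
            acc += y * math.cos(omega * t)
            t += 1
        # Move the centre in the direction of increasing response.
        x0 += gamma * acc / period
    return x0


rng = random.Random(0)  # fixed seed so the simulation is repeatable


def noisy_response(dose):
    """Hypothetical dose-response curve with its optimum at 5.0, plus measurement noise."""
    return -(dose - 5.0) ** 2 + rng.gauss(0, 0.1)


best = lock_in_feedback(noisy_response, x0=0.0)
print(round(best, 1))
```

In this simulation the estimated centre settles close to the optimum of the assumed curve. In practice the oscillation amplitude would have to stay within clinically safe bounds, which the sketch does not model.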

Such devices already see limited use, for example as wearable blood pressure and activity monitors. It is likely that more devices will be released with greater functionality in the near future. For example, the research group My Movez is developing a mobile phone application with which they plan to monitor a broad array of children’s physical play activities (

Difficulties and Concerns

The datasets in healthcare are large and difficult to manage [17]. A hurdle in taking advantage of EHRs is elegantly captured by the “80 percent rule”, which states that 80% of business-related information exists in unstructured form; this is most likely also true for medical information. Information about patients is often written in natural language. As such, the development of natural language processing approaches to extract useful information from these documents is a necessary step towards improving system performance [15]. Additionally, due to the sensitive nature of medical data, records are often partially censored, further complicating their use [16].
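As a small illustration of what extracting structured information from free-text notes involves, the sketch below pulls medication names and dosages out of a sentence with a regular expression. The note text, the pattern and the list of units are assumptions made for this example. Clinical natural language processing as in [15] relies on far more robust methods; a pattern like this would miss abbreviations, misspellings and negations.

```python
import re

# Matches simple patterns such as "Metformin 500 mg" or "lisinopril 10mg".
DOSE_PATTERN = re.compile(
    r"\b(?P<drug>[A-Za-z]+)\s+(?P<amount>\d+(?:\.\d+)?)\s*(?P<unit>mg|mcg|g|ml)\b",
    re.IGNORECASE,
)


def extract_medications(note):
    """Return one dict per 'drug amount unit' mention found in a free-text note."""
    return [
        {
            "drug": match["drug"].lower(),
            "amount": float(match["amount"]),
            "unit": match["unit"].lower(),
        }
        for match in DOSE_PATTERN.finditer(note)
    ]


# Hypothetical note, written for this example.
note = (
    "Patient reports dizziness. Continue Metformin 500 mg twice daily; "
    "start lisinopril 10mg."
)
print(extract_medications(note))
```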

Medical information about people is privacy sensitive, and the data stored by healthcare organizations are appealing targets for criminals. Data protection by design is regarded as an important part of data-driven systems, and should be seen as a system requirement [5]. Because the developments outlined above require several parties to have access to patient data, transmitting data is a necessity. The privacy concerns in transmitting such data are severe [1]. For example, in the Netherlands patients could not be granted access to their personal EHRs through the internet because of inadequate protection offered by the national identity management platform, even though it was already used for providing access to other sensitive information [6]. These concerns are intensified when considering health monitoring devices that provide constant supervision and tracking [11].

Besides privacy concerns, the usefulness of health monitoring devices depends on the willingness of patients to use, and learn to use, those devices. As such, it is important to understand the needs of patients and to design the products with a patient-centered approach. With the elderly being perhaps most reluctant to use new technologies to improve health, it is reassuring to learn that they are generally ready to use new technologies when these facilitate independent living [13].


In this essay we have looked at recent developments in healthcare and at which developments we can expect with regard to prevention, diagnosis and treatment. The increasing use of EHRs facilitates many big data approaches to healthcare. We can expect the development of methods to find new knowledge in these large datasets, as well as means to disseminate knowledge more effectively to practitioners. With the development of pervasive monitoring devices, the number of medical errors can be reduced and response times to critical health conditions improved. By adopting and developing these methods to take advantage of increased data availability, physicians can expect to treat their patients more efficiently.

Above all, these developments will allow for a paradigm shift in healthcare: a move from a hospital-centered approach that develops treatments for a non-existent “average patient” to a patient-centered approach that takes personal factors into account when determining optimal treatments.

[1] G. Danezis, J. Domingo-Ferrer, M. Hansen, J. Hoepman, D. Le Metayer, R. Tirtea, and S. Schiffner, “Privacy and data protection by design - from policy to engineering,” arXiv preprint arXiv:1501.03726, p. 27–31, 2015.
[2] R. A. Hayward and T. P. Hofer, “Estimating hospital deaths due to medical errors: preventability is in the eye of the reviewer,” JAMA, vol. 286, iss. 4, p. 415–420, 2001.
[3] Health IT Policy Committee, “Health big data recommendations,” Health IT, 2015.
[4] D. Heisey-Grove and V. Patel, “Any, certified, and basic: quantifying physician EHR adoption through 2014,” ONC Data Brief, vol. 28, p. 1–10, 2015.
[5] J. Hoepman, “Privacy design strategies,” in ICT Systems Security and Privacy Protection, Springer, 2014, p. 446–459.
[6] B. P. F. Jacobs, S. Nouwt, A. de Bruijn, O. Vermeulen, R. van der Knaap, and C. de Bie, “Beveiligingeisen ten aanzien van identificatie en authenticatie voor toegang zorgconsument tot het elektronisch patiëntendossier (EPD),” Radboud Universiteit Nijmegen, p. 1–85, 2008.
[7] M. Kaptein and D. Ianuzzi, “Lock in feedback in sequential experiment,” arXiv preprint arXiv:1502.00598, 2015.
[8] B. Kayyali, D. Knott, and S. Van Kuiken, “The big-data revolution in US health care: accelerating value and innovation,” McKinsey & Company, 2013.
[9] L. T. Kohn, J. M. Corrigan, M. S. Donaldson, and others, To err is human: building a safer health system, National Academies Press, 2000, vol. 6.
[10] C. Konrat, I. Boutron, L. Trinquart, G. Auleley, P. Ricordeau, and P. Ravaud, “Underrepresentation of elderly people in randomised controlled trials. The example of trials of 4 widely prescribed drugs,” PLoS One, vol. 7, iss. 3, p. e33559, 2012.
[11] H. Li, J. Wu, L. Liu, and Q. Li, “Adoption of big data analytics in healthcare: the efficiency and privacy,” in Proceedings of 19th Pacific Asia Conference on Information Systems, Singapore, 2015.
[12] A. Lymberis, “Smart wearables for remote health monitoring, from prevention to rehabilitation: current R&D, future challenges,” in Information Technology Applications in Biomedicine, 2003. 4th International IEEE EMBS Special Topic Conference on, 2003, p. 272–275.
[13] M. Mikkonen, S. Va, V. Ikonen, M. Heikkila, and others, “User and concept studies as tools in developing mobile communication services for the elderly,” Personal and Ubiquitous Computing, vol. 6, iss. 2, p. 113–124, 2002.
[14] T. B. Murdoch and A. S. Detsky, “The inevitable application of big data to health care,” JAMA, vol. 309, iss. 13, p. 1351–1352, 2013.
[15] H. J. Murff, F. FitzHenry, M. E. Matheny, N. Gentry, K. L. Kotter, K. Crimin, R. S. Dittus, A. K. Rosen, P. L. Elkin, S. H. Brown, and others, “Automated identification of postoperative complications within an electronic medical record using natural language processing,” JAMA, vol. 306, iss. 8, p. 848–855, 2011.
[16] H. Neuvirth, M. Ozery-Flato, J. Hu, J. Laserson, M. S. Kohn, S. Ebadollahi, and M. Rosen-Zvi, “Toward personalized care management of patients at risk: the diabetes case study,” in Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2011, p. 395–403.
[17] W. Raghupathi and V. Raghupathi, “Big data analytics in healthcare: promise and potential,” Health Information Science and Systems, vol. 2, iss. 1, p. 3, 2014.
[18] C. Schoen, R. Osborn, D. Squires, M. Doty, P. Rasmussen, R. Pierson, and S. Applebaum, “A survey of primary care doctors in ten countries shows progress in use of health information technology, less in other areas,” Health Affairs, vol. 31, iss. 12, p. 2805–2816, 2012.
[19] A. Unger, R. Jagsch, H. Jones, A. Arria, H. Leitich, K. Rohrmeister, C. Aschauer, B. Winklbaur, A. Bäwert, and G. Fischer, “Randomized controlled trials in pregnancy: scientific and ethical aspects. Exposure to different opioid medications during pregnancy in an intra-individual comparison,” Addiction, vol. 106, iss. 7, p. 1355–1362, 2011.
[20] U. Varshney, “Pervasive healthcare and wireless health monitoring,” Mobile Networks and Applications, vol. 12, iss. 2-3, p. 113–127, 2007.
