Additionally, through progress in healthcare and technology, and the ensuing societal changes, people are becoming more conscious of their health and actively seek to better manage their health and lifestyle [12]. With improvements in computer technology, mobile health monitoring devices are available, both as dedicated devices and as “apps” on personal mobile phones.
Most importantly, these developments are generating extensive datasets of previously unattainable proportions. As a result, hospital-centered healthcare is changing to patient-centered care. Where the goal formerly was to find general treatments for ailments, the goal is now shifting towards early detection of risks, prevention of further complications, and improvement of treatments through patient feedback. Though reducing costs should not be a primary motivator in healthcare, it has been estimated that the usage of big data methods could reduce the United States’ baseline healthcare costs by 12% [8]. In this essay we will look at leading-edge developments related to big data in healthcare. Recognizing those developments, we will consider which near-future accomplishments can be expected and we will reflect on their desirability.
Currently, the rich EHR datasets are widely used as an archive and not as a central means to improve healthcare efficiency. Healthcare researchers see that the application of big data analytics is an opportunity to gain and disseminate new knowledge [3]. Importantly, physicians indicate they struggle to keep their knowledge of the latest developments in clinical practice up to date; the great number of publications makes it difficult for physicians to read all relevant studies, even if only in their own specializations.
By utilizing the EHR datasets in a manner analogous to how commercial services have used their datasets (e.g., sending recommendations to customers based on comparable customers’ purchases), a likely achievement will be the development of treatment recommendation systems. By using such a system, physicians could receive recommendations based on treatments employed by their colleagues for the same ailment in similar patients, thereby implicitly sharing knowledge [14]. A major hurdle in developing such a system is the ability to measure similarity between patients; it is not necessarily clear which factors influence treatment outcomes. Nonetheless, even before such similarity measures are fully developed, healthcare providers could benefit greatly from these systems in the near future.
Treatments for ailments are generally devised through randomized controlled trials. However, some groups of people are underrepresented in such trials [19, 10]. Prescribing treatments based on results from similar patients is important: some drugs that work for some people might be less effective, or even dangerous, for others. For example, the results of a treatment on a woman might differ depending on whether she is pregnant. Indiscriminately prescribing the same treatment with no regard for differences in efficacy or risk is undesirable. Yet, such differences in treatment efficacy go largely unnoticed when treatments are based only on randomized controlled trials, especially for minority groups. Relatedly, it is often unclear which factors put a person at risk of developing a certain ailment.
Observational data about people at risk for certain ailments and the results of performed treatments are naturally available through EHR databases. Sadly, the research by [16], though successful, is one of only a few examples of machine learning applications for personalized healthcare. Due to the success of such approaches and the spread of EHRs, it is likely that more of them will be developed over the coming years. Nevertheless, it is unlikely that we will see systems providing actionable results in the near future.
A large number of patient deaths is caused by medical errors [9], and, though likely a slight overstatement, medical errors are sometimes named the fifth leading cause of death [2]. Medical errors are often caused by drug interactions and wrong diagnoses, due to erroneous or incomplete information [9]. The information required to confidently diagnose and treat a patient could be made available through monitoring devices and persistent patient medical information services (e.g., EHR records) [20]. Pervasive monitoring devices with persistent connections to medical facilities will allow for real-time monitoring of patients. Additionally, by employing methods such as Lock In Feedback, monitoring devices could facilitate finding optimal dosages in treatments based on rapid quantitative and qualitative measurements [7]. When a patient unexpectedly enters a critical health condition, facilities can be notified immediately and respond as deemed appropriate.
Such devices already see limited use, for example as wearable blood pressure and activity monitors. It is likely that more devices will be released with greater functionality in the near future. For example, the research group My Movez is developing a mobile phone application with which they plan to monitor a broad array of children’s physical play activities (http://mymovez.socsci.ru.nl/en/).
The datasets in healthcare are large and difficult to manage [17]. A hurdle in taking advantage of EHRs is elegantly captured by the “80 percent rule”, which states that 80% of business-related information exists in unstructured form; this is most likely also true for medical information. Information about patients is often written in natural language. As such, the development of natural language processing approaches to extract useful information from these documents is a necessary step to improve system performance [15]. Additionally, due to the sensitive nature of medical data, records are often partially censored, further complicating their usage [16].
Medical information about people is privacy sensitive, and the data stored by healthcare organizations are appealing targets for criminals. Data protection by design is regarded as an important part of data-driven systems, and should be seen as a system requirement [5]. Because the developments outlined above require several parties to have access to patient data, transmitting data is a necessity. The privacy concerns in transmitting such data are severe [1]. For example, in the Netherlands patients could not be granted access to their personal EHRs through the internet because of inadequate protection offered by the national identity management platform, even though it was already used for providing access to other sensitive information [6]. These concerns are intensified when considering health monitoring devices that provide constant supervision and tracking [11].
Besides privacy concerns, a difficulty for the usefulness of health monitoring devices is the willingness of patients to (learn to) use those devices. As such, it is important to understand the needs of patients and to design products with a patient-centered approach. With the elderly being perhaps the most reluctant to use new technologies to improve health, it is reassuring that the elderly are generally ready to adopt new technologies when doing so facilitates independent living [13].
In this essay we have looked at recent developments in healthcare and at which developments we can expect in regard to prevention, diagnosis and treatment. The EHRs that are finding increasing use facilitate many big data approaches to healthcare. We can expect the development of methods to find new knowledge in these large datasets, as well as means to disseminate knowledge more effectively to practitioners. With the development of pervasive monitoring devices, the number of medical errors can be reduced and response times to critical health conditions improved. By adopting and developing these methods to take advantage of increased data availability, physicians can expect to provide treatments for their patients more efficiently.
Predominantly, these developments will allow for a paradigm shift in healthcare: moving from a hospital-centered approach that develops treatments for a non-existent “average patient” to a patient-centered approach that takes personal factors into account when determining optimal treatments.
@article{danezis2015privacy,
title={Privacy and Data Protection by Design -- from policy to engineering},
author={Danezis, George and Domingo-Ferrer, Josep and Hansen, Marit and Hoepman, Jaap-Henk and Metayer, Daniel Le and Tirtea, Rodica and Schiffner, Stefan},
journal={arXiv preprint arXiv:1501.03726},
pages={27--31},
year={2015}
}
@article{hayward2001estimating,
title={Estimating hospital deaths due to medical errors: preventability is in the eye of the reviewer},
author={Hayward, Rodney A and Hofer, Timothy P},
journal={JAMA},
volume={286},
number={4},
pages={415--420},
year={2001},
publisher={American Medical Association}
}
@article{health2015health,
title={Health Big Data Recommendations},
author={{Health IT Policy Committee}},
journal={{Health IT}},
year={2015}
}
@article{heisey2015any,
title={Any, Certified, and Basic: Quantifying Physician EHR Adoption through 2014},
author={Heisey-Grove, Dustin and Patel, Vaishali},
journal={ONC Data Brief},
volume={28},
pages={1--10},
year={2015}
}
@incollection{hoepman2014privacy,
title={Privacy design strategies},
author={Hoepman, Jaap-Henk},
booktitle={ICT Systems Security and Privacy Protection},
pages={446--459},
year={2014},
publisher={Springer}
}
@article{jacobs2008beveiligingseisen,
title={Beveiligingseisen ten aanzien van identificatie en authenticatie voor toegang zorgconsument tot het Elektronisch Patiëntendossier {(EPD)}},
author={Jacobs, B.P.F. and Nouwt, S. and Bruijn, A. de and Vermeulen, O. and Knaap, R. van der and Bie, C. de},
pages={1--85},
year={2008},
publisher={Radboud Universiteit Nijmegen}
}
@article{kaptein2015lock,
title={Lock in Feedback in Sequential Experiments},
author={Kaptein, Maurits and Ianuzzi, Davide},
journal={arXiv preprint arXiv:1502.00598},
year={2015}
}
@article{kayyali2013big,
title={The big-data revolution in US health care: Accelerating value and innovation},
author={Kayyali, Basel and Knott, David and Van Kuiken, Steve},
journal={McKinsey \& Company},
year={2013}
}
@book{kohn2000err,
title={To Err is Human: Building a Safer Health System},
author={Kohn, Linda T and Corrigan, Janet M and Donaldson, Molla S and others},
volume={6},
year={2000},
publisher={National Academies Press}
}
@article{konrat2012underrepresentation,
title={Underrepresentation of elderly people in randomised controlled trials. The example of trials of 4 widely prescribed drugs},
author={Konrat, C{\'e}cile and Boutron, Isabelle and Trinquart, Ludovic and Auleley, Guy-Robert and Ricordeau, Philippe and Ravaud, Philippe},
journal={PLoS One},
volume={7},
number={3},
pages={e33559},
year={2012},
publisher={Public Library of Science}
}
@inproceedings{li2015adoption,
title={Adoption of big data analytics in healthcare: The efficiency and privacy},
author={Li, He and Wu, Jing and Liu, Ling and Li, Qing},
booktitle={Proceedings of 19th Pacific Asia Conference on Information Systems, Singapore},
year={2015}
}
@inproceedings{lymberis2003smart,
title={Smart wearables for remote health monitoring, from prevention to rehabilitation: current R\&D, future challenges},
author={Lymberis, A},
booktitle={Information Technology Applications in Biomedicine, 2003. 4th International IEEE EMBS Special Topic Conference on},
pages={272--275},
year={2003},
organization={IEEE}
}
@article{mikkonen2002user,
title={User and concept studies as tools in developing mobile communication services for the elderly},
author={Mikkonen, Matti and V{\"a}yrynen, S and Ikonen, V and Heikkila, MO and others},
journal={Personal and ubiquitous computing},
volume={6},
number={2},
pages={113--124},
year={2002},
publisher={Springer}
}
@article{murdoch2013inevitable,
title={The inevitable application of big data to health care},
author={Murdoch, Travis B and Detsky, Allan S},
journal={JAMA},
volume={309},
number={13},
pages={1351--1352},
year={2013},
publisher={American Medical Association}
}
@article{murff2011automated,
title={Automated identification of postoperative complications within an electronic medical record using natural language processing},
author={Murff, Harvey J and FitzHenry, Fern and Matheny, Michael E and Gentry, Nancy and Kotter, Kristen L and Crimin, Kimberly and Dittus, Robert S and Rosen, Amy K and Elkin, Peter L and Brown, Steven H and others},
journal={JAMA},
volume={306},
number={8},
pages={848--855},
year={2011},
publisher={American Medical Association}
}
@inproceedings{neuvirth2011toward,
title={Toward personalized care management of patients at risk: the diabetes case study},
author={Neuvirth, Hani and Ozery-Flato, Michal and Hu, Jianying and Laserson, Jonathan and Kohn, Martin S and Ebadollahi, Shahram and Rosen-Zvi, Michal},
booktitle={Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining},
pages={395--403},
year={2011},
organization={ACM}
}
@article{raghupathi2014big,
title={Big data analytics in healthcare: promise and potential},
author={Raghupathi, Wullianallur and Raghupathi, Viju},
journal={Health Information Science and Systems},
volume={2},
number={1},
pages={3},
year={2014},
publisher={BioMed Central Ltd}
}
@article{schoen2012survey,
title={A survey of primary care doctors in ten countries shows progress in use of health information technology, less in other areas},
author={Schoen, Cathy and Osborn, Robin and Squires, David and Doty, Michelle and Rasmussen, Petra and Pierson, Roz and Applebaum, Sandra},
journal={Health Affairs},
volume={31},
number={12},
pages={2805--2816},
year={2012},
publisher={Health Affairs}
}
@article{unger2011randomized,
title={Randomized controlled trials in pregnancy: scientific and ethical aspects. Exposure to different opioid medications during pregnancy in an intra-individual comparison},
author={Unger, Annemarie and Jagsch, Reinhold and Jones, Hendree and Arria, Amelia and Leitich, Harald and Rohrmeister, Klaudia and Aschauer, Constantin and Winklbaur, Berndadette and B{\"a}wert, Andjela and Fischer, Gabriele},
journal={Addiction},
volume={106},
number={7},
pages={1355--1362},
year={2011},
publisher={Wiley Online Library}
}
@article{varshney2007pervasive,
title={Pervasive healthcare and wireless health monitoring},
author={Varshney, Upkar},
journal={Mobile Networks and Applications},
volume={12},
number={2-3},
pages={113--127},
year={2007},
publisher={Springer-Verlag New York, Inc.}
}
A list is a data structure of a collection of elements in a certain order. A list can be defined in multiple ways. One way is to define a list as being an element (the “head” of the list) followed by another list (the “tail” of the list). Inductively, this could be formalized as follows:
Inductive list a := cons a list | nil.
This means that a list l with elements of type a is either cons x tail, with x of type a and with tail of type list a, or l is the empty list nil. Let’s look at some example lists with elements of the whole numbers.
// Empty list; []
l = nil
// List with one element; [1]
l = cons 1 nil
// Different list with one element; [2]
l = cons 2 nil
// List with two elements; [1,2]
l = cons 1 (cons 2 nil)
// List with three elements; [1,2,3]
l = cons 1 (cons 2 (cons 3 nil))
Note that because we have lists of integers, following our definition the list l is of type list integer.
Multiple list operations can be defined, such as append and reverse. We defined our list inductively, and so it would make sense to define these operations inductively (also known as recursively) as well. Because of our neat data structure and operations, we should then be able to prove that certain properties of the operations hold.
Let’s first define append. By appending two lists, we create a new list where the elements in the first list are followed by the elements in the second. This can easily be defined as follows:
Function append : (list a) (list a) -> (list a)
append (nil) (k) = k.
append (cons head tail) (k) = cons head (append tail k).
Thus, our function append is a function that takes two parameters of type list a and returns something of type list a. We defined it in two steps: if the first list is empty (nil), we return the second list. If the first list is not empty, we create a new list with the same head as the first, and with as tail the second list appended to the tail of the first list. Note that the parentheses around nil, k, and cons head tail in the above definition are optional; I added them for clarity. Let’s append the lists [1,2,3] and [4,5]:
append (cons 1 (cons 2 (cons 3 nil))) (cons 4 (cons 5 nil))
= cons 1 (append (cons 2 (cons 3 nil)) (cons 4 (cons 5 nil)))
= cons 1 (cons 2 (append (cons 3 nil) (cons 4 (cons 5 nil))))
= cons 1 (cons 2 (cons 3 (append (nil) (cons 4 (cons 5 nil)))))
= cons 1 (cons 2 (cons 3 (cons 4 (cons 5 nil)))).
We want to prove that the following properties of this function hold:
Lemma append_nil = forall (l : list a), append l nil = l.
Lemma append_assoc = forall (k : list a, l : list a, m : list a), append (append k l) m = append k (append l m).
With the first lemma we state that the result of appending a list and an empty list is equal to the first list. With the second lemma we state that first appending lists k and l, and then appending m to the result is equal to appending the result of appending l and m to k.
Let’s prove the first lemma. We will prove that the lemma is true with induction on l.
First, consider l = nil. Thus,
append l nil = append nil nil
// First rule of append:
= nil
= l.
Thus, for l = nil we have append l nil = l, and so the property holds.
Now, we assume that the property holds for the list tail. Thus, append tail nil = tail. Consider l = cons head tail.
append l nil = append (cons head tail) nil
// Second rule of append:
= cons head (append tail nil)
// Inductive assumption:
= cons head tail
= l.
As such, if the property holds for tail we have append (cons head tail) nil = cons head tail for any arbitrary head. Thus, with structural induction on l, we have shown that the first lemma holds for all finite lists.
Now, we will prove the second lemma. We will again prove this by induction.
First, consider arbitrary l and m, with k = nil.
append (append k l) m = append (append nil l) m
// First rule of append:
= append l m
// First rule of append (reversed):
= append nil (append l m)
= append k (append l m).
Thus, with k = nil the property holds. Now, we assume the property holds for some list tail. Thus, append (append tail l) m = append tail (append l m). Consider k = cons head tail.
append (append k l) m = append (append (cons head tail) l) m
// Second rule of append:
= append (cons head (append tail l)) m
// Second rule of append:
= cons head (append (append tail l) m)
// Inductive assumption:
= cons head (append tail (append l m))
// Second rule of append (reversed):
= append (cons head tail) (append l m)
= append k (append l m).
As such, if the property holds for tail we have append (append (cons head tail) l) m = append (cons head tail) (append l m) for any arbitrary head. So, the property holds for cons head tail as well. We have shown, with structural induction on k, that the second lemma holds for all finite lists.
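The two lemmas above can also be checked mechanically by a proof assistant. As a sketch, this is how the definition and both inductive proofs might look in Lean 4 (the names MyList and append are our own, chosen to avoid clashing with Lean’s standard library):

```lean
inductive MyList (α : Type) where
  | nil  : MyList α
  | cons : α → MyList α → MyList α

def append {α : Type} : MyList α → MyList α → MyList α
  | .nil,       k => k                         -- first rule
  | .cons h t,  k => .cons h (append t k)      -- second rule

theorem append_nil {α : Type} (l : MyList α) : append l .nil = l := by
  induction l with
  | nil => rfl                                 -- base case
  | cons h t ih => simp [append, ih]           -- inductive step

theorem append_assoc {α : Type} (k l m : MyList α) :
    append (append k l) m = append k (append l m) := by
  induction k with
  | nil => rfl
  | cons h t ih => simp [append, ih]
```

The structure of the two proofs mirrors the hand-written ones exactly: a base case for nil and an inductive step that unfolds one rule of append and applies the induction hypothesis.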
Now, we define a new operation; reverse.
Function reverse : (list a) -> (list a)
reverse nil = nil.
reverse (cons head tail) = append (reverse tail) (cons head nil).
Let’s take the reverse of the list [1,2,3]:
reverse (cons 1 (cons 2 (cons 3 nil)))
= append (reverse (cons 2 (cons 3 nil))) (cons 1 nil)
= append (append (reverse (cons 3 nil)) (cons 2 nil)) (cons 1 nil)
= append (append (append (reverse nil) (cons 3 nil)) (cons 2 nil)) (cons 1 nil)
= append (append (append nil (cons 3 nil)) (cons 2 nil)) (cons 1 nil)
= append (append (cons 3 nil) (cons 2 nil)) (cons 1 nil)
= append (cons 3 (cons 2 nil)) (cons 1 nil)
= cons 3 (cons 2 (cons 1 nil)). // [3,2,1]
Now, we want to prove that the following property of our reverse and append functions holds:
Lemma reverse_append = forall (l : list a, m : list a), reverse (append l m) = append (reverse m) (reverse l).
This lemma states that reversing the result of appending lists l and m is equal to appending the reversals of lists m and l. We will prove this by induction on l. First, consider l = nil.
reverse (append l m) = reverse (append nil m)
// First rule of append:
= reverse m
// Reverse use of our first lemma (append m nil = m):
= append (reverse m) nil
// First rule of reverse (reversed):
= append (reverse m) (reverse nil)
= append (reverse m) (reverse l).
Thus, with l = nil the property holds. Now, assume the property holds for tail; thus, reverse (append tail m) = append (reverse m) (reverse tail). Consider l = cons head tail.
reverse (append l m) = reverse (append (cons head tail) m)
// Second rule of append:
= reverse (cons head (append tail m))
// Second rule of reverse:
= append (reverse (append tail m)) (cons head nil)
// Inductive assumption:
= append (append (reverse m) (reverse tail)) (cons head nil)
// Use of second lemma:
= append (reverse m) (append (reverse tail) (cons head nil))
// Second rule of reverse (reversed):
= append (reverse m) (reverse (cons head tail))
= append (reverse m) (reverse l).
As such, if the property holds for tail, we have reverse (append (cons head tail) m) = append (reverse m) (reverse (cons head tail)) for any arbitrary head. Thus, the property also holds for cons head tail. So, with structural induction on l, the property holds for all finite lists.
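Besides proving the lemmas, we can spot-check the definitions in ordinary code. The following Python sketch encodes cons-lists as nested tuples (with None standing in for nil) and checks all three lemmas on concrete lists:

```python
# cons-lists as nested pairs: nil = None, cons(head, tail) = (head, tail)
nil = None

def cons(head, tail):
    return (head, tail)

def append(l, k):
    if l is nil:
        return k                          # first rule: append nil k = k
    head, tail = l
    return cons(head, append(tail, k))    # second rule

def reverse(l):
    if l is nil:
        return nil                        # first rule of reverse
    head, tail = l
    return append(reverse(tail), cons(head, nil))  # second rule of reverse

l = cons(1, cons(2, cons(3, nil)))        # [1,2,3]
m = cons(4, cons(5, nil))                 # [4,5]
print(append(l, m))                       # (1, (2, (3, (4, (5, None)))))
print(reverse(l))                         # (3, (2, (1, None)))

# Spot-check the lemmas on concrete lists:
assert append(l, nil) == l                                      # append_nil
assert append(append(l, m), l) == append(l, append(m, l))       # append_assoc
assert reverse(append(l, m)) == append(reverse(m), reverse(l))  # reverse_append
```

Of course, such checks only exercise particular lists; the inductive proofs above are what establish the properties for all finite lists.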
In order to understand the difference between reason and logic, I think it is important to first understand reason more deeply.
Reason is traditionally split into two types. Theoretical reasoning is a type of thought that leads to a certain belief, whereas practical reasoning is a type of thought that changes plans or intentions. What is reasonable in one type is not necessarily reasonable in the other. For example, in practical reasoning one might be presented with multiple, equally satisfactory options. It would be rational to choose an arbitrary option; otherwise one would be stalled through inaction. The same does not hold for theoretical reasoning: when presented with multiple, equally satisfactory beliefs it would not be rational to arbitrarily choose one to believe. However, it is possible to rationally choose which beliefs or questions one evaluates with theoretical reason; the conclusions remain unaffected.
Ordinarily, reasoning is applied in a conservative manner. One’s current beliefs and intentions are changed if there is a special reason to do so, and conserved otherwise. This is in contrast with foundational reasoning, where a belief should only be continued to be held if there is a justification to do so. In such a reasoning system, there are some foundational beliefs that require no further justification, such as current perceptions and logical axioms. In general, humans reason conservatively.
Logical processes can be divided into three modes. One such mode is deduction, where logical conclusions are reached from premises. A proof that some conclusion is true through deduction starts with the premises, then consists of a series of steps where each step follows logically from the prior steps or the premises, and finally leads to the conclusion. This proof is sometimes called “deductive reasoning”.
In reality, it is not actually a type of reasoning. In constructing such a proof, one can have many different considerations. One first determines what they are setting out to prove, upon which they might consider which intermediate steps could be useful. They then aim to prove these intermediate steps, and only then prove the conclusion is true using those intermediate results. The reasoning behind constructing a logical proof does not necessarily follow the same structure as the proof itself: the deductive rules must be satisfied for the proof, but are not necessarily followed in the proof’s construction.
As such, something that is reasonable is not necessarily logical, and something that is logical is not necessarily reasonable. One can reason about deductions, but deduction is not a kind of reasoning. Logic, at least its deductive mode, is not a theory of reasoning and does not tell us how we govern our beliefs and intentions.
If you would like to explore logic and reasoning in more detail, I highly recommend reading the article Internal critique: A logic is not a theory of reasoning and a theory of reasoning is not a logic, by G. Harman (2002).
There are various different, but equivalent, Turing machine definitions. A one-tape Turing machine can be defined as a 6-tuple M = (Q, Γ, B, δ, q0, F) with:
- Q, a finite set of states;
- Γ, the finite tape alphabet;
- B ∈ Γ, the blank symbol;
- δ, a partial transition function from Q × Γ to Q × Γ × {L, R};
- q0 ∈ Q, the initial state;
- F ⊆ Q, the set of accepting states.
The input of the Turing machine is written on the initial tape configuration. We assume that initially the first symbol on the tape is always the blank symbol B, and that the following symbols are the input. We also assume that the input is followed by an infinite sequence of blank symbols (i.e., the tape is infinite).
For example, if the input is “1011”, the tape would be:
Position | 0 | 1 | 2 | 3 | 4 | 5 | 6 | … |
Symbol | B | 1 | 0 | 1 | 1 | B | B | … |
The Turing machine will start its computation in the initial state with the head at tape position 0. It will follow the transitions in the transition function, and halts when there is no transition it can take for the current state and the symbol under the head. The input is accepted if the halting state is an accepting state. The output is the tape content after the machine has halted; note that it is possible for machines to never halt on certain inputs.
We can represent Turing machines graphically. Let’s look at an example machine with states Q = {q1, q2, q3}, tape alphabet Γ = {0, 1, B}, initial state q1, and accepting state q3, whose transition function δ is:
δ(q1, B) = (q2, B, R)
δ(q2, 0) = (q2, 1, R)
δ(q2, 1) = (q3, 0, R)
δ(q3, 0) = (q2, 1, R)
δ(q3, 1) = (q3, 0, R)
If we run this machine with input “1011”, we compute M(1011). We can follow the computation by evaluating what the machine does at each step:
Step 1
Tape: B1011BB…
State: q1
Symbol: B
Transition: δ(q1, B) = (q2, B, R)
Step 2
Tape: B1011BB…
State: q2
Symbol: 1
Transition: δ(q2, 1) = (q3, 0, R)
Step 3
Tape: B0011BB…
State: q3
Symbol: 0
Transition: δ(q3, 0) = (q2, 1, R)
Step 4
Tape: B0111BB…
State: q2
Symbol: 1
Transition: δ(q2, 1) = (q3, 0, R)
Step 5
Tape: B0101BB…
State: q3
Symbol: 1
Transition: δ(q3, 1) = (q3, 0, R)
Step 6
Tape: B0100BB…
State: q3
Symbol: B
Transition: none; the machine halts
The input is accepted as the machine stops in the accepting state q3. The output is the input with 0s and 1s flipped: “0100”. By analyzing the machine, we can easily see that it accepts any input that ends with a “1”, and that the output is always the input with 0s and 1s flipped.
Sometimes we want computer programs to perform computations that take other computer programs as input. For example, a compiler takes source code and turns it into a program, a virus scanner looks at programs and indicates whether it believes they are viruses, and an interpreter takes source code and directly executes the program it describes. To be able to look at Turing machines with Turing machines, we require a means to encode such machines.
There are many different encodings possible. One such encoding is the following, where we encode Turing machines as strings of 0s and 1s. Given a machine M with transitions t1, …, tn, and by ignoring accepting states, the encoding of M is e(M):
e(M) = 000 e(t1) 00 e(t2) 00 … 00 e(tn) 000
with for a transition t : δ(q_i, X_j) = (q_k, X_l, D_m):
e(t) = 1^i 0 1^j 0 1^k 0 1^l 0 1^m
with:
- q_1, q_2, … the states, q_1 being the initial state;
- X_1 = “0”, X_2 = “1”, and X_3 = B the tape symbols;
- D_1 = L and D_2 = R the head movements.
Thus, the encoding of a Turing machine starts with three 0s, followed by the encodings of the transitions (separated by two 0s), and ends with three 0s. The example Turing machine above would be encoded as follows (transitions are separated by spaces for clarity):
000 101110110111011 00 1101011011011 00 11011011101011 00 11101011011011 00 111011011101011 000
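This encoding can be computed mechanically. The following Python sketch assumes the numbering described above (symbols X1 = “0”, X2 = “1”, X3 = B; directions D1 = L, D2 = R) and reproduces the string for the example machine, with its states numbered q1 = 1, q2 = 2, q3 = 3:

```python
# Encode a transition delta(q_i, X_j) = (q_k, X_l, D_m) as 1^i 0 1^j 0 1^k 0 1^l 0 1^m.
SYMBOL = {"0": 1, "1": 2, "B": 3}
DIRECTION = {"L": 1, "R": 2}

def encode(delta):
    def enc_transition(state, sym, new_state, write, move):
        parts = [state, SYMBOL[sym], new_state, SYMBOL[write], DIRECTION[move]]
        return "0".join("1" * p for p in parts)
    body = "00".join(enc_transition(q, s, *rhs) for (q, s), rhs in delta.items())
    return "000" + body + "000"  # three 0s at the start and end, two 0s between

# The bit-flipping example machine, with states numbered q1=1, q2=2, q3=3:
delta = {
    (1, "B"): (2, "B", "R"),
    (2, "0"): (2, "1", "R"),
    (2, "1"): (3, "0", "R"),
    (3, "0"): (2, "1", "R"),
    (3, "1"): (3, "0", "R"),
}
print(encode(delta))
```

The printed string is exactly the encoding shown above, with the spaces removed.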
A formal language is not the same as a natural language (such as English). A formal language is a set of strings made from symbols, and certain rules may apply as to which strings are in the language. Turing machines can recognize certain types of formal languages. The example machine above recognizes the language (0+1)*1; that is, it recognizes the language of all strings of 1s and 0s that end with a 1. Note that in recognizing languages, the output of machines is irrelevant. We only need to know whether the input is accepted or not.
There are two types of languages related to Turing machines: recursive languages and recursively enumerable languages. A recursive language is a language for which there exists a Turing machine that accepts any string that is in the language and rejects any other string. It is important that this machine halts for every possible input. On the other hand, a recursively enumerable language is one for which we can construct a Turing machine that will enumerate all strings in the language, meaning that it will produce output:
B | w_{1} | B | w_{2} | B | w_{3} | B | w_{4} | B | … |
with w_{i} valid strings in the language. For infinite languages this enumeration operation will obviously never halt, but any given string will be reached eventually. Note that this means there does not need to exist a Turing machine that will tell you for all possible inputs whether it is or is not in the language. One is able to construct a machine that will recognize and halt for each valid string, but you cannot guarantee that the machine will halt for strings that are not in the language. One such machine can be constructed from the enumeration machine: enumerate until you find the input string, then halt and accept. Clearly, if the input string is not a valid string and the language is infinite, this machine will never halt.
The set of recursive languages is a subset of the recursively enumerable languages. Given a machine M that halts for any input and accepts precisely the words of language L, we can construct a machine that enumerates L. The enumerating Turing machine for L generates all possible strings and places a string w on the output if w is accepted by M.
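That construction can be sketched directly in code: generate all strings in order of length and keep those the (always-halting) recognizer accepts. A Python sketch, with "strings ending in 1" as the example recursive language:

```python
from itertools import count, product

def enumerate_language(accepts, alphabet=("0", "1")):
    # Enumerate a recursive language: generate all strings in length order
    # and yield those the always-halting recognizer accepts.
    for n in count(0):
        for tup in product(alphabet, repeat=n):
            w = "".join(tup)
            if accepts(w):
                yield w

# Recognizer for the example language of strings ending in a "1":
gen = enumerate_language(lambda w: w.endswith("1"))
first = [next(gen) for _ in range(5)]
print(first)  # ['1', '01', '11', '001', '011']
```

Because the recognizer halts on every input, the generator never gets stuck, and every word of the language appears after finitely many steps, exactly as the argument above requires.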
Given an input and a description of a Turing machine (or any computer program), the halting problem is the problem of determining whether the computation will halt eventually or will run forever. Using his Turing machines, Alan Turing proved that there cannot exist a general algorithm that solves the halting problem for all pairs of computer programs and inputs; the halting problem is undecidable.
We will now prove that the halting problem is undecidable. If the halting problem were decidable, we could make a Turing machine H that, given an encoding ⟨M⟩ of a Turing machine M and an input w for that machine, gives as output 1 if M halts on w, and 0 if it does not. Using H, we can make a machine D that takes as input the code ⟨M⟩ for a Turing machine M and asks whether that machine halts when it looks at its own code; i.e., D asks whether M(⟨M⟩) halts. Then, we construct D such that it halts precisely when M(⟨M⟩) does not.
Now, instead of making D look at an arbitrary machine, we make D look at itself: we compute D(⟨D⟩). Using our definition of D, we see that D(⟨D⟩) halts precisely when its input machine run on its own encoding, which is again D(⟨D⟩), does not.
However, this is a contradiction. As such, our initial assumption that the halting problem is decidable must be invalid; the halting problem is undecidable.
Gödel’s first incompleteness theorem states that in a consistent logic system that is expressive enough to describe computation, you will not be able to prove everything that is true. A consistent logic system is one which does not contain contradictions. The first incompleteness theorem follows directly from the halting problem. To prove this, we start by assuming that we can prove all things that are true. Take two statements: one being that a computer program will halt for a given input, and the other being that that computer program will not halt for that input. Exactly one of these is true, so given our assumption a proof of the true one exists, and by systematically enumerating all proofs we would eventually find it; this would let us decide the halting problem. However, we know that the halting problem is undecidable, so this is a contradiction. It follows that we cannot prove all things that are true, which proves Gödel’s first incompleteness theorem.
The view that machines cannot give rise to surprises is due, I believe, to a fallacy to which philosophers and mathematicians are particularly subject. This is the assumption that as soon as a fact is presented to a mind all consequences of that fact spring into the mind simultaneously with it. It is a very useful assumption under many circumstances, but one too easily forgets that it is false.
]]>— Alan Turing, 1950
Let’s say a researcher has an initial data set of ten samples. The effect they are looking for is not present in this data. Of course, it is possible that they were just unlucky, and that more data is all that is needed to find the effect. The researcher now starts to draw more samples, testing for a significant effect after each new sample. Once they find a significant effect, they stop looking. They keep this up until the data set hits 100 data points, after which the researcher is convinced there is no effect. When the statistical test is performed at the 5% significance level, one might think that this procedure still gives a probability of just 5% of making a mistake. That is wrong.
Instead, by doing this the researcher finds themselves with an approximately 28% chance of finding an effect where there is none. This can be confirmed by writing a bit of code that performs this procedure for a large number of initial samples drawn from a distribution without the hypothesized effect, which is precisely what I have done to find this number. I’ve included the code I wrote below. By running this code in MATLAB, you will find that about 2,800 out of 10,000 experiments will (eventually) yield a significant result. This means that in about 28% of data sets with initially insignificant effects, you will find a significant effect just by continuing to look for it. The issue is easily remedied: choose a hypothesis and sample size and stick with them.
A related issue can pop up when doing other types of analyses. In data mining, a researcher is usually interested in finding novel patterns in data. This means the researcher cannot choose one hypothesis before testing; they will be forming and testing a great number of hypotheses as they explore the data. One way to do this without fishing is to split the data randomly into two sets: an exploration set and a testing set. The researcher uses the exploration set to formulate the hypotheses they want to test, and then tests all the chosen hypotheses on the testing set. Compare this with the training, testing, and development sets I posted about earlier. Note that one still has to account for the number of tests being performed.
pd = makedist('Normal'); % Make a normal distribution (mu=0, stdev=1)
numExperiments = 10000;  % Number of experiments we want to run
numAdditions = 90;       % Number of samples to add iteratively to find a
                         % significant result
significances = 0;
for i = 1:numExperiments
    data = random(pd, 1, 10);
    % Perform a one-sample t-test (null hyp. mu == 0)
    h = ttest(data);
    if h == 1
        % Sample mean differs significantly from 0
        significances = significances + 1;
    else
        % Sample mean does not differ significantly from 0
        for j = 1:numAdditions
            r = random(pd);
            data = [data r]; % Draw and add new sample
            % Perform a one-sample t-test (null hyp. mu == 0)
            h = ttest(data);
            if h == 1
                % Sample mean significantly differs from 0
                significances = significances + 1;
                break;
            end
        end
    end
    display(['Significances found ' num2str(significances) '/'...
        num2str(i)]);
end
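The random exploration/testing split described earlier is easy to implement. A minimal sketch in Python (the blog's own code is MATLAB; the 50/50 ratio and fixed seed here are illustrative assumptions):

```python
import random

def split_explore_test(data, explore_fraction=0.5, seed=42):
    """Randomly partition data into an exploration set (for forming
    hypotheses) and a testing set (for confirming them)."""
    rng = random.Random(seed)
    shuffled = data[:]  # copy so the input stays untouched
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * explore_fraction)
    return shuffled[:cut], shuffled[cut:]

explore, test = split_explore_test(list(range(100)))
assert len(explore) == 50 and len(test) == 50
assert sorted(explore + test) == list(range(100))  # disjoint, nothing lost
```

Because the two sets are disjoint, a pattern spotted in the exploration set can be confirmed or rejected on data that played no role in suggesting it.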
Additionally, due to this useful orbit, the station could be used as an “Exploration Gateway Platform”. This Gateway would allow for a great number of scientific missions and would pave the way for sustainable lunar exploration. Other long-term goals of the Gateway include exploration of asteroids and, ultimately, Mars. See the Global Exploration Roadmap (2013) for more information about the Gateway and potential missions.
The goal of the challenge is to design a number of missions in an international group of university students. The missions should be based on the rough architecture named “Human-Enhanced Robotic Architecture and Capabilities for Lunar Exploration and Science” (HERACLES); this architecture describes a number of landers, ascent modules, robots, etc., that will work in concert to provide an unprecedented opportunity for exploring the Moon.
I am really excited to be participating in this challenge, and I think it was a great idea of ESA to organize it. Good luck to all competing teams, and have fun!
A little while ago I found myself needing to plot a heat map table in MATLAB. Such a plot is a table where the cells have background colors; the colors depend on the value in the cell, e.g., a higher value could correspond to a warmer color. I found no existing function to do this easily, so I set out to create my own solution.
The code to plot a heat map table can be found here.
Usage is pretty simple. If you have a matrix A, just pass it into the function and it will do the rest! For example:
A = zeros(7,7);
for i = 1:7
    for j = 1:7
        A(i,j) = i+j-2;
    end
end
tabularHeatMap(A);
There are a number of options available. See the documentation in the code for more information about the options. To further adjust the generated figure, such as to add labels, proceed as you would with other plotting functions. For example:
confusion = crosstab(responses, correctAnswers);
h = tabularHeatMap(confusion, 'Colormap', 'winter');
title('Confusion Matrix');
xlabel('Correct');
ylabel('Response');
h.XAxisLocation = 'top';
h.XTick = [1 2 3];
h.XTickLabel = {'A', 'B', 'C'};
h.YTick = [1 2 3];
h.YTickLabel = {'A', 'B', 'C'};
The overly complex model found in (b) is said to have overfitted. A model that has overfitted fits the known data extremely well, but it is not suited for generalization to unseen data. Because of this, it is important to have some estimate of a model’s ability to generalize to unseen data. This is where training, testing, and development sets come in. The full set of collected data is split into these separate sets.
The training set is the part of all the collected data that is used to tune a model’s parameters. Generally, this set comprises 50-80% of all data. To increase the probability that a model is generalizable, its parameters must be tuned with as much data as possible. So, the training set must be as large as possible, while still leaving room for large enough testing and development sets.
The testing set is used to test the generalizability of a fully trained model by testing its ability to predict relationships in unseen data. Once a model is put to use in the real world, it will not have seen any of the data before it has to make a prediction. So, it is important that none of the data points in the testing set have been used to tune any of the model’s parameters, otherwise we do not get a fair estimate of the model’s generalizability. The testing set often comprises 10-25% of all data. The larger the testing set, the more accurate the estimate of generalizability will be.
The development set is closely related to the testing set, but there is a subtle difference between the two. When searching for models that predict relationships in data, you generally try a multitude of models. In addition, you often need to tune model hyper-parameters such as the learning rate or the strength of parameter regularization. To evaluate and compare these choices, you would train models on the training set and evaluate how well they generalize to unseen data. Here comes the subtle part: if you use the testing set to evaluate models against each other, you are actually tuning the hyper-parameters and model choices to data that should remain completely unseen. If you then use that same testing set to assess the generalizability of the chosen model, you will get an underestimate of the true test error. By instead using a separate development set of unseen data to compare models against each other, the testing set remains untouched and can still be used for an honest estimate of the chosen model’s generalizability. The development set often has the same size as the testing set: 10-25% of all data.
Keep in mind that after evaluating the chosen model on the test set, you should not tune the model’s parameters any further.
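The three-way split described above can be sketched in a few lines; the sketch below is in Python, and the 70/15/15 proportions and fixed seed are illustrative choices within the ranges mentioned:

```python
import random

def train_dev_test_split(data, train_frac=0.70, dev_frac=0.15, seed=0):
    """Shuffle the data once, then cut it into disjoint training,
    development, and testing sets."""
    rng = random.Random(seed)
    shuffled = data[:]  # copy so the input stays untouched
    rng.shuffle(shuffled)
    n = len(shuffled)
    n_train = int(n * train_frac)
    n_dev = int(n * dev_frac)
    train = shuffled[:n_train]
    dev = shuffled[n_train:n_train + n_dev]
    test = shuffled[n_train + n_dev:]  # everything left over
    return train, dev, test

train, dev, test = train_dev_test_split(list(range(1000)))
assert len(train) == 700 and len(dev) == 150 and len(test) == 150
assert sorted(train + dev + test) == list(range(1000))  # disjoint cover
```

Shuffling before cutting matters: if the data were collected in some order (by date, by patient, by class label), contiguous slices would give sets with systematically different distributions.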
The modus ponens belonging to this implication can be written as:
it rains → the grass is wet
it rains
∴ the grass is wet
A commonly made mistake is to erroneously also assume the opposite: if the grass is wet, it is raining. This is called the converse:
the grass is wet → it rains
This statement is not necessarily true; it does not follow logically from the first implication. The grass could have become wet through other means, or it could have stopped raining. This can be seen easily by diligently constructing a truth table. The implication says nothing about the grass being wet if it is not raining, so the grass can be either wet or dry if it is not raining (the first two rows). If it is raining, then the grass cannot be dry, as that would contradict the implication (the implication would be false, third row). If it is raining, clearly the grass can, and should, be wet (the fourth row).
| it rains | the grass is wet | it rains → the grass is wet |
| --- | --- | --- |
| false | false | true |
| false | true | true |
| true | false | false |
| true | true | true |
If you look at the rows in this table where “the grass is wet” is false and the implication is true, you see that the only possibility is that it isn’t raining. If you look at the rows where both “the grass is wet” and the implication are true, you see that there are two possibilities: it can be raining, but it is also possible that it isn’t. The mistake of forgetting about the second option is understandable; and certainly, the grass being wet makes it more probable that it actually is raining. However, one should take care not to make these kinds of mistakes. Logic is the basis of our reasoning; we should use it with care.
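The truth table above can be verified mechanically. The short Python sketch below encodes the material implication and exhibits the row that refutes the converse:

```python
def implies(p, q):
    # Material implication: p → q is false only when p is true and q is false.
    return (not p) or q

# Reproduce the truth table for "it rains → the grass is wet":
table = [(rains, wet, implies(rains, wet))
         for rains in (False, True) for wet in (False, True)]
assert table == [(False, False, True), (False, True, True),
                 (True, False, False), (True, True, True)]

# Counterexample to the converse: the grass is wet, yet it is not raining,
# while the original implication still holds.
rains, wet = False, True
assert implies(rains, wet) and not implies(wet, rains)
```

One concrete falsifying row is all it takes to show that the converse does not follow from the implication.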