# Alan Turing Quote from Computing Machinery and Intelligence

The view that machines cannot give rise to surprises is due, I believe, to a fallacy to which philosophers and mathematicians are particularly subject. This is the assumption that as soon as a fact is presented to a mind all consequences of that fact spring into the mind simultaneously with it. It is a very useful assumption under many circumstances, but one too easily forgets that it is false.

— Alan Turing, 1950

# Why You Should Never Go Data Fishing

You have most likely heard that you should never go data fishing, meaning that you should not repeatedly test data. In the case of statistical significance tests, you may have heard that, by the very nature of these tests, you will find an effect at the 5% significance level in 5% of cases when there actually is no effect, an effect at the 2% significance level in 2% of cases when there actually is no effect, and so on. You are less likely to have heard that you should not continue looking for an effect after your current test concluded there was none. Here is why.
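The effect of repeated testing can be made concrete with a small simulation. The sketch below (in Python, with made-up sample sizes and checkpoints; it is not from the original post) runs many experiments in which the null hypothesis is actually true, and applies a two-sided z-test either once at the end or repeatedly as the data come in:

```python
import numpy as np

# Illustrative simulation (all numbers are assumptions): how often does a
# one-sample z-test at the 5% level reject a TRUE null hypothesis if we
# keep peeking at the data instead of testing once?
rng = np.random.default_rng(0)

n_sims = 10_000                   # number of simulated "experiments"
checkpoints = range(10, 101, 10)  # test after 10, 20, ..., 100 samples
z_crit = 1.96                     # two-sided critical value at the 5% level

false_pos_once = 0      # rejections based on the final test only
false_pos_fishing = 0   # rejections if ANY intermediate test was significant

for _ in range(n_sims):
    data = rng.standard_normal(100)  # null is true: the mean really is 0
    # z-statistic after the first n samples, for each checkpoint
    rejected = [abs(data[:n].sum() / np.sqrt(n)) > z_crit for n in checkpoints]
    false_pos_once += rejected[-1]
    false_pos_fishing += any(rejected)

print(f"single test:      {false_pos_once / n_sims:.3f}")
print(f"repeated testing: {false_pos_fishing / n_sims:.3f}")
```

Testing once yields a false-positive rate of roughly 5%, as advertised; testing at every checkpoint and stopping as soon as any test is significant yields a rate well above 5%.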

# Competing in the ESA Moon Challenge

I have recently joined a group of people to form a team for competing in the ESA Moon Challenge. This challenge is held in light of recent plans to set up an inhabited station near the far side of the Moon; specifically, at the second Lagrangian point of the Earth-Moon system. In this orbit, the station will remain in a stable position relative to both the Earth and the Moon. This position brings great advantages, such as excellent Lunar research capabilities, as well as being relatively easy to reach with a rocket from Earth.

Additionally, due to this useful orbit, the station could be used as an “Exploration Gateway Platform”. This Gateway would allow for a great number of scientific missions, and would pave the way for sustainable Lunar exploration. Other long-term goals of the Gateway include exploration of asteroids and, ultimately, Mars. See the Global Exploration Roadmap (2013) for more information about the Gateway and potential missions.

The goal of the challenge is to design a number of missions in an international group of university students. The missions should be based on the rough architecture named “Human-Enhanced Robotic Architecture and Capabilities for Lunar Exploration and Science” (HERACLES); this architecture describes a number of landers, ascent modules, robots, etc., that will work in concert to provide an unprecedented opportunity for exploring the Moon.

I am really excited to be participating in this challenge, and I think it was a great idea of ESA to organize it. Good luck to all competing teams, and have fun!

# Plotting a Heat Map Table in MATLAB

A little while ago I found myself needing to plot a heat map table in MATLAB. Such a plot is a table where the cells have background colors; the colors depend on the value in the cell, e.g. a higher value could correspond with a warmer color. I found no existing function to do this easily, so I set out to create my own solution.

The code to plot a heat map table can be found here.

Usage is pretty simple. If you have a matrix A, just pass it into the function and it will do the rest! For example:

```matlab
A = zeros(7,7);
for i = 1:7
    for j = 1:7
        A(i,j) = i+j-2;
    end
end
tabularHeatMap(A);
```

There are a number of options available. See the documentation in the code for more information about the options. To further adjust the generated figure, such as to add labels, proceed as you would with other plotting functions. For example:

```matlab
confusion = crosstab(responses, correctAnswers);
h = tabularHeatMap(confusion, 'Colormap', 'winter');
title('Confusion Matrix');
xlabel('Correct');
ylabel('Response');
h.XAxisLocation = 'top';
h.XTick = [1 2 3];
h.XTickLabel = {'A', 'B', 'C'};
h.YTick = [1 2 3];
h.YTickLabel = {'A', 'B', 'C'};
```

# Training, Testing and Development / Validation Sets

Finding models that predict or explain relationships in data is a big focus in information science. Such models often have many parameters that can be tuned, and in practice we only have limited data to tune the parameters with. If we make measurements of a function y = f(x) at different values of x, we might find data like in Figure (a) below. If we now fit a polynomial curve to all known data points, we might find the model that is depicted in Figure (b). This model appears to explain the data perfectly: all data points are covered. However, such a model does not give any additional insight into the relationship between x and y. Indeed, if we make more measurements, we find the data in Figure (c). Now the model we found in (b) appears to not fit the data well at all. In fact, the function used to generate the data is a linear function with added Gaussian noise. The linear model depicted in Figure (d) is the most suitable model to explain the found data and to make predictions of future data.

The overly complex model found in (b) is said to have *overfitted*. A model that has been overfitted fits the known data extremely well, but it is not suited for *generalization* to unseen data. Because of this, it is important to have some estimate of a model’s ability to be generalized to unseen data. This is where *training*, *testing*, and *development* sets come in. The full set of collected data is split into these separate sets.
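The idea can be sketched in a few lines. The example below (in Python, with made-up data; the degrees and sample sizes are assumptions, not taken from the figures) fits a simple and an overly flexible polynomial to a training set, then estimates generalization error on a held-out test set:

```python
import numpy as np

# Illustrative sketch: a held-out test set exposes overfitting.
rng = np.random.default_rng(42)

x = np.linspace(0, 1, 30)
y = 2 * x + 1 + rng.normal(scale=0.2, size=x.size)  # linear trend + noise

# Split the collected data into a training set and a held-out test set.
train = np.arange(0, 30, 2)
test = np.arange(1, 30, 2)

def mse(coeffs, idx):
    """Mean squared error of a fitted polynomial on the given indices."""
    return np.mean((np.polyval(coeffs, x[idx]) - y[idx]) ** 2)

linear = np.polyfit(x[train], y[train], 1)    # simple model
complex_ = np.polyfit(x[train], y[train], 9)  # overly flexible model

print(f"degree 1: train {mse(linear, train):.4f}, test {mse(linear, test):.4f}")
print(f"degree 9: train {mse(complex_, train):.4f}, test {mse(complex_, test):.4f}")
```

The high-degree polynomial achieves a lower error than the linear model on the training set, but its error on the unseen test set is much larger than its training error: it has overfitted, just like the model in Figure (b).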


# Mistaken Logical Implication

One of the most common logical inferences uses *logical implication*. For example, you know that if it rains then the grass will be wet. If you look outside and see that it rains, you do not have to look at the grass to know that it is wet. This inference is called *modus ponens*: if A implies B and A is true, then B is true. Formally, the implication can be written as:

$$A \rightarrow B$$

The *modus ponens* belonging to this implication can be written as:

$$(A \rightarrow B) \land A \vdash B$$

A commonly made mistake is to erroneously also assume the opposite: *if the grass is wet, it is raining*. This is called the converse:

$$B \rightarrow A$$
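That the converse does not follow from the implication can be verified mechanically with a truth table. The snippet below (in Python, added here as an illustration; the helper `implies` is a hypothetical name for material implication) searches for an assignment where A → B holds but B → A fails:

```python
from itertools import product

def implies(a: bool, b: bool) -> bool:
    """Material implication: false only when a is true and b is false."""
    return (not a) or b

# Find every truth assignment where the implication holds
# but its converse does not.
counterexamples = [
    (a, b)
    for a, b in product([False, True], repeat=2)
    if implies(a, b) and not implies(b, a)
]
print(counterexamples)  # [(False, True)]
```

The single counterexample is A false, B true: the grass is wet, yet it is not raining (someone may have watered it), so inferring rain from wet grass is invalid.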