Karl Popper versus Data Science

I’m sure you’ve heard of Big Data and IoT (Internet of Things) by now. There is a current in computing that is based on the economics of nearly unlimited resources for computational complexity, including Cognitive Computing (AI + Machine Learning). From this, many are seeing the “end of science,” meaning the truth is in the data and the scientific method is dead. Previously, a scientist would observe certain phenomena, come up with a theory and test it. He is a counter example.

Using algorithms from Topology (yeah, I studied topology in the 70’s), investigators can apply TDA (Topological Data Analysis) to investigate the SHAPE of very complex, very high-volume, very high-dimensional data (thousands of variables), deform it in various ways to see what its true nature is and find out what’s really going on. Traditional quantitative methods can only sample the data or reduce the variables using techniques like Principal Component Analysis (these variables don’t seem very important).
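For readers who haven’t met Principal Component Analysis, here is a minimal sketch of the “reduce the variables” idea in plain Python. It is an illustration under stated assumptions, not what any TDA product does: the function name `first_principal_component` and the toy data are made up for this example, and a real analysis would use a numerical library rather than hand-rolled power iteration.

```python
import math

def first_principal_component(rows, iterations=100):
    """Estimate the unit direction of greatest variance in `rows`.

    Hypothetical helper for illustration: it builds the sample covariance
    matrix and runs power iteration to approximate its dominant eigenvector.
    """
    n = len(rows)
    dims = len(rows[0])
    # Center each variable on its mean.
    means = [sum(row[j] for row in rows) / n for j in range(dims)]
    centered = [[row[j] - means[j] for j in range(dims)] for row in rows]
    # Sample covariance matrix (dims x dims).
    cov = [
        [sum(r[a] * r[b] for r in centered) / (n - 1) for b in range(dims)]
        for a in range(dims)
    ]
    # Power iteration: multiply by the covariance matrix and renormalize.
    # Assumes the starting vector is not orthogonal to the dominant direction.
    vec = [1.0] * dims
    for _ in range(iterations):
        nxt = [sum(cov[a][b] * vec[b] for b in range(dims)) for a in range(dims)]
        norm = math.sqrt(sum(x * x for x in nxt))
        if norm == 0:
            break  # no variance at all; nothing to find
        vec = [x / norm for x in nxt]
    return vec

# Made-up data: the second variable is always twice the first,
# and the third variable never changes.
rows = [[float(t), 2.0 * t, 5.0] for t in range(10)]
pc1 = first_principal_component(rows)
print([round(x, 4) for x in pc1])
```

In this toy data set, three variables collapse onto one direction, and the constant third variable gets a weight of zero. That is the sense in which PCA decides that some variables “don’t seem very important.”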

In one case, an organization did a retrospective analysis of every single trial and study on spinal cord injuries. What they found with TDA was that one and only one variable had a measurable effect on outcomes for patients presenting with SCI: maintaining normal blood pressure as soon as they hit the ambulance. No one had either seen or even contemplated this before.

Karl Popper was one of the most important and controversial philosophers of science of the 20th century. In “All Life is Problem Solving,” Popper claimed that “Science begins with problems. It attempts to solve them through bold, inventive theories. The great majority of theories are false and/or untestable. Valuable, testable theories will search for errors. We try to find errors and to eliminate them. This is science. It consists of wild, often irresponsible ideas that it places under the strict control of error correction.”

In other words, hypothesis precedes data. We decide what we want to test, and assemble the data to test it. This is the polar opposite of the data science emerging from big data.

So here’s my question. Is Karl Popper over? Has computing killed the scientific method?



9 Responses to Karl Popper versus Data Science

  1. Maybe I’m missing something, but I don’t think it changes the scientific method in a fundamental way. Mine the data to formulate a hypothesis. Then you still need to test the hypothesis to falsify it. The data that falsifies the hypothesis may not be in the data set you use to formulate the hypothesis.

    • nraden says:

      There are two ways it is different. The first is, gather all the data in the world (data lake) and throw some algorithms at it to see what’s interesting. This is quite different from observing a phenomenon (in vivo, in vitro, in lab, etc.) and gathering data to understand it. Second, it has major ramifications for how we go about solving these problems. Previously, we thought about the problem, thought about what data would be useful and MODELED the data for our experiments within the (previous) limits of our computational complexity. That constraint is gone. So it’s not whether or not a hypothesis is formed, it’s whether it’s based on scientific inquiry or just swimming through data.

  2. nraden says:

    Gödel’s Incompleteness Theorem, beyond its staggering influence on mathematics and every other field of complexity, implies and entails the falsity of mechanism (that minds can be explained as machines) and the dead-endedness of the field of Artificial Intelligence, if AI presumes to fully explain our thinking.

    Among the things that Gödel indisputably established was that no formal system of sound mathematical rules of proof can ever suffice, even in principle, to establish all the true propositions of ordinary arithmetic. This is certainly remarkable enough. But a powerful case can also be made that his results showed something more than this, and established that human understanding and insight cannot be reduced to any set of rules. Gödel’s Theorem shows this and provides the foundation of my argument that there must be more to human thinking than can ever be achieved by a computer, in the sense that we understand the term “computer” today.

    • davgar says:

      My suspicion is that the human brain is not a Turing machine and is therefore not bound by results for that kind of computer. We just have to develop a theory for the kind of computer it actually is.

      Well, “just”. 😉

  3. VJ says:

    In my experience, throwing an algorithm at data and waiting to see what comes out does not work, or at least I have not seen it work. One needs to have some initial understanding or a hypothesis, and this could get disproved or invalidated by what the data says. I would say it is an iterative process: you need an initial idea or hypothesis, validate that with data, refine it or throw it out and come up with something else, and repeat.

  4. Despite all data, Popper’s advice survives. The risk is always the Sharpshooter Fallacy – and that keeps us rooted in reality. https://en.wikipedia.org/wiki/Texas_sharpshooter_fallacy

    • nraden says:

      The Sharpshooter Fallacy, Simpson’s Paradox and of course confirmation bias – my favorite examples of the shortcomings of data analysis. Thanks for mentioning it.

  5. davgar says:

    Oh yes!
    In his work Logik der Forschung (published in English as The Logic of Scientific Discovery), Popper discusses the failings of induction, and what could replace induction.
    For many years the example of swans was used by European philosophers to illustrate induction.
    Of course, when explorers reached Western Australia they found black swans. Data Science can never predict that. A similar example is that we believe the Sun will rise tomorrow because it rose every day till now. But in the high North it does not rise every day!

    He also distinguishes science from technology. Many questions are technological in that they do not require any new theories, and more data can help with these kinds of questions. An example would be finding the best dose for a given patient, balancing efficacy and safety.

  6. Hi, Neil. I was mulling this over last night after our Twitter interaction.

    As I said then, Popper was right, and continues to be. Theory comes before measurement. There’s a great quote on the topic here, from Conjectures and Refutations: http://www.goodreads.com/quotes/778918-the-belief-that-science-proceeds-from-observation-to-theory-is

    Having said that, maybe the answer lies in the implicit theories we use in our “theory-free” analysis of data. (I’m making up the term, but I think you know how I mean it.)

    You mentioned that people might “throw some algorithms at [the data] to see what’s interesting”. Implicit in that practice are the theories that (a) some of the data will have specific patterns in it, namely the patterns that the algorithms detect, and (b) those patterns somehow reflect the most relevant features of the data.

    Those might not always be good theories. I remember one of the business books of the 90s (Built to Last, maybe?) stating that, when examining what makes businesses successful, they didn’t want to find “buildings” on the list. Yes, most successful businesses had buildings, but who cares? (The analogy was better at the time. 🙂 ) Correlation and causation and all that — it took a human to decide which factors really could be relevant.

    But then again, they might be pretty good much of the time. I think there’s an equivalent phenomenon when we make our own theory-laden observations. When we observe something — an apple on a table, say — our eyes have already detected patterns that are relevant to the choice of theories we use to evaluate what we see: the edge detection to see the outlines of the table and the apple, nuances of 3D perspective and focus, and so on. We sometimes misunderstand the signals we get, and we can subvert them deliberately by creating optical illusions, but *generally* speaking those patterns *are* the most relevant ones.

    How did we come to know those patterns in the first place, and to understand how they take priority in different situations?

    That’s for someone more knowledgeable than I to say. I would guess that some of it’s innate, and some of it’s learned. Perhaps the innate knowledge has an analogy in those numerical truths we’ve deduced or inferred over time: The Law of Large Numbers, memoryless functions, distribution types, and so on. And maybe the learned knowledge comes from repetitive efforts, much as repeated efforts to pick up the apple would lead us to get a better understanding of how apparent depth relates to arm length.

    Thus the real effort in data science would be twofold: First, the deliberate testing of hypotheses we’ve made; second, the rapid and somewhat wanton making of hypotheses for testing, which is done by “throwing algorithms at the data.” Not either, but both. I think that’s compatible with what Karl Popper identified in the scientific process, and perhaps clarifies what theory-free data science really does (which is absolutely not theory-free).

    Lotta words, but hey, I’m at an airport, so why not? 🙂
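Several of the comments above converge on the same check: a pattern mined from data is only a hypothesis until it faces data it was not mined from. Here is a minimal sketch of that idea in plain Python, assuming pure-noise data so that any “discovery” is a Texas sharpshooter artifact by construction. The sizes, the seed, and the helper names (`pearson`, `corr_on`) are all made up for illustration.

```python
import math
import random

def pearson(xs, ys):
    """Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)

rng = random.Random(0)          # fixed seed so the run is repeatable
n_rows, n_features = 60, 200    # hypothetical sizes

# Everything here is random noise: no feature truly relates to the outcome.
outcome = [rng.gauss(0, 1) for _ in range(n_rows)]
features = [[rng.gauss(0, 1) for _ in range(n_rows)] for _ in range(n_features)]

half = n_rows // 2

def corr_on(feature, start, stop):
    """Correlation between one feature and the outcome on rows [start, stop)."""
    return pearson(feature[start:stop], outcome[start:stop])

# "Throw algorithms at the data": pick the feature that looks best
# on the exploratory half ...
best = max(range(n_features), key=lambda j: abs(corr_on(features[j], 0, half)))
r_explore = corr_on(features[best], 0, half)

# ... then try to falsify the resulting hypothesis on the held-out half.
r_holdout = corr_on(features[best], half, n_rows)

print(f"feature {best}: exploratory r = {r_explore:.2f}, hold-out r = {r_holdout:.2f}")
```

With hundreds of noise variables and only thirty exploratory rows, the best-looking correlation tends to be large purely by chance, and it tends to shrink on the held-out rows. That is a tendency rather than a guarantee for any one seed, which is rather the point: the mining step generates the conjecture, and the hold-out step is the attempted refutation.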
