Saturday, February 20, 2010

6.4 Falsifiability vs Verifiability

Karl Popper, the well-known modern critic of Logical Positivism, wrote The Logic of Scientific Discovery in the 1930s. In it he promoted the revolutionary idea that the Logical Positivists' requirement of verifiability was too strong a criterion for science and should be replaced by a criterion of falsifiability. The Positivists held that statements about the world are meaningless and unscientific if they cannot be verified.

To the Logical Positivist, the technical term "verification" has a very precise meaning (though different Positivists defined it in slightly different ways). Generally it indicates that a statement is meaningful only if it is either empirically verifiable or else tautological (i.e., such that its truth arises entirely from the meanings of its terms). "Verifiability" requires that a statement be logically entailed by some finite set of observation reports. Later Positivists, having abandoned this strict view, required of a verifiable statement only that it be made evident, supported, or rendered probable by the relevant set of observations.

Popper, on the other hand, argued that verifiability was a requirement for "meaning" rather than for science. He explained that there exist meaningful theories that are not scientific, and that a criterion of meaningfulness is not the same as a criterion of demarcation between science and non-science. Popper proposed falsifiability as the correct criterion for this purpose because it did not invite the philosophical problems inherent in verification by induction. It allowed for statements from the physical sciences that seem scientific but do not meet the more stringent verification criterion. In other words, it is more difficult to construct strictly verifiable hypotheses than it is to devise falsifiable ones, and for this reason the criterion of verifiability excludes much of what we would consider to be real science. If verifiability were the criterion, the targets science could address would be far more constrained.

One of the main criticisms of Logical Positivism was that its own principle of verification, which held that a statement is only meaningful if it can be empirically verified, was self-defeating. The "Verification Principle" was central to the Positivist project of demarcating scientific knowledge from non-scientific or metaphysical claims. The problem is that the principle cannot be empirically verified itself, nor is it a tautology. It therefore fails its own test for meaning and is rendered meaningless by the Positivists' own criteria. This problem is known as the verification principle's self-refutation, and it was pointed out by several philosophers, including Popper and Quine, who argued that the principle was either trivially true or not true at all.

This critique undermined the Positivists' claim to have found a solid foundation for scientific knowledge and contributed to the decline of the movement. However, it is important to note that this was not the only reason for the demise of Logical Positivism. Other factors also played a role, such as the emergence of new scientific theories (notably Quantum Theory and the "Uncertainty Principle"), critiques of the Positivists' understanding of language and meaning, and the social and political changes of the time.

To be clear, just because something is "falsifiable" does not mean it is false. Rather, it means that if it is false, this can be shown by observation or experiment. Popper used falsification as a criterion of "demarcation" to draw a sharp line between theories that are scientific and those that are unscientific or pseudo-scientific. He was motivated by a frustration with theories he considered unscientific yet popular at the time - Marxism and Freudianism - and by his great admiration of Relativity Theory. He saw the one set of theories as qualitatively different from the other: Marx and Freud were making claims that were fundamentally incapable of being disproved, while Einstein's claims were very clearly capable of disproof. This difference is the essence of falsifiability. It is useful to know whether a statement or theory is falsifiable, if for no other reason than that it tells us how one might assess and test the theory. One might at the least be saved from attempting to falsify a non-falsifiable theory, or come to see an unfalsifiable theory as unsupportable.

Popper claimed that if a theory is falsifiable, then it is scientific; if it is not falsifiable, then it is not open to empirical refutation and therefore not a meaningful scientific question. This puts most (but, interestingly, not all) questions regarding God and religion outside the domain of science. Falsifiability also circumvents the debate over whether the domain of science encompasses only the "natural world" as opposed to the "supernatural". Instead it frames science within the bounds of a methodology - science deals with hypotheses that can be falsified.

Falsifiability certainly ranks as one of the most important elements in the modern conduct of science. Its superiority to verifiability results from the fact that no number of positive experimental outcomes can ever absolutely confirm a scientific theory, but a single counter-example is decisive: it shows that the theory being tested is false, or at least incomplete. Instead of saddling scientists with the impossible task of providing absolute proof, a theory is considered tentatively "true" if ample opportunity and means to disprove it have been provided, but no one has been able to do so.

Popper demonstrated his position with the example of the rising sun. Although there is no way to prove that the sun will rise every morning, we can hypothesize that it will do so. If on even a single morning it failed to rise, the theory would be disproved. Barring that, it is considered provisionally true. The longer a theory retains this provisional status, and the more attempts are made to test it, the greater its claim to firm truth. The "sun-will-rise" theory has been tested many billions of times, and we have no reason to anticipate circumstances that would cause the sun to stop rising. So we have very good reason to believe that this theory represents reality. This argument has some weaknesses (primarily that it is not deductively ironclad), but because no stronger proof suggests itself, it is consistent with other information we have, and it is pragmatically useful, it remains a very good operating theory.

However, critics have legitimately pointed out practical problems with the straightforward use of falsifiability to test theories. The basic pattern Popper proposed is that a scientist advances a theory, and researchers test it in an attempt to find confirming and/or contradictory evidence. If the theory can be falsified by reliable and reproducible experimental results, it must be abandoned.

Thomas Kuhn argued that the actual practice of science doesn't follow this pattern at all. He pointed out that in the history of science, strictly applying this standard would have led to several famously incorrect conclusions. For example, when it was discovered that the orbit of the planet Uranus did not follow the path predicted by Newtonian mechanics, the observation appeared to falsify Newton's theory. However, the desire to retain Newton's laws was so strong (after all, they were coherent with so many other theories) that post hoc explanations were introduced to save them - in this case, another planet was posited even farther out than Uranus. This actually turned out to be the case (Neptune was discovered some years later). But it is considered a weakness in a theory to have to postulate ad hoc changes simply to save it from the facts.

A similar problem was encountered by Lord Kelvin in his attempt to falsify claims of great antiquity for the Earth's age. He was a devout Christian who objected to Darwin's new Theory of Evolution, and he was highly motivated to demonstrate that not enough time had passed for evolution to have occurred. He calculated the rate at which the Earth had probably cooled since its early (assumed) molten state. This number (about 20 to 100 million years) happened to agree very well with his other calculation, for the age of the sun. Our planet could certainly not be older than the sun, and he reasoned that if the sun were made of even the highest grade coal, it could have been burning at its current rate for only a few thousand years. He added in gravitational contraction as an alternate source of heat, and arrived at the same age - 20 to 100 million years. Thus, he believed he had falsified the Theory of Evolution, which requires much more time than that. Of course, if we alter the assumption concerning the source of the sun's heat (e.g., from coal to fusion), a far greater age is allowed for both it and the Earth. To Kelvin's credit, he recognized this when he wrote,

"inhabitants of the earth cannot continue to enjoy the light and heat essential to their life for many million years longer unless sources now unknown to us are prepared in the great storehouse of creation."

Another great example is the Theory of Evolution. The current "Modern Synthesis" theory is quite different from the version first proposed by Darwin, though it shares the same fundamentals. It incorporated Mendelian genetics in place of Darwin's own account of inheritance, and it was improved by our increased understanding of DNA and molecular biology. The theory has changed over time in the face of new information that revealed limitations, weaknesses, and actual errors in the original. It was not discarded, but improved.

The question, though, is how many times we are allowed to move the goalposts and alter theories, and the background assumptions of those theories, after they have failed. The addition of ad hoc "auxiliary propositions" can weaken the original theory by erecting a shaky scaffold of special cases around it. But even Popper admitted that the "naïve falsification" he proposed has to be flexible enough to bend with necessity:

"Some genuinely testable theories, when found to be false, are still upheld by their admirers—for example by introducing ad hoc some auxiliary assumption, or by reinterpreting the theory ad hoc in such a way that it escapes refutation. Such a procedure is always possible, but it rescues the theory from refutation only at the price of destroying, or at least lowering, its scientific status."

I question his assertion that the introduction of ad hoc assumptions inevitably weakens a theory or lowers its status. In the examples above involving Uranus, the sun's age, and Evolution, the addition of new considerations enriched the theories and eventually led to a clearer understanding of the solar system and of life. The modified theories were actually stronger than the originals.

As attractive and seemingly fail-safe as Popper's falsifiability criterion may have seemed, especially when contrasted with the most competitive alternative of the time - the Logical Positivists' verifiability criterion - it is clearly not a panacea. It has some serious limitations and weaknesses. Popper's position was essentially that no amount of data could ever completely prove a theory, but that even a single piece of counter-evidence is sufficient to disprove it. It is a wonderful guideline, like Occam's Razor or "measure twice, cut once", but it is not useful in all scientific endeavors. Although the naïve initial temptation upon learning about it is to apply it indiscriminately, it turns out not to be universally applicable.

The first problem concerns the assertion that no amount of data can confirm a theory. This is simply not how science is actually practiced. Overwhelming and consistently supportive data boosts confidence in a theory to the point where it is accepted as a practical fact, and disputing it would be contrary and perverse. No one seriously disbelieves the law of gravity (no one expects apples to start falling upward tomorrow). Scientists usually don't need to confirm a theory one hundred percent in order to trust it and use it as if it were true.

As Kuhn described in his work, scientists do not discard a theory as soon as an experimental observation contradicts it. The contrary evidence would need to be reproduced several times, and other similar experiments would need to be done to probe the potential weaknesses and boundaries of the problem area. There could have been flaws in the experimental methodology or in the analysis of results, or perhaps the theory just needs a minor adjustment to accommodate the new data.

Another reason the importance of falsification has declined is that much of modern science is model-based rather than hypothesis- and theory-based. Doing science with models rather than theories doesn't really lend itself to falsification, since there are no experiments being conducted to isolate behaviors that would yield evidence supporting or contradicting a hypothesis. Of course models are incomplete and simplistic compared to the complex physical processes they represent; there are bound to be errors in them. Data that conflicts with a model doesn't necessarily imply that the model should be discarded - more likely, the model needs additional refinement to address aspects of reality that were left out or handled incorrectly. In fact, discovering and repairing flaws in a model adds deeper understanding of the real-world phenomena it attempts to replicate. Climate and weather models, molecular models, economic models, cosmological models, and the rest are not always thrown out when reality conflicts with them - instead they are usually enhanced or modified to incorporate the new information, making them more accurate and representative. Sometimes an older model becomes so ragged and jury-rigged that it makes more sense to discard it and begin again with a new approach.

One last problem with falsification is that much of science does not involve establishing the correctness of theories at all. Materials science, chemistry, biology, computer science, and other fields include large swaths of activity that involve neither falsification nor verification - they are about making things: new materials, molecules, pharmaceuticals, software, and devices. There is nothing to falsify, so Popper's method is simply irrelevant to these legitimate sciences.



Duhem-Quine Thesis

The Duhem-Quine thesis adds another objection to Popper's criterion of falsifiability (sad to say). Falsifiability works well in many cases, but it does, unfortunately, have at least one fatal flaw, which Duhem and Quine identified. They assert that no hypothesis entails predictions about an empirical observation on its own, because a hypothesis is always embedded in a large collection of supporting assumptions, theories, and auxiliary hypotheses. The thesis states that it is impossible to test a hypothesis in complete isolation, because an empirical test of the hypothesis requires one or more of these background assumptions - the ultimate background assumption being that we can rely on the rules of logic at all. The hypothesis being tested cannot be completely segregated from the assumptions that support it. Instead, the consequences of the hypothesis rest on background information, and that information must itself be tested and shown not to be false - it must be accounted for. And those background assumptions may themselves depend on other background assumptions.

If your experiment on the primary hypothesis generates a negative result (i.e., you have falsified the hypothesis), but you have not accounted for all the background assumptions (ad infinitum), you really can't draw any conclusion. Your hypothesis may indeed be wrong, or the background assumptions may have problems that invalidate the falsification. The case involving the age of the sun (above) is an actual example of this: the background assumption about the source of the sun's heat was wrong, as was Kelvin's additional assumption about the rate of the Earth's cooling. So, although Kelvin thought he had falsified evolution, he had done nothing of the sort. Further, a discrepancy between what a hypothesis predicts and the actual observed data does not necessarily falsify the hypothesis, because there may have been flaws in how the data itself was collected or measured.

The thesis can be expressed using symbolic logic:
H → E
This says that "If H, then E", where H is the hypothesis and E is evidence, or an observation expected if the hypothesis is true. That is, if the hypothesis is true then we should see the evidence. By the logical rule of modus tollens,
¬E → ¬H
This says that if we do not observe E, then H is false. In other words, if we don't observe the evidence when running an experiment, then the hypothesis has been falsified. For example, say that H is the hypothesis that water boils at 100°C, and you have a pot of water you intend to boil. If you heat the water past this temperature and it does not boil, then you have falsified the hypothesis. But this assumes quite a lot: for example, that you are at one atmosphere of air pressure and that the water is pure and unadulterated. What if you are at two atmospheres of pressure, or the water contains a contaminant, such as antifreeze, that raises its boiling point? The background assumptions have been violated. So a better expression of the experiment is:
(H & A) → E
This means that H, along with some background assumptions A, implies E. So, if you don't observe E, then (H & A) is false:
¬E → ¬(H & A)
This says that the combination of H and its background assumptions, A, is false. A is not just a single assumption but many (that we are at one atmosphere, that the water is pure, that the thermometer is well calibrated, etc.). So A is really (A1 & A2 & A3 & ... & An), where each Ai is a different background assumption. So now we have:
(H & (A1 & A2 & A3 & ... & An)) → E
and also:
¬E → ¬(H & (A1 & A2 & A3 & ... & An))
The above expression, ¬(H & (A1 & A2 & A3 & ... & An)), is the same as any of these:
¬H | ¬(A1 & A2 & A3 & ... & An)
¬H | (¬A1 | ¬A2 | ¬A3 | ... | ¬An)
¬H | ¬A1 | ¬A2 | ¬A3 | ... | ¬An
This means that if you don't observe E, then either the hypothesis H is wrong, or one of the background assumptions (A1, A2, A3, ..., An) is wrong, or some combination of H and one or more of the Ai is wrong.
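To make this concrete, here is a minimal sketch in Python (the function and variable names are my own, purely for illustration) that brute-force checks the equivalence above over every possible truth assignment of H and the Ai:

from itertools import product

def demorgan_holds(n):
    """Check that ¬(H & A1 & ... & An) equals ¬H | ¬A1 | ... | ¬An
    for every truth assignment of H and n background assumptions."""
    for values in product([True, False], repeat=n + 1):
        h, assumptions = values[0], values[1:]
        lhs = not (h and all(assumptions))                # ¬(H & A1 & ... & An)
        rhs = (not h) or any(not a for a in assumptions)  # ¬H | ¬A1 | ... | ¬An
        if lhs != rhs:
            return False
    return True

print(all(demorgan_holds(n) for n in range(1, 6)))  # prints True

The check always succeeds, which is just De Morgan's law at work: a failed prediction convicts the whole conjunction, and the logic alone cannot tell you whether the culprit is H or one of the Ai.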

So, as frequently as the idol of "falsifiability" is honored in the context of science, a "naïve falsification" approach truly is insufficient. Serious researchers must also take background assumptions into account to ensure that they, too, have strong support. But all is not lost. Obviously, science still occurs, experiments are run, hypotheses are falsified, and progress is made. Overall, this critique of Popper's method has been healthy for science. Researchers are forced to take less for granted, to do a thorough job of supporting their underlying assumptions, and to check their experimental methods to guard against possible errors in auxiliary assumptions. The reliance on background assumptions cannot be eliminated, but its destabilizing influence can be minimized to, hopefully, manageable levels. The process of justifying beliefs and assumptions can only begin once a number of precursor assumptions are independently justified. Some of these fundamental assumptions must be accepted as self-evident if they cannot be justified, because they comprise the frame in which justification takes place.