Scientific Experiments

Experiments are used to determine whether some hypothesis is true by testing to see what happens in the real world. Typically we want to know whether a certain action, in a certain situation, causes (or tends to cause) a certain result. This is valuable because it tells us when we might or might not want to take that action.

Suppose I want to test whether eating radishes will prevent tooth decay. Perhaps I know several people who like to eat radishes and rarely get cavities. There are several explanations for why I might have observed this.

  • It might be a coincidence.
  • It might be that some third factor caused them to both eat radishes and have few cavities, for example if these people were born with a condition that prevents cavities and also causes a craving for radishes.
  • It might be that eating the radishes actually does prevent tooth decay.

The last possibility is the most interesting because it tells us something that other people could do if they wanted to reduce tooth decay. How would we test whether an increase in radish eating will help the teeth?

The obvious thing to do is have somebody eat radishes and see if they have less tooth decay. Let's see what problems occur. We want to design the experiment so as to minimize the chance of reaching the wrong conclusion. Specifically, if our results come out positive, we want the truth of our hypothesis (radishes reduce tooth decay) to be the only reasonable explanation for our outcome. If there are other explanations, then we don't know whether our hypothesis is true.

A simple experimental design

We might recruit twenty students from a nearby college and divide them into two groups. The students in group R are asked to come to a certain location each day where they are each given three radishes to eat. This is to continue for a year. Students in group X are told to just eat as they normally would. Arrangements are made to give all the students in both groups a dental checkup after a year, and the number of cavities each has will be recorded. If group R has fewer cavities than group X, we will conclude that eating radishes was helpful in preventing tooth decay.

Doing an experiment like this has a lot of advantages over just observing people we might encounter, but it still has some serious problems. Let's use this example to look at some of the requirements for doing experiments that provide reliable results.

Controls

One good feature of the radish experiment is that it includes a control group. That is group X, which doesn't do anything special, but is used for comparison. In order to know whether group R has a good result, we need an idea of how things turn out under normal circumstances.

Single blinding

If the people in group R knew that scientists were hopeful that radishes would prevent tooth decay, and that the success of the experiment depended on their having good dental checkups, they might be tempted to "help" by avoiding sweets, brushing their teeth more often, or engaging in other behavior they felt would be good for their teeth. But if they did this, a successful test result might be caused by this other behavior instead of the radishes, and we could draw a false conclusion. To prevent this, researchers like to give a phony treatment to the control group and keep it secret whether each person is in the test group or the control group. This is called "blinding" since the subjects are not able to "see" which group they are in.

Double blinding

There are also ways in which the experimenters can influence the results of the test. If a dentist doing checkups for the radish test knows a patient is in the R group, he might be tempted to "not count" a speck that might grow into a cavity, while he might count the same size speck for a person in the X group. Or perhaps he would "help" by counting two cavities as one if they are in the same tooth of an R group member. A dentist treating subjects before the test is over might be more thorough for an R group member than an X group member.

If the R group winds up doing better, the results might be due to the experimenter's behavior instead of the effects of the radishes. To prevent this, the experimenters can also be blinded. When they deal with any of the subjects, they are kept uninformed about which group the subject is in.
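
One way to keep the dentist uninformed is to label every subject with a neutral code and keep the list that links codes to groups sealed until all checkups are recorded. The following is only a minimal sketch of that bookkeeping, not a description of how any real study was run. The function names, subject names, and cavity counts are all hypothetical and were made up for illustration.

```python
import random


def make_blind_codes(assignments, seed=None):
    """Give each subject a neutral code and return the sealed key separately.

    assignments: dict mapping subject name -> group label ("R" or "X").
    Returns (codes, sealed_key):
      codes      maps subject name -> neutral code (what the dentist sees)
      sealed_key maps neutral code -> group label (hidden until the end)
    """
    rng = random.Random(seed)
    names = list(assignments)
    rng.shuffle(names)  # so the order of the codes reveals nothing about groups
    codes = {name: f"S{i:03d}" for i, name in enumerate(names, start=1)}
    sealed_key = {codes[name]: assignments[name] for name in names}
    return codes, sealed_key


def unblind(cavities_by_code, sealed_key):
    """After all checkups are recorded, sort the counts back into groups."""
    by_group = {}
    for code, cavities in cavities_by_code.items():
        by_group.setdefault(sealed_key[code], []).append(cavities)
    return by_group


# Hypothetical subjects and cavity counts, invented for this example.
assignments = {"Ann": "R", "Bob": "X", "Cy": "R", "Dee": "X"}
codes, sealed_key = make_blind_codes(assignments, seed=1)

# The dentist records results against codes only, never against group labels.
cavities_by_code = {
    codes["Ann"]: 0,
    codes["Bob"]: 2,
    codes["Cy"]: 1,
    codes["Dee"]: 1,
}

print(unblind(cavities_by_code, sealed_key))
```

The point of the design is that the dentist only ever handles codes such as S001, so a judgment call about a doubtful speck cannot be swayed by knowing the subject's group.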

Statistical significance

Many factors can influence the outcome for any particular subject in an experiment, especially when the experiment involves people or other living things. These include genetics, age, occupation, eating habits, etc. Even if radishes are not helpful, the people in group R may have fewer cavities for reasons having nothing to do with which group they were in. Typically a statistical method is used to determine a level of success that would be unlikely to be caused by luck. Sometimes a "95% confidence level" is used, which means that if the treatment actually had no effect, a result that strong would have only a 5% chance of occurring by luck alone. If the results are strong enough to exceed this level, they are said to be "statistically significant." Many experiments use a tougher 99% confidence level to reduce the chances of declaring success when the outcome was actually accidental.
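
To make "unlikely to be caused by luck" concrete, here is a rough sketch of one such statistical method, a permutation test. It is just one simple option among many and is not necessarily what a real study would use. The idea is to pretend the group labels mean nothing, reshuffle them many times, and count how often luck alone produces a gap between the groups at least as large as the one observed. The function name and the cavity counts below are made up for illustration.

```python
import random


def permutation_p_value(treated, control, n_shuffles=10_000, seed=0):
    """Estimate how often chance alone gives a gap at least as large as observed.

    One-sided: the gap is (control mean) - (treated mean), so a large gap
    means the treated group had fewer cavities.
    """
    rng = random.Random(seed)
    n_treated = len(treated)
    observed_gap = sum(control) / len(control) - sum(treated) / n_treated

    pooled = list(treated) + list(control)
    at_least_as_large = 0
    for _ in range(n_shuffles):
        rng.shuffle(pooled)  # reassign the group labels at random
        fake_treated = pooled[:n_treated]
        fake_control = pooled[n_treated:]
        gap = sum(fake_control) / len(fake_control) - sum(fake_treated) / n_treated
        if gap >= observed_gap:
            at_least_as_large += 1
    return at_least_as_large / n_shuffles


# Hypothetical cavity counts after one year, invented for this example.
group_r = [0, 1, 0, 2, 1, 0, 1, 0, 1, 0]  # ate radishes
group_x = [2, 1, 3, 2, 1, 2, 0, 3, 2, 1]  # ate normally

p = permutation_p_value(group_r, group_x)
print("estimated chance of a gap this large by luck alone:", p)
print("significant at the 95% level:", p < 0.05)
print("significant at the 99% level:", p < 0.01)
```

A value below 0.05 corresponds to passing the 95% level, and a value below 0.01 to passing the 99% level. With only ten subjects per group, a result near either threshold should be read with caution.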

Experimental report

Whether or not the experiment is successful, it is desirable to publish the results in a professional journal so other researchers can find out what happened. The experimental report carefully states the method used and the results achieved. If the results show a statistically significant effect, we would expect other researchers to achieve a similar effect if they used the same method. If you have a serious interest in some experimental result you have read about, it might pay to obtain a copy of the original research report. Casual discussions of tests, like the ones we might see in a newspaper, may leave out important details of the method or results, such as the type of radishes, how many were eaten, and how often they were eaten.

Replication

If a report has been published in a professional journal showing that radishes reduce tooth decay, it is likely that radish growers will start advertising this benefit to those who buy their product. Does that mean we can now be sure that it works? Not really. Something may have been done improperly, such as a person involved in the experiment telling supposedly blinded participants which group they're in. There may have been overt fabrication of the data. Or it may be the occasional case in which a statistically significant result is actually caused by luck. If other researchers have repeated or "replicated" the experiment and achieved the same result, we can feel a lot more confident that the hypothesis was valid.
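
A little arithmetic shows why replication helps against the "caused by luck" case. The sketch below assumes that radishes truly have no effect, that each experiment uses the 95% confidence level, and that the experiments are independent of one another. That last assumption fails if they share the same flaw in method. The function name is made up for illustration.

```python
def chance_all_flukes(alpha, n_experiments):
    """Chance that n independent experiments all pass by luck alone,
    assuming the treatment truly has no effect."""
    return alpha ** n_experiments


alpha = 0.05  # the 5% chance of a lucky "success" at the 95% confidence level

for n in (1, 2, 3):
    print(n, "experiment(s):", chance_all_flukes(alpha, n))
```

Under these assumptions a single lucky success happens about one time in 20, while two lucky successes in a row happen about one time in 400. This says nothing about improper procedure or fabricated data, which replication by other researchers guards against in a different way.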

Comments

Most of the steps described above are employed to reduce the chances of drawing a false conclusion.

Not all of these steps are done for all experiments. Sometimes they simply are not necessary because of the nature of the experiment. Other times certain steps are impractical or would cause ethical problems. On other occasions the steps should have been taken but the researchers doing the experiments failed to recognize their importance.

If we are going to make good decisions about what to believe when we hear about experimental results, we need to know as much as we can about how the experiments were done and what may have been overlooked that could invalidate the outcome.