Variables and Data

Variables:

Variables are factors relevant to our study, which can change from place to place, time to time, or specimen to specimen (i.e. things that vary). They can include properties of the organisms we are studying, or of the environment in which the organisms are found. Here is some of the terminology associated with experimental variables.
Manipulated variable is the factor that we deliberately change (manipulate) between our treatment groups in an experiment. It is the factor that we think causes some change in a second variable. It is also called the independent variable. If we are testing the hypothesis that nitrogen fertilizer causes tomato plants to produce more flowers, our manipulated variable will be the amount of nitrogen fertilizer.
Response variable is the factor we think is being influenced by the manipulated variable. The response variable shows the effect. It is also called the dependent variable because according to our hypothesis it depends on the manipulated factor. In the experiment immediately above, we would measure the number of tomato flowers as our response variable.
Confounding variables are factors, other than the manipulated variable, which could influence the response variable. Factors which vary consistently between treatments are our main concern, since they can cause us to mistakenly believe our treatments are having an effect (Type I error). Factors which vary randomly between treatments are less of a problem. They add to the variance within our treatment groups, and may prevent us from seeing an effect of the treatments when one exists (Type II error). Experiments are designed to eliminate (or at least minimize) confounding variables by keeping these factors the same in all treatment groups (controlling them), or by randomizing them so that differences between treatment groups average out. In observational studies, there is no way to completely eliminate these confounding factors. Potential confounding variables in the tomato flower experiment might include natural nitrogen in the soil, light intensity, and temperature. If we grew our fertilized tomatoes all at 25 C and our unfertilized tomatoes all at 20 C, we could never be sure which factor, nitrogen or temperature, caused the difference in flower number. (Growing all the tomatoes at 25 C would be the best solution, but if we randomly assigned plants to either the 20 C or 25 C greenhouse, we would at least have eliminated a consistent bias.)
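
The random assignment described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the plant labels, the group sizes (10 plants per fertilizer treatment) and the helper name assign_randomly are all hypothetical and do not come from a real experiment. Within each fertilizer treatment, plants are shuffled and then dealt out to the two greenhouses, so temperature no longer varies consistently with the nitrogen treatment.

```python
import random


def assign_randomly(plant_ids, groups, rng):
    """Shuffle the plants, then deal them out to the groups in turn."""
    shuffled = list(plant_ids)
    rng.shuffle(shuffled)
    assignment = {group: [] for group in groups}
    for i, plant in enumerate(shuffled):
        assignment[groups[i % len(groups)]].append(plant)
    return assignment


# Hypothetical plants: 10 per fertilizer treatment (made-up labels).
treatments = {
    "fertilized": [f"F{n:02d}" for n in range(1, 11)],
    "unfertilized": [f"U{n:02d}" for n in range(1, 11)],
}
greenhouses = ["20 C greenhouse", "25 C greenhouse"]

# A fixed seed only makes this illustration repeatable.
rng = random.Random(1)

layout = {
    treatment: assign_randomly(plants, greenhouses, rng)
    for treatment, plants in treatments.items()
}

for treatment, by_greenhouse in layout.items():
    for greenhouse, plants in by_greenhouse.items():
        print(treatment, "|", greenhouse, "|", sorted(plants))
```

Each greenhouse ends up with half of the fertilized and half of the unfertilized plants, so any temperature effect adds to the variation within both treatment groups instead of lining up with one of them.
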
Controlled variables are potential confounding factors which have been kept constant (controlled) between treatments. Note that this is not the same as a control group.

Data Types:

Data (singular datum) are the pieces of information we collect in our study. If we are measuring number of tomato flowers per plant, each plant will give us a data point. Data come in several broad categories. The category of data being collected will influence the type of statistical analysis that can be performed once the study is complete (Zar 1996).
Discrete data fall into distinct categories with no grey area in between. For example, a bird can lay three eggs or four, but is unlikely to lay 3.5. Only certain values, in this case whole numbers, are possible.
Continuous data can, at least in principle, take on any value within a range of possibilities. Human height is a simple example. There are limits on how short or tall people are, but within that range, any height is possible.
Nominal scale allows us to group our data into named categories, but the categories have no numerical significance. We could separate household pets into cats, dogs and goldfish, but the names do not tell us anything about how the categories relate to one another. Nominal data are always discrete.
In an ordinal scale, the categories are ranked according to some criterion. We could rank household pets by average size (goldfish, cats, dogs) and would then have ordinal data. Note that our ranking doesn’t tell us how much difference there is between categories. Ordinal data can be discrete or continuous.
Ratio and Interval scales not only rank data but show how far apart they are. We know not only that a temperature of 20 C is higher than a temperature of 10 C, but how much higher. The difference is that a ratio scale has a real zero point, while the interval scale does not. Measurement of distance in cm is a ratio scale because 0 cm has real meaning — it is the shortest physically possible distance. Measurement of direction from 0-360 degrees is an interval scale because 0 degrees (meaning north) is an arbitrary convention. Temperature is measured on both interval scales (e.g. Celsius) in which the zero point is arbitrary, and on a ratio scale (Kelvin) in which zero corresponds to the lowest physically possible temperature. Ratio and interval data can be discrete or continuous.
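
A short Python sketch may help show why the zero point matters. It uses the 10 C and 20 C temperatures from the paragraph above; the function name celsius_to_kelvin is just an illustrative helper. Differences come out the same on both scales, but the ratio of the two readings only has physical meaning on the Kelvin (ratio) scale.

```python
def celsius_to_kelvin(celsius):
    """Convert a Celsius temperature to Kelvin."""
    return celsius + 273.15


cool_c, warm_c = 10.0, 20.0
cool_k, warm_k = celsius_to_kelvin(cool_c), celsius_to_kelvin(warm_c)

# Differences are meaningful on both interval and ratio scales.
diff_c = warm_c - cool_c
diff_k = warm_k - cool_k

# Ratios are only meaningful on the ratio scale (Kelvin).
ratio_c = warm_c / cool_c
ratio_k = warm_k / cool_k

print(f"difference: {diff_c:.2f} C, {diff_k:.2f} K")
print(f"ratio of Celsius readings: {ratio_c:.2f}")
print(f"ratio of Kelvin readings:  {ratio_k:.2f}")
```

The Celsius readings suggest that 20 C is "twice as hot" as 10 C, but that factor of two depends on where the arbitrary zero was placed. On the Kelvin scale the same two temperatures differ by only about 4 percent.
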

Accuracy and Precision:

Whenever we make a measurement, it will have some error attached to it. No matter how painstaking our efforts, or how sophisticated our equipment, we can’t hope to measure the “true” value exactly, or even get exactly the same value every time we measure. What we can do is estimate how much error we are making. This is what accuracy and precision tell us. The two terms are often used interchangeably in casual usage, but each has a distinct meaning.
Accuracy measures how close our measurement(s) come to the true value. Precision tells us how close our measurements of the same quantity come to one another (Zar 1996). A bathroom scale that gave you the same weight every morning, but underestimated your weight by five kg every time, would be precise but not accurate. A speedometer that bounced continually from five km/hr too low to five km/hr too high would be accurate (the average reading it gave would be the true value) but not precise. Accuracy tells us about consistent biases in our data. It is normally depicted by the number of significant figures presented (Zar 1996). If you present a measurement as 8.0 cm (two significant figures) you are claiming more accuracy than if you present it as 8 cm (one significant figure). Precision tells us about random measurement errors (Zar 1996). Precision of measuring instruments is depicted by an estimate of likely error size, for example ± 0.05 cm for a ruler (see descriptive statistics for more detail). The same ± symbol is used more commonly in biology to depict the variation typically seen in nature (see again descriptive statistics). In most cases we cannot distinguish our measuring error from the variation among individuals, and simply try to make it as small as possible.
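
The scale and speedometer examples can be put into numbers with a small Python sketch. The readings below are invented purely for illustration, and the helper name describe_measurements is hypothetical. One simple way to summarize the two ideas, assuming the true value is known, is to use the mean error (bias) as an indicator of accuracy and the sample standard deviation of repeated readings as an indicator of precision.

```python
import statistics


def describe_measurements(readings, true_value):
    """Return (bias, spread) for repeated readings of one quantity.

    bias   = mean reading minus true value (small bias: accurate)
    spread = sample standard deviation (small spread: precise)
    """
    bias = statistics.mean(readings) - true_value
    spread = statistics.stdev(readings)
    return bias, spread


# Made-up bathroom scale: reads about 5 kg low, but very consistently.
true_weight_kg = 70.0
scale_readings = [65.0, 65.1, 64.9, 65.0, 65.0]

# Made-up speedometer: bounces around the true speed.
true_speed_kmh = 60.0
speedometer_readings = [55.0, 65.0, 57.0, 63.0, 60.0]

scale_bias, scale_spread = describe_measurements(scale_readings, true_weight_kg)
speed_bias, speed_spread = describe_measurements(speedometer_readings, true_speed_kmh)

print(f"scale:       bias {scale_bias:+.2f} kg,   spread {scale_spread:.2f} kg")
print(f"speedometer: bias {speed_bias:+.2f} km/hr, spread {speed_spread:.2f} km/hr")
```

With these invented numbers the scale shows a large bias and a small spread (precise but not accurate), while the speedometer shows almost no bias and a large spread (accurate but not precise). In real studies the true value is usually unknown, so bias often has to be checked against a calibrated standard.
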