Statistical Tests in a Nutshell
I created the following as the second half of a presentation in a neuroscience seminar class (another student did the first half on descriptive statistics and the various t-tests).  I had forgotten about this document until a fellow graduate student reminded me about it just this afternoon.  It is a good summary for anyone who has a good experimental design and doesn't know where to begin with the statistical tests to analyze their data.

First, a joke... A group of biologists and a group of statisticians were riding a train to (surprise) a conference for biologists and statisticians.  The statisticians noticed that only one person in the biologists' group had a ticket for the train, so they made fun of how absent-minded the biologists were, and how this just goes to show, etc. Eventually all the biologists got up, went to the restroom, and crammed into it.  Shortly after, the conductor came down the aisle, noticed the closed bathroom door, and knocked on it, and the biologists slid the ticket out under the door.  All the statisticians forked over their tickets, the conductor left, and the biologists came out of the bathroom, mocked them, and called them nerds. Well, this got the statisticians all riled up, but they are academics, so they learned from the experience.  On the ride back, they collectively purchased a single ticket. The biologists didn't purchase any tickets at all, and so once again the statisticians mocked them. The conductor was seen coming down the pike. All the statisticians went into one bathroom. All the biologists went into another. One of the biologists came out and knocked on the statisticians' bathroom door. The statisticians slid their ticket out under the door.  The biologist took it and ran back to the biologists' bathroom. ... One possible moral of the story: You should understand a method before you use it.  Especially if you are a statistician.

No math will be presented in this overview!

Multisample Hypothesis Tests
ANOVA = ANalysis Of VAriance
Assumptions of ANOVA:
1. measured variables are distributed normally (each group should be tested separately)
2. group variances are homoscedastic (not significantly different)
3. effects of treatments are additive

All of the following tests partition the variation in the data by source. In this way, background (naturally occurring) variance can be separated out and the variance due to the treatment can be tested for significance.

One-way ANOVA

*one way = "one independent factor"

Description: A test to see if the means of more than two groups are significantly different.

Example scenario: Are the weights of mice on three different diets significantly different?

Design of Experiment:
Groups (columns): Diet A | Diet B | Diet C
Measured in each group: Mouse Weight

Variables:

Independent: Diet
Dependent: Weight

Partitioning of the variation by sources:
1. Treatment (in this case Diet)
2. Error (natural variation in weights)

Nonparametric equivalents:
Kruskal-Wallis one-way multisample test
Multisample Median test
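
The partitioning for the mouse example can be sketched in a few lines of Python. This is only an illustration: the weights are made-up numbers (three mice per diet), and `one_way_anova` is a hypothetical helper name, not part of any library. It splits the total variation into Treatment and Error and forms the F ratio; turning F into a p-value would still need an F table or a statistics package.

```python
from statistics import mean

# Hypothetical mouse weights in grams (made-up illustration values).
weights = {
    "Diet A": [20.0, 22.0, 24.0],
    "Diet B": [21.0, 23.0, 25.0],
    "Diet C": [28.0, 30.0, 32.0],
}

def one_way_anova(groups):
    """Partition the total sum of squares into Treatment and Error."""
    all_values = [v for g in groups.values() for v in g]
    grand_mean = mean(all_values)
    # Treatment: how far each group mean sits from the grand mean.
    ss_treatment = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups.values())
    # Error: how far each mouse sits from its own group mean.
    ss_error = sum((v - mean(g)) ** 2 for g in groups.values() for v in g)
    df_treatment = len(groups) - 1
    df_error = len(all_values) - len(groups)
    f_ratio = (ss_treatment / df_treatment) / (ss_error / df_error)
    return {"ss_treatment": ss_treatment, "ss_error": ss_error, "f": f_ratio}

result = one_way_anova(weights)
print(result)
```

A large F means the diet means differ by more than the natural mouse-to-mouse variation would suggest.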

Nested ANOVA

Description: A test to compare means between groups and between subgroups nested within those groups.

Example scenario: You are testing for levels of nitrates in the soil in two areas.  Two hypotheses are tested: 1) Mean nitrate levels are different between the two areas and 2) Mean nitrate levels are different among the sites sampled within each area.

Design of Experiment:
Area A: Site A | Site B | Site C
Area B: Site D | Site E | Site F
Measured at each site: Nitrate Level

Possible outcomes:

1. A = no difference, S = no difference (negative results)
2. A = difference, S = no difference (areas are different, sites within areas are similar)
3. A = no difference, S = difference (overall the areas are the same, but much variability all around)
4. A = difference, S = difference (significant variability all around)

Variables:
Independent #1: Area
Independent #2: Sites
Dependent: Nitrate level

Partitioning of the variation by sources:
1. Area
2. Sites within Areas
3. Error (natural variation)

Nonparametric equivalent:
Nonparametric Nested ANOVA
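
Here is a rough Python sketch of the nested partitioning, assuming a balanced design (two soil samples at each of three sites per area) and made-up nitrate values. The helper name `nested_anova` is hypothetical. It also assumes the sites are a random sample of possible sites, which is why Area is tested against the Sites-within-Areas mean square rather than against Error.

```python
from statistics import mean

# Hypothetical nitrate levels (made-up values), two samples per site.
nitrate = {
    "Area A": {"Site A": [10.0, 12.0], "Site B": [11.0, 13.0], "Site C": [12.0, 14.0]},
    "Area B": {"Site D": [16.0, 18.0], "Site E": [17.0, 19.0], "Site F": [18.0, 20.0]},
}

def nested_anova(data):
    """Partition variation into Area, Sites within Areas, and Error."""
    all_values = [v for sites in data.values() for s in sites.values() for v in s]
    grand_mean = mean(all_values)
    ss_area = ss_sites = ss_error = 0.0
    n_sites = 0
    for sites in data.values():
        area_values = [v for s in sites.values() for v in s]
        area_mean = mean(area_values)
        ss_area += len(area_values) * (area_mean - grand_mean) ** 2
        for samples in sites.values():
            site_mean = mean(samples)
            ss_sites += len(samples) * (site_mean - area_mean) ** 2
            ss_error += sum((v - site_mean) ** 2 for v in samples)
            n_sites += 1
    ms_area = ss_area / (len(data) - 1)
    ms_sites = ss_sites / (n_sites - len(data))
    ms_error = ss_error / (len(all_values) - n_sites)
    return {
        "ss_area": ss_area,
        "ss_sites": ss_sites,
        "ss_error": ss_error,
        "f_area": ms_area / ms_sites,    # hypothesis 1: areas differ
        "f_sites": ms_sites / ms_error,  # hypothesis 2: sites within areas differ
    }

nested = nested_anova(nitrate)
print(nested)
```

With these invented numbers the Area F ratio is large and the Sites F ratio is 1, which corresponds to outcome 2 in the list above.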

Block ANOVA

Description: A test to compare means but eliminate additional sources of variability.

Example scenario: You are testing the effect of different drugs on the heart rates of several patients.  Block ANOVA eliminates the variability among the subjects and compares the effect of the different treatments themselves.

Design of Experiment:
Block      | Drug A | Drug B
Subject #1 |        |
Subject #2 |        |
Subject #3 |        |

Variables:

Independent: Drugs
Independent: Patients (Block)
Dependent: Heart Rate

Partitioning of the variation by sources:
1. Drugs
2. Patients (Block)
3. Remainder (natural variation)*

*technically Error = Block + Remainder, but this is not used in the calculations since you are trying to remove the variation due to the block.

Nonparametric equivalent:
Friedman Block ANOVA
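
A small Python sketch of the block partitioning, assuming each subject receives every drug exactly once. The heart rates are made-up numbers and `block_anova` is a hypothetical helper name. Note how the large subject-to-subject differences end up in the Block term instead of inflating the denominator of the drug test.

```python
from statistics import mean

# Rows are subjects (blocks); columns are Drug A and Drug B.
# Hypothetical heart rates in beats per minute (made-up values).
heart_rate = [
    [70.0, 76.0],  # Subject #1
    [80.0, 84.0],  # Subject #2
    [90.0, 98.0],  # Subject #3
]

def block_anova(table):
    """Partition variation into Drugs, Block, and Remainder."""
    n_blocks, n_drugs = len(table), len(table[0])
    grand_mean = mean([v for row in table for v in row])
    drug_means = [mean([row[j] for row in table]) for j in range(n_drugs)]
    block_means = [mean(row) for row in table]
    ss_total = sum((v - grand_mean) ** 2 for row in table for v in row)
    ss_drugs = n_blocks * sum((m - grand_mean) ** 2 for m in drug_means)
    ss_block = n_drugs * sum((m - grand_mean) ** 2 for m in block_means)
    # Whatever is left after removing Drugs and Block is the Remainder.
    ss_remainder = ss_total - ss_drugs - ss_block
    df_drugs = n_drugs - 1
    df_remainder = (n_drugs - 1) * (n_blocks - 1)
    f_drugs = (ss_drugs / df_drugs) / (ss_remainder / df_remainder)
    return {"ss_drugs": ss_drugs, "ss_block": ss_block,
            "ss_remainder": ss_remainder, "f_drugs": f_drugs}

block_result = block_anova(heart_rate)
print(block_result)
```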

Two Factor ANOVA

Description: A test to compare means of a dependent variable resulting from the influence of two independent variables.

Example scenario: Action potentials are elicited from cultures of different types of tissue via the application of two different pharmacological agents. Because this is a crossed design, you now have several hypotheses to test:
1. The number of action potentials elicited is dependent upon the tissue type.
2. The number of action potentials elicited is dependent upon the drug applied.
3. The number of action potentials elicited is dependent upon the interaction of tissue type and the drug applied.

Design of Experiment:
Drug A: Tissue A | Tissue B
Drug B: Tissue A | Tissue B
Measured in each cell: Action potentials

Variables:

Independent #1: Tissue type
Independent #2: Drug
Dependent: Number of action potentials

Partitioning of the variation by sources:
1. Tissue type
2. Drug
3. The interaction of tissue type and the drug
4. Error (natural variation)

Nonparametric equivalent:
Nonparametric Two Factor ANOVA
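
The four sources of variation in a crossed design can be sketched in Python as follows. This assumes a balanced design (the same number of cultures in every tissue-by-drug cell) and treats both factors as fixed, so every F ratio uses the Error mean square. The counts are made-up numbers and `two_factor_anova` is a hypothetical helper name.

```python
from statistics import mean

# Hypothetical action potential counts (made-up values), two cultures per cell.
counts = {
    ("Tissue A", "Drug A"): [10.0, 12.0],
    ("Tissue A", "Drug B"): [14.0, 16.0],
    ("Tissue B", "Drug A"): [20.0, 22.0],
    ("Tissue B", "Drug B"): [32.0, 34.0],
}

def two_factor_anova(cells):
    """Partition variation into Tissue, Drug, Interaction, and Error."""
    tissues = sorted({t for t, _ in cells})
    drugs = sorted({d for _, d in cells})
    reps = len(next(iter(cells.values())))  # assumes equal replication per cell
    grand = mean([v for vals in cells.values() for v in vals])
    t_mean = {t: mean([v for (ti, _), vals in cells.items() if ti == t for v in vals])
              for t in tissues}
    d_mean = {d: mean([v for (_, di), vals in cells.items() if di == d for v in vals])
              for d in drugs}
    ss = {
        "tissue": reps * len(drugs) * sum((t_mean[t] - grand) ** 2 for t in tissues),
        "drug": reps * len(tissues) * sum((d_mean[d] - grand) ** 2 for d in drugs),
        # Interaction: what the cell means do beyond the two main effects.
        "interaction": reps * sum(
            (mean(cells[(t, d)]) - t_mean[t] - d_mean[d] + grand) ** 2
            for t in tissues for d in drugs),
        "error": sum((v - mean(vals)) ** 2 for vals in cells.values() for v in vals),
    }
    df = {
        "tissue": len(tissues) - 1,
        "drug": len(drugs) - 1,
        "interaction": (len(tissues) - 1) * (len(drugs) - 1),
        "error": len(tissues) * len(drugs) * (reps - 1),
    }
    ms_error = ss["error"] / df["error"]
    f = {k: (ss[k] / df[k]) / ms_error for k in ("tissue", "drug", "interaction")}
    return ss, f

ss_parts, f_ratios = two_factor_anova(counts)
print(ss_parts)
print(f_ratios)
```

Each of the three F ratios answers one of the three hypotheses listed in the example scenario.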

Multiple Comparison Tests

Description: Following a significant ANOVA in which several means are compared, an MCT determines the relationships among the individual means. (In other words, this is the statistical equivalent of the fine adjustment knob on your microscope.)

Example scenario: Recall the dieting mice.  If their weights were significantly different, how do the diets relate?  An MCT might reveal that Diet A and Diet B were not significantly different, but that Diet C was different than the others. This might be expressed as A = B > C.

Parametric Multiple Comparisons:
SNK MCT (Student-Newman-Keuls)
Tukey's MCT
Dunnett's MCT
*Both the SNK and Tukey's tests work as described in the above example.  However, Dunnett's test is useful for comparing treatment means against a control mean.

Nonparametric Multiple Comparisons:
Dunn's nonparametric test

Correlation

Description: A measure of the association between two or more variables.

Example scenario: You notice the number of microglia increases following neuron death.  Are you observing a significant scientific phenomenon?

Assumptions in parametric correlation analysis:
1. Subjects are randomly selected from the target population.
2. At least 2 continuous variables measured for each test subject.
3. Chosen values must be measured independently (i.e. can't measure both midterm grade and final grade).
4. Both variables are to be random (no set levels, unlike with regression).
5. Variables should follow a Gaussian distribution (forming a "point cloud" when plotted).
6. The association should be linear (just like regression; see below).
7. No assumption of cause and effect (don't designate variables "X" and "Y").

Some notes regarding correlation:
A significant correlation does not imply causation.
Correlation gives the correlation coefficient, reported as "r."
r has no units and ranges from -1 to 1; its sign indicates the direction of the association.
Can be tested for significance (either one or two tailed).

Parametric version:
Pearson’s

Nonparametric equivalent:
Spearman
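
To make the difference between the two versions concrete, here is a small Python sketch that computes both coefficients by hand. The paired measurements are made-up numbers loosely following the microglia example, and the function names are hypothetical. Spearman's coefficient is computed here as Pearson's r applied to the ranks of the data, with tied values given their average rank.

```python
from math import sqrt
from statistics import mean

# Hypothetical paired observations (made-up values).
dead_neurons = [1.0, 2.0, 3.0, 4.0, 5.0]
microglia = [1.0, 4.0, 9.0, 16.0, 30.0]

def pearson_r(a, b):
    """Pearson's r: covariation scaled by the spread of each variable."""
    ma, mb = mean(a), mean(b)
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / sqrt(saa * sbb)

def average_ranks(values):
    """Rank from 1 upward; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def spearman_r(a, b):
    """Spearman's coefficient: Pearson's r computed on the ranks."""
    return pearson_r(average_ranks(a), average_ranks(b))

r_pearson = pearson_r(dead_neurons, microglia)
r_spearman = spearman_r(dead_neurons, microglia)
print(round(r_pearson, 3), round(r_spearman, 3))
```

Because the made-up relationship always increases but is curved, Spearman's coefficient comes out at 1 while Pearson's r falls a little short of it.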

Other forms of correlation:
Multiple correlation - one variable compared to a combination (A vs. XYZ)
Canonical correlation - multiple vs. multiple (ABC vs. XYZ)
Autocorrelation - possible relationship within a single variable (different levels; A1 vs. A2)

Regression (a.k.a. SLR - Simple Linear Regression)

Description: A method used to generate a mathematical equation that will describe the relationship between two or more variables.

Example scenario: You notice a declining trend in the number of neurons in rat brains as the animals age.

Differences between correlation and regression:
1. Regression assumes causation; correlation does not.
2. Regression generates a mathematical model; correlation does not.

Uses of regression models:
1. Description (A model is a more compact description of a set of data.)
2. Prediction (extrapolation and interpolation)

How a model is generated:
A regression model draws the Line of Best Fit: the line that passes through the data with the smallest sum of squared residuals (the vertical distances between the data points and the line).  This is accomplished by using the Method of Least Squares.
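
For the curious, the Method of Least Squares for a single independent variable fits in a few lines of Python. The ages and neuron counts below are made-up numbers in the spirit of the rat example, and `least_squares_line` is a hypothetical helper name.

```python
from statistics import mean

# Hypothetical data (made-up values): rat age vs. a neuron count index.
age = [1.0, 2.0, 3.0, 4.0, 5.0]
neurons = [98.0, 96.0, 95.0, 92.0, 89.0]

def least_squares_line(x, y):
    """Return (intercept, slope) minimizing the sum of squared residuals."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = my - slope * mx  # the fitted line passes through the two means
    return intercept, slope

intercept, slope = least_squares_line(age, neurons)
print(f"neurons = {intercept:.1f} + ({slope:.1f}) * age")
```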

General equation for a linear bivariate relationship:

<dependent variable> (units) = intercept + slope * <independent variable> (units)

The coefficient of determination, r^2:
- Reported as the measure of fit between the independent and dependent variables.
- It ranges from 0 to 1.
- r^2 = variation explained by the model / total variation.
- It is not directly tested for significance.
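
As a quick illustration, this Python sketch computes r^2 as the share of the total variation in the dependent variable that the fitted line explains. The data are made-up numbers and `r_squared` is a hypothetical helper name.

```python
from statistics import mean

# Hypothetical data (made-up values).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [98.0, 96.0, 95.0, 92.0, 89.0]

def r_squared(x, y):
    """Explained variation divided by total variation for a least-squares line."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    intercept = my - slope * mx
    predicted = [intercept + slope * xi for xi in x]
    ss_total = sum((yi - my) ** 2 for yi in y)
    ss_unexplained = sum((yi - pi) ** 2 for yi, pi in zip(y, predicted))
    ss_explained = ss_total - ss_unexplained
    return ss_explained / ss_total

r2 = r_squared(x, y)
print(round(r2, 3))
```

A value near 1 means the points sit close to the line; a value near 0 means the line explains almost none of the scatter.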

When reporting a regression model, state:
1. The complete model with the independent and dependent variables named
2. The probability of the model
3. r^2
For example: “The following highly significant (p < 0.0001, r^2 = 0.91) linear model was found between hematocrit and age of 9 men: hematocrit (%) = 65.5 - 0.563 (age, years).”

Regression requirements:

1. All variables are continuous.
2. The independent variable is fixed (i.e. its levels are under the control of the investigator) while the dependent variable is random.
3. A linear function will be described by the data.  You may have to transform some of the data in order to accomplish this.
4. At each level of the independent variable, the values of the dependent variable are independent and normally distributed.
5. At each level of the independent variable, the values of the dependent variable are homoscedastic (have equal variance).

Hypothesis testing: With one regression model:
Test to see if it is statistically significant (i.e. is the slope significantly different from zero?).

Hypothesis testing: With two regression models:
Compare the slopes of two models.*
Compare the elevations of two models.*
Compare predicted Y values for a given X between two models.
*Use approximately the same range of X when comparing two models.

Additional considerations regarding regression analysis:
1. The range of the independent variable is important: i.e. you could have a "window" in which the relationship appears linear.
2. Incorporate all the data, not just the means (or medians).  This increases the sample size and ensures that the raw data display a relationship, not just the means.
3. Make sure you know which is the independent variable and which is the dependent.  If you can't establish causation, then you will have to report correlation instead of a regression model.
4. Just because a model is statistically significant does not mean that it is the best model (i.e. it might not be a linear relationship).
5. Don't force a linear model on a data set.  The model could be significant, but the relationship not linear.  Likewise, nonsignificance does not mean that there is no relationship, only that there is no evidence of a linear one.

Cook's D statistical assessment:
The presence of certain outliers can dramatically influence the development of a regression model.  If you suspect this is the case, Cook's D statistical assessment can identify these observations so that you may attempt your analysis without their influence.
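
Here is a rough Python sketch of Cook's D for a simple linear regression (one independent variable, so two fitted parameters). The data are made-up: five points on a perfect line plus one stray point far out on the X axis. The function name `cooks_distance` is hypothetical, and how large a value counts as "influential" is left as a judgment call.

```python
from statistics import mean

# Hypothetical data (made-up values); the last point is the suspected outlier.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 10.0]
y = [2.0, 4.0, 6.0, 8.0, 10.0, 5.0]

def cooks_distance(x, y):
    """Cook's D for each observation in a simple linear regression."""
    n, p = len(x), 2  # p = number of fitted parameters (intercept and slope)
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    intercept = my - slope * mx
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    mse = sum(e ** 2 for e in residuals) / (n - p)
    # Leverage: how unusual each X value is relative to the others.
    leverage = [1 / n + (xi - mx) ** 2 / sxx for xi in x]
    return [(e ** 2 / (p * mse)) * (h / (1 - h) ** 2)
            for e, h in zip(residuals, leverage)]

distances = cooks_distance(x, y)
most_influential = max(range(len(x)), key=lambda i: distances[i])
print(most_influential, [round(d, 2) for d in distances])
```

The stray point combines a large residual with high leverage, so its Cook's D dwarfs the others.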

Nonlinear regression (a topic unto itself)
Some relationships are not truly linear.  In these cases, transforming the data may allow you to generate a linear model.  If one of these appears best, you can manipulate the equation in such a way that the model is expressed in terms of the untransformed data.

Possible nonlinear associations include:
semi-logarithmic - either log X or log Y.
double logarithmic - both log X and log Y.
polynomial - X^2, X^3, X^-2, X^(-2/3), etc.
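
As a sketch of the semi-logarithmic case in Python: take the log of Y, fit an ordinary straight line, then rewrite the model in terms of the untransformed Y. The data are made-up numbers that happen to double at every step, and `fit_line` and `predict` are hypothetical helper names.

```python
import math
from statistics import mean

# Hypothetical data (made-up values) with a curved, doubling pattern.
x = [0.0, 1.0, 2.0, 3.0]
y = [2.0, 4.0, 8.0, 16.0]

def fit_line(x, y):
    """Ordinary least-squares line; returns (intercept, slope)."""
    mx, my = mean(x), mean(y)
    slope = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
             / sum((xi - mx) ** 2 for xi in x))
    return my - slope * mx, slope

# Fit a straight line to log(Y) versus X ...
log_intercept, log_slope = fit_line(x, [math.log(v) for v in y])

# ... then express it for untransformed Y: Y = a * exp(b * X).
a = math.exp(log_intercept)
b = log_slope

def predict(xi):
    """Predicted Y on the original (untransformed) scale."""
    return a * math.exp(b * xi)

print(round(a, 3), round(b, 3), round(predict(4.0), 3))
```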

Multiple regression:
Often more than one independent variable contributes to the value of a dependent variable.

Example scenario: Consider the independent variables that contribute to almost anything: the color of an apple, the national debt, your score on the GRE, etc.!!!
Approach multiple regression with a practical mind.  The ultimate goal of multiple regression is to generate the most simple, compact model that will accurately describe and/or predict the value of the dependent variable of interest.  This means choosing the independent variables that contribute the most to the value of the dependent variable.  There are many tests that will make these evaluations for you.
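
To show what fitting such a model involves, here is a bare-bones Python sketch with two predictors. It solves the least-squares "normal equations" directly, which is one common textbook approach and not necessarily what your statistics package does internally. The data are made-up and were constructed so that y = 1 + 2*x1 + 3*x2 exactly; `solve` and `fit_multiple` are hypothetical helper names. Choosing which predictors to keep is a separate step that this sketch does not attempt.

```python
# Hypothetical data (made-up values), built so that y = 1 + 2*x1 + 3*x2.
x1 = [0.0, 1.0, 2.0, 3.0, 4.0]
x2 = [1.0, 0.0, 2.0, 1.0, 3.0]
y = [4.0, 3.0, 11.0, 10.0, 18.0]

def solve(a, b):
    """Solve a linear system by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def fit_multiple(predictors, y):
    """Return [intercept, beta1, beta2, ...] for a linear multivariate model."""
    rows = [[1.0] + list(p) for p in zip(*predictors)]  # leading 1 = intercept
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)

coefficients = fit_multiple([x1, x2], y)
print([round(c, 6) for c in coefficients])
```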

General equation for a linear multivariate relationship:
<dependent variable> = intercept + beta1 * <predictor 1> + beta2 * <predictor 2> + … etc.

ANCOVA = ANalysis of COVAriance

Description:  A combination of ANOVA and regression.  Regression adjusts the means of several groups so that ANOVA may analyze them.

Example scenario:  You want to find the effect (if any) of a neurotrophin on the number of neurons in rat brains.  However, you know that this number changes with age anyway.  If this relationship is linear (or the data can be transformed to be linear) over time, you can statistically adjust for this influence and study rats of a variety of ages.

Variables:
Independent: Neurotrophin
Independent: Age
Dependent: Number of neurons
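
The "adjusting" step can be sketched in Python. This covers only the regression half of ANCOVA (computing age-adjusted group means), not the significance test that follows, and it assumes both groups share the same slope against age. The numbers are made-up, with the treated rats deliberately older than the controls, and `adjusted_means` is a hypothetical helper name.

```python
from statistics import mean

# Hypothetical data (made-up values): age and a neuron count index per rat.
groups = {
    "control":      {"age": [1.0, 2.0, 3.0], "neurons": [100.0, 98.0, 96.0]},
    "neurotrophin": {"age": [3.0, 4.0, 5.0], "neurons": [101.0, 99.0, 97.0]},
}

def adjusted_means(data):
    """Return the pooled within-group slope and the age-adjusted group means."""
    grand_age = mean([a for g in data.values() for a in g["age"]])
    sxy = sxx = 0.0
    for g in data.values():
        mean_age, mean_neurons = mean(g["age"]), mean(g["neurons"])
        sxy += sum((a - mean_age) * (n - mean_neurons)
                   for a, n in zip(g["age"], g["neurons"]))
        sxx += sum((a - mean_age) ** 2 for a in g["age"])
    slope = sxy / sxx  # common slope of neurons against age, pooled over groups
    # Slide each group mean along the common slope to the overall mean age.
    adjusted = {name: mean(g["neurons"]) - slope * (mean(g["age"]) - grand_age)
                for name, g in data.items()}
    return slope, adjusted

pooled_slope, adjusted = adjusted_means(groups)
print(pooled_slope, adjusted)
```

In this invented example the raw group means differ by only 1, but once both groups are compared at the same age the difference grows to 5, because the age-related decline was masking the treatment effect.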