Definition. The P-value is the probability of observing a test statistic (i.e., a summary of the data) that is as extreme as or more extreme than the currently observed test statistic, under a statistical model that assumes, among other things, that the hypothesis being tested is true. This can be expressed as Pr(data | H_{0}), where "Pr" is read "the probability of" and "|" is read as "given" or "conditional upon." The statistic should not be interpreted as the probability of H_{0} being true.
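The definition above can be made concrete with a minimal sketch. The function name and the coin-toss scenario are illustrative, not from the source; for a fair-coin null hypothesis, "as extreme or more extreme" (one-sided) means summing the binomial tail at or beyond the observed count.

```python
from math import comb

def binomial_p_value(n, k, p_null=0.5):
    """One-sided P-value: the probability, assuming H0 (success
    probability p_null) is true, of observing k or more successes
    in n independent trials -- i.e., Pr(data at least this extreme | H0)."""
    return sum(comb(n, i) * p_null**i * (1 - p_null)**(n - i)
               for i in range(k, n + 1))

# Example: 60 heads in 100 tosses of a coin assumed fair under H0.
p = binomial_p_value(100, 60)
print(round(p, 3))  # about 0.028
```

Note that the result is a tail probability computed under the null model; it is not Pr(H_{0} | data), consistent with the caution in the definition.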
Interpretation: two competing frameworks. P-values can be used in multiple ways. This has caused a great deal of confusion, because there are two competing and sometimes contradictory philosophical frameworks used to derive the P-value. The first framework was formally developed and popularized by R. A. Fisher (Fisher, 1925); it is called significance testing. The second framework was developed by Jerzy Neyman and Egon Pearson (Neyman & Pearson, 1928, 1933); it is called hypothesis testing. When we interpret the P-value by borrowing some concepts from Fisher's framework and some from Neyman & Pearson's, incoherent interpretations may result. It is therefore important to understand the objectives and basis of each framework.
Fisher's significance testing. P-values are used flexibly in this framework, with the P-value interpreted as "a rational and well-defined measure of reluctance to accept the hypotheses they test" (Fisher, 1973, p. 47). Although many have mistakenly suggested a single threshold for determining "statistical significance" (myself included, mea culpa!), Fisher noted that "no scientific worker has a fixed level of significance at which from year to year, and in all circumstances, he rejects hypotheses; he rather gives his mind to each particular case in the light of his evidence and his ideas" (Fisher, 1973). Nonetheless, the smaller the P-value, the stronger the evidence against the null hypothesis. Fisher intended the P-value to be combined with other sources of information from within and outside the study, often based on background knowledge. Thus, the researcher is not to place sole reliance on the P-value as a means of reaching a conclusion. Note that there is no alternative hypothesis in Fisher's significance testing framework, and that failure to reject the null hypothesis provides no evidence for its support.
Neyman-Pearson (NP) hypothesis testing. The NP hypothesis testing procedure is suited for decision-making, and less so for scientific inference. In this framework, we set acceptable rates for type I errors (false rejection of the null hypothesis) and type II errors (false retention of the null hypothesis) before the experiment is begun (i.e., pre-experimentally). The acceptable type I error rate is referred to as "alpha"; the acceptable type II error rate is referred to as "beta." Pre-experimental error rates are not based on the data from the study. After the experiment is completed, we may calculate a P-value from the data and compare it to the pre-experimental alpha level. If p < alpha, the null hypothesis is rejected. Note that in NP hypothesis testing, the conclusion of the test is not intended to verify or falsify the specific hypotheses tested. Instead, it provides "rules for behavior" that are intended to limit the number of type I and type II errors in a long run of similar experiments. The NP hypothesis testing procedure has been criticized as nonscientific because it is incapable of interpreting the results of a single scientific study.
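The NP procedure described above reduces to a fixed decision rule. The sketch below (function name illustrative) shows why, under this framework, p = .04 and p = .001 receive identical treatment while p = .06 falls on the other side of the line:

```python
def np_decision(p_value, alpha=0.05):
    """Neyman-Pearson decision rule: compare the post-experimental
    P-value to the pre-experimental alpha level. The output is a
    behavior ("reject H0" / "retain H0"), not a measure of evidence."""
    return "reject H0" if p_value < alpha else "retain H0"

print(np_decision(0.04))   # reject H0
print(np_decision(0.001))  # reject H0 -- treated no differently from .04
print(np_decision(0.06))   # retain H0
```

The binary output is the point of contrast with Fisher's framework, where the magnitude of the P-value itself carries (informal) evidential weight.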

|  | Fisher significance testing | Neyman-Pearson hypothesis testing |
| --- | --- | --- |
| Logical basis | Inductive reasoning. | Rules of behavior based on a quasi-deductive model. |
| Hypotheses tested | The null hypothesis.[1] There is no alternative hypothesis in this system. | Null and alternative hypotheses tailored to the situation. |
| Objective | The P-value is used as an informal measure of evidence to reflect upon the credibility of the null hypothesis. | Alpha and beta levels are provided pre-experimentally to limit the number of type I and type II errors in the long run.[2] |
| p = .04 vs. p = .06 | These results provide approximately the same level of evidence against the null hypothesis. | Assuming a pre-experimental alpha level of .05, p = .04 provides a significant finding while p = .06 provides a nonsignificant finding. |
| p = .04 vs. p = .001 | p = .001 provides much stronger evidence against the null hypothesis than does p = .04. | Assuming a pre-experimental alpha level of .05, both studies provide significant evidence to reject the null hypothesis, and the two P-values are treated equally. |
| Conclusion | The conclusions of the experiment should not be based on the P-value alone.[3] | Decisions should adhere to rejection and acceptance regions based on alpha and beta set up before the study. |
- In Fisher's significance testing framework, the P-value is an inductive measure that assigns a number as a measure of the credibility of the hypothesis being tested.
- The P-value is not a direct measure of inductive statistical evidence. Inductive statistical evidence is defined as the relative inductive support given to two hypotheses by the data.^{7} Fisher's P-value addresses only one hypothesis: the null hypothesis.[4]
- The alpha level in the NP framework is akin to, but not identical to, the P-value. Both the alpha level and the P-value are based on unobserved data in the tail region of the probability model defined by the null hypothesis. However, the P-value is post-experimental, while the alpha level is pre-experimental. It is a mistake to view the post-experimental P-value as the smallest level of alpha at which the experimenter would reject the null hypothesis (Goodman, 1993; Greenland, 1991).
- Significance tests and hypothesis tests are both forms of frequentist inference. Other forms of statistical inference include Bayesian methods[5] and standardized likelihoods.
[1] The null hypothesis is the hypothesis to be nullified and is not necessarily restricted to a statement of "no association."

[2] A post-experiment P-value can be slotted into the hypothesis testing procedure by comparing it to the pre-experimental alpha level.

[3] Fisher intended the P-value to be used informally, as a flexible inductive measure with inferences depending on background knowledge about the phenomenon under investigation.

[4] Goodman (1993) cites the book Probability and the Weighing of Evidence by I. J. Good (New York: Charles Griffin & Co, 1950).
[5] Bayesian statistics were called inverse probability until the middle of the twentieth century (Fienberg, 2006).