In this lesson, we'll learn how to apply a method for developing a hypothesis test for situations in which both the null and alternative hypotheses are composite. That's not completely accurate. The method, called the likelihood ratio test, can be used even when the hypotheses are simple, but it is most commonly used when the alternative hypothesis is composite. Throughout the lesson, we'll continue to assume that we know the functional form of the probability density (or mass) function, but we don't know the value of one (or more) of its parameters. That is, we might know that the data come from a normal distribution, but we don't know the mean or variance of the distribution, and hence the interest in performing a hypothesis test about the unknown parameter(s).
The title of this page is a little risky, as there are few simple examples when it comes to likelihood ratio testing! But, we'll work to make the example as simple as possible, namely by assuming again, unrealistically, that we know the population variance, but not the population mean. Before we state the definition of a likelihood ratio test, and then investigate our simple, but unrealistic, example, we first need to define some notation that we'll use throughout the lesson.
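We'll assume that the probability density (or mass) function of X is \(f(x;\theta)\) where \(\theta\) represents one or more unknown parameters. Then:
Let \(\Omega\) (the total parameter space) be the set of all possible values of \(\theta\) allowed by the null and alternative hypotheses together.
Let \(H_0 : \theta \in \omega\) be the null hypothesis, where \(\omega\) is a subset of the parameter space \(\Omega\).
Let \(H_A : \theta \in \omega'\) be the alternative hypothesis, where \(\omega'\) is the complement of \(\omega\) with respect to the parameter space \(\Omega\).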
Let's make sure we are clear about that phrase "where \(\omega '\) is the complement of \(\omega\) with respect to the parameter space \(\Omega\)."
If the total parameter space of the mean \(\mu\) is \(\Omega = \{\mu: -\infty < \mu < \infty\}\) and the null hypothesis is specified as \(H_0: \mu = 3\), how should we specify the alternative hypothesis so that the alternative parameter space is the complement of the null parameter space?
If the null parameter space is \(\omega = \{\mu: \mu = 3\}\), then the alternative parameter space is everything that is in \(\Omega = \{\mu: -\infty < \mu < \infty\}\) that is not in \(\omega\). That is, the alternative parameter space is \(\omega' = \{\mu: \mu \ne 3\}\). And, so the alternative hypothesis is:
\(H_A : \mu \ne 3\)
In this case, we'd be interested in deriving a two-tailed test.
If the alternative hypothesis is \(H_A: \mu > 3\), how should we (technically) specify the null hypothesis so that the null parameter space is the complement of the alternative parameter space?
If the alternative parameter space is \(\omega' = \{\mu: \mu > 3\}\), then the null parameter space is \(\omega = \{\mu: \mu \le 3\}\). And, so the null hypothesis is:
\(H_0 : \mu \le 3\)
Now, the reality is that some authors do specify the null hypothesis as such, even when they mean \(H_0: \mu = 3\). Ours don't, and so we won't. (That's why I put that "technically" in parentheses up above.) At any rate, in this case, we'd be interested in deriving a one-tailed test.
Definition. Let:
\(L(\hat{\omega})\) denote the maximum of the likelihood function with respect to \(\theta\) when \(\theta\) is in the null parameter space \(\omega\).
\(L(\hat{\Omega})\) denote the maximum of the likelihood function with respect to \(\theta\) when \(\theta\) is in the entire parameter space \(\Omega\).
Then, the likelihood ratio is the quotient:
\(\lambda = \dfrac{L(\hat{\omega})}{L(\hat{\Omega})}\)
And, to test the null hypothesis \(H_0 : \theta \in \omega\) against the alternative hypothesis \(H_A : \theta \in \omega'\), the critical region for the likelihood ratio test is the set of sample points for which:
\(\lambda = \dfrac{L(\hat{\omega})}{L(\hat{\Omega})} \le k\)
where \(0 < k < 1\), and k is selected so that the test has a desired significance level \(\alpha\).
A food processing company packages honey in small glass jars. Each jar is supposed to contain 10 fluid ounces of the sweet and gooey good stuff. Previous experience suggests that X, the volume in fluid ounces of a randomly selected jar of the company's honey, is normally distributed with a known variance of 2. Derive the likelihood ratio test for testing, at a significance level of \(\alpha = 0.05\), the null hypothesis \(H_0: \mu = 10\) against the alternative hypothesis \(H_A: \mu ≠ 10\).
Because we are interested in testing the null hypothesis \(H_0: \mu = 10\) against the alternative hypothesis \(H_A: \mu ≠ 10\) for a normal mean, our total parameter space is:
\(\Omega =\left \{\mu : -\infty < \mu < \infty \right \}\)
and our null parameter space is:
\(\omega = \left \{10\right \}\)
Now, to find the likelihood ratio, as defined above, we first need to find \(L(\hat{\omega})\). Well, when the null hypothesis \(H_0: \mu = 10\) is true, the mean \(\mu\) can take on only one value, namely, \(\mu = 10\). Therefore:
\(L(\hat{\omega}) = L(10)\)
We also need to find \(L(\hat{\Omega})\) in order to define the likelihood ratio. To find it, we must find the value of \(\mu\) that maximizes \(L(\mu)\) . Well, we did that back when we studied maximum likelihood as a method of estimation. We showed that \(\hat{\mu} = \bar{x}\) is the maximum likelihood estimate of \(\mu\) . Therefore:
\(L(\hat{\Omega}) = L(\bar{x})\)
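Now, putting it all together to form the likelihood ratio, we get:
\(\lambda = \dfrac{L(10)}{L(\bar{x})} = \dfrac{(4\pi)^{-n/2} exp \left[-\dfrac{1}{4}\sum_{i=1}^{n}(x_i - 10)^2 \right]}{(4\pi)^{-n/2} exp \left[-\dfrac{1}{4}\sum_{i=1}^{n}(x_i - \bar{x})^2 \right]}\)
which simplifies to:
\(\lambda = \dfrac{exp \left[-\dfrac{1}{4}\sum_{i=1}^{n}(x_i - 10)^2 \right]}{exp \left[-\dfrac{1}{4}\sum_{i=1}^{n}(x_i - \bar{x})^2 \right]}\)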
Now, let's step aside for a minute and focus just on the summation in the numerator. If we "add 0" in a special way to the quantity in parentheses:
\(\sum_{i=1}^{n}(x_i - 10)^2 = \sum_{i=1}^{n}(x_i - \bar{x} + \bar{x} - 10)^2\)
we can show that the summation can be written as:
\(\sum_{i=1}^{n}(x_i - 10)^2 = \sum_{i=1}^{n}(x_i - \bar{x})^2 + n(\bar{x} -10)^2 \)
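For readers who want the intermediate algebra, the identity above can be checked by expanding the square. This is a worked sketch; the only fact it relies on is that the deviations from the sample mean sum to zero, \(\sum_{i=1}^{n}(x_i - \bar{x}) = 0\), which makes the cross term vanish:

```latex
\begin{align*}
\sum_{i=1}^{n}(x_i - 10)^2
  &= \sum_{i=1}^{n}\left[(x_i - \bar{x}) + (\bar{x} - 10)\right]^2 \\
  &= \sum_{i=1}^{n}(x_i - \bar{x})^2
     + 2(\bar{x} - 10)\sum_{i=1}^{n}(x_i - \bar{x})
     + n(\bar{x} - 10)^2 \\
  &= \sum_{i=1}^{n}(x_i - \bar{x})^2 + n(\bar{x} - 10)^2
\end{align*}
```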
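Therefore, the likelihood ratio becomes:
\(\lambda = \dfrac{exp \left[-\dfrac{1}{4}\sum_{i=1}^{n}(x_i - \bar{x})^2 - \dfrac{n}{4}(\bar{x}-10)^2 \right]}{exp \left[-\dfrac{1}{4}\sum_{i=1}^{n}(x_i - \bar{x})^2 \right]}\)
which greatly simplifies to: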
\(\lambda = exp \left [-\dfrac{n}{4}(\bar{x}-10)^2 \right ]\)
Now, the likelihood ratio test tells us to reject the null hypothesis when the likelihood ratio \(\lambda\) is small, that is, when:
\(\lambda = exp\left[-\dfrac{n}{4}(\bar{x}-10)^2 \right] \le k\)
where k is chosen to ensure that, in this case, \(\alpha = 0.05\). Well, by taking the natural log of both sides of the inequality, we can show that \(\lambda ≤ k\) is equivalent to:
\( -\dfrac{n}{4}(\bar{x}-10)^2 \le \text{ln} k \)
which, by multiplying through by \(-4/n\) (and reversing the inequality, because we are multiplying by a negative number), is equivalent to:
\((\bar{x}-10)^2 \ge -\dfrac{4}{n} \text{ln} k \)
which is equivalent to:
\(\dfrac{|\bar{X}-10|}{\sigma / \sqrt{n}} \ge \dfrac{\sqrt{-(4/n)\text{ln} k}}{\sigma / \sqrt{n}} = k^* \)
Aha! We should recognize that quantity on the left-side of the inequality! We know that:
\(Z = \dfrac{\bar{X}-10}{\sigma / \sqrt{n}} \)
follows a standard normal distribution when \(H_0: \mu = 10\) is true. Therefore we can determine the appropriate \(k^*\) by using the standard normal table. We have shown that the likelihood ratio test tells us to reject the null hypothesis \(H_0: \mu = 10\) in favor of the alternative hypothesis \(H_A: \mu ≠ 10\) for all sample means for which the following holds:
\(\dfrac{|\bar{X}-10|}{ \sqrt{2} / \sqrt{n}} \ge z_{0.025} = 1.96 \)
Doing so will ensure that our probability of committing a Type I error is set to \(\alpha = 0.05\), as desired.
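The honey example can be sketched numerically. The following is a minimal illustration rather than a definitive implementation: the function name `honey_lrt` and the eight jar volumes are hypothetical, and the code assumes the setting above (normal data, known variance 2, \(\alpha = 0.05\)). It computes the likelihood ratio \(\lambda\) and the Z statistic and confirms that the two rejection rules agree, using the fact that \(\lambda = exp(-z^2/2)\) here.

```python
import math
from statistics import NormalDist


def honey_lrt(sample, mu0=10.0, sigma2=2.0, alpha=0.05):
    """Two-sided likelihood ratio test for a normal mean with known variance."""
    n = len(sample)
    xbar = sum(sample) / n

    # Likelihood ratio: exp[-n (xbar - mu0)^2 / (2 sigma^2)];
    # with sigma^2 = 2 this is exp[-(n/4)(xbar - 10)^2] as derived above.
    lam = math.exp(-n * (xbar - mu0) ** 2 / (2.0 * sigma2))

    # Equivalent standardized statistic.
    z = (xbar - mu0) / math.sqrt(sigma2 / n)

    # Two-sided critical value z_{alpha/2} from the standard normal distribution.
    z_crit = NormalDist().inv_cdf(1.0 - alpha / 2.0)

    # The matching cutoff k on the lambda scale, since lambda = exp(-z^2 / 2).
    k = math.exp(-z_crit ** 2 / 2.0)

    return {
        "xbar": xbar,
        "lam": lam,
        "z": z,
        "z_crit": z_crit,
        "k": k,
        "reject_by_z": abs(z) >= z_crit,
        "reject_by_lambda": lam <= k,
    }


# Hypothetical volumes (fluid ounces) for eight jars, made up for illustration.
jars = [10.4, 9.1, 11.2, 10.9, 12.3, 9.8, 11.5, 10.7]
result = honey_lrt(jars)
print(result)
```

Working on the Z scale is the practical choice, because the critical value comes straight from the standard normal table; the cutoff k on the \(\lambda\) scale is computed only to show that the two rules are the same rule.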
Well, geez, now why would we be revisiting the t-test for a mean \(\mu\) when we have already studied it back in the hypothesis testing section? Well, the answer, it turns out, is that, as we'll soon see, the t-test for a mean \(\mu\) is the likelihood ratio test! Let's take a look!
Suppose that a random sample \(X_1 , X_2 , \dots , X_n\) arises from a normal population with unknown mean \(\mu\) and unknown variance \(\sigma^2\). (Yes, back to the realistic situation, in which we don't know the population variance either.) Find the size \(\alpha\) likelihood ratio test for testing the null hypothesis \(H_0: \mu = \mu_0\) against the two-sided alternative hypothesis \(H_A: \mu ≠ \mu_0\) .
Our unrestricted parameter space is:
\( \Omega = \left\{ (\mu, \sigma^2) : -\infty < \mu < \infty, 0 < \sigma^2 < \infty \right\} \)
Under the null hypothesis, the mean \(\mu\) is the only parameter that is restricted. Therefore, our parameter space under the null hypothesis is:
\( \omega = \left\{(\mu, \sigma^2) : \mu =\mu_0, 0 < \sigma^2 < \infty \right\}\)
Now, first consider the case where the mean and variance are unrestricted. We showed back when we studied maximum likelihood estimation that the maximum likelihood estimates of \(\mu\) and \(\sigma^2\) are, respectively:
\(\hat{\mu} = \bar{x} \text{ and } \hat{\sigma}^2 = \dfrac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2 \)
Therefore, the maximum of the likelihood function for the unrestricted parameter space is:
\( L(\hat{\Omega})= \left[\dfrac{ne^{-1}}{2\pi \Sigma (x_i - \bar{x})^2} \right]^{n/2} \)
Now, under the null parameter space, the maximum likelihood estimates of \(\mu\) and \(\sigma^2\) are, respectively:
\( \hat{\mu} = \mu_0 \text{ and } \hat{\sigma}^2 = \dfrac{1}{n}\sum_{i=1}^{n}(x_i - \mu_0)^2 \)
Therefore, the likelihood under the null hypothesis is:
\( L(\hat{\omega})= \left[\dfrac{ne^{-1}}{2\pi \Sigma (x_i - \mu_0)^2} \right]^{n/2} \)
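The maximized likelihoods above can be checked by substituting the estimates into the normal likelihood. Here is a worked sketch for the null parameter space; the unrestricted case is identical with \(\bar{x}\) in place of \(\mu_0\). The key step is that the exponent collapses to \(-n/2\) once \(\hat{\sigma}^2\) is plugged in:

```latex
\begin{align*}
L(\mu, \sigma^2)
  &= (2\pi\sigma^2)^{-n/2}
     \exp\left[-\frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i - \mu)^2\right] \\
L(\hat{\omega})
  &= (2\pi\hat{\sigma}^2)^{-n/2}
     \exp\left[-\frac{n\hat{\sigma}^2}{2\hat{\sigma}^2}\right],
     \qquad \hat{\sigma}^2 = \frac{1}{n}\sum_{i=1}^{n}(x_i - \mu_0)^2 \\
  &= \left[\frac{e^{-1}}{2\pi\hat{\sigma}^2}\right]^{n/2}
   = \left[\frac{ne^{-1}}{2\pi\sum_{i=1}^{n}(x_i - \mu_0)^2}\right]^{n/2}
\end{align*}
```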
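And now taking the ratio of the two likelihoods, we get:
\( \lambda = \dfrac{L(\hat{\omega})}{L(\hat{\Omega})} = \dfrac{\left[\dfrac{ne^{-1}}{2\pi \Sigma (x_i - \mu_0)^2} \right]^{n/2}}{\left[\dfrac{ne^{-1}}{2\pi \Sigma (x_i - \bar{x})^2} \right]^{n/2}} \)
which reduces to: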
\( \lambda = \left[ \dfrac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{\sum_{i=1}^{n}(x_i - \mu_0)^2} \right] ^{n/2}\)
Focusing only on the denominator for a minute, let's do that trick again of "adding 0" in just the right way. Adding 0 to the quantity in the parentheses, we get:
\( \sum_{i=1}^{n}(x_i - \mu_0)^2 = \sum_{i=1}^{n}(x_i - \bar{x})^2 +n(\bar{x} - \mu_0)^2 \)
Then, our likelihood ratio \(\lambda\) becomes:
\( \lambda = \left[ \dfrac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{\sum_{i=1}^{n}(x_i - \mu_0)^2} \right] ^{n/2} = \left[ \dfrac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{ \sum_{i=1}^{n}(x_i - \bar{x})^2 +n(\bar{x} - \mu_0)^2} \right] ^{n/2} \)
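which, upon dividing through numerator and denominator by \( \sum_{i=1}^{n}(x_i - \bar{x})^2 \), simplifies to:
\( \lambda = \left[ \dfrac{1}{1 + \dfrac{n(\bar{x} - \mu_0)^2}{\sum_{i=1}^{n}(x_i - \bar{x})^2}} \right]^{n/2} \)
Therefore, the likelihood ratio test's critical region, which is given by the inequality \(\lambda \le k\), is equivalent to:
\( 1 + \dfrac{n(\bar{x} - \mu_0)^2}{\sum_{i=1}^{n}(x_i - \bar{x})^2} \ge k^{-2/n} \)
which with some minor algebraic manipulation can be shown to be equivalent to:
\( \dfrac{n(\bar{x} - \mu_0)^2}{\dfrac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2} \ge (n-1)\left(k^{-2/n} - 1\right) = k^{*} \)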
So, in a nutshell, we've shown that the likelihood ratio test tells us that for this situation we should reject the null hypothesis \(H_0: \mu= \mu_0\) in favor of the alternative hypothesis \(H_A: \mu ≠ \mu_0\) if:
\( \dfrac{(\bar{x}-\mu_0)^2 }{s^2 / n} \ge k^{*} \)
Well, okay, so I started out this page claiming that the t-test for a mean \(\mu\) is the likelihood ratio test. Is it? Well, the above critical region is equivalent to rejecting the null hypothesis if:
\( \dfrac{|\bar{x}-\mu_0| }{s / \sqrt{n}} \ge k^{**} \)
Does that look familiar? We previously learned that if \(X_1, X_2, \dots, X_n\) are normally distributed with mean \(\mu\) and variance \(\sigma^2\), then:
\( T = \dfrac{\bar{X}-\mu}{S / \sqrt{n}} \)
follows a t distribution with \(n - 1\) degrees of freedom. Under the null hypothesis, \(\mu = \mu_0\), so this tells us that we should use the t distribution to choose \(k^{**}\). That is, set:
\(k^{**} = t_{\alpha /2, n-1}\)
and we have our size \(\alpha\) t-test that ensures the probability of committing a Type I error is \(\alpha\).
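The equivalence between the likelihood ratio and the t statistic can also be checked numerically. The sketch below is illustrative only: the function names and the sample are hypothetical, and the critical value 2.365 (\(t_{0.025, 7}\), read from a t table) is passed in by hand because the Python standard library has no t quantile function. It computes \(\lambda\) from the two sums of squares and compares it with the closed form \(\lambda = \left[1 + t^2/(n-1)\right]^{-n/2}\), which follows from the simplification above.

```python
import math


def t_and_lambda(sample, mu0):
    """Return the one-sample t statistic and the likelihood ratio lambda."""
    n = len(sample)
    xbar = sum(sample) / n
    ss_xbar = sum((x - xbar) ** 2 for x in sample)  # sum of squares about xbar
    ss_mu0 = sum((x - mu0) ** 2 for x in sample)    # sum of squares about mu0
    lam = (ss_xbar / ss_mu0) ** (n / 2.0)
    s2 = ss_xbar / (n - 1)                          # sample variance
    t = (xbar - mu0) / math.sqrt(s2 / n)
    return t, lam


def lambda_from_t(t, n):
    """Likelihood ratio written as a decreasing function of |t|."""
    return (1.0 + t * t / (n - 1)) ** (-n / 2.0)


# Hypothetical data; t_crit is t_{0.025, 7} taken from a t table.
sample = [10.4, 9.1, 11.2, 10.9, 12.3, 9.8, 11.5, 10.7]
t_crit = 2.365
t, lam = t_and_lambda(sample, mu0=10.0)
k = lambda_from_t(t_crit, len(sample))  # matching cutoff on the lambda scale
print(t, lam, abs(t) >= t_crit, lam <= k)
```

Because \(\lambda\) is a strictly decreasing function of \(|t|\), rejecting for small \(\lambda\) and rejecting for large \(|t|\) select exactly the same samples.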
It turns out... we didn't know it at the time... but every hypothesis test that we derived in the hypothesis testing section is a likelihood ratio test. Back then, we derived each test using distributional results of the relevant statistic(s), but we could have alternatively, and perhaps just as easily, derived the tests using the likelihood ratio testing method.
Composite Hypothesis:
A statistical hypothesis which does not completely specify the distribution of a random variable is referred to as a composite hypothesis.
A hypothesis is an educated guess about how something works. In the scientific method, a hypothesis is an idea that can be tested. If the hypothesis is correct, then the experiment will support the hypothesis. If the hypothesis is incorrect, the experiment will not support the hypothesis.
A hypothesis is simple if it specifies the population completely, i.e., it specifies the population distribution uniquely, while a composite hypothesis leads to two or more possibilities.
Before diving further into their differences, let’s first define a few terms that are handy in understanding the concept of a hypothesis.
A hypothesis is a proposed explanation for a phenomenon. A scientific theory is a well-substantiated explanation for an aspect of the natural world supported by a vast body of evidence. Theories are generally much broader in scope than hypotheses and are often not as specific.
The objective of statistics is to make inferences about a population based on information contained in the sample.
There are two major areas of statistical inference, namely the estimation of parameters and the testing of hypotheses.
We will develop general methods for testing hypotheses and then apply them to common problems.
A statistical hypothesis is a testable statement about a population parameter. The statement is based on an assumption about the population parameter. This assumption is usually made about the population parameters based on past research or experience. The statistical hypothesis is used to make predictions about future events. These predictions are based on the assumption that the population parameters will remain the same.
A statistical hypothesis is about a population parameter, usually denoted by some symbol, such as μ or θ.
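Statistical hypothesis testing is a method of statistical inference. A statistical hypothesis test involves two competing hypotheses: the null hypothesis and the alternative hypothesis.
The null hypothesis is the default claim about the parameter, for example μ = 0. The alternative hypothesis is the hypothesis that is tested against the null hypothesis; for example, the alternative hypothesis could be that μ > 0 or μ < 0.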
A statistical hypothesis test determines whether or not to reject the null hypothesis. The null hypothesis is rejected if the test statistic falls in the rejection region, that is, beyond the critical value in the tail (or tails) specified by the alternative hypothesis.
A hypothesis is a statement or claim about how two variables are related. Hypothesis testing is a statistical procedure used to assess whether the null hypothesis (a statement that there is no difference between two groups or no association between two variables) can be rejected based on sample data. There are four steps in hypothesis testing:
The first step is to state the null and alternative hypotheses. The null hypothesis is that the two variables have no difference or association. The alternative hypothesis is the statement that there is a difference or an association between two variables.
The second step is to select a significance level. The significance level is the probability of rejecting the null hypothesis when it is true. The most common significance levels are 0.05 and 0.01.
The third step is to calculate the test statistic. The test statistic measures how far the sample data depart from what the null hypothesis predicts. There are many different test statistics, and the choice of test statistic depends on the data type and hypothesis test.
The fourth and final step is to interpret the results. The results of a hypothesis test are either significant or not significant. A significant result means that the null hypothesis can be rejected. A non-significant result means that the null hypothesis cannot be rejected.
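The four steps above can be sketched in code. This is a minimal illustration under stated assumptions, not a recommended procedure: the function name `two_group_z_test` and the data are hypothetical, and the test uses a normal approximation for the difference in two group means (with small samples a t-test would be the usual choice).

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def two_group_z_test(group_a, group_b, alpha=0.05):
    """Walk through the four steps for a two-sided test of equal means."""
    # Step 1: state the hypotheses.
    #   H0: mean_a == mean_b    HA: mean_a != mean_b
    # Step 2: select the significance level (alpha, passed as an argument).

    # Step 3: calculate the test statistic (normal approximation).
    se = sqrt(variance(group_a) / len(group_a)
              + variance(group_b) / len(group_b))
    z = (mean(group_a) - mean(group_b)) / se

    # Step 4: interpret the result by comparing the two-sided p-value to alpha.
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    reject_h0 = p_value <= alpha
    return z, p_value, reject_h0


# Hypothetical measurements for two groups.
control = [1.0, 2.0, 3.0, 4.0, 5.0]
treated = [11.0, 12.0, 13.0, 14.0, 15.0]
print(two_group_z_test(control, treated))
```

A rejection here means only that the data are unlikely under the null hypothesis at the chosen level; a non-rejection does not show that the null hypothesis is true.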
In statistics, a null hypothesis is a statement one seeks to disprove, reject or nullify. Most commonly, it is a statement that the phenomenon being studied produces no effect or makes no difference. For example, if one were testing the efficacy of a new drug, the null hypothesis would be that the drug does not affect the treated condition.
The null hypothesis is usually denoted H0, and the alternative hypothesis is denoted H1. If the null hypothesis is rejected in favor of the alternative hypothesis, the result is said to be “statistically significant.” The null hypothesis is assumed to be true unless the data provide sufficient evidence against it.
Many different types of tests can be used to test a null hypothesis. The most common is the Student’s t-test, which compares the means of two groups. If the t-test is significant, there is a statistically significant difference between the two groups.
Other tests that can be used to test the null hypothesis include the chi-square, Fisher’s exact, and Wilcoxon rank-sum tests.
The alternative hypothesis is the hypothesis that competes with the null hypothesis in a statistical test. It is the opposite of the null hypothesis, and it is the hypothesis for which we are trying to find evidence.
Simple hypothesis.
Hypotheses can be composite or simple, and both are useful depending on the research question and the available evidence.
A simple hypothesis is a straightforward statement that proposes a relationship between two variables. It is a clear, concise statement that is easy to test and evaluate. A simple hypothesis is often used in experimental research where the researcher wants to test the effect of one variable on another.
Examples of simple hypotheses:
An example of a simple hypothesis is “students who study more will get better grades.” This hypothesis proposes a direct relationship between the amount of time a student spends studying and their academic performance. This hypothesis is testable by comparing the grades of students who study more with those who study less.
Another example of a simple hypothesis is “increased exposure to sunlight will result in higher vitamin D levels.” This hypothesis proposes a direct relationship between sunlight exposure and vitamin D levels. This hypothesis is testable by measuring the vitamin D levels of individuals with varying levels of sunlight exposure.
Simple hypotheses are advantageous because they are easy to test and evaluate. They also allow researchers to focus on a specific research question and avoid unnecessary complexity. Simple hypotheses are particularly useful in experimental research where researchers manipulate one variable to observe its effect on another.
However, simple hypotheses also have limitations. They may oversimplify complex phenomena, and their results may not generalize to a larger population. The available evidence may also limit simple hypotheses, and additional research may be necessary to understand the relationship between variables fully.
In essence, a simple hypothesis is a straightforward statement that proposes a relationship between two variables. Simple hypotheses are useful in experimental research and allow researchers to focus on a specific research question. However, simple hypotheses also have limitations and should be evaluated in the context of the available evidence and research question.
A composite hypothesis, on the other hand, proposes multiple relationships between two or more variables. For example, a composite hypothesis might state that “there is a significant difference between the average heights of men and women, and there is also a significant difference between the average heights of people from different continents.”
Composite hypothesis testing is a statistical technique used to determine the probability of an event or phenomenon based on observed data. This technique is often used in scientific research, quality control, and decision-making processes where the outcome of a particular experiment or test is uncertain.
A composite hypothesis is often an alternative hypothesis encompassing a range of possible outcomes. It is defined as a hypothesis that admits more than one parameter value. For example, if we are testing the hypothesis that the mean of a population is greater than a certain value, we could define the composite hypothesis as follows:
H1: μ > μ0, where μ is the population mean, and μ0 is the hypothesized value of the mean.
The composite hypothesis, in this case, includes all values of μ greater than μ0. This means we are not specifying a specific value of μ, but rather a range of possible values.
Composite hypothesis testing involves evaluating the probability of observing a result at least as extreme as the one obtained if the null hypothesis were true. The result is considered significant, and the null hypothesis is rejected in favor of the composite alternative, if that probability is sufficiently low.
We use statistical tests such as the t-test, F-test, or chi-square test to test a composite hypothesis. Given the null hypothesis and the observed data, these tests allow us to calculate the probability of observing a particular result.
In conclusion, composite hypothesis testing is a valuable statistical technique used to determine the probability of an event or phenomenon based on observed data. It allows us to test hypotheses that encompass a range of possible outcomes and is an essential tool for scientific research, quality control, and decision-making processes.
Understanding composite hypothesis testing is essential for anyone working in these fields and can help ensure that decisions are made based on solid statistical evidence.
To provide guidelines for identifying composite hypotheses and addressing the probability of false rejection for multiple hypotheses.
Examples from the literature in health services research are used to motivate the discussion of composite hypothesis tests and multiple hypotheses.
This article is a didactic presentation.
It is not rare to find mistaken inferences in health services research because of inattention to appropriate hypothesis generation and multiple hypotheses testing. Guidelines are presented to help researchers identify composite hypotheses and set significance levels to account for multiple tests.
It is important for the quality of scholarship that inferences are valid: properly identifying composite hypotheses and accounting for multiple tests provides some assurance in this regard.
Recent issues of Health Services Research (HSR), the Journal of Health Economics, and Medical Care each contain articles that lack attention to the requirements of multiple hypotheses. The problems with multiple hypotheses are well known and often addressed in textbooks on research methods under the topics of joint tests (e.g., Greene 2003; Kennedy 2003) and significance level adjustment (e.g., Kleinbaum et al. 1998; Rothman and Greenland 1998; Portney and Watkins 2000; Myers and Well 2003; Stock and Watson 2003); yet, a look at applied journals in health services research quickly reveals that attention to the issue is not universal.
This paper has two goals: to remind researchers of issues regarding multiple hypotheses and to provide a few helpful guidelines. I first discuss when to combine hypotheses into a composite for a joint test; I then discuss the adjustment of test criteria for sets of hypotheses. Although often treated in statistics as two solutions to the same problem (Johnson and Wichern 1992), here I treat them as separate tasks with distinct motivations.
In this paper I focus on Neyman–Pearson testing using Fisher's p-value as the interpretational quantity. Classically, a test compares an observed value of a statistic with a specified region of the statistic's range; if the value falls in the region, the data are considered not likely to have been generated given the hypothesis is true, and the hypothesis is rejected. However, it is common practice to instead compare a p-value to a significance level, rejecting the hypothesis if the p-value is smaller than the significance level. Because most tests are based on tail areas of distributions, this is a distinction without a difference for the purpose of this paper, and so I will use the p-value and significance-level terms in this discussion.
Of greater import is the requirement that hypotheses are stated a priori. A test is based on the prior assertion that if a given hypothesis is true, the data generating process will produce a value of the selected statistic that falls into the rejection region with probability equal to the corresponding significance level, which typically corresponds to a p-value smaller than the significance level. Setting hypotheses a priori is important in order to avoid a combinatorial explosion of error. For example, in a multiple regression model the a posteriori interpretation of regression coefficients in the absence of prior hypotheses does not account for the fact that the pattern of coefficients may be generated by chance. The important distinction is between the a priori hypothesis “the coefficient estimates for these particular variables in the data will be significant” and the a posteriori observation that “the coefficient estimates for these particular variables are significant.” In the first case, even if variables other than those identified in the hypothesis do not have statistically significant coefficients, the hypothesis is rejected nonetheless. In the second case, the observation applies to any set of variables that happen to have “statistically significant” coefficients. Hence, it is the probability that any set of variables have resultant “significant” statistics that drives the a posteriori case. As the investigator will interpret any number of significant coefficients that happen to result, the probability of significant results, given that no relationships actually exist, is the probability of getting any pattern of significance across the set of explanatory variables. This is different from a specific a priori case in which the pattern is preestablished by the explicit hypotheses.
See the literatures on False Discovery Rate (e.g., Benjamini and Hochberg 1995; Benjamini and Liu 1999; Yekutieli and Benjamini 1999; Kwong, Holland, and Cheung 2002; Sarkar 2004; Ghosh, Chen, and Raghunathan 2005) and Empirical Bayes (Efron et al. 2001; Cox and Wong 2004) for methods appropriate for a posteriori investigation.
What is achieved by testing an individual a priori hypothesis in the presence of multiple hypotheses? The answer to this question provides guidance for determining when a composite hypothesis (i.e., a composition of single hypotheses) is warranted. The significance level used for an individual test is the marginal probability of falsely rejecting the hypothesis: the probability of falsely rejecting the hypothesis regardless of whether the remaining hypotheses are rejected (see online-only appendix for details). The implied indifference to the status of the remaining hypotheses, however, is indefensible if the conclusions require a specific result from other hypotheses. This point underlies a guiding principle:
Guideline 1: A joint test of a composite hypothesis ought to be used if an inference or conclusion requires multiple hypotheses to be simultaneously true.
The guideline is motivated by the logic of the inference or conclusion and is independent of significance levels. Examples from the literature can prove helpful in understanding the application of this guideline. Because it is unnecessarily pejorative to reference specific studies, the following discussion will only identify the nature of the problem in selected articles but not the articles themselves (the editors of HSR and the reviewers of this paper were provided the explicit references), but the general character of the examples ought to be familiar to most researchers.
Two recent articles in the Journal of Health Economics each regressed a dependent variable on, among other variables, a second order polynomial, a practice used to capture nonlinear relationships. The null hypothesis for each coefficient of the polynomial was rejected according to its individual t-statistic. It was concluded that the explanatory variable had a parabolic relationship with the dependent variable, suggesting the authors rejected the hypotheses that both coefficients were simultaneously zero: the joint hypothesis regarding both coefficients is the relevant one. This is different from a researcher testing second-order nonlinearity (as opposed to testing the parabolic shape); in this case an individual test of the coefficient on the second-order term (i.e., the coefficient on the squared variable) is appropriate because the value of the first order term is not influential in this judgment of nonlinearity.
A recent article in Medical Care categorized a count variable into three size-groups and used a corresponding set of dummy variables to represent the two largest (the smallest group being the reference category); based on the individual significance of the two dummy variables they rejected the hypothesis that both coefficients were zero and concluded that the dependent variable was related to being larger on the underlying concept. In this conclusion, they collapsed two categories into a single statement about being larger on the underlying variable. Yet, if the authors meant that both categories are larger than the reference group, then it is a test of both coefficients being simultaneously zero that is relevant. A similar example using dummy variables is if we have an a priori hypothesis that the utilization of emergency services is not greater for blacks than whites, and another a priori hypothesis stating that utilization is not greater for Native Americans than whites. We may be justified in testing each coefficient if our interest in each minority group is independent of the other. However, a claim that “blacks and Native Americans both do not differ from whites in their utilization” makes sense only if both coefficients are simultaneously zero. Again, a joint test is indicated.
Recent articles in HSR and the Journal of Health Economics developed and tested a priori hypotheses regarding individual model parameters. So far, so good; but it was then inferred that the expected value of the dependent variable would differ between groups defined by different profiles of the explanatory variables. Here again the conclusion requires rejecting that the coefficients are simultaneously zero. For example, suppose we reject the hypothesis that age does not differentiate health care utilization and we reject the hypothesis that wealth does not differentiate health care utilization. These individual hypothesis tests do not warrant claims regarding wealthy elderly, poor youth, or other combinations. The coefficients for the age and wealth variables must both be nonzero, if such claims are to be made.
Recent articles in Medical Care, HSR, and the Journal of Health Economics each included analyses in which the same set of independent variables was regressed on a number of dependent variables. Individual independent variables were considered regarding their influence across the various dependent variables. If an independent variable is considered to be simultaneously related to a number of dependent variables, then a joint test of a composite hypothesis is warranted. For example, suppose I wish to test a proposition that after controlling for age, health care utilization does not differ by sex. Suppose I use a two-part model (one part models the probability of any utilization, the other part models positive utilization given some utilization). 1 In this case I have two dependent variables (an indicator of any utilization and another variable measuring how much utilization given positive utilization). If my proposition is correct, then the coefficients on sex across both models should be simultaneously zero: a joint test is appropriate. If instead I test the two sex coefficients separately, I will implicitly be testing the hypotheses that (1) sex does not differentiate any utilization whether or not it differentiates positive utilization and (2) sex does not differentiate positive utilization whether or not it differentiates any utilization, which statistically does not address the original proposition. One might suppose that, if the dependent variables were conditionally independent of each other, the joint test would provide results similar to those of the two individual tests; this is not so. The type 1 error rate when using the individual tests is too large, unless the corresponding significance levels are divided by the number of hypotheses (see the section on adjusting for multiple tests below), in which case this type of adjustment is sufficient for independent tests.
Alternatively, suppose I wish to consider the effects of using nurse educators regarding diabetes care on health outcomes (e.g., A1c levels) and on patients' satisfaction with their health care organization, but my interest in each of these effects is independent of the other. In this case I am interested in two separate hypotheses, say for example (1) there is no effect on health outcomes regardless of the effect on satisfaction and (2) there is no effect on satisfaction regardless of any effect on outcomes. So long as I do not interpret these as a test that both effects are simultaneously zero, I can legitimately consider each hypothesis separately. But if each individual test does not reject the null, I should not infer that both effects are zero in the population (even with appropriate power), as this would require a joint test.
The preceding examples are in terms of individual model parameters. Guideline 1, however, applies to any set of hypotheses regardless of their complexity. In general, if a researcher desires to test a theory with multiple implications that must simultaneously hold for the theory to survive the test, then the failure of a single implication (as an independent hypothesis) defeats the theory. A joint hypothesis test is indicated.
The following guideline presents another heuristic to distinguish the need for joint versus separate tests.
Guideline 2 : If a conclusion would follow from a single hypothesis fully developed, tested, and reported in isolation from other hypotheses, then a single hypothesis test is warranted .
Guideline 2 asks whether a paper written about a given inference or conclusion would be coherent if based solely on the result of a single hypothesis. If so, then a single hypothesis test is warranted; if not, then consideration should be given to the possibility of a composite hypothesis. One could not support a claim that wealthy elderly use more services than poor youth based solely on the hypothesis relating wealth and utilization; information regarding age is also required.
Unfortunately, joint tests have a limitation that must be kept in mind, particularly when the hypothesis being tested is not the hypothesis of interest (which is often the case with null hypotheses). Rejecting a joint test of a composite hypothesis does not tell us which specific alternative case is warranted. Remember that a joint test of \(N\) hypotheses has \(2^N - 1\) possible alternatives (in terms of the patterns of possible true and false hypotheses); for example, a joint test of two hypotheses (say, \(h_1\) and \(h_2\)) has three possible alternatives (\(h_1\) true and \(h_2\) false; \(h_1\) false and \(h_2\) true; and both \(h_1\) and \(h_2\) false); a joint test of five hypotheses has 31 possible alternatives. If your interest is in a specific alternative (e.g., all hypotheses are false, which is common and is the case in many of the examples discussed above), the rejection of a joint test does not provide unique support.
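The count of alternatives can be checked by brute-force enumeration. The following is a minimal sketch; the function name `alternative_patterns` is introduced here only for illustration and is not from the article.

```python
from itertools import product

def alternative_patterns(n):
    """Return every true/false pattern over n component hypotheses,
    excluding the single pattern in which all components are true
    (that pattern is the joint hypothesis itself)."""
    return [p for p in product([True, False], repeat=n) if not all(p)]

# Two component hypotheses: three possible alternatives.
for pattern in alternative_patterns(2):
    print(", ".join(f"h{i + 1} {'true' if t else 'false'}"
                    for i, t in enumerate(pattern)))

# Five component hypotheses: 2**5 - 1 alternatives.
print(len(alternative_patterns(5)))  # 31
```

Rejecting the joint hypothesis says only that the truth lies somewhere in this list, not which entry it is.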
To answer the question of why the joint hypothesis was rejected, it can be useful to switch from the testing paradigm to a classic p-value paradigm by inspecting the relative “level of evidence” the data provide regarding each alternative case. Here p-values are used to suggest individual components of the composite hypothesis that are relatively poorly supported by the data, providing a starting point for further theory development. In this exercise, the individual p-values of the component hypotheses are compared relative to each other; they are not interpreted in terms of significance. For example, if a two-component composite null hypothesis is rejected but the individual p-values are .45 and .15, the “nonsignificance” of the p-values is irrelevant; it is the observation that one of the p-values is much smaller than the other that provides a hint regarding why the joint hypothesis was rejected. This is theory and model building, not testing; hence, analyzing the joint test by inspecting the marginal p-values associated with its individual components is warranted as a useful, though admittedly not very satisfactory, heuristic.
Alternatively, because identifying reasons for failure of a joint hypothesis is an a posteriori exercise, one could apply the methods of False Discovery Rate ( Benjamini and Hochberg 1995 ; Benjamini and Liu 1999 ; Yekutieli and Benjamini 1999 ; Kwong, Holland, and Cheung 2002 ; Sarkar 2004 ; Ghosh, Chen, and Raghunathan 2005 ) or Empirical Bayes Factors ( Efron et al. 2001 ; Cox and Wong 2004 ) to identify reasons for failure of the joint test (i.e., the individual hypotheses that are more likely to be false).
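As one illustration of the False Discovery Rate approach, the step-up rule of Benjamini and Hochberg (1995) can be sketched in a few lines. This is a minimal sketch, not the article's procedure: the function name, the p-values, and the rate `q` are hypothetical, and the rule's guarantee assumes independent (or positively dependent) test statistics.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return the indices of the hypotheses flagged by the
    Benjamini-Hochberg step-up rule at false discovery rate q."""
    m = len(p_values)
    # Indices of the p-values, ordered from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        # Keep the largest rank whose p-value falls under its threshold.
        if p_values[idx] <= rank * q / m:
            cutoff_rank = rank
    return sorted(order[:cutoff_rank])

# Hypothetical component p-values from a rejected four-part joint hypothesis.
component_p = [0.45, 0.15, 0.004, 0.03]
flagged = benjamini_hochberg(component_p, q=0.10)
print(flagged)  # [2, 3]
```

The flagged components are candidates for why the joint test failed; as with the marginal p-value heuristic, this is exploratory rather than confirmatory.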
In this section I use the phrase “significance level” to mean the criterion used in a test (commonly termed the “α level”); I use the phrase “probability of false rejection,” denoted by pfr, to refer to the probability of falsely rejecting one or more hypotheses. A significance level is an operational part of a test (denoting the probability associated with the test's rejection region) whereas a probability of false rejection is a theoretical result of a test or grouping of tests. I use the modifier “acceptable” in conjunction with pfr to mean a probability of false rejection deemed the largest tolerable risk. I use the modifier “implied” in conjunction with pfr to mean the probability of false rejection resulting from the application of a test or group of tests. An acceptable pfr is subjective and set by the researcher, whereas an implied pfr is objective and calculated by the researcher. Suppose I wish to test three hypotheses, and I consider a 0.1 or less probability of falsely rejecting at least one of the hypotheses as acceptable across the tests; the acceptable pfr is 0.1. If I set the significance level for each test to 0.05 (thereby determining to reject a hypothesis if the p-value of its corresponding statistic is less than .05), the probability of false rejection is \(1 - (1 - 0.05)^3 = 0.143\); this is the implied pfr of the analysis associated with the hypothesis testing strategy. In this case, my strategy has an implied pfr value (0.143) that exceeds my acceptable pfr value (0.1); by this accounting, my strategy is unacceptable in terms of the risk of falsely rejecting hypotheses.
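The implied pfr in this example can be reproduced with a short calculation. This is a minimal sketch that assumes, as the formula in the text does, that the tests are independent and all tested hypotheses are true; the function name `implied_pfr` is introduced here for illustration.

```python
def implied_pfr(significance_levels):
    """Probability of at least one false rejection across a group of
    independent tests of true hypotheses."""
    prob_no_false_rejection = 1.0
    for alpha in significance_levels:
        prob_no_false_rejection *= 1.0 - alpha
    return 1.0 - prob_no_false_rejection

acceptable_pfr = 0.10
levels = [0.05, 0.05, 0.05]        # three tests, each at the 0.05 level
pfr = implied_pfr(levels)

print(round(pfr, 3))               # 0.143
print(pfr <= acceptable_pfr)       # False: the strategy exceeds the acceptable risk
```

Because the function takes a list, it also covers the case where the significance levels differ across tests.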
The preceding section on joint hypothesis tests presents guidance for identifying appropriate individual and composite hypotheses. Once a set of hypotheses is identified for testing, significance levels for each test must be set; or more generally, the rejection regions of the statistic must be selected. This task requires setting acceptable pfr's; that is, determining the acceptable risk of rejecting a true hypothesis.
A pfr can be associated with any group of tests. Typically no more than three levels are considered: individual hypotheses, mutually exclusive families of hypotheses, and the full analysis-wide set of hypotheses. Although common, it is not required to use the same acceptable pfr for each test. Some hypotheses may have stronger prior evidence or different goals than others, warranting different test-specific acceptable pfr 's. A family of hypotheses is a subset of the hypotheses in the analysis. They are grouped as a family explicitly because the researcher wishes to control the probability of false rejection among those particular hypotheses. For example, a researcher may be investigating two specific health outcomes and have a set of hypotheses for each; the hypotheses associated with each outcome may be considered a family, and the researcher may desire that the pfr for each family be constrained to some level. An acceptable analysis-wide pfr reflects the researcher's willingness to argue their study remains useful in the face of criticisms such as “Given your hypotheses are correct, the probability of reporting one or more false rejections is P ” or “Given your hypotheses are correct, the expected number of false rejections is N .”
The usefulness of setting pfr 's depends on one's perspective. From one view, we might contend that the information content of investigating 10 hypotheses should not change depending on whether we pursue a single study comprising all 10 hypotheses or we pursue 10 studies each containing one of the hypotheses; yet if we apply an analysis-wide pfr to the study with 10 hypotheses, we expect to falsely reject fewer hypotheses than we expect if we tested each hypothesis in a separate study. 2 If the hypotheses are independent such that the 10 independent repetitions of the data generating process do not in themselves accrue a benefit, there is merit to this observation and we might suppose that an analysis-wide pfr is no more warranted than a program-wide pfr (i.e., across multiple studies).
Our judgment might change, however, if we take a different view of the problem. Suppose I designed a study comprising 10 hypotheses that has an implied pfr corresponding to an expected false rejection of 9 of the 10 hypotheses. Should I pursue this study of 10 hypotheses for which I expect to falsely reject 90 percent of them if my theory is correct? Moreover, is it likely the study would be funded? I suggest the answers are both no. What if instead we expect 80 percent false rejections, or 40 percent, or 10 percent? The question naturally arises, what implied pfr is sufficient to warrant pursuit of the study? To answer that question is to provide an acceptable pfr . Once an acceptable pfr is established it seems prudent to check whether the design can achieve it, and if it cannot, to make adjustments. Clearly, this motivation applies to any family of hypotheses within a study as well. From this perspective, the use of pfr 's in the design of a single study is warranted.
This is not to suggest that researchers ought to always concern themselves with analysis-wide pfr's in their most expansive sense; only that such considerations can be warranted. Research is often more complex than the preceding discussion implies. For example, it is common practice to report a table of descriptive statistics and nuisance parameters (e.g., parameters on control variables) as background alongside the core hypotheses of a study. A researcher may legitimately decide that controlling the Type 1 error across these statistics is unimportant and focus on an acceptable pfr only for the family of core hypotheses. In this case, however, a more elegant approach is to provide interval estimates for the descriptive statistics and nuisance parameters without the pretense of “testing,” particularly as a priori hypotheses about background descriptive statistics are not often developed, thereby precluding them from the present consideration.
In setting acceptable pfr 's, the researcher should keep in mind that the probability of mistakenly rejecting hypotheses increases with the number of hypothesis tests. For example, if a researcher has settled on 10 tests for their analysis, the probability of mistakenly rejecting one or more of the 10 hypotheses at a significance level of 0.05 is approximately 0.4. Is it acceptable to engage a set of tests when the probability of falsely rejecting one or more of them is 40 percent? The answer is a matter of judgment depending on the level of risk a researcher, reviewer, editor, or reader is willing to take regarding the reported findings.
Being arbitrary, the designation of an acceptable pfr is not likely to garner universal support. Authors in some research fields, with the goal of minimizing false reports in the literature, recommend adjusting the significance levels for tests to restrict the overall analysis-wide error rate to 0.05 ( Maxwell and Delaney 2000 ). However, when there are numerous tests, this rule can dramatically shrink the tests' significance levels and either require a considerably larger sample or severely diminish power. A recent article in Health Services Research reported results of 74 tests using a significance level of 0.05: there is a 98 percent chance of one or more false rejections across the analysis, a 10 percent chance of six or more, and a 5 percent chance of between seven and eight or more. The expected number of false rejections in the analysis is approximately four. The analysis-wide pfr can be restricted to less than 0.05 as Maxwell and Delaney (2000) suggest by setting the significance levels to 0.00068 (the process of adjustment is explained below). This recommended significance level is two orders of magnitude smaller than 0.05. If an analysis-wide pfr of 0.98 (associated with the significance levels of 0.05) is deemed unacceptably high and an analysis-wide pfr of 0.05 (associated with the significance levels of 0.00068) is deemed too strict, the researchers may settle on a reasoned intermediate value. For example, to obtain a just-better-than-even odds against a false rejection across the full analysis of 74 tests (e.g., setting the pfr to 0.499), the significance levels would have been adjusted to 0.0093. Alternatively, the researchers might desire to control the expected number of false rejections across the full analysis, which can be calculated as the sum of the individual significance levels. 
For example, setting significance levels to 0.01351 provides an expectation of one false rejection among the 74 tests rather than the expected four associated with the original 0.05 significance levels. The adjusted significance levels in this example are less than the original significance level of 0.05, and they vary in their magnitude (and therefore power to discern effects) depending on their underlying reasoning.
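The figures for the 74-test example can be approximated with a few one-line formulas. This is a sketch under stated assumptions: the tests are treated as independent with all hypotheses true, and the function names are introduced here. The quoted 0.00068 agrees with a Bonferroni division and the quoted 0.0093 with the Šidák form; that these are the formulas behind the article's figures is an inference from the numbers, not something the text states.

```python
def implied_pfr_equal(alpha, n_tests):
    """Implied pfr for n independent tests sharing one significance level."""
    return 1.0 - (1.0 - alpha) ** n_tests

def bonferroni_level(target_pfr, n_tests):
    """Per-test level whose sum across tests equals the target pfr."""
    return target_pfr / n_tests

def sidak_level(target_pfr, n_tests):
    """Per-test level whose implied pfr equals the target (independent tests)."""
    return 1.0 - (1.0 - target_pfr) ** (1.0 / n_tests)

n = 74
print(round(implied_pfr_equal(0.05, n), 2))  # 0.98  chance of one or more false rejections
print(round(0.05 * n, 1))                    # 3.7   expected number of false rejections
print(round(bonferroni_level(0.05, n), 5))   # 0.00068
print(round(sidak_level(0.499, n), 4))       # 0.0093
print(round(1 / n, 5))                       # 0.01351  level giving one expected false rejection
```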
Whatever rationale determines the acceptable pfr's for the analysis, the significance levels must be set to assure these pfr's are not exceeded at any level for which they are set. Guideline 3 presents one procedure to accomplish this task.
Guideline 3: A five-step procedure for setting significance levels.
Step 1. Determine the set of hypotheses to be tested (applying Guidelines 1 and 2 to identify any joint hypotheses), assign an acceptable pfr to each hypothesis, and set the significance levels equal to these pfr's.
Step 2. Determine families of hypotheses, if any, within which the probability of false rejection is to be controlled, and assign each family an acceptable pfr.
Step 3. Assign an acceptable analysis-wide pfr if desired.
Step 4. For each family, compare the implied family pfr with the acceptable family pfr. If the implied pfr is greater than the acceptable pfr, adjust the significance levels (see the following discussion on adjustment) so the implied pfr based on the adjusted significance levels is no greater than the acceptable pfr.
Step 5. If an analysis-wide acceptable pfr is set, calculate the analysis-wide pfr implied by the significance levels from Step 4. If the implied pfr exceeds the acceptable analysis-wide pfr, then adjust the test-specific significance levels such that the implied pfr does not exceed the acceptable pfr.
By this procedure the resulting significance levels will assure that the acceptable pfr at each level (hypothesis, family, and analysis) is not exceeded. The resulting significance levels are governed by the strictest pfr 's. Ignoring a level is implicitly setting its pfr to the sum of the associated significance levels.
Steps 4 and 5 of Guideline 3 require the adjustment of significance levels. One approach to making such adjustments is the ad hoc reassessment of the acceptable hypothesis-specific pfr's such that they are smaller. By this approach, the researcher reconsiders her acceptable pfr for each hypothesis and recalculates the comparisons with the higher-level pfr's. Of course, the outside observer could rightfully wonder how well reasoned these decisions were to begin with if they are so conveniently modified. A common alternative is to leave all pfr's as they were originally designated and use a Bonferroni-type adjustment (or other available adjustment method). To preserve the relative importance indicated by the relative magnitudes among the acceptable pfr's, a researcher transforms the current significance levels into normalized weights and sets the new significance levels as the weight multiplied by the higher-level pfr. For example, if a family of three hypotheses has significance levels of 0.05, 0.025, and 0.01, the implied family-level pfr is 0.083. If the acceptable family pfr is 0.05, then the implied pfr is greater than the acceptable pfr and adjustment is indicated. Weights are constructed from the significance levels as \(w_1 = 0.05/(0.05+0.025+0.01)\), \(w_2 = 0.025/(0.05+0.025+0.01)\), and \(w_3 = 0.01/(0.05+0.025+0.01)\). The adjusted significance levels are then calculated as \(w_1 \times 0.05 = 0.029\), \(w_2 \times 0.05 = 0.015\), and \(w_3 \times 0.05 = 0.006\), which have an implied family-level pfr of 0.049, meeting our requirement that it not exceed the acceptable pfr of 0.05.
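The reweighting in this example can be sketched as follows. The function names are introduced here for illustration, and the implied pfr is computed under the assumption of independent tests, which reproduces the values in the text.

```python
def implied_pfr(levels):
    """Probability of one or more false rejections, independent tests."""
    prob_none = 1.0
    for alpha in levels:
        prob_none *= 1.0 - alpha
    return 1.0 - prob_none

def reweight_levels(levels, higher_level_pfr):
    """Bonferroni-type adjustment: turn the current levels into normalized
    weights and distribute the higher-level pfr according to those weights."""
    total = sum(levels)
    return [higher_level_pfr * alpha / total for alpha in levels]

original = [0.05, 0.025, 0.01]
acceptable_family_pfr = 0.05
adjusted = reweight_levels(original, acceptable_family_pfr)

print(round(implied_pfr(original), 3))     # 0.083  exceeds 0.05, so adjust
print([round(a, 3) for a in adjusted])     # [0.029, 0.015, 0.006]
print(round(implied_pfr(adjusted), 3))     # 0.049  within the acceptable pfr
```

The adjusted levels keep the 5 : 2.5 : 1 ratio of the original levels, which is how the relative importance of the hypotheses is preserved.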
In the preceding example the adjusted significance levels implied a pfr that was less than the acceptable pfr (i.e., 0.049<0.05). A Bonferroni-type adjustment assures that the implied pfr is less than or equal to the acceptable pfr , consequently it is conservative and may unnecessarily diminish power by setting overly strict significance levels. An adjustment with better power, while not exceeding the acceptable pfr , can be attained by inflating the Bonferroni adjusted significance levels by a constant factor until the implied pfr is equal to the acceptable pfr . Although perhaps trivial in the present example, inflating each Bonferroni-adjusted significance level by a factor of 1.014 yields significance levels with an implied family-level pfr of 0.05, exactly that of the acceptable pfr .
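The inflation factor can be found numerically. The sketch below uses bisection, which is one of several ways to solve for the factor and is an assumption here (the text does not say how 1.014 was obtained); it also assumes independent tests and that the starting levels already satisfy the acceptable pfr.

```python
def implied_pfr(levels):
    """Probability of one or more false rejections, independent tests."""
    prob_none = 1.0
    for alpha in levels:
        prob_none *= 1.0 - alpha
    return 1.0 - prob_none

def inflation_factor(levels, acceptable_pfr, tol=1e-10):
    """Largest constant c >= 1 such that the levels c * alpha_i have an
    implied pfr no greater than acceptable_pfr (found by bisection)."""
    lo, hi = 1.0, 1.0 / max(levels)  # at hi the largest level is 1, so pfr is 1
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if implied_pfr([mid * a for a in levels]) > acceptable_pfr:
            hi = mid
        else:
            lo = mid
    return lo

# Bonferroni-adjusted levels from the three-hypothesis example.
bonferroni_adjusted = [0.05 * a / 0.085 for a in (0.05, 0.025, 0.01)]
factor = inflation_factor(bonferroni_adjusted, acceptable_pfr=0.05)
inflated = [factor * a for a in bonferroni_adjusted]

print(round(factor, 3))                 # 1.014
print(round(implied_pfr(inflated), 3))  # 0.05
```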
Adjusting significance levels by reweighting produces the distribution of a higher-level pfr according to the relative importance implied by the initial significance levels. The final adjusted significance levels are a redistribution of the strictest pfr ; therefore, adjusted significance levels no longer represent the acceptable hypothesis-level pfr 's. However, the implied hypothesis-level pfr 's will be less than the acceptable hypothesis-level pfr 's thereby meeting the requirement that the probability of false rejection is satisfactory at all levels. Because, when the inflation factor of the preceding paragraph is not used, the adjusted significance levels sum to the strictest pfr , this pfr can be interpreted as the expected number of false rejections. If the inflation factor is used, then of course the adjusted significance levels will be larger and the expected number of rejections will be larger than the pfr .
Sample size and power calculations should be based on the final adjusted significance levels. If the corresponding sample size is infeasible or the power unacceptable, then reconsideration of the study design, sampling strategy or estimators is warranted. If no additional efficiency is forthcoming, then a re-evaluation of the project goals may save the proposed study. For example, it may be that changing the study goals from guiding policy decisions to furthering theory will warrant greater leniency in the acceptable pfr 's.
This paper is not intended as a tutorial on the statistical procedures for joint tests and significance level adjustment: there is considerable information in the statistics and research methods literatures regarding these details. F -statistics and χ 2 statistics are commonly available for testing sets of hypotheses expressed as functions of jointly estimated model parameters, including both single and multiple equation models (see, e.g., Greene 2003 ; Kennedy 2003 ). More generally, there are joint tests available for sets of hypotheses generated from separately estimated models (see the routine Seemingly Unrelated Estimation, based on White's sandwich variance estimator, in STATA 2003 ); for example, hypotheses comparing functions of parameters from a linear regression, a logistic regression, and a Poisson model can be jointly tested. If these tests are not applicable, tests based on bootstrapped data sets can often be successfully used ( Efron and Tibshirani 1993 ). Regarding basic descriptions of Bonferroni adjustment, see Johnson and Wichern (1992) , Harris (2001) , and Portney and Watkins (2000) , among others.
The preceding section on when to adjust significance levels implies such adjustments are warranted. This is not a universally accepted view; indeed, the use of adjustments for multiple tests has been the focus of considerable debate (see e.g., Rothman 1990 ; Saville 1990 ; 1995 , 1998 ; Goodman 1998 ; Thompson 1998a , b ). When reviewing this debate, or considering the merit of multiple testing adjustment, the distinction between a priori hypotheses and a posteriori observations is important, a distinction carefully drawn by Thompson (1998a) .
Although this paper is didactic in nature, I do not presume the ideas presented here are new to the majority of health services researchers; however, reading the journals in our field suggests that we may sometimes forget to apply what we know. The two goals of this paper were to remind researchers to consider their approach to multiple hypotheses and to provide some guidelines. Whether researchers use these guidelines or others, it is important for the quality of scholarship that we draw valid inferences from the evidence we consider: properly identifying composite hypotheses and accounting for multiple tests provides some assurance in this regard.
The following supplementary material for this article is available online:
When to Combine Hypothesis and Adjust for Multiple Tests.
1 Appreciation to an anonymous reviewer for suggesting the two-part model as an example.
2 Appreciation to an anonymous reviewer for pointing out the 10 hypotheses/10 studies example.
Nature Neuroscience (2024)
The human brain experiences functional changes through childhood and adolescence, shifting from an organizational framework anchored within sensorimotor and visual regions into one that is balanced through interactions with later-maturing aspects of association cortex. Here, we link this profile of functional reorganization to the development of ventral attention network connectivity across independent datasets. We demonstrate that maturational changes in cortical organization link preferentially to within-network connectivity and heightened degree centrality in the ventral attention network, whereas connectivity within network-linked vertices predicts cognitive ability. This connectivity is associated closely with maturational refinement of cortical organization. Children with low ventral attention network connectivity exhibit adolescent-like topographical profiles, suggesting that attentional systems may be relevant in understanding how brain functions are refined across development. These data suggest a role for attention networks in supporting age-dependent shifts in cortical organization and cognition across childhood and adolescence.
Data availability.
Data from the CCNP dataset used here are available at CCNP—Lifespan Brain-Mind Development Data Community at Science Data Bank ( https://ccnp.scidb.cn/en ) including both anonymized neuroimaging data ( https://doi.org/10.57760/sciencedb.07860 ) and unthresholded whole-brain connectivity matrices grouped by relevant ages (children and adolescents) ( https://doi.org/10.11922/sciencedb.00886 ). The raw CCNP data are available from the website upon reasonable request. The ABCD data used in this report came from the Annual Release v.2.0 ( https://doi.org/10.15154/1503209 ) of the ABCD BIDS Community Collection (ABCC; NDA Collection 3165). Source data are provided with this paper.
Code is available via GitHub: (1) preprocessing CCNP datasets ( https://github.com/zuoxinian/CCS ); (2) preprocessing ABCD datasets ( https://github.com/ThomasYeoLab/ABCD_scripts ); (3) FC gradient analysis ( https://github.com/NeuroanatomyAndConnectivity/gradient_analysis ); and (4) Gradient maturation analysis ( https://github.com/HolmesLab/GradientMaturation ).
Gordon, E. M. et al. A somato-cognitive action network alternates with effector regions in motor cortex. Nature 617 , 351–359 (2023).
Pfisterer, U. & Khodosevich, K. Neuronal survival in the brain: neuron type-specific mechanisms. Cell Death Dis. 8 , e2643 (2017).
Bullmore, E. & Sporns, O. The economy of brain network organization. Nat. Rev. Neurosci. 13 , 336–349 (2012).
Gee, D. G. et al. Early developmental emergence of human amygdala-prefrontal connectivity after maternal deprivation. Proc. Natl Acad. Sci. USA 110 , 15638–15643 (2013).
Dong, H. M. et al. Charting brain growth in tandem with brain templates at school age. Sci. Bull. 65 , 1924–1934 (2020).
Yang, N. et al. Chinese color nest project: growing up in China (in Chinese). Chin. Sci. Bull. 62 , 3008–3022 (2017).
Auchter, A. M. et al. A description of the ABCD organizational structure and communication framework. Dev Cogn Neurosci 32 , 8–15 (2018).
Clark, D. B. et al. Biomedical ethics and clinical oversight in multisite observational neuroimaging studies with children and adolescents: the ABCD experience. Dev. Cogn. Neurosci. 32 , 143–154 (2018).
Manjon, J. V. & Coupe, P. volBrain: an online MRI brain volumetry system. Front. Neuroinform. 10 , 30 (2016).
Xu, T. et al. A connectome computation system for discovery science of brain. Sci. Bull. 60 , 86–95 (2015).
Xing, X. X. et al. Connectome computation system: 2015–2021 updates. Sci. Bull. 67 , 448–451 (2022).
Friston, K. J. et al. Statistical parametric maps in functional imaging: a general linear approach. Hum. Brain Mapp. 2 , 189–210 (1994).
Jenkinson, M., Beckmann, C. F., Behrens, T. E., Woolrich, M. W. & Smith, S. M. FSL. Neuroimage 62 , 782–790 (2012).
Cox, R. W. AFNI: software for analysis and visualization of functional magnetic resonance neuroimages. Comput. Biomed. Res. 29 , 162–173 (1996).
Fischl, B. FreeSurfer. Neuroimage 62 , 774–781 (2012).
Pruim, R. H. R. et al. ICA-AROMA: a robust ICA-based strategy for removing motion artifacts from fMRI data. Neuroimage 112 , 267–277 (2015).
Hagler, D. J. Jr. et al. Image processing and analysis methods for the adolescent brain cognitive development study. Neuroimage 202 , 116091 (2019).
Chen, J. et al. Shared and unique brain network features predict cognitive, personality, and mental health scores in the ABCD study. Nat. Commun. 13 , 2217 (2022).
Download references
This work was supported by the STI 2030—the major projects of the Brain Science and Brain-Inspired Intelligence Technology (2021ZD0200500 to X.-N.Z.), the National Institute of Mental Health (grants R01MH120080 and R01MH123245 to A.J.H.), the Major Fund for International Collaboration of National Natural Science Foundation of China (81220108014 to X.-N.Z.) and the National Basic Science Data Center ‘Interdisciplinary Brain Database for In vivo Population Imaging’ (ID-BRAIN to X.-N.Z.). B.T.T.Y. is supported by the NUS Yong Loo Lin School of Medicine (NUHSRO/2020/124/TMR/LOA), the Singapore National Medical Research Council (NMRC) LCG (OFLCG19May-0035), NMRC CTG-IIT (CTGIIT23jan-0001), NMRC STaR (STaR20nov-0003), Singapore Ministry of Health (MOH) Centre Grant (CG21APR1009), the Temasek Foundation (TF2223-IMH-01) and the United States NIH (R01MH120080 and R01MH133334). Any opinions, findings and conclusions or recommendations expressed in this material are those of the authors and do not reflect the views of the Singapore NMRC, MOH or Temasek Foundation.
These authors contributed equally: Avram J. Holmes, Xi-Nian Zuo.
Department of Psychology, Yale University, New Haven, CT, USA
Hao-Ming Dong, Xi-Han Zhang & Loïc Labache
Centre for Sleep and Cognition and Centre for Translational Magnetic Resonance Research, Yong Loo Lin School of Medicine, Singapore, National University of Singapore, Singapore, Singapore
Shaoshi Zhang, Leon Qi Rong Ooi & B. T. Thomas Yeo
Department of Electrical and Computer Engineering, National University of Singapore, Singapore, Singapore
B. T. Thomas Yeo
N.1 Institute for Health and Institute for Digital Medicine, National University of Singapore, Singapore, Singapore
Centre National de la Recherche Scientifique, Frontlab, Institut du Cerveau et de la Moelle Epinière, Paris, France
Daniel S. Margulies
Department of Psychiatry, Brain Health Institute, Rutgers University, Piscataway, NJ, USA
Avram J. Holmes
State Key Laboratory of Cognitive Neuroscience and Learning, Beijing Normal University, Beijing, China
Xi-Nian Zuo
National Basic Science Data Center, Beijing, China
Developmental Population Neuroscience Research Center, IDG/McGovern Institute for Brain Research, Beijing Normal University, Beijing, China
H.-M.D., A.J.H. and X.-N.Z. designed the research. A.J.H. and X.-N.Z. supervised the research. H.-M.D. and A.J.H. conducted analyses and made figures. X.-H.Z., L.L., S.Z., L.Q.R.O. and B.T.T.Y. conducted validation analyses based on the ABCD dataset. H.-M.D., A.J.H. and X.-N.Z. wrote the initial draft. X.-H.Z., L.L., S.Z., L.Q.R.O., B.T.T.Y. and D.S.M. edited the paper.
Correspondence to Hao-Ming Dong, Avram J. Holmes or Xi-Nian Zuo.
Competing interests.
The authors declare no competing interests.
Peer review information.
Nature Neuroscience thanks Brenden Tervo-Clemmens and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Extended Data Fig. 1 The transition from unimodal to transmodal organization revealed by increasing percentiles of the functional connectome.
The threshold applied to the connectivity matrices was adjusted in the child and adolescent groups, and the gradients were then rederived. The results revealed a marked transition in functional connectivity strength from childhood to adolescence. With the 95% threshold, which retains only the strongest connections, a unimodal organization was evident in the primary gradients of both children and adolescents. However, when additional weaker connections were included in the functional connectome at a 90% threshold, the primary gradients diverged, revealing a unimodal organization in children and a transmodal organization in adolescents. With an 85% threshold, which incorporates even more of the weaker connections, the primary gradients converged on a transmodal organization for both children and adolescents.
Extended Data Fig. 2 Gradient maps in low ventral attention connectivity groups derived from longitudinal data.
A set of child participants (n = 22) from the low ventral attention group who were subsequently scanned in adolescence was identified. Surface maps exhibit a stable adolescent-like gradient architecture in both childhood and adolescence. Their first gradient in childhood is highly correlated with their first gradient in adolescence (r = 0.9429, p < 0.01, two-sided spin test). A consistent group profile was also evident for their second gradients in childhood and adolescence (r = 0.9353, p < 0.01, two-sided spin test).
A set of child participants (n = 21) from the high ventral attention group who were subsequently scanned in adolescence was identified. Surface maps exhibit a developmentally normative pattern of gradient reversals from childhood to adolescence. Their first gradient in childhood was highly correlated with their second gradient in adolescence (absolute r = 0.9793, p < 0.01, two-sided spin test), while their second gradient in childhood was highly correlated with their first gradient in adolescence (absolute r = 0.9748, p < 0.01, two-sided spin test).
Virtual lesion analyses were performed for each network separately. In the child group, dropping the visual, somato/motor, ventral attention or frontoparietal network generated a transmodal organization in the first gradient, whereas dropping the dorsal attention, limbic or default network preserved the unimodal organization in the first gradient. Readers should interpret these maps with caution, as the functional networks contain different numbers of vertices along the cortical sheet. Accordingly, direct comparison across the canonical networks is likely biased by their relative sizes.
Supplementary information.
Supplementary Discussion and Tables 1–17.
Source Data Figs. 1–5 and Extended Data Figs. 1–4.
Degree centrality values, Euclidean values and network-level mean and standard gradient values after dropping off ventral attention network and the null distribution generated by the permutation test. Config file for plotting Chord diagram. Gradient values of High/Low ventral attention groups in the CCNP dataset. Gradient values of high/low ventral attention groups in the ABCD dataset. Gradient values of adolescents and children groups with different percentiles of functional connectome. Gradient values in low ventral attention connectivity groups derived from longitudinal data. Gradient values in high ventral attention connectivity groups derived from longitudinal data. Gradient maps with functional networks dropped off separately.
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
Cite this article.
Dong, HM., Zhang, XH., Labache, L. et al. Ventral attention network connectivity is linked to cortical maturation and cognitive ability in childhood. Nat Neurosci (2024). https://doi.org/10.1038/s41593-024-01736-x
Received: 19 May 2023
Accepted: 18 July 2024
Published: 23 August 2024
DOI: https://doi.org/10.1038/s41593-024-01736-x
What is a composite hypothesis test? Approaches to testing using multiple parameters, explained in simple terms. Composite null hypothesis explained.
Lecture 10: Composite Hypothesis Testing. In many real-world problems, it is difficult to precisely specify probability distributions. Our models for data may involve unknown parameters or other characteristics. Here are a few motivating examples.
The first two options given above are called one-sided tests. The third is called a two-sided test. Rejection and failure to reject the null hypothesis, critical regions, C, and type I and type II errors have the same meaning for a composite hypothesis as they do for a simple hypothesis.
This lecture explains simple and composite hypotheses.
Composite Hypothesis. In subject area: Mathematics. Classically, composite hypotheses are used to determine if a point null is statistically distinguishable from the best alternative, or to determine if the best supported alternative lies on a specified side of the point null. From: Philosophy of Statistics, 2011.
H is called a simple hypothesis if it completely specifies the population distribution; in this case, the sampling distribution of the test statistic is a function of sample size alone. H is called a composite hypothesis if it does not completely specify the population distribution; for example, the hypothesis may specify only one parameter of the distribution and leave the others unspecified.
The concept of simple and composite hypotheses applies to both the null hypothesis and alternative hypothesis. Hypotheses may also be classified as exact and inexact.
The "composite" part means that such a hypothesis is the union of many simple point hypotheses. In a Null Hypothesis Statistical Test only the null hypothesis can be a point hypothesis. Also, a composite hypothesis usually spans from -∞ to zero or some value of practical significance or from such a value to +∞.
7.1 Composite null and alternative hypotheses. This week we will discuss various hypothesis testing problems involving a composite null hypothesis and a composite alternative hypothesis. To motivate the discussion, consider the following examples:
Recall that simple hypothesis testing problems have one state per hypothesis. Composite hypothesis testing problems have at least one hypothesis containing more than one state.
Composite Hypothesis A composite hypothesis is a statement that assumes more than one condition or outcome. It can be divided into several sub-hypotheses, each of which represents a different possible outcome.
A different likelihood ratio for composite hypothesis testing. For composite hypotheses, where both the null and the alternative hypothesis map to values of μ, we can define an alternative likelihood-ratio test statistic that has better properties.
Explore hypothesis testing, a fundamental method in data analysis. Understand how to use it to draw accurate conclusions and make informed decisions.
Composite Hypothesis: A statistical hypothesis which does not completely specify the distribution of a random variable is referred to as a composite hypothesis.
A composite hypothesis, on the other hand, proposes multiple relationships between two or more variables. For example, a composite hypothesis might state that "there is a significant difference between the average heights of men and women, and there is also a significant difference between the average heights of people from different ...
The test accepts the hypothesis that the data is normal. Notice, however, that something is different. Matlab grouped the data into 6 intervals, so chi-squared test from previous lecture should have r − 1 = 6 − 1 = 5 degrees of freedom, but we have 'df: 3'! The difference is that now our hypothesis is not that the data comes from a particular given distribution but that the data comes ...
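The excerpt above is cut off, but the drop from 5 to 3 degrees of freedom is consistent with the usual rule for a chi-squared goodness-of-fit test against a parametric family: one degree of freedom is lost for each parameter estimated from the data. The sketch below assumes that two parameters (the mean and the variance of the normal distribution) were estimated; that count is an assumption, since the excerpt does not state it, and the function name is illustrative.

```python
def gof_degrees_of_freedom(n_bins, n_estimated_params=0):
    """Degrees of freedom for a chi-squared goodness-of-fit test.

    n_bins: number of intervals the data were grouped into (r).
    n_estimated_params: parameters estimated from the data
        (0 when the null hypothesis is a fully specified distribution).
    """
    df = n_bins - 1 - n_estimated_params
    if df < 1:
        raise ValueError("too few bins for the number of estimated parameters")
    return df


# Simple null (a fully specified distribution): r - 1 degrees of freedom.
print(gof_degrees_of_freedom(6))      # 5
# Composite null (normal family, mean and variance estimated; assumed here).
print(gof_degrees_of_freedom(6, 2))   # 3
```

With 6 intervals this gives 5 degrees of freedom for a fully specified distribution and 3 when two parameters are estimated, matching the 'df: 3' reported in the excerpt.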
A composite hypothesis does not point to a unique probability measure to be used in the hypothesis testing, and this makes it more challenging to test composite hypotheses than simple hypotheses.
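One common way to handle a composite alternative is the likelihood ratio test, which compares the likelihood at the null value with the likelihood maximized over the whole parameter space. As a minimal sketch, assume normally distributed data with known standard deviation sigma, a simple null H0: mu = mu0 and the composite alternative H1: mu != mu0. The function name and the example numbers are illustrative and do not come from any of the sources quoted here.

```python
import math


def lrt_normal_mean(sample, mu0, sigma):
    """Likelihood ratio test of H0: mu = mu0 against H1: mu != mu0.

    Assumes independent normal observations with known standard deviation sigma.
    Returns (lambda, -2 ln lambda, p-value).
    """
    n = len(sample)
    xbar = sum(sample) / n  # maximum likelihood estimate of mu under H1
    # For this model, -2 ln(lambda) simplifies to the squared z statistic.
    stat = n * (xbar - mu0) ** 2 / sigma ** 2
    lam = math.exp(-stat / 2)
    # Under H0, stat follows a chi-squared distribution with 1 degree of freedom,
    # whose upper tail probability is erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return lam, stat, p_value


lam, stat, p = lrt_normal_mean([5.0, 5.0, 5.0, 5.0], mu0=3.0, sigma=2.0)
print(lam, stat, p)
```

Small values of lambda (equivalently, large values of -2 ln lambda) count as evidence against H0. In this particular model the test coincides with the familiar two-sided z test.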
To provide guidelines for identifying composite hypotheses and addressing the probability of false rejection for multiple hypotheses.Examples from the literature in health services research are used to motivate the discussion of composite hypothesis tests ...
In the problem of composite hypothesis testing, identifying the potential uniformly most powerful (UMP) unbiased test is of great interest. Beyond typical hypothesis settings with exponential famil...
Understanding brain development and systems linked to behavioral change is a key goal in population neuroscience. The authors show the ventral attention network is key for brain development and ...