How to dismiss a correlational study “proving” that the SAT predicts some outcome

[sidebar to What Pinker Got Wrong]

Pinker, like other participants in the debate over standardized tests, looks for and finds evidence that SAT scores predict desired outcomes.

The prototypical study obtains a large sample of people who have taken the SAT, along with measures of some outcome (college grades, lifetime earnings, professional honors) and constructs a regression equation in which the outcome variable is regressed on SAT scores after controlling for, that is, attempting to remove the influence of, other predictors. For instance, someone whose parents went to college may be more successful in college than a first-generation student. The regression procedure allows us to remove the influence of parental education, and any number of other rival explanations, before testing whether SAT scores are associated with success in college. In analyzing the regression results, we check whether the regression coefficient for the SAT is positive and significant after accounting for known contaminating factors. In plain English: do higher SAT scores uniquely predict higher grades?
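The procedure can be sketched in a few lines of Python. Everything below (the variable names, the single control, the coefficients in the synthetic data) is an illustrative assumption, not the design of any actual study:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000

# Synthetic data: a binary control (parental education) that is
# correlated with both SAT score and the outcome (GPA).
parent_college = rng.integers(0, 2, n).astype(float)
sat = rng.normal(500, 100, n) + 30 * parent_college
gpa = 2.0 + 0.002 * sat + 0.2 * parent_college + rng.normal(0, 0.4, n)

# Ordinary least squares: regress GPA on an intercept, SAT, and the control.
X = np.column_stack([np.ones(n), sat, parent_college])
beta, *_ = np.linalg.lstsq(X, gpa, rcond=None)
print(f"SAT coefficient after controlling for parental education: {beta[1]:.4f}")
```

With real data, the interest lies in the size and significance of the SAT coefficient once the controls are in the model.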

* Although a reasonable number of recent college graduates will be familiar with linear regression, few study it in any depth, and fewer still, even among professors who publish regression-based work, will have delved into the guts of the procedure as I do here. Please forgive the somewhat labored description of the basics. I’m trying to reach a wide rather than narrow audience.

Most regression tests done in the field, on topics such as the SAT, use many variables. But the basic math, and the dangling thread that allows these studies to be dismissed in the context of Pinker’s proposal, can be illustrated by examining the two-variable case. Table 1 below shows a cross-tabulation of SAT scores, grouped as shown on the bottom axis into nine categories, against an outcome variable grouped into eleven categories. I constructed the data to show a strong positive association, with a correlation coefficient of +.403, stronger than most results in the social sciences and probably on the high side for findings of SAT predictiveness. More to the point, I constructed something close to a bivariate normal distribution, and this is crucial for what follows.

* If I were a statistician—rather than a scholar who sometimes uses statistics in my work—I would have been able to generate an exact bivariate normal distribution from equations.  Alas, I am not.

Look closely at the quantities in Table 1. As SAT score increases, the peak counts fall higher and higher on the outcome scale. There is always some dispersion—high SAT scores with relatively modest outcome values, and vice versa—but the trend is clear. The bold line in Figure 1 shows the mean outcome for each SAT score range, and it increases steadily. Such a chart, in any study of the SAT, would be taken as evidence that higher SAT scores lead to higher life outcomes.

I sized the sample to be 100,000, and arranged both the SAT scores and the outcome measure to be normally distributed. In a sample of this size, the correlation is highly significant, p < .001.
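For anyone who wants to replicate the setup without hand-building a table, here is a sketch of how an exact bivariate normal sample of this size and correlation could be drawn (the target r = .403 matches Table 1; binning into the nine-by-eleven grid is left out):

```python
import numpy as np

rng = np.random.default_rng(42)
n = 100_000
r = 0.403  # target correlation, matching Table 1

# Draw from an exact bivariate normal with the target correlation,
# then rescale the first coordinate to SAT units (mean 500, SD 100).
# The outcome stays in z-score units.
cov = [[1.0, r], [r, 1.0]]
z = rng.multivariate_normal([0.0, 0.0], cov, size=n)
sat = 500 + 100 * z[:, 0]
outcome = z[:, 1]

observed_r = np.corrcoef(sat, outcome)[0, 1]
print(f"observed r = {observed_r:.3f}")
```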

Now to the point of this exercise: look at Table 2 and Figure 2. They are identical to Table 1 and Figure 1, except on the right side, over the top two SAT groups (those scoring between 750 and 799, and those with a perfect 800).  Here I’ve attenuated the association with the outcome measure to make it consistent with a threshold model of academic ability.  In the second table and figure, entries for the top two groups of SAT scores have been distributed exactly as for the third ranked SAT group, those with scores between 650 and 750. The line charting means in Figure 2 now flattens out, showing the ceiling effect.

By the way, SAT scores between 650 and 750—1.5 to 2.5 standard deviations above the mean—are what used to be called strong scores, or top scores. This range captures students in the 93rd through 99th percentiles.  Evidence of high academic merit, yes, but as we’ve seen, nowhere near good enough for the Ivy League, if Pinker’s proposal were to be implemented in a rigorous way.

By flattening the outcome line only for scores above 750, the data in Table 2, compared against the data underlying Table 1, provide a test of whether the correlation analyses that underlie studies of SAT predictiveness have the power to pick up ceiling effects, if any. Put another way: if Ivy League caliber test scores, out at the 99.9th and 99.99th percentiles, turned out to be mostly noise, not really different from strong scores down in the 93rd to 99th percentiles, would this show up as a visibly reduced correlation in studies of the SAT?

No. The correlation measured off the data in Table 2 dips to .397, from .403 in Table 1. Both would conventionally be reported to two digits as the identical value: “r = .40.” Large-sample tests of SAT predictiveness cannot pick up ceiling effects that only kick in above 2.5 standard deviations from the mean. There are too few extreme scores to move the needle. It will always be that way, so long as SAT scores and outcome measures are each approximately normally distributed.
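The same null result can be reproduced in simulation. A sketch, under the assumptions above: bivariate normal data, with the ceiling imposed by redrawing outcomes for scores above 750 from the 650-750 group:

```python
import numpy as np

rng = np.random.default_rng(7)
n = 100_000
cov = [[1.0, 0.403], [0.403, 1.0]]
z = rng.multivariate_normal([0.0, 0.0], cov, size=n)
sat = 500 + 100 * z[:, 0]
outcome = z[:, 1]

r_full = np.corrcoef(sat, outcome)[0, 1]

# Impose a ceiling: everyone above 750 gets an outcome drawn the same
# way as the 650-750 group, severing any extra association at the top.
top = sat > 750
donors = (sat > 650) & (sat <= 750)
flattened = outcome.copy()
flattened[top] = rng.choice(outcome[donors], size=top.sum(), replace=True)

r_ceiling = np.corrcoef(sat, flattened)[0, 1]
print(f"r without ceiling: {r_full:.3f}, with ceiling: {r_ceiling:.3f}")
```

Only about 0.6 percent of scores lie above 2.5 standard deviations, so the correlation barely moves.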

And, to tie back to the earlier discussion: ceiling (and floor) effects at the extremes are exactly what we would expect, if random error becomes more lopsidedly positive, the farther out toward the tail of the distribution we go.  Students scoring 770 may be indistinguishable in their true level of ability from students scoring 720; the high scorers happened to luck out when playing testing roulette.
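A toy simulation illustrates the shrinkage. The model below is an assumption chosen for illustration: luck supplies half the observed variance, and the cut points roughly mimic 770 and 720 in z-units:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 1_000_000

# Toy model: observed score = true ability + luck, with the error term
# supplying half the observed variance (an illustrative assumption).
true = rng.normal(0.0, 1.0, n)
observed = true + rng.normal(0.0, 1.0, n)
z = (observed - observed.mean()) / observed.std()

# Roughly a 770 and a 720 in SAT units (z of about 2.7 and 2.2).
m770 = true[np.abs(z - 2.7) < 0.05].mean()
m720 = true[np.abs(z - 2.2) < 0.05].mean()
print(f"mean true ability: ~770 scorers {m770:.2f}, ~720 scorers {m720:.2f}")
print(f"gap in observed z: 0.50, gap in mean true ability: {m770 - m720:.2f}")
```

The gap in average true ability comes out smaller than the gap in observed scores, and the two groups' true-ability distributions overlap heavily: many of the 720s are the equals of the 770s.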

The beauty of this dismissal of correlational studies: you can continue to acknowledge that SAT tests are pretty good predictors, maybe even very good predictors of a range of outcomes (if such is your position).  It is not necessary to rebut the correlation studies supporting the SAT. My dismissal is targeted and exact: correlations derived using a full range of normally distributed* scores from the population must lack the resolution needed to pick up ceiling effects at the very extreme tail of those populations.

* Most of us lack an intuitive appreciation of how steep a normal distribution is. Put another way, we don’t appreciate how very rare values beyond three standard deviations must be, for a distribution to be normal, as opposed to fat-tailed.
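One way to build that intuition is to compute the tail probabilities directly from the error function (a minimal sketch):

```python
from math import erf, sqrt

# Upper-tail probability of the standard normal, P(Z > k), via erf.
def tail(k: float) -> float:
    return 0.5 * (1 - erf(k / sqrt(2)))

for k in (1, 2, 2.5, 3, 4, 5):
    print(f"beyond {k} SD: about 1 in {1 / tail(k):,.0f}")
```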

Therefore, no correlational study can support a preference for the 99.9th percentile student over the 99th percentile student, or even the 95th percentile student.

By contrast, a spectrograph really can tell the difference between a galaxy ten billion light years away moving at 0.3 times the speed of light and another galaxy nine billion light years away moving at 0.25 times the speed of light. But neither the SAT nor any other existing measure of human mental ability has that kind of resolution.

Tables, figures, data


How the initial cross-tabulation was generated:

  1. Pick an arbitrary (large) sample size; I used 100,000, about the minimum to make the table meaningful.
  2. Grab a statistics textbook and find the appendix in the back titled something like “Cumulative Normal Probabilities.” This will give you percentiles linked to standard deviations above the mean, generally up to six sigma.
  3. Keep in mind that the SAT is scaled to a mean of 500 and a standard deviation of 100.  When we find the percentiles corresponding to a given SAT score, we know how entries have to be distributed horizontally to be approximately normal.
  4. Pick an arbitrary outcome variable distributed through at least three standard deviations. I picked grades, because the 11 values typically used gave me an outcome variable that could be distributed between plus and minus five standard deviations.

* Of course, with grade inflation having gone to ridiculous lengths, it’s laughable to suppose that A+ grades in college are as rare as shown here.  Maybe I should have used average annual increment in income from attending college, rather than only high school, and replaced A+ with $5 million per year. Trust me, five standard deviations is way out there on the tail, and you need outcomes about as rare as that income level.
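Steps 2 and 3 can be sketched in code, with the normal CDF standing in for the textbook appendix. The 50-point bin edges below are an assumption for illustration; the actual cut points in Table 1 may differ:

```python
from math import erf, sqrt, inf

def phi(z: float) -> float:
    """Standard normal CDF -- stands in for the textbook's
    'Cumulative Normal Probabilities' appendix."""
    return 0.5 * (1 + erf(z / sqrt(2)))

n, mean, sd = 100_000, 500, 100
# Assumed 50-point bin edges yielding nine SAT groups.
edges = [-inf, 400, 450, 500, 550, 600, 650, 700, 750, inf]
counts = []
for lo, hi in zip(edges, edges[1:]):
    p = phi((hi - mean) / sd) - phi((lo - mean) / sd)
    counts.append(round(n * p))
    print(f"SAT ({lo}, {hi}]: ~{counts[-1]:,} test takers")
```

These expected column totals are what fix the horizontal distribution of entries; the vertical spread within each column is then filled in by hand.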

Here’s table 1:

[Table 1]

And here’s Figure 1, showing the mean outcome rising as SAT scores increase:

[Figure 1]

Now here’s Table 2.  Note how I’ve made the vertical distribution for the top two SAT groupings identical to that for the third from the top group. There are no other changes to the table.

[Table 2]

Figure 2 shows how the means flatten out: a classic ceiling effect.

[Figure 2]

If you want to run your own statistical analyses, here’s the table recast as a flat data file with weights, suitable for analysis in SPSS or other statistical package of your choice.