Symposium: New Directions in Ability Research
Organizers and Chair: R.D. Roberts & P. Kyllonen
CHALLENGING G-MANIA IN INTELLIGENCE RESEARCH: ANSWERS NOT GIVEN, DUE TO QUESTIONS NOT ASKED
WERNER W. WITTMANN & HEINZ-MARTIN SÜß
UNIVERSITY OF MANNHEIM
Abstract
In the last decade the London school has been on the rise again. Spearman's g has attracted a great deal of research interest. Carroll's (1993) heroic attempt to find all published correlation matrices and reanalyze them convincingly supported Spearman's positive manifold thesis: all measures ever invented to measure intelligence correlate positively with one another, at least in unrestricted samples, which is the sine qua non condition for g. There is no doubt that little g is out there in the real world, and there is no real challenge in disputing this simple fact! What are we doing, and what should we do, about g? Answering questions like these brings us to the birthplace of our most loved and most hated controversies and debates in psychology. Concentrating on g, as Herrnstein and Murray (1994) or Brand (1996) did, or looking for the biological bases of g or its reducibility to speed of information processing, we easily forget the second important part of Carroll's message: not only was g found, but also evidence for hierarchical models of intelligence with g at the apex. What kinds of questions can be asked with hierarchical models? Our questions should be related to the criteria we are most interested in. In seeking to ask good questions it is wise to look where the very smart guys are. What do the questions they ask, which lead to answers sometimes rewarded with Nobel and other prizes, have in common? If we do this we will notice one general class of principles behind those successful questions, especially in physics: principles of symmetry. In psychology, many years ago, Egon Brunswik incorporated them in his famous lens model. Using hierarchical variants of the lens model, we were forced to think about the relationship between predictor and criterion model hierarchies. At what level of generality in the predictor and at what level of generality in the criterion do the best predictions and explanations occur? Using the framework of a hierarchical variant of the Berlin model of intelligence structure (BIS) (Jäger, 1984; Wittmann, 1988) to predict school grades or complex problem-solving performance, we found that the g-level was not the best level to predict and explain hierarchical variants of these criteria. For working memory capacity the g-level was very good, but even here we profited much from lower levels as regards explanation. The principles we call Brunswik-symmetry are demonstrated with a modification of Tucker's lens equation, which explains under what conditions predictions succeed or fail.
Author's address: Universität Mannheim
Schloß, EO
D-68131 Mannheim
Tel.: (0621)292-5639, Fax: (0621)292-2528
Email: wittmann@tnt.psychologie.uni-mannheim.de
Working papers about our research can be downloaded at our homepage:
http://www.uni-mannheim.de/fakul/psycho/welcome.html
Introduction
In discussions about the importance of intelligence many researchers have turned to Spearman's g as an important explanatory construct. The Bell Curve (Herrnstein & Murray, 1994) demonstrates many unexpected relationships between a g-type predictor and criteria of utmost social importance, and it has attracted much controversy and debate. Chris Brand's book about g (1996) was withdrawn by its publisher, Wiley, because he discussed race differences in g. In predicting job performance, Frank Schmidt, John Hunter and coworkers have demonstrated the virtues of general intelligence, with enormous economic impact.
In large-scale research projects with its ASVAB test battery, the US Army found the ubiquitous predictiveness of g (Ree & Earles, 1994), and discussions now concern the problem of diminishing returns in further research on dimensions of ability (Alderton & Larson, 1994). Given all these facts, it is understandable that basic researchers want to better understand that little g, most prominently Arthur Jensen, who stands for many others. John B. Carroll (1993), reanalyzing all published correlation matrices, found clear evidence for g but also for many factors at lower levels of generality, and organized everything he found into a stratum taxonomy. Kyllonen (e.g. 1994) described the comprehensive Cognitive Abilities Measurement (CAM) taxonomy, which he and coworkers are developing and using at the Armstrong Laboratory of the US Air Force. Today there is little reason to challenge that g is at the apex of every taxonomy ever used, if we organize it as a hierarchy.
Fig. 1 depicts many question marks, and we will challenge the assumption that most of these questions are best answered from the level of g. It is important to have a strategy for asking questions to which nature gives clear answers. Looking around where the really smart guys and girls are, we often stumble into basic sciences like physics. There we find that nature does give answers, and, impressed, we should try to learn the secrets of those questions. Again and again we learn that these questions are woven around the key concept of symmetry principles. To name just a few role models, look at the theories and questions of Richard Feynman and Murray Gell-Mann or, in Great Britain, Michael Faraday, and you will find the secrets woven into principles of symmetry. Principles of symmetry are also key concepts in psychology, especially in Egon Brunswik's work, i.e. his famous lens model. Let us see how we can profit from that symmetry!
The scope and virtues of Brunswik-symmetry
Brunswik-symmetry is a framework for structuring our questions concerning the predictive and explanatory power of intelligence constructs. When we use hierarchical intelligence models on the predictor side, the Gestalt principles of the symmetrical properties of the lens model force us to conceptualize the criterion model as a hierarchical one as well. Once we have done this, we can start asking which level of the criterion hierarchy is most symmetrical to the g-level, to the group factor level, or to whatever predictor level is of interest!
Understanding the principles of aggregation is very important. Psychometric theory tells us that aggregating parallel, or at least partially parallel, items, scales or parcels leads to more reliable scales at the aggregate level. This assumes that the items to be aggregated have the same true scores but differ in their error components. Aggregation cancels, or at least diminishes, the error parts, so that true score variance grows relative to error variance in the composite and reliability increases. But reliability is not validity; it is silent about theory and blind to it. Does the way we aggregate our measures really map the constructs we want to assess? To maximize construct validity we need theory, at least in the form of a taxonomy. I therefore proposed (Wittmann, 1988) to distinguish between wanted, unwanted and error variance. Wanted and unwanted systematic variance together compose the true part of a score; hence reliability always comprises both wanted and unwanted true score components. Exploratory factor analysis applied to correlation matrices leads to factors composed of a mixture of wanted and unwanted systematic true variance, because the correlations among our items are a function of their true score parts, whether these are wanted or unwanted. The factor scores are basically weighted aggregates of the items forming the correlation matrix.
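The aggregation argument can be made concrete with a small closed-form sketch. It assumes a hypothetical item model in which every item shares the same wanted and unwanted true-score parts and differs only in independent error; the function name and variance values are illustrative, not taken from the study. Averaging more items drives reliability toward 1 (the Spearman-Brown result), but the composite's correlation with the wanted part levels off at the square root of the wanted share of true variance, because aggregation cannot cancel shared unwanted variance.

```python
import math

def composite_properties(k, wanted, unwanted, error):
    """Reliability of the mean of k items and its correlation with the wanted part.

    Hypothetical item model: item_i = W + U + E_i, where W (wanted) and U (unwanted)
    are shared by all items and the errors E_i are independent with equal variance.
    """
    true_var = wanted + unwanted              # systematic variance, unaffected by averaging
    total_var = true_var + error / k          # error variance of the mean shrinks by 1/k
    reliability = true_var / total_var        # equals the Spearman-Brown formula
    r_wanted = math.sqrt(wanted / total_var)  # corr(mean, W) = Var(W) / sqrt(Var(W) * Var(mean))
    return reliability, r_wanted

for k in (1, 4, 16, 64):
    rel, r_w = composite_properties(k, wanted=0.3, unwanted=0.3, error=0.4)
    print(f"k={k:3d}  reliability={rel:.3f}  r(composite, wanted)={r_w:.3f}")
```

With these illustrative values reliability climbs from .60 toward 1, while the correlation with the wanted part can never exceed the square root of .3/.6, about .71.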
Relating these factor scores to criterion models of known wanted structure, and assuming that our criterion measures consist only of wanted variance, the correlation is attenuated by the amount of unwanted variance in the predictor and, additionally, by the amount of random error variance in both predictor and criterion measures. Unwanted systematic true variance introduces asymmetry, which violates Brunswik-symmetry. If we try to map the true scores of a wanted latent construct and our proxies are polluted with unwanted true variance, they are to a certain degree construct unreliable. I introduced that term to distinguish it from traditional psychometric unreliability stemming from random error variance, although most of us would still prefer the term construct validity here instead of construct reliability. But I insist on the term construct reliability, because we have to distinguish wanted, unwanted, and error variance. In my terms construct validity refers only to the amount of overlap between the true wanted variance of the latent predictor construct and that of the latent criterion construct, as mapped by our proxies. So a measure consisting only of wanted and error variance, but with only partial overlap between the wanted variance in the predictor and the true latent wanted variance of the criterion, is said to be neither perfectly psychometrically reliable nor perfectly construct reliable, and it also lacks perfect construct validity. A measure with perfect overlap of wanted variance in predictor and criterion, but augmented by additional unwanted true variance and error variance, is perfectly construct valid but lacks both construct reliability and psychometric reliability. The two cases differ in the direction of the asymmetry. Fig. 2 depicts the case of perfect Brunswik-symmetry as the true wanted latent structure of a predictor hierarchy and a criterion hierarchy, respectively. Cases 1 to 4 in Figs. 2-5 denote four variants of asymmetry. Figs. 2-5 visualize what we try to explain here in words and numbers.
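The attenuation described in this section can be written down for a deliberately simple case. The sketch below assumes standardized measures X = W_x + U_x + E_x and Y = W_y + U_y + E_y in which only the wanted parts correlate; under that assumption the observed correlation factors into construct validity times the square roots of psychometric reliability times construct reliability on each side. This is an illustration in the spirit of the text, not the authors' modified lens equation, and the function and parameter names are ours.

```python
import math

def observed_validity(construct_validity, rel_x, cr_x, rel_y, cr_y):
    """Observed predictor-criterion correlation under a simple variance model.

    construct_validity: correlation between the wanted parts W_x and W_y
    rel_*: psychometric reliability, i.e. (wanted + unwanted) share of total variance
    cr_*:  construct reliability, i.e. wanted share of the true (reliable) variance
    Unwanted and error parts are assumed uncorrelated with everything else.
    """
    wanted_share_x = rel_x * cr_x   # wanted variance as a share of total X variance
    wanted_share_y = rel_y * cr_y
    return construct_validity * math.sqrt(wanted_share_x * wanted_share_y)

def disattenuate(r_xy, rel_x, rel_y):
    """Classical correction for random error only; unwanted variance remains."""
    return r_xy / math.sqrt(rel_x * rel_y)

# Perfect construct validity, but half of the predictor's true variance is unwanted.
r = observed_validity(1.0, rel_x=0.8, cr_x=0.5, rel_y=0.9, cr_y=1.0)
print(round(r, 3), round(disattenuate(r, 0.8, 0.9), 3))
```

Even after the classical correction for unreliability the estimate stays near .71 rather than 1, because the unwanted true variance in the predictor (construct reliability .5) is untouched by that correction.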
Lack of overlap of wanted variance between predictors and criteria leads, in our normal error-prone measures, to a lack of construct validity, construct reliability and psychometric reliability. Complete overlap of wanted variance combined with additional unwanted and random error variance leads to measures with perfect construct validity but lacking construct reliability and psychometric reliability. In these terms, the measures in our four cases of asymmetry can be characterized as follows. In case 1 the psychometric reliability of predictor and criterion may be perfect, and both may be perfectly construct valid, but for different constructs; due to total asymmetry either the criterion or the predictor is completely construct unreliable, depending on which side your theory says you should be interested in. If theory focuses on the criterion side, then the predictor, although perfectly psychometrically reliable, is completely construct unreliable, though probably perfectly construct valid and construct reliable for a different, independent construct. The framework of case 1 points to how we should test theories with the convergent and divergent (or discriminant) validation strategy (Campbell & Fiske, 1959). We should always have measures mapping what construct A_CR is, but also what A_CR is not, i.e. non-A_CR, e.g. B_CR. To predict or explain A_CR we need, on the predictor side of the lens, measures for A_PR but at the same time for non-A_PR, i.e. B_PR. Testing our theory should then lead to high correlations of A_PR with A_CR and of B_PR with B_CR, and to correlations near zero of A_PR with B_CR and of B_PR with A_CR.
This is what is meant by perfect Brunswik-symmetry. The only problem still to be dealt with is the level of generality at which this pattern holds. Contrasting case 4 with case 1 illuminates that problem and points to solutions. In case 4 the asymmetry is on both sides of the lens. Say our theory focuses on a construct called a, but our measures are contaminated with true non-a variance, non-a_PR and non-a_CR. In this kind of asymmetry non-a_PR differs from non-a_CR, i.e. the correlation between them is zero. To test the predictive validity of a_PR in relation to a_CR we should remove non-a_PR from the predictor model and non-a_CR from the criterion model. This can be done with the bipartial variant of Jack Cohen's ingenious set-correlation system (Cohen, 1982; Cohen & Cohen, 1983). If our theory is true, the bipartial correlation should be close to one after correcting for the psychometric unreliability of the partialled predictor and the partialled criterion. If predictor and criterion are available as hierarchical models, we can use the lower-level group factors to investigate their pairwise relationships. If we find higher correlations between some pairs of these factors, we know that symmetry lies at a lower level of generality than in case 1. Cases 2 and 3 denote one-sided asymmetries. In case 2 either the predictor contains unwanted variance with regard to the criterion or the criterion lacks parts of the wanted variance; in case 3 either the criterion contains unwanted variance or the predictor lacks wanted variance.
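The bipartial idea for case 4 can be illustrated with residuals: remove the predictor's non-a part from the predictor, remove the criterion's non-a part from the criterion, and correlate what is left. The sketch below is a simplified single-variable version under stated assumptions (the partialled non-a variables are observed without error, and the data are a hypothetical simulation); Cohen's set correlation generalizes this to whole sets of variables and provides significance tests, which this sketch does not.

```python
import numpy as np

def residualize(y, covariates):
    """Residuals of y after least-squares regression on covariates (with intercept)."""
    design = np.column_stack([np.ones(len(y)), covariates])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return y - design @ coef

def bipartial_correlation(x, y, x_partial, y_partial):
    """Correlate x with x_partial removed from it and y with y_partial removed from it."""
    return np.corrcoef(residualize(x, x_partial), residualize(y, y_partial))[0, 1]

# Hypothetical simulation of case 4: both sides contain the construct a plus
# side-specific, mutually uncorrelated non-a variance.
rng = np.random.default_rng(0)
n = 5000
a = rng.standard_normal(n)
non_a_pr = rng.standard_normal(n)
non_a_cr = rng.standard_normal(n)
a_pr = a + non_a_pr
a_cr = a + non_a_cr

zero_order = np.corrcoef(a_pr, a_cr)[0, 1]
bipartial = bipartial_correlation(a_pr, a_cr, non_a_pr, non_a_cr)
print(round(zero_order, 2), round(bipartial, 2))
```

In this simulation the zero-order correlation is about .5, while the bipartial correlation is close to 1, which is the pattern the text expects when the theory about a is true.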
The parameters R_PR and R_CR in the lens model equation map the degree of construct reliability of our predictor and criterion models, respectively (see Figs. 7 and 8 for linear and nonlinear variants of that equation). Where we have no theory, we do not know what our wanted constructs are and cannot estimate these parameters. Having neither a theory nor a taxonomy is very embarrassing, because it amounts to admitting that we do not know what we want! All research starts in an exploratory way, with a lot of trial and error, and so measures were developed for a construct of intelligence whose constituent facets were only vaguely known. Yet decades of research on intelligence have supplied us with diverse taxonomies, such as Guilford's SOI model, Cattell and Horn's distinction between fluid and crystallized intelligence and their related hierarchical models, the model of the British school (Vernon and Burt), and many other variants. From the very start of intelligence research we have had, in a Popperian sense, Charles Spearman's bold hypothesis of a positive manifold: all measures of intelligence are positively correlated, justifying a general intelligence factor g plus test-specific factors. The construct validity of each measure can therefore always be explained by two factors, a very general one and a specific one. Spearman's challengers concentrated on the specific part of an intelligence measure, assuming that these specifics might be more important than originally thought. These specifics might also be correlated with other specifics, leading to the notion of group factors and turning specific true variance into common true variance. After the set of intelligence measures is expanded, what was thought to be specific variance might turn out to be much more important than general g-factor variance, a track opened by Spearman's opponent L.L. Thurstone. Over one hundred years of research on intelligence have supplied us with a multitude of correlation matrices. John B. Carroll (1993), in a heroic enterprise, reanalyzed all these matrices, resulting in a three-stratum hierarchical model with Spearman's g at the apex but also with many group factors at different levels of generality (i.e. strata). Although debates about what the best hierarchical model of intelligence is will go on, the very fact that there is g but also different group factors no longer needs to be challenged. The compromise, the synthesis, lies in hierarchies. As is often the case in science, competing theories are settled by subsuming and synthesizing them into more general models or theories. What is not solved is which level of generality is the most important one.
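The synthesis described above, g at the apex plus group factors below it, can be made concrete by the correlation matrix such a hierarchy implies. The sketch assumes an orthogonalized (Schmid-Leiman-style) representation in which each test loads on g and on exactly one group factor; the loadings are hypothetical and the function name is ours.

```python
import numpy as np

def implied_correlations(g_loadings, group_loadings, groups):
    """Model-implied correlation matrix of a simple hierarchy.

    Each test loads on g and on exactly one group factor; all factors are
    orthogonal. groups: integer label of each test's group factor.
    """
    g = np.asarray(g_loadings, dtype=float)
    s = np.asarray(group_loadings, dtype=float)
    labels = np.asarray(groups)
    same_group = labels[:, None] == labels[None, :]   # True where two tests share a group factor
    r = np.outer(g, g) + np.outer(s, s) * same_group
    np.fill_diagonal(r, 1.0)                          # unit variances (specific + error fill the rest)
    return r

# Hypothetical loadings: six tests, two group factors.
r = implied_correlations([0.6] * 6, [0.5] * 6, [0, 0, 0, 1, 1, 1])
print(np.round(r, 2))
```

All off-diagonal entries are positive (the positive manifold), yet tests sharing a group factor correlate .61 rather than .36. Setting all group loadings to zero reduces the matrix to Spearman's two-factor case, where each correlation is simply the product of the two g loadings.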
Prediction and explanation of school grades
The first example in challenging the ubiquity of g is school grades. School grades can also be organized into a hierarchical model, with the total or averaged grade at the apex. A single examination in a teacher-made test consists of several tasks that a pupil has to solve. A grade in a topical area is then given by transforming the number of correct solutions into a grade, e.g. on a six-point scale. In the German school system a one is the best and a six the worst grade. Within a topical area, two to five different class tests are normally given over a school year, and these grades are again averaged to build the total grade in that subject area, e.g. mathematics for the whole school year. In our study, total grades for a school year were available for seven disciplines: four science grades, i.e. mathematics, physics, chemistry and biology, and three language grades, i.e. German, first foreign language (mostly English) and second foreign language (mostly French). A factor analysis of these seven grades with Varimax rotation led to two group factors, which we labeled science and language, respectively, because the salient loadings were those of the three science disciplines other than biology on the first and of the three language disciplines on the second. Biology had salient loadings on both of them.
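For readers who want to reproduce this kind of analysis, the sketch below implements the widely used SVD-based iteration for Varimax rotation (raw Varimax, without Kaiser row normalization). The loading matrix is hypothetical and merely mimics the described pattern of a science and a language factor with one variable loading on both; it is not the study's data, and we do not know which software or normalization the authors used.

```python
import numpy as np

def varimax(loadings, max_iter=100, tol=1e-8):
    """Raw Varimax rotation of a p x k loading matrix; returns (rotated, rotation)."""
    p, k = loadings.shape
    rotation = np.eye(k)
    criterion = 0.0
    for _ in range(max_iter):
        rotated = loadings @ rotation
        # Gradient of the Varimax criterion, projected onto orthogonal matrices via SVD.
        target = rotated**3 - rotated @ np.diag(np.sum(rotated**2, axis=0)) / p
        u, s, vt = np.linalg.svd(loadings.T @ target)
        rotation = u @ vt
        new_criterion = np.sum(s)
        if new_criterion < criterion * (1 + tol):
            break
        criterion = new_criterion
    return loadings @ rotation, rotation

def varimax_criterion(loadings):
    """Sum over factors of the variance of squared loadings (what Varimax maximizes)."""
    return np.var(loadings**2, axis=0).sum()

# Hypothetical simple structure (rows: 3 science, 1 mixed, 3 language variables),
# deliberately turned by 30 degrees to mimic an unrotated solution.
simple = np.array([[0.8, 0.0], [0.7, 0.1], [0.75, 0.05], [0.5, 0.5],
                   [0.1, 0.8], [0.0, 0.7], [0.05, 0.75]])
theta = np.pi / 6
turn = np.array([[np.cos(theta), -np.sin(theta)], [np.sin(theta), np.cos(theta)]])
unrotated = simple @ turn
rotated, rotation = varimax(unrotated)
print(np.round(rotated, 2))
```

Because the rotation is orthogonal, each variable's communality (row sum of squared loadings) is unchanged; only the distribution of loadings across the two factors becomes simpler.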
Fig. 9 depicts these factor-analytical results. Total grade as a g-type measure of grades can be derived by averaging over all seven grades or from a nonorthogonal (oblique) rotational solution. In a hierarchical version of the Brunswik lens we can now start asking which level of generality in the intelligence hierarchy best predicts which level of generality in the grade model. Tab. 1 gives the results. For the rationale behind the Berlin model of intelligence see Wittmann (1988). The columns of the table contain the different hierarchical levels of the intelligence model, i.e. g, the four group factors of the operative mode, the three group factors of the content mode, all seven group factors together, and the 12 cell components, each derived as an average of a pairwise combination of operative and content mode (4x3 cells). The rows contain the single grades, the total overall grade, and the first two unrotated (F2URGES, F2URNVG) and rotated group grade factors (F2RNAT, F2RGEI).
The results are given as squared bivariate or multiple correlation coefficients; within each grade level the first row is unadjusted and the second row is adjusted for the number of predictors used. It is clearly visible that the g-level is not the best (most symmetrical) level for predicting the different levels of the grade hierarchy. The language group factor and almost all language grades correlate around zero with g! The picture is different for science. There we find correlations different from zero, the highest one for mathematics. Looking at the best predictions from other levels of generality, we find that the highest amount of variance explained stems from the 12 cell components averaging content and operative mode. The second unrotated group factor F2URNVG, a bipolar factor contrasting science with language grades, is predicted up to 50.6% (adjusted 45.8%), the best result in all comparisons. The table gives a variety of other interesting results further challenging the ubiquity of g. All language grades are substantially predicted from the cell level and, at the group factor level, especially from the content group factors. The last three columns, showing the results of a commonality analysis (Cooley & Lohnes, 1976), demonstrate that for language grades the content factors have higher unique variance, whereas for science grades the unique variance of the operative factors exceeds that of the content factors. Profiting from the different levels of the hierarchy, we can better explain the different successes in prediction. Tab. 2 gives these explanations from the group factor level in terms of beta-weights from regressing grades separately on the operative and content group factors of intelligence. Now it becomes clear that for science grades the factor K from the operative mode, i.e. processing capacity for complex information, which is basically a reasoning factor, and the number factor from the content mode got the highest weights. With language grades the verbal content factor is most dominant.
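Two computations behind tables of this kind are easy to sketch: the usual adjustment of R² for the number of predictors, and a two-set commonality analysis that splits the joint R² of content and operative factors into unique and common parts. The input values below are hypothetical, not taken from Tab. 1, and we assume the common adjustment formula; the text does not say which one was used.

```python
def adjusted_r2(r2, n, k):
    """R^2 adjusted for k predictors and n cases (the common shrinkage formula)."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

def commonality_two_sets(r2_a, r2_b, r2_ab):
    """Split the R^2 of two predictor sets A and B into unique and common parts."""
    unique_a = r2_ab - r2_b          # what A adds beyond B
    unique_b = r2_ab - r2_a          # what B adds beyond A
    common = r2_a + r2_b - r2_ab     # shared part (can be negative under suppression)
    return unique_a, unique_b, common

# Hypothetical values, not taken from Tab. 1:
# A = three content factors, B = four operative factors, both together = 7 predictors.
ua, ub, c = commonality_two_sets(r2_a=0.30, r2_b=0.20, r2_ab=0.38)
print(round(ua, 2), round(ub, 2), round(c, 2))    # unique A, unique B, common
print(round(adjusted_r2(0.38, n=120, k=7), 3))
```

The three parts always sum to the joint R², so a higher unique share for one set, as reported for content factors with language grades, means that set carries predictive information the other set cannot replace.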
In Tab. 3 we go to the level of the 12 cell components. The columns contain the cell components, the rows the various grades. The first row within each grade gives the beta-weights from regressing grades simultaneously on this level of intelligence facets. The second row shows the beta-weights after stepwise regression with an inclusion level of p < .15. The third row gives the simple bivariate correlations. We have starred the significant weights or correlations as a visual aid for recognizing the most important facets (cell components). KV, processing capacity for complex verbal content (Americans will prefer the term reasoning with verbal content), and BV, speed on relatively simple verbal tasks, attract most of the stars. Please note the differences in signs! Because in the German system lower grades are better, a negative sign for a single subject grade means: the higher the intelligence, the better the grade. The rotated group factors science (F2RNAT) and language (F2RGEI) and the first unrotated general grade factor are scaled so that high scores map good grades. The second unrotated grade factor contrasts science grades with language grades: high scores mean good grades in science and low scores good grades in language. Obviously, and probably as no surprise, verbal content dominates in school. School, at least in Germany, is a real "talk show"! Reasoning and speed with verbal content are very important. On closer study, Tab. 3 also gives many hints of suppressor effects. Fig. 10 gives these details for the science and Fig. 11 for the language group factor. Fig. 12 puts both together in a path-analytical framework, using EQS as a structural equation modeling tool. The most important results are the negative path coefficients! Pupils with higher scores on reasoning with numbers and figures have worse grades in language than those with lower scores (with the other facets held constant), although all facets of reasoning correlate positively with one another. It is unclear whether pupils with higher ability on KF and KN do not invest their talents in language topics, whether language teachers do not teach language using visualization and numerical reasoning, or whether language teachers do not understand pupils who talk to them in complex figures and numbers. Yet our favorite hypothesis is that a one-sided verbal approach dominates teaching and that pupils with talents in figural and numerical reasoning do not get the learning opportunities they need. Looking at science, we see that those who are fast at simple verbal tasks have worse science grades. It obviously pays off in science to be slower, to think longer and probably not to pollute your working memory resources with irrelevant verbiage. On the other hand, science needs all three facets of reasoning.
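How a predictor that correlates positively with both the criterion and another predictor can still receive a negative weight is easiest to see with two predictors. The sketch computes standardized regression weights from a hypothetical correlation matrix (the labels "reasoning" and "simple speed" are illustrative only, and high criterion scores are assumed to mean good performance). It shows classical suppression in general; whether exactly this mechanism underlies Figs. 10-12 cannot be checked from the text.

```python
import numpy as np

def standardized_betas(r_xx, r_xy):
    """Standardized regression weights from predictor intercorrelations and validities."""
    return np.linalg.solve(r_xx, r_xy)

# Hypothetical correlations: predictor 1 (e.g. reasoning) and predictor 2
# (e.g. simple speed) correlate .6; their validities are .5 and .1.
r_xx = np.array([[1.0, 0.6],
                 [0.6, 1.0]])
r_xy = np.array([0.5, 0.1])
betas = standardized_betas(r_xx, r_xy)
r2 = betas @ r_xy                        # R^2 = sum of beta_j * r_jy
print(np.round(betas, 4), round(r2, 4))  # second weight is negative
```

Here the second predictor gets a weight of about -.31 although it correlates +.1 with the criterion, and R² rises from .25 (predictor 1 alone) to about .31: the second predictor removes criterion-irrelevant variance from the first.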
Prediction and explanation of performance in complex business games
Problem-solving performance in complex, intransparent situations, such as computer-based business games, is a key criterion and a kind of gold standard for intelligence. How well can this performance be predicted and explained from different levels of intelligence? In a large study conducted in Berlin by Süß and coworkers (Süß, 1996), 125 pupils worked 5 times over a period of two years on a complex business game called Tailorshop, in which they had to steer the computer-simulated company to prosperity. Additionally, they took the Berlin Intelligence Structure test (BIS), a test of computer knowledge and skills, and a test of system knowledge about Tailorshop, taken after some exploratory trials.
Tab. 4-6 show the results, organized in the same way as those for grades in Tab. 1-3. Again g is not the best predictor; it is dramatically outperformed by group factors and cell components. We will not discuss the details of Tab. 4-6 but concentrate on a comparison of g with the two most important components at the cell level. Figs. 13 and 14 compare both in a path-analytical model relating intelligence, knowledge and performance. The total effect, i.e. the sum of all direct and indirect effects, in terms of variance explained is 6.7% for g and 30.7% for the two cell components, the latter having indirect effects via knowledge as well as direct effects! Obviously, in aggregating our intelligence measures to the g-level, an important amount of variance is diminished or canceled out, as multivariate reliability theory predicts (Wittmann, 1988). In Fig. 14 the two faces of mental speed become visible. The more processing capacity pupils have with numbers, the better their performance and the sooner they lead their Tailorshop to prosperity. The faster they are on simple verbal tasks, the lower their performance. Simple speed does not pay off here, and a more detailed analysis in Fig. 15a shows this phenomenon to be a suppressor effect. Fig. 16a replicates it with simple speed on numbers. Obviously the variance due to the usual time-limited presentation of all intelligence tasks has to be removed to get better predictions. How important it is to assess these two faces of mental speed separately is demonstrated in Figs. 15b and 16b, where we use a test that does not distinguish between them. Predictive validity increases dramatically once both are separated!
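The decomposition of a total effect into direct and indirect parts can be sketched for the simplest path model: one predictor X (intelligence), one mediator M (knowledge) and one outcome Y (performance), all standardized. The correlations below are hypothetical, and with a single predictor the total effect simply equals the zero-order correlation; the two correlated cell components in Fig. 14 require the full structural model, so this sketch does not reproduce the reported 6.7% and 30.7%.

```python
def path_effects(r_xm, r_xy, r_my):
    """Direct, indirect and total standardized effects of X on Y through mediator M.

    Model: X -> M (path a), and Y regressed on X and M (paths c_direct and b).
    """
    a = r_xm
    denom = 1 - r_xm**2
    b = (r_my - r_xy * r_xm) / denom          # M -> Y, controlling for X
    c_direct = (r_xy - r_my * r_xm) / denom   # X -> Y, controlling for M
    indirect = a * b
    total = c_direct + indirect
    return c_direct, indirect, total

# Hypothetical correlations: intelligence X, knowledge M, business-game performance Y.
direct, indirect, total = path_effects(r_xm=0.5, r_xy=0.3, r_my=0.4)
print(round(direct, 3), round(indirect, 3), round(total, 3), round(total**2, 3))
```

Squaring the total effect gives the share of outcome variance attributable to X through all paths, the quantity the text compares between g and the cell components.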
Assume that an organization like the US Army used a test similar to our composites in Fig. 15b or 16b, instead of a test like those in Fig. 15a or 16a, for selecting its officers. What a dramatic loss in predictive validity, if we accept that the scores in the business games are similar to, and content valid for, what officers are expected to do, i.e. strategic thinking and handling complex situations! Inserting these pairs of validity coefficients into Brogden-type utility formulae, as Frank Schmidt and John Hunter do, will demonstrate the billions of dollars in opportunity costs of using a g-type measure that confounds reasoning and speed instead of a test able to disentangle both! We are concerned about how the US Army and others may be fooling themselves by using such g-type measures!
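The opportunity-cost argument can be sketched with the Brogden-Cronbach-Gleser utility model, in which the gain from selection is proportional to validity times the mean standardized predictor score of those selected. The sketch assumes top-down selection on a normally distributed predictor and equal testing costs for both tests (so costs cancel); every input value is hypothetical and chosen only to show the mechanics, not to estimate any real organization's losses.

```python
from statistics import NormalDist

def mean_z_of_selected(selection_ratio):
    """Mean standardized predictor score of applicants selected top-down,
    assuming normally distributed predictor scores."""
    std_normal = NormalDist()
    cutoff = std_normal.inv_cdf(1 - selection_ratio)
    return std_normal.pdf(cutoff) / selection_ratio

def utility_difference(n_selected, years, sd_y, validity_a, validity_b, selection_ratio):
    """Brogden-Cronbach-Gleser utility difference between two selection tests.

    Testing costs are assumed equal for both tests and therefore cancel.
    sd_y: standard deviation of job performance in monetary units.
    """
    return (n_selected * years * sd_y * (validity_a - validity_b)
            * mean_z_of_selected(selection_ratio))

# Purely hypothetical inputs: 1,000 officers selected, 10 years of tenure,
# SD_y of 20,000 dollars, validities .55 vs .30, selection ratio .20.
gain = utility_difference(1000, 10, 20_000, 0.55, 0.30, 0.20)
print(f"{gain:,.0f} dollars")
```

Because the utility difference is linear in the validity difference, every point of validity lost by confounding reasoning and speed translates directly into lost performance value.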
Intelligence and working memory capacity
To show that g is not always a suboptimal answer, we briefly refer to our research on working memory and intelligence. Tab. 7-9 are organized like Tab. 4-6. The results indicate that a g-type component of 25 working memory tasks correlates quite highly with the psychometricians' g of intelligence, replicating the finding of Kyllonen & Christal (1990) that reasoning is (little more than) working-memory capacity. But that depends on what we define as little. If our K is reasoning, then we have to acknowledge that all three other operative group factors contribute substantial variance in explaining WMC-g. Even here we can profit much from using the hierarchy framework to get better explanations.
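Correlations between aggregate measures such as WMC-g and psychometric g can be computed directly from the correlation matrix of their components. The sketch uses unit-weighted sums of standardized variables as a simplified stand-in for the component scores the text describes (which were presumably weighted); the 4 x 4 matrix is hypothetical.

```python
import numpy as np

def composite_correlation(r, idx_a, idx_b):
    """Correlation between unit-weighted sums of standardized variables.

    r: full correlation matrix; idx_a, idx_b: index lists of the two composites.
    """
    r = np.asarray(r, dtype=float)
    cov_ab = r[np.ix_(idx_a, idx_b)].sum()   # covariance of the two sums
    var_a = r[np.ix_(idx_a, idx_a)].sum()    # variance of composite A
    var_b = r[np.ix_(idx_b, idx_b)].sum()
    return cov_ab / np.sqrt(var_a * var_b)

# Hypothetical 4 x 4 matrix: two working-memory tasks (0, 1) and two reasoning tests (2, 3).
r = np.array([[1.0, 0.5, 0.4, 0.4],
              [0.5, 1.0, 0.4, 0.4],
              [0.4, 0.4, 1.0, 0.5],
              [0.4, 0.4, 0.5, 1.0]])
print(round(composite_correlation(r, [0, 1], [2, 3]), 3))
```

The composites correlate about .53 although no single cross-correlation exceeds .40, another instance of aggregation raising correlations at the apex of a hierarchy.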
For more details on that project see also Süß and Oberauer in the poster session "Working memory and intelligence".
References
Alderton, D.L. & Larson, G.E. (1994). Dimensions of ability: Diminishing returns? In M.G. Rumsey, C.B. Walker & J.H. Harris (Eds.), Personnel selection and classification. Hillsdale, NJ: Lawrence Erlbaum.
Brand, C.R. (1996). The g factor. Chichester: Wiley.
Campbell, D.T. & Fiske, D.W. (1959). Convergent and discriminant validation by the multitrait-multimethod matrix. Psychological Bulletin, 56, 81-105.
Carroll, J.B. (1993). Human cognitive abilities. A survey of factor-analytic studies. New York: Cambridge University Press.
Cohen, J. & Cohen, P. (1983). Applied multiple regression / correlation analysis for the behavioral sciences (2nd ed.). Hillsdale, NJ: Lawrence Erlbaum.
Cohen, J. (1982). Set correlation as a general multivariate data analytic method. Multivariate Behavioral Research, 17, 301-341.
Cooley, W.W. & Lohnes, P.R. (1976). Evaluation research in education. New York: Irvington.
Herrnstein, R.J. & Murray, C. (1994). The bell curve. Intelligence and class structure in American life. New York: Free Press.
Jäger, A.O. (1984). Intelligenzstrukturforschung: Konkurrierende Modelle, neue Entwicklungen, Perspektiven. Psychologische Rundschau, 35, 21-35.
Kyllonen, P.C. & Christal, R.E. (1990). Reasoning ability is (little more than) working-memory capacity?! Intelligence, 14, 389-433.
Kyllonen, P.C. (1994). Cognitive ability testing: An agenda for the 1990s. In M.G. Rumsey, C.B. Walker & J.H. Harris (Eds.), Personnel selection and classification. Hillsdale, NJ: Lawrence Erlbaum.
Ree, M.J. & Earles, J.A. (1994). The ubiquitous predictiveness of g. In M.G. Rumsey, C.B. Walker & J.H. Harris (Eds.), Personnel selection and classification. Hillsdale, NJ: Lawrence Erlbaum.
Süß, H.-M. (1996). Intelligenz, Wissen und Problemlösen. Göttingen: Hogrefe.
Wittmann, W.W. (1988). Multivariate reliability theory: Principles of symmetry and successful validation strategies. In J.R. Nesselroade & R.B. Cattell (Eds.), Handbook of multivariate experimental psychology (2nd ed., pp. 505-560). New York: Plenum.
06.08.97 Dietrich Wagener