Lehrstuhl für Psychologie II

ISSID

"International Society for the Study of Individual Differences"

July 19-23, 1997, University of Aarhus, Denmark

Symposium: New Directions in ability research

Organizers and Chair: R.D. Roberts & P. Kyllonen

CHALLENGING G-MANIA IN INTELLIGENCE RESEARCH:

ANSWERS NOT GIVEN, DUE TO QUESTIONS NOT ASKED

WERNER W. WITTMANN & HEINZ-MARTIN SÜß

UNIVERSITY OF MANNHEIM


Abstract

In the last decade the London school has been on the rise again, and Spearman's g has attracted a lot of research interest. Carroll's (1993) heroic attempt to find and reanalyze all published correlation matrices convincingly supported Spearman's positive manifold thesis: all measures ever invented to measure intelligence correlate positively with one another, at least in unrestricted samples, which is the sine qua non condition for g. There is no doubt that little g is out there in the real world, and no real basis exists for disputing this simple fact! What are we doing and what should we do about g? Answering questions like these brings us to the birthplace of our most loved and hated controversies and debates in psychology. Concentrating on g as Herrnstein and Murray (1994) or Brand (1996) did, or looking for the biological bases of g or its reducibility to speed of information processing, we easily forget the second important part of Carroll's message. Not only was g found, but also evidence for hierarchical models of intelligence with g at the apex. What kind of questions can be asked with hierarchical models? Our questions should be related to the criteria we are most interested in. In seeking to ask good questions it is wise to look where the very smart guys are. What do the questions they ask have in common, questions leading to answers which are sometimes rewarded with Nobel and other prizes? If we do this we will notice one general class of principles behind those successful questions, especially in physics, i.e. principles of symmetry. In psychology, Egon Brunswik incorporated them many years ago in his famous lens model. Using hierarchical variants of the lens model we were forced to think about the relationship between predictor and criterion model hierarchies. At what level of generality in the predictor and at what level of generality in the criterion do the best predictions and explanations occur?
Using the framework of a hierarchical variant of the Berlin model of intelligence structure (BIS) (Jäger, 1984; Wittmann, 1988) to predict school grades or complex problem solving performance, we found that the g-level was not the best level to predict and explain hierarchical variants of these criteria. For working memory capacity the g-level was very good, but even here we profited much from lower levels as regards explanation. The principles, termed Brunswik-symmetry, are demonstrated with a modification of Tucker's lens equation, leading to explanations of the conditions under which predictions succeed or fail.

 


Author's address: Universität Mannheim

Schloß, EO

D-68131 Mannheim

Tel.: (0621)292-5639, Fax: (0621)292-2528

Email: wittmann@tnt.psychologie.uni-mannheim.de

Working papers about our research can be downloaded at our homepage:

http://www.uni-mannheim.de/fakul/psycho/welcome.html


Introduction

In discussions about the importance of intelligence many researchers have turned to Spearman's g as an important explanatory construct. The Bell Curve (Herrnstein & Murray, 1994) demonstrates many unexpected relationships to many criteria of utmost social importance using a g-type predictor and has attracted a lot of controversy and debate. Chris Brand experienced the depublishing of his book (1996) about g by the publisher Wiley because he discussed race differences in g. In predicting job performance Frank Schmidt, John Hunter and coworkers have demonstrated the virtues of general intelligence, with enormous economic impact.

In large-scale research projects using its ASVAB test battery, the US Army found the ubiquitous predictiveness of g (Ree and Earles, 1994), and discussions concern the problem of diminishing returns in further researching dimensions of ability (Alderton & Larson, 1994). It is understandable that, given all these facts, basic researchers want to better understand that little g, most prominently Arthur Jensen, standing for many others. John B. Carroll (1993), in reanalyzing all published correlation matrices, found clear evidence for g but also for many factors at lower levels of generality, and organized all that he found in a stratum taxonomy. Kyllonen (e.g. 1994) described the comprehensive cognitive abilities measurement (CAM) taxonomy which he and coworkers are developing and using at the Armstrong Laboratory of the US Air Force. Today there is little reason to challenge that g is at the apex of all taxonomies ever used, if we organize them as a hierarchy.

Fig. 1 depicts many question marks, and we will challenge the view that most of these questions are best answered from the level of g. It is important to have a strategy for asking questions to which nature gives clear answers. Looking around for where the really smart guys and girls are, we often end up in basic sciences like physics. There we find that nature gives answers, and, being pretty much impressed, we should try to learn what the secrets of those questions are. Again and again we learn that these questions are woven around principles of symmetry. To name just a few role models, look at the theories and questions of Richard Feynman, Murray Gell-Mann or, in Great Britain, Michael Faraday, and you will find the same secret. Principles of symmetry are also key concepts in psychology, especially in Egon Brunswik's work, i.e. his famous lens model. Let us see how we can profit from that symmetry!

The scope and virtues of Brunswik-symmetry

Brunswik-symmetry is a framework for structuring our questions concerning the predictive and explanatory power of intelligence constructs. Using hierarchical intelligence models at the predictor side, the Gestalt principles of the symmetrical properties of the lens model force us to conceptualize the criterion model as a hierarchical one as well. Once we have done this we can start asking what is the most symmetrical level of generality or hierarchy related to the g-level or to the group factor level or whatever level of interest!

Understanding the principles of aggregation is very important. Psychometric theory tells us that aggregating parallel or at least partially parallel items, scales or parcels leads to more reliable scales at the aggregate level. We thus assume that the items to be aggregated have the same true scores but differ in their error components. Aggregation cancels or at least diminishes the error parts, so that true score variance gets larger in relation to error variance in the composite and reliability increases. But reliability is not validity; it is silent and blind to theory. Does the way we aggregate our measures really map the constructs we want to assess? To maximize construct validity we need theory, at least in the form of a taxonomy. In Wittmann (1988) I therefore proposed to distinguish between wanted, unwanted and error variance. Wanted and unwanted systematic variance compose the true part of a score, hence reliability is always composed of wanted and unwanted true score components. Exploratory factor analysis applied to correlation matrices leads to factors composed of a mixture of wanted and unwanted systematic true variance, because the correlation of our items is a function of the true score parts, whether parts of them are wanted or unwanted. The factor scores are basically a weighted aggregate of the items forming the correlation matrix.
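The aggregation argument above can be made concrete with a small sketch. It uses the standard Spearman-Brown formula for the sum of k parallel items; the function names and the numerical values are illustrative assumptions, not taken from the study. The second function shows why reliability alone is blind to theory: wanted and unwanted systematic variance both count as "true" variance.

```python
def spearman_brown(item_reliability: float, k: int) -> float:
    """Reliability of the sum of k parallel items (Spearman-Brown formula)."""
    return k * item_reliability / (1.0 + (k - 1) * item_reliability)


def variance_shares(wanted: float, unwanted: float, error: float) -> dict:
    """Split total score variance into wanted, unwanted and error parts.

    Psychometric reliability counts all systematic variance, wanted and
    unwanted alike, so it cannot distinguish between the two.
    The variance values passed in are hypothetical.
    """
    total = wanted + unwanted + error
    return {
        "reliability": (wanted + unwanted) / total,
        "wanted_share": wanted / total,
        "unwanted_share": unwanted / total,
    }
```

For example, `spearman_brown(0.30, 5)` is about 0.68 and `spearman_brown(0.30, 20)` about 0.90, while `variance_shares(4.0, 4.0, 2.0)` reports a reliability of 0.80 although only 40% of the total variance is wanted.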

If we relate these factor scores to criterion models of a known wanted structure, and assume that we have measures for the criterion construct consisting only of wanted variance, the correlation is attenuated by the amount of unwanted variance in the predictor and, additionally, by the amount of random error variance in both the predictor and the criterion measures. Unwanted systematic true variance introduces asymmetry, which violates Brunswik-symmetry. If we try to map the true scores of a wanted latent construct and our proxies are polluted with unwanted true variance, they are to a certain degree construct unreliable. I introduced that term to distinguish it from traditional psychometric unreliability stemming from random error variance, although most of us still prefer the term construct validity here instead of construct reliability. But I insist on the distinction of construct reliability because we have to distinguish, again, between wanted, unwanted and error variance. In my terms construct validity refers only to the amount of overlap between the wanted variance of our proxies and the true wanted variance of the latent construct, for a predictor in relation to that of a criterion. So a measure consisting only of wanted and error variance, but with only partial overlap of the wanted variance in the predictor with the true latent wanted variance of the criterion, is said to be neither perfectly psychometrically reliable nor perfectly construct reliable, and also to lack perfect construct validity. A measure with perfect overlap of wanted variance in predictor and criterion, but augmented by additional unwanted true variance and error variance, is perfectly construct valid but lacks construct reliability and psychometric reliability as well. The two cases are to be distinguished by the direction of asymmetry. Fig. 2 depicts the case of perfect Brunswik-symmetry as the true wanted latent structure of a predictor and a criterion hierarchy respectively. Cases 1 to 4 in Fig. 2-5 denote four variants of asymmetry. Fig. 2-5 help in visualizing what we have tried and will go on to explain in words and numbers.
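The attenuation described above can be sketched numerically. The model below is an assumption made for illustration: the predictor is the sum of a wanted part W, an unwanted systematic part U and random error, the criterion is W plus random error, and all parts are mutually uncorrelated. The ratio used for construct reliability is likewise only one possible operationalisation, not a definition given in the text.

```python
import math


def expected_validity(wanted: float, unwanted_pr: float,
                      error_pr: float, error_cr: float) -> float:
    """Expected predictor-criterion correlation under the assumed model.

    Predictor P = W + U + E_p, criterion C = W + E_c, with W, U, E_p and
    E_c mutually uncorrelated. Arguments are variances (hypothetical values).
    """
    var_p = wanted + unwanted_pr + error_pr
    var_c = wanted + error_cr
    return wanted / math.sqrt(var_p * var_c)


def construct_reliability(wanted: float, unwanted: float) -> float:
    """Share of systematic predictor variance that is wanted (assumed definition)."""
    return wanted / (wanted + unwanted)
```

With `expected_validity(1, 0, 0, 0)` the correlation is 1.0; adding unwanted variance of the same size, `expected_validity(1, 1, 0, 0)`, lowers it to about 0.71, and adding random error on both sides, `expected_validity(1, 1, 1, 1)`, lowers it further to about 0.41.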

Lack of overlap of wanted variance between predictors and criteria leads to lack of construct validity, lack of construct reliability and lack of psychometric reliability in our normal error-prone measures. Complete overlap of wanted variance but additional unwanted and random error variance leads to measures with perfect construct validity but lack of construct reliability and lack of psychometric reliability. In these terms, the measures in our four cases of asymmetry are as follows. In case 1 the psychometric reliability of the predictor and of the criterion may be perfect, and both may be perfectly construct valid for different constructs, but due to total asymmetry either the criterion or the predictor is perfectly construct unreliable, depending on which of them your theory tells you to be interested in. Assuming theory focuses on the criterion side, the predictor, although perfectly psychometrically reliable, is completely construct unreliable, but probably perfectly construct valid and construct reliable for a different, independent construct. The framework of case 1 hints at how we should test theories with the convergent and divergent (or discriminant) validation strategy (Campbell & Fiske, 1959). We should always have measures mapping what construct A_CR is, but also what construct A_CR is not, i.e. non-A_CR, e.g. B_CR. To predict or explain A_CR we need, on the predictor side of the lens, measures for A_PR but at the same time for non-A_PR, i.e. B_PR. Testing our theory should lead to:
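A pattern consistent with the convergent and discriminant logic just described is sketched here as an assumption, since the original display is not available. The subscripts PR and CR denote the predictor and criterion side of the lens, and the values near one presuppose correction for psychometric unreliability:

```latex
\begin{aligned}
r(A_{PR}, A_{CR}) &\approx 1, & r(B_{PR}, B_{CR}) &\approx 1 && \text{(convergent)}\\
r(A_{PR}, B_{CR}) &\approx 0, & r(B_{PR}, A_{CR}) &\approx 0 && \text{(discriminant)}
\end{aligned}
```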

This is what is meant by perfect Brunswik-symmetry. The only unsolved problem which still needs to be dealt with is the level of generality at which this pattern holds. Contrasting case 4 with case 1 illuminates that problem and hints at solutions. In case 4 the asymmetry is on both sides of the lens. Say our theory focuses on a construct called a, but our measures are contaminated with true non-a variance, non-a_PR and non-a_CR. This kind of asymmetry suggests that non-a_PR is different from non-a_CR, i.e. the correlation between both is zero. To test the predictive validity of a_PR in relation to a_CR we should remove non-a_PR from the predictor model and non-a_CR from the criterion model. This can be done with the bipartial variant of Jack Cohen's ingenious set-correlation system (Cohen, 1982; Cohen & Cohen, 1983). If our theory is true, the bipartial correlation should be close to one after correcting for the psychometric unreliability of the partialled predictor and the partialled criterion. If we have the predictor and the criterion as hierarchical models we can use the lower-level group factors to investigate their pairwise relationships. If we find a higher correlation between some pairs of these factors, we know that symmetry is at a lower level of generality than in case 1. Cases 2 and 3 denote one-sided asymmetries. In case 2 either the predictor contains unwanted variance as regards the criterion or the criterion lacks parts of the wanted variance; in case 3 either the criterion contains unwanted variance or the predictor lacks wanted variance.
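The bipartial idea for case 4 can be illustrated with a minimal sketch restricted to a single partialled variable on each side; Cohen's set-correlation system handles whole sets of variables, which this sketch does not attempt. Here x is the predictor, y the criterion, u the measured non-a part of the predictor and v the measured non-a part of the criterion. All names and numbers are assumptions for illustration.

```python
import math


def bipartial_r(r_xy: float, r_xu: float, r_yv: float,
                r_xv: float = 0.0, r_yu: float = 0.0,
                r_uv: float = 0.0) -> float:
    """Correlation between x with u partialled out and y with v partialled out.

    Derived from the residuals x - r_xu * u and y - r_yv * v of
    standardized variables.
    """
    numerator = r_xy - r_xu * r_yu - r_yv * r_xv + r_xu * r_yv * r_uv
    denominator = math.sqrt((1.0 - r_xu ** 2) * (1.0 - r_yv ** 2))
    return numerator / denominator


def disattenuate(r: float, rel_x: float, rel_y: float) -> float:
    """Classical correction for attenuation, given the reliabilities of both scores."""
    return r / math.sqrt(rel_x * rel_y)


# Hypothetical error-free case: x = a + 2u and y = a + 3v, with a, u and v
# uncorrelated and of unit variance. The raw correlation is only about 0.14.
raw = 1.0 / math.sqrt(50.0)
cleaned = bipartial_r(raw, r_xu=2.0 / math.sqrt(5.0), r_yv=3.0 / math.sqrt(10.0))
```

In this constructed case `cleaned` equals 1.0 up to rounding: once the different contaminations are removed from both sides, the shared construct a is all that remains. With error-prone measures the result would additionally need `disattenuate` with reliabilities of the partialled scores.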

The parameters R_PR and R_CR in the lens model equation map the degree of construct reliability of our predictor and criterion models respectively (see Fig. 7 and 8 for linear and nonlinear variants of that equation). Where we have no theory, we do not know what our wanted constructs are and cannot estimate these parameters. Having neither a theory nor a taxonomy is very embarrassing because it demonstrates that you do not know what you want! All research starts in an exploratory way, with a lot of trial and error, and so we developed measures for a construct of intelligence of which we knew only vaguely what kind of facets constitute it. Yet decades of research on intelligence have supplied us with diverse taxonomies like Guilford's SOI model, Cattell and Horn's distinction between fluid and crystallized intelligence and their related hierarchical models, the model of the British school (Vernon and Burt) and many other variants. From the very start of intelligence research we have had, in a Popperian sense, the bold hypothesis of Charles Spearman of a positive manifold: all measures of intelligence are positively correlated, justifying a general intelligence factor g and test-specific factors. The construct validity of each measure can therefore always be explained by two factors, a very general and a specific one. Spearman's challengers concentrated on the specific part of an intelligence measure, assuming that these specifics might be more important than originally thought. These specifics might also be correlated with other specifics, leading to the notion of group factors and turning specific true variance into common true variance. After expanding the set of intelligence measures it might turn out that what was thought to be specific variance is much more important than general g-factor variance, a track opened by Spearman's opponent L.L. Thurstone.
Over one hundred years of research on intelligence have supplied us with a multitude of correlation matrices. John B. Carroll (1993), in a heroic enterprise, reanalysed all these matrices, resulting in a three-stratum hierarchical model with Spearman's g at the apex but also with many group factors at different levels of generality (i.e. strata). Although debates will go on about what the best hierarchical model of intelligence is, the very fact that g is there, along with different group factors, no longer needs to be challenged. The compromise, the synthesis, lies in hierarchies. As is often the case in science, competing theories are reconciled by subsuming and synthesizing them into more general models or theories. What is not solved is which level of generality is the most important one.
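The role of R_PR and R_CR in the lens model equation mentioned above can be sketched with the standard form of Tucker's equation. The authors' own modification is shown in their Fig. 7 and 8 and is not reproduced here; mapping Tucker's two multiple correlations onto R_PR and R_CR is an assumption made for this illustration, as are all numerical values.

```python
import math


def lens_model_r(g: float, r_pr: float, r_cr: float, c: float = 0.0) -> float:
    """Observed predictor-criterion correlation in Tucker's lens model equation.

    g    : correlation between the two linearly modelled (wanted) parts
    r_pr : multiple correlation on the predictor side (assumed to play
           the role of R_PR in the text)
    r_cr : multiple correlation on the criterion side (assumed to play
           the role of R_CR in the text)
    c    : correlation between the residuals of the two linear models
    """
    linear_part = g * r_pr * r_cr
    residual_part = c * math.sqrt(1.0 - r_pr ** 2) * math.sqrt(1.0 - r_cr ** 2)
    return linear_part + residual_part
```

Even with a perfect match of the wanted parts (g = 1), `lens_model_r(1.0, 0.8, 0.5)` yields only 0.40: the observed validity is capped by the product of the two parameters, which is why asymmetry on either side of the lens lowers prediction.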

Prediction and explanation of school grades

The first example in challenging the ubiquity of g is grades. School grades can be organized into a hierarchical model as well, with the total or averaged grade at the apex. A single examination in a teacher-made test consists of several tasks which a pupil has to solve. A grade in a topical area is then given by transforming the number of correct solutions into a grade, e.g. with six levels. In the German school system a one is the best and a six the worst grade. Within a subject area, normally two to five different class tests are given over a school year, and these grades are again averaged to build the total grade in that subject area, e.g. mathematics, for the whole school year. In our study total grades for a school year were available for seven disciplines: four grades for science, i.e. mathematics, physics, chemistry and biology, and three grades for language, i.e. German, first foreign language (mostly English) and second foreign language (mostly French). A factor analysis of these seven grades using Varimax rotation led to two group factors, which we labeled science and language respectively, because the salient loadings were those of three science disciplines on the first and of the three language disciplines on the second. Biology had salient loadings on both of them.

Fig. 9 depicts these factor analytical results. Total grade as a g-type measure of grades can be derived by averaging over all seven grades or after a nonorthogonal rotational solution. In a hierarchical version of the Brunswik lens we can now start asking from which level of generality in the intelligence hierarchy which level of generality in the grade model is best predicted. Tab. 1 gives the results. For the rationale behind the Berlin model of intelligence see Wittmann (1988). The columns of the table contain the different hierarchical levels of the intelligence model, i.e. g, the four group factors of the operative mode, the three group factors of the content mode, all seven group factors together and the 12 components derived from an average of pairwise combinations of operative and content mode (4x3 cells). The rows contain the single grades, the total overall grade, the first two unrotated grade factors (F2URGES, F2URNVG) and the two rotated group grade factors (F2RNAT, F2RGEI).

The results are given as squared bivariate or multiple correlation coefficients; within each grade level the first row is unadjusted and the second row is adjusted for the number of predictors used. It is clearly visible that the g-level is not the best (most symmetrical) level for predicting the different levels of the grade hierarchy. The language group factor and almost all language grades correlate around zero with g! The picture is different for science. There we find correlations different from zero, the highest one for mathematics. Looking at the best predictions from other levels of generality, we find that the highest amount of variance explained stems from the 12 cell components, averaging content and operative mode. For the second unrotated group factor F2URNVG, a bipolar factor contrasting science with language grades, 50.6% of the variance is explained (adjusted 45.8%), the best result in all comparisons. The table gives a variety of other interesting results further challenging the ubiquity of g. All language grades are substantially predicted from the cell level and, at the group factor level, especially from the content group factors. The last three columns, showing the results of a commonality analysis (Cooley & Lohnes, 1976), demonstrate that for language grades the content factors have the higher unique variance, whereas for science grades the uniqueness of the operative factors outperforms that of the content factors. Profiting from the different levels of the hierarchy we can better explain the different successes in prediction. Tab. 2 gives these explanations from the group factor level in terms of beta-weights from regressing grades on the intelligence operative and content group factors separately. Now it becomes clear that for science grades the factor K from the operative mode, i.e. processing capacity for complex information, which basically is a reasoning factor, and the number factor from the content mode got the highest weights.
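The two quantities reported in Tab. 1, the adjustment for the number of predictors and the commonality partition, can be sketched as follows. The adjustment shown is the common shrinkage formula; whether the authors used exactly this variant is not stated, and the sample size and R-squared values in the example are hypothetical.

```python
def adjusted_r2(r2: float, n: int, p: int) -> float:
    """Shrinkage-adjusted R squared for n cases and p predictors."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)


def commonality(r2_a: float, r2_b: float, r2_ab: float) -> dict:
    """Partition R squared of two predictor sets into unique and common parts.

    r2_a, r2_b : R squared using set A alone and set B alone
    r2_ab      : R squared using both sets together
    """
    return {
        "unique_a": r2_ab - r2_b,
        "unique_b": r2_ab - r2_a,
        "common": r2_a + r2_b - r2_ab,
    }
```

For instance, with a hypothetical sample of 100 pupils and 12 cell components, `adjusted_r2(0.506, 100, 12)` is about 0.438. With set A as the operative factors and set B as the content factors, `commonality(0.20, 0.30, 0.40)` attributes 0.10 uniquely to A, 0.20 uniquely to B and 0.10 to their common part.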
With language grades the verbal content factor is most dominant. In Tab. 3 we go to the level of the 12 cell components. The columns contain the cell components, the rows the various grades. The first row within each grade gives the beta-weights from regressing grades on that level of intelligence facets in a simultaneous regression equation. The second row shows the beta-weights after stepwise regression using an inclusion level of p < .15. The third row gives the simple bivariate correlations. We have starred the significant weights or correlations as a visual aid for recognizing the most important facets (cell components). KV, processing capacity for complex verbal content (Americans will prefer the term reasoning with verbal content), and BV, speed on relatively simple verbal tasks, attract most of the stars. Please note the differences in signs! For a single subject grade a negative sign means that the higher the intelligence, the better the grade (recall that one is the best grade). The rotated group factors science (F2RNAT) and language (F2RGEI) and the first unrotated general grade factor are scaled so that high scores map good grades. The second unrotated grade factor contrasts science grades with language grades, high scores meaning good grades in science and low scores good grades in language. Obviously, and probably to no one's surprise, verbal content dominates in school. School is, at least in Germany, a real "talk-show"! Reasoning and speed with verbal content are very important. On detailed study, Tab. 3 also gives many hints of suppressor effects. In Fig. 10 we give these details for the science group factor and in Fig. 11 for the language group factor. Fig. 12 puts both together in a path-analytical framework using EQS as a structural equation modeling tool. The most important results are the negative path coefficients! Pupils with higher scores on reasoning with numbers and figures have worse grades in language than those with lower scores, although all facets of reasoning correlate positively with one another.
Whether pupils with higher ability on KF and KN do not invest their talents in the topics of language, or whether language teachers do not teach language using visualization and numerical reasoning, or whether language teachers do not understand pupils talking to them in complex figures and numbers, is unclear. Yet our favorite hypothesis is that a one-sided verbal approach dominates in teaching and that pupils with talents in figural and numerical reasoning do not get the learning opportunities they need. Looking at science, we see that those who are fast at simple verbal tasks have worse science grades. It obviously pays off in science to be slower, to think longer and probably not to pollute your working memory resources with irrelevant verbiage. On the other hand, science needs all three facets of reasoning.
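How a negative weight can arise although all bivariate correlations are positive can be shown with the textbook formulas for two standardized predictors. The correlations used below are hypothetical and are not taken from Tab. 3 or Fig. 12.

```python
def standardized_betas(r_y1: float, r_y2: float, r_12: float):
    """Beta-weights and R squared for two standardized predictors.

    r_y1, r_y2 : correlations of predictor 1 and 2 with the criterion
    r_12       : correlation between the two predictors
    """
    denom = 1.0 - r_12 ** 2
    beta1 = (r_y1 - r_y2 * r_12) / denom
    beta2 = (r_y2 - r_y1 * r_12) / denom
    r_squared = beta1 * r_y1 + beta2 * r_y2
    return beta1, beta2, r_squared


# Hypothetical values: verbal reasoning correlates .40 with a language
# criterion, numerical reasoning only .10, and the two facets correlate .60.
beta_verbal, beta_numerical, r2 = standardized_betas(0.40, 0.10, 0.60)
```

Here `beta_numerical` is about -0.22 although its bivariate correlation is positive, and `r2` (about 0.19) exceeds the 0.16 that verbal reasoning explains on its own. This is the signature of a suppressor effect.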

Prediction and explanation of performance in complex business games

Problem solving performance in complex, intransparent situations such as computer-based business games is a key criterion and a kind of gold standard for intelligence. How well can this performance be predicted and explained from different levels of intelligence? In a large study conducted by Süß and coworkers (Süß, 1996) in Berlin, 125 pupils worked five times over a period of two years on a complex business game named Tailorshop, in which they were to manage a computer-simulated company to prosperity. Additionally they took the Berlin structure of intelligence test (BIS), a computer knowledge and skill test, and a system knowledge test on the Tailorshop, taken after some exploratory trials.

Tab. 4-6 show the results, organized in the same way as for grades in Tab. 1-3. Again g is not the best predictor; it is dramatically outperformed by group factors and cell components. We will not discuss the details of Tab. 4-6 but concentrate on a comparison of g with the two most important components of the cell level. Fig. 13 and 14 compare both in a path-analytical modeling approach using intelligence, knowledge and performance. The total effect, as the sum of all direct and indirect effects in terms of variance explained, is 6.7% for g and 30.7% for the two cell components, the latter having direct effects as well as indirect effects via knowledge! Obviously, in aggregating our intelligence measures to the g-level an important amount of variance is diminished or canceled out, thus supporting multivariate reliability theory (Wittmann, 1988). In Fig. 14 the two faces of mental speed gain visibility. The more processing capacity with numbers pupils have, the better their performance and the sooner they lead their tailorshop to prosperity. The faster they are on simple verbal tasks, the lower their performance. Simple speed does not pay off here, and a more detailed analysis in Fig. 15a shows this phenomenon to be a suppressor effect. Fig. 16a replicates that with simple speed on numbers. Obviously the variance part due to the usual time-limited presentation of all intelligence tasks has to be removed to get better predictions. Fig. 15b and 16b demonstrate how important it is to assess these two faces of mental speed separately, by showing what happens if we use a test which does not distinguish between them. Predictive validity increases dramatically after separating the two!
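The loss incurred by a test that confounds the two faces of mental speed can be sketched by comparing the validity of an equally weighted sum of two components with the multiple correlation obtained when both are entered separately. The correlations are hypothetical and only mimic the pattern described above; they are not the coefficients of Fig. 15 or 16.

```python
import math


def composite_validity(r_y1: float, r_y2: float, r_12: float) -> float:
    """Validity of the unit-weighted sum of two standardized components."""
    return (r_y1 + r_y2) / math.sqrt(2.0 + 2.0 * r_12)


def multiple_r(r_y1: float, r_y2: float, r_12: float) -> float:
    """Multiple correlation when both components are entered separately."""
    r_squared = (r_y1 ** 2 + r_y2 ** 2 - 2.0 * r_y1 * r_y2 * r_12) / (1.0 - r_12 ** 2)
    return math.sqrt(r_squared)


# Hypothetical values: processing capacity correlates .45 with performance,
# simple speed -.05, and the two components correlate .50 with each other.
confounded = composite_validity(0.45, -0.05, 0.50)
separated = multiple_r(0.45, -0.05, 0.50)
```

With these assumed values the confounded composite reaches a validity of about 0.23, whereas the separated components reach about 0.55, because separation allows speed to act as a suppressor.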

Assume that an organization like the US Army uses a test similar to our compounds in Fig. 15b or 16b instead of a test like those in Fig. 15a or 16a for selecting its officers. What a dramatic loss in predictive validity, if we accept that the scores in the business games are similar to and content valid for what officers are expected to do, i.e. strategic thinking and handling complex situations! Inserting these pairs of validity coefficients into Brogden-type utility formulae, as Frank Schmidt and John Hunter do, will demonstrate the billions of dollars in opportunity costs incurred by using a g-type measure that confounds reasoning and speed instead of a test able to disentangle both! We are concerned about how the US Army and others are fooling themselves by using such g-type measures!
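The utility argument can be sketched with the Brogden-Cronbach-Gleser formula under top-down selection on a normally distributed predictor. All inputs below (number selected, tenure, dollar value of one standard deviation of performance, selection ratio, validities) are hypothetical placeholders, not figures from the paper.

```python
from statistics import NormalDist


def mean_z_of_selected(selection_ratio: float) -> float:
    """Mean standardized predictor score of those selected top-down."""
    normal = NormalDist()
    cutoff = normal.inv_cdf(1.0 - selection_ratio)
    return normal.pdf(cutoff) / selection_ratio


def utility_gain(n_selected: int, years: float, validity: float,
                 sd_y: float, selection_ratio: float,
                 total_cost: float = 0.0) -> float:
    """Gain over random selection: N * T * r_xy * SD_y * mean z - cost."""
    return (n_selected * years * validity * sd_y
            * mean_z_of_selected(selection_ratio) - total_cost)


# Hypothetical comparison of a confounded test (validity .25) with a test
# that separates reasoning and speed (validity .50).
gain_confounded = utility_gain(1000, 5.0, 0.25, 20000.0, 0.5)
gain_separated = utility_gain(1000, 5.0, 0.50, 20000.0, 0.5)
```

Because utility is linear in validity, doubling the validity doubles the gain when testing costs are ignored; the difference between the two gains is the opportunity cost of the confounded test under these assumptions.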

Intelligence and working memory capacity

To demonstrate that g is not a suboptimal answer for all questions, we briefly refer to our research on working memory and intelligence. Tab. 7-9 are organized like Tab. 4-6. The results indicate that a g-type component of 25 working memory tasks correlates quite highly with the psychometrician's g of intelligence, replicating the finding of Kyllonen & Christal (1990) that reasoning is (little more than) working-memory capacity. But that depends on what we define as little. If our K is reasoning, then we have to acknowledge that all three other operative group factors contribute substantial variance in explaining WMC-g. Even here we can profit much from using the hierarchy framework to get better explanations.

For more details on that project see also Süß and Oberauer in the Poster-Session on: "Working memory and intelligence".

References

Alderton, D.L. & Larson, G.E. (1994). Dimensions of ability: Diminishing returns? In M.G. Rumsey, C.B. Walker & J.H. Harris (eds.). Personnel selection and classification. Hillsdale NJ: Lawrence Erlbaum.

Brand, C.R. (1996). The g factor.

Campbell, D.T. & Fiske, D.W. (1959). Convergent and discriminant validation by the multitrait-multimethod matrix. Psychological Bulletin, 56, 81-105.

Carroll, J.B. (1993). Human cognitive abilities. A survey of factor-analytic studies. New York: Cambridge University Press.

Cohen, J. & Cohen, P. (1983). Applied multiple regression / correlation analysis for the behavioral sciences. 2nd ed. Hillsdale NJ: Lawrence Erlbaum.

Cohen, J. (1982). Set correlation as a general multivariate data analytic method. Multivariate Behavioral Research, 17, 301-341.

Cooley, W.W. & Lohnes, P.R. (1976). Evaluation research in education. New York: Irvington.

Herrnstein, R.J. & Murray, C. (1994). The bell curve. Intelligence and class structure in American life. New York: Free Press.

Jäger, A.O. (1984). Intelligenzstrukturforschung: Konkurrierende Modelle, neue Entwicklungen, Perspektiven. Psychologische Rundschau, 35, 21-35.

Kyllonen, P.C. & Christal, R.E. (1990). Reasoning ability is (little more than) working-memory capacity?! Intelligence, 14, 389-433.

Kyllonen, P.C. (1994). Cognitive ability testing: An agenda for the 1990s. In M.G. Rumsey, C.B. Walker & J.H. Harris (eds.). Personnel selection and classification. Hillsdale NJ: Lawrence Erlbaum.

Ree, M.J. & Earles, J.A. (1994). The ubiquitous predictiveness of g. In M.G. Rumsey, C.B. Walker & J.H. Harris (eds.). Personnel selection and classification. Hillsdale NJ: Lawrence Erlbaum.

Süß, H.-M. (1996). Intelligenz, Wissen und Problemlösen. Göttingen: Hogrefe.

Wittmann, W.W. (1988). Multivariate reliability theory. Principles of symmetry and successful validation strategies. In J.R. Nesselroade & R.B. Cattell (Eds.), Handbook of multivariate experimental psychology (2nd ed.) (pp. 505-560). New York: Plenum.


Tables

Table 1

Table 2

Table 3

Table 4 & 7

Table 5 & 8

Table 6 & 9

 

Figures

Figure 1

Figure 2

Figure 3

Figure 4

Figure 5

Figure 6

Figure 7

Figure 8

Figure 9

Figure 10

Figure 11

Figure 12

Figure 13

Figure 14

Figure 15

Figure 16


06.08.97 Dietrich Wagener