LARGE SCALE EXAMINING--THE DEVELOPMENT OF OBJECTIVE TESTING
My assignment this afternoon is to discuss with you the question of large scale examining for university entrance: to describe for you the methods of assessment that have been developed in the United States in the upper reaches of the secondary school and to share with you some thoughts about the aims and purposes and achievements and problems in the work. Although this is my first visit to Scotland, I have been lucky enough to travel just a bit for the College Entrance Examination Board in other parts of the world, and I have come to realise that, as regards the U.S. system of university entrance, or college admissions as we tend to call it, neither our system nor the tests we have developed for use in it are precisely exportable, and that our experience can be at best only tangentially applicable in another setting. I discovered in Japan, for instance, that some of my colleagues had apparently suggested that a national system of objective tests would eliminate the traditional 'shiken jigoku' or 'hell of examinations' that there characterises the college admission scene. Among other things, however, they failed to foresee the strike threats on the part of students who perceived the new tests as part of an imperialist, right-wing, American-inspired plot. In Latin America, on the other hand, the Professors told me that the U.S. type objective tests are regarded as a left-wing Communist threat to the perpetuation of the social and intellectual aristocracy. You realise that these two interpretations represent the extremes of a rather long continuum and that they provide you with a pretty wide range over which to let your imagination roam when it comes to guessing the political implications which might be attached to objective testing in Scotland.
I know they provide me with ample warning not to try to advise you how to use such instruments and suggest rather that I attempt simply to describe to you how we have been and are using these tests in our own circumstances and leave you to the task of interpreting our experience to your situation. Now, although the multiple choice tests were introduced in our system in the twenties, it was not until the last decade that students were permitted to know their scores, the results of the tests. Into the fifties the rule of secrecy applied and the mystery of
the scores provided a very convenient refuge behind which harried Admission Officers of highly selective institutions could hide from any student or parent who questioned the College's decision to reject him. Something could simply be muttered about an inadequate College Board score and no one would be any the wiser. Those were the days and circumstances when some Admission Officers were said to use the tests much as a drunk uses a lamp post, more for support than for illumination. Ten years ago the veil of secrecy was finally lifted by the forces of enlightenment, and we at the College Board have been engaged ever since in the always complex and sometimes very frustrating task of attempting to explain to each succeeding class of secondary students, not to mention parents, teachers, counsellors, headmasters and other well-wishers, the meaning of their test scores. For you realise that our test scores have no meaning except in relation to the other criteria which are used in the admissions process: the criteria by which students decide to apply to which college and by which colleges decide to accept which students. And just as test scores have no meaning except in context, so the tests themselves can really be comprehended only within the total setting, that is, in the light of all the relevant circumstances and conditions in which the tests have been developed and in which they operate. So in order for me meaningfully to describe this total set of conditions as it pertains to the United States, and from which, remember, I have invited you to draw your inferences with respect to the conditions here in Scotland, it is necessary for me to indulge in a series of mini-analyses relating to organisation, history, sociology, economics, education and psychometrics, an analysis which of necessity must be a little bit more involved than my earlier attempt at political science.
Well, first a quick overview of this College Entrance Examination Board for which I work. From an original exclusive interest in examinations for university entrance, which it maintained really through World War II, the Board's arena of activity has in the last two decades grown to include testing for pre-college guidance, for college work done in secondary school and for credit by examination at the college level, testing proficiency in English for use in the evaluation of foreign students and testing in Spanish for use in Puerto Rico and Latin America. It includes also the central processing of family financial data to assist colleges in the award of scholarships and other financial aids, the training of admission officers and financial aid officers, the formal publication of more than 15,000,000 pieces of literature annually under some 100 or so titles, the management of a budget that
runs to something over £10,000,000 a year and the conduct of a seemingly endless round of meetings and conferences.
Now I am not sure what inferences you will draw from this recital of the activities of the College Board (perhaps that it's only another manifestation of our penchant over there for big business), but I hope that from some of the things that I shall say later you will realise that some part of the nature and extent of all these activities really only reflects the complexities of university entrance in the United States. Nevertheless, the heart of our programme remains today our admissions tests and our admissions testing programme. Although I shall return later to a more comprehensive description and psychometric analysis, you ought to know a little bit about the size and dimensions of the programme right at the beginning. The number of examinations approached 2,700,000 last year; these involved about 1,700,000 registrants, or individual candidates, from nearly 15,000 secondary schools, tested on five Saturdays spaced throughout the year in nearly 5,000 testing centres throughout the world.
Substantively, the Admissions Testing Programme consists of two kinds of test: the three-hour Scholastic Aptitude Test, or S.A.T., and a series of one-hour subject matter Achievement Tests in English, Mathematics, the Physical Sciences, History and Social Studies, and Foreign Languages. Most of the member colleges of the College Board require candidates to take the S.A.T., the Aptitude Test, and many require them to take both the S.A.T. and several, usually three, Achievement Tests.
Now, despite the implications of its name, the Scholastic Aptitude Test is not an intelligence test. Rather it is a multiple-choice, generalised achievement test yielding two scores: verbal reasoning (V) and mathematical reasoning ability (M). The Achievement Tests, except for the English composition test, which may include a short essay, also consist of multiple-choice questions. Students are guided by the requirements of the colleges to which they intend to apply for admission in deciding which tests to take and on which of the five regular Saturday dates to take them. Standardised scores are reported, on a scale from 200 to 800, to the colleges indicated by the student and to his secondary school. "Pass-fail" evaluations are not made by the College Board, and each college is encouraged to use test scores in accordance with its own unique experience as to their validity.
Well, so much for the tests at the moment and now on to the setting in which they must be observed to be understood, to these mini-analyses of mine involving sociology, economics and, for a starter, history. The College Board, as you have heard, was founded in 1900 in response to the concern of secondary school leaders over the variety of subject matter entrance requirements among the leading colleges in the north-east part of our country. Faced with the necessity of having almost as many, literally almost as many, different courses in a single subject as there were colleges to which their students aspired, school headmasters called for and achieved the adoption of uniform college entrance requirements. A common college preparatory content was achieved through the construction of commonly agreed upon, carefully prescriptive syllabi, and this was followed by the agreed use of uniform college entrance examinations based on the content of those syllabi. These first "College Boards" (a phrase I will use because this is the way that our examinations are generally referred to) were traditional written examinations, and so too were the second generation of College Boards, the "Comprehensives." Still using the essay, free response, written examination, they were based on a much more generalised course content in order that colleges might draw upon students from secondary schools which, because they did not traditionally prepare for College Board colleges, were not following the standard carefully prescriptive syllabi. Well, to jump history fast, by 1940 the "Comprehensives" had become for most students the College Boards, but not for all, and to explain why it's necessary to go back again to 1900 and pick up three more historical threads.
While the colleges in the north-east were seeking to solve the admissions dilemma through the medium of common entrance examinations, another group of colleges in our mid-west, under the leadership of the University of Michigan, was developing a system of admission by accreditation, a system which said to a secondary school, in effect, that if its Faculty members had studied the right courses in the right universities, students who passed their courses and received diplomas would automatically be admitted to higher education. Thus at the beginning of World War II there were operating in the United States two quite different university entrance systems: a system of admission by examination and another system, quite apart, of admission by accreditation. It's not surprising under the circumstances that the curriculum followed by students within and between the two systems could and did vary, and it varied importantly because, you must remember, in the United States we have local control of education. In secondary education it is the responsibility of the local School Board to set the curriculum. That the curricula varied widely is in part explained by a line of development which involved the appearance of the progressive education movement in the United States and this
phenomenon, which budded in the twenties and blossomed in the thirties, spawned curricular innovations among schools preparing students in both systems that further inhibited College Board colleges in their attempts to attract nationally representative student bodies.
Fortunately for the College Board we had in Carl Brigham, a professor of psychology at Princeton University, a man of splendid vision, who had begun experimenting in the mid-1920's with multiple-choice, objective tests. By the late 1930's these then new instruments had proven able, through the very large number of questions that they could ask, not only to sample student achievement across a variety of curricula but also to make surprisingly accurate estimates of the basic scholastic aptitudes apparently essential to successful college work. By 1940, then, the S.A.T., the Aptitude Test, and these one-hour Achievement Tests were being used by member colleges of the College Board in lieu of the Comprehensives, which were still written, for a limited number of students, particularly applicants for scholarships and aid, from schools well outside the realm of those which were regularly preparing their candidates.
These were the circumstances in 1941 when history in the more usual sense intervened and the United States became directly involved in the Second World War. Travel restrictions made it impossible to assemble readers to grade centrally, as was the custom, the written Comprehensives, and so the template-scorable Objective Tests became the "College Boards."
Now, this period of ancient College Board history left many residual implications for the modern age of college admissions testing in the United States, but the points I want to make here are that, as one studies our large-scale use of objective testing for university entrance, one must remember that the ultimate battle between the respective proponents of essay examinations and objective tests was never joined; that the ultimate definitive debate over the relative merits of hand-written, people-graded examinations and hand-stroked, machine-scored tests was never held. It was the fortunes of a real war, so to speak, that swept the S.A.T. and the Achievement Tests through our cross roads of decision in the United States without a murmur of dissent, and some of the continuing controversy over the use of objective tests in our country reflects the fact that the debate never took place, that the issues were never clearly drawn, the necessary accommodations and compromises never satisfactorily derived.
Let me turn now to sociology and economics, and here in the interests of time I must indulge in that favourite pastime of educational bureaucrats like myself: gross over-generalisation. We have in the United States been long engaged in the process of democratisation of education, a process that has both social purpose and economic impetus: young people have been successively kept in school longer not just because education is a good thing but because it helps to meet the demands of an increasingly industrialised and technological economy by cutting down on the untrained and building up the trained man-power pool. Now taking this century in thirds, and here come the generalisations for which I do apologise, the first saw the achievement of universal primary and secondary education and the transformation of secondary education from an upper class privilege to a middle class prerogative. Before that transformation the original syllabi-based examinations provided a neat conjunction between secondary schools and colleges that were serving essentially the same constituency. But by the time it was completed even the less restrictive Comprehensives were feeling the strain.
In the second third, just ended, the broadening of educational opportunity in the United States has continued. Secondary education has become for us now virtually universal and higher education has been opened up to the middle class. To the words of others I shall leave the assertion of the proposition that the College Board objective tests played an important supportive role, and you will find in the document that was prepared for you in advance a quote from an article on the revolution in admissions at Princeton University. I will quote here only from the end of it, where the writer says "there is no doubt whatever that the current form of the College Board tests has been a major factor in promoting social mobility in the United States. That able boy from the Middle West could now be identified and given an opportunity for a Princeton education."
Now, as we begin the final third of the Twentieth Century, we have as a nation committed ourselves if not to universal higher education at least to universal opportunity beyond the secondary school. Roughly 50%, a little less today, of the age group in the United States does continue education in some form beyond the secondary school. As we enter this latest phase, it is not surprising that instruments which served so well in opening up college opportunities to the middle class do not appear adequate to the task of serving all classes and joining universal secondary education successfully to universal higher education, of dealing with the average and peculiar man at the lower end of the normal distribution curve. And so it is that some of the criticism that you will hear in the United States about objective tests, about university entrance tests, is that they discriminate against somebody. And so it is too that we at the College Board are embarked now on the process of seeking to develop the next generation of measurement devices to assist in the achievement of universal higher education, and we would like to break out of the mould of the paper and pencil test here and to develop other instruments that will help us to identify the creative, the highly motivated, the manually skilled person.
Having outlined this socioeconomic line of development, linking examinations for university entrance with the democratisation of education, I should like at this point to go back, pick up and spin out some of the other threads that I left dangling earlier in these remarks but which have to be woven into the pattern of circumstances in which our large scale programme of college admissions testing must be observed if it is to be understood. For example, I touched briefly on the two systems of university entrance that had developed and matured prior to World War II: Examination and Accreditation. Now the growing pressures for college admission (uneven pressures, to be sure, extremely severe at some institutions and effectively non-existent at others, because students have a choice, you know, of picking and choosing any college they wish), this increasing demand for higher education, rendered both systems inadequate to the new circumstances. Both were predicated on the pass/fail philosophy. In one setting either you passed the examinations or you didn't; in the other either you graduated from an accredited secondary school or you didn't. In both, either you qualified for entrance and were automatically admitted or you did not qualify and were automatically rejected. However, as the pressures grew in these post-war years, entrance to selective institutions became not just a question of whether a student qualified but one of by how much he qualified. For institutions with more qualified students than they could accept had to have some means by which to rank order candidates in order to make satisfactory choices among them, to predict which students would succeed better in a particular institution of higher learning. Now by another set of gross over-generalisations let me try to make the case this way.
In the admission by examination camp, one problem was that the written examinations, while graded with great care and reliability around the pass/fail point, failed, by reason of variation in the readers' estimates, to spread the total group of passing candidates with equitable precision. This was something that the objective tests could do, but the price, as I tried to suggest earlier, was a loosening of the connection between the examining instruments and the specific subject matter which the student had studied in secondary school.
But because the objective tests could only sample what he had learned, some way had to be found of comprehending what a student had studied and how well he had succeeded in those studies in secondary school. This his record, his grades in school, could do. In the admission by accreditation camp, on the other hand, it was the variation in grading standards among schools that had produced an inequity for which externally administered objective tests could compensate. So what happened was this. In the early 1950's there was a convergence or confluence and, on occasion I fear, confusion of the two systems of admission, wherein what courses a student had studied, how well he had achieved in what he had studied and how well he had done in his tests were all three taken into account in the admissions decision. Using these factors in unique combinations, colleges were able to make much more reasonable predictions of college performance than they had ever been able to achieve under either of the original systems alone, using one of the factors in isolation.