PRINCIPLES OF SCALING AND THE USE OF GRADES IN EXAMINATIONS

A.E.T. BARROW Secretary, Council for the Indian School Certificate Examinations, New Delhi

National Council of Educational Research and Training NEW DELHI

January, 1971


1. The sanctity of raw marks

The myth of the omniscience of the numerical marks awarded by examiners is nurtured in India and this sanctity accorded to raw marks is one of the main hurdles in the reform of the examination system.

An illustration of this is quoted from the "Statesman" of 26 September, 1970.

"THE ONE ON TOP"

"The............Board of Secondary Education has upgraded a successful student, who stood second in the final examination, to first position, reports PTI.

The Board amended the original 50-name merit list of the March 1970 examination, declaring......... of........ School at ............ to have secured the first position.

The amendment followed detection of an error of only one mark, on a request for verification by.........., in the total secured by him in Sanskrit. This increased his percentage from 87.13 to 87.25, 0.11% more than that of........of........., who had topped the original merit list."

0.11% more than the next candidate: incredible! Yet this form of worship of the marks of examiners in India is 'absolute'.

The above quotation is not meant to be a criticism of any particular examining board, but a criticism of the present system.

2. Characteristics of a good examination: reliability and validity

Studies of the reliability of traditional examinations have been conducted from time to time, and the earliest enquiries, those of Valentine (The Reliability of Examinations, University of London Press, 1932) and of Hartog and Rhodes (An Examination of Examinations, 1966), brought into prominence the major defects of the marks of examiners. They highlighted the need for reliable and valid examinations.

The disturbing facts revealed by the enquiries mentioned above also led to efforts to improve and reform the traditional type of examination.

The main characteristics of a good examination are reliability and validity. Reliability can best be defined as consistency. An instrument that measures consistently is reliable. Thus, taking an example from ordinary life, a tape measure as a means of measuring length or height is obviously a more reliable instrument than a piece of elastic. Validity is best defined as "the extent to which a test or examination does what it is designed to do."

The concepts of reliability and validity will become clear if the relationship between them is illustrated.

A test can be perfectly reliable and yet invalid. Thus, for instance, if English composition is marked by the number of words written, the measuring instrument would be perfectly reliable, but the purpose of the examination, namely, to assess linguistic ability, communication of adequate and relevant ideas and clear and appropriate arrangement of subject matter, would not be achieved and, therefore, the examination would not be valid. In designing an examination, therefore, emphasis must not be laid on reliability to the detriment of the validity of the examination. The problem of ensuring the reliability of an examination without affecting its validity must, therefore, be the main preoccupation of examination reform.

India is moving from the stage of an educated elite towards that of an educated society and there is no force which can prevent this democratic movement. The explosion of numbers in our examination system is a stark fact. Reform, therefore, in the system of examinations must be based on the increasing use of statistical methods. As far back as 1962 a Committee of the University Grants Commission in their report on Examination Reforms recommended:

"The present methods of marking examination scripts and of combining and tabulating marks in university examinations without reference to recognised statistical procedures are not satisfactory. The procedures will have to be developed to make marking and combining of marks more objective."

Two problems are thus raised:

1. methods of coordination of the marking of scripts in individual subjects;

2. combining of marks secured by a candidate in different subjects offered by him.

These problems are accentuated and magnified in mass conducted examinations in which thousands, nay lakhs, of candidates are involved.

The first issue, namely, the marking of scripts in individual subjects in an examination conducted for a large number of candidates, raises the age-old question of the subjective element entering into marking and, therefore, invalidating the marks of examiners because they are not comparable. Stated in another way, if the same scripts are given to different examiners, it will be found that the marks given by them vary very considerably. The problem then is how to remove the subjective element and bias of individual examiners.

3. Standardizing examiners

Before dealing with the statistical procedures required to remove the subjective element and bias of individual examiners in the marking of scripts, certain refinements in the setting, moderation and marking of scripts will be considered to help in this process.

As the preparation of a question paper is a time-consuming process, the work should start over a year before the date of the examination.

The first procedure is to draw up a blue-print of the question paper to be set, so that the validity of the examination in that subject is achieved, that is, the purpose of the examination is ensured. The blue-print will indicate the proportion of marks to be allotted to the areas of knowledge, skills, concepts, etc., which are to be tested. Thus, in Geography* a blue-print might be drawn up as in the table given below.

The chief examiner or the paper setter must base his questions on the blue-print, taking into consideration the scope of the syllabus, whether the question papers are of equivalent standard to the question papers of the previous years, the age group of the candidates, the number of years of study for the prescribed course and such other relevant factors.


* Adapted from Examinations Bulletin No. 3, The Certificate of Secondary Education: An introduction to some techniques of examining, Secondary School Examinations Council, England.

                               Behaviour
Content                        Knowledge  Understanding  Application  Skills  Relevant native  Total
                               of facts   of concepts    of concepts          insight etc.
India                          4          5              5            2       4                20
World Geography                4          5              5            2       4                20
Special Regions                4          4              4            4       4                20
World Issues                   5          5              3            3       4                20
Local Geographical Experience  4          3              3            6       4                20
Total                          21         22             20           17      20               100
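The arithmetic of such a blue-print can be checked mechanically before the paper is set. The following is a minimal sketch in Python using the figures of the Geography table above; the variable and function names are illustrative assumptions and do not come from the bulletin.

```python
# Blue-print from the Geography table: marks per content area (rows)
# and per behaviour (columns). All names here are illustrative only.
BEHAVIOURS = [
    "Knowledge of facts",
    "Understanding of concepts",
    "Application of concepts",
    "Skills",
    "Relevant native insight etc.",
]

BLUEPRINT = {
    "India": [4, 5, 5, 2, 4],
    "World Geography": [4, 5, 5, 2, 4],
    "Special Regions": [4, 4, 4, 4, 4],
    "World Issues": [5, 5, 3, 3, 4],
    "Local Geographical Experience": [4, 3, 3, 6, 4],
}


def content_totals(blueprint):
    """Marks allotted to each content area (row totals)."""
    return {area: sum(marks) for area, marks in blueprint.items()}


def behaviour_totals(blueprint):
    """Marks allotted to each behaviour (column totals)."""
    return [sum(column) for column in zip(*blueprint.values())]


row_totals = content_totals(BLUEPRINT)
column_totals = behaviour_totals(BLUEPRINT)
grand_total = sum(column_totals)

for name, total in zip(BEHAVIOURS, column_totals):
    print(f"{name}: {total}")
print("Paper total:", grand_total)
```

A check of this kind makes it easy to see at a glance whether a draft paper gives each area and each behaviour the weight intended by the blue-print.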

In drawing up this first draft the chief examiner should be assisted by senior colleagues. This draft must then be sent to a moderator whose function is to safeguard the point of view of the candidates who are taking the papers. He must ensure that the papers are technically correct and that they are a fair and sufficient test for the candidates for whom they are intended. The moderator must submit a report on the draft question paper.

Thereafter, the report must be considered at a meeting of the chief examiners of the different papers in that subject, the moderator and experienced senior examiners. If necessary, questions may be rejected or modifications carried out in accordance with the decisions taken at the meeting.

New versions of questions must again be submitted to the moderator and, if necessary, another meeting of the chief examiners in the different papers in that subject must be convened, till agreement has been reached on the final form of the paper.

It is important that in subjects where problems are set, e.g., in the sciences and mathematics, assistant examiners who have not been responsible for drawing up the questions or reviewing them should be given the task of working the draft questions and providing solutions. The difficulty of the problems and the validity of the time allowed for the paper are thus tested.

After this and before the examination begins, the chief examiner, with the help of senior colleagues, must draw up the scheme of marking which will be used by all assistant examiners. The scheme will vary in length and in detail according to the nature of the subject and the paper. In general, the scheme should set out the principles of marking which are to be observed, the maximum marks which are to be allotted to the various questions, the steps of working and the points or versions which are to be rejected or accepted. Thereafter, the scheme must be circulated to the assistant examiners to be studied by them.

When the scripts have been received, the chief examiner must mark a certain number of scripts, selected specimens which are typical of the various standards of attainment or which illustrate points of particular interest. Photographic copies of these specimens must be supplied to assistant examiners and then a co-ordination meeting of all examiners must be summoned. The meeting will discuss the scheme of marking, which may be amended or added to in the light of the scripts which have been seen. The specimen scripts will then be marked independently by all the assistant examiners, the discrepancies discussed and investigated and rulings given by the chief examiner on doubtful points.

The assistant examiners then begin the process of marking, following the marking scheme with the aid of the specimen scripts. Where the number of candidates is large, for every four or five assistant examiners a senior examiner known as a 'team leader' should be appointed to scrutinize the marked scripts of the assistant examiners. These in turn should be submitted to the chief examiner who reviews the sample scripts of the assistant examiners and, if necessary, holds discussions with the team leader and assistant examiners. The whole purpose of the processes described above is to standardize the examiners.

Other factors which will help in the standardization of examiners are fair remuneration, a limited number of scripts (not more than 300 to 400), and a fixed number of hours of marking in properly ventilated and, if necessary, air-conditioned rooms.


4. Random selection of scripts

However, these factors will not eliminate the subjective element of individual examiners. One of the main factors which brings into play the subjective factor is the quality of scripts which an examiner is expected to mark. In spite of the detailed marking scheme, good working conditions, adequate remuneration, a lighter load of scripts, examiners are affected and influenced by the quality of scripts they are required to mark. If the average quality of scripts to be marked by examiners is good, then the poorer scripts, by comparison, will be marked strictly.

On the other hand, if the average quality of the scripts is substandard then the scripts which would otherwise be of average quality are given marks which would normally be given to good scripts.

The first need, therefore, is that the different examiners get scripts which are more or less of equal average quality. It would seem that this problem is not easily soluble because until scripts are examined it will not be possible to predict their average quality. There is, however, a basic statistical principle that can be invoked to solve this problem: the principle of "random selection". If scripts are allotted to different examiners on the basis of a process equivalent to drawing lots (and lotteries are now fashionable!) then the lots given to different examiners will be approximately of the same average quality.
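The drawing of lots can be imitated by shuffling the scripts and dealing them out in turn. The following is a minimal sketch in Python; the function name, the examiner labels, the number of scripts and the seed are all hypothetical and chosen only for illustration.

```python
import random


def allot_scripts(script_ids, examiners, seed=None):
    """Allot scripts to examiners by lot: shuffle, then deal in turn."""
    rng = random.Random(seed)  # a fixed seed makes the draw repeatable
    shuffled = list(script_ids)
    rng.shuffle(shuffled)
    lots = {examiner: [] for examiner in examiners}
    for position, script in enumerate(shuffled):
        examiner = examiners[position % len(examiners)]
        lots[examiner].append(script)
    return lots


# Hypothetical example: 1,200 scripts dealt to four examiners.
lots = allot_scripts(range(1, 1201), ["A", "B", "C", "D"], seed=1971)
for examiner, scripts in lots.items():
    print(examiner, len(scripts))
```

Each examiner receives a lot of 300 scripts drawn without regard to school or centre, which is the condition under which the lots may be expected to be of approximately the same average quality.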

There are two important statistical factors in the principle of "random selection" which will determine a common pattern of marking and will reduce the subjective element. These are:

(i) that in lots of three to four hundred scripts it will be found that the mean (average mark) or the median (middle mark) of the different lots of scripts will lie between a narrow range of two to three marks.

(ii) the range or spread of marks (the lowest mark scored and the highest mark secured) will not vary very greatly from one lot of scripts to another.

Thus, if there are great variations in the mean or median mark or in the range or spread of marks, it will mean that the subjective bias of the examiners is dominant and, therefore, it will be statistically justifiable to scale the marks given by the examiners to conform to a common pattern.
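The comparison of lots described above can be sketched as follows. This is only an illustration in Python: the marks are invented, the lots are far smaller than the three to four hundred scripts mentioned in the text, and the tolerance of three marks is an assumption suggested by the "two to three marks" quoted above.

```python
from statistics import median


def summarise(marks):
    """Median and range (lowest and highest mark) of one examiner's lot."""
    return {"median": median(marks), "lowest": min(marks), "highest": max(marks)}


def flag_examiners(marks_by_examiner, tolerance=3):
    """Examiners whose median lies more than `tolerance` marks from the
    average of all the examiners' medians (tolerance is an assumption)."""
    medians = {name: median(marks) for name, marks in marks_by_examiner.items()}
    norm = sum(medians.values()) / len(medians)
    return [name for name, m in medians.items() if abs(m - norm) > tolerance]


# Hypothetical marks awarded by five examiners to randomly allotted lots.
marks_by_examiner = {
    "A": [35, 42, 48, 50, 55, 61, 70],
    "B": [33, 40, 47, 51, 56, 60, 72],
    "C": [34, 41, 45, 49, 54, 62, 69],
    "D": [36, 43, 46, 50, 53, 59, 71],
    "E": [22, 30, 35, 40, 44, 50, 58],
}

for name, marks in marks_by_examiner.items():
    print(name, summarise(marks))
print("Out of line:", flag_examiners(marks_by_examiner))
```

In this invented example only examiner E is flagged, his median being eight marks below the average of the five medians.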

Another important statistical axiom which justifies scaling the marks of examiners to a common pattern is that where there are scripts of several thousands of candidates, taught in a large number of schools, by hundreds of teachers, it is mathematically sound to conclude that the standards of teaching, the quality and the preparation for the examination cannot show wide fluctuations from one year to another. Any variations found from year to year cannot be attributed to variations in teaching or the intelligence or attainment of candidates but to the standard of the question papers, the standard of marking and other concomitants of the examination.

5. Adjusting the marks of different examiners

An experiment was made by Gauhati University on the adjustment of the marks of different examiners.

This study helped to highlight that chance in the conventional examination is far greater than has been previously suspected on account of unpredictable variations between the standards applied by different examiners. It also showed that the scaling of marks can compensate for these variations and, in the words of the report, "that the marks thus awarded can result in greater validity than would otherwise be possible." "By scaling is meant the adjustment of marks to a common pattern. It has long been recognised as a necessary procedure, for examination results mean little or nothing when candidate A is judged on one standard and candidate B on another."

The Gauhati University investigation was based on what is termed 'median scaling'. To quote the report again, "If all the marks on the sheet are taken in order of magnitude beginning with the highest and ending with the lowest, the middle mark is the 'median.' The median divides the group of marks into upper and the lower half with the same number of entries in each. In the same way, the upper quartile is the middle mark in the top half and the lower quartile is the middle mark in the bottom half."

"The mean is a measure of the standard of marking. It is the mark which the examiner gives to a script of average merit."

In other words, the average of the median marks of the different examiners is worked out and treated as the norm. Where the median mark of an examiner differs from the norm beyond a certain range, the marks in the whole lot of scripts marked by that examiner are adjusted, that is to say, raised or lowered proportionately.
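The procedure just described can be sketched as follows. This is a minimal illustration in Python and not the computation actually used at Gauhati: the marks are invented, and the tolerance of three marks, the maximum of 100 and the rounding to whole marks are all assumptions.

```python
from statistics import median


def median_scale(marks_by_examiner, tolerance=3, max_mark=100):
    """Scale each out-of-line examiner's marks proportionately so that
    his median moves to the norm (the average of all the medians).
    Assumes no examiner has a median of zero."""
    medians = {name: median(marks) for name, marks in marks_by_examiner.items()}
    norm = sum(medians.values()) / len(medians)
    scaled = {}
    for name, marks in marks_by_examiner.items():
        if abs(medians[name] - norm) <= tolerance:
            scaled[name] = list(marks)  # within range: left untouched
        else:
            factor = norm / medians[name]  # raise or lower proportionately
            scaled[name] = [min(max_mark, round(mark * factor)) for mark in marks]
    return norm, scaled


# Hypothetical marks awarded by five examiners.
marks_by_examiner = {
    "A": [35, 42, 48, 50, 55, 61, 70],
    "B": [33, 40, 47, 51, 56, 60, 72],
    "C": [34, 41, 45, 49, 54, 62, 69],
    "D": [36, 43, 46, 50, 53, 59, 71],
    "E": [22, 30, 35, 40, 44, 50, 58],
}

norm, scaled = median_scale(marks_by_examiner)
print("Norm:", norm)
print("E before:", marks_by_examiner["E"])
print("E after: ", scaled["E"])
```

Here the norm is 48 and the strict examiner E, whose median is 40, has every mark multiplied by 48/40. His median rises to the norm, but the spread of his marks is also stretched in the same proportion, since no separate adjustment of the spread is made.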


But it should be noted that the Gauhati University investigation adjusted the marks of the different examiners with respect to the median mark only and not the spread of marks.

The report itself states:

"Mark sheets differ not only in the value of the median, but also in the spread of marks.... The spread is measured by the standard deviation which is approximately three-quarters of the interquartile range."

The report continues: "Ideally marks should be scaled so that all sets of marks have (a) the same mean or median, and (b) the same standard deviation. Of these (b) presents the more difficult problem, which needs further study...."
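One common way of meeting both conditions at once is a linear rescaling of each examiner's marks. The sketch below, in Python, is offered only as an illustration of that standard technique; it is not taken from the report, which leaves condition (b) for further study, and the marks and the target values of 50 and 10 are invented.

```python
from statistics import mean, pstdev


def scale_to_pattern(marks, target_mean, target_sd):
    """Linearly rescale marks to a given mean and standard deviation."""
    current_mean = mean(marks)
    current_sd = pstdev(marks)
    if current_sd == 0:
        raise ValueError("all marks are equal; the spread cannot be rescaled")
    return [
        target_mean + (mark - current_mean) * target_sd / current_sd
        for mark in marks
    ]


# Hypothetical lenient examiner with a narrow spread of marks.
lenient = [55, 60, 62, 65, 68, 70, 75]
scaled = scale_to_pattern(lenient, target_mean=50, target_sd=10)
print([round(mark, 1) for mark in scaled])
```

The order of the candidates is unchanged by such a rescaling; only the level and the spread of the marks are brought to the common pattern.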

In an article published in the Indian Educational Review (January 1968), Professor V.M. Dandekar, commenting on this observation of the report, says:

"However, this is not entirely true. Two sets of marks having the same mean or median and also the same standard deviation may differ in several important respects."

He goes on to illustrate this and concludes thus: "The reason why two sets of marks with the same mean and standard deviation do not agree in several important respects is simple. As pointed out above, the mean and the standard deviation are particular measures of the average level and of the spread of marks. These measures would have special significance only if the distribution of marks, as given by examiners were perfectly normal. The term 'normal' here does not mean more than a particular form of statistical distribution. If the distribution of marks were perfectly 'normal' in this sense, it could be shown that two sets of marks having the same mean and the same standard deviation would agree in all other respects."
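The point can be seen with a small invented example. The sketch below, in Python, is not Professor Dandekar's own illustration; the two sets of marks and the pass mark of 45 are hypothetical and chosen so that the arithmetic is easy to follow.

```python
from statistics import mean, pstdev

PASS_MARK = 45  # hypothetical pass mark

# Two invented mark sheets with the same mean and standard deviation.
examiner_x = [40, 40, 40, 40, 60, 60, 60, 60]
examiner_y = [30, 50, 50, 50, 50, 50, 50, 70]


def passes(marks, pass_mark=PASS_MARK):
    """Number of scripts at or above the pass mark."""
    return sum(1 for mark in marks if mark >= pass_mark)


for name, marks in (("X", examiner_x), ("Y", examiner_y)):
    print(name, "mean:", mean(marks), "sd:", pstdev(marks),
          "passes:", passes(marks), "highest:", max(marks))
```

Both sheets have a mean of 50 and a standard deviation of 10, yet examiner X passes four candidates out of eight and examiner Y passes seven, and their highest marks differ by ten. Neither distribution is normal, which is why agreement in the mean and the standard deviation does not carry over to the other respects.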

6. The J-effect

The marks of examiners, unfortunately, do not conform to a normal distribution curve. The Gauhati report draws particular attention to this important aspect of the marks of examiners: "A prominent feature of many mark sheets has been called the 'J-effect', since it often gives a J-shaped distribution. In these mark sheets a disproportionate number of scripts are placed exactly at the pass mark, and there is a corresponding gap in the marks immediately below...."
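A mark sheet can be examined for this feature by counting the scripts placed exactly at the pass mark and those placed just below and just above it. The following Python sketch is only a rough heuristic and not a test proposed in the report; the pass mark of 33, the window of three marks and both mark sheets are hypothetical.

```python
from collections import Counter


def pass_mark_profile(marks, pass_mark, window=3):
    """Count scripts exactly at the pass mark and in the few marks
    immediately below and immediately above it."""
    counts = Counter(marks)
    return {
        "at_pass": counts[pass_mark],
        "just_below": sum(counts[pass_mark - k] for k in range(1, window + 1)),
        "just_above": sum(counts[pass_mark + k] for k in range(1, window + 1)),
    }


def suggests_j_effect(marks, pass_mark, window=3):
    """Heuristic (an assumption): more scripts exactly at the pass mark
    than in the whole window below it and in the whole window above it."""
    profile = pass_mark_profile(marks, pass_mark, window)
    return (profile["at_pass"] > profile["just_above"]
            and profile["at_pass"] > profile["just_below"])


PASS_MARK = 33  # hypothetical pass mark
bunched = [25, 27, 28, 29, 33, 33, 33, 33, 33, 33, 34, 35, 36, 38, 40, 45, 52]
smooth = [28, 30, 31, 32, 33, 34, 35, 36, 38, 40]

print(pass_mark_profile(bunched, PASS_MARK), suggests_j_effect(bunched, PASS_MARK))
print(pass_mark_profile(smooth, PASS_MARK), suggests_j_effect(smooth, PASS_MARK))
```

In the first invented sheet six scripts sit exactly at the pass mark with none in the three marks below it, which is the pattern the report describes; the second sheet shows no such bunching.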