INTERPRETATION GUIDE FOR STUDENT OPINION OF
TEACHING EFFECTIVENESS (SOTE) RESULTS
Prepared by
October 2004
NOTE: This Guide
applies to the newly revised SOTE rating form adopted in Fall 2003 (Appendix
A). The SOLATE rating form (Appendix B)
is currently undergoing modifications.
Continue to use the previous edition of the Interpretation Guide
(attached) when reviewing SOLATE reports.
Following several
years of development by the Student Evaluation Review Board (SERB), a new SOTE
rating form was adopted for implementation beginning in the Fall 2003 semester
(see F02-2). In addition to developing
tools for rating teaching effectiveness, SERB is also charged with “developing and making available to the University community,
information and guidelines for the effective interpretation of the rating
instruments.” In keeping with this charge, the interpretation guide
presented here provides a description of the new form, explanations for the
statistics included in the SOTE report, and factors that influence SOTE
ratings. The interpretation guide refers to and explains analyses of SOTE
results generated during the Fall 2003 semester when the new form was
administered. Based on administration to 2827 classes and returns of 66443 SOTE
forms during the Fall 2003 semester, a new set of norms was generated for use
in evaluating teaching effectiveness. Additionally, data on a variety of
demographic and other variables were collected for use in helping to identify
meaningful patterns in SOTE responses. Drawing upon previous guidelines for the
interpretation of SOTEs, and incorporating the changes dictated by the current
SOTE, this interpretation guide should be used to evaluate both the
“statistical” opinion of teaching effectiveness data provided by students and
the written subjective comments of students in order to reach a qualitative
judgment about effectiveness in the teaching assignment.
The following are
the most important differences between the previous version of the SOTE and the
current version. Each of these has implications for interpreting the SOTE, and
these implications are noted.
Format
Unlike the
previous version of the SOTE, the current version presents each item in a
separate box of its own. The form was designed in this way to maximize the
likelihood that each item would be read and considered on its own, and to
reduce the likelihood that students would simply endorse the same rating for
each item by marking the same number in a straight line.
Scale
The rating scale
for the current SOTE consists of five points on a Likert-type scale with
ratings of (5) Very Strongly Agree; (4) Strongly Agree; (3) Agree; (2)
Disagree; and (1) Strongly Disagree. There is also a sixth option, (NA) Not
Applicable/No Opportunity to Observe. Note that in the previous version of the
SOTE, the ratings ranged from (5) Excellent to (1) Far Below Average, with (3)
rated as Average. In interpreting the previous version of the SOTE, there were,
in essence, only two points (ratings of 4 and 5) that signified teaching
excellence. In the current version there are three points (agree, strongly
agree, very strongly agree) that signify a positive evaluation of teaching
effectiveness. Students now have the option of choosing among a greater range
of “good” evaluations. When looking at dossiers that contain both the previous
and the current version of the SOTE, RTP committee members should consider that
the two sets of ratings are not directly comparable. In
interpreting SOTEs collected using both the old and the current forms,
instructors’ scores should be evaluated in comparison to the corresponding
Department, College, and University norms for each item (see below for an
explanation of new norms).
Number of items
Unlike the
previous version, the current SOTE contains 13 items, the last of which is
comparable to the “old” item 14, and refers to overall teaching effectiveness.
This item shows a very high correlation with most of the other items and
therefore is a good index of overall effectiveness. Nonetheless, RTP committees should carefully examine
ratings for all 13 items and not rely solely on ratings for item 13.
Subjective Evaluations
The new rating
form is formatted as a 2-page booklet.
The first page contains standardized rating items; the second page
contains questions that ask students to provide
subjective written comments regarding the teaching effectiveness of their
instructor. Subjective ratings of “officially” rated classes must be included
in the dossier. In
interpreting these responses, members of RTP committees should take into
account the majority of comments, rather than focusing on atypical responses.
However, if the same comments recur for the same instructor across
sections and over time, then the RTP committees should consider further evaluations
for that instructor.
SOTE Report Display
The SOTE Report
remains basically unchanged from the previous version. However, item medians now appear alongside the item
means and standard deviations. Each
report is also identified as “OFFICIAL” or “PERSONAL” along the top and bottom
margins. Also appearing on the bottom of Page 2 of “OFFICIAL” reports is the number of written
comment pages that are on file in the PAF for the class.
Explanations for the various statistics
used on the SOTE report, how to interpret them, and potential caveats are
described herein.
The mean is the arithmetic
average of student responses. Means are reported to the first decimal place. As
noted below, caution should be used in interpreting means based on fewer than
10 students’ responses.
The standard deviation is a measure of
agreement among respondents. It indicates the variability among the responses;
that is, how much, on average, student responses vary from the mean.
Standard deviations for most items are very close to 1.0. A large standard
deviation (greater than approximately 1.3) indicates that students frequently
do not agree about what rating should be assigned (i.e., students use three or
more descriptors for a single item). A small standard deviation (less than
approximately 0.7) indicates that students generally agree about what rating
should be assigned (i.e., students usually use only two adjacent descriptors for
a given item). We do not expect to see 100% agreement among students often – an
excellent teacher for one student may be only average for another student given
differential preparation or experiences of the two students.
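To make the two cases described above concrete, the following is a minimal Python sketch that computes the mean and standard deviation from a frequency distribution of ratings. The function name `summarize` and both sets of counts are hypothetical illustrations, not SOTE data, and the population form of the standard deviation is an assumption, since this guide does not state which form the SOTE report uses.

```python
from math import sqrt

def summarize(counts):
    """Return (mean, standard deviation) for a dict mapping rating -> number of students."""
    n = sum(counts.values())
    mean = sum(rating * c for rating, c in counts.items()) / n
    # Population form of the variance (an assumption; the report's formula is not documented here).
    variance = sum(c * (rating - mean) ** 2 for rating, c in counts.items()) / n
    return mean, sqrt(variance)

# Hypothetical class in which students use only two adjacent descriptors.
agree = {5: 0, 4: 12, 3: 8, 2: 0, 1: 0}
# Hypothetical class in which students are split across four descriptors.
split = {5: 8, 4: 2, 3: 2, 2: 8, 1: 0}

for label, counts in (("agree", agree), ("split", split)):
    mean, sd = summarize(counts)
    print(f"{label}: mean={mean:.1f}, sd={sd:.2f}")
```

The two hypothetical classes have nearly the same mean (3.6 and 3.5), but the first has a standard deviation of about 0.49 and the second about 1.36, which is why the mean should be read together with the standard deviation.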
A caveat in
interpreting means and standard deviations is that both statistics are highly
influenced by even one or two aberrant scores if the number of ratings is fewer
than about 10. Thus classes and/or items where fewer than 10 students have
responded have been flagged with an asterisk and the following sentences will
be printed directly below the rating items - *ITEM STATISTICS ARE BASED ON 10
OR FEWER STUDENTS. RESULTS SHOULD BE INTERPRETED WITH CAUTION*. Great caution
should be used when interpreting means and standard deviations of such classes
and/or items because the statistics may be unstable – check for consistency
across classes and across rating occasions. In addition, when more than 30% of
the students in a class leave an item blank or mark it “not applicable,” that
rating probably should not be interpreted.
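The two cautions above can be sketched as a simple check. This is an illustration only: the function name `item_cautions` is hypothetical, the "fewer than 10" cutoff follows the wording of the paragraph above, and using the number of forms returned as the denominator for the 30% rule is an assumption.

```python
MIN_RESPONSES = 10        # "fewer than 10 students" per the paragraph above
MAX_MISSING_SHARE = 0.30  # "more than 30%" blank or not applicable

def item_cautions(forms_returned, rated):
    """Return a list of cautions for one item.

    forms_returned: number of forms returned for the class (assumed denominator)
    rated: number of forms with a 1-5 rating on this item (not blank, not NA)
    """
    if forms_returned <= 0 or not 0 <= rated <= forms_returned:
        raise ValueError("counts must satisfy 0 <= rated <= forms_returned and forms_returned > 0")
    cautions = []
    if rated < MIN_RESPONSES:
        cautions.append("small sample: mean and standard deviation may be unstable")
    missing_share = (forms_returned - rated) / forms_returned
    if missing_share > MAX_MISSING_SHARE:
        cautions.append("over 30% blank or NA: rating probably should not be interpreted")
    return cautions

print(item_cautions(forms_returned=25, rated=24))  # no cautions apply
print(item_cautions(forms_returned=25, rated=15))  # 40% blank or NA
print(item_cautions(forms_returned=8, rated=8))    # small class
```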
The median is the middle rating when responses are ordered from lowest to highest. A median
of 3.5 indicates that half the students gave ratings higher and half lower than
3.5. The median is helpful in cases
where outliers might influence the mean and standard deviation; e.g., cases in
which a few extremely high or extremely low ratings might push the mean score
in a direction that is not representative of the class as a whole. This is
particularly likely in smaller classes or classes with large numbers of blanks
or “not applicable” ratings.
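A short Python sketch, using hypothetical ratings for a small class, shows how a few very low ratings move the mean while leaving the median unchanged. The data are invented for illustration only.

```python
from statistics import mean, median

# Hypothetical small class: six students rate 4 or 5.
typical = [4, 4, 4, 4, 5, 5]
# The same class with two additional, very low ratings.
with_outliers = typical + [1, 1]

for label, ratings in (("typical", typical), ("with outliers", with_outliers)):
    print(f"{label}: mean={mean(ratings):.1f}, median={median(ratings)}")
```

In this example the mean drops from about 4.3 to 3.5 when the two low ratings are added, while the median remains 4.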
The Norm Data
Following the
introduction of the current SOTE form, new norms were computed based on the
administration of the SOTE to all classes during the Fall 2003 semester (SOTE
forms were returned for 93% of classes that were subject to evaluation). Norms
for each item are provided at the Department/School, College, and University
level (except in cases where there were 12 or fewer classes
evaluated in the Department/School). At each level, responses were aggregated
to compute the means, medians, and standard deviations that serve as norms or
reference points for making comparisons. Comparisons between the class data and
norm data are best made using the graphic display on the second page of the
report.
Norm data for the
College and University levels only are graphically displayed on page 2 of the
printout. For each item the middle 60% of ratings received by instructors was
determined. This range is displayed as a line of dashes. This line represents the usual range of
ratings received by instructors for that item. The class mean is printed as an
asterisk on the same line. Only if the class mean falls below the norm range
(represented by an asterisk to the left of the dashes) or above the norm range
(represented by an asterisk to the right of the dashes) can SOTE data be
used to identify exceptional teachers (those with rating means outside the norm
range). The usefulness and validity
of the ratings will be degraded if ratings within the norm area are interpreted
as anything other than typical.
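The comparison against the line of dashes can be sketched as follows. This is a minimal illustration under stated assumptions: the middle 60% is taken to run from the 20th to the 80th percentile (as the previous edition of this guide describes it), the percentile method is an arbitrary choice, the function names `norm_range` and `classify` are hypothetical, and the list of class means is invented.

```python
from statistics import quantiles

def norm_range(class_means):
    """Return (low, high) bounds of the middle 60% of class means for one item."""
    # n=5 yields cut points at the 20th, 40th, 60th, and 80th percentiles.
    cuts = quantiles(class_means, n=5, method="inclusive")
    return cuts[0], cuts[-1]

def classify(class_mean, low, high):
    """Place a class mean relative to the norm range (the line of dashes)."""
    if class_mean < low:
        return "below the norm range"
    if class_mean > high:
        return "above the norm range"
    return "typical"

# Hypothetical item means for eleven classes in a norm group.
means = [3.4, 3.6, 3.7, 3.8, 3.9, 4.0, 4.0, 4.1, 4.2, 4.3, 4.5]
low, high = norm_range(means)

for class_mean in (3.5, 4.0, 4.4):
    print(class_mean, classify(class_mean, low, high))
```

With these hypothetical data the norm range runs from about 3.7 to 4.2, so a class mean of 4.0 is read as typical, and only 3.5 and 4.4 fall outside the range.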
It should be noted that students tend to “agree” with the statements on
the SOTE, indicating a highly favorable evaluation of the typical SJSU
instructor.
SOTE
interpretation should be done using trends across classes and semesters. If one item mean is consistently below
(or above) the norm then the item should be noted as important. If an item mean
is inconsistently above or below the norm, RTP committee members should
request further information from the faculty member about the classes. It is especially important to note
consistencies or inconsistencies in the same course preparation on different
occasions. Thus it is possible to note steady improvement or decline.
Factors Affecting the Ratings
Several factors were found to systematically
influence SOTE ratings in the Fall 2003 pilot.
Each is described below and references to similar findings from research
on faculty evaluation conducted elsewhere are provided. These factors should be
considered in any RTP evaluation of SOTE data. It is the responsibility of the
faculty member to ensure that information about any of these factors is
included in the dossier along with the ratings.
Ratings are slightly but positively related
to both expected and received grades (Theall, 2002). Students are asked to report their
expected grade at the time of the SOTE administration. Frequencies for each
possible grade are noted on the SOTE report, as is the actual average final
grade (GPA) for the class. In general, one would expect to see expected grades
distributed across the range of possible grades.
Data from the Fall 2003 norming sample indicate that students expecting higher
grades tend to rate their instructors more highly than students expecting lower
grades. When interpreting SOTE ratings, RTP committees should note the
distribution of expected grades. Classes in which the majority of students
expect either low or high grades should be fairly rare (exceptions to this
would be graduate and credential classes in which a grade lower than a “B” is
often considered equivalent to a failing grade, and some classes in the
Colleges of Science and Engineering in which grades are often lower than in
other subjects).
In addition to
reporting students’ expected grades, the average grade for each class is also
reported. In general, ratings tend to be slightly but positively related to
grades (in the Fall 2003 sample, the correlation between expected grade and the
SOTE score given by the student is approximately .24). It should also be
expected that average grades for a class show some relationship to expected
grades. In cases where there is a wide discrepancy (e.g., 80% of the class
expects a grade of “A” while the actual average grade for the class is a 2.3),
RTP committees should request further information from the instructor.
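The following Python sketch works through the two numbers mentioned above. It is illustrative only: the 4-point grade scale without plus/minus grades, the function name `expected_gpa`, and the class size of 25 are assumptions, and this guide does not define a numerical cutoff for a "wide discrepancy."

```python
# Assumed 4-point scale; plus/minus grades are ignored for simplicity.
GRADE_POINTS = {"A": 4.0, "B": 3.0, "C": 2.0, "D": 1.0, "F": 0.0}

def expected_gpa(expected_counts):
    """Average grade points implied by students' self-reported expected grades."""
    n = sum(expected_counts.values())
    return sum(GRADE_POINTS[g] * c for g, c in expected_counts.items()) / n

# A correlation of .24 corresponds to a small share of common variance (r squared).
r = 0.24
shared_variance = r ** 2

# Hypothetical class of 25 in which 80% of students expect an "A".
expected = {"A": 20, "B": 5, "C": 0, "D": 0, "F": 0}
actual_class_gpa = 2.3
gap = expected_gpa(expected) - actual_class_gpa

print(f"shared variance: {shared_variance:.1%}")
print(f"expected GPA {expected_gpa(expected):.1f} vs. actual {actual_class_gpa:.1f} (gap {gap:.1f})")
```

A correlation of .24 means that expected grade accounts for roughly 6% of the variance in ratings, which is consistent with describing the relationship as slight. In the hypothetical class, the expected GPA of 3.8 exceeds the actual average of 2.3 by 1.5 grade points, the kind of gap for which further information would be requested.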
Ratings in small
or moderate-sized classes (<20) are higher than in large classes (>20)
(Mateo and Fernandez, 1996).
Those interpreting SOTEs should consider average class sizes at the department,
college and university levels when comparing a candidate’s scores to the norms,
as class size may influence SOTE scores.
Faculty evaluation ratings can be
influenced by student level. Ratings in graduate and credential classes tend to
be higher than in undergraduate classes (Arreola, 2000). Freshmen
in the current norming sample tended to give slightly higher ratings, while
seniors in the current sample gave lower ratings. The literature in this area
is mixed in its findings (Arreola, 2000; Aleamoni and Thomas, 1980; Stewart
and Malpass, 1966).
Students who have taken a class either
because of an interest in the class or because of the instructor’s reputation tend
to rate their instructors more favorably than students who take a course
because it is required. Ratings given by students who are required to take a
class are often lower than ratings by students for whom the class is an
elective (Arreola, 2000).
Instructors in
the Colleges of Science and Engineering tend both to give lower grades and to
be rated lower than instructors in the other colleges. There were also
significant differences in ratings between departments within colleges.
In light of this, it is important that RTP committees evaluating candidates
from different departments and colleges (University-level RTP) compare
instructors to colleagues within their own departments and colleges (Arreola,
2000).
The current SOTE includes a question about
instructor responsiveness to diversity in the class (item #7). As
indicated in the results of the Fall 2003 pilot, ratings
for this item tend to have somewhat higher correlations with items 4, 5, and 6
(responsive to questions, established an atmosphere that facilitated learning,
and approachability of instructor) and lower correlations with the other items.
These correlations suggest that as a group, these items may measure students’
perceptions of the instructor as approachable and responsive and that
instructor responsiveness to diversity is equated with the instructor’s general
responsiveness.
APPENDIX A
Student Opinion of Teaching Effectiveness Rating Form (Adopted Fall
2003)




APPENDIX B
Student Opinion of Laboratory and Activity Teaching Effectiveness Rating
Form (SOLATE)

References
Aleamoni, L. M., & Thomas, G. S. (1980). A review of the research on student evaluation and a report on the effect of different sets of instructions on student course and instructor evaluation. Instructional Science, 9, 67-84.
Arreola, R. A. (2000). Developing a comprehensive faculty evaluation system.
Mateo, M. A., & Fernandez, J. (1996). Incidence of class size on the evaluation of university teaching quality. Educational and Psychological Measurement, 56(5), 771-778.
Stewart, C. T., & Malpass, L. F. (1966). Estimates of achievement and ratings of instructors. Journal of Educational Research, 59, 347-350.
Theall, M. (2002). Student ratings: Myths vs. research evidence: Focus on Faculty,
Prepared by The SJSU Student Evaluation Review Board
(September 1997)
Revised by the Instruction and Student Affairs Committee
(January 1998)
This guide should be used by all SJSU faculty, all
Department/School, College, and University Retention, Tenure, and Promotion
(RTP) committee members, and all others who make judgments based on data from Student
Opinion of Teaching Effectiveness (SOTE) and Student Opinion of Laboratory and
Activity Teaching Effectiveness (SOLATE) surveys. Information from
SOTE and SOLATE surveys is but one source of information for assessing
teaching effectiveness (see Senate Policies S91-9 and S94-6). Other sources of
information about teaching effectiveness should be employed before reaching an
RTP decision.
It is the responsibility of individual faculty members and
their colleagues to ensure that other sources of information about teaching effectiveness, e.g., peer evaluations, departmental or individual
instructor's course evaluations, and student letters, are collected and
included in personnel dossiers (in accordance with University policy). If an item mean is consistently above
or below the norm range, the faculty member should provide further information
about that rating. It is the responsibility of the faculty
member to ensure that information about factors that influence student
opinion ratings be included in the dossier, along with the ratings. For
example, the faculty member should note whether the class was composed
primarily of people required to take the course. (See discussion below: Factors
to be Considered When Interpreting the Ratings.)
The upper half
of the first page of SOTE and SOLATE reports provides shortened versions of each
of the 14 questions. To the right are four double columns of numerical data.
The first double column lists means and standard deviations for the class in
which opinions were gathered. The second, third, and fourth double columns list
the norming data collected in F89/S90 aggregated across the Department/School,
the College, and the University. The lower half of page 1 displays frequencies
of responses for the individual items from which the above means and standard
deviations are calculated. Also included are self-reported students' expected
grades, self-reported class level, and actual average class GPA for the class
at the time the SOTE/SOLATE report was generated. Page 2 is a graphical display
of class means superimposed upon
College and University rating "norms."
The mean is the arithmetic average of student responses in
which a score of 5 is assigned to the rating of "excellent," 4 to
"above average," 3 to "average," 2 to "below
average," and 1 to the rating of "far below average." It is
important to remember these descriptors when interpreting ratings. Means are
reported to the first decimal place. Interpretation. The extent of agreement or disagreement on an
item can be seen
directly from the frequency distribution for that item
displayed at the bottom of the page. A less sensitive gauge of agreement is
provided by the standard deviation. Most standard deviations are very close to
1.0. A large standard deviation (e.g., 1.3) indicates that students often do
not agree about what rating should be assigned. A small standard deviation
(e.g., 0.7) indicates that students generally agree about what rating should be
assigned.
Ranges of Typical Values ("Norm Data")
"Norms" for
each item are provided at the Department/School, College, and University
levels. At each level, responses are aggregated over a specified norming period
(most recently, F89/S90 for SOTE) to compute means and standard deviations
which serve as reference points for making comparisons. Comparisons between the
class data and norm data are best made using the graphic display shown on page
2 of the report.
Ranges of typical values ("norm ranges") for the
College and University levels are graphically displayed on page 2. For each
item, the middle 60% of ratings, from the 20th to the 80th percentiles, was
determined for all classes surveyed during the norming period. This range is
displayed as a line of dashes. The class mean is printed as an asterisk on the
same line. Interpretation. If the asterisk is printed within the line of
dashes, the class mean should be interpreted as no different from the norm
group. If the class mean clearly falls outside the line of dashes, it can be
concluded that the rating was below (to the left of the dashes) or above (to
the right of dashes) that of typical scores. The usefulness and validity of a
rating will be degraded
if ratings within the norm area are interpreted as anything
other than typical. It is also important to remember the initial response
descriptors when interpreting ratings (e.g., a score of "4" indicates
"above average"). The mean score of most items is approximately
"4." SOTE and SOLATE
interpretation should use data across classes and semesters.
Many factors are known, through statistical research, to
influence student opinion ratings. Therefore, ratings should always be
interpreted with caution. Several, but by no means all, of the factors which
have been shown to be consistently related to ratings are listed below.
1. On the whole,
research suggests that ratings are highly correlated with expected grades.
Therefore, two items are provided: a distribution of the students'
self-reported expected grades and the actual class GPA given by the instructor.
Both are found in SOTE and SOLATE report printouts on the bottom of
page 1. A distribution of the actual class grades given can
also be routinely added to the printout by candidate faculty.
2. Ratings in
small classes tend to be higher than in large classes.
3. Ratings in
graduate classes tend to be higher than in undergraduate classes, and ratings
in upper division classes tend to be higher than in lower-division classes.
Self-reported class level is reported on the bottom of page 1.
4. Ratings given
by students who are required to take a class are often lower than ratings by
students for whom the class is an elective.
5. When a
significant number of students in a class leave an item blank or mark it
"not applicable," that rating should be interpreted with caution. The
number of students indicating these responses is reported in the frequency
distribution on the bottom of page 1.
6. Ratings from
team-taught courses should be interpreted cautiously, as students may be unable
to separate their experiences with one instructor from those with another.