American Sociological Association




Task Force on the Implications of the Evaluation of Faculty Productivity and Teaching Effectiveness

Productivity

Definitions of Productivity

Discussions of the issues that prompted the formation of the Task Force are complicated by the many, often conflicting, definitions of key terms, and productivity in particular is defined in widely varying ways. Most importantly, productivity can mean the productivity of an individual faculty member or the collective productivity of a department or larger academic unit.

When the focus is on the individual faculty member (as in a tenure and promotion review or a review for merit), productivity is defined to include performance in the areas of teaching, scholarship, and service. Traditionally, productivity in the area of scholarship has been given the most weight. In recent years, under the influence of the writings of Ernest Boyer (1990), the definition of scholarship itself has been broadened to include the scholarship of integration, application, and teaching as well as the traditional scholarship of discovery. Certain tensions exist among these various forms of scholarship and between faculty commitments to teaching, service, and scholarship. Faculty members have limited time. Further, teaching and service tend to be directed inward, toward the institution that employs the faculty member, while scholarship tends to be directed to an external audience of readers, practitioners, and publishers.

There is increasing evidence that productivity in the university has also come to mean the productivity of larger units: departments, divisions, and colleges. Most commonly, productivity in this sense refers to the “production of students” (i.e., student/teacher ratios, numbers of student credit hours produced, numbers of graduates, etc.), but, particularly in research-oriented universities, it can also refer to “scholarly” productivity, as when overall departmental publication records are compiled to compare departmental “quality” or when departments are asked to demonstrate success in attracting external funding. However defined, these supra-individual discussions of productivity are framed in terms of accountability, and the rhetoric of efficiency and the marketplace is prominent.

Measuring Scholarly Productivity

Productivity has traditionally been defined in terms of individual performance judged through the tenure and promotion process, with the focus on measuring scholarly productivity. This remains a widespread understanding of the term. Institutions and departments have long collected data on faculty publications, grants, citations, and the like, and have used these data for various purposes, including decisions about promotion and tenure, discretionary or merit increases, and even workload. (At some institutions, “unproductive” faculty who do not meet institutional standards for research productivity are assigned additional teaching or service responsibilities; even unionized campuses may have contracts that allow this practice.)

There have always been disputes about this type of measurement of faculty productivity. Generally, these focus on questions of measurement:

  • Are citation rates meaningful?
  • Should books count more or less than articles?
  • How should journal prestige be measured, and what weight should it be given?
  • How should the relative significance of grants (i.e., external funding) and publications be weighed?
  • How should submissions with multiple authors be judged?
  • How should articles in specialty journals be judged in comparison to articles in “mainline” journals?
  • How should articles outside the field of sociology be judged in comparison to articles in sociology journals?
  • How should publications in international journals be judged?
  • What weight should be given to external reviewers?
  • What weight should be given to textbooks?
  • What weight should be given to trade texts as opposed to those published through academic or university presses?

Questions are also raised about the appropriate period over which to measure productivity (one year? three years? five years?), and there are always disputes about how the data should be used. But most faculty members appear to accept the idea that they should document their research productivity and that these data will be used, in some way, in making personnel decisions. Chairs surveyed by the Task Force mentioned many more positive than negative consequences of these productivity reports, pointing to their potential value as a developmental tool, their use in evaluating faculty for promotion, and their value as ammunition in negotiations for additional resources. In part, faculty support for this type of productivity assessment stems from its being embedded in a collegial system of review by peers.

The tenure and promotion process also routinely includes an evaluation of teaching effectiveness and service, although the term “productivity” is not commonly associated with teaching success or service involvement. Institutions with a strong teaching focus have traditionally given teaching effectiveness considerable weight in personnel evaluation systems, but recent efforts to broaden the definition of scholarship have led many institutions, including those with a research emphasis, to give more attention to the evaluation of teaching (Donald and Denison 1996).

Measuring Teaching Effectiveness

The measurement of teaching effectiveness is complex and problematic. The most commonly used technique for assessing teaching quality is a quantitative instrument completed by students near the end of a course (Seldin 1998). These instruments are used on most university campuses, but have been the subject of much criticism. Some see such instruments as measuring little more than faculty popularity with students. Others contend that a variety of factors (class size, gender of instructor, rank of instructor, expected grade, difficulty of course, etc.) affect student responses, making the results of these evaluations difficult to interpret and/or misleading. Not surprisingly, given the controversy, considerable effort has been devoted to trying to identify what determines student responses on these questionnaires. Members of the Task Force reviewed this literature and found that it both calls into question some of the assumptions of the skeptics and leaves many questions unanswered. For example, the widespread belief that grades predict positive evaluations has not been supported (in fact, some studies find positive correlations between students' perception that a course is “difficult” and their evaluation of the instructor). Other beliefs, however, have not been dismissed. A few have been supported (for example, it appears that, within a major, required courses are less favorably evaluated than electives); others remain in dispute (for example, there are contradictory findings on the effect of instructor gender on student evaluations).

There has also been extensive research on the reliability and validity of the many existing quantitative measures of teaching effectiveness (Cashin 1995). It appears that reliable, valid instruments for measuring teaching effectiveness have been developed. However, many institutions use homegrown instruments that have been inadequately tested, so questions remain regarding the reliability and validity of the evaluation of teaching.

Qualitative assessments of teaching effectiveness are much less common. Peer reviews of teaching are time-consuming, and some have questioned their reliability (Morehead and Shedd 1997). Further, reviewers may tend to focus on an occasional negative substantive comment while overlooking the bulk of the qualitative evidence, which is, in fact, positive. As with many efforts to evaluate faculty productivity, junior faculty may be disproportionately affected by the introduction of peer review, since it is they who must go through the tenure and promotion process (which, at some institutions, requires presentation of multiple measures of teaching effectiveness). They are also more likely to be asked to produce teaching portfolios, of which peer reviews form an integral part.

Measuring Service

The measurement of service is best described as crude and commonly consists of little more than a listing of committee assignments in the department, college, or university and/or of activity in professional associations at the national or regional level. Community service—a strong point for many sociologists—is not commonly given full credit in productivity studies (Glassick, Huber and Maeroff 1997). Some data suggest that women (and junior faculty) do disproportionate amounts of departmental and university service work (mentoring students, serving on committees to ensure “diversity,” etc.). Consequently, the vagueness of existing ways of assessing service disproportionately affects women and minorities in the discipline (Park 1996).

In an effort to measure faculty productivity more holistically, some departments have adopted the portfolio method (Cerbin 1994). However, portfolios are themselves time-consuming to prepare, and readers up the chain of command sometimes complain when narratives are not condensed into “sound bite” packages, a demand that defeats the purpose of the portfolio.

Measuring Faculty Activities

In many universities, another measure of faculty productivity has come into widespread use: the routine collection of faculty activity reports. More than 90 percent of the chairs surveyed by the Task Force reported that faculty were required to report on their productivity or workload, and 71 percent indicated that these data were reported to the dean or a college committee. In these reports, each faculty member is asked to indicate how they spent their work time over a given period (usually an academic year), documenting publications, presentations, and grants, but also listing service activities and describing teaching and advising. Although this kind of reporting is widespread enough to be largely uncontroversial, it raises another set of issues for faculty.

First, what counts as “productivity” in each of the areas being measured? For research productivity, the issues already reviewed arise again. Teaching and service are even more complex. Faculty activity reports give considerable attention to teaching activities, but how should teaching productivity be measured: by teaching evaluations? by the number of students taught? by student-to-teacher ratios? by the development of new courses or new teaching techniques? by teaching effectiveness (and how is that to be measured)? by cost indices? by student completion rates? by job placement rates? These questions have become more pressing as state legislatures have recently expressed concern that faculty spend too little time teaching and too much time on research and other activities. Also, what should count as service? And how much weight should each of the various faculty activities receive in composing an overall picture of a faculty member's productivity?

Departmental Productivity and Cross-Disciplinary Disparities

Routine collection of data on faculty activities also raises the question of whether to measure departmental productivity, not just individual productivity. It is obviously possible to aggregate the data on individual faculty members to create a picture of a department's overall productivity, and these aggregates can become the basis for comparisons within or across institutions. Chairs know, for example, that deans and other administrators sometimes gather data on departmental “productivity” and use them to help guide resource allocation decisions. Comparisons across institutions are also possible, as when a department's research productivity is compared to that of competing departments at other institutions.

Responding to concerns that cross-disciplinary comparisons are inappropriate, and to more general concerns about the inaccuracy of standard measures of departmental productivity, some academic administrators have begun exploring better ways of measuring productivity. Probably the most notable of these attempts is the so-called “Delaware Study,” a project funded in part by the Fund for the Improvement of Postsecondary Education (FIPSE) and by TIAA-CREF, which has recently been summarized by one of its principal investigators, Michael Middaugh (2001).

The study is a sophisticated attempt to develop a methodology for measuring productivity. Its authors try to sort through the many difficulties involved in calculating how much faculty actually teach and whether students are being taught by tenure-track faculty, and in factoring in complications such as released time, administrative duties, and grant buy-outs. They have collected data on faculty activity from a range of (mostly public) institutions nationwide and, using the methodology and concepts they have developed, attempt to compute “benchmark” productivity rates by discipline. The idea is that these discipline-specific benchmarks could be used in examining productivity in specific departments and would help to discourage inappropriate cross-disciplinary comparisons.

It is unclear whether the study has had much effect yet. Although a large number of institutions have provided data for the Delaware Study, the Task Force encountered few faculty who were even aware of it or who knew whether their institution participates. Further, the researchers involved indicate that the study is not yet widely used, although they have discussed how it might be used.

Middaugh advocates using the benchmarks for diagnostic purposes, not to reward or punish departments in a simple-minded way. He opposes a process in which a department is simply rated above or below average and then rewarded or penalized accordingly. Instead, he encourages administrators to combine quantitative data on productivity with other data, including qualitative data, to explain unusual patterns. For example, if a sociology department is producing below the national benchmark, this should not be an occasion to punish it; instead, it should prompt questions. Are there specific reasons why the department is “less productive”? Is the department doing other things that the institution values which compensate for lower productivity in this area? Middaugh points out that a program that focuses on graduate education will generally appear less productive than one that teaches many undergraduates. But this may be consistent with the institution's mission or departmental priorities, and it may also lead to other kinds of productivity (a high national research ranking; grant income). Before concluding that a department is performing poorly, questions such as these need to be answered.

The suggestion, therefore, is that the Delaware benchmarks be applied carefully and for diagnostic purposes. Because few states appear to be using the Delaware data to evaluate productivity or guide resource allocation decisions (Middaugh mentions Utah as a possible exception), it is not yet possible to determine whether this suggestion is being followed.

Implications of Measurement for Faculty and Chairs

For the most part, activity reports seem to raise few red flags for faculty, who experience them as a bureaucratic annoyance. Chairs responding to the Task Force survey reported that their principal complaints about such reports were that they were “too time-consuming” and “did not measure quality well.” Indeed, it is often the case (or at least faculty believe as much) that the reports are simply collected and filed, and that little is actually done with them. Department chairs, however, are aware that deans or provosts fairly commonly calculate departmental costs per student credit hour, use data on enrollments and sections taught to compute measures of departmental productivity (e.g., student credit hours per full-time-equivalent faculty member, a rough measure of the number of students taught per faculty member), or measure departmental productivity in terms of external funding generated. Once these data are presented for all departments in a college or university, it is easy for a discussion to develop in which departments are labeled “efficient” or “expensive.” In a few universities, of which Ohio State is a clear example, data on departmental productivity have been used to identify “excellent” or potentially “excellent” programs and to justify steering additional resources to those departments.

Relation Between Productivity and Resources

Most of the time, however, departmental productivity data appear to have few practical consequences. In fact, some chairs complain that decisions about resource allocation are NOT based on these data. At some institutions, the complaint is that decisions about new hires are driven by accreditation concerns, not by whether departments are teaching large numbers of students with small numbers of full-time faculty. More broadly, some faculty and chairs complain that administrators use productivity data only to support pre-existing priorities: a “productive” department may receive no additional resources (instead, it is praised for being “efficient”), while a department whose accreditation is in jeopardy, or that has been identified as an institutional priority, may receive additional resources despite being less “productive” than others. This apparent disconnect between productivity data and resource allocation breeds widespread cynicism among faculty and chairs about the data themselves. It is tempting to conclude that the measurement of collective faculty productivity is just the latest in a series of “management fads in higher education” recently described by Robert Birnbaum (2000).

Still, as Birnbaum and others note, even momentary fads can alter people's way of thinking and create institutional structures and categories of thought that eventually become the basis for change. Moreover, there are significant political pressures on public universities to “get serious” about measuring productivity. In state systems, efforts abound to tie resource allocation decisions to these kinds of data, to reward those programs or units that are “productive” and to raise concerns about or even punish those that have “excessive” costs or have been “underperforming.” Some form of performance-based budgeting has been implemented in at least 36 states, according to SUNY's Rockefeller Institute of Government (Schmidt 2002). Perhaps the most notable example is South Carolina, where a panel of non-academics appointed by the state legislature developed a plan to use a complex set of performance indicators in determining how resources would be allocated within the state university system. In theory, 100 percent of university budgets were to be allocated in this way (Trombley 1998).

The reality in South Carolina, and in virtually all other states, has been somewhat different. Three percent or less of states' education budgets is tied to performance indicators. And, there is little evidence that performance pressures at the institutional level have translated into pressures on individual departments (Allen 1999; Schmidt 2002). Perhaps reflecting this, when the Task Force surveyed department chairs on the advantages and disadvantages of productivity measurement, there was no mention of concern about budgetary consequences.

It would be naïve, however, to ignore the potential consequences of productivity measurement. In an era of tight resources, university administrators are likely to experience powerful pressures to improve efficiency and to attract external resources; this creates an incentive to make use of the productivity data at their disposal in making strategic decisions. Slaughter and Leslie (1997) have described how these pressures also encourage faculty to engage in a kind of “academic capitalism,” involving various kinds of entrepreneurial activity tied to resource availability rather than intellectual criteria. Finally, pessimists point to the British case, where the collection of data on departmental productivity has become routine. British faculty feel genuine external pressure to maintain high rates of publication and external funding to prevent their departments from being downgraded or even closed (Chalkley, Fournier, and Hill 2000), as happened recently to the University of Birmingham's esteemed department of Cultural Studies.

Relation Between Productivity and Disciplinary Content

To the extent that these measurements of faculty productivity become part of the decision-making process in universities, they can also have implications for the substantive nature of the discipline, for example by:

  • Creating pressures to focus on training students in areas where students get jobs (e.g., criminal justice)
  • Encouraging large sections of service courses (e.g., SOC 100) and giving less attention to upper-division courses for majors
  • Disadvantaging graduate classes because of their relatively low enrollments or, alternatively, favoring those classes because state funding formulae reward graduate programs more generously
  • Encouraging research in areas where grants are plentiful (e.g., drug abuse prevention, homeland security) and devaluing research that is unlikely to be funded (in non-policy areas, using qualitative methods, etc.)
  • Favoring research that can readily find publication outlets over research that is cutting-edge or controversial

Relation Between Productivity and Academic Freedom

Some faculty express concerns related to academic freedom. They point to:

  • The difficulty of teaching small, writing-intensive, critical courses
  • Pressure to teach “more” rather than to teach “better”
  • A priority on research to the detriment of teaching (although see the new emphasis on multiple modes of scholarship)
  • The general undermining of traditional forms of collegial faculty control

Many also express concern that institutional demands for high productivity fall disproportionately on junior faculty, again because of their vulnerability in tenure and promotion decisions. Senior faculty entered the university under a different set of “rules” and are, to an extent, insulated by that fact from the new pressure to be “productive.”

However, the increased emphasis on measuring productivity has also led to the creation of post-tenure review systems on many campuses. Here, the measurement of productivity at the individual level joins forces with the measurement of aggregate productivity. Thus far, post-tenure review has not eroded the tenure protections enjoyed by senior faculty (although many report that instituting a serious post-tenure review program encourages early retirement among senior faculty). The American Association of University Professors opposes post-tenure review but reports few complaints of abuse from faculty. At present, the most common complaint appears to be that it is a redundant, bureaucratic exercise (Montell 2002). Aper and Fry (2003:258) find that post-tenure review in most cases is “more ritual than substantive and more driven by politics and appearance than by deeply rooted intentions to change the status of the faculty within the academy.” Their survey of institutions with graduate programs indicates that most schools that institute post-tenure review neither carefully assess its consequences nor devote additional resources to it. Nevertheless, post-tenure review is clearly motivated by a sense among administrators and others that senior faculty are not responding to the increasing demands that faculty maintain high levels of productivity.

Overall, the evaluation of faculty productivity does not yet seem to have had a noticeable impact on faculty lives. But there is clearly an active discussion of faculty productivity in administrative and legislative circles, and efforts are being made both to improve institutions' ability to evaluate productivity and to use productivity data to guide decision-making. It is reasonable to conclude that these efforts will continue (Allen 1999).

Recommendations to Minimize Misuse of Productivity Data

The desire to make comparative analyses of productivity at the departmental level has revealed some new problems of measurement. We offer the following recommendations to minimize the misuse of productivity data.

1. Most obviously, the same standard of productivity should not be applied to all disciplines. A “one-size-fits-all” standard implies, incorrectly, that all departments should structure their programs and use resources in precisely the same ways.

2. Similarly, caution should be exercised in comparing institutions with differing missions, histories, funding bases, and student bodies.

3. Faculty need to be made aware of the types of data being collected on them, and equally aware of the ways in which those data are reported and used. Collected data can be used in ways that are punitive to individuals, departments, or institutions.

4. General discussions of data collection, data quality, and data use should not be divorced from institutions' existing systems of faculty governance and control. Collective faculty productivity should not be treated as a purely administrative matter while individual faculty productivity remains subject to peer review and collegial control. In particular, faculty collectively need to be ready to organize in opposition to pressures that undermine their ability to control and organize curricula, pedagogy, and research as they see fit, given their disciplinary expertise.

5. Faculty, with their considerable methodological expertise across many disciplines, also need to be involved in the technical discussions of how productivity is to be measured, whether at the departmental, institutional, or system level.

6. Multiple measures of productivity are inherently preferable to single measures.

7. Non-numeric data need to be incorporated into discussions of productivity so that both context and the quality of effort are given full weight.

8. The time faculty must spend on data collection should be kept to a minimum. Gathering information on productivity should not undermine productivity by distracting faculty from their core work of teaching, research, and service.

9. Junior faculty, in particular, need to be protected from bearing an undue share of the changing pressures for faculty productivity.

Both because they are faculty members and because their distinctive expertise qualifies them to contribute usefully, sociologists need to be actively involved in the discussion of measuring faculty productivity. In administrative circles, the discussion has moved beyond the question of whether to measure productivity to the question of how; but there is still a need to consider whether this is an appropriate way to think about academic work. And if there is to be a discussion of how to measure productivity and what to do with the resulting data, sociologists need to be among those echoing Middaugh's cautions against cross-departmental comparisons, against crude measures of faculty teaching load, and against simple-minded uses of even the more sophisticated measures of productivity.


[a] The literature on teaching evaluations is too vast to be reviewed adequately here. Interested readers may usefully consult some of the many university-based websites devoted to the evaluation of teaching (many of which provide extensive bibliographies and links pages). One that the Task Force found useful is: http://www.indiana.edu/~best/multiop/ratings.htm

Last Updated on January 08, 2005