This paper criticizes a “quick-and-dirty” desktop model for the use of metrics in the assessment of academic research performance, and proposes a series of alternatives. It considers frequently used indicators: publication and citation counts, university rankings, journal impact factors, and social media-based metrics. It is argued that research output and impact are multi-dimensional concepts and that, when used to assess individuals and groups, these indicators suffer from severe limitations: metrics for individual researchers suggest a “false precision”; university rankings are semi-objective and semi-multidimensional; informetric evidence of the validity of journal impact measures is thin; and social media-based indicators should at best be used as complementary measures. The paper proposes alternatives to the desktop application model: combine metrics and expert knowledge; assess research groups rather than individuals; use indicators to define minimum standards; and use funding formulas that reward promising, emerging research groups. It proposes a two-level model in which institutions develop their own assessment and funding policies, combining metrics with expert and background knowledge, while at a national level a meta-institutional agency assesses the quality of these internal evaluation and funding processes. Rather than having one meta-national agency define what research quality is and what it is not, and how it should be measured, the proposed model enables each institution to define its own quality criteria and internal policy objectives, and to make these public. Although a meta-institutional agency may primarily help to improve an institution’s internal processes, a repeatedly negative outcome of its marginal test may have negative consequences for the institution’s research funding.
This paper discusses a subject as complex as the assessment of scientific-scholarly research for evaluative purposes. It focuses on the use of informetric or bibliometric indicators in academic research assessment. It proposes a series of analytical distinctions. Moreover, it draws conclusions regarding the validity and usefulness of indicators frequently used in the assessment of individual scholars, scholarly institutions and journals. The paper criticizes a so-called desktop application model based upon a set of simplistic, poorly founded assumptions about the potential of indicators and the essence of research evaluation. It proposes a more reflexive, theoretically founded, two-level model for the use of metrics in academic research assessment.
This contribution
The paper consists of three sections. Section 2 presents an introduction to the use of indicators in research assessment, following the monograph Applied Evaluative Informetrics published by the author (
Section 3 critically discusses four frequently used types of indicators:
Publication and citation counts;
A university’s position in university rankings;
Journal impact factors;
Altmetrics and indicators based on full-text downloads.
Section 4 proposes an application model for the use of indicators in academic research assessment. From the critical discussion in Section 3, conclusions are drawn on the way in which these measures could be used properly, and on how they should better not be used.
A first notion is that of the multi-dimensional nature of research output and impact, summarized in the table below.
Multi-dimensional output and impact.
Impact dimension | Publication-based | Non-publication-based |
---|---|---|
Scientific-scholarly | Scientific journal paper; book chapter; scholarly monograph; conference paper; editorial; review | Research dataset; software, tool, instrument; video of experiment; registered intellectual rights |
Educational | Teaching course book; syllabus; textbook or handbook | Online course; students completed; degrees attained (e.g., doctorates)
Economic or technological | Patent; commissioned research report | Product; process; device; design; image; spin-off; registered industrial rights; revenues from commercialization of intellectual property
Social or cultural | Professional guidelines; policy documents; newspaper article; encyclopaedia article; popular book | Interviews; events; performances; exhibits; scientific advisory work; communication in social media, e.g., blogs, tweets
Twelve informetric indicator families in research assessment.
Indicator family | Specification; examples |
---|---|
Publication-based indicators | Publication counts by type of publication |
Citation-based indicators | Citation impact, visibility; un-citedness |
Journal metrics | Journal impact factor |
Patent-based indicators | Patents; patent citations |
Usage-based indicators | Full-text downloads, HTML views
Altmetrics | Mentions in social media; readership counts |
Webometrics | Web presence; Web linkages |
Indicators related to research data | Quality and accessibility of research data |
Econom(etr)ic, technology-related indicators | Efficiency (output/input); licenses; spin-offs |
Reputation-based measures | Prizes, awards |
Network-based indicators | Collaboration, migration, cross-disciplinarity |
Indicators of research infrastructure | Facilities, scale, sustainability |
Four levels of intellectual activity in research assessment.
Level | Key aspect and issues (examples) | Main outcome |
---|---|---|
Policy | Desirable objectives; strategies to serve them; Are objectives and strategies fair and aligned with the rules of good governance? | Policy decision based on the outcomes from the evaluation domain |
Evaluation | Evaluative framework: what is valuable and how is value to be assessed. What constitutes performance? | A judgment on the basis of an evaluative framework and the empirical evidence collected. |
Analytics | Empirical and statistical research: development of new methods; indicator validity; effectiveness of political strategies. | An analytical report as input for the evaluative domain. |
Data collection | Creation of databases with data relevant to the analytical framework; Data cleaning; assessment of data quality. | A dataset for the calculation of all indicators specified in the analytical model. |
A basic notion holds that from what
Value-free does
The choice of indicators in an assessment study depends not only upon the type of entity and the performance dimension to be assessed, but also upon the purpose of the assessment and its broader context. The table below lists key questions to be addressed in the setup of an assessment study.
Key questions to be addressed in the setup of a research assessment study.
Question | Examples |
---|---|
Unit of the assessment? | A country, institution, research group, individual, research field, international network? |
Dimension of the research process to be assessed? | Scientific-scholarly impact? Social benefit? Multi-disciplinarity? Participation in networks?
Purpose and objectives of the assessment? | Allocate funding? Improve performance? Increase regional engagement? Budget cuts? |
Relevant, general or ‘systemic’ characteristics of the units? | E.g., a national research community’s orientation towards the international research front; or phase of scientific development |
As regards the actual use of bibliometric or informetric indicators in the research assessment of individuals, research groups and institutions, the current author holds the following positions.
Calculating indicators at the level of an individual and claiming they measure by themselves an individual’s performance suggests a “false precision”.
University rankings are influenced by political premisses and objectives.
The informetric evidence that journal impact factors are good indicators of the quality of the peer review system and of international orientation is weak.
“Altmetrics should not be used to help evaluate academics for anything important, unless perhaps as complementary measures” (
Performance of an individual and the citation impact of the papers he or she (co-) authored relate to two distinct levels of aggregation. Research is team work; multiple co-authorship is a rule rather than an exception, especially in the natural, life and applied sciences. A crucial issue is how one should assign the citation impact of a team’s papers to the performance of an individual working in that team. This issue cannot merely be solved in an informetric way.
The application of fractional counting based on the number of co-authors, or considering the position of an author in the author sequence in the byline of a paper, taking into account corresponding authorship, or using formal statements in research papers on author contributions, are per se interesting approaches, but they do not solve the problem of assessing the contribution of an individual to team work.
While in some departments all publications made by doctoral students are as a rule co-authored by their supervisors, in other groups supervisors may be reluctant to feature as an author in articles of their students, and therefore have a low publication output.
Members of research groups may have different functions. Some members may not conduct “pure” research activities, but nevertheless carry out essential management or fund-raising tasks that are essential for a group’s research performance. They may have low publication counts.
In institutions with oppressive working relations among colleagues, a senior member may force his or her subordinates to become co-author of their papers.
A critical analysis of “world” university rankings shows that each ranking system has its own orientation or ‘profile’, and that there is no ‘perfect’ or ‘objective’ ranking system. Their geographical coverage, rating methods, selection of indicators and indicator normalizations, have an immediate effect upon the ranking positions of given institutions.
Using a “normalized” indicator that corrects for differences in citation impact across geographical regions may cause “top” universities to be more evenly distributed among regions. A methodological decision to use such an indicator boosts the position of more regionally oriented institutions in a world ranking.
Both research productivity and graduation productivity are important, mutually dependent aspects of institutional performance. If an institution’s number of publications per academic staff increases over time, the number of graduates per staff may decline and vice versa. Considering merely one of these two aspects may easily lead to misinterpretations of an institution’s performance (e.g.
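A small numerical illustration (all figures below are hypothetical) may clarify how focusing on a single ratio can mislead; in the sketch, publications per staff member rise while graduates per staff member fall:

```python
# Hypothetical figures for one institution in two consecutive years.
staff = 100
years = {
    "year 1": {"publications": 300, "graduates": 200},
    "year 2": {"publications": 360, "graduates": 170},
}

for label, counts in years.items():
    print(label,
          "publications/staff:", counts["publications"] / staff,
          "graduates/staff:", counts["graduates"] / staff)

# Publications per staff rise from 3.0 to 3.6, while graduates per staff
# fall from 2.0 to 1.7; judging by the first ratio alone would overstate
# the change in overall institutional performance.
```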
The use of journal impact factors (JIFs) and related citation-based indicators of journal impact in research assessment is often based upon two assumptions:
JIFs are good measures of the quality of a journal’s manuscript peer review process.
JIFs are good measures of the international orientation of a journal’s authors and readers.
The current author holds the position that the informetric evidence in support of these assumptions is rather weak. Although several validation studies conducted in the past have reported for selected subject fields a positive (rank-)correlation between journal impact factors and peer opinions on the status or quality of journals, there is little informetric evidence that JIFs validly measure the quality of a journal’s manuscript peer review process or the international orientation of its authors and readers.
Altmetrics relates to different types of data sources with different functions.
Wouters, Zahedi & Costas (
Although social media metrics may reflect attention of a wider public, they should not be used to measure scientific-scholarly impact. Their numbers can to some extent be manipulated, “since social websites tend to have no quality control and no formal process to link users to offline identities” (
Visibility of researchers in social media and reference managers strongly depends upon the extent to which they themselves decide to actively participate in such media.
Readership counts in scholarly reference managers depend upon readers’ cognitive and professional background, and need not be representative of a wider scientific-scholarly audience (
Downloaded articles may be selected according to their face value rather than their value perceived after reflection. Also, usage data availability is incomplete, standardisation across providers is lacking, and counts can to some extent be manipulated (
Bibliometric or informetric indicators are often applied in what is termed a “desktop assessment” model that is based on a series of simplistic, seemingly plausible but, upon reflection, questionable assumptions (
The assumptions of the desktop assessment model and alternative approaches.
Aspect | Desktop model assumptions | Alternative approaches |
---|---|---|
Availability | Widely available indicators should be used (publication counts, journal impact factors, h-indices) | Use tailor-made indicators appropriate for the entity, dimension and purpose of the assessment
Validity | Indicators measure well what they are supposed to measure; no confirmation from other sources is needed | Indicators should be validated and combined with expert and background knowledge
Evaluative significance | The aspects measured by the indicators constitute appropriate evaluation criteria | An evaluative framework should specify what is considered valuable; indicators do not define this by themselves
Unit of assessment | Evaluate individual researchers | Do not assess individuals in isolation; assess research groups
Ordering principle | The higher the score, the better the performance | Use indicators to define minimum performance standards
Policy decision criteria | The best overall performer receives the largest support | Fund institutions according to the number of promising, emerging research groups they host
A multi-level application model of informetric indicators of research performance. While institutions in their internal assessment and funding processes combine indicators and expert knowledge to evaluate individuals and groups, a meta-university agency aggregates indicators at the level of an institution, and assesses the quality of the institution’s internal evaluation and funding processes.
With respect to the application of informetric data and indicators, the figure above outlines the proposed two-level model.
The
The genuine challenge in the responsible and fair use of informetric indicators of research performance in an academic environment does not primarily lie in the further sophistication of indicators, but in the ability to establish external, independent and knowledgeable entities who monitor the evaluation processes within institutions, acknowledging that it is the primary responsibility of the institutions themselves to conduct quality control.
The proposed model sketches the main lines of two ways in which bibliometric methods could be used in academic research assessment. A key notion underlying the model is that a meta-institutional agency could not possibly carry out evaluation at the level of groups and individual researchers without recourse to a set of single-measure indicators that would ultimately be used in mechanical ways; what is needed at this level is expertise, time, and sufficient resources to conduct an assessment that is qualitative in nature, and possibly informed by quantitative indicators.
The current paper does not present a full discussion of the conditions under which the model can be applied, and the contexts in which it fits best. It is primarily the research policy domain that deals with proper conditions and contexts. Large differences exist in academic research funding models among countries. For instance, in the US research funding tends to be distributed on the basis of competition among individual researchers rather than principally through the block grants to institutions that are found in many continental European countries. It seems more advisable to implement the proposed model in the latter group of countries than in the former, even though certain elements could be useful in both groups. Of course, the challenge in implementing the model is attaining a relationship of trust between the research funder and the academic research community. The proposed model will not work if there is a fundamental lack of trust between these two parties, but it can further increase trust once it is implemented.
The paper does not specify in detail which bibliometric/informetric methodologies and indicators are the most appropriate. As argued in Section 2, the choice of informetric indicators depends upon what is being assessed and which criteria are to be used, and also upon the purpose of the assessment and its broader context. A set of indicators useful in one country may be less appropriate in another. For instance, in some countries there is a growing liaison between academic and industry research, driven by government funding. In other countries, hosting large publicly funded institutes of applied research, government priorities in the funding of academic research may have a somewhat different focus. An Expert Group on the Assessment of University-Based Research (
The current author holds the position that the supra-institutional agency responsible for the assessment of internal quality processes should give a qualitative judgement, even though quantitative indicators may help the agency to form and motivate its judgement. Also, metrics can help the agency to ask institutional managers relevant, critical questions about their internal processes. These types of use are further examples of how informetric indicators can be properly applied at the meta-institutional level.
The model proposed is based on the insight that internal evaluations within institutions, combining indicators and expert knowledge and including views of external experts, constitute more favourable conditions for a proper use of the indicators than purely informetric-statistical use by external, meta-institutional agencies do. The model is decentralised in this respect: rather than having one meta-national agency define what research quality is and what it is not, and how it should be measured, it enables each institution to define its own quality criteria and internal policy objectives, and to make these public.
But this autonomy has consequences as well. Although a meta-institutional agency may primarily help to improve an institution’s internal processes, a repeatedly negative outcome of its marginal test may have negative consequences for the institution’s research funding.
An important issue is the optimal relationship between institutional and meta-institutional or national assessment processes. Is it defensible that the two levels apply the same indicators? How can possible negative effects of meta-institutional assessment procedures upon intra-institutional behaviour and evaluation be minimized? These questions concern the effects that the application of indicators at the meta-institutional level may have upon the assessment and research practices within the institution.
The issue at stake is not whether the application of metrics leads to changes in these intra-institutional practices, but whether or not such changes reflect a genuine enhancement of the performance of an institution. If there is solid evidence that certain quantitative criteria applied by a meta-institutional agency induce at the intra-institutional level a systematic, strategic behaviour aimed at obtaining a high score rather than at a substantive quality enhancement, the system is counter-productive. The same conclusion holds when research groups within institutions, aware as they are of the funding formula applied at the meta-institutional level, claim an amount of funding within their institution proportional to the contribution they make to the parameters in the formula.
On the one hand, institutional self-profiling is essential. The model presupposes that institutions define their own profile and targets. For instance, an institution’s disciplinary specialization is an important characteristic; the same is true for its role in regional development. As indicated in the table of key questions in Section 2, such general or ‘systemic’ characteristics of the unit under assessment should be taken into account.
On the other hand, benchmarking, that is, comparing an institution with other institutions with similar profiles, can be expected to play an important role in the assessment of an institution’s quality and funding processes. In order to treat each institution in the same manner, the selection of appropriate benchmarks is crucial. Informetric tools can be useful to suggest potential benchmarks. These tools generate indicators, not of the standard publication output or citation impact, but, for instance, of the degree of similarity among the orientations of a wider set of institutions. This is another example of how bibliometric-informetric indicators can be properly used in academic research assessment.
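A minimal sketch of such a tool is given below; the institution names, field categories and publication counts are entirely hypothetical, and cosine similarity is only one of several possible similarity measures. The sketch ranks candidate benchmarks by the similarity of their disciplinary publication profiles to that of a target institution.

```python
from math import sqrt

# Hypothetical disciplinary publication profiles (papers per field).
profiles = {
    "Institution A": {"physics": 120, "chemistry": 80, "humanities": 10},
    "Institution B": {"physics": 100, "chemistry": 90, "humanities": 15},
    "Institution C": {"physics": 5, "chemistry": 10, "humanities": 200},
}

def cosine_similarity(p, q):
    """Cosine similarity between two field-count profiles."""
    fields = set(p) | set(q)
    dot = sum(p.get(f, 0) * q.get(f, 0) for f in fields)
    norm_p = sqrt(sum(v * v for v in p.values()))
    norm_q = sqrt(sum(v * v for v in q.values()))
    return dot / (norm_p * norm_q)

def suggest_benchmarks(target, profiles, top_n=2):
    """Rank all other institutions by profile similarity to the target."""
    scores = {name: cosine_similarity(profiles[target], prof)
              for name, prof in profiles.items() if name != target}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_n]

print(suggest_benchmarks("Institution A", profiles))
# Institution B is far more similar to Institution A than Institution C is,
# and would therefore be suggested as a benchmark.
```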
Full version of a paper entitled “The Application Context of Research Assessment Methodologies” presented by the author during the conferral of a doctorate honoris causa from the Sapienza University of Rome, 5 September 2019, and at the ISSI2019 conference in Rome on the same day.
Table 1 in Section 2 of the current paper is a summary of Tables 3.2 and 3.3 in Moed (
For more details, the reader is referred to a report of the Expert Group on the Assessment of University-Based Research, in which the notion of a multi-dimensional research assessment matrix was introduced (
Fractional counting means, for instance, that if a paper is published by n authors, it is assigned with a weight of 1/n to each author. More sophisticated fractional counting schemes have been proposed as well, for instance assigning a fraction of 1/2 to the first author of a paper, 1/4 to the last author, and dividing the remaining quarter equally among the other co-authors. In response to the difficulties of defining authorship in science, several scientific publishers have implemented a contributorship model, according to which authors specify their precise role in the research described in a paper, using a classification pre-defined by the publisher. See for instance the website of the Council of Science Editors for more information (
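As a purely illustrative sketch of the two counting schemes described in this note (the function names and the example byline are hypothetical), the fractions could be computed as follows:

```python
# Illustrative sketch of two fractional counting schemes; in both schemes
# the fractions assigned to the co-authors of a paper sum to one.

def uniform_fractions(authors):
    """Assign each of the n authors an equal credit of 1/n."""
    n = len(authors)
    return {a: 1.0 / n for a in authors}

def first_last_weighted_fractions(authors):
    """First author 1/2, last author 1/4, remaining 1/4 shared by the others.

    Falls back to the uniform scheme for papers with fewer than three authors.
    """
    n = len(authors)
    if n < 3:
        return uniform_fractions(authors)
    middle = authors[1:-1]
    fractions = {authors[0]: 0.5, authors[-1]: 0.25}
    for a in middle:
        fractions[a] = 0.25 / len(middle)
    return fractions

if __name__ == "__main__":
    byline = ["Author A", "Author B", "Author C", "Author D"]
    print(uniform_fractions(byline))              # each author receives 0.25
    print(first_last_weighted_fractions(byline))  # 1/2, 1/4, and 1/8 for each middle author
```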
Indicator normalizations are statistical tools to correct for particular “disturbing factors” or biases. A typical example is a citation impact indicator that corrects for differences in citation practices among scientific subfields, by dividing the citation rate of papers published by a group or journal by the world citation average in the disciplines covered by that group or journal. In this way groups or journals in mathematics may have normalized scores similar to those of molecular biologists, while citation levels in the latter discipline are much higher than in the former. For an extensive review on field normalization, see
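In symbols (the notation is introduced here merely for illustration), such a field-normalized citation score can be written as

$$ \mathrm{NCS} \;=\; \frac{\bar{c}}{\bar{e}}, \qquad \bar{c} \;=\; \frac{1}{N}\sum_{i=1}^{N} c_i , $$

where $c_i$ is the number of citations received by the $i$-th of the $N$ papers published by the group or journal, $\bar{c}$ their average citation rate, and $\bar{e}$ the world average citation rate of papers of the same subfield(s), publication years and document types. A score of 1 then indicates citation impact at the world average in the unit’s own field(s).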
The journal impact factor calculated and published by Clarivate Analytics (formerly Thomson Reuters and Institute for Scientific Information) in the Journal Citation Reports (JCR) is defined as the average citation rate in a particular year of articles published in the two preceding years. The San Francisco Declaration on Research Assessment (
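In symbols (notation added here for clarity only), the two-year impact factor of a journal $j$ in year $y$ as defined in this note reads

$$ \mathrm{JIF}_{y}(j) \;=\; \frac{C_{y}\!\left(P_{y-1}(j)\right) + C_{y}\!\left(P_{y-2}(j)\right)}{\left|P_{y-1}(j)\right| + \left|P_{y-2}(j)\right|}, $$

where $P_{y-1}(j)$ and $P_{y-2}(j)$ denote the items journal $j$ published in the two preceding years, $C_{y}(\cdot)$ the number of citations these items received in year $y$, and $|\cdot|$ the number of items.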
The concept of “Altmetrics” was introduced in an Altmetrics Manifesto published on the Web in October 2010 (
The use of the term “as a whole” does not preclude the possibility to conduct assessments on a discipline-by-discipline basis, evaluating, for instance, all research in humanities, or all research in biomedicine.
In the Flemish academic system, universities are free to spend a part of their basic funding (Special Research Funds, in Dutch: Bijzonder Onderzoeksfonds (BOF)) applying their own criteria. More information on this system is given by Marc Luwel in a forthcoming contribution to this journal.
Nowadays many countries have performance-based funding processes, distributing (a part of) funds among universities. See for instance OECD (
Research assessment has both a formative and a summative function. In summative evaluation the focus is on the outcome of a process or program, such as a final judgement or a vote. Formative evaluation assesses the unit’s development at a particular point in time, and primarily aims to improve its performance. Several researchers have rightly underlined the value of formative evaluations, and consider the use of bibliometric or altmetric indicators predominantly as formative tools. But especially in the context of research funding, research assessment has a summative aspect as well. This is true for a supra-institutional agency assessing internal evaluation and funding processes, but also for the intra-institutional assessment of individuals and groups. This does not imply, however, that assessment outcomes are necessarily expressed in numbers.
Empirical research on the effects of the use of indicators in performance-based funding sheds light on whether these practices have played an important role. More and more studies are being published on this subject. See for instance a thorough review by
The author wishes to thank Dr Marc Luwel, former director of the Netherlands-Flemish Accreditation Organization in The Hague, for stimulating discussions and for his useful comments on an earlier version of this paper. He is also grateful to the two reviewers for their valuable comments.
The author has no competing interests to declare.