A critical discussion is presented of the Author Metrics Database (AMD) created by Ioannidis et al. (2020).
Scopus diverges from Eugene Garfield’s original concept of the Science Citation Index, as citation impact plays a weaker role as a journal selection criterion.
The transparency of the AMD is seriously hampered by the lack of information on whether the data were verified by the scientists themselves.
A complex composite indicator in the AMD decides whether or not a particular author is included. Its components are strongly statistically dependent and are largely based on the position an author has in a paper’s author sequence but lack a sound theoretical foundation.
An assessment of an individual researcher cannot be merely based on whether or not he or she is included in the AMD.
The issue of how to deal with multi-authored papers in the research assessment of individuals can to some extent be informed by bibliometric indicators but cannot be solved bibliometrically. This is why the Composite Indicator suggests a false precision.
The AMD focuses almost exclusively on senior scientists. Early career scientists and emerging research groups who will shape science and scholarship in the near future hardly appear in the AMD.
Desktop bibliometrics using the AMD as a sole source of information must be rejected. Using the AMD as a starting point in a more extensive bibliometric data collection makes it a more defensible practice.
An alternative approach is an interactive, bottom-up bibliometric tool designed for self-assessment and educational purposes, showing how bibliometric indicators depend upon the way in which initial publication lists, author benchmark sets, subject delimitations, thresholds and evaluative assumptions are chosen.
Research assessment is much more than just bibliometrics. It requires an overarching evaluative framework based on normative views on what constitutes research performance and which policy objectives should be achieved.
Recently, Ioannidis, Boyack & Baas (2020) published an updated version of their publicly available, science-wide database of standardized author-level citation indicators, referred to in the current paper as the Author Metrics Database (AMD).
The AMD is beyond any doubt a valuable data source for further bibliometric research. Even though the current paper presents outcomes of a statistical analysis of this database, it focuses on a different issue: What is the value of the AMD for the assessment of the research performance of individual researchers? Is the information in the science-wide database actually useful? If so, in which ways?
On the one hand, the current paper profits from the transparency maintained by the creators of the AMD. On the other hand, it argues that transparency on several important issues is lacking and proposes ways to improve it. It fully acknowledges the importance of taking into account differences among subject fields and aims to fully live up to the authors’ warning that “assessing citation indicators always require caution” (Ioannidis-2020).
The current article discusses the “science-wide” AMD and the indicators it contains at two distinct analytical levels. Firstly, at the level of the database as a whole, including the coverage of its Scopus source; secondly, at the level of the individual authors included in it and the indicators calculated for them.
Many information scientists and research assessors associate a citation index of scientific literature with Eugene Garfield’s vision of a multi-disciplinary core set of scientific journals selected on the basis of their citation impact, covering the best journals in science and forming the basis of his Science Citation Index (SCI), a scientific literature database launched in 1963. Soon a practice emerged that used the SCI not only for literature retrieval but also for research assessment, under the assumption that the appearance of a journal, scientific author or institution in the index can be interpreted as a sign of research quality. On many occasions, Garfield warned against over-interpretation and misuse of citation-based indicators in research assessment.
Scopus diverges from Garfield’s original model, as citation impact is not the only journal selection criterion.
Article production, national orientation and uncitedness of Scopus journals active in 2019. “Active in 2019” means that Scopus indexed at least one document published by the journal in 2019. Four document types are included in the publication counts: articles, proceedings papers, reviews and short surveys. INO-P: Index of National Orientation, expressed in the geographic location of the authors publishing in a particular journal; a journal has INO-P > 80 if a single country accounts for more than 80 percent of all papers published in that journal. JIF3: three-year Journal Impact Factor, defined as the number of citations in a particular year (e.g., 2019) to articles published in the journal during the three preceding years (e.g., 2016-2018), divided by the number of those articles.
| NO. JOURNALS ACTIVE IN 2019 | AVERAGE NO. ARTICLES PER JOURNAL | % JOURNALS WITH INO-P > 80 | % JOURNALS WITH JIF3 < 0.1 | % JOURNALS WITH JIF3 < 0.2 |
|---|---|---|---|---|
| 23,200 | 108 | 23% | 7% | 14% |
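Expressed in formulas, following the definitions in the table note (the symbols below are introduced here only for clarity and do not appear in the source data), with $p_j(c)$ the number of 2019 papers in journal $j$ authored from country $c$, $p_j$ the total number of 2019 papers in $j$, $C_j(2019;\,2016\text{–}2018)$ the citations received in 2019 by articles published in $j$ during 2016-2018, and $N_j(2016\text{–}2018)$ the number of those articles:

$$
\mathrm{INO\text{-}P}_j = 100 \cdot \max_{c} \frac{p_j(c)}{p_j},
\qquad
\mathrm{JIF3}_j(2019) = \frac{C_j(2019;\,2016\text{–}2018)}{N_j(2016\text{–}2018)}.
$$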
The last column in the table above shows that 14 percent of the journals active in 2019 have a three-year impact factor below 0.2, and the preceding column that 7 percent even remain below 0.1.
The effect that the inclusion of nationally oriented and/or low-impact journals may have upon citation-based author metrics depends upon the type of indicators calculated. One can distinguish two main types that are sometimes denoted as size-dependent and size-independent or, in terms of the key statistic calculated, as average-based and sum-based. A third type includes hybrid indicators, which combine elements from the size-dependent and size-independent approaches.
Three main types of indicators.
| INDICATOR TYPE | EXAMPLES | RATIONALE |
|---|---|---|
| Size-independent / average-based | Citations per article; Journal Impact Factor | “In view of the relation between size and citation frequency, it would seem desirable to discount the effect of size when using citation data to assess a journal’s importance” (Garfield, 1972) |
| Size-dependent / sum-based | Total citation counts; Integrated Impact Indicator | “The common assumption in citation impact analysis hitherto has been normalization to the mean. In our opinion, the results are then necessarily flawed because the citation distributions are often highly-skewed. Highly productive units can then be disadvantaged because they publish often in addition to higher-cited papers also a number of less-cited ones which depress their average performance.” (Leydesdorff & Bornmann, 2011) |
| Hybrid (contains elements from both approaches) | H index | Performance must reflect both publication productivity and citation impact. Publication counts alone “do not measure importance nor impact of papers”; total citations “may be inflated by a small number of ‘big hits’, which may not be representative of the individual if he/she is coauthor”; citations per paper “rewards low publication productivity, penalizes high productivity” (Hirsch, 2005) |
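As an illustration of this typology, the following minimal sketch computes one indicator of each type for a single author from a list of per-paper citation counts; the list is a hypothetical example, and the code is not taken from any of the sources discussed here.

```python
def h_index(citations):
    """Hybrid indicator: the largest h such that h papers each have >= h citations."""
    ranked = sorted(citations, reverse=True)
    return sum(1 for rank, c in enumerate(ranked, start=1) if c >= rank)

citations = [45, 20, 11, 8, 8, 3, 1, 0, 0]  # hypothetical per-paper citation counts

total_citations = sum(citations)                         # size-dependent / sum-based
citations_per_paper = total_citations / len(citations)   # size-independent / average-based
h = h_index(citations)                                   # hybrid

print(total_citations, round(citations_per_paper, 1), h)  # 96 10.7 5
```

The example makes the rationales in the table tangible: adding one uncited paper leaves the citation sum unchanged, lowers the average, and can never decrease the H index.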
Numerous experiences collected in past decades with the calculation of bibliometric indicators at the level of individuals have shown that the identification of all publications of a given individual researcher in a scientific literature database is highly sensitive to errors. The most important sources of error are homonyms (different people with the same name, e.g., Smith, Jones, Lee, Liu, Andersen) and synonyms (different names for the same person, for instance due to differences between full first names and nicknames, the mixing up of first name and family name, different transliterations of Cyrillic and other non-Latin names, and name changes when a person assumes a partner’s name).
Although Ioannidis-2016 states that “Scopus author IDs were used for all author-based analyses,” it does not provide any information on how these Scopus IDs are created. Ioannidis-2020 refers for such details to an article by Baas et al. (2020) on Scopus as a bibliometric data source.
However, the AMD does not indicate whether the publication data underlying a particular author’s metrics have been verified by that author.
The AMD indicates for each author an institutional affiliation, derived from an author’s most recent publications indexed in Scopus.
The current paper focuses on a composite indicator that plays a key role in the inclusion of authors in the AMD and their ranking. It is presented below.
Composite indicator in the Author Metrics Database (AMD). Source: Ioannidis, Boyack & Baas (2020).
Pearson correlation coefficients between six indicators included in the Composite Indicator (single-year dataset for 2019). Data were obtained from the dataset Table-S7-singleyr-2019. The key statistics are based on the absolute values of the indicators, and the correlation coefficients on their logarithmic values. Calculations are based on all 161,441 authors in the single-year 2019 dataset. NC: total number of citations received in 2019. H: H index for 2019. Hm: Hm index for 2019. NCS: number of citations to single-authored papers. NCSF: number of citations to single- and first-authored papers. NCSFL: number of citations to single-, first- and last-authored papers. NP: total number of publications between 1960 and 2019.
KEY STATISTICS

| VAR | MEAN | MEDIAN | MIN | MAX |
|---|---|---|---|---|
| NC | 1,148 | 719 | 13 | 67,118 |
| H | 14 | 13 | 1 | 99 |
| Hm | 6.8 | 6.4 | 0.34 | 45.8 |
| NCS | 37.5 | 10 | 0 | 13,437 |
| NCSF | 208.1 | 137 | 0 | 28,269 |
| NCSFL | 504.1 | 336 | 10 | 46,567 |
| NP | 180.0 | 134 | 2 | 4460 |

PEARSON CORRELATION COEFFICIENTS

| | LOG NC | LOG H | LOG HM | LOG NCS | LOG NCSF | LOG NCSFL | LOG NP |
|---|---|---|---|---|---|---|---|
| log NC | 1.00 | 0.92 | 0.57 | -0.24 | 0.23 | 0.71 | 0.57 |
| log H | 0.92 | 1.00 | 0.65 | -0.26 | 0.21 | 0.66 | 0.49 |
| log Hm | 0.57 | 0.65 | 1.00 | 0.17 | 0.28 | 0.73 | 0.46 |
| log NCS | -0.24 | -0.26 | 0.17 | 1.00 | 0.24 | 0.02 | -0.09 |
| log NCSF | 0.23 | 0.21 | 0.28 | 0.24 | 1.00 | 0.44 | -0.12 |
| log NCSFL | 0.71 | 0.66 | 0.73 | 0.02 | 0.44 | 1.00 | 0.42 |
| log NP | 0.57 | 0.49 | 0.46 | -0.09 | -0.12 | 0.42 | 1.00 |
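The correlation analysis described in the table note can be reproduced along the following lines. This is a minimal sketch: it uses a small synthetic stand-in for the 161,441-author file (in practice one would load the published dataset Table-S7-singleyr-2019), and the zero-safe transformation log(1 + x) is an assumption, as the source does not state how zero counts (e.g., NCS = 0) were handled before taking logarithms.

```python
import numpy as np
import pandas as pd

COLS = ["NC", "H", "Hm", "NCS", "NCSF", "NCSFL", "NP"]

# Synthetic stand-in for the single-year 2019 dataset: one row per author.
rng = np.random.default_rng(seed=0)
df = pd.DataFrame(
    rng.poisson(lam=[1148, 14, 7, 38, 208, 504, 180], size=(1000, 7)),
    columns=COLS,
)

# Pearson correlations computed on log-transformed indicator values.
corr = np.log1p(df[COLS]).corr(method="pearson")
print(corr.round(2))
```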
The citation indicator in the first component, NC, is based on an author’s total number of citations received and is thus a size-dependent, sum-based measure.
One may argue in favor of the Composite Indicator that an indicator based on a combination of measures captures more aspects of an author’s publication oeuvre than any single measure does.
As the table above shows, the indicators NCS, NCSF and NCSFL relate to partially overlapping (in fact, nested) sets of papers.
Allowing users to assess distinct categories of papers makes sense, even though it is argued below that indicators based on author sequences have a limited validity. In addition, Ioannidis-2016 states that equal weights were given to all six log-transformed indicators included in the composite for parsimony reasons and that “if, for whatever reason, one or more of these indicators are considered more essential in a particular field, one can weigh them more compared with the others.” However, it is questionable whether this consideration provides sufficiently valid grounds for including statistics for series of partially overlapping sets in a composite indicator that plays such an important role in the AMD. After all, it is the measure on which authors are ranked and is used to expand the AMD beyond the set of the top 100,000 authors.
All six indicators included in the composite measure are log transformed and standardized. Ioannidis-2016 argues that “log-transformations ensure that there are no major outlier values.” Their standardization method gives a value of 1 to the author with the highest raw value for a particular indicator. Ioannidis-2020 rightly underlines that “comparisons of citation metrics are more meaningful when done within the same subdiscipline.” However, their standardization method uses the highest raw value across all subject fields, while there are good reasons to use subject field-dependent highest raw values.
As expected, each indicator reveals substantial differences in these maximum values across subject fields. Calculating for each author a new composite measure based on maximum values within his or her own subject field therefore yields scores, and hence rankings, that deviate substantially from those based on the original Composite Indicator.
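A minimal sketch of the two standardization variants discussed here is given below, under explicit assumptions: natural logarithms with log(1 + x) for zero-safety, equal weights implemented as a plain average of the six standardized components, and a `field` column assigning each author to one subject field. The exact transformation used in Ioannidis-2016 may differ in such details.

```python
import numpy as np
import pandas as pd

COMPONENTS = ["NC", "H", "Hm", "NCS", "NCSF", "NCSFL"]

def composite(df: pd.DataFrame, per_field: bool = False) -> pd.Series:
    """Equal-weight composite of six log-transformed, max-standardized indicators."""
    logs = np.log1p(df[COMPONENTS])
    if per_field:
        # Field-dependent variant proposed in the text: standardize each
        # indicator by its maximum within the author's own subject field.
        maxima = logs.groupby(df["field"]).transform("max")
    else:
        # Science-wide variant described for the AMD: one maximum per
        # indicator, so the author with the highest raw value scores 1.
        maxima = logs.max()
    return (logs / maxima).mean(axis=1)
```

Re-ranking all authors with `per_field=True` implements the re-calculation referred to in the preceding paragraph.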
The underlying basic assumption of the AMD is that one can derive an indication of the contribution an author has made to a multi-authored paper from the paper’s author sequence.
The situation becomes more complex when two collaborating research groups make equal contributions. If the two supervisors obtain the penultimate and last positions and the two PhD students the first and second, there is no reason to give a higher weight to the first and last authors only. The only currently available model for author weighting in multi-team collaboration gives a special status to the research group delivering the corresponding author.
One may claim that even if a uniform author weight parameter may be inadequate in individual cases, deviations from an assumed “true” author weight tend to cancel out if an assessed author has published a sufficiently large number of papers. However, this argument is invalid especially in the case of citation analysis, in which citation distributions are known to be skewed and only a few papers account for the largest part of an author’s or a group’s citation impact. The key question then is: what was the contribution of the various authors to these few highly cited papers?
There is evidence that especially in Mathematics and Social Sciences & Humanities, distinct authoring conventions exist, based on lexicographical ordering of authors or on rotating first authorship. In this case, there is no justification for giving a special status to the first and last authors. This limitation is also mentioned in Ioannidis-2016. Using data from the AMD, the current author observed an over-representation of first authors in the upper part of the lexicographically ordered full author list in Visual & Performing Arts (18%), Philosophy & Theology (7%), Communication & Textual Studies (3%) and Mathematics & Statistics (2%).
Ioannidis-2020 states that in a first step, the top 100,000 authors are selected across all subject fields based on the Composite Indicator. In a second step, this set is complemented with authors not among the top 100,000 but still among the top 2 percent of their main subject field and publishing at least five papers. Although Ioannidis et al. (2020) put the Composite Indicator into perspective by underlining that different components may be included or that different weights may be assigned to an indicator, it is clear that the Composite Indicator as defined in Ioannidis-2016 plays a decisive role in determining which authors are included in the AMD.
One type of use of the AMD in the assessment of an individual researcher, for instance, for hiring or promotion purposes, is to look up the author entry in the AMD with the same name as the assessed researcher. Next, an assessment criterion is defined, for instance, being included in the AMD or having a Composite Indicator score in the top quartile of this indicator’s distribution. Finally, a decision is made purely on the basis of the outcome thus obtained, without taking into account any other sources of information.
The AMD creators rightly point out that multiple co-authorship is the rule rather than the exception, especially in the natural and life sciences. As a consequence, publications (co-)authored by an individual researcher are often, if not always, the result of research to which other scientists have contributed as well, sometimes even dozens of them. The crucial issue is how one should relate the citation impact of a multi-authored paper to the contributions of its individual authors.
The current author defends the position that a valid assessment of the research performance of individuals can be properly made only on the basis of sufficient knowledge of the particular role they played in the research presented in their publications, for instance, whether this role has been leading, instrumental or technical. In addition, other manifestations of research performance should be taken into account as well. Calculating indicators at the level of an individual and claiming they measure by themselves an individual’s performance, statistically sophisticated as they may be, suggests an accuracy of measurement that cannot be justified. This is especially true for the AMD Composite Indicator. Ultimately, its validity does not rest on its statistical sophistication but on the soundness of its underlying assumptions about what an author’s position in the byline reveals about his or her contribution.
The very existence of a database with “top” researchers invites evaluators and other interested users to use the information for their own evaluative purposes. The AMD creators explicitly refer to the entities analyzed in the AMD as scientists, whereas, strictly speaking, the database contains authors.
What is more, it would have been much more appropriate to include only scientists whose algorithmically generated publication lists were verified and who have explicitly given their consent. The fact that the statistical de-duplication of author names and the assignment of documents have already taken place in Scopus and have not been contested by the scientists concerned does not imply that these scientists consent to the use of their data for evaluative purposes.
The analysis of Scopus content coverage revealed that this database indexes a substantial number of nationally oriented journals with a low citation impact. Although there is evidence that once they are indexed in Scopus, many of these journals internationalize and increase their citation impact, the effect of their inclusion upon citation-based author metrics depends on the type of indicator calculated and should be kept in mind when interpreting the AMD.
The AMD focuses almost exclusively on senior scientists. Early career researchers (ECRs) and emerging research groups, who will shape science and scholarship in the near future, hardly appear in it.
It was argued above that the issue of how one should deal in research assessment with multi-authored papers can to some extent be informed by bibliometric indicators but cannot be solved bibliometrically.
The creators of the AMD have indeed made an important step toward a bibliometric assessment tool by creating a rich intermediary dataset with bibliometric indicators, open for further analysis to all interested people. Although the Composite Indicator is the key measure, its components can be used as separate indicators as well. In addition, the AMD contains other interesting features not discussed in the current paper, such as the possibility to analyze citation counts including or excluding author self-citations.
Technically, it seems feasible to add in a follow-up version of the AMD the verification status of the publication data relating to a particular author, as this information is available in the Scopus system, or to include only authors who have validated their data. However, making the database interesting for ECRs by increasing the number of included authors and adding size-independent indicators hardly seems feasible within the framework of the current AMD model. The current author would like to broaden the perspective and bring in the idea of an interactive, bottom-up bibliometric tool designed for self-assessment and educational purposes.
Obviously, research assessment is much more than just bibliometrics. Research assessment requires an overarching evaluative framework based on normative views on what constitutes research performance and which policy objectives should be achieved. Informetricians should comply in their scientific work with the methodological principle of maintaining a neutral position toward an assessment’s constituent policy issues, the criteria specified in the evaluative framework, and the goals and objectives of the assessed subject. As professional experts, their competence lies primarily in the development and application of bibliometric and other assessment methods and indicators.
The paper by Ioannidis, Boyack & Baas (2020) is denoted in the current paper as Ioannidis-2020; the earlier paper in which the Composite Indicator was introduced is denoted as Ioannidis-2016.
The analyses of Scopus coverage presented in this section were created by the current author using a dataset derived from Scopus kindly provided by Prof. Felix de Moya-Anegon and Prof. Vicente Guerrero-Bote from the SCImago Research Group, Spain. They are partly based on earlier work by Moed et al. on the internationalization of journals indexed in Scopus.
The 8,300 journals indexed in Scopus in at least one year during 1996-2018 but not active in 2019 tend to have a stronger national orientation and a lower citation impact than periodicals active in 2019. This outcome suggests that, in a process of re-assessment of its content coverage, the Scopus team decided to remove especially nationally oriented, low-impact journals.
The percentage of journals with INO-P > 80 ranges from 12% in biomedical research to 25% in clinical medicine. The percentage of journals with JIF3 < 0.1 ranges from 3% in natural sciences to 10% in humanities and social sciences.
According to Baas et al. (2020), Scopus author profiles are generated by a profiling algorithm and can be corrected through feedback from authors and institutions.
Striking differences can be observed between the composite scores obtained with the original, science-wide standardization and those based on subject field-dependent maximum values.
It follows that the subfield-normalized composite measure explains only 60 percent of the variance (R-square) in the Ioannidis-2020 Composite Indicator, and the main field-based measure 74%. Ioannidis-2016 and Ioannidis-2020 are fully aware that their composite indicator does not account for differences among subject fields and that one should interpret rankings based on this measure only on a field-by-field basis, comparing an author with authors from the same subject field. The observed field dependence of their standardization has substantial consequences for the scores and rankings of individual authors.
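Assuming these R-square values stem from simple one-predictor regressions, the implied Pearson correlations with the original Composite Indicator follow directly:

$$
r = \sqrt{R^2}: \qquad \sqrt{0.60} \approx 0.77 \ \text{(subfield-based)}, \qquad \sqrt{0.74} \approx 0.86 \ \text{(main field-based)}.
$$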
The hypothesis of Moya et al. that lexicographical ordering of authors plays a role in particular subject fields was examined empirically.
The current author adopted the following approach. In a first step, all authors were divided on a field-by-field basis into two groups of approximately equal size based on the first character of their last name. Overall, 48.5 percent of author names started with characters A-K, and 51.5 percent with characters L to Z, but there are differences among subject fields. If lexicographical ordering of authors plays a role in a field, one would expect to find among first authors a higher fraction of authors whose names start with A-K than there are in the total population of authors publishing in that field. Using the Science-Metrix main field classification, an over-representation of A-K first authors was found for the fields mentioned in the main text. For all other fields, it was zero. The outcomes do not allow one to estimate the actual number of papers using alphabetical authorship. It must be noted that an observed alphabetical order in a paper does not necessarily imply that the authors decided to order their names alphabetically.
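The test described in this note can be sketched as follows. The sketch assumes a hypothetical table `df` with one row per paper-author combination and columns `field`, `last_name` and `is_first_author`; these column names are illustrative and do not reflect the actual dataset schema.

```python
import pandas as pd

def ak_overrepresentation(df: pd.DataFrame) -> pd.Series:
    """Per field: share of A-K surnames among first authors minus the share
    of A-K surnames among all authors, in percentage points."""
    is_ak = df["last_name"].str.upper().str[0].between("A", "K")
    share_all = is_ak.groupby(df["field"]).mean()
    first = df["is_first_author"]
    share_first = is_ak[first].groupby(df.loc[first, "field"]).mean()
    return (100 * (share_first - share_all)).round(1)
```

A positive value signals over-representation of A-K first authors in a field; as noted above, a positive outcome does not by itself prove that authors deliberately ordered their names alphabetically.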
Using a qualification often used in other domains of society, one could facetiously characterize the AMD as a dataset primarily about “(baby) boomers.”
See, for instance, the notes on size-dependent versus size-independent indicators related to the Leiden Ranking, available on the website of the Leiden Ranking.
The application should be created by experienced and independent researchers assisted by professional developers and fully free to share whatever information they find relevant during the development process with the wider research community, including theoretical issues related to its functionality, preliminary functional designs, mock-ups, intermediate versions and the formation of test and focus groups. It would be essential to conduct well-designed tests of the tool and to monitor its use after its introduction. If it appeared that the tool led to an even more inconsiderate use of bibliometrics in research assessment, its introduction would be counter-productive, and its design would have to be reconsidered.
The author wishes to thank two anonymous reviewers for their valuable comments on an earlier version of this paper.
The author has no competing interests to declare.
Former senior staff member (1981-2009) at the Centre for Science and Technology Studies, Leiden University, and former senior scientific advisor (2010-2014) with Elsevier, Amsterdam, the Netherlands. The author is currently an independent researcher and a scientific advisor to the SCImago Research Group, Spain.