The top 100 most-cited papers of all time

I wrote earlier about the 50th anniversary of the Science Citation Index. Recently, Nature got together with Thomson Reuters, the publishers of the Science Citation Index (now usually known as the Web of Science), to come up with a list of the 100 most-cited papers of all time.[1] It’s an interesting list, which I encourage you to take a look at. Let’s face it: top-100 lists are always fun. Who is in there? Who is not? The Nature article provides a few reflections on this. For my part, I’m going to look at what this list tells us about citation patterns in different areas of science, focusing particularly on an area I know well, density-functional theory, and one with which I have only a tangential acquaintance, NMR.

There are, as the Nature article pointed out, a large number of papers in the top 100 from the field of density-functional theory (DFT). I may have missed some, but here are the ones I noticed: Lee, Yang and Parr (1988) [2] at #7, Becke (1993) [3] at #8, Perdew, Burke and Ernzerhof (1996) [4] at #16, Becke (1988) [5] at #25, Kohn and Sham (1965) [6] at #34, Hohenberg and Kohn (1964) [7] at #39, Perdew and Wang (1992) [8] at #93, and Vosko, Wilk and Nusair (1980) [9] at #96.

So what is DFT, anyway? One of the great problems in electronic structure calculations for molecules is electron correlation. Electrons repel, so they tend to stay away from each other. Classic methods of electronic structure calculation don’t properly take electron correlation into account. There are ways to put electron correlation back in after the fact, but they’re either not very accurate or they take a huge amount of computing. Another problem arises because of exchange, a strange quantum mechanical effect that causes electrons with the same spin to stay away from each other more so than simple electrostatics would dictate (i.e. more than would be the case for electrons with opposite spin). DFT is based on theory developed by Kohn and his collaborators in the 1960s (papers #34 and #39 on Nature‘s list), which essentially states that there is a functional of the electron density that describes electron correlation and the exchange interaction exactly. Modern DFT is based on approximating this functional, usually as separate exchange and correlation parts, and often semi-empirically. Using good DFT exchange and correlation functionals allows us to do very accurate electronic structure calculations much more quickly than is possible with older methods. The one catch is that we don’t really know what the exchange and correlation functionals should be, so there’s a lot of work to be done coming up with good functionals and validating them. Nevertheless, the current crop of functionals does a pretty good job in many cases of chemical interest.
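In equation form, and in atomic units, the Kohn-Sham framework writes the ground-state energy as a functional of the electron density ρ roughly as follows. The symbols are my own shorthand rather than notation taken from the papers discussed here. T_s is the kinetic energy of a fictitious system of non-interacting electrons with the same density. v_ext is the potential due to the nuclei. The double integral is the classical electron-electron repulsion. E_xc collects everything else. Only E_xc has to be approximated, and it is usually split into exchange and correlation parts, as in the second expression.

```latex
E[\rho] = T_s[\rho]
        + \int v_{\mathrm{ext}}(\mathbf{r})\,\rho(\mathbf{r})\,d\mathbf{r}
        + \frac{1}{2}\iint \frac{\rho(\mathbf{r})\,\rho(\mathbf{r}')}{|\mathbf{r}-\mathbf{r}'|}\,d\mathbf{r}\,d\mathbf{r}'
        + E_{xc}[\rho],
\qquad
E_{xc}[\rho] = E_x[\rho] + E_c[\rho]
```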

To understand the DFT citation patterns a bit better, I used the Web of Science to count, for each pair of these papers, the number of papers citing both. Here’s what I found:

              LYP 88  Becke 93    PBE 96  Becke 88     KS 65     HK 64     PW 92    VWN 80
LYP 88         48653     33303      3498     17608      3305      2917      2114      5320
Becke 93                 48041      3266     11118      2718      2499      2469      4284
PBE 96                             38281      2948      5405      5040      2576      1647
Becke 88                                     27370      2734      2332      2246      5821
KS 65                                                  23840     15129      2028      1955
HK 64                                                            22608      1750      1656
PW 92                                                                      13173      1260
VWN 80                                                                               12862

Hopefully the abbreviations I’m using here are clear enough: LYP 88, for example, is Lee, Yang and Parr (1988). The entries on the diagonal are the total numbers of citations to the corresponding papers. Each off-diagonal entry is the number of papers citing both the row paper and the column paper. This matrix is necessarily symmetric about its diagonal, so I didn’t fill in the entries below the diagonal. Note that the total citations for each paper differ somewhat from those reported in Nature‘s spreadsheet. I performed my analysis at a later point in time, and these papers continue to accumulate citations at an astonishing rate.
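If you want to play with the table yourself, the short Python sketch below rebuilds the full symmetric matrix from the upper triangle. It then computes the fraction of papers citing one paper that also cite another, which reproduces the 68% and 36% figures discussed below. The function and variable names are mine and purely illustrative.

```python
# Co-citation counts from the table above, upper triangle only, row by row.
# Diagonal entries are total citations; off-diagonal entries count papers
# citing both papers in the pair.
LABELS = ["LYP 88", "Becke 93", "PBE 96", "Becke 88",
          "KS 65", "HK 64", "PW 92", "VWN 80"]

UPPER = [
    [48653, 33303, 3498, 17608, 3305, 2917, 2114, 5320],
    [48041, 3266, 11118, 2718, 2499, 2469, 4284],
    [38281, 2948, 5405, 5040, 2576, 1647],
    [27370, 2734, 2332, 2246, 5821],
    [23840, 15129, 2028, 1955],
    [22608, 1750, 1656],
    [13173, 1260],
    [12862],
]


def full_matrix(upper):
    """Fill in the lower triangle using the symmetry M[i][j] == M[j][i]."""
    n = len(upper)
    m = [[0] * n for _ in range(n)]
    for i, row in enumerate(upper):
        for offset, value in enumerate(row):
            j = i + offset  # row i starts at the diagonal element
            m[i][j] = value
            m[j][i] = value
    return m


def co_citation_fraction(m, i, j):
    """Fraction of the papers citing paper i that also cite paper j."""
    return m[i][j] / m[i][i]


M = full_matrix(UPPER)
lyp = LABELS.index("LYP 88")
b93 = LABELS.index("Becke 93")
b88 = LABELS.index("Becke 88")
print(f"{co_citation_fraction(M, lyp, b93):.0%}")  # 68%
print(f"{co_citation_fraction(M, lyp, b88):.0%}")  # 36%
```

Note that the fraction is not symmetric, because it is normalized by the total citations of the first paper. Pairwise counts alone also cannot give the share of papers citing at least one of two others. By inclusion-exclusion, that share needs the number of papers citing all three, which the table does not contain.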

A few numbers jump out from this table. The top two DFT papers, Lee, Yang and Parr (1988) and Becke (1993), are cited together with very high frequency: 68% of the papers citing Lee, Yang and Parr (1988) also cite Becke (1993). Although cited together less often, Becke (1988) is also frequently co-cited with Lee, Yang and Parr (1988): 36% of the papers citing the latter also cite Becke (1988). Now if we ask how many of the papers citing Lee, Yang and Parr (1988) also cite at least one of the Becke papers, we find that an astonishing 85% do. This is, of course, no coincidence. One of the most popular exchange-correlation functionals around, B3LYP, combines Becke’s 1988 exchange functional and a fraction of exact exchange with the Lee, Yang and Parr correlation functional, following the three-parameter hybrid scheme introduced in Becke’s 1993 paper. People who use the B3LYP functional in calculations will usually cite Lee, Yang and Parr (1988) along with at least one of the Becke papers. So if one of these papers were to appear in the top-100 list, it was likely that all three would, as indeed they do. The appearance of these papers in the top-100 list is therefore a testament to the heavy use made in the chemical literature of the exchange-correlation functionals developed by these authors. In fact, all of the DFT papers in the top-100 list describe functionals that are heavily used in applications, except for the Kohn papers, which provided the underlying theory.
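For concreteness, here is B3LYP written out in the form commonly used in quantum chemistry programs. I give it as a sketch of the usual definition, not as the exact expression from any one of the papers cited here. E_x^HF is exact (Hartree-Fock) exchange, and ΔE_x^B88 is Becke’s 1988 gradient correction to local exchange. The three coefficients are the values Becke fitted in his 1993 paper. The local correlation term is a Vosko-Wilk-Nusair functional, though exactly which VWN parametrization is used varies between programs. This may also help explain why Vosko, Wilk and Nusair (1980) makes the top-100 list.

```latex
E_{xc}^{\mathrm{B3LYP}} = (1 - a_0)\,E_x^{\mathrm{LSDA}} + a_0\,E_x^{\mathrm{HF}}
  + a_x\,\Delta E_x^{\mathrm{B88}}
  + (1 - a_c)\,E_c^{\mathrm{VWN}} + a_c\,E_c^{\mathrm{LYP}},
\qquad a_0 = 0.20,\; a_x = 0.72,\; a_c = 0.81
```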

One of the points made by the authors of the Nature article is that papers describing methods get cited much more than papers introducing new ideas into science. So why do the Kohn papers appear in this list? I would argue that this is due to a quirk of citation among people who do DFT calculations. The vast majority of citations to these papers come from people who do DFT calculations, not from people further developing the Hohenberg-Kohn-Sham theory. To see how strange this is, consider that the overwhelming majority of people doing DFT calculations and citing these papers use software written by someone else, usually commercial software like Gaussian. Ordinary users of a computational method don’t usually “dig down” to the theory layer in their citations in this way. For example, the vast majority of modern quantum chemical calculations (including most DFT calculations) are based on Roothaan’s classic work on self-consistent-field calculations.[10] Roothaan’s two papers have been cited 4535 and 1828 times, respectively. These are extremely high citation counts, but they are a tiny fraction of the literature reporting calculations based on Roothaan’s algorithms. So it’s a bit strange that DFT users cite Kohn’s work at such a high rate, particularly since other foundational papers in quantum chemistry, such as Roothaan’s, are not cited nearly as routinely.

Now let’s contrast the citation record of DFT with that of NMR. NMR (nuclear magnetic resonance) spectroscopy is used on a daily basis by every synthetic chemistry group in the world, and by many physical and analytical chemistry laboratories as well. Chemists will typically back up NMR measurements with other techniques, but NMR is how they identify the compounds they have made and determine their structures. One would think that papers describing fundamental NMR techniques or popular experiments would make this list. They don’t. There is a single NMR-related paper in the list, showing up at #69, and it describes a software program for analyzing both crystallography and NMR data. That’s it. So why is that? It’s certainly not that there are more DFT papers than papers that use NMR; in fact, the reverse is true. However, when experiments become sufficiently common, chemists stop citing their original sources. I was just looking at a colleague’s paper in which he mentioned six different NMR experiments in addition to the usual single-nucleus spectra. A literature reference was given for only one of these experiments, presumably because he felt the others were sufficiently well known that they didn’t need references. The equivalent practice in DFT would be not to cite anything when using the B3LYP functional, on the basis that everybody knows this functional. That’s quite a difference in citation practices between two areas of chemistry! And the fascinating thing is that these two fields have overlapping membership: there are lots of synthetic chemists who do DFT calculations to support their experimental work. And for some reason, they behave differently when describing DFT methods than when describing NMR methods.

To understand the vast difference in citation practices between these two areas, let’s look at a specific example. In many ways, two-dimensional NMR, in which signals are spread along a second dimension that encodes additional molecular information, closely parallels DFT. The methods were developed at about the same time, and hardware that could carry them out routinely became available to ordinary chemists at around the same time in both fields. Both also opened up what could be done in their respective fields. The first two-dimensional NMR experiment, COSY, was proposed in 1971 by Jean Jeener.[11] It’s not entirely trivial to hunt down citations to papers in conference proceedings in the Web of Science because they are not cited in any consistent format. After a bit of work, and including the reprinting of these lecture notes in a collection a few decades later, I found approximately 352 citations to Jeener’s epoch-making paper. Compare that to the 23840 citations to the Kohn-Sham (1965) paper. One could argue that Jeener’s paper was published in an obscure venue and that this depressed the number of citations it received, which is certainly plausible. But Jeener’s proposal was implemented by Aue, Bartholdi and Ernst in 1976,[12] and that paper has been cited 2919 times. This is a far cry from the number of citations accumulated by the Kohn papers, or by the “applied” DFT papers describing practical functionals. Kohn shared the 1998 Nobel Prize in Chemistry; Ernst was awarded the 1991 Nobel Prize in Chemistry. There are many ways in which the two contributions are comparable, but citation counts are not one of them. And clearly, it’s not a matter of the popularity of the methods. I used the ACS journal web site to see how many papers in the Journal of Organic Chemistry mentioned the COSY experiment. The Journal of Organic Chemistry by its nature contains mostly papers reporting the synthesis and characterization of compounds, so it’s a good place to gauge the extent to which an experimental method is used. In that one journal alone, 6351 papers mention COSY. To be fair, some of these mentions will be of descendants of the original COSY experiment (of which there are many). Still, the very large number of COSY papers and the relatively small number of citations to the early COSY papers speak to wildly different citation cultures among NMR and DFT practitioners.
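To put the numbers in this comparison side by side, here is a trivial Python calculation that uses only the counts quoted above. As noted, the Jeener figure is approximate. The calculation simply divides the Kohn-Sham citation count by each of the NMR counts. The dictionary keys and function name are my own labels.

```python
# Citation counts quoted in this post (Web of Science, at the time of writing).
# The Jeener figure is approximate because conference proceedings are cited
# inconsistently.
CITATIONS = {
    "Kohn and Sham (1965)": 23840,
    "Jeener (1971)": 352,
    "Aue, Bartholdi and Ernst (1976)": 2919,
}


def citation_ratio(paper_a, paper_b):
    """Return how many times more often paper_a has been cited than paper_b."""
    return CITATIONS[paper_a] / CITATIONS[paper_b]


for nmr_paper in ("Jeener (1971)", "Aue, Bartholdi and Ernst (1976)"):
    ratio = citation_ratio("Kohn and Sham (1965)", nmr_paper)
    print(f"Kohn and Sham (1965) vs {nmr_paper}: {ratio:.1f}x")
# Kohn and Sham (1965) vs Jeener (1971): 67.7x
# Kohn and Sham (1965) vs Aue, Bartholdi and Ernst (1976): 8.2x
```

In other words, Kohn and Sham (1965) has roughly 68 times as many citations as Jeener’s lecture notes and roughly 8 times as many as Aue, Bartholdi and Ernst (1976).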

None of this is intended to denigrate the work of the excellent scientists whose papers have made the top-100 list. They clearly deserve a very large pat on the back. However, it does show that we have to be extraordinarily careful in comparing citation rates even between very closely related fields. And these rates will of course also affect citation-based metrics like the h-index, perhaps not in extreme cases like the highly cited papers mentioned here, but certainly in the case of authors whose papers are well cited, if not insanely well cited.
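To illustrate how a field’s citation culture feeds into the h-index, the Python sketch below computes h for a made-up publication record. Here h is the largest number such that h of an author’s papers have at least h citations each. The sketch then recomputes h for the same record with every count tripled. The record and the factor of three are hypothetical, chosen only to show the mechanism.

```python
def h_index(citation_counts):
    """Largest h such that h papers have at least h citations each."""
    ranked = sorted(citation_counts, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank
        else:
            break  # counts only decrease from here, so no larger h is possible
    return h


# Hypothetical publication record, made up purely for illustration.
record = [60, 41, 30, 22, 18, 15, 12, 9, 7, 5, 3, 1]

# Assumption: the same work, cited in a field whose citation culture yields
# three times as many citations per paper.
scaled = [3 * c for c in record]

print(h_index(record))  # 8
print(h_index(scaled))  # 10
```

In this invented example, tripling every count raises h from 8 to 10. A single paper with tens of thousands of citations, by contrast, can only ever add 1 to h. The effect therefore shows up most clearly for authors whose papers are well cited rather than insanely well cited.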

In the interests of full disclosure: Axel Becke, whose name features so prominently in the top-100 list and in this blog post, supervised my senior research project when I was an undergraduate student at Queen’s. My first scientific paper was coauthored with Axel.[13] In fact, I may have benefited from the higher citation rates in DFT, as this paper is by far my most cited. I sometimes joke that my career has been all downhill since this very first scientific contribution. But to figure out whether that’s true, we would have to take into account the citation practices of the various areas I’ve worked in…

[1] R. van Noorden, B. Maher and R. Nuzzo (2014) The top 100 papers. Nature 514, 550–553.

[2] C. Lee, W. Yang and R. G. Parr (1988) Development of the Colle-Salvetti correlation-energy formula into a functional of the electron density. Phys. Rev. B 37, 785–789.

[3] A. D. Becke (1993) Density-functional thermochemistry. III. The role of exact exchange. J. Chem. Phys. 98, 5648–5652.

[4] J. P. Perdew, K. Burke and M. Ernzerhof (1996) Generalized gradient approximation made simple. Phys. Rev. Lett. 77, 3865–3868.

[5] A. D. Becke (1988) Density-functional exchange-energy approximation with correct asymptotic behaviour. Phys. Rev. A 38, 3098–3100.

[6] W. Kohn and L. J. Sham (1965) Self-consistent equations including exchange and correlation effects. Phys. Rev. 140, A1133–A1138.

[7] P. Hohenberg and W. Kohn (1964) Inhomogeneous electron gas. Phys. Rev. 136, B864–B871.

[8] J. P. Perdew and Y. Wang (1992) Accurate and simple analytic representation of the electron-gas correlation energy. Phys. Rev. B 45, 13244–13249.

[9] S. H. Vosko, L. Wilk and M. Nusair (1980) Accurate spin-dependent electron liquid correlation energies for local spin density calculations: a critical analysis. Can. J. Phys. 58, 1200–1211.

[10] C. C. J. Roothaan (1951) New developments in molecular orbital theory. Rev. Mod. Phys. 23, 69–89; (1960) Self-consistent field theory for open shells of electronic systems. Rev. Mod. Phys. 32, 179–185.

[11] J. Jeener (1971) Lecture notes from Ampere Summer School in Basko Polje, Yugoslavia. Reprinted in NMR and More in Honour of Anatole Abragam, Eds. M. Goldman and M. Porneuf, Les Éditions de Physique (1994).

[12] W. P. Aue, E. Bartholdi and R. R. Ernst (1976) Two-dimensional spectroscopy. Application to nuclear magnetic resonance. J. Chem. Phys. 64, 2229–2246.

[13] A. D. Becke and M. R. Roussel (1989) Exchange holes in inhomogeneous systems: A coordinate-space model. Phys. Rev. A 39, 3761–3767.