Category Archives: Scientific miscellanea

Three ways to write a book

My third textbook, Foundations of Chemical Kinetics: A Hands-On Approach, was just published a few weeks ago. (The link in the previous sentence will be convenient if your institution has a subscription to the IOP books. If you want to buy a copy, try https://store.ioppublishing.org/page/detail/Foundations-of-Chemical-Kinetics/?K=9780750353199. And if you’re thinking of adopting this book in one of your courses, the latter page also contains a link to order an inspection copy.) The publication of this book caused me to start thinking about the very different ways my three books have come about. Those of you thinking of writing a book, or just curious about the book writing process, may want to continue reading. Otherwise, you can wait for the next installment in this increasingly irregular blog.

Adagissimo: A Life Scientist’s Guide to Physical Chemistry

When I arrived at the University of Lethbridge, I was handed the second-year thermodynamics course as one of my teaching assignments. When choosing the textbook for this course, I made the proverbial rookie mistake: I picked a book I liked, forgetting that the book is for the students, and not for me. Among other things I didn’t think about were: What background do the students bring to this class? What are their likely scientific interests? And of course, how might my perspective differ from that of a student taking their first steps in themodynamics? I won’t say what book I picked because these were my mistakes and not the author’s. Let’s just say that I picked a book that would have been great for students in a specialist program with more of a mathematics background than is required of the group of Chemistry and Biochemistry students in my class.

Once I realized my mistake, I started looking more seriously at the range of physical chemistry textbooks on the market. I was looking for a book that was approachable, that included some biochemical topics (because the majority of the students in the course were Biochemistry majors), but that still didn’t compromise too much on rigor. I wanted students to understand the principles of thermodynamics, and not just learn some formulas and how to apply them. I eventually settled on Tinoco, Sauer and Wang’s Physical Chemistry: Principles and Applications in Biological Sciences, an excellent book that, I thought, struck roughly the right balance. TSW was the required textbook in my thermo course for about three years.

I honestly don’t remember exactly when I decided to write my own book. I do remember thinking that what I was writing was a set of course notes, and that I needed to write these because Tinoco, Sauer and Wang didn’t go into quite enough depth on some topics that I thought were important. The idea that this could be a publishable book didn’t enter my head for quite a while. From the time stamps on the files, it looks like I started writing sometime during my first term of teaching in the Fall of 1995. By the Spring of 1998, the third time I taught the course, I had written a set of notes titled Practical Thermodynamics that I distributed through the UofL bookstore and that was offered to students as a supplement to Tinoco, Sauer and Wang. By the Fall of 1999, the roles were reversed: my book was required and Tinoco, Sauer and Wang was recommended. By the Fall of 2001, I had stopped recommending a second textbook.

In the meantime, I had also started teaching our chemical kinetics course, which was also a second-year offering. I used Tinoco, Sauer and Wang in Spring 2000, when I first took over the course, then switched to Laidler and Meiser’s Physical Chemistry the next year. (Laidler was a kineticist, so it shouldn’t be a great surprise that the kinetics in his physical chemistry textbook is excellent.) By 2002 I was distributing a self-written textbook through the bookstore entitled Chemical Kinetics, and not requiring a traditional textbook.

Once in a while, academics will decide to shake up a curriculum. We went through this process in the mid-2000s in order to try to sort out some problems we were having delivering our courses. In a nutshell: too many required courses, which didn’t give us a lot of flexibility in terms of teaching assignments and could be a problem when people were not available. One of the results of this curricular shakeup was the merging of the thermodynamics and kinetics courses. While this wasn’t exactly a do-over for me, it did require a lot of work to rearrange what I had, jettisoning about half of the material, resulting in something more-or-less like the book that was eventually published.

By this time, what I had was clearly a textbook. Not only did I have carefully constructed text laying out the ideas and key equations, but I had a large collection of problems from assignments and tests which I had been integrating into the book over the years. Sometime in 2009, I started looking for a publisher. One of the publishers I contacted was Cambridge University Press. Some decades earlier, they had published Morris’s A Biologist’s Physical Chemistry, a book that I thought had a lot in common with mine. Much to my delight, Cambridge agreed to be my publisher. They provided lots of support and advice along the way. The final version of the book was sent to Cambridge in May 2011.

I started working on the book (although I didn’t know it at the time) in 1995 and was done in 2011. There was progress on this book every year during that period, albeit not always at the same intensity. You can think of A Life Scientist’s Guide to Physical Chemistry as a book written at an adagisimmo tempo over a period of 16 years.

Staccato andante: Nonlinear Dynamics: A Hands-On Introductory Survey

This book also started out as a set of lecture notes, in this case for a graduate course in nonlinear dynamics that I first taught in 2004, with a second offering in 2005. These notes were posted to my web site, where they still live. And that’s where they sat for a long time.

In 2018, I received an email from Nicki Dennis, who at the time was an acquisitions editor for the Institute of Physics (IOP) Concise Physics series. Nicki had somehow run across the lecture slides (just slides, not notes) for my Foundations of Chemical Kinetics course on my web site, a course that had been offered just once, in 2012. (At small universities, what you teach and when you teach it has more to do with what the Department and the students need than with what might be optimal for the instructor. Advanced courses in particular can be taught at very long intervals.) She thought I might want to turn this course into a book. I didn’t say no, but I knew that turning the Foundations course into a book would be a lot of work, especially since I was keen to revise the course after offering it once. As I always tell junior faculty members, the second time you offer a course is often when you put the most work into it because by then, you actually know what you want to do. Getting back to the story, I didn’t say no, but I didn’t say yes. Instead, I pitched Nicki the idea that I would turn my nonlinear dynamics lecture notes into a book, and that we could talk about the Foundations of Chemical Kinetics book later. Nicki and the IOP agreed, and I got to work revising my notes and turning them into a short book. That process took a very short time. I added some examples to my notes, expanded the treatment in a few places, converted assignments from the course into problem sections in the book, and in just a few months I was done.

You can think of this one as a stop-and-start (staccato) time investment, with each period of work on the book lasting just a few months. So a book produced andante, even though there is a long period from the first time I set fingers to keyboard to the completion of the manuscript.

Presto: Foundations of Chemical Kinetics: A Hands-On Approach

In May 2021, I received an email from John Navas at the IOP, who mentioned the possibility of a new edition of the nonlinear dynamics book. I took this as an opportunity to bring up the kinetics book again, since I was going to be teaching my Foundations course the following Fall term. I had in mind a complete reworking of the course with, as the title of the book suggests, more hands-on instruction and exercises than existing textbooks in the area provided. So my utterly daft plan was to write the book as the course was unfolding (with some work done in the summer to get ahead of the lectures). I would feed the chapters to the students as they were completed.

If you’ve ever taken a single-term University course, you will know that the term goes by quickly. It feels even faster for the course instructor, and not just because we’re older. The students quickly caught up to the little bit of a head start I had built up in the summer, and then the chapters were coming out just before we covered the material in class, and eventually a little bit after. But I got through it! The result of this initial round of writing was, as you can imagine, not very polished, but over a few months, I cleaned it up, and now it’s out into the world!

Definitely a book written at a presto tempo. Perhaps even vivace. I’m not sure I would recommend writing a book this way to anyone else. But it’s doable, provided you allow a few months afterwards to clean up your first draft.

Some reflections

I’m sure there are many other ways to write books, but these three definitely span the range of timescales over which one might write something worth reading: slowly refined over many years, written and refined over multiple short bursts, or the strike while the iron is hot approach of Foundations in Chemical Kinetics. In the end, I think that there are a few keys to writing a book that all of these different scenarios share:

You won’t write anything if you don’t actually sit down and start typing. Perhaps you don’t even intend to write a book, but anything you type and preserve is potentially material for a book, even if it’s just a set of lecture slides, or some original problems that you designed for your students.
It may not be when you first start out, but you eventually need to develop a clear concept of the book, who it’s for, what approach you will take, and the style you intend to use. My books tend to use an informal style and, as the titles of my two most recent books suggest, to include a fair bit of hands-on practice. I’m particularly keen on teaching students computing skills which, weirdly at this point in the 21st century, is often a neglected dimension of their educations. Both the Nonlinear Dynamics and Foundations of Chemical Kinetics books include instruction in some general computational skills (e.g. programming in Matlab/Octave or symbolic computing in Maple) and some instruction in discipline-specific software (Xppaut in one case, Gaussian in the other).
At some point, you need to decide that you’re ready to crank out a book. When you contact a publisher, they’re looking for something they can publish sooner rather than later. At that point, you need to be able to set time aside to meet mutually agreed deadlines. The more is already done, the better shape you will be in to deliver. And note that they will generally want to see sample chapters before they offer you a contract, unless they already have a relationship with you.
There are going to be some long nights, no matter what your starting point.

In praise of the late H. T. Banks

H. Thomas Banks is one of those people I wish I had had a chance to meet. Unfortunately, he died December 31st of last year, so that won’t be happening. Given that I greatly admired his work, and on the assumption that some young scientists read this blog, I thought I would say a few words about some of Banks’ papers that I particularly enjoyed.

H. T. Banks, for those of you who may not have heard of him, was an outstanding applied mathematician. He had wide interests, but most interesting to me was his extensive work on delay-differential equations, given my own interest in the subject.

The first Banks paper I read was a 1978 joint paper with Joseph Mahaffy on the stability analysis of a Goodwin model. Looking for oscillations in gene expression models was a popular pastime in those days. In some ways, it still is. This paper stood out for me as a careful piece of mathematical argument showing that a certain class of models could not oscillate. The paper also contained a solid discussion of the biological relevance of the results. Discovering oscillations in a model may be fun for those of us who enjoy a good bifurcation diagram, but most gene expression networks probably evolved not to oscillate. How much of that lovely discussion was due to Banks, and how much to Mahaffy, I cannot say. But a lot of Banks’ work was just as careful about the relevance of the results to the real world.

Much more recently, Banks was involved in a lovely piece of mathematics laying down the foundations for sensitivity analysis of systems with delays, particularly for sensitivity with respect to the delays. Sensitivity analysis is a key technique in a lot of areas of modelling. The basic idea is to calculate a coefficient that tells us how sensitive the solution of a dynamical system is to a parameter. There are many variations on sensitivity analysis, which you can read about in a nice introductory paper by Brian Ingalls. The Banks paper provided a basis for doing this with respect to delays, and was a key foundation stone for our work work on this topic.

Some years ago, we developed a method for simulating stochastic systems with delays. Our intention was for this method to be used to model gene expression networks. I was therefore pleased and surprised when I discovered that Banks had used our algorithm to study a pork production logistics problem. That just shows what an applied mathematician with broad interests can do with a piece of science developed in another context. Banks and his colleagues went a bit further than just studying one model, looking a models with different treatments of the delays, and finding that these led to different statistical properties, which would of course be of great interest if you were trying to optimize a supply chain.

The few examples above show a real breadth of interests, both mathematically and in terms of applications. You can get an even better idea of how broad his interests were by scanning his list of publications. There are papers there on control theory, on HIV therapeutic strategies, on magnetohydrodynamics, on acoustics, … Something for just about every taste in applied mathematics. There is a place for specialists in science, but often it’s the people who straddle different areas who can make the most important contributions by connecting ideas from different fields. I think that Banks was a great example of a mathematician who cultivated breadth, and was therefore able to have a really broad impact.

So I’m really sorry I never got to meet H.T. Banks. I think I would have enjoyed knowing him.

(If you’re wondering why I’m so late with this blog post: I found out about Banks’ passing from an obituary in the June SIAM News, which because of the pandemic I didn’t get my hands on until about a month ago.)

50 years of Physical Review A

In the beginning, there was the Physical Review, and it was good. So good in fact that it soon started to grow exponentially. At an event celebrating the 100th anniversary of the Physical Review in 1993, one unnamed physicist quipped that “The theory of relativity states that nothing can expand faster than the speed of light, unless it conveys no information. This accounts for the astonishing expansion rate of The Physical Review” (New York Times, April 20, 1993). (At the risk of sounding like Sheldon Cooper, if this physics joke went over your head, this post is probably not for you.) As a result of the rapid growth of the Physical Review, in 1970, it was split into four journals, Physical Review A, B, C and D. One factor that drove this split was that many scientists had personal subscriptions to print journals at that time. (I still have one, although not to a member of the Physical Review family.) In its last year, the old Physical Review published 60 issues averaging over 400 pages each. That’s another 400-page issue roughly every 6 days. Most of the material in each issue would have been completely irrelevant to any given reader. You can imagine the printing and shipping costs, the problem of storing these journals in a professor’s office, not to mention the time needed to identify the few items of interest in these rapidly accumulating issues. So splitting the Physical Review, which in some sense had started in 1958 when Physical Review Letters became a standalone journal, was perhaps inevitable.

The new journals spun out of the Physical Review were to be “more narrowly focused”, which is, of course, a relative thing. Four journals were still to cover the entire breadth of physics. Each of the sections was correspondingly broad: PRB covered solid-state physics, C covered nuclear physics, D covered particles and fields, and Phys. Rev. A covered… everything else: the official subtitle of PRA at the time was “General Physics”, which included atomic and molecular physics, optics, mathematical physics, statistical mechanics, and so on.

Physical Review A, now describes itself as “covering atomic, molecular, and optical physics and quantum information”, other topics having over time been moved out to other journals. Physical Review E in particular was split out from PRA in 1993 to cover “Statistical Physics, Plasmas, Fluids, and Related Interdisciplinary Topics”. (That description has changed over the years as well, as the process of splitting and redefining journal subject matter continues. PRE is now said to cover “statistical, nonlinear, biological, and soft matter physics”. Physical Review Fluids was born in 2016 to pick up some of the material that would formerly have been in PRE.) Despite the evolution of PRA, one thing that hasn’t changed is that it has been an important journal for chemical physics right from the day it was born. This year marking the 50th anniversary of Physical Review A, and given that I trained in chemical physics at Queen’s and at the University of Toronto, I thought it would be a good time for me to write a few words about this journal. As with all of my blog posts, this will be a highly idiosyncratic and personal history.

I thought it would be fun to start by looking at the contents of the very first issue of PRA. Atomic and molecular physics featured prominently in this issue, with several papers either reporting on the results of theoretical calculations, or on the development of computational methods for atomic and molecular physics. Interestingly, the entire issue contained just one experimental paper. I suspect that this is an artifact of the period of time in which this first issue appeared. The atomic and molecular spectroscopy experiments that could be done using conventional light sources had mostly been done, and lasers, which would revolutionize much of chemical physics in the decades to follow, were not yet widely available in physics and chemistry laboratories.

One of the things that struck me on looking at this first issue is how short papers were in 1970. Excluding comments and corrections, the first issue contained 27 papers in 206 pages, so the average length of a paper in this issue was just under 8 pages. The papers in the first issue ranged from just 2 pages to 16. Eleven of these papers ran to four pages or less. And remember, Physical Review Letters was spun out more than two decades earlier, so there was already a venue for short, high-priority communications. Other than in letters journals like PRL, we don’t see many short papers anymore, and even in PRL, two- or three-page papers are a rarity. The “least publishable quantum” has grown over time, and the ease with which graphics can be generated has resulted in an explosion of figures in modern papers. I suspect, too, that concise writing isn’t as highly valued now as it was in 1970.

As is often the case in anniversary years, Phys. Rev. A has created a list of milestone papers. This list includes several classic papers on laser-cooling of atoms, a technique for obtaining ultra-cold atoms in atom traps, i.e. atoms very close to their zero-point energy within the trap. Because this almost entirely eliminates thermal noise, this technique allows for very high precision spectroscopic measurements, and therefore for very sharp tests of physical theories. Interestingly, in ion traps, the mutual repulsion of the ions causes them to crystallize when they are sufficiently cooled, which was the topic of one of my two papers in Phys. Rev. A.

The list of milestone papers also includes Axel Becke’s classic paper on exchange functionals with correct asymptotic behaviour. I have mentioned Becke’s work in this blog before, in my post on the 100 most-cited papers of all time, a list on which two of his papers appear. And as I mentioned there, Axel Becke was the supervisor of my undergraduate senior research project, resulting in my first publication, which also appeared in Phys. Rev. A. If you pay any attention at all to lists of influential papers and people, Axel’s name keeps popping up, and not without reason. He has been one of the most creative people working in density-functional theory for some decades now. Interestingly, Axel has only published three times in PRA, and I’ve just mentioned two of those papers. (Axel’s favourite publication venue by far has been The Journal of Chemical Physics.) His only other paper in PRA, published in 1986, was on fully numerical local-density-approximation calculations in diatomic molecules.

Many beautiful papers in nonlinear dynamics were published in Phys. Rev. A before the launch of Phys. Rev. E. I will mention just one of the many, many great papers I could pick, namely a very early paper on chaotic synchronization by Pecora and Carroll. Chaotic synchronization, which has potential applications in encrypted communication, became a bit of a cottage industry after the publication of this paper. I believe that the Pecora and Carroll paper was the first to introduce conditional Lyapunov exponents, which measure the extent to which the response to chaotic driving is predictable.

Currently, my favourite Phys. Rev. A paper is a little-known paper on radiation damping by William L. Burke, from volume 2 of the journal. This is a wonderful study in the correct application of singular perturbation theory that also contains a nice lesson about what happens when the theory is applied incorrectly. If you teach singular perturbation theory, this might be a very fruitful case study to introduce to your students.

I could go on, but perhaps this is a good place to stop. PRA has been a central journal for chemical physics throughout its 50 years. While PRE picked up many topics of interest to chemical physicists, PRA remains a key journal in our field. Until the Physical Review is reconfigured again, I think it’s safe to say that PRA will continue to be a central journal in chemical physics.

SIAM Review 60th volume

This year marks the publication of the 60th volume of the venerable SIAM Review. As has become traditional when journals mark anniversaries, the editors of SIREV have compiled a list of the journal’s 10 most read articles. These lists are always interesting, both for what shows up and for what is missing (from my purely subjective point of view).

Number 1 on the list is a modern classic, The Structure and Function of Complex Networks by Mark Newman. At the time this paper appeared in 2003, network science was just getting hot. Newman’s review, which laid out all of the foundational ideas of the field in a very clear way, quickly became the standard reference for definitions and basic results about various kinds of networks. It didn’t hurt that Newman had recently made a splash in the scientific community by analyzing scientific collaboration networks: given that everyone’s favorite topic is themselves, scientists were naturally intrigued by a quantitative study of their own behavior. All kidding aside, Newman’s SIAM Review article has been hugely influential. All kinds of networks have been analyzed using these methods, ranging from social networks to protein interaction networks. As if having the number 1 paper in this list wasn’t enough, Newman is also a coauthor of the 2009 paper Power-Law Distributions in Empirical Data, which is number 6 on the list. The latter paper deals with statistical methods for determining whether or not a data set fits a power-law distribution.

Desmond Higham has the singular distinction of having two singly authored papers on this list, both of them from the Education section of SIAM Review, but both wonderful introductions to their topics for young scientists, or for old scientists who need to learn new tricks. At number 3 on the list, we have An Algorithmic Introduction to Numerical Simulation of Stochastic Differential Equations, which presents the simplest introduction to stochastic differential equations I have ever had the pleasure to read. Then at number 4, we have Modeling and Simulating Chemical Reactions. In the latter, Higham walks us through three levels of description of chemical equations, as Markov chains in the space of species populations, then using the chemical Langevin equation, and finally in the bulk mass-action limit. He derives each method from the preceding one, essentially by focusing on computational methods for simulating them, and then showing that these methods simplify as various assumptions are introduced. I think that these two papers of Higham’s have been successful not only because of his exceptionally clear writing, but because he also provided Matlab code for all his examples. The interested reader can therefore go from reading these papers to doing their own calculations rather quickly. I learned a lot from these papers myself, and I’ve used both of them in a graduate course on stochastic processes. They’re just fantastic resources.

One paper that didn’t appear, and that I had guessed would be there before I looked at the list, is the classic 1978 paper Nineteen Dubious Ways to Compute the Exponential of a Matrix by Cleve Moler (original developer of Matlab, and founder of MathWorks, the company that sells Matlab) and Charles Van Loan (author with the late Gene Golub of the book Matrix Computations, known by people in numerical analysis simply as “Golub and Van Loan”). It’s possible that it didn’t make the list because an expanded version of the original was published in the SIAM Review in 2003, and that this paper’s reads are therefore split between the two versions. However, it’s still a surprise. This is one of those papers that is often mentioned, in part I’m sure because of its mischievous (if accurate) title, but also because it discusses an important problem—matrix exponentials show up all over the place—and does so with exceptional clarity.

There have been lots of papers on singular perturbation theory and the related boundary-layer problems in the SIAM Review over the years, which is perhaps not surprising given how central these methods are to a lot of applied mathematics. In fact, in 1994, the SIAM Review published an issue that contained a collection of papers on singular perturbation methods. I would have thought that at least one paper on this topic would have made the list. My all-time favorite SIREV paper is in fact Lee Segel’s Simplification and Scaling, which I routinely assign as reading to graduate students who need an introduction to the basic ideas of singular perturbation theory, followed closely by Lee Segel and Marshall Slemrod’s The Quasi-Steady-State Assumption: A Case Study In Perturbation, which derives the steady-state approximation for the Michaelis-Menten mechanism using the machinery of singular perturbation theory. The full power of these methods is made evident when they derive a more general condition for the validity of the steady-state approximation than had previously been obtained. The late Lee Segel was one of the great pioneers of mathematical biology. He worked on every important problem in the field, from oscillators to pattern formation, and left us some beautiful applied mathematics. He also left us an absolutely wonderful book, Mathematics Applied to Deterministic Problems in the Natural Sciences, coauthored with Chia-Chiao Lin, who has sadly also left us. Marshall Slemrod is, fortunately, still very much alive. Marshall is probably best known for his elegant work in fluid dynamics, but he has worked on quite a variety of problems in applied mathematics over his long and distinguished career.

It’s interesting to compare SIAM list of “most read” papers to the most cited papers from SIREV (Web of Science search, Oct. 8, 2018). Here they are:

Mark Newman’s The Structure and Function of Complex Networks, cited 8333 times, more than twice as often as any other paper published in SIREV. No great surprise there.
Fractional Brownian Motions, Fractional Noises and Applications by Benoit Mandelbrot and John van Ness (3554 citations). Perhaps this one should have been on my radar, although I’ll admit that I have never read it. I’ll put it on my reading list now.
Power-Law Distributions in Empirical Data, Newman’s other entry on the most-read list, which interestingly comes out much higher in the most-cited ranking that in the most-read list, where it occupies the number 6 spot, with 2885 citations.
Semidefinite Programming by Lieven Vandenberghe and Stephen Boyd (2086 citations)
Tensor Decompositions and Applications by Tamara G. Kolda and Brett W. Bader (2042 citations, number 2 on the most-read list)
Analysis of Discrete Ill-Posed Problems by Means of the L-Curve by Per Christian Hansen (1870 citations)
The Mathematics of Infectious Diseases by Herbert W. Hethcote (1813 citations)
Atomic Decomposition by Basis Pursuit by Scott Shaobing Chen, David L. Donoho, and Michael A. Saunders (1647 citations)
On Upstream Differencing and Godunov-Type Schemes for Hyperbolic Conservation Laws by Amiram Harten, Peter D. Lax, and Bram van Leer (1493 citations). This is the kind of paper we often see on most-cited lists because it discusses practical issues in the numerical solution of PDEs.
Mixture Densities, Maximum Likelihood and the EM Algorithm by Richard A. Redner and Homer F. Walker (1256 citations)

It’s interesting, and perhaps a little surprising, how little overlap there is between the most-read and most-cited lists. Just three papers show up on both lists! This is another manifestation of the well-known problem of trying to use any single metric to determine the influence of a paper.

There are other SIREV papers that I really love, even though I wouldn’t have expected them to make this list, sometimes because of their real-world applications, and sometimes just because they describe very clearly some beautiful applied mathematics.

Bryan and Leise’s The $25,000,000,000 Eigenvector: The Linear Algebra behind Google, explains the mathematics behind the Google search engine. It’s both a great educational article on large, sparse matrix eigenvector calculations, and an interesting peek into the workings of one of the most important technologies of our time.

James Keener’s article on The Perron-Frobenius Theorem and the Ranking of Football Teams is a great read, and a fun way to introduce students to the powerful Perron-Frobenius theorem. James Keener has been one of the leading figures in mathematical biology over the last several decades, and is the author, with James Sneyd, of the highly regarded textbook Mathematical Physiology.

I also really enjoyed Diaconis and Freedman’s Iterated Random Functions, which describes some lovely mathematics that connects together Markov chains and fractals, among other things. Persi Diaconis is perhaps best known for his analysis of card shuffling and other games of chance. In fact, another paper of his in the SIAM Review (with Susan Holmes and Richard Montgomery) on Dynamical Bias in the Coin Toss is also a fantastic read.

I could go on, but I think I’ll stop here.

You may have noticed some recurring themes in this post. One is that there is some great writing in the SIAM Review. In fact, I would say that this is a hallmark of SIREV. Regardless of the author or topic, the final published paper always seems to be a great piece of scientific literature. Of course, I might be a little bit biased, having published a Classroom Note in the SIAM Review myself. Another theme of this post is the number of outstanding scientists who have written for SIREV. SIREV makes room for up and comers, but it also regularly gives us the benefit of reading papers by people who have spent decades deepening their knowledge of their respective areas.

So happy birthday, SIAM Review, and many happy returns!

What exactly do you mean by “stable”?

Stability is a highly context-dependent concept, and so it often leads to confusion among students, and sometimes among professional chemists, too.

If I say that a certain molecule is “stable”, I might mean any of a number of things:

It’s possible to make it, and it won’t spontaneously fall apart.
It’s possible to isolate a pure sample of the substance.
It won’t react with other things. This is often qualified, for example when we say that something is “stable in air”.

The trick is to pick up which one is meant from context. A recent example arose on a test question in my Chemistry 2000 class, where I asked, in a question on molecular orbital (MO) theory, if argon hydride, ArH, is a stable molecule. In this case, the “context” was in fact a lack of context: I simply asked about the stability of this molecule, without any mention of holding it (the isolable substance definition) or of bringing it into contact with anything else. Thus, I was relying on the first definition of stability. Unexpectedly, simple pen-and-paper MO theory predicts that ArH has a bond order of ½, and so is predicted to be stable, although clearly not by much. This ought to be quite a surprise to anyone who has studied chemistry since we normally think of noble gases like argon as being quite unreactive (stable in the third sense), and so unlikely to form compounds. And when we do get compounds of noble gases, they are usually compounds with very electronegative elements such as fluorine. Moreover, ArH would violate the octet rule. Students do run across non-octet compounds from time to time, but the octet rule is deeply ingrained from high school. Finally, ArH would be a radical, and students are often taught to think that radicals are “unstable”, in the sense that they are highly reactive.

As it turns out, the simple MO theory we learned in class is sort of right: excited states of argon hydride are stable enough to be studied spectroscopically—in fact the first such study was carried out at Canada’s National Research Council by JWC Johns¹—but the ground electronic state is unstable in the first sense: it dissociates into H and Ar atoms. So our chemical instinct is right about this compound, too. Welcome to the nuances of chemistry.

For the sake of argument, suppose that ArH had a stable ground electronic state, as predicted by simple MO theory. It would fail to be stable in the second sense because the meeting of two ArH molecules would result in the energetically favorable reaction 2 ArH → 2 Ar + H₂. And of course, ArH would react with a great many substances. In fact, we could think of this compound as a source of hydrogen atom radicals.

Before we move on from ArH, let’s talk about some of the reflexes that would have led us to predict it to be unstable. The fact that a material is normally unreactive doesn’t mean it won’t form a compound with something else under the right conditions. If I want to make ArH, I won’t try to react argon with hydrogen molecules because the atoms in H₂ are held together by a strong bond, so it would be energetically unfavourable to swap that bond for an Ar-H bond. I will need a source of hydrogen atoms. If I do expose argon atoms to hydrogen atoms, the very reactive radical hydrogen atoms may well react with the normally unreactive argon, which is in fact what happens. But none of that is directly relevant to the question of the stability of the ArH molecule. If I ask about that, I just want to know if the thing will hold together assuming it has been made.

The octet rule is deeply embedded into the psyches of anyone who has studied chemistry. It is, indeed, an excellent rule of thumb in many, many cases, especially in organic chemistry. But students are soon exposed to non-octet compounds, so clearly the octet rule is not an absolute. And yet we often hear people talk about an octet as being a “stable electronic configuration”. There’s that word again! But what do people mean when they say that? The answer is, again, highly dependent on context. In s- and p-block atoms, an octet fills a shell, and so the next available atomic orbital is quite high in energy, and it will likely be energetically unfavourable to fill it. In molecules, the octet rule just happens to often result in electronic configurations with an excess of bonding over antibonding character, so they are stable in the first sense. And because eight is an even number, the resulting molecules often have all of their electrons paired, so they are less reactive than they might have been if they had an odd number of electrons. But you may recall that oxygen, on which more below, has two unpaired electrons, even though its Lewis structure satisfies the octet rule. We should always remember then that it’s the octet rule, and not the octet law. Arguing that something is especially stable because it has an octet is just not a very good explanation. Now having said that, the octet rule generally holds for compounds from the second period, largely because trying to add more electrons to these small atoms is energetically unfavourable. But even that is a contingent statement since it depends on where those electrons are coming from and whether they have anywhere else to go. Certainly, you can measure an electron affinity for many molecules with octet-rule structures.

As for the argument that radicals are “unstable” (which you will hear from time to time), it’s not true. Many radicals are very reactive. But a great many radicals are stable in the first and often in the second sense, too. This includes many of the nitrogen oxides, notably nitric oxide, which is stable enough to serve as a neurotransmitter, and can be stored in a gas cylinder, but is conversely reactive enough to be used as part of your body’s immune response. Again we see that stability and reactivity do not necessarily coincide, even though the word “stability” is sometimes used in the sense of “reactivity”.

Of course, ArH is an extreme, and NO is not a terribly familiar compound to most of us, even though our bodies make it. So let’s talk about a more mundane molecule. Oxygen has not one but two unpaired electrons. So despite its Lewis diagram, oxygen is a radical. Nevertheless, oxygen is certainly stable in the first and second senses. There are lots of oxygen molecules in the atmosphere, and they don’t just fall apart on their own. (They do fall apart if supplied with enough energy, for example in the form of an ultraviolet photon, but that is another question altogether.) You can store oxygen in a gas cylinder, so it is certainly isolable. But oxygen is highly reactive, in part because of its unpaired electrons, at least towards some substances and in some circumstances. It’s a fairly strong oxidizing agent for example. Many metals, if left standing in air, will become coated very quickly in a layer of their oxide. And if provided with a little heat, oxygen will react vigorously with many materials. We call these reactions of oxygen “fire”.

The very different meanings of “stable” mean that we have to think when we hear this word. Ideally, we would also banish the third meaning mentioned above in favour of more specific language, such as “reactive towards”. Conflating questions of stability and reactivity just makes it harder to think precisely about what we mean when we say that a molecule or substance is stable.

References:
¹J. W. C. Johns (1970) A spectrum of neutral argon hydride. J. Mol. Spectrosc. 36, 488–510.

How Eugene Garfield (1925–2017) changed the lives of working scientists

Eugene Garfield died Feb. 26th. Ever heard of him? No? And yet, he has had a huge influence on how scientists work today. Garfield is the person who brought to life the Science Citation Index, which you may know as the Web of Science. This has allowed us to efficiently search the literature forward in time, and has also spawned a cottage industry in trying to measure the impact of scientific studies, of the journals they were published in, and of the scientists who carried them out. One way or another, if you’re a scientist, Garfield has changed your career.

Garfield started out as a chemist, but by his own account, he wasn’t very good at it. One thing led to another, and he ended up creating a citation index for science. You can read the story elsewhere. It’s interesting, but I want to talk about his impact on science and scientists.

The practical impact of the Science Citation Index and of its modern descendant, the Web of Science, has been enormous. Have you found a key paper in your field? By doing a citation search, you can find out how people followed up the original idea. If you’re not already doing citation searches regularly to find relevant material in the literature, you are living in a state of sin. If you don’t know how, run to your university library, and ask a friendly librarian to show you. This is an indispensable skill for a scientist, and one that you can learn in just a few minutes.

Once we had a citation index, it became easy to count the number of citations a paper, or a scientist’s total output, was getting. And eventually some bright spark decided that counting citations was a good way of deciding how important a scientist’s work is. Citation counting is a tricky business because citation rates are affected by a whole host of non-scientific factors, including different cultures in different disciplines. Still, used wisely, citation statistics can help round out the picture when trying to assess a scientist’s work, especially if we allow a paper to mature a bit before we start counting.

At some point, Garfield hit on the idea of trying to use citation data to measure the impact of journals, and thus was born the Journal Impact Factor. The Impact Factor is the number of citations to articles in a journal from a two-year window divided by the number of articles published in that period, so it’s essentially an average number of citations per article in a very narrow window of recent time. Although the impact factor was intended mostly as a tool for librarians to use in allocating resources, it has been widely abused as a proxy for journal quality. If you ever apply for a grant, someone is bound to look at the impact factors of the journals you have published in to try to assess how important your work is. Yes, that’s right, if you get into the right journal, you can bask in reflected glory. On the other hand, if you publish in a journal with a small impact factor or, heaven forbid!, a journal that isn’t indexed in the Web of Science and that therefore doesn’t have an impact factor, then you really have to track other measures of quality carefully because people will automatically assume the work is of lesser quality. Garfield hated this way of using impact factors, but sometimes you just can’t control the monster you created.

So whether you’re searching the literature or applying for money, you’re living in Eugene Garfield’s rather large shadow. A giant has passed. We owe him a great debt. If you want to honor his legacy, use citation searches and teach them to others, and try to make sensible use of citation statistics and impact factors.

A possibly useful way to use impact factors

Journal impact factors are much abused. Originally developed to help librarians make rational decisions about subscriptions, they are increasingly used to judge the worth of a scientist’s output. If we can place a paper in a high-impact-factor journal, we bask in the reflected glory of those who have gone before us, whether our paper is really any good or not. On the other hand, if we publish in lower-impact-factor journals, it’s guilt by association.

If you write grant applications, or have to apply for tenure or promotion, someone is likely to look at the impact factors (IFs) of the journals you have published in, particularly if the papers were published relatively recently and haven’t had the time to accumulate many citations. They are particularly likely to do that if they aren’t experts in your field, and aren’t sure about the quality of the journals you publish in. Like it or not, you are going to have to face the IF gauntlet. The problem is that IFs vary widely by field. What you need to do is to provide some perspective to the people reading your file so that they don’t assume that the standards of their field apply to yours.

I recently reviewed a grant application whose author found a nice way to address this issue: Each journal in the Thomson-Reuters database is assigned to one or more categories based on the area(s) of science they cover. For each category, the Journal Citation Reports provides a median impact factor as well as an aggregate impact factor, the latter being the impact factor you would calculate for all the articles published in the journals concerned as if they came from a single journal. If you want to put the impact factor of a particular journal in perspective, what you do is that you compare that impact factor either to the median or to the aggregate impact factor for the category (or categories) that the journal belongs to.

If you’re going to do this, I would suggest that you, first, be consistent about which statistic you use and, second, give this statistic for all the categories that a given journal belongs to. This will avoid accusations that you are cherry-picking statistics.

For example, my most recent paper was published in Mathematical Modelling of Natural Phenomena (MMNP), a journal with an impact factor of 0.8, which doesn’t seem impressive on the surface. This journal has been classified by Thomson-Reuters as belonging to the following categories:

Category	Median IF	Aggregate IF	Quartile
Mathematical & Computational Biology	1.5	2.5	4
Mathematics, Interdisciplinary Applications	1.1	1.5	3
Multidisciplinary Sciences	0.7	5.3	2

This, I think, puts Mathematical Modelling of Natural Phenomena in perspective: It’s not a top-of-the-table journal, but its 0.8 impact factor isn’t ridiculously small either.

A closely related strategy would be to indicate which quartile of the impact factor scale a journal belongs to in its category. This information is also available in Journal Citation Reports, and I have provided these data for MMNP in the table above.

The main point I’m trying to make is that, if at all possible, you should provide an interpretation of your record and not let others impose an interpretation on your file. If you are in a position to fight the IF fire with fire, i.e. with category data from the Journal Citation Reports, it may be wise to do that.

All of that being said, some of the statistics for MMNP shown above demonstrate how crazy IF statistics are. If we look at the quartile placement of this journal in different categories, they range from the 2nd quartile, which should be suggestive of a pretty good journal, to the 4th, which makes this journal look pretty weak. In an ideal world, I would not suggest that you include such flaky statistics in your grant applications. But we don’t live in an ideal world. Referees and grant panel members discuss IFs all the time, so if it happens that you can tell a positive story based on the analysis of IFs, it’s just smart to do so.

How should we decide whether or not to accept a peer-review invitation?

In a recent commentary published in the journal Science and Engineering Ethics, José Derraik has proposed two criteria for deciding whether one should accept a peer-review invitation. Quoting directly from his article, these are

If a given scientist is an author in x manuscripts submitted for publication in peer-reviewed journals over y months, they must agree to peer-review at least x manuscripts over the same y months.
The perceived status of the journal requesting input into the peer-review process must not be the primary factor affecting the decision to accept or decline the invitation.

As a member of the editorial board of a small open-access journal that is trying to do some good in the world, BIOMATH, I fully concur with Derraik’s second point. If someone has submitted a paper in good faith to a scientific journal, and that journal is seeking expert advice on the quality of the paper, that advice should not be withheld without good reason. Prestige of the journal shouldn’t even be a consideration. I’m not talking about shady journals here, and in any event, the shady journals don’t typically look for peer reviewers.

I also have some sympathy for Derraik’s first point. We all receive too many requests to referee papers. At some point, you have to decide that you have done enough. I’m not sure about the simple equality between published papers and refereed papers that Derraik suggests. I think this is likely to lead to an undersupply of qualified referees. His argument relies on the fact that most papers have multiple authors, but at least in the fields I follow closely, most of those authors are students. While a student can co-referee a paper with a senior scientist as a training exercise, the senior scientist still has to take primary responsibility for the review. In order for the system to work properly, I suspect that most of us have to referee twice or three times as many papers as we write. The multiplier might be smaller (perhaps as small as 1) for people who write a lot of papers with many coauthors, but those folks are outliers. Nevertheless, I think Derraik is right that there has to be some proportionality between output and contribution to refereeing.

I think there’s another principle that we should add to Derraik’s list:

3. If you can’t think of very many alternative referees who are as qualified as you are to review the submission you have received, then you should accept the invitation.

This happens more often than you might think. Authors suggest referees based on the people they know in the field doing similar work. Editors similarly work hard to match the paper with appropriate referees, so it does happen fairly often that you’re the ideal referee for something you have received. In those cases, you should assume your responsibilities and do the work if it’s at all possible.

The flip side of Derraik’s list, which he doesn’t tackle directly, is the question of when you should refuse a referee assignment. To me, it comes down to a few things:

I do consider whether I have been doing too much refereeing lately. There is only so much time, and at some point you need to write papers rather than read other people’s stuff all the time.
I always ask myself if I can easily think of other qualified referees. If the answer is yes, I’m more inclined to decline the invitation. That doesn’t mean I automatically decline such invitations, only that I worry less if I feel I have to decline based on other considerations. And of course, I always pass along a list of potential referees to the journal when I do decide to decline an invitation on this basis.
Sometimes, you receive papers you’re just not that qualified to review. Then you should definitely turn down the invitation.
On occasion, you receive something and realize that other time commitments will make it impossible for you to complete the refereeing assignment in a reasonable span of time. Note that journals increasingly request a return of referee reports on unreasonable timetables. (Two weeks? Get real!) I have to admit that I sometimes turn down refereeing requests because the journal is proposing unreasonable timelines. I simply refuse to jump just because somebody says so. In other cases, I ask the editor if he/she would be willing to receive a report within x weeks, where x is a value chosen to work around other commitments, with x typically less than or equal to 4. They almost always say yes to these requests. There are times though when I’m so busy that I really could not read the paper and return the report for many, many weeks. In these cases, it’s best to decline the invitation right away.

Refereeing papers is a largely thankless job (although you may want to check out Publons, which is working to change that). That doesn’t make it less important, but it does mean that you have to balance the time you put into that against other commitments. To me, the overriding consideration is expertise: Am I the right person for the job? If the answer is yes, and you’re not completely overwhelmed with other duties, you really should accept the assignment.

Farewell, Oktay Sinanoğlu (1935–2015)

I’m a long-time fan of Oktay Sinanoğlu. I use the word “fan” quite deliberately: I don’t think there’s any other way to describe my relationship to the man. We’ve never met, or even exchanged emails. But I read some of his papers in graduate school and was immediately drawn in. I was therefore sad when I learned recently that he had died. One more scientific hero I’ll never meet…

Sinanoğlu had a long and productive career at Yale. Nevertheless, he was almost certainly better known in Turkey, where he became something of a national hero, than in the Western world. His papers covered a very wide cross-section of theoretical chemistry, including electronic structure, atomic clusters, solvent effects on chemical reactions, spectroscopy, automated generation of synthetic pathways, irreversible thermodynamics, dissipative structures, graph theoretical methods for studying the stability of reaction networks, and model reduction methods. It was the latter two topics that attracted my attention to Sinanoğlu when I was a graduate student. They intersected nicely with my interests at the time, which revolved around the dynamical systems approach to chemical kinetics.

My main research interest at the time was model reduction. Sinanoğlu, with his student Ariel Fernández, was among the first people to consider the construction of attracting manifolds for reaction-diffusion systems.^1,2 This is a very difficult problem that is still a very active area of research. When I look back on the Fernández-Sinanoğlu papers on this topic, it seems to me that they anticipate later work on inertial manifolds.³ Because there weren’t many people following the field at the time, I don’t think that these papers are as well known as they deserve to be. Fernández and Sinanoğlu were just a bit ahead of their time. Had this work been published in the 1990s rather than the mid-1980s, I’m sure these papers would have received a great deal more attention.

Although I wasn’t working on these problems myself at the time, I became very interested in applications of graph theory in chemical kinetics while still a graduate student. It would be many years before I made any contributions to this topic myself, in association with my then-postdoc Maya Mincheva.^4–6 Among the papers I read way back then were a pair written by Sinanoğlu in which chemical reaction networks were conceptualized as graphs.^7,8 This allowed Sinanoğlu to enumerate all graphs corresponding to reactions with given numbers of reactions and species.⁷ A subsequent paper contained a conjecture about a topological feature of the graphs of chemical mechanisms capable of oscillations,⁸ thus attempting to tie together the structural features of his graphs and the dynamics generated by the rate equations. This is the theme we picked up many years later, although we followed a line of research initiated by Clarke⁹ and Ivanova¹⁰ rather than Sinanoğlu’s theory.

So, Oktay, thanks for inspiring a young graduate student. Rest in peace.

¹A. Fernández and O. Sinanoğlu (1984) Global attractors and global stability for closed chemical systems. J. Math. Phys. 25, 406–409.
²A. Fernández and O. Sinanoğlu (1984) Locally attractive normal modes for chemical process. J. Math. Phys. 25, 2576–2581.
³A. N. Yannacopoulos, A. S. Tomlin, J. Brindley, J. H. Merkin and M. J. Pilling (1995) The use of algebraic sets in the approximation of inertial manifolds and lumping in chemical kinetic systems. Physica D 83, 421–449.
⁴M. Mincheva and M. R. Roussel (2006) A graph-theoretic method for detecting potential Turing bifurcations. J. Chem. Phys. 125, 204102.
⁵M. Mincheva and M. R. Roussel (2007) Graph-theoretical methods for the analysis of chemical and biochemical networks. I. Multistability and oscillations in ordinary differential equation models. J. Math. Biol. 55, 61–86.
⁶M. Mincheva and M. R. Roussel (2007) Graph-theoretical methods for the analysis of chemical and biochemical networks. II. Oscillations in Networks with Delays. J. Math. Biol. 55, 87–104.
⁷O. Sinanoğlu (1981) 1- and 2-topology of reaction networks. J. Math. Phys. 22, 1504–1512.
⁸O. Sinanoğlu (1993) Autocatalytic and other general networks for chemical mechanisms, pathways, and cycles: their systematic and topological generation. J. Math. Chem. 12, 319–363.
⁹B. L. Clarke (1974) Graph theoretic approach to the stability analysis of steady state chemical reaction networks. J. Chem. Phys. 60, 1481–1492.
¹⁰A. N. Ivanova (1979) Conditions for the uniqueness of the stationary states of kinetic systems, connected with the structures of their reaction mechanisms. 1. Kinet. Katal. 20, 1019–1023.

The most-cited work of all time

In my last blog post, I discussed the list of the 100 most-cited papers of all time compiled by Thomson-Reuters for Nature to celebrate the 50th anniversary of the Science Citation Index. In the same Nature article, there is a brief mention of a similar list compiled by Google based on their Google Scholar database. Unlike the Thomson-Reuters/Science Citation Index (SCI) list, the Google list includes books. This is partly a byproduct of the way the two databases are structured—Thomson-Reuters has separate databases for journals and books while Google has a single database that includes journal articles, books and “selected web pages”¹—and, I suspect, partly a conscious choice by the Nature editors to focus on the most-cited papers. Certainly, the article focuses on the SCI list rather than the Google list which, as mentioned above, is different in composition. This provides us with an interesting opportunity to think a little harder about why things get cited and how we go about the business of counting citations and thereby trying to measure impact.

The most striking thing in the Google list is the number of books among the most highly cited work. 64 of the 100 most highly cited works are books, according to Google. Many of these books are technique-oriented, as one might expect from the kinds of papers that made the SCI list discussed in my last post. For example, the most highly cited book on Google’s list, and 4th most cited work in the overall list, is Molecular Cloning: A Laboratory Manual by Sambrook, Fritsch and Maniatis. The same book, but with a different permutation of authors (Maniatis, Fritsch and Sambrook), also shows up as number 15 on Google’s list. How can this be? This book has gone through a number of editions, with changing authorship. The book at #4 on Google’s list is the second edition, while #15 is the first edition. This highlights one of the key difficulties in compiling such a list: Books are often inconsistently cited, and changing editions pose a challenge in terms of combining or not combining citations. Since different editions with a simple permutation of authorship are actually an evolution of the same book, it seems to me that we should combine the citation counts for entries 4 and 15 (and later editions that don’t show up on this list as well). That would vault Molecular Cloning to #1 on Google’s list. If we take citations as a measure of impact, this book would be the most important scientific work of all time (so far). However, I think we can all agree that there is something funny about that statement. The number of citations indicates that this is clearly a very useful book, but it’s a compendium of methods developed by many, many other groups. It is highly cited because of its convenience. The original papers are not cited as often as this book (at least by Google’s count), but clearly it’s the original scientific work used by thousands of labs around the world that has had the impact, not this manual. So here we have a work that is very highly cited (and therefore, by any reasonable definition, important) but where it’s obvious that the very large citation count is not measuring scientific impact so much as utility as a reference.

The same sort of argument could be applied to scientific papers. Take, for example, the density functional theory papers discussed in my previous post. I would argue that the two papers by Walter Kohn in the SCI list have had more impact than any of the other DFT papers in this list since they enabled all the subsequent work developing the theory into practical methods. But they are not cited as often as some of the papers that describe functionals used in quantum chemical calculations. Citations therefore measure something—utility?—but it isn’t impact as I would understand the term.

There are some books on Google’s list that do describe original contributions to the literature. Among other things, there are those I would characterize as “big idea” books, in which new, influential ideas were first described. Number 7 on Google’s list is Thomas Kuhn’s The Structure of Scientific Revolutions. This is not a book that contains practical advice on carrying out particular experiments or particular types of calculations. It’s a contribution to the history and philosophy of science. But Kuhn’s ideas about the way in which science progresses have really struck a chord, so this book is cited a lot, across a wide range of fields, most of which have nothing to do with the history or philosophy of science.

The Google list also contains works from fields outside of hard-core science, which we don’t see in the Science Citation Index list. Thus, number 6 on Google’s list is Case Study Research: Design and Methods by Robert K. Yin, a book covering research methods used mostly in business studies. The Google list includes a number of other works from business studies, as well as from the social sciences. It’s sometimes useful to be reminded that “research” and “scientific research” are not synonymous.

But this is a blog about science, so back to science we go. An interesting question we could ask is how the books on Google’s list would have fared if they had been included in the Thomson-Reuters effort. To try to answer this question, I looked at another highly cited book, #5 in the Google list, Numerical Recipes by Press, Teukolsky, Vetterling and Flannery. Looking up citations to books in the Science Citation Index is not trivial. Because books don’t have records in the SCI database, there is no standard format for the citation extracted from citing papers. Moreover, people often make mistakes in formatting citations. Authors are left out, or the order of authorship is permuted. Additionally, people often cite a particular chapter or page when citing a book, and each of these specific citations is treated as a citation of a different work in the database. Anyhow, here’s what I did: I searched for citations to works by “Press W*” entitled “Num*”. This generated a list of 4761 cited works. This large number of distinct hits to the Numerical Recipes books makes it impossible to complete the search for citing articles. All we can tell is that there are more than 4761 citations to Numerical Recipes in the Web of Science database. In fact, the number must be much larger since it’s plain to see even from the small sample I looked at that some of the variations are cited dozens or even hundreds of times. But an accurate method of counting them in the Web of Science evades me.

Numerical Recipes is a bad case. There are many editions with slightly different titles (“in Fortran”, “in C”, etc.), the subtitle is sometimes included (“The Art of Scientific Computing”), multiple authors, and so on. Maybe if we try a book with one author and a limited number of editions? I then tried to do a citation search for Kuhn’s The Structure of Scientific Revolutions. Here, we find a different problem: The results are highly sensitive to details such as whether or not we include the word “The” from the title. And, although there are far fewer hits than for Numerical Recipes, there are still hundreds of them to sift through. Again, I’ve had to admit defeat: There does not appear to be a simple way to count citations to heavily cited books in the Web of Science.

Of course, citation counting is a tricky business at the best of times, and the problem afflicts both the Thomson-Reuters and Google Scholar databases. Errors in citations, which are fairly frequent, may deflate the citation count of a paper unless one is very careful about structuring the search. But beyond that, some papers are just hard to chase down in the database. Take Claude Shannon’s first of two classic papers published in 1948 on information theory, number 9 on the Google list, and nowhere to be found on the SCI list. It’s actually very difficult to find this paper in the Google database. I have found many lightly cited variants on this citation, but the version that Google Scholar reports as having been highly cited is actually a 2001 corrected reprint in the ACM journal Mobile Computing and Communications Review. It’s not clear to me that this is correct—has this paper really been cited more often than the 1948 original?—but then I’m not sure how Google’s database is structured. For the record, the Web of Science reports that the 1948 paper has been cited 9847 times, while the 2001 reprint has been cited 278 times. Quirks of a database can make the apparently simple act of counting citations tricky, all the more so for highly cited papers.

We all wish that we could quantify scientific output so that we could make better decisions about funding, prizes, and so on. It would sure make all of our lives much easier if this were possible. However, the problems that plague the apparently simple task of trying to round up and interpret a list of the most cited work—high citation rates for work that provides a convenient reference for an idea or technique but is not particularly original (books on methods, review papers), inconsistent database entries and citation errors—also affect citation counts for work that has accumulated a more normal number of citations. None of this is to deny the importance of a good book or review paper, nor are my comments intended to mean that there isn’t a clear difference in impact between two research papers in the same field whose citation counts differ by an order of magnitude. But there are enough subtleties in counting citations and in interpreting the results that I would not want to try to rank papers or scientists on this basis.

¹R. Vine (2006) Google Scholar. J. Med. Libr. Assoc. 94, 97–99.

Marc Roussel's blog

Department of Chemistry and Biochemistry, University of Lethbridge