Preparing for your thesis defense

The thesis defense. The very name suggests a confrontational event. And yet, in my experience, it doesn’t have to be. A well-prepared student should have nothing to fear from their defense. Indeed, it’s often more like a conversation among peers than like an examination. However, if you’re not well prepared, it can turn into a highly unpleasant grilling.

You will of course wonder how you can prepare for an event that seems, to many students, unpredictable. And yet, it’s not.

Know your thesis.

The examiners are going to ask you questions about your thesis. This sounds obvious, and yet it often seems as if students show up unprepared for this eventuality. You should be the foremost expert in the room on the material in your thesis, with the possible exception of your supervisor, who ought to be on your side. (And if he or she is not, then you have problems that predate your defense. But that might be a topic for a future blog post.) A fairly common question, usually prompted by genuine curiosity about a passage an examiner didn’t quite understand, runs along the lines of “You wrote such and such on page x. What did you mean by that?” If you can’t answer this question, or if you give an obviously incorrect answer, you are in trouble. This should be an easy question for you to field since you wrote your thesis and should know exactly what every word and phrase in it means. Examiners think of such a question as a gimme, and often ask questions of this sort early in the defense to relax the candidate. Again, you’re supposed to be able to answer these easily.

So why can’t some people answer such questions? Putting aside brain freeze, there are only two answers I can think of:

  1. You didn’t actually write parts of your thesis. In today’s world where a thesis is often a collection of papers, it’s not unusual for a student’s thesis to contain text that they didn’t actually write, which is OK (if your university allows a thesis that is compiled in this way) provided you acknowledge it properly (which students sometimes fail to do). Of course, you may simply have plagiarized some of your text from somewhere. If it’s the latter, you’re in serious trouble now, because the examiners can smell a rat and will likely find you out. You should have read my posts about plagiarism (here and here) before you started writing your thesis, or at least before you submitted.
  2. You wrote it, but you borrowed some wording from somewhere (a paper, your supervisor) that you didn’t fully understand.

Whether parts of your thesis were written collaboratively or you used a clever turn of phrase you heard somewhere, there’s simply no excuse for not knowing what the words in your thesis mean. In either case, you need to make sure that you have read and understood every line in your thesis. If someone else wrote some of the text and you’re not sure what it means, ask them. If you ended up using a phrase that sounded good but that you didn’t quite understand, find out what it means. Again, questions like this often come early, so you’re getting off on the wrong foot if you can’t answer them, or if you answer them in a way that shows you have no idea what parts of your thesis mean.

A closely related issue is understanding the techniques used in your thesis. This is often a problem in collaborative work. It’s fine if other students or collaborators carried out an experiment that is discussed in your thesis. What is not fine is not understanding the experiment. You really ought to be able to explain every experiment described in your thesis, as well as the results of those experiments.

Yet another related point is that you ought to be able to explain results or experiments from other papers that you mention in your thesis. If you say “property X was measured by method Γ”, then you ought to understand how Γ works.

Be broadly aware of your field.

I always tell my students that they need to read papers in their general area of research. This goes double for students preparing for a defense. A lot of the questions at your defense will ask you to think about how your work connects with other work, or about how your work fits into the bigger picture in your field. You can’t do that unless you have read papers in your general area outside of the specific, narrow topic of your thesis. You should be taking opportunities to broaden your knowledge base throughout your time as a graduate student by reading, but also by attending seminars and conferences. Even if you have been doing that, the gap between thesis submission and the day of the defense is definitely a time you should use to do some extra reading and to think about how your work fits into a bigger picture.

Because this is actually where most of the time in your thesis examination is likely to be spent, I’m going to emphasize this again: Be prepared for big picture questions. Think about what kinds of big picture questions the examiners might ask. Read materials that will help solidify your own understanding of the big picture in which your research is embedded.

Know your examiners.

The identities of a student’s examiners aren’t a secret, and yet some students take no account of their examination committee when preparing for their defense. If one or more of your examiners works in your field (typically the case for your external examiner, for example), you need to read a few of their papers. Perhaps you have already, and you may even have cited these papers in your thesis. For the other examiners, you should at least find out what they do, say from their web pages, and think a bit about questions they might ask given the perspective they bring to your defense. If one of your examiners is a protein structure person, even if it’s only tangentially related to your thesis, that person may ask structural questions about some of the molecules you mention. If you can ace those questions, you get lots of bonus credit. These may not be critical pass/fail questions, but knowing a little bit about an examiner’s interests can help the whole event go more smoothly.

Check out another student’s thesis defense.

At the University of Lethbridge, thesis defenses are open events, meaning that anyone can sit in on a defense. If that’s the case where you are, make sure to sit in on one or two thesis defenses before your big day. This will give you an idea of what happens at these things, and might help you get past the fear many students experience regarding their defense. It’s really not that bad.

Get some sleep!

You’ve had the pedal to the metal for several months leading up to your thesis submission. You’ve been burning the midnight oil, in fact, burning the candle at both ends. You’ve been busy as a beaver, working like a dog, going at it hammer and tongs. You’re tired.

Now that you have submitted, take some time for you. Make sure you get your sleep, eat well, and exercise. A well rested candidate is one who is likely to be able to think on his or her feet. A tired candidate is likely to flub the easy questions.

Some closing thoughts

When you were an undergraduate, you didn’t know what questions the professor was going to put on the exam. You did not let the uncertainty paralyze you. You studied, reasoning that if you had a good understanding of the material, you would do OK. Maybe you used what you knew about the professor to guess where the major emphasis might be. A thesis defense is not all that different. The material consists of your thesis and of closely related areas of science. Knowing who the examiners are, you can guess what their general areas of questioning might be. Now all you have to do is to study the material. If you made it all the way to thesis submission, you’ll be fine.

In a future blog post, I will talk about the thesis defense itself. Stay tuned.

Passing arguments to a Matlab script

It’s sometimes useful to run the same program with different parameters many times, for example when we want to systematically vary a parameter in a model to see how it affects the behavior of the model. This is really easy in C and related languages, which have excellent command-line processing facilities. It turns out to be easy in Matlab as well. Since I had to hunt around to figure out how to make this trick work, I thought I would present it here.

Matlab has a -r command-line switch that allows you to execute a Matlab command right after Matlab starts up. For example, here is a Matlab command-line version of the classic “Hello, world!” program:

matlab -r "disp('Hello, world.')"

(As a parenthetical remark, the mysteries of bash command-line interpretation defeated my attempts to put an exclamation mark at the end of the sentence, as is traditional.) If you type this command at a shell prompt, Matlab will print “Hello, world.” just before the Matlab command-line prompt appears. You can of course imagine initializing one or more variables in a similar manner:

matlab -r "x=1; y=2;"

OK, but what if you want to initialize some variables used by a Matlab program? There are a couple of options here. One is to include the program name in the quoted command string. If your program is called prog.m, for example, you could type

matlab -r "x=1; y=2; prog"

Alternatively, you could redirect standard input:

matlab -r "x=1; y=2;" < prog.m

The above will work provided the variables x and y are not defined inside prog.m. If you always want to define these variables from the command-line, then that’s fine. However, you might want to have variables that have default values that you can override from the command line. This is also fairly easy. Consider the following program, which we will assume has been saved into a file called prog2.m:

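% Use the default only if x was not already set on the command line.
if ~exist('x', 'var')
    x = 0.1;
end
x   % no semicolon, so the value of x is displayed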

The command line matlab -r "x=1; prog2" displays an x value of 1, while the command line matlab -r prog2 sets x to 0.1, as you might have expected.
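
The parameter sweep that motivated this post can then be driven from a shell loop. The script below is only a minimal sketch, assuming bash, a matlab executable on your PATH, and the prog2.m file above in the current directory. The names DRY_RUN, matlab_commands, run_one and sweep are made up for this illustration and are not part of Matlab. Appending exit to the command string makes Matlab quit after each run instead of waiting at its prompt.

```shell
#!/usr/bin/env bash
# Sketch: run prog2.m once for each value of the parameter x.
# DRY_RUN and the function names are illustrative, not Matlab features.

DRY_RUN="${DRY_RUN:-1}"   # 1: only print the commands; 0: actually run Matlab

# Build the string handed to -r: set x, run prog2, then exit.
matlab_commands() {
    printf 'x=%s; prog2; exit' "$1"
}

# Print (dry run) or execute the Matlab command line for one value of x.
run_one() {
    local cmds
    cmds="$(matlab_commands "$1")"
    if [ "$DRY_RUN" = "1" ]; then
        printf 'matlab -r "%s"\n' "$cmds"
    else
        matlab -r "$cmds"
    fi
}

# Loop over every value given as an argument.
sweep() {
    local x
    for x in "$@"; do
        run_one "$x"
    done
}

sweep 0.1 0.2 0.5 1
```

With DRY_RUN left at 1, the script only prints the four command lines, which is a handy way to check the quoting. Run it with DRY_RUN=0 to launch Matlab for real.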

A possibly useful way to use impact factors

Journal impact factors are much abused. Originally developed to help librarians make rational decisions about subscriptions, they are increasingly used to judge the worth of a scientist’s output. If we can place a paper in a high-impact-factor journal, we bask in the reflected glory of those who have gone before us, whether our paper is really any good or not. On the other hand, if we publish in lower-impact-factor journals, it’s guilt by association.

If you write grant applications, or have to apply for tenure or promotion, someone is likely to look at the impact factors (IFs) of the journals you have published in, particularly if the papers were published relatively recently and haven’t had the time to accumulate many citations. They are particularly likely to do that if they aren’t experts in your field, and aren’t sure about the quality of the journals you publish in. Like it or not, you are going to have to face the IF gauntlet. The problem is that IFs vary widely by field. What you need to do is to provide some perspective to the people reading your file so that they don’t assume that the standards of their field apply to yours.

I recently reviewed a grant application whose author found a nice way to address this issue: Each journal in the Thomson-Reuters database is assigned to one or more categories based on the area(s) of science it covers. For each category, the Journal Citation Reports provides a median impact factor as well as an aggregate impact factor, the latter being the impact factor you would calculate if all the articles published in the journals of that category were treated as coming from a single journal. To put the impact factor of a particular journal in perspective, compare it either to the median or to the aggregate impact factor for the category (or categories) that the journal belongs to.
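
To make the difference between the two category statistics concrete, here is a sketch in symbols. The notation is mine, not that of the Journal Citation Reports: for journal j in a category containing n journals, C_j is the number of citations counted in its impact factor and N_j is the number of articles those citations are divided by.

```latex
\[
\mathrm{IF}_j = \frac{C_j}{N_j}, \qquad
\mathrm{IF}_{\text{median}} = \operatorname{median}\{\mathrm{IF}_1, \dots, \mathrm{IF}_n\}, \qquad
\mathrm{IF}_{\text{aggregate}} = \frac{\sum_{j=1}^{n} C_j}{\sum_{j=1}^{n} N_j}
  = \frac{\sum_{j=1}^{n} N_j \, \mathrm{IF}_j}{\sum_{j=1}^{n} N_j}.
\]
```

Written this way, the aggregate is an average of the individual impact factors weighted by the number of articles each journal publishes, so a few large, heavily cited journals can pull it well above the median.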

If you’re going to do this, I would suggest that you, first, be consistent about which statistic you use and, second, give this statistic for all the categories that a given journal belongs to. This will avoid accusations that you are cherry-picking statistics.

For example, my most recent paper was published in Mathematical Modelling of Natural Phenomena (MMNP), a journal with an impact factor of 0.8, which doesn’t seem impressive on the surface. This journal has been classified by Thomson-Reuters as belonging to the following categories:

Category                                     Median IF  Aggregate IF  Quartile
Mathematical & Computational Biology         1.5        2.5           4
Mathematics, Interdisciplinary Applications  1.1        1.5           3
Multidisciplinary Sciences                   0.7        5.3           2

This, I think, puts Mathematical Modelling of Natural Phenomena in perspective: It’s not a top-of-the-table journal, but its 0.8 impact factor isn’t ridiculously small either.

A closely related strategy would be to indicate which quartile of the impact factor scale a journal belongs to in its category. This information is also available in Journal Citation Reports, and I have provided these data for MMNP in the table above.

The main point I’m trying to make is that, if at all possible, you should provide an interpretation of your record and not let others impose an interpretation on your file. If you are in a position to fight the IF fire with fire, i.e. with category data from the Journal Citation Reports, it may be wise to do that.

All of that being said, some of the statistics for MMNP shown above demonstrate how crazy IF statistics are. If we look at the quartile placement of this journal in different categories, it ranges from the 2nd quartile, which should be suggestive of a pretty good journal, to the 4th, which makes this journal look pretty weak. In an ideal world, I would not suggest that you include such flaky statistics in your grant applications. But we don’t live in an ideal world. Referees and grant panel members discuss IFs all the time, so if it happens that you can tell a positive story based on the analysis of IFs, it’s just smart to do so.

How should we decide whether or not to accept a peer-review invitation?

In a recent commentary published in the journal Science and Engineering Ethics, José Derraik has proposed two criteria for deciding whether one should accept a peer-review invitation. Quoting directly from his article, these are

  1. If a given scientist is an author in x manuscripts submitted for publication in peer-reviewed journals over y months, they must agree to peer-review at least x manuscripts over the same y months.
  2. The perceived status of the journal requesting input into the peer-review process must not be the primary factor affecting the decision to accept or decline the invitation.

As a member of the editorial board of a small open-access journal that is trying to do some good in the world, BIOMATH, I fully concur with Derraik’s second point. If someone has submitted a paper in good faith to a scientific journal, and that journal is seeking expert advice on the quality of the paper, that advice should not be withheld without good reason. Prestige of the journal shouldn’t even be a consideration. I’m not talking about shady journals here, and in any event, the shady journals don’t typically look for peer reviewers.

I also have some sympathy for Derraik’s first point. We all receive too many requests to referee papers. At some point, you have to decide that you have done enough. I’m not sure about the simple equality between published papers and refereed papers that Derraik suggests. I think this is likely to lead to an undersupply of qualified referees. His argument relies on the fact that most papers have multiple authors, but at least in the fields I follow closely, most of those authors are students. While a student can co-referee a paper with a senior scientist as a training exercise, the senior scientist still has to take primary responsibility for the review. In order for the system to work properly, I suspect that most of us have to referee two or three times as many papers as we write. The multiplier might be smaller (perhaps as small as 1) for people who write a lot of papers with many coauthors, but those folks are outliers. Nevertheless, I think Derraik is right that there has to be some proportionality between output and contribution to refereeing.

I think there’s another principle that we should add to Derraik’s list:

3. If you can’t think of very many alternative referees who are as qualified as you are to review the submission you have received, then you should accept the invitation.

This happens more often than you might think. Authors suggest referees based on the people they know in the field doing similar work. Editors similarly work hard to match the paper with appropriate referees, so it does happen fairly often that you’re the ideal referee for something you have received. In those cases, you should assume your responsibilities and do the work if it’s at all possible.

The flip side of Derraik’s list, which he doesn’t tackle directly, is the question of when you should refuse a referee assignment. To me, it comes down to a few things:

  1. I do consider whether I have been doing too much refereeing lately. There is only so much time, and at some point you need to write papers rather than read other people’s stuff all the time.
  2. I always ask myself if I can easily think of other qualified referees. If the answer is yes, I’m more inclined to decline the invitation. That doesn’t mean I automatically decline such invitations, only that I worry less if I feel I have to decline based on other considerations. And of course, I always pass along a list of potential referees to the journal when I do decide to decline an invitation on this basis.
  3. Sometimes, you receive papers you’re just not that qualified to review. Then you should definitely turn down the invitation.
  4. On occasion, you receive something and realize that other time commitments will make it impossible for you to complete the refereeing assignment in a reasonable span of time. Note that journals increasingly request a return of referee reports on unreasonable timetables. (Two weeks? Get real!) I have to admit that I sometimes turn down refereeing requests because the journal is proposing unreasonable timelines. I simply refuse to jump just because somebody says so. In other cases, I ask the editor if he/she would be willing to receive a report within x weeks, where x is a value chosen to work around other commitments, with x typically less than or equal to 4. They almost always say yes to these requests. There are times though when I’m so busy that I really could not read the paper and return the report for many, many weeks. In these cases, it’s best to decline the invitation right away.

Refereeing papers is a largely thankless job (although you may want to check out Publons, which is working to change that). That doesn’t make it less important, but it does mean that you have to balance the time you put into that against other commitments. To me, the overriding consideration is expertise: Am I the right person for the job? If the answer is yes, and you’re not completely overwhelmed with other duties, you really should accept the assignment.

Why you should join a scientific society

Again, this post is mostly addressed to students, since I assume that most scientists with a permanent job are already members of at least one scientific society. I will keep these comments general, although I will mention specific societies as examples from time to time.

Scientific societies vary greatly in focus, size, organization, and, yes, personality. Some, like the American Association for the Advancement of Science (AAAS), cover all the sciences and are, accordingly, massive—Wikipedia says that the AAAS had over 125,000 members in 2008. Others, such as the Canadian Society for Chemistry, target a major scientific discipline. Still others, like the 750-member Society for Mathematical Biology, narrow their focus to a specialized field. As some of their names suggest, scientific societies tend to be organized at the national level, although almost all of them will have significant numbers of foreign members, and many explicitly think of themselves as international societies. The larger societies tend to be run much more like businesses, with large complements of staff responsible for day-to-day operations. The smaller societies typically have few or no staff, and run on the labor of volunteers.

Scientific societies serve many, many purposes. Selfishly, they are conduits for information and provide networking opportunities for their members. In the case of societies organized at the national level, especially the larger ones, they are often important lobbying organizations that make sure that governments regularly hear scientists’ perspectives on various issues. The good societies are communities whose conferences are gatherings of people with common interests, even when those interests are uncommon as in the case of the small, specialized societies.

That last point is the one I want to emphasize: Joining a scientific society, in the ideal case, is joining a community. A member of a scientific society, whether a student or a famous professor, is “one of us” to other members of that society. And there are benefits to being a member, in the full sense of the word, of a group.

Some of the benefits are obvious: Every professional society has some kind of regular newsletter. These can vary from very simply reproduced amateur newsletters to professional-looking newspapers or magazines. These newsletters typically contain news stories about what is going on in the field, society news, profiles of members, conference announcements and job ads. Modern scientific societies will also have mailing lists that are restricted to their members. These are typically used to disseminate more time-sensitive information including, again, job ads and conference announcements, but can contain a variety of other content, as is the case for the public mailing lists discussed in my last blog post. Scientific societies usually hold conferences, and members always get discounted conference fees. And of course, attending a society conference is an ideal way to meet other members of the society.

In fact, society conferences can be invaluable networking opportunities. The people you meet there may one day be in a position to offer you a job. Even if that doesn’t happen, being known within your field means that people who make decisions about your career, about things like scholarship applications or grants for example, are likely to know you. Now we try really hard to screen out our biases when we’re refereeing grants or papers, but the truth is that it’s much easier to be a harsh judge when we don’t know the person whose file we’re judging.

In addition, many scientific societies have mentorship programs, as well as a variety of professional development events, often during their conferences, or in the days immediately preceding or following a conference. These events can be technical seminars (for example, chemical safety mini-courses run by chemical societies), or they can be oriented toward career building, such as workshops on job interviews. Career-building workshops can be extremely worthwhile to young scholars.

But, you might say, I’m not interested in a career in academia. Then you should think hard about which society you join, but you should still join a society. Find one that has many non-academic members. Chemical societies, for example, typically have many members from industry. Some societies, like the Canadian Applied and Industrial Mathematics Society, try really hard to bridge the academic-industrial divide, and could be expected to have a number of industrial members, or at the very least some programs intended to help connect those two worlds.

Hopefully, I have convinced you that you should join a scientific society. But how do you choose one? Do you join your national society or a larger American society, for instance? Low student membership fees and reciprocal membership arrangements, in which members of one national society get reduced fees in another, may make this a false dichotomy. However, money is tight for many students, so you may have to make an initial decision. Advice from your supervisor can be helpful here, but you should do a bit of thinking, too. What are your career objectives, and how can one society or another help you get there? What society do most of the people in your field and in the geographic area where you would eventually like to work belong to? What conferences do you want to attend? These are all factors you should consider. In the end, you are looking for a society in which you will feel comfortable, and find fellow travelers.

Beyond the society itself, the larger societies (and even some of the not-so-large ones) often have divisions to create smaller communities within the large community. For example, the Society for Industrial and Applied Mathematics (SIAM) has several highly active activity groups. Often, the real community is found at the level of these divisions. They would typically have their own mailing lists and conferences. Most societies with divisions will allow you to choose one division for free as part of the overall cost of membership. So if the really big societies seem intimidating to you, they need not be, provided they have a strong division in your area of interest.

The good news for students is that student membership fees are usually really low. Some societies, like SIAM, even allow their full members to nominate a certain number of students for free memberships. Scientific societies really want student members, because today’s student member is tomorrow’s full member.

So talk to your boss, and do a bit of research and thinking on your own. Join a society. It’s a small step towards building your career, but potentially a really pivotal one.

Useful mailing lists for mathematical biology

One of the toughest things to do as an academic is to keep informed about what is going on out there, in a professional sense. An earlier blog post addressed the issue of keeping up with the literature. But there are other things you need to know about: upcoming conferences and workshops, calls for papers, funding programs, new software, books published in the field, and of course, especially for younger scholars, Ph.D., postdoctoral, and job opportunities. So where do you find all of this stuff? A great place to start is to get on a few key mailing lists in your field. Yes, it’s old fashioned, but it’s a really effective way to have important announcements come to you. Perhaps in a few years, Twitter or other social media mechanisms will replace mailing lists. For now though, a lot of the people who have information you need are of my generation, and they’re sending their postings to mailing lists.

There’s a bit of vocabulary to learn to make effective use of mailing lists. Some mailing lists allow postings to flow directly to users as soon as they are received (or approved, in the case of moderated lists). Others function as digests, which means that contents are collected for a certain period of time (which can vary according to the list) and are then sent out in one larger email. Some lists offer the option of either getting postings immediately or as daily or weekly digests. Do look at the options when you subscribe to a mailing list.

I will be focusing particularly here on mailing lists for people in mathematical biology, since that is the community I most closely associate with. If you work in another field, ask your supervisor about mailing lists you should join. He or she should be a good resource person on this topic.

With that out of the way, here are some mailing lists I recommend for mathematical biologists:

  • SMB Digest: SMB Digest is a mailing list of the Society for Mathematical Biology. It is easily the most useful mailing list for mathematical biologists. It’s also highly unusual in that it’s a society mailing list that is open to non-members. (Most societies treat their mailing lists as a perk of membership. I will have more to say on joining scientific societies in a later blog post.) As a result of the SMB Digest being open to anyone, almost everyone will post items of interest to the community here. To join this mailing list, go to https://www.smb.org/smb-digest-community-forum-how-to/ for instructions. If you’re in mathematical biology, you simply must subscribe to this mailing list.
  • Non Linear Science Network Digest: Strictly speaking, this isn’t a mathematical biology mailing list, but many of us work on biological problems for which the appropriate methods come from nonlinear dynamics, so there is a lot of overlap between the audience for this list and the mathematical biology community. You can join this mailing list at http://www.maia.ub.es/cgi-bin/mailman/listinfo/nls-net.
  • NIMBIOS Newsletter: This one is a bit different. The other mailing lists mentioned above are intended to distribute information of general interest. The NIMBIOS Newsletter on the other hand is a publication of the National Institute for Mathematical and Biological Synthesis (hence the acronym) whose purpose is to publicize NIMBIOS activities and programs. This is however a very active institute with many interesting programs (visiting fellowships, postdoctoral fellowships, workshops, etc.), so I think it’s worthwhile being on this mailing list even if you have no direct plans and no immediate interest in visiting them. You can join this list here: http://www.nimbios.org/press/newsletter.
  • University of Lethbridge theoretical biology mailing list: This mailing list will only be of interest to people at or near the UofL. We use it to distribute information about seminars, courses, or other items of strictly local interest. If you want to join this list, go to http://listserv.uleth.ca/mailman/listinfo/theor-biol-l. The volume on this list is very low, although I always hope that more list members will share what’s going on in their area through this list.

If you know of other mailing lists that are useful for mathematical biologists, let me know and I may add them to this post.

Farewell, Oktay Sinanoğlu (1935–2015)

I’m a long-time fan of Oktay Sinanoğlu. I use the word “fan” quite deliberately: I don’t think there’s any other way to describe my relationship to the man. We’ve never met, or even exchanged emails. But I read some of his papers in graduate school and was immediately drawn in. I was therefore sad when I learned recently that he had died. One more scientific hero I’ll never meet…

Sinanoğlu had a long and productive career at Yale. Nevertheless, he was almost certainly better known in Turkey, where he became something of a national hero, than in the Western world. His papers covered a very wide cross-section of theoretical chemistry, including electronic structure, atomic clusters, solvent effects on chemical reactions, spectroscopy, automated generation of synthetic pathways, irreversible thermodynamics, dissipative structures, graph theoretical methods for studying the stability of reaction networks, and model reduction methods. It was the latter two topics that attracted my attention to Sinanoğlu when I was a graduate student. They intersected nicely with my interests at the time, which revolved around the dynamical systems approach to chemical kinetics.

My main research interest at the time was model reduction. Sinanoğlu, with his student Ariel Fernández, was among the first people to consider the construction of attracting manifolds for reaction-diffusion systems [1,2]. This is a very difficult problem that is still a very active area of research. When I look back on the Fernández-Sinanoğlu papers on this topic, it seems to me that they anticipate later work on inertial manifolds [3]. Because there weren’t many people following the field at the time, I don’t think that these papers are as well known as they deserve to be. Fernández and Sinanoğlu were just a bit ahead of their time. Had this work been published in the 1990s rather than the mid-1980s, I’m sure these papers would have received a great deal more attention.

Although I wasn’t working on these problems myself at the time, I became very interested in applications of graph theory in chemical kinetics while still a graduate student. It would be many years before I made any contributions to this topic myself, in association with my then-postdoc Maya Mincheva [4–6]. Among the papers I read way back then was a pair written by Sinanoğlu in which chemical reaction networks were conceptualized as graphs [7,8]. This allowed Sinanoğlu to enumerate all graphs corresponding to reaction networks with given numbers of reactions and species [7]. A subsequent paper contained a conjecture about a topological feature of the graphs of chemical mechanisms capable of oscillations [8], thus attempting to tie together the structural features of his graphs and the dynamics generated by the rate equations. This is the theme we picked up many years later, although we followed a line of research initiated by Clarke [9] and Ivanova [10] rather than Sinanoğlu’s theory.

So, Oktay, thanks for inspiring a young graduate student. Rest in peace.

[1] A. Fernández and O. Sinanoğlu (1984) Global attractors and global stability for closed chemical systems. J. Math. Phys. 25, 406–409.
[2] A. Fernández and O. Sinanoğlu (1984) Locally attractive normal modes for chemical process. J. Math. Phys. 25, 2576–2581.
[3] A. N. Yannacopoulos, A. S. Tomlin, J. Brindley, J. H. Merkin and M. J. Pilling (1995) The use of algebraic sets in the approximation of inertial manifolds and lumping in chemical kinetic systems. Physica D 83, 421–449.
[4] M. Mincheva and M. R. Roussel (2006) A graph-theoretic method for detecting potential Turing bifurcations. J. Chem. Phys. 125, 204102.
[5] M. Mincheva and M. R. Roussel (2007) Graph-theoretical methods for the analysis of chemical and biochemical networks. I. Multistability and oscillations in ordinary differential equation models. J. Math. Biol. 55, 61–86.
[6] M. Mincheva and M. R. Roussel (2007) Graph-theoretical methods for the analysis of chemical and biochemical networks. II. Oscillations in networks with delays. J. Math. Biol. 55, 87–104.
[7] O. Sinanoğlu (1981) 1- and 2-topology of reaction networks. J. Math. Phys. 22, 1504–1512.
[8] O. Sinanoğlu (1993) Autocatalytic and other general networks for chemical mechanisms, pathways, and cycles: their systematic and topological generation. J. Math. Chem. 12, 319–363.
[9] B. L. Clarke (1974) Graph theoretic approach to the stability analysis of steady state chemical reaction networks. J. Chem. Phys. 60, 1481–1492.
[10] A. N. Ivanova (1979) Conditions for the uniqueness of the stationary states of kinetic systems, connected with the structures of their reaction mechanisms. 1. Kinet. Katal. 20, 1019–1023.

The most-cited work of all time

In my last blog post, I discussed the list of the 100 most-cited papers of all time compiled by Thomson-Reuters for Nature to celebrate the 50th anniversary of the Science Citation Index. In the same Nature article, there is a brief mention of a similar list compiled by Google based on their Google Scholar database. Unlike the Thomson-Reuters/Science Citation Index (SCI) list, the Google list includes books. This is partly a byproduct of the way the two databases are structured (Thomson-Reuters has separate databases for journals and books while Google has a single database that includes journal articles, books and “selected web pages” [1]) and, I suspect, partly a conscious choice by the Nature editors to focus on the most-cited papers. Certainly, the article focuses on the SCI list rather than the Google list which, as mentioned above, is different in composition. This provides us with an interesting opportunity to think a little harder about why things get cited and how we go about the business of counting citations and thereby trying to measure impact.

The most striking thing in the Google list is the number of books among the most highly cited work. Sixty-four of the 100 most highly cited works are books, according to Google. Many of these books are technique-oriented, as one might expect from the kinds of papers that made the SCI list discussed in my last post. For example, the most highly cited book on Google’s list, and 4th most cited work in the overall list, is Molecular Cloning: A Laboratory Manual by Sambrook, Fritsch and Maniatis. The same book, but with a different permutation of authors (Maniatis, Fritsch and Sambrook), also shows up as number 15 on Google’s list. How can this be? This book has gone through a number of editions, with changing authorship. The book at #4 on Google’s list is the second edition, while #15 is the first edition. This highlights one of the key difficulties in compiling such a list: Books are often inconsistently cited, and changing editions pose a challenge in terms of combining or not combining citations. Since different editions with a simple permutation of authorship are actually an evolution of the same book, it seems to me that we should combine the citation counts for entries 4 and 15 (and later editions that don’t show up on this list as well). That would vault Molecular Cloning to #1 on Google’s list. If we take citations as a measure of impact, this book would be the most important scientific work of all time (so far). However, I think we can all agree that there is something funny about that statement. The number of citations indicates that this is clearly a very useful book, but it’s a compendium of methods developed by many, many other groups. It is highly cited because of its convenience. The original papers are not cited as often as this book (at least by Google’s count), but clearly it’s the original scientific work used by thousands of labs around the world that has had the impact, not this manual.

So here we have a work that is very highly cited (and therefore, by any reasonable definition, important) but where it’s obvious that the very large citation count is not measuring scientific impact so much as utility as a reference.

The same sort of argument could be applied to scientific papers. Take, for example, the density functional theory papers discussed in my previous post. I would argue that the two papers by Walter Kohn in the SCI list have had more impact than any of the other DFT papers in this list since they enabled all the subsequent work developing the theory into practical methods. But they are not cited as often as some of the papers that describe functionals used in quantum chemical calculations. Citations therefore measure something—utility?—but it isn’t impact as I would understand the term.

There are some books on Google’s list that do describe original contributions to the literature. Among other things, there are those I would characterize as “big idea” books, in which new, influential ideas were first described. Number 7 on Google’s list is Thomas Kuhn’s The Structure of Scientific Revolutions. This is not a book that contains practical advice on carrying out particular experiments or particular types of calculations. It’s a contribution to the history and philosophy of science. But Kuhn’s ideas about the way in which science progresses have really struck a chord, so this book is cited a lot, across a wide range of fields, most of which have nothing to do with the history or philosophy of science.

The Google list also contains works from fields outside of hard-core science, which we don’t see in the Science Citation Index list. Thus, number 6 on Google’s list is Case Study Research: Design and Methods by Robert K. Yin, a book covering research methods used mostly in business studies. The Google list includes a number of other works from business studies, as well as from the social sciences. It’s sometimes useful to be reminded that “research” and “scientific research” are not synonymous.

But this is a blog about science, so back to science we go. An interesting question we could ask is how the books on Google’s list would have fared if they had been included in the Thomson-Reuters effort. To try to answer this question, I looked at another highly cited book, #5 in the Google list, Numerical Recipes by Press, Teukolsky, Vetterling and Flannery. Looking up citations to books in the Science Citation Index is not trivial. Because books don’t have records in the SCI database, there is no standard format for the citation extracted from citing papers. Moreover, people often make mistakes in formatting citations. Authors are left out, or the order of authorship is permuted. Additionally, people often cite a particular chapter or page when citing a book, and each of these specific citations is treated as a citation of a different work in the database. Anyhow, here’s what I did: I searched for citations to works by “Press W*” entitled “Num*”. This generated a list of 4761 cited works. This large number of distinct hits to the Numerical Recipes books makes it impossible to complete the search for citing articles. All we can tell is that there are more than 4761 citations to Numerical Recipes in the Web of Science database. In fact, the number must be much larger since it’s plain to see even from the small sample I looked at that some of the variations are cited dozens or even hundreds of times. But an accurate method of counting them in the Web of Science eludes me.

Numerical Recipes is a bad case. There are many editions with slightly different titles (“in Fortran”, “in C”, etc.), the subtitle (“The Art of Scientific Computing”) is sometimes included, there are multiple authors, and so on. Maybe things would go better with a book that has one author and a limited number of editions? I therefore tried to do a citation search for Kuhn’s The Structure of Scientific Revolutions. Here, we find a different problem: The results are highly sensitive to details such as whether or not we include the word “The” from the title. And, although there are far fewer hits than for Numerical Recipes, there are still hundreds of them to sift through. Again, I’ve had to admit defeat: There does not appear to be a simple way to count citations to heavily cited books in the Web of Science.

Of course, citation counting is a tricky business at the best of times, and the problem afflicts both the Thomson-Reuters and Google Scholar databases. Errors in citations, which are fairly frequent, may deflate the citation count of a paper unless one is very careful about structuring the search. But beyond that, some papers are just hard to chase down in the database. Take the first of Claude Shannon’s two classic 1948 papers on information theory, which is number 9 on the Google list and nowhere to be found on the SCI list. It’s actually very difficult to find this paper in the Google database. I have found many lightly cited variants on this citation, but the version that Google Scholar reports as having been highly cited is actually a 2001 corrected reprint in the ACM journal Mobile Computing and Communications Review. It’s not clear to me that this is correct (has this paper really been cited more often than the 1948 original?), but then I’m not sure how Google’s database is structured. For the record, the Web of Science reports that the 1948 paper has been cited 9847 times, while the 2001 reprint has been cited 278 times. Quirks of a database can make the apparently simple act of counting citations tricky, all the more so for highly cited papers.

We all wish that we could quantify scientific output so that we could make better decisions about funding, prizes, and so on. It would sure make all of our lives much easier if this were possible. However, the problems that plague the apparently simple task of trying to round up and interpret a list of the most cited work—high citation rates for work that provides a convenient reference for an idea or technique but is not particularly original (books on methods, review papers), inconsistent database entries and citation errors—also affect citation counts for work that has accumulated a more normal number of citations. None of this is to deny the importance of a good book or review paper, nor are my comments intended to mean that there isn’t a clear difference in impact between two research papers in the same field whose citation counts differ by an order of magnitude. But there are enough subtleties in counting citations and in interpreting the results that I would not want to try to rank papers or scientists on this basis.

[1] R. Vine (2006) Google Scholar. J. Med. Libr. Assoc. 94, 97–99.

The top 100 most-cited papers of all time

I wrote earlier about the 50th anniversary of the Science Citation Index. Recently, Nature got together with Thomson-Reuters, the publishers of the Science Citation Index (now usually known as the Web of Science), to come up with a list of the 100 most-cited papers of all time [1]. It’s an interesting list, which I encourage you to take a look at. Let’s face it: top-100 lists are always fun. Who is in there? Who is not? The Nature article provides a few reflections on this. For my part, I’m going to look at what this list tells us about citation patterns in different areas of science, focusing particularly on an area of science I know well, namely density functional theory, and one with which I have a tangential acquaintance, NMR.

There are, as the Nature article pointed out, a large number of papers in the top 100 from the field of density-functional theory (DFT). I may have missed some, but here are the ones I noticed: Lee, Yang and Parr (1988) [2] at #7, Becke (1993) [3] at #8, Perdew, Burke and Ernzerhof (1996) [4] at #16, Becke (1988) [5] at #25, Kohn and Sham (1965) [6] at #34, Hohenberg and Kohn (1964) [7] at #39, Perdew and Wang (1992) [8] at #93, and Vosko, Wilk and Nusair (1980) [9] at #96.

So what is DFT, anyway? One of the great problems in electronic structure calculations for molecules is electron correlation. Electrons repel, so they tend to stay away from each other. Classic methods of electronic structure calculation don’t properly take electron correlation into account. There are ways to put electron correlation back in after the fact, but they’re either not very accurate, or they take a huge amount of computing. Another problem arises because of exchange, a strange quantum mechanical effect that causes identical electrons with the same spin to stay away from each other more so than is the case due to simple electrostatics (i.e. more than would be the case for electrons with opposite spin). DFT is based on some theory developed by Kohn in the 1960s (in papers #34 and #39 from Nature’s list) that essentially states that there is a functional of the electron density that describes electron correlation and the exchange interaction exactly. Modern DFT is based on approximating this functional (usually using separate correlation and exchange parts) semi-empirically. Using good DFT exchange and correlation functionals allows us to do very accurate electronic structure calculations much more quickly than is the case with older methods. The one catch is that we don’t really know what the exchange and correlation functionals should be, so there’s a lot of work to be done coming up with good functionals and validating them. Nevertheless, the current crop of functionals does a pretty good job in many cases of chemical interest.

To understand the DFT citation patterns a bit better, I used the Web of Science to count up the number of times each of these papers was cited with one of the others. Here’s what I found:

              LYP 88  Becke 93    PBE 96  Becke 88     KS 65     HK 64     PW 92    VWN 80
LYP 88         48653     33303      3498     17608      3305      2917      2114      5320
Becke 93                 48041      3266     11118      2718      2499      2469      4284
PBE 96                             38281      2948      5405      5040      2576      1647
Becke 88                                     27370      2734      2332      2246      5821
KS 65                                                  23840     15129      2028      1955
HK 64                                                            22608      1750      1656
PW 92                                                                      13173      1260
VWN 80                                                                               12862

Hopefully the code I’m using here is clear enough: LYP 88, for example, is Lee, Yang and Parr (1988). The entries on the diagonal are the total numbers of citations to the corresponding papers. This matrix is necessarily symmetric about its diagonal, so I didn’t fill in the entries below the diagonal. Note that the total citations for each paper differ somewhat from those reported in Nature’s spreadsheet because I performed my analysis at a later point in time, and these papers continue to accumulate citations at an astonishing rate.

A few numbers jump out from this table: The top two DFT papers, Lee, Yang and Parr (1988) and Becke (1993), are cited together with very high frequency: 68% of the papers citing Lee, Yang and Parr (1988) also cite Becke (1993). Although cited together slightly less often, Becke (1988) is also frequently co-cited with Lee, Yang and Parr (1988): 36% of the papers citing the latter also cite Becke (1988). Now if we ask how many of the papers citing Lee, Yang and Parr (1988) also cite at least one of the Becke papers, we find that an astonishing 85% do. This is, of course, not a random occurrence. One of the most popular exchange-correlation functionals around, B3LYP, combines Becke’s 1988 exchange functional, which was further studied in his 1993 paper, with the Lee, Yang and Parr correlation functional. People who use the B3LYP functional in calculations will usually cite Lee, Yang and Parr (1988) along with at least one of the Becke papers. So if one of these papers were to appear in the top-100 list, it was likely that all three would, as they do. The appearance of these papers in the top-100 list is therefore a testament to the heavy use made of the exchange-correlation functionals developed by these authors in the chemical literature. In fact, all of the DFT papers in the top-100 list describe functionals that are heavily used in applications, except for the Kohn papers which provided the underlying theory.
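For readers who want to check the pairwise percentages quoted above, here is a minimal Python sketch that recomputes them from the co-citation table. The names LABELS, UPPER, cocitations, fraction_also_citing and union_bounds are made up for this illustration; only the counts come from the table. Note that the 85% figure for "at least one of the Becke papers" cannot be recovered from pairwise counts alone, since it needs the number of papers citing all three. The sketch therefore only brackets that figure with an inclusion-exclusion bound.

```python
# Upper triangle of the co-citation table, row by row, starting at the diagonal.
# Diagonal entries are total citations; the counts are copied from the table.
LABELS = ["LYP 88", "Becke 93", "PBE 96", "Becke 88",
          "KS 65", "HK 64", "PW 92", "VWN 80"]
UPPER = [
    [48653, 33303, 3498, 17608, 3305, 2917, 2114, 5320],
    [48041, 3266, 11118, 2718, 2499, 2469, 4284],
    [38281, 2948, 5405, 5040, 2576, 1647],
    [27370, 2734, 2332, 2246, 5821],
    [23840, 15129, 2028, 1955],
    [22608, 1750, 1656],
    [13173, 1260],
    [12862],
]


def cocitations(a, b):
    """Number of papers citing both a and b (total citations if a == b)."""
    i, j = sorted((LABELS.index(a), LABELS.index(b)))
    return UPPER[i][j - i]


def fraction_also_citing(a, b):
    """Fraction of the papers citing a that also cite b."""
    return cocitations(a, b) / cocitations(a, a)


def union_bounds(a, b, c):
    """Bounds on the fraction of papers citing a that cite b or c (or both).

    The exact value needs the triple co-citation count, which the table
    does not contain, so only inclusion-exclusion bounds are returned.
    """
    fb, fc = fraction_also_citing(a, b), fraction_also_citing(a, c)
    return max(fb, fc), min(1.0, fb + fc)


for other in ("Becke 93", "Becke 88"):
    pct = 100 * fraction_also_citing("LYP 88", other)
    print(f"{pct:.0f}% of papers citing LYP 88 also cite {other}")

low, high = union_bounds("LYP 88", "Becke 93", "Becke 88")
print(f"At least one Becke paper: between {100 * low:.0f}% and {100 * high:.0f}%")
```

The two pairwise fractions come out at 68% and 36%, matching the text, and the quoted 85% falls inside the bracket given by the bounds.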

One of the points made by the authors of the Nature article is that papers that describe methods get cited much more than papers that introduce new ideas into science. So why do the Kohn papers appear in this list? I would argue that this is due to a quirk of citation among people who do DFT calculations. The vast majority of citations to these papers are by people who do DFT calculations, not by people further developing the Hohenberg-Kohn-Sham theory. To fully understand how strange this is, we have to consider that the overwhelming majority of people doing DFT calculations and citing these papers use software written by someone else, usually commercial software like Gaussian. Ordinary users of a computational method don’t usually “dig down” to the theory layer in their citations in this way. For example, the vast majority of modern quantum chemical calculations (including most DFT calculations) are based on Roothaan’s two classic papers on self-consistent-field calculations [10]. These papers have been cited, respectively, 4535 and 1828 times. This is an extremely high citation rate, but it’s a tiny fraction of the literature reporting calculations based on Roothaan’s algorithms. So it’s a bit strange that Kohn’s work gets cited by DFT users at this high rate, particularly since we can find other foundational papers in quantum chemistry, such as Roothaan’s, that are not as routinely cited.

Now let’s contrast the citation record of DFT with that of NMR. NMR is nuclear magnetic resonance. NMR spectroscopy is used on a daily basis by every synthetic chemistry group in the world, and by many physical and analytical chemistry laboratories as well. Although they will typically back up NMR measurements with other techniques, NMR is how chemists identify the compounds they have made, and determine their structures. One would think that we would see papers that describe fundamental NMR techniques or popular experiments make this list. They don’t. There is a single NMR-related paper in the list, one that describes a software program for analyzing both crystallography and NMR data, showing up at #69. That’s it. So why is that? It’s certainly not that there are more DFT papers than there are papers that use NMR. In fact the reverse is certainly true. However, when experiments become sufficiently common, chemists stop citing their original sources. I was just looking at a colleague’s paper in which he mentioned six different NMR experiments in addition to the usual single-nucleus spectra. A literature reference was given for only one of these experiments, presumably because he felt the others were sufficiently well-known that they didn’t need references. The equivalent practice in DFT would be not to cite anything when using the B3LYP functional, on the basis that everybody knows this functional. That’s quite a difference in citation practices between two different areas of chemistry! And the fascinating thing is that these two fields have overlapping membership: There are lots of synthetic chemists who do DFT calculations to support their experimental work. And for some reason, they behave differently when describing DFT methods than when describing NMR methods.

To understand the vast difference in citation practices between these two areas, let’s look at a specific example. In many ways, two-dimensional NMR experiments, in which signals are spread along a second dimension that encodes additional molecular information, very much parallel DFT: The two sets of methods were developed at about the same time, hardware that could carry out these operations routinely became available to ordinary chemists around the same time in both fields, and both opened up what could be done in their respective fields. The first two-dimensional NMR experiment, COSY, was proposed in 1971 by Jean Jeener [11]. It’s not entirely trivial to hunt down citations to papers in conference proceedings in the Web of Science because they are not cited in any consistent format. However, after doing a bit of work, and including the reprinting of these lecture notes in a collection a few decades later, I found approximately 352 citations to Jeener’s epoch-making paper. Compare that to the 23840 citations to the Kohn-Sham (1965) paper. One could argue that Jeener’s paper was published in an obscure venue, and that this depressed the number of citations to this paper, which is certainly plausible. Jeener’s proposal was implemented by Aue, Bartholdi and Ernst in 1976 [12]. That paper was cited 2919 times, which is a far cry from the number of citations accumulated by the Kohn papers, or by the “applied” DFT papers in which practical functionals are described. Kohn shared the 1998 Nobel Prize in Chemistry. Ernst was awarded the 1991 Nobel Prize in Chemistry. There are a lot of ways in which the two contributions are comparable. But not in citation counts.

And clearly, it’s not a matter of the popularity of the methods: I used the ACS journal website to see how many papers in the Journal of Organic Chemistry mentioned the COSY experiment. The Journal of Organic Chemistry is a journal that, by its nature, contains mostly papers reporting the synthesis and characterization of compounds, so it’s a good place to gauge the extent to which an experimental method is used. In that one journal alone, 6351 papers mention COSY. To be fair, some of these references will be to descendants of the original COSY experiment (of which there are many), but the very large number of COSY papers and the relatively small number of citations to the early papers on COSY still speaks to wildly different citation cultures between NMR and DFT practitioners.

None of this is intended to denigrate the work of the excellent scientists whose papers have made the top-100 list. They clearly deserve a very large pat on the back. However, it does show that we have to be extraordinarily careful in comparing citation rates even between very closely related fields. And these rates will of course also affect citation-based metrics like the h-index, perhaps not in extreme cases like the highly cited papers mentioned here, but certainly in the case of authors whose papers are well cited, if not insanely well cited.

In the interests of full disclosure: Axel Becke, whose name features so prominently in the top-100 list and in this blog post, supervised my senior research project when I was an undergraduate student at Queen’s. My first scientific paper was coauthored with Axel [13]. In fact, I may have benefited from the higher citation rates in DFT as this paper is by far my most cited paper. I sometimes joke that my career has all been downhill since this very first scientific contribution. But to figure out if that was true, we would have to take the citation practices of the various areas I’ve worked in into account…

[1] R. van Noorden, B. Maher and R. Nuzzo (2014) The top 100 papers. Nature 514, 550–553.
[2] C. Lee, W. Yang and R. G. Parr (1988) Development of the Colle-Salvetti correlation-energy formula into a functional of the electron density. Phys. Rev. B 37, 785–789.
[3] A. D. Becke (1993) Density-functional thermochemistry. III. The role of exact exchange. J. Chem. Phys. 98, 5648–5652.
[4] J. P. Perdew, K. Burke and M. Ernzerhof (1996) Generalized gradient approximation made simple. Phys. Rev. Lett. 77, 3865–3868.
[5] A. D. Becke (1988) Density-functional exchange-energy approximation with correct asymptotic behaviour. Phys. Rev. A 38, 3098–3100.
[6] W. Kohn and L. J. Sham (1965) Self-consistent equations including exchange and correlation effects. Phys. Rev. 140, A1133–A1138.
[7] P. Hohenberg and W. Kohn (1964) Inhomogeneous electron gas. Phys. Rev. 136, B864–B871.
[8] J. P. Perdew and Y. Wang (1992) Accurate and simple analytic representation of the electron-gas correlation-energy. Phys. Rev. B 45, 13244–13249.
[9] S. H. Vosko, L. Wilk and M. Nusair (1980) Accurate spin-dependent electron liquid correlation energies for local spin-density calculations: a critical analysis. Can. J. Phys. 58, 1200–1211.
[10] C. C. J. Roothaan (1951) New developments in molecular orbital theory. Rev. Mod. Phys. 23, 69–89; (1960) Self-consistent field theory for open shells of electronic systems. Rev. Mod. Phys. 32, 179–185.
[11] J. Jeener (1971) Lecture notes from Ampere Summer School in Basko Polje, Yugoslavia. Reprinted in NMR and More in Honour of Anatole Abragam, Eds. M. Goldman and M. Porneuf, Les editions de physique (1994).
[12] W. P. Aue, E. Bartholdi and R. R. Ernst (1976) Two-dimensional spectroscopy. Application to nuclear magnetic resonance. J. Chem. Phys. 64, 2229–2246.
[13] A. D. Becke and M. R. Roussel (1989) Exchange holes in inhomogeneous systems: A coordinate-space model. Phys. Rev. A 39, 3761–3767.

How to not find a graduate supervisor

Several times a week, I receive emails from prospective graduate students. The overwhelming majority of these emails get a boilerplate “no thanks” response from me. (I have actually automated these responses so I can send them with just a few quick mouse clicks.) Most of my colleagues don’t even bother to respond. Why? Because the emails I (and my colleagues) get almost always look like mass emails sent to (probably) hundreds of scientists worldwide without any indication that the student knows what I do or, worse, with clear indications that they don’t know what I do.

To those of you sending these emails: If you don’t want to go to graduate school, stop reading this post and keep sending those emails. My colleagues and I will keep deleting them.

Here’s what a typical one of these emails looks like, with my comments in square brackets:

Dear Professor, [What, you couldn’t be bothered to find out my name?]

I have read your website, and I am really excited about your research. [It would be a nice touch if you actually included some text in your email that showed that you knew what that research was.] I would like to join your group as a Ph.D. student starting in September.

I have an M.Sc. from U. of Wherever, where I completed a project in organic synthesis with Professor Whoever. [First you tell me you looked at my website. Now you tell me that you have experience in organic synthesis, which is completely irrelevant to me. What you’re really saying is that you have not read my website and have no idea what I do. The email started off badly. Now I’m annoyed at you for wasting my time.] I think this background prepares me to contribute to your research.

I look forward to a positive response.

Sincerely,

A. Student

Look, students. It’s never been easier to figure out what a professor does. We all have websites that contain detailed descriptions of our research because we all want to find good graduate students. All you have to do is look at those websites and write emails that contain specific details relating to a particular professor’s interests. Sending out several hundred generic emails won’t get you a response even from people who might otherwise be interested. If you’re too lazy to look at my website and to write an email that has been customized to my interests, I’m not likely to take your email very seriously.

If you like rejection, go ahead and send those generic emails. If you actually want to go to graduate school, do some research, write a few targeted emails to people who are actually in your area of interest and explain to them how you’re excited about their research (mentioning actual details of what they do), and how your background is, you think, good preparation for work in that person’s lab. Professors actually answer emails that have been written to them, and not just written to a professor. So if you’re not getting answers to your emails to professors, the problem isn’t the professors. It’s your emails.