Statistical Mechanics

Modern science has revealed that matter is comprised of atoms, and the number of atoms present in even a tiny piece of matter is huge. A small beaker with only 18 c.c. of water will contain approximately 1.8 million, billion, billion atoms. It follows that any attempt to describe the physical properties of a material in terms of the motion of its individual atoms is doomed to failure. However, such astronomical numbers are perfect for a statistical treatment, by means of which the bulk properties (as opposed to individual atomic properties) may be determined with great accuracy. This is where the term statistical mechanics comes from: it is a fusion of mechanics with statistics. Fortunately, since it is the properties of bulk materials that are usually of interest, this treatment is exactly what is required.

Without statistical mechanics, little meaningful information could be obtained from molecular dynamics simulations. It provides a many powerful mathematical tools for drawing out useful properties from a mass of data. The most important tools are:

Ensemble Averaging:
This is important if we want to calculate the bulk properties of a system. Bulk properties are inevitably average properties, describing the average effect of all the atoms acting together. Ensemble averaging provides the prescription for calculating the averages properly.
Correlation Functions:
These are particularly valuable for drawing out relationships between the observable properties of a bulk system. Spatial correlations reveal how the system structure is organised, while time correlations reveal the dynamical processes that occur.
Fluctuations:
These describe how much a system with fixed average properties (i.e a system in equilibrium) can vary from the average at any instant. This variation is subtly linked to the physical processes that operate in the system.
Distribution Functions:
These describe how the properties of the bulk system are shared out among the component atoms. Equivalently they describe the probability of finding the bulk system in a given state.

Ensemble averages

The ensemble is a central concept in statistical mechanics. Imagine that a given molecular system is replicated many times over, so that we have an enormous number of copies, each possessing the same physical characteristics of temperature, density, number of atoms and so on. Since we are interested in the bulk properties of the system, it is not necessary for these replicas to have exactly the same atomic positions and velocities. In other words the replicas are allowed to differ microscopically, while retaining the same general properties. Such a collection of replicated systems is called an ensemble.

Because of the way the ensemble is constructed, if a snapshot of all the replicas is taken at the same instant, we will find that they differ in the instantaneous values of their bulk properties. This phenomenon is called fluctuation. Thus the true value of any particular bulk property must be calculated as an average over all the replicas. This is what is meant by an ensemble average, and the instantaneous values are said to fluctuate about the mean value.

Molecular dyamics proceeds by a numerical integration of the equations of motion. Each time step generates a new arrangement of the atoms (called a configuration) and new instantaneous values for bulk properties such as temperature, pressure, configuration energy etc. To determine the true or thermodynamic values of these variables requires an ensemble average. In molecular dynamics this is achieved be performing the average over successive configurations generated by the simulation. In doing this we are making an implict assumption that an ensemble average (which relates to many replicas of the system) is the same as an average over time of one replica (the system we are simulating). This assumption is known as the Ergodic Hypothesis. Fortunately it seems to be generally true, provided a long enough time is taken in the average. However it has not yet been rigorously proved mathematically.

Examples of properties that can be calculated as ensemble averages include:

Temperature; Pressure; Density; Configuration energy; Enthalpy; Structural correlations; Time correlations; Elastic properties; In fact, almost everything of interest that can be obtained through a molecular dynamics simulation, requires the calculation of some kind of ensemble average!

Correllating Functions

Correlation functions are a powerful means of analysing time dependent data. In molecular dynamics they are the primary means by which the average time dependent properties of a system of atoms are determined. (Though it is important to note that the usefulness of correlation functions is not exclusive to time dependent phenomena, and correlations in space are equally important. In fact, in the following description, the time dependence of the functions may be replaced by a distance dependence, without violating the principles.)

Suppose there are two time dependent properties of a system, which can be written as A(t) and B(t) to show they are functions of time. Suppose also that a connection between these two functions is suspected - the value of one of them at a particular time influences the value of the other at some time later. In other words there is a cause-and-effect relationship operating between them. How can this be proved? Calculating the correlation function is a way of doing it.

The best known example of a correlation function is the velocity autocorrelation function, which correlates the velocity of an atom with itself (hence the term autocorrelation). What this means is that functions A(t) and B(t) are both the particle’s velocity. How can this be useful? Well, by correlating the velocity of an atom at a given time, with its velocity at a later time, it reveals what effect the interatomic forces have had on the atom’s motion. (If there were no forces, the atom’s velocity would never change, and the correlation would stay at a fixed vaue for all time.) In fact at short time the forces don’t have much effect and the correlation is high, but as the atom interacts with its neighbours, the velocity changes and the correlation is reduced. The correlation function therefore reveals the time scale for changes in the atomic motion, revealing, for example, the average time between atom-atom collisions.

insert image

A typical plot of a velocity autocorrelation function (above) shows how the correlation decays rapidly from complete correlation at zero time, to become negative. After a short descent to a maximum negative vaue it rises again (in some cases becoming positive again). The long-time behaviour is a progression towards zero correlation. The first descent to negative correlation is a sign that the average atom collides with a neighbouring atom and bounces back - giving the atom a velocity in the opposite direction (with opposite sign).

Calculating correlation functions in a molecular dynamics simulation is a tricky procedure, but can be thought of as a form of ensemble average.

Calculating Correlation Functions

A time dependent correlation function is calculated as follows.

First a molecular dynamics simulation is used to generate a series of time-sequenced values of properties A and B. These form two sets of data (A(i) and B(i)), where the index (i) specifies the time step at which each value was calculated, and it is supposed that i ranges from 1 to some huge number N, which may be many thousand. Such a set is usually called an array, and can be thought of as a long column of numbers. A third array (C(i)) can now be defined, which will store the ordered values of the correlation function. The value of the first array element C(1) is defined as:

add equation

which means that each array value of A(i) is multiplied by the corresponding value of B(i) for every value in the arrays and the result summed to a single value, which is divided by the number of values N. In other words it is the average value of all the products, given by:

add equation

For the next value of the corrlation array (C(2)) a simular procedure is followed, except that instead of taking the products of A(i) and B(i) with the same index, the two indices differ by 1:

add equation

which is the same as

add equation

It is clear that the average is now over (N-1) values, because there is no value corresponding to B(N+1), as B(N) is the last in the list.

Other values of the array C(i) can be constructed in a simular way, by taking the sum of products A(i)B(i+2) to make C(3), A(i)B(i+3) for C(4) and so on. This prescription can be summarised as:

add equation

The result of all this arithmetic, (which of course requires a computer!) is a time-ordered array C(i), which represents the correlation function. What can such a function mean?

Clearly the first element C(1) is just the average of the products of A(i) and B(i) taken at the same time. If these two properties have no connection whatsoever and suppose that they may take both positive and negative values (which can always be arranged by subtracting the mean value of A from the instantaneous value A(i), and likewise for B(i)), then the average will result from a sum of random numbers with positive and negative values, which will sum to zero. If, on the other hand, A(i) and B(i) are completely related, then a given value of A(i) would imply a related value of B(i), meaning the product would always have the same sign and the sum would be a large positive or negative number. So a nonzero value of C(1) indicates that there is some relationship between functions A and B. The two functions are then said to be correlated.

What if A and B are related, but there is a time lag between the value of A and the corresponding value of B? In this case the stronest correlation will occur when A(i) is compared with B(i+j), where j represents the time lag. In other words, the correlation shows up strongest in C(j), and not C(1). It follows that if the whole correlation function is constructed (i.e. with all possible values of j considered), it will be seen at a glance whether there is any correlation at all between two functions A and B, and precisely what the time lag of the correlation is. Any correlation of this nature, revealed by the correlation function, is strong evidence for the lagging function being somehow dependent on the leading function.

Fluctuations

Most of the properties that we calculate for a molecular system are averages. Well known properties like temperature, pressure and density are calculated as ensemble averages, and in the real world they are treated as fixed, measurable quantities, which they generally appear to be. However all averages are obtained by summing over many numbers, and it would be very unusual (even pointless) if all the individual numbers summed had exactly the same value. Thus in practice we expect the average to show some dispersion - individual contributions are scatteerd about the mean value. In statistical thermodynamics this dispersion about the average value is known as fluctuation and it is both a subtle and important property of all physical systems.

When calculating an ensemble average (of say, pressure at fixed temperature and density), we take an instantaneous snapshot of a very large set of replicas of the system concerned and compute the average from the sum of the individual values taken from each replica. Even though each replica represents the same system at the same pressure, their individual, instantaneous values differ slightly, because the molecules that bombard the vessel surfaces to create the pressure are not in synchronisation between each replica and cannot possibly give rise to precisely the same surface forces at the same instant. Thus, with pressure, we expect some fluctuation about the mean value and indeed, similar arguments can be made for all the bulk properties of the system.

Fluctuations are of fundamental importance in statistical mechanics because they provide the means by which many physical properties of a molecular system can happen. For instance, the density of a liquid at equilibrium is a fixed, uniform quantity and we feel justified in considering the system to be isotropic - the same at all points within its bulk. Yet we know that the molecules in the system are undergoing diffusion and can easily travel throughout the bulk of the liquid. It is diffcult to imagine how this diffusion can take place if the environment each molecule is in is completely isotropic. If however we consider the density to be fluctuating minutely from the mean value at different points in the bulk, we can readily see that such fluctuations would provide a means by which the diffusion may take place. It is a surprising fact, but most of the physical properties of a bulk system are driven by fluctations, and indeed can be calculated directly from them. For this reason it is possible to view fluctations as even more fundamental than the average value.

A good example of the importance of fluctuation is provided by the Fluctuation-Dissipation theorem, which is a theorem of great power in statistical mechanics. This theorem proposes that the mechanism underpinning the response of a system to an external perturbation, is precisely the same mechanism by which equilibrium fluctuations are held close to the average bulk vaue. Thus for example, a molecule vibrationally excited by an infrared photon, will lose (i.e. dissipate) that energy to the rest of the system by the same mechanism by which normal vibrational energies are exchanged (i.e. fluctuate) between molecules at equilibrium. This insight is the basis of a theoretical description of solution spectroscopy.

The Distribution Function

The concept of a distribution function is at the heart of statistical mechanics. The energy distribution function, for example, describes how the atoms in a system share out the energy between them. In mathematical terms it defines the probability of finding an atom with a given amount of energy. (Strictly speaking the distribution function defines the probability density - the probability of a molecule possessing an energy in a very narrow energy range.)

Imagine a system composed of many identical atoms, under normal circumstances it will be found that different individual atoms will possess different amounts of energy. This is inevitable because atoms are constantly colliding and exchanging energy with each other, and just as money ends up unevenly distributed among a population of people, energy ends up unevenly shared between molecules: most will have very little, some will have a lot. The distribution function expresses this mathematically.

The earliest example of a distribution function in science was the Maxwell velocity distribution function for the molecules in a gas. By an ingenious argument Maxwell (in 1860) showed that the distribution function in this case is a well known bell-shaped function called a Gaussian:

add equation and image

This function shows immediately that molecules with very large velocities are not very common, (see how the function approaches zero at large positive and negative velocities) but molecules with small velocities are common. Also the symmetry of this function shows that positive and negative velocities are equally probable. It follows that the average velocity of the molecules in a gas is zero. (Thus must be so, otherwise the air molecules in a room might suddenly all leap to one side, creating a vacuum at one end!) Incidentally, the average speed of a molecule (as opposed to its velocity) is not zero - why do you think this is?

The distribution function is central to statistical mechanics because once it has been determined, all the important properties of the system in bulk can be calculated, without worrying about what individual molecules may be doing.