
Physics predicts why and where biological cells are regulated

Cells can seem to make choices about their behavior that are hard to explain but that often mimic social interactions. Game theory, in particular, has been insightful for understanding the dynamics of populations of microbes. Unfortunately, an unintended consequence of applying social concepts such as game theory to cells is the use of language that frames our thinking. Bet-hedging strategies, cooperators, cheaters, noncooperators, altruistic cells, and so on are common descriptions of cell behavior in the literature. Who doesn’t think of a cancer cell as a selfish cell – undergoing unlimited growth even though it will ultimately kill its host? But this anthropomorphic view doesn’t help us cure cancer. (Image courtesy of Nathan Johnson and Pacific Northwest National Laboratory).

In comparison, physics-based theories about life, which have been around for 100 years, have garnered little attention. Lotka proposed in 1922 that natural selection is about who can harvest the energy required for reproduction from the environment the fastest. We now know that biological cells are part of the same class of phenomena as hurricanes and tornadoes, called dissipative systems. As the lower atmosphere heats up in the summer due to radiation from the sun, winds form that move the hot air to the cooler upper atmosphere. Under normal conditions, the winds are relatively random – a gust here, a gust there, seemingly changing direction from moment to moment. But when the temperature difference between the upper and lower atmosphere becomes large enough, the winds become correlated in an attempt to reduce the temperature difference as quickly as possible. Circular wind patterns form, with heat being taken from the hot lower atmosphere and dumped into the cooler upper atmosphere. When these circular winds rotate sideways, they become tornadoes.

From the viewpoint of physics, the atmosphere is trying to redistribute energy equally, and the fastest way to do that is through correlated motions – convection cells.

Biological cells serve the same purpose, but now the correlated processes are enabled by visible chemical structures. Sunlight comes to the cell in the form of high energy radiation, and that radiation is captured and turned into lower energy chemical compounds. The driving force again is the need to distribute energy equally in nature. Except now, those lower energy chemical compounds form the cell structures needed to capture the sunlight. The more sun that is captured by a cell, the more cells that can be produced – to capture more sunlight. This is the process of metabolism and growth.

The chemical compounds produced need to be in just the right proportions to form structures that can capture energy, however. This is where things can get out of control. If some compounds are produced in too great a quantity, the energy-capturing structures can’t form. Cells regulate this process. Dysregulation can cause more than just cells not working well – it can cause cancer, unregulated growth to the detriment of the host. In our new study, we have shown that known sites of metabolic regulation in cells – enzyme regulation – can be predicted from these very principles.

Read about the technical details in the article, Enzyme activities predicted by metabolite concentrations and solvent capacity in the cell, published in the Journal of The Royal Society Interface. This work was funded by the National Institute of Biomedical Imaging and Bioengineering and the U.S. Department of Energy, Office of Biological and Environmental Research.

On the Reunification of Chemical and Biochemical Thermodynamics

For 26 years, it has been assumed by some that the thermodynamic analysis of open-system biochemical reactions must be carried out by performing Legendre transformations on the terms involving the species whose concentrations are being held fixed. In contrast, standard nontransformed thermodynamics applies to chemical processes. However, it has recently been shown that such biochemical reactions may be accurately examined using either method. The papers that report this finding use the hydrolysis of ATP at fixed pH and pMg as an example. This biochemical process comprises 14 equilibrium reactions involving 17 chemical species. Consequently, the chemical and mathematical complexity is so high that the underlying principles leading to the equivalence of the two methods tend to become lost. Furthermore, the details of such an example are too complex for classroom presentation. This paper makes these principles abundantly clear by the thermodynamic examination of the simple case of a unimolecular isomerization conducted under both open and closed conditions. For the open system, the analysis is conducted using both Legendre-transformed and nontransformed methods. The results are shown to be identical provided that the chemical potentials of the terms on which the transform is performed are held constant. More importantly, the analysis makes the underlying reasons for the equivalence of the two methods very clear and shows when they will not be equivalent. The model is ideally suited for classroom presentation because of its chemical and mathematical simplicity.
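The closed-system side of that comparison is compact enough to write down. As a hedged illustration (generic notation, not copied from the paper), the nontransformed free energy of the unimolecular isomerization A ⇌ B is:

```latex
\Delta_r G = \Delta_r G^{\circ} + RT \ln \frac{[\mathrm{B}]}{[\mathrm{A}]},
\qquad
\left( \frac{[\mathrm{B}]}{[\mathrm{A}]} \right)_{\mathrm{eq}}
  = \exp\!\left( -\frac{\Delta_r G^{\circ}}{RT} \right)
```

In the open system, clamping a species' concentration fixes its chemical potential, which only subtracts a constant term from the potential being minimized; that is consistent with the abstract's point that the transformed and nontransformed routes agree exactly when those chemical potentials are truly constant.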

Circadian Proteomic Analysis Uncovers Mechanisms of Post-Transcriptional Regulation in Metabolic Pathways

Transcriptional and translational feedback loops in fungi and animals drive circadian rhythms in transcript levels that provide output from the clock, but post-transcriptional mechanisms also contribute. To determine the extent and underlying source of this regulation, we applied newly developed analytical tools to a long-duration, deeply sampled, circadian proteomics time course comprising half of the proteome. We found a quarter of expressed proteins are clock regulated, but >40% of these do not arise from clock-regulated transcripts, and our analysis predicts that these protein rhythms arise from oscillations in translational rates. Our data highlighted the impact of the clock on metabolic regulation, with central carbon metabolism reflecting both transcriptional and post-transcriptional control and opposing metabolic pathways showing peak activities at different times of day. The transcription factor CSP-1 plays a role in this metabolic regulation, contributing to the rhythmicity and phase of clock-regulated proteins.


Comparison of Optimal Thermodynamic Models of the Tricarboxylic Acid Cycle from Heterotrophs, Cyanobacteria and Green Sulfur Bacteria

We have applied a new stochastic simulation approach to predict the metabolite levels, material flux and thermodynamic profiles of the oxidative TCA cycles found in E. coli and Synechococcus sp. PCC 7002, and in the reductive TCA cycle typical of chemolithoautotrophs and phototrophic green sulfur bacteria such as Chlorobaculum tepidum. The simulation approach is based on modeling states using statistical thermodynamics and employs an assumption similar to that used in transition state theory. The ability to evaluate the thermodynamics of metabolic pathways allows one to understand the relationship between the coupling of energy and material gradients in the environment and the self-organization of stable biological systems, and it is shown that each cycle operates in the direction expected from its environmental niche. The simulations predict changes in metabolite levels and flux in response to changes in cofactor concentrations that would be hard to predict without an elaborate model based on the law of mass action. In fact, we show that a thermodynamically unfavorable reaction can still have flux in the forward direction when it is part of a reaction network. The ability to predict metabolite levels, energy flow and material flux should be significant for understanding the dynamics of natural systems and for deriving principles for engineering organisms for the production of specialty chemicals.
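The point that an unfavorable reaction can still carry forward flux follows directly from the mass-action form of the reaction free energy. A minimal sketch of that arithmetic (the numbers are invented for illustration, not taken from the paper):

```python
import math

R = 8.314e-3  # gas constant, kJ/(mol*K)
T = 298.15    # temperature, K

def delta_g(dg0, product, substrate):
    """Reaction free energy from the mass-action quotient Q = [P]/[S]."""
    return dg0 + R * T * math.log(product / substrate)

# A reaction with an unfavorable standard free energy (+10 kJ/mol)...
dg0 = 10.0
# ...still runs forward (delta G < 0) if downstream reactions in the
# network keep the product concentration low relative to the substrate.
dg = delta_g(dg0, product=1e-5, substrate=1e-2)
print(dg)  # negative: forward flux despite dg0 > 0
```

This is the sense in which network context, not the standard free energy alone, decides the direction of flux.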

Read more:

Concepts, Challenges and Successes in Modeling Thermodynamics of Metabolism

The modeling of the chemical reactions involved in metabolism is a daunting task. Ideally, the modeling of metabolism would use kinetic simulations, but these simulations require knowledge of the thousands of rate constants involved in the reactions. The measurement of rate constants is very labor intensive, and hence rate constants for most enzymatic reactions are not available. Consequently, constraint-based flux modeling has been the method of choice because it does not require the use of the rate constants of the law of mass action. However, this convenience also limits the predictive power of constraint-based approaches in that the law of mass action is used only as a constraint, making it difficult to predict metabolite levels or energy requirements of pathways.
An alternative to both of these approaches is to model metabolism using simulations of states rather than simulations of reactions, in which the state is defined as the set of all metabolite counts or concentrations. While kinetic simulations model reactions based on the likelihood of the reaction derived from the law of mass action, states are modeled based on likelihood ratios of mass action. Both approaches provide information on the energy requirements of metabolic reactions and pathways. However, modeling states rather than reactions has the advantage that the parameters needed to model states (chemical potentials) are much easier to determine than the parameters needed to model reactions (rate constants). Herein we discuss recent results, assumptions and issues in using simulations of states to model metabolism.
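A hedged sketch of what sampling states by likelihood ratios can look like, assuming a Metropolis-style acceptance rule on the multinomial (mass-action) ratio of neighboring states; the isomerization network and all numbers here are invented for illustration and are not the authors' implementation:

```python
import random

def simulate_states(n_total=1000, p_b=0.8, steps=20000, seed=1):
    """Metropolis sampling over states (n_a, n_b) of an isomerization
    A <-> B. The relative likelihood of neighboring states comes from
    the multinomial mass-action ratio, not from rate constants."""
    rng = random.Random(seed)
    p_a = 1.0 - p_b
    n_b = n_total // 2          # start far from equilibrium
    total_b = 0
    for _ in range(steps):
        n_a = n_total - n_b
        if rng.random() < 0.5 and n_b < n_total:      # propose A -> B
            ratio = (n_a * p_b) / ((n_b + 1) * p_a)   # L(state') / L(state)
            if rng.random() < min(1.0, ratio):
                n_b += 1
        elif n_b > 0:                                  # propose B -> A
            ratio = (n_b * p_a) / ((n_a + 1) * p_b)
            if rng.random() < min(1.0, ratio):
                n_b -= 1
        total_b += n_b
    return total_b / (steps * n_total)  # mean fraction of B

print(simulate_states())  # converges toward p_b = 0.8
```

The only inputs are the state probabilities (standing in for chemical potentials); no rate constants appear anywhere, which is the practical advantage the paragraph above describes.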

Simulating Metabolism with Statistical Thermodynamics

New methods are needed for large scale modeling of metabolism that predict metabolite levels and characterize the thermodynamics of individual reactions and pathways. Current approaches use either kinetic simulations, which are difficult to extend to large networks of reactions because of the need for rate constants, or flux-based methods, which have a large number of feasible solutions because they are unconstrained by the law of mass action. This report presents an alternative modeling approach based on statistical thermodynamics. The principles of this approach are demonstrated using a simple set of coupled reactions, and then the system is characterized with respect to the changes in energy, entropy, free energy, and entropy production. Next, the physical and biochemical insights that this approach can provide for metabolism are demonstrated by application to the tricarboxylic acid (TCA) cycle of Escherichia coli. The reaction and pathway thermodynamics are evaluated and predictions are made regarding changes in concentration of TCA cycle intermediates due to 10- and 100-fold changes in the ratio of NAD+:NADH concentrations. Finally, the assumptions and caveats regarding the use of statistical thermodynamics to model non-equilibrium reactions are discussed.

See the complete article:

Mathematical Modeling of Microbial Community Dynamics: A Methodological Review

Microorganisms in nature form diverse communities that dynamically change in structure and function in response to environmental variations. As a complex adaptive system, microbial communities show higher-order properties that are not present in individual microbes, but arise from their interactions. Predictive mathematical models not only help to understand the underlying principles of the dynamics and emergent properties of natural and synthetic microbial communities, but also provide key knowledge required for engineering them. In this article, we provide an overview of mathematical tools that include not only current mainstream approaches, but also less traditional approaches that, in our opinion, can be potentially useful. We discuss a broad range of methods ranging from low-resolution supra-organismal to high-resolution individual-based modeling. Particularly, we highlight the integrative approaches that synergistically combine disparate methods. In conclusion, we provide our outlook for the key aspects that should be further developed to move microbial community modeling towards greater predictive power.
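One of the mainstream supra-organismal approaches surveyed in such reviews is the generalized Lotka-Volterra model, which can be sketched in a few lines. The two-species parameters below are arbitrary illustrative values, not from the article:

```python
def glv_step(x, growth, interactions, dt=0.01):
    """One Euler step of the generalized Lotka-Volterra model:
    dx_i/dt = x_i * (growth_i + sum_j interactions[i][j] * x_j)."""
    n = len(x)
    return [
        max(0.0, x[i] + dt * x[i] * (growth[i]
            + sum(interactions[i][j] * x[j] for j in range(n))))
        for i in range(n)
    ]

# Two competing species, each with logistic self-limitation; the
# off-diagonal terms encode (here, competitive) interactions.
growth = [1.0, 0.8]
interactions = [[-1.0, -0.5],
                [-0.4, -1.0]]
x = [0.1, 0.1]
for _ in range(5000):
    x = glv_step(x, growth, interactions)
print(x)  # settles near the coexistence equilibrium (0.75, 0.5)
```

The interaction matrix is the model's whole description of the community, which is both its appeal (few parameters) and the reason higher-resolution, individual-based methods are needed when emergent behavior depends on mechanisms the matrix cannot express.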
Read more:

The Limits of Big Data in Science

Biotechnology plants its analytic head deep into the cloud, deploying algorithms to derive meaning from a flood of information. But what’s the difference between “big data” and simply having lots of information? Sometimes we get enamored with the data itself and forget that it’s not just big data that’s important but meaningful data—data with which we can accept or reject hypotheses and take a significant step forward in our knowledge of the science.

Read the full article at Scientific American… 

pGraph: Efficient Parallel Construction of Large-Scale Protein Sequence Homology Graphs

Wu, C., Kalyanaraman, A., and Cannon, W.R. IEEE Trans. Parallel Distrib. Syst. 2012


Detecting sequence homology between protein sequences is a fundamental problem in computational molecular biology, with a pervasive application in nearly all analyses that aim to structurally and functionally characterize protein molecules. While detecting the homology between two protein sequences is relatively inexpensive, detecting pairwise homology for a large number of protein sequences can become computationally prohibitive for modern inputs, often requiring millions of CPU hours. Yet, there is currently no robust support to parallelize this kernel. In this paper, we identify the key characteristics that make this problem particularly hard to parallelize, and then propose a new parallel algorithm that is suited for detecting homology on large data sets using distributed memory parallel computers. Our method, called pGraph, is a novel hybrid between the hierarchical multiple-master/worker model and producer-consumer model, and is designed to break the irregularities imposed by alignment computation and work generation. Experimental results show that pGraph achieves linear scaling on a 2,048 processor distributed memory cluster for a wide range of inputs ranging from as small as 20,000 sequences to 2,560,000 sequences. In addition to demonstrating strong scaling, we present an extensive report on the performance of the various system components and related parametric studies.
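The hybrid master/worker and producer-consumer idea can be miniaturized with threads and a shared queue. This sketch is not the pGraph implementation (which targets distributed memory) but the single-node pattern it builds on; `fake_align` is a hypothetical stand-in for a real alignment kernel:

```python
import queue
import threading

def run_pairwise(tasks, n_workers=4):
    """Producer-consumer sketch: a producer enqueues sequence pairs while
    workers consume them and compute alignments independently, decoupling
    irregular work generation from alignment computation."""
    work = queue.Queue()
    results = []
    lock = threading.Lock()

    def fake_align(a, b):
        # Placeholder for a real alignment kernel (e.g. Smith-Waterman);
        # here, just count matching positions.
        return sum(1 for x, y in zip(a, b) if x == y)

    def producer():
        for pair in tasks:
            work.put(pair)
        for _ in range(n_workers):
            work.put(None)          # sentinel: one shutdown signal per worker

    def worker():
        while True:
            pair = work.get()
            if pair is None:
                break
            score = fake_align(*pair)
            with lock:
                results.append((pair, score))

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for t in threads:
        t.start()
    producer()
    for t in threads:
        t.join()
    return results

pairs = [("ACGT", "ACGA"), ("GGGG", "GGCG"), ("TTAA", "TTAA")]
print(len(run_pairwise(pairs)))  # 3
```

The queue absorbs the irregularity: slow alignments simply hold one worker while the rest keep draining tasks, which is the load-balancing property the paper scales up across thousands of processors.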

VESPA: software to facilitate genomic annotation of prokaryotic organisms through integration of proteomic and transcriptomic data

Peterson E.S., McCue L.A., Schrimpe-Rutledge A.C., Jensen J.L., Walker H., Kobold M.A., Webb S.R., Payne S.H., Ansong C., Adkins J.N., Cannon W.R., Webb-Robertson B.J., VESPA: software to facilitate genomic annotation of prokaryotic organisms through integration of proteomic and transcriptomic data. BMC Genomics. 2012 Apr 5;13:131. doi: 10.1186/1471-2164-13-131

The procedural aspects of genome sequencing and assembly have become relatively inexpensive, yet the full, accurate structural annotation of these genomes remains a challenge. Next-generation sequencing transcriptomics (RNA-Seq), global microarrays, and tandem mass spectrometry (MS/MS)-based proteomics have demonstrated immense value to genome curators as individual sources of information; however, integrating these data types to validate and improve structural annotation remains a major challenge. Current visual and statistical analytic tools are focused on a single data type, or existing software tools are retrofitted to analyze new data forms. We present Visual Exploration and Statistics to Promote Annotation (VESPA), a new interactive visual analysis software tool focused on assisting scientists with the annotation of prokaryotic genomes through the integration of proteomics and transcriptomics data with current genome location coordinates.

VESPA is a desktop Java™ application that integrates high-throughput proteomics data (peptide-centric) and transcriptomics (probe or RNA-Seq) data into a genomic context, all of which can be visualized at three levels of genomic resolution. Data are interrogated via searches linked to the genome visualizations to find regions with a high likelihood of mis-annotation. Search results are linked to exports for further validation outside of VESPA, or potential coding regions can be analyzed concurrently within the software through interaction with BLAST. Two use cases (Yersinia pestis Pestoides F and Synechococcus sp. PCC 7002) demonstrate the rapid manner in which mis-annotations can be found and explored in VESPA using either proteomics data alone or in combination with transcriptomic data.

VESPA is an interactive visual analytics tool that integrates high-throughput data into a genomic context to facilitate the discovery of structural mis-annotations in prokaryotic genomes. Data is evaluated via visual analysis across multiple levels of genomic resolution, linked searches and interaction with existing bioinformatics tools. We highlight the novel functionality of VESPA and core programming requirements for visualization of these large heterogeneous datasets for a client-side application. The software is freely available at

Proteotyping of Microbial Communities by Optimization of Matches from Tandem Mass Spectrometry

Hugo, A., Baxter, D. J., Kulkarni, G., Kalyanaraman, A., Callister, S. J., and Cannon, W. R. Pac. Symp. BioComputing, 2012, pp 225-234, 10.1142/9789814366496_0022.


We report the development of a novel high performance computing method for the identification of proteins from unknown (environmental) samples. The method uses computational optimization to provide an effective way to control the false discovery rate for environmental samples and complements de novo peptide sequencing. Furthermore, the method provides information based on the proteins expressed in a microbial community, and thus complements DNA-based identification methods. Testing on blind samples demonstrates that the method provides 79-95% overlap with analogous results from searches involving only the correct genomes. We provide scaling and performance evaluations for the software that demonstrate the ability to carry out large-scale optimizations on 1258 genomes containing 4.2M proteins.

A MapReduce Implementation of a Hybrid Spectral Library- Database Search Method for Peptide Identification

Kalyanaraman, A., Latt, B., Baxter, D. J. and Cannon, W.R.. Bioinformatics (2011) 27 (21): 3072-3073. doi: 10.1093/bioinformatics/btr523


A MapReduce based implementation called MR-MSPolygraph for parallelizing peptide identification from mass spectrometry data is presented. The underlying serial method, MSPolygraph, uses a novel hybrid approach to match an experimental spectrum against a combination of a protein sequence database and a spectral library. Our MapReduce implementation can run on any Hadoop cluster environment. Experimental results demonstrate that, relative to the serial version, MR-MSPolygraph reduces the time to solution from weeks to hours, for processing tens of thousands of experimental spectra. Speedup and other related performance studies are also reported on a 400-core Hadoop cluster using spectral data sets from environmental microbial communities as inputs.

Analyzing Data for Systems Biology: Working at the Intersection of Thermodynamics and Data Analytics

Cannon, W.R., and Baxter, D. J., SciDAC 2011 Proceedings, Denver, CO


Many challenges in systems biology have to do with analyzing data within the framework of molecular phenomena and cellular pathways. How does this relate to the thermodynamics that we know governs the behavior of molecules? Making progress in relating data analysis to thermodynamics is essential in systems biology if we are to build predictive models that enable the field of synthetic biology. This report discusses work at the crossroads of thermodynamics and data analysis, and demonstrates that statistical mechanical free energy is a multinomial log likelihood. Applications to systems biology are presented.
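The free-energy/log-likelihood connection can be checked numerically. This sketch is my own construction, not the paper's code: it compares the exact multinomial log likelihood of observed counts against minus N times the relative entropy, which is its large-N (Stirling) limit and the free-energy-like quantity:

```python
import math

def multinomial_loglike(counts, probs):
    """Exact log likelihood of observed counts under a multinomial."""
    n = sum(counts)
    ll = math.lgamma(n + 1)
    for c, p in zip(counts, probs):
        ll += c * math.log(p) - math.lgamma(c + 1)
    return ll

def neg_n_kl(counts, probs):
    """Stirling limit: -N * KL(f || p), where f are observed fractions."""
    n = sum(counts)
    return -sum(c * math.log((c / n) / p)
                for c, p in zip(counts, probs) if c)

counts = [700, 200, 100]
probs = [0.6, 0.3, 0.1]
print(multinomial_loglike(counts, probs), neg_n_kl(counts, probs))
```

Both quantities are maximized when the observed fractions match the underlying probabilities, and the per-observation gap between them shrinks as N grows; that shared structure is the crossroads the report describes.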

Team ratchets up accuracy for identifying protein bits

Anyone who has tried to match an unfamiliar bird’s features to its field guide portrait knows that reality rarely provides a perfect comparison to the ideal specimen.

Scientists have faced a similar problem when attempting to decode protein patterns found in living cells – a field known as proteomics. Using mass spectrometry, the technology of choice for protein identification, scientists try to match protein fragments, or peptides, against idealized patterns in peptide databases. These databases often provide a poor correspondence – the industry standard for positive peptide identification is usually a dismal 15 to 20 percent.

But using bioinformatics techniques, researchers at Pacific Northwest National Laboratory (PNNL) have developed a pattern-matching algorithm that improves the accuracy of peptide identification by between 50 and 150 percent, compared with standard approaches.


Large Improvements in MS/MS Based Peptide Identification Rates using a Hybrid Analysis.

Cannon, W.R., Rawlins, M. M., Baxter, D. J., Lipton, M., Callister, S., and Bryant, D. A., J. Proteome Res., 2011, 10 (5), pp 2306–2317, DOI: 10.1021/pr101130b


We report a hybrid search method combining database and spectral library searches that allows for a straightforward approach to characterizing the error rates from the combined data. Using these methods, we demonstrate significantly increased sensitivity and specificity in matching peptides to tandem mass spectra. The hybrid search method increased the number of spectra that can be assigned to a peptide in a global proteomics study by 57−147% at an estimated false discovery rate of 5%, with clear room for even greater improvements. The approach combines the general utility of using consensus model spectra typical of database search methods with the accuracy of the intensity information contained in spectral libraries. A common scoring metric based on recent developments linking data analysis and statistical thermodynamics is used, which allows the use of a conservative estimate of error rates for the combined data. We applied this approach to proteomics analysis of Synechococcus sp. PCC 7002, a cyanobacterium that is a model organism for studies of photosynthetic carbon fixation and biofuels development. The increased specificity and sensitivity of this approach allowed us to identify many more peptides involved in the processes important for photoautotrophic growth.
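Estimating a false discovery rate for a combined set of matches is commonly done with target-decoy counting; the following is a minimal sketch of that general idea (with made-up scores), not the paper's statistical-thermodynamics scoring metric:

```python
def fdr_threshold(target_scores, decoy_scores, max_fdr=0.05):
    """Find the lowest score threshold at which the estimated FDR
    (decoys passing / targets passing) stays at or below max_fdr."""
    for t in sorted(set(target_scores)):
        n_target = sum(s >= t for s in target_scores)
        n_decoy = sum(s >= t for s in decoy_scores)
        if n_target and n_decoy / n_target <= max_fdr:
            return t
    return None  # no threshold achieves the requested FDR

# Hypothetical match scores against real (target) and shuffled (decoy)
# sequences; decoy hits estimate the rate of random high-scoring matches.
targets = [12.1, 9.8, 8.7, 7.5, 6.2, 5.9, 4.4, 3.1]
decoys = [5.1, 4.0, 3.3, 2.8]
print(fdr_threshold(targets, decoys))  # 5.9: no decoy scores this high
```

A conservative variant, as used for combined search results, would apply the same counting with a single threshold across both the database and library match populations so the pooled error rate is not underestimated.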