Institutional home pages
Arrival and Departure
Arrival: 26 May 2023 (Fri)
Depart: 06 June 2023 (Tue)
In the 2022 workshop, our discussion of scientific ethics touched on the many ways that science has “unwritten rules”. Here I provide a few supplementary resources.
One way that unwritten rules affect who succeeds in science is sometimes referred to as the “hidden curriculum”. The term covers best practices for success in STEM that are not taught in classrooms, but that first-generation and underrepresented minority students must nonetheless navigate. With this in mind, we recently wrote a paper called “Ten simple rules for succeeding as an underrepresented STEM undergraduate” to help make the hidden curriculum of science explicit.
Many of the unwritten rules of science are the same as those that operate in society at large, including racism operating within people who adhere to egalitarian attitudes. In the essay “Science in the Belly of the Beast: my Career in the Academy” Joseph L. Graves, Jr. describes his personal experience with the unwritten rules of the academy, including the “one and one-quarter rule”.
PAML lab Materials
The lab exercises (PAML demo) are available via a small website (link below). The site contains some additional resources that are worth a look when you have time. Please note that the slides may change a little prior to the lab. I will post modified PDFs as required.
PAML demo: PAML Lab website
PAML demo resources: webpage
PAML demo slides: slides (PDF) (updated for 2023)
If you want to do the lab independently of the workshop (at home, on your own time and on your own computer), you can download all the necessary files from an archive here, or you can download the files individually for each exercise as you need them here.
NOTE: If you are doing the PAML Lab at the workshop, then use the VM and the symlink in your home directory named “moledata” to obtain the course data files!!!
I am changing the lecture content for 2022 and beyond. This lecture will provide a more general background on evolutionary forces, and on the Neutral and Nearly-Neutral theories of molecular evolution. Some details about fitting codon models to real data have been moved to the “PAML Lab” lecture.
2023 Lecture slides (Part 1), Intro to Neutral & Nearly Neutral Theories of Molecular Evolution: slide set 1 (updated)
2023 Lecture slides (Part 2), Intro to Codon Models: slide set 2 (updated)
Some material from previous workshops
I have included links to the 2019 slides below. This update includes more information on mechanistic processes of codon evolution (via the MutSel framework). Also, some might be interested in parts 3 and 4, which cover more advanced statistical topics, such as the requirements for likelihood inference and “phenomenological load” on parameter estimates. These topics will not be covered in 2022.
I updated the lecture slides on codon models for 2017. Because the older slides tend to have more details about fitting codon models to real data, I have included links to the 2015 and 2016 slides below; these slides provide more information about the powers and pitfalls of inference under codon models.
2016 Lecture slides, Part 1: 2016 PDF file1
2016 Lecture slides, Part 2: 2016 PDF file2
2015 Lecture slides, Part 1: 2015 PDF file1
2015 Lecture slides, Part 2: 2015 PDF file2
Key papers related to the lecture material:
A novel phenotype+genotype codon-model (PG-BSM) formulated to test and identify sites within a gene involved in phenotypic adaptation. This method does NOT require dN/dS>1 to infer adaptive molecular evolution!:
(Jones, C. T., Youssef, N., Susko, E., & Bielawski, J. P. (2020). A Phenotype–Genotype Codon Model for Detecting Adaptive Evolution. Systematic Biology, 69(4), 722-738.)
Phenomenological load (PL) and biological conclusions under codon models:
(Jones C.T., Youssef N., Susko E., Bielawski J.P., 2018. Phenomenological Load on Model Parameters Can Lead to False Biological Conclusions. Mol Biol Evol. 35(6):1473-1488.)
Review of major inference challenges under codon models:
(Jones C.T., Susko E., Bielawski J.P., 2019. Looking for Darwin in genomic sequences: validity and success depends on the relationship between model and data. In Evolutionary Genomics: Statistical and Computational Methods. Maria Anisimova (ed.) 2nd edition, Humana Press.)
Positive selection, purifying selection, shifting balance & fitness landscapes:
(Jones, C., Youssef, N., Susko, E. and Bielawski, J., 2017. Shifting balance on a static mutation-selection landscape: a novel scenario of positive selection. Molecular Biology and Evolution, 34(2):391-407.)
Improved inference of site-specific positive selection under a generalized parametric codon model when there are multi-nucleotide mutations and multiple nonsynonymous rates:
(Dunn KA, Kenney T, Gu H, Bielawski JP. Improved inference of site-specific positive selection under a generalized parametric codon model when there are multinucleotide mutations and multiple nonsynonymous rates. BMC Evol Biol. 2019 Jan 14;19(1):22.)
ModL: restoring regularity when testing for positive selection:
(Mingrone J, Susko E, Bielawski JP. ModL: exploring and restoring regularity when testing for positive selection. Bioinformatics. 2019 Aug 1;35(15):2545-2554.)
Smoothed Bootstrap Aggregation (SBA) for assessing and correcting parameter estimate uncertainty in codon models:
(Mingrone, J., Susko, E. and Bielawski, J., 2016. Smoothed bootstrap aggregation for assessing selection pressure at amino acid sites. Molecular Biology and Evolution, 33(11):2976-2989.)
Protocols, experimental design, and best practices for inference under complex codon models:
(Bielawski, J.P., Baker, J.L. and Mingrone, J., 2016. Inference of episodic changes in natural selection acting on protein coding sequences via CODEML. Current Protocols in Bioinformatics, pp.6-15.)
Alternative Lab (advanced topics)
If you have some experience with codon models and want to try a tutorial on more advanced material, use the link below to download an archive containing a completely different set of PAML activities. This tutorial focuses on detecting episodic protein evolution via Branch-Site Model A. The tutorial also includes activities on (i) detecting MLE instabilities, (ii) carrying out robustness analyses, and (iii) using smoothed bootstrap aggregation (SBA). The protocols for each activity are presented in Current Protocols in Bioinformatics UNIT 6.15. The included PDF file for UNIT 6.15 also presents recommendations for “best practices” when carrying out a large-scale evolutionary survey for episodic adaptive evolution using PAML. The files required for this “alternative lab” are available via a Bitbucket repository. The repository link is given below.
Advanced PAML demo: Bitbucket repository
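For readers new to the branch-site test, here is a minimal sketch of the likelihood ratio test (LRT) arithmetic that typically follows a pair of codeml runs. It assumes the usual setup in which the null is Model A with the foreground ω fixed at 1 and the alternative is Model A with that ω estimated, and it reports p-values under both a χ² distribution with 1 degree of freedom and the 50:50 mixture of a point mass at zero and χ² with 1 degree of freedom. The log-likelihood values and function names are hypothetical, for illustration only; they do not come from the tutorial files.

```python
import math


def chi2_sf_1df(x: float) -> float:
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    if x <= 0.0:
        return 1.0
    # For 1 df, P(X > x) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(x / 2.0))


def branch_site_lrt(lnl_null: float, lnl_alt: float) -> dict:
    """Compute the LRT statistic and two p-values from two log-likelihoods."""
    # Twice the log-likelihood difference; floored at zero because a negative
    # value usually signals an optimisation problem, not evidence for the null.
    stat = max(0.0, 2.0 * (lnl_alt - lnl_null))
    p_chi2 = chi2_sf_1df(stat)
    # Under the 50:50 mixture, the tail probability is halved when stat > 0.
    p_mixture = 0.5 * p_chi2 if stat > 0.0 else 1.0
    return {"statistic": stat, "p_chi2_1df": p_chi2, "p_mixture": p_mixture}


# Hypothetical log-likelihoods, as might be read from two codeml output files.
result = branch_site_lrt(lnl_null=-2345.67, lnl_alt=-2341.12)
print(result)
```

A negative raw statistic (alternative fitting worse than the null) is one simple symptom of the MLE instabilities mentioned above, and is usually a cue to rerun codeml from different starting values.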
codeml_SBA: a program that implements Smoothed Bootstrap Aggregation (SBA) for assessing selection pressure at amino acid sites. https://github.com/Jehops/codeml_sba
DendroCypher: a program to assist labelling the branches of a Newick-formatted tree-file for use with a “branch model” or a “branch-site codon model”: Bitbucket repository
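To illustrate what branch labelling involves, the sketch below marks chosen tips of a Newick tree as foreground branches using the "#1" label that PAML reads for branch and branch-site models. It is not DendroCypher and does not reproduce its interface; the function name, the example tree, and the taxon names are hypothetical. It handles only tip (terminal) branches in a topology-only tree; labelling internal branches or whole clades, and trees with branch lengths, is where a dedicated tool or the PAML documentation should be consulted.

```python
import re
from typing import Iterable


def label_foreground(newick: str, taxa: Iterable[str], label: str = "#1") -> str:
    """Append `label` to each tip named in `taxa` in a topology-only Newick string."""
    wanted = set(taxa)
    found = set()

    def tag(match: re.Match) -> str:
        spaces, name = match.group(1), match.group(2)
        if name in wanted:
            found.add(name)
            return f"{spaces}{name} {label}"
        return match.group(0)

    # Tip names directly follow "(" or ","; internal node labels follow ")"
    # and are therefore left untouched.
    labelled = re.sub(r"(?<=[(,])(\s*)([^\s(),:;#]+)", tag, newick)

    missing = wanted - found
    if missing:
        raise ValueError(f"taxa not found in tree: {sorted(missing)}")
    return labelled


# Hypothetical four-taxon tree with one foreground lineage.
tree = "((human,chimp),(mouse,rat));"
print(label_foreground(tree, ["human"]))
```

Raising an error for a missing taxon is a deliberate choice: a silently unlabelled tree would make codeml fit a model with no foreground branch.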
“Best practices” in large-scale evolutionary surveys
Large-scale evolutionary surveys are now commonplace. But with the use of progressively more complex codon models, these surveys are fraught with perils. Complex models are more prone to statistical problems such as MLE irregularities, and some can be quite sensitive to model misspecification. UNIT 6.15 (see above) provides some recommended “best practices” for a 2-phase approach to quality control and robustness in evolutionary surveys. We have applied these to a large-scale survey for functional divergence in nuclear receptors during hominid evolution, and we used experimental approaches to investigate hypotheses about the role of a particular nuclear receptor (NR2C1) as a key modulator of developmental pluripotentiality during hominid evolution. The paper that illustrates the power of such an evolutionary survey, and the importance of an experimental design having explicit protocols for “best practices”, is given below.
Example large-scale survey: PDF
Alternative software for codon models in the ML framework
HyPhy: comparative sequence analysis using stochastic evolutionary models; http://www.hyphy.org/
DataMonkey: a server that supports a variety of HyPhy tools at no cost; http://www.datamonkey.org/
COLD: a program that implements a general-purpose parametric (GPP) codon model. Most codon models are special cases of the GPP codon model. https://github.com/tjk23/COLD
codeml_SBA: a program that implements Smoothed Bootstrap Aggregation (SBA) for assessing selection pressure at amino acid sites. https://github.com/Jehops/codeml_sba
ModL: a program for restoring regularity when testing for positive selection using codon models. https://github.com/jehops/codeml_modl