Main Page

Welcome to the iPlant Collaborative's Community Wiki!

Here you can start discussions, make comments, and ask questions of the community, by adding new articles or topics on this page or by starting a discussion at the "discussion" tab above. --Rajorgensen 17:51, 5 March 2008 (MST)

Note: Login and password for this community wiki are currently independent of the iPlant web portal. You do not have to have an account to edit this Wiki, but in order to ensure that you're human and not a spambot, we will ask you to solve a simple sum or perform some other task for each edit you make. Registering for an account (which is very simple to do) allows you to bypass this requirement. If you're interested in making an account here, please click on the "Login/Create Account" link (top right corner) to create an account on our Public Wiki.

Documents for Public Discussion

There are no documents for public discussion at this time.

iPlant Collaborative Team Pages

Grand Challenges

Grand Challenge Workshops Recommended by iPC Board of Directors

What Grand Challenges are most compelling, tractable, and require computational approaches and cyberinfrastructure development?

Briefly summarize below a grand challenge that interests you, and then post the full description at Community Portal. Creating links here to your description at the community portal will help readers find your contributions there. For assistance in using the wiki, email us --Rajorgensen 10:29, 13 March 2008 (MDT)

Remember, the community decides what challenges the Collaborative should work on, not the project participants. So self-forming teams and leadership will need to emerge from the community before any suggestions here will influence the Collaborative's cyberinfrastructure development activities. We will organize logistics and provide funding for community workshops on grand challenges, as prioritized by the external Board of Directors. See the website for details. --Rich J 08:15, 31 March 2008 (MDT)


PROPOSED GRAND CHALLENGE WORKSHOP FOR COMMUNITY COMMENT - Posted by David E Salt June 6, 2008

The organizing team will consider comments from the community via this wiki page up until June 13th. At that time the proposal will be finalized for a June 16th submission. Please add your comments at the end of the document.

GRAND CHALLENGE WORKSHOP: MECHANISTIC BASIS OF PLANT ADAPTATION

ORGANIZERS
Justin Borevitz (University of Chicago, Department of Ecology and Evolution, Tel 773-702-5948, Email borevitz@uchicago.edu). Research & Outreach: How do organisms shape their environment, and how does environmental change shape the resident organisms? The Borevitz laboratory is pursuing the adaptive genetic events enabling ecological succession, in annuals (Arabidopsis thaliana/lyrata) and perennials (Aquilegia, prairie grasses). We work in the Indiana Dunes National Lakeshore as a model ecosystem. Outreach includes teaching summer courses for high school bio-teachers and an undergraduate PrairieEcosystems.org course that works with community and school groups.
Edward Buckler (USDA-ARS Research Geneticist, Cornell University, Tel 607-255-4520, Email esb33@cornell.edu). Research & Outreach: Interested in the basis of complex traits with an emphasis on dissecting abiotic stress tolerance in maize. Our research is involved in quantitative and statistical genetics, genomics, and field genetics. We also have projects ongoing in wild and domesticated grape and switchgrass. Outreach conducted through the development of informal science exhibits.
Susan McCouch (Cornell University, Department of Plant Breeding & Genetics, Tel 607-255-0420, Email srm4@cornell.edu). Research & Outreach: Interested in the evolution of population sub-structure in rice and the genetic basis of ecological adaptation, grain quality and agronomic characteristics as the basis for plant improvement. Research activities include molecular genetics and genomics, population development, molecular and whole-plant phenotyping, with extensive international collaboration. Outreach activities include a field-based short course on rice research and production in the Philippines, middle school curriculum development, high school biology research and teaching and undergraduate summer internships.
David E Salt (Purdue University, Horticulture Department, Tel 765-496-2112, Email dsalt@purdue.edu). Research & Outreach: Interested in the genetic basis controlling the ionome and its relationship to adaptation in Arabidopsis thaliana, Thlaspi and Astragalus species, and translation of this information into rice for improved rice grain mineral nutrient quality. Outreach conducted through the development of informal science exhibits.
John Willis (Duke University, Biology Department, Tel. 919-660-7340, email jwillis@duke.edu). Research & Outreach: Interested in the genomic basis of standing complex trait variation, adaptation, and speciation in Mimulus species. A current focus is on the evolution of traits related to drought and edaphic adaptation. As part of a large collaborative effort, we are developing genomic resources for Mimulus, including a whole genome sequence for M. guttatus, that are greatly aiding in the mechanistic analysis of adaptation and speciation.



Designated contact person for the group: David E Salt (Tel 765-496-2112, Email dsalt@purdue.edu).

 

GRAND CHALLENGE WORKSHOP DESCRIPTION

Statement of the scientific problem being addressed.
There is clear and broad agreement within the scientific community that one of the most pressing grand challenges in biology is to understand how plants adapt to their complex and often unpredictable biotic and abiotic environments. Genetic variation is the foundation for understanding plant adaptation, as genotypes may differ in their response to immediate environmental challenges due to differences in cellular physiology and development. Genetically distinct populations, subspecies, and/or species have adapted to unique or extreme environments via evolution by both natural and artificial selection. Establishing what combination of existing genome-wide diversity and/or new mutations underlie plant adaptation at a mechanistic, molecular level will provide breakthroughs in our understanding of plant physiology, ecology and evolution and will allow us to explore community and ecosystem responses to global climate change. Importantly, such fundamental discoveries will greatly advance our ability to conserve and subsequently to exploit genetic diversity to produce new crops and plant communities with greater resilience to emerging changes in pathogen, water, temperature, salinity and mineral nutrient stress, and higher level environmental impacts.

 

Why the problem requires cutting-edge computer science, bioinformatics, computational biology, statistical or modeling tools, rather than off the shelf solutions.
Understanding how genome-wide variation relates to plant adaptation will require major advances in bioinformatics, computational biology, statistical analysis and modeling. These tools will need to be able to handle the deluge of genomic sequence data from hundreds, thousands, or even millions of plants that is about to result from advances in next-generation genome sequencing, and to relate it to massively detailed phenomic (diverse morphological, developmental, cellular, physiological, metabolomic, transcriptomic, and proteomic phenotypes scored in native and experimental settings), geographical, climatological, and geological databases constructed for those individuals and populations. Extensive genomic and phenomic data have already been collected for several mapping populations and accessions derived from natural populations in well-established model species like Arabidopsis, and major crops like maize, rice, wheat and barley across a variety of environmental conditions. Data for these plant systems are increasing exponentially. In addition, there are several emerging plant systems, such as Mimulus, Aquilegia, Populus, and Arabidopsis relatives, that will be especially useful for studying mechanisms of plant adaptation and evolution.

 

Genomic and phenotypic data from these emerging ecological models, as well as that from less studied crops, are rapidly accumulating and need to be pipelined. The use of common quality control standards and data structures will facilitate comparisons among species and environments. Parallel experiments across multiple species make it possible to start to develop models about how populations and species evolve and change, and how communities function, and will rapidly move us away from the outmoded concept of model organisms.


Certainly some cyberinfrastructure tools have already been developed that can handle population genomic, phenomic, and environmental data. Others exist to statistically analyze the genetic architecture of complex traits across environments. However, these tools are not yet integrated into publicly available archives that make it possible to coordinate and compare the sheer volume and variety of the data anticipated in the near future.

 

A more fundamental problem to achieving the types of advances required is that most of the existing bioinformatics, computational, and statistical approaches for dealing with data were developed for specific applications for specific species, and therefore have limited use for other, more general situations. In terms of genomics, most analysis tools were developed with Sanger sequencing of a "reference" human genome in mind. We need next generation pipelines to deal with massive amounts of data from new sequencing technologies in plants with species level variation that is 20-50 times higher than in humans. We also need ways of representing genome diversity that are not anchored in the misleading concept of a single reference genome. The current algorithms and approaches need to be remodeled and openly packaged. In terms of phenomics, the plant community can collect data on millions of individuals and robust pipelines must be created to deal with large sample sizes, complex image processing, biochemical and metabolite profiles and other types of data. This processing needs to be integrated with geographic information systems (GIS) and be capable of tracking complex interactions between genotype, phenotype, and the environment over time.

 

Perhaps some of the most challenging problems lie in the arena of computational and statistical approaches to studying how genetic variation relates to plant adaptation. Traditional QTL mapping and association mapping methods can be improved beyond the analysis of simplified breeding designs to include realistic or empirical species’ population genetic structures, as well as interactions among multiple genes and alleles which underlie the genetic basis of complex traits. Implementation of improved models can be evaluated on known true positives across independent populations and experiments. It is becoming clear that within a species different sets of genes contribute to trait variation in different environments or in different genetic backgrounds. It is also becoming clear that many traits relevant to adaptation are themselves heterogeneous and complex, in the sense that a given trait, say yield response of a crop to drought stress, has a different underlying allelic, genetic, and thus physiological basis at different stages in the life of a plant and in different environments. And yet currently our statistical methods are designed to focus on a single genetic model that provides the “best” or maximum likelihood explanation of the data. In the future it would be highly desirable to adjust our statistical methods to look for multiple genetic models in the context of epistasis and environmental conditionality. Finally, given the diversity of species and environments under study in the plant community, model comparison methods are needed.

 

In addition to data analysis, management tools will be needed to allow for efficient sharing of seeds and samples across laboratories, along with communication tools to facilitate collaboration and outreach. Incorporation of advanced computer science concepts such as context-aware database searches, schema matching, crowd sourcing and automatic data trawling would also enhance the functionality of such an integrated system. Technologies are available that address individual components of such a system. Currently, however, there is no off-the-shelf system available that addresses all of the needs described above.

 

A general description of data sets currently available or that will be available to the community during the next 6-24 months.

Non-crop plant full genome sequences are available for Arabidopsis thaliana (Col-0), Selaginella moellendorffii, Populus trichocarpa and Physcomitrella patens. Genome sequences will also shortly be available through the JGI community sequencing project for Arabidopsis lyrata, Capsella rubella, Mimulus guttatus, Aquilegia caerulea, Thellungiella halophila and Panicum virgatum (Switchgrass). There are also extensive genotyped collections of A. thaliana, including array-based whole-genome variation scans on 19 accessions (Clark et al., 2007) and scoring of 500 divergent accessions using an Affymetrix genotyping array containing 250,000 SNPs (http://naturalvariation.org/haplotype). Such genotyping will soon be extended to approximately 1,300 accessions (http://walnut.usc.edu/2010/SNPs). There are also large phenotypic datasets for A. thaliana including transcripts (http://www.weigelworld.org/resources/microarray/AtGenExpress/), metabolites (), proteins (Chevalier et al., 2004), interactome () and the ionome, which includes the shoot and seed concentration of P, Ca, K, Mg, Cu, Fe, Zn, Mn, Co, Ni, B, Se, S, Mo, Na, As and Cd in over 100 accessions and 4 RIL populations curated in a publicly accessible relational database at www.purdue.edu/dp/ionomics (Baxter et al., 2007). RIL populations and accessions have also been analyzed for flowering time, fruit size, dormancy, seed sugar composition and storability, hypocotyl length, phytate content (http://www.dpw.wau.nl/natural/), other growth-related traits (http://www.mpiz-koeln.mpg.de/english/research/koornneefGroup/index.html), glucosinolates (Kliebenstein et al., 2001), drought (Bouchabke et al., 2008), self incompatibility (Nasrallah et al., 2004), seed oil composition (O’Neill et al., 2003), photomorphogenesis (Pepper et al., 2002) and freezing tolerance (Hannah et al., 2006). Natural variation in methylation has also been surveyed in 96 accessions (Vaughn et al., 2007). For Mimulus species ………more?.
For Aquilegia reference inbred lines, advanced recombinant lines, and hundreds of population samples are available. Two species BAC libraries are fingerprinted and end sequenced. 87,000 ESTs (revealing 16,000 SNPs) were sequenced to which microarrays have been designed and floral whorl specific genes identified.


Crop plant full genome sequence is currently available for rice and the reference sequence for maize will be complete by the end of 2008. In rice, a diverse panel of 20 landrace varieties of O. sativa was recently re-sequenced by Perlegen Sciences as part of the OryzaSNP project (McNally et al., 2006) and 8 additional lines are currently being sequenced using next-generation sequencing technology to provide the basis for SNP chip development. For association mapping, a panel of 1000 purified lines has been genotyped and phenotyped for 40 traits (flowering time, plant height, tiller number, tiller angle, seed color, seed shape, seed weight, grain quality, developmental traits, etc.) in replicated trials over two years (www.ricediversity.org). Additional phenotyping is underway to evaluate biotic and abiotic stresses, including drought, submergence, salt, heat, cold, acid soils, mineral deficiencies and toxicities, etc. A set of 2,000 recombinant inbred lines derived from 8 different crosses are under development at the International Rice Research Institute (IRRI) and an additional 1,500 lines are being purified for association mapping. All materials will be phenotyped for diverse traits and genotyped using Affymetrix arrays in collaboration with international partners.


In maize, numerous phenotypic and genotypic diversity projects are underway. 5000 diverse recombinant inbred lines (RILs) derived from 25 crosses have been genotyped and the lines were released to the public in the spring of 2008. They have already been evaluated in up to 10 environments for 30 traits by multiple groups (flowering, developmental traits, metabolite composition, pathogen-induced secondary metabolites, plant composition, grain quality, etc.). The phenotypic data for these traits will be published over the coming years. Additionally, several groups are using next-generation sequencing technologies to sequence key maize lines. With the publication of the maize genome, there will be a couple of projects publishing millions of SNPs across key germplasm, which can then be integrated with the vast RIL resources. Overall, there are hundreds of researchers looking at aspects of maize adaptation, and if given a place to store and share data, numerous other datasets would become available and remarkable advances could be made.
 

COMMENTS

I'm a little late on this, but if I'm navigating the site correctly I haven't seen any other responses . . . I really like the Grand Challenge Workshop proposal suggested by Salt et al. I acknowledge that some of my interest is that I'm involved in relevant research on the genetics of resource allocation variation in Arabidopsis lyrata, and we are doing both empirical studies and modeling in this area. However, the real strength of this proposal is that it would integrate resources from both important crop systems and non-crop model plants, which sometimes (though certainly not always) seem to operate in separate realms. This integration would leverage a lot more information for understanding plant functions relevant for developing crops for a 21st century world with increasing food demands, changing climates, and reduced availability of enhancements such as irrigation and petrochemical fertilizers, and understanding the potential trade-offs involved. Thus, it's an area of key importance for society in general.

My one suggestion is that the framework be flexible enough to accommodate information from "second-tier" crop and model systems as well.  Genetic research in several other grain crops (e.g. barley) is relatively poor in genomic resources, but seems to be strongly geared toward gaining insights on processes like drought resistance and tolerance.  Integrating these data and concepts with information from more resource-rich models may provide new ways to look at and analyze genotype-phenotype data from model systems to address additional phenomena, and point to where more research is needed.

Dave Remington (dlreming@uncg.edu)




--Vaughn 09:35, 9 March 2008 (MDT) Added a link to Simon Levin's PLoS Biology Editorial Fundamental Questions in Biology

This encompasses both plant biology and computing, but I put it here since I think the computing is the main part of the challenge: How do genomes, RNA, proteins and metabolites interact within living plants, and how can we model this in a useful and predictive way?--Mhudson 06:28, 9 March 2008 (MDT) Within this I can think of yet more specific questions which are still very broad: How do complex, often noisy regulatory networks of gene expression lead to reproducible developmental systems?--Mhudson 10:05, 7 March 2008 (MST) How are such systems perturbed in an adaptive way by biotic and abiotic stress?--Mhudson 10:05, 7 March 2008 (MST) How can such systems be engineered to optimize crop performance to produce food, feed and / or fuel in a changing global climate?--Mhudson 10:05, 7 March 2008 (MST)

Across species, can we develop a robust approach to predict ecotypic differences in phenology as expressed in natural (field) environments?--Jeff White 07:18, 11 March 2008 (MDT) see entry under community portal

Systematics is currently informed by phylogenies (lineages of trait changes, molecular and morphological), but may be bettered by inferences of lineages of living things (as taxa). How might this be accomplished? Richard Zander 09:57, 14 March 2008 (CDT)

Synthetic Biology. Wiki is new to me; I apologize if I do this wrong. I suggest we consider SYNTHETIC BIOLOGY. Synthetic Biology is a broad approach that uses rational design and detailed modeling of biological systems for a specific purpose. It differs from genetic engineering in that it relies heavily on the collection, application and modeling of quantitative data. By understanding and modeling the widely varying kinetics that underlie the expression of plant genes and plant systems, we can begin to understand how genes function in networks to bring about phenotypes and produce synthetic networks for specific purposes (e.g., engineer metabolic pathways, understand the plastic nature of development). June Medford

How do ecological and evolutionary pressures establish diversity in metabolic networks in Viridiplantae? The rationale for posing this question is based on the fact that the chemical and metabolic capacity of Viridiplantae is hugely diverse. Because Viridiplantae is the primary vehicle by which energy and chemical diversity enters the biotic world from the abiotic, this diversity is the basis for the chemical complexity of the biosphere. Moreover, this chemical diversity is a major determinant of the interaction of plants with complex abiotic and biotic environments and yet is established within the constraint that life forms need to maintain chemical and physical order (thermodynamically decrease entropy), and replicate (increasing thermodynamic entropy of the ecosystem), while at a micro- spatial and temporal scale maintain homeostasis. This grand challenge question would generate computational approaches to query expanding sets of genomics and functional genomics datasets, in concert with information concerning ecological diversity of plant chemical phenotypes and their adaptive significance, to discover patterns that would provide predictive capabilities of where, when and how (across phylogenetic lineages and ecological conditions) metabolic networks are expressed and regulated within Viridiplantae. Basil J. Nikolau and John Nason

Recovering Approximate Gene Networks for Plant Phenology from Multi-Sourced Data. Plant modelers have spent over 40 years developing models that reproduce a wide variety of physiological processes. An established result is that accurate simulation of phenological events (e.g., leaf and floral initiation, grain fill, physiological maturity, etc.) is a prerequisite for success in predicting any other traits of interest. Therefore, in concert with other broad advances in understanding plants at the gene network level, it is highly desirable to extend knowledge of phenological control. The full understanding of any network requires identification of the genes that comprise it, their epistatic relationships, environmental interactions, and links to ultimate phenotypes. For the phenology of floral initiation, perhaps the most-studied genetic system in plants, the known network contains over 100 genes. However, simple, but highly predictive empirical models that relate floral timing to environmental variables date to at least 1735. Mathematical analyses and network models indicate that large amounts of floral timing variation can be accounted for by networks with small numbers of gene-like nodes that can be associated with actual genes and that capture key features of the real network. These constructs can be termed “approximate genetic networks”. Such models can now explain the predictive skill of classical phenology approaches to floral timing. Historically, the elucidation of the floral initiation network via molecular methods has been a very time consuming activity targeting discovery of the full network – something that may not be necessary for many practical applications. 
The aims of a grand challenge project could be (1) to conduct a modeling requirements analysis for one or more concrete tasks (breeding being one example) and (2) to develop computational methods to combine existing genetic (including molecular) and phenotype data (from controlled environments and field studies) to extract approximations meeting those requirements for gene networks controlling plant phenological development. Stephen M Welch
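
As an illustration only, the kind of simple empirical phenology model mentioned above can be sketched in a few lines of Python. The sketch assumes a thermal-time (growing degree day) formulation, which the proposal does not itself specify. The function names, the base temperature and the thermal-time threshold are all hypothetical placeholders, not values from any published model.

```python
from typing import Iterable, Optional, Tuple


def daily_degree_days(t_min: float, t_max: float, t_base: float) -> float:
    """Thermal time for one day: mean temperature above a base, floored at zero."""
    return max(0.0, (t_min + t_max) / 2.0 - t_base)


def predict_flowering_day(
    daily_min_max: Iterable[Tuple[float, float]],
    t_base: float = 5.0,       # hypothetical base temperature (deg C)
    threshold: float = 100.0,  # hypothetical thermal-time requirement (degree-days)
) -> Optional[int]:
    """Return the 1-based day on which accumulated thermal time first
    reaches the threshold, or None if it is never reached."""
    total = 0.0
    for day, (t_min, t_max) in enumerate(daily_min_max, start=1):
        total += daily_degree_days(t_min, t_max, t_base)
        if total >= threshold:
            return day
    return None


if __name__ == "__main__":
    # Ten identical days with a mean of 15 deg C contribute 10 degree-days each.
    weather = [(10.0, 20.0)] * 10
    print(predict_flowering_day(weather))
```

An approximate gene network of the sort proposed here would aim to replace the two free parameters with quantities tied to genotype, so that the same environmental record yields different predicted dates for different lines.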

Facilitating the transition from qualitative to quantitative integrative plant models using bioNumbers – the database of useful biological numbers. It is currently very frustrating and time consuming to find concrete numbers in biology such as sizes of cells and organelles, absolute numbers of proteins, fluxes of metabolites or of elements in the biosphere, etc. BioNumbers (http://bionumbers.hms.harvard.edu/) is the database of useful biological numbers and aims to enable you to find in one minute any common biological number that can be important for your research. It contains full references, comments and related numbers that are useful. It is an open, wiki-like, community-based effort. Although general in purpose, it is currently mostly populated with microbial data. I think it will be worthwhile as part of the iPlant project to develop a similar effort focused on plants at the molecular, physiological and ecological levels that will help facilitate the vision of future iPlant models. It may also have educational relevance as a way to engage and train students in quantitative reasoning in biology. It would be great to get comments on the concept and on how it can be a part of grand challenges in iPlant. [Ron Milo]


Community contributed Education Grand Challenge Wiki pages

Describe a particular grand challenge in education in computational thinking in biology that interests you. Discussions of your grand challenge will appear under community portal.

Grand Challenge vs. What do we do NOW? (M.Bruck)

There is a slight conflict between waiting for GCs to be defined and the idea of "What can we do right NOW?" Many ideas were floated for immediate activity. Most of the ideas are based on existing educational ideas or websites. As an alternative to building new items, it might be valuable to build a database-driven website that allows community members to submit URLs for BIO and CS educational websites and to categorize and review the websites for educational style, content and perceived or measured value. Basically, a reviewed, curated catalog of BIO and CS educational tools (interactive, info content, curricular, etc.).


A similar website already exists. Merlot is a peer-reviewed archive of education materials for all levels (K - graduate). As an example of what is available you might see Protein Explorer, free software for visualizing the 3-D structure of proteins, DNA and RNA (based on the Chime plugin). This is great stuff, but does not address the fundamental question of the need for an Educational Grand Challenge. I believe that there is such a need, and will try and write more about it in the near future. For now, I only want to raise the possibility that formulating such a Grand Challenge might be something to consider. - Bruce Kirchoff

Discussion of data, tools, perspectives, etc.

Issues for Discussion relating to data, tools, perspectives, etc., necessary for making grand challenges tractable.

The community through iPC's Board of Directors may deem Grand Challenges tractable only if essential data and tools already exist, since the PSCIC program cannot fund data collection or long term tool/algorithm/model development. So the community may want to discuss here important data and/or tool projects separate from iPlant that could make grand challenges tractable and attractive enough to be chosen by the Board as high priority targets for cyberinfrastructure development.

Since every member has their own username/password, why not have a page for each registered member and make it like Facebook? I see this kind of system on ASPB. Then each member can update his profile, and be in touch and follow the community very actively. Ratnakar Vallabhaneni.

Foundational Cyberinfrastructure

Foundational cyberinfrastructure to be tackled by iPlant in the first year that will support (m)any grand challenges and the community

At the iPlant kickoff, suggestions for important generic infrastructure were requested. This infrastructure could be fast-tracked for implementation by iPlant in the first year prior to formation of grand challenge teams. It should potentially facilitate many or all grand challenge projects and have immediate benefit to the community. Comments, ideas and proposals are welcome; please add them below. --Heikoschoof 05:30, 9 April 2008 (MDT)

  • Collaborative (social) tools
    • Social web portal for community cyberinfrastructure: MyToolSpace --Heikoschoof 05:44, 9 April 2008 (MDT)
  • SecurityModel
    • An effective shared SecurityModel will be required for sharing of information resources. The early adoption of specific security tools for Web sites (and services) will allow collaborators to prepare their own information server and client tools in advance of specific grand challenge activities. --GregRiccardi 9 April 2008.