Groundbreaking research published by BIO5 scientists and their collaborators


PubMed Articles

Gene expression-based prostate cancer gene signatures of poor prognosis are hampered by lack of gene feature reproducibility and a lack of understandability of their function. Molecular pathway-level mechanisms are intrinsically more stable and more robust than an individual gene. The Functional Analysis of Individual Microarray Expression (FAIME) we developed allows distinctive sample-level pathway measurements with utility for correlation with continuous phenotypes (e.g. survival). Further, we and others have previously demonstrated that pathway-level classifiers can be as accurate as gene-level classifiers using curated genesets that may implicitly comprise ascertainment biases (e.g. KEGG, GO). Here, we hypothesized that transformation of individual prostate cancer patient gene expression to pathway-level mechanisms derived from automated high throughput analyses of genomic datasets may also permit personalized pathway analysis and improve prognosis of recurrent disease.

Via FAIME, three independent prostate gene expression arrays with both normal and tumor samples were transformed into two distinct types of molecular pathway mechanisms: (i) the curated Gene Ontology (GO) and (ii) dynamic expression activity networks of cancer (Cancer Modules). FAIME-derived mechanisms for tumorigenesis were then identified and compared. Curated GO and computationally generated "Cancer Module" mechanisms overlap significantly, are enriched for known oncogenic deregulations, and highlight potential areas of investigation. We further show in two independent datasets that these pathway-level tumorigenesis mechanisms can identify men who are more likely to develop recurrent prostate cancer (log-rank p = 0.019).

Curation-free biomodules classification derived from congruent gene expression activation breaks from the paradigm of recapitulating the known curated pathway mechanism universe.

Despite thousands of reported studies unveiling gene-level signatures for complex diseases, few of these techniques work at the single-sample level with explicit underpinning of biological mechanisms. This presents both a critical dilemma in the field of personalized medicine and a plethora of opportunities for analysis of RNA-seq data. In this study, we hypothesize that the "Functional Analysis of Individual Microarray Expression" (FAIME) method we developed could be smoothly extended to RNA-seq data and unveil intrinsic underlying mechanism signatures across different scales of biological data for the same complex disease. Using publicly available RNA-seq data for gastric cancer, we confirmed the effectiveness of this method (i) to translate each sample transcriptome to pathway-scale scores, (ii) to predict deregulated pathways in gastric cancer against gold standards (FDR < 5%, Precision = 75%, Recall = 92%), and (iii) to predict phenotypes in an independent dataset and expression platform (RNA-seq vs microarrays, Fisher Exact Test p < 10^-6). Measuring at a single-sample level, FAIME could differentiate cancer samples from normal ones; furthermore, it achieved comparable performance in identifying differentially expressed pathways as compared to state-of-the-art cross-sample methods. These results motivate future work on mechanism-level biomarker discovery predictive of diagnoses, treatment, and therapy.

While genome-wide association studies (GWAS) of complex traits have revealed thousands of reproducible genetic associations to date, these loci collectively confer very little of the heritability of their respective diseases and, in general, have contributed little to our understanding of the underlying disease biology. Physical protein interactions have been utilized to increase our understanding of human Mendelian disease loci but have yet to be fully exploited for complex traits.

We hypothesized that protein interaction modeling of GWAS findings could highlight important disease-associated loci and unveil the role of their network topology in the genetic architecture of diseases with complex inheritance.

Network modeling of proteins associated with the intragenic single nucleotide polymorphisms of the National Human Genome Research Institute catalog of complex trait GWAS revealed that complex trait associated loci are more likely to be hub and bottleneck genes in available, albeit incomplete, networks (OR = 1.59, Fisher's exact test p < 2.24 × 10^-12). Network modeling also prioritized novel type 2 diabetes (T2D) genetic variations from the Finland-USA Investigation of Non-Insulin-Dependent Diabetes Mellitus Genetics and the Wellcome Trust GWAS data, and demonstrated the enrichment of hubs and bottlenecks in prioritized T2D GWAS genes. The potential biological relevance of the T2D hub and bottleneck genes was revealed by their increased number of first degree protein interactions with known T2D genes according to several independent sources (p < 0.01, probability of being first interactors of known T2D genes).

Virtually all common diseases are complex human traits, and thus the topological centrality in protein networks of complex trait genes has implications in genetics, personal genomics, and therapy.
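
As an illustration of the kind of enrichment statistic quoted in this abstract (an odds ratio with a Fisher's exact test p-value), the sketch below computes both from a 2×2 table. The counts in the usage note are hypothetical and are not taken from the study, the function names are invented for illustration, and the abstract does not say whether the reported p-value was one-sided or two-sided; a one-sided test for enrichment is assumed here.

```python
from math import comb


def odds_ratio(a, b, c, d):
    """Sample odds ratio for the 2x2 table [[a, b], [c, d]].

    Illustrative layout (an assumption, not the study's data):
    rows = trait-associated gene / other gene,
    columns = hub-or-bottleneck / neither.
    """
    return (a * d) / (b * c)


def fisher_one_sided(a, b, c, d):
    """One-sided Fisher's exact p-value, P(X >= a), with all margins fixed.

    X follows a hypergeometric distribution: the count in the top-left
    cell when col1 items are drawn from n, of which row1 are 'successes'.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    upper = min(row1, col1)
    tail = sum(comb(row1, k) * comb(n - row1, col1 - k)
               for k in range(a, upper + 1))
    return tail / comb(n, col1)
```

For a toy table such as `[[3, 1], [1, 3]]`, `odds_ratio(3, 1, 1, 3)` gives 9.0 and `fisher_one_sided(3, 1, 1, 3)` gives 17/70, which shows how a large odds ratio can still be non-significant in a small sample.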

The strict tropism of many pathogens for man hampers the development of animal models that recapitulate important microbe-host interactions. We developed a rhesus macaque model for studying Neisseria-host interactions using Neisseria species indigenous to the animal. We report that Neisseria are common inhabitants of the rhesus macaque. Neisseria isolated from the rhesus macaque recolonize animals after laboratory passage, persist in the animals for at least 72 d, and are transmitted between animals. Neisseria are naturally competent and acquire genetic markers from each other in vivo, in the absence of selection, within 44 d after colonization. Neisseria macacae encodes orthologs of known or presumed virulence factors of human-adapted Neisseria, as well as current or candidate vaccine antigens. We conclude that the rhesus macaque model will allow studies of the molecular mechanisms of Neisseria colonization, transmission, persistence, and horizontal gene transfer. The model can potentially be developed further for preclinical testing of vaccine candidates.

The management of epilepsy in children is particularly challenging when seizures are resistant to antiepileptic medications, or undergo many changes in seizure type over time, or have comorbid cognitive, behavioral, or motor deficits. Despite efforts to classify such epilepsies based on clinical and electroencephalographic criteria, many children never receive a definitive etiologic diagnosis. Whole exome sequencing (WES) is proving to be a highly effective method for identifying de novo variants that cause neurologic disorders, especially those associated with abnormal brain development. Herein we explore the utility of WES for identifying candidate causal de novo variants in a cohort of children with heterogeneous sporadic epilepsies without etiologic diagnoses.

We performed WES (mean coverage approximately 40×) on 10 trios comprised of unaffected parents and a child with sporadic epilepsy characterized by difficult-to-control seizures and some combination of developmental delay, epileptic encephalopathy, autistic features, cognitive impairment, or motor deficits. Sequence processing and variant calling were performed using standard bioinformatics tools. A custom filtering system was used to prioritize de novo variants of possible functional significance for validation by Sanger sequencing.

In 9 of 10 probands, we identified one or more de novo variants predicted to alter protein function, for a total of 15. Four probands had de novo mutations in genes previously shown to harbor heterozygous mutations in patients with severe, early onset epilepsies (two in SCN1A, and one each in CDKL5 and EEF1A2). In three children, the de novo variants were in genes with functional roles that are plausibly relevant to epilepsy (KCNH5, CLCN4, and ARHGEF15). The variant in KCNH5 alters one of the highly conserved arginine residues of the voltage sensor of the encoded voltage-gated potassium channel. In vitro analyses using cell-based assays revealed that the CLCN4 mutation greatly impaired ion transport by the ClC-4 2Cl⁻/H⁺ exchanger and that the mutation in ARHGEF15 reduced GEF exchange activity of the gene product, Ephexin5, by about 50%. Of interest, these seven probands all presented with seizures within the first 6 months of life, and six of these have intractable seizures.

The finding that 7 of 10 children carried de novo mutations in genes of known or plausible clinical significance to neuronal excitability suggests that WES will be of use for the molecular genetic diagnosis of sporadic epilepsies in children, especially when seizures are of early onset and difficult to control.

The actin-bundling protein fascin is a key mediator of tumor invasion and metastasis and its activity drives filopodia formation, cell-shape changes and cell migration. Small-molecule inhibitors of fascin block tumor metastasis in animal models. Conversely, fascin deficiency might underlie the pathogenesis of some developmental brain disorders. To identify fascin-pathway modulators we devised a cell-based assay for fascin function and used it in a bidirectional drug screen. The screen utilized cultured fascin-deficient mutant Drosophila neurons, whose neurite arbors manifest the 'filagree' phenotype. Taking a repurposing approach, we screened a library of 1040 known compounds, many of them FDA-approved drugs, for filagree modifiers. Based on scaffold distribution, molecular-fingerprint similarities, and chemical-space distribution, this library has high structural diversity, supporting its utility as a screening tool. We identified 34 fascin-pathway blockers (with potential anti-metastasis activity) and 48 fascin-pathway enhancers (with potential cognitive-enhancer activity). The structural diversity of the active compounds suggests multiple molecular targets. Comparisons of active and inactive compounds provided preliminary structure-activity relationship information. The screen also revealed diverse neurotoxic effects of other drugs, notably the 'beads-on-a-string' defect, which is induced solely by statins. Statin-induced neurotoxicity is enhanced by fascin deficiency. In summary, we provide evidence that primary neuron culture using a genetic model organism can be valuable for early-stage drug discovery and developmental neurotoxicity testing. Furthermore, we propose that, given an appropriate assay for target-pathway function, bidirectional screening for brain-development disorders and invasive cancers represents an efficient, multipurpose strategy for drug discovery.

Despite remarkable advances in basic biomedical science that have led to improved patient care, there is a wide and persistent gap in the abilities of researchers and clinicians to understand and appreciate each other. In this Editorial, the authors, a scientist and a clinician, discuss the rift between practitioners of laboratory research and clinical medicine. Using their first-hand experience and numerous interviews throughout the United States, they explore the causes of this 'cultural divide'. Members of both professions use advanced problem-solving skills and typically embark on their career paths with a deeply felt sense of purpose. Nonetheless, differences in classroom education, professional training environments, reward mechanisms and sources of drive contribute to obstacles that inhibit communication, mutual respect and productive collaboration. More than a sociological curiosity, the cultural divide is a significant barrier to the bench-to-bedside goals of translational medicine. Understanding its roots is the first step towards bridging the gap.

An important statistical objective in environmental risk analysis is estimation of minimum exposure levels, called benchmark doses (BMDs), that induce a pre-specified benchmark response in a dose-response experiment. In such settings, representations of the risk are traditionally based on a parametric dose-response model. It is a well-known concern, however, that if the chosen parametric form is misspecified, inaccurate and possibly unsafe low-dose inferences can result. We apply a nonparametric approach for calculating benchmark doses, based on an isotonic regression method for dose-response estimation with quantal-response data (Bhattacharya and Kong, 2007). We determine the large-sample properties of the estimator, develop bootstrap-based confidence limits on the BMDs, and explore the confidence limits' small-sample properties via a short simulation study. An example from cancer risk assessment illustrates the calculations.
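
The nonparametric idea described above can be sketched in a few lines: fit a nondecreasing dose-response curve to the observed proportions with the pool-adjacent-violators algorithm, convert it to extra risk, and read off the dose at which extra risk first reaches the BMR. This is a minimal sketch under stated assumptions, not the estimator of Bhattacharya and Kong (2007): the linear interpolation between dose levels, the extra-risk definition (R(d) - R(0)) / (1 - R(0)), and the function names are all illustrative choices, and the bootstrap confidence limits are not shown.

```python
def pava(values, weights):
    """Weighted pool-adjacent-violators fit: nondecreasing in list order."""
    blocks = []  # each block holds [weighted mean, total weight, point count]
    for value, weight in zip(values, weights):
        blocks.append([value, weight, 1])
        # Merge backwards while the last two blocks violate monotonicity.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, c2 = blocks.pop()
            m1, w1, c1 = blocks.pop()
            total = w1 + w2
            blocks.append([(m1 * w1 + m2 * w2) / total, total, c1 + c2])
    fitted = []
    for mean, _, count in blocks:
        fitted.extend([mean] * count)
    return fitted


def isotonic_bmd(doses, responders, totals, bmr=0.10):
    """Dose at which the isotonic extra-risk estimate first reaches bmr.

    doses must be sorted ascending with the control group first.
    Returns None if the BMR is not reached within the tested doses.
    Assumes the fitted control risk is below 1.
    """
    proportions = [y / n for y, n in zip(responders, totals)]
    risk = pava(proportions, totals)
    background = risk[0]
    extra = [(r - background) / (1 - background) for r in risk]
    for i in range(1, len(doses)):
        if extra[i] >= bmr:
            # Linear interpolation between adjacent dose levels (assumption).
            fraction = (bmr - extra[i - 1]) / (extra[i] - extra[i - 1])
            return doses[i - 1] + fraction * (doses[i] - doses[i - 1])
    return None
```

With made-up data `isotonic_bmd([0, 1, 2, 4], [0, 3, 2, 10], [20, 20, 20, 20])`, the proportions 0.15 and 0.10 at doses 1 and 2 are pooled to 0.125, and the 10% benchmark dose is interpolated between doses 0 and 1.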

Benchmark analysis is a widely used tool in biomedical and environmental risk assessment. Therein, estimation of minimum exposure levels, called benchmark doses (BMDs), that induce a prespecified benchmark response (BMR) is well understood for the case of an adverse response to a single stimulus. For cases where two agents are studied in tandem, however, the benchmark approach is far less developed. This paper demonstrates how the benchmark modeling paradigm can be expanded from the single-agent setting to joint-action, two-agent studies. Focus is on continuous response outcomes. Extending the single-exposure setting, representations of risk are based on a joint-action dose-response model involving both agents. Based on such a model, the concept of a benchmark profile, a two-dimensional analog of the single-dose BMD at which both agents achieve the specified BMR, is defined for use in quantitative risk characterization and assessment.

We study the popular benchmark dose (BMD) approach for estimation of low exposure levels in toxicological risk assessment, focusing on dose-response experiments with quantal data. In such settings, representations of the risk are traditionally based on a specified, parametric, dose-response model. It is a well-known concern, however, that uncertainty can exist in specification and selection of the model. If the chosen parametric form is in fact misspecified, this can lead to inaccurate, and possibly unsafe, low-dose inferences. We study the effects of model selection and possible misspecification on the BMD, on its corresponding lower confidence limit (BMDL), and on the associated extra risks achieved at these values, via large-scale Monte Carlo simulation. It is seen that an uncomfortably high percentage of instances can occur where the true extra risk at the BMDL under a misspecified or incorrectly selected model can surpass the target BMR, exposing potential dangers of traditional strategies for model selection when calculating BMDs and BMDLs.

Estimation of benchmark doses (BMDs) in quantitative risk assessment traditionally is based upon parametric dose-response modeling. It is a well-known concern, however, that if the chosen parametric model is uncertain and/or misspecified, inaccurate and possibly unsafe low-dose inferences can result. We describe a nonparametric approach for estimating BMDs with quantal-response data based on an isotonic regression method, and also study use of corresponding, nonparametric, bootstrap-based confidence limits for the BMD. We explore the confidence limits' small-sample properties via a simulation study, and illustrate the calculations with an example from cancer risk assessment. It is seen that this nonparametric approach can provide a useful alternative for BMD estimation when faced with the problem of parametric model uncertainty.

Benchmark analysis is a widely used tool in public health risk analysis. Therein, estimation of minimum exposure levels, called Benchmark Doses (BMDs), that induce a prespecified Benchmark Response (BMR) is well understood for the case of an adverse response to a single stimulus. For cases where two agents are studied in tandem, however, the benchmark approach is far less developed. This article demonstrates how the benchmark modeling paradigm can be expanded from the single-dose setting to joint-action, two-agent studies. Focus is on response outcomes expressed as proportions. Extending the single-exposure setting, representations of risk are based on a joint-action dose-response model involving both agents. Based on such a model, the concept of a benchmark profile (BMP) - a two-dimensional analog of the single-dose BMD at which both agents achieve the specified BMR - is defined for use in quantitative risk characterization and assessment. The resulting, joint, low-dose guidelines can improve public health planning and risk regulation when dealing with low-level exposures to combinations of hazardous agents.

Propiconazole (PPZ) is a conazole fungicide that is not mutagenic, clastogenic, or DNA damaging in standard in vitro and in vivo genetic toxicity tests for gene mutations, chromosome aberrations, DNA damage, and cell transformation. However, it was demonstrated to be a male mouse liver carcinogen when administered in food for 24 months only at a concentration of 2,500 ppm that exceeded the maximum tolerated dose based on increased mortality, decreased body weight gain, and the presence of liver necrosis. PPZ was subsequently tested for mutagenicity in the Big Blue® transgenic mouse assay at the 2,500 ppm dose, and the result was reported as positive by Ross et al. ([2009]: Mutagenesis 24:149-152). Subsets of the mutants from the control and PPZ-exposed groups were sequenced to determine the mutation spectra and a multivariate clustering analysis method purportedly substantiated the increase in mutant frequency with PPZ (Ross and Leavitt. [2010]: Mutagenesis 25:231-234). However, as reported here, the results of the analysis of the mutation spectra using a conventional method indicated no treatment-related differences in the spectra. In this article, we re-examine the Big Blue® mouse findings with PPZ and conclude that the compound does not act as a mutagen in vivo.

The combination of information from diverse sources is a common task encountered in computational statistics. A popular label for analyses involving the combination of results from independent studies is meta-analysis. The goal of the methodology is to bring together results of different studies, re-analyze the disparate results within the context of their common endpoints, synthesize where possible into a single summary endpoint, increase the sensitivity of the analysis to detect the presence of adverse effects, and provide a quantitative analysis of the phenomenon of interest based on the combined data. This entry discusses some basic methods in meta-analytic calculations, and includes commentary on how to combine or average results from multiple models applied to the same set of data.
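
One basic meta-analytic calculation is inverse-variance pooling of independent estimates of a common effect. The entry does not say which methods it covers, so the sketch below is offered only as a representative example; the function name and inputs are illustrative.

```python
import math


def fixed_effect_pool(estimates, std_errors):
    """Inverse-variance (fixed-effect) pooled estimate and its standard error.

    Each study's estimate is weighted by 1 / SE^2, so more precise
    studies contribute more to the combined summary.
    """
    weights = [1.0 / se ** 2 for se in std_errors]
    total_weight = sum(weights)
    pooled = sum(w * e for w, e in zip(weights, estimates)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    return pooled, pooled_se
```

For two hypothetical studies with estimates 1.0 and 3.0 and equal standard errors of 1.0, the pooled estimate is 2.0 with standard error sqrt(0.5), smaller than either study's own standard error. This is the gain in sensitivity that combining studies is meant to provide.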

A simultaneous confidence band provides useful information on the plausible range of the unknown regression model, and different confidence bands can often be constructed for the same regression model. For a simple regression line, it is proposed in Liu and Hayter (2007) to use the area of the confidence set that corresponds to a confidence band as an optimality criterion in comparison of confidence bands; the smaller is the area of the confidence set, the better is the corresponding confidence band. This minimum area confidence set (MACS) criterion can clearly be generalized to the minimum volume confidence set (MVCS) criterion in study of confidence bands for a multiple linear regression model. In this paper the hyperbolic and constant width confidence bands for a multiple linear regression model over a particular ellipsoidal region of the predictor variables are compared under the MVCS criterion. It is observed that whether one band is better than the other depends on the magnitude of one particular angle that determines the size of the predictor variable region. When the angle and so the size of the predictor variable region is small, the constant width band is better than the hyperbolic band but only marginally. When the angle and so the size of the predictor variable region is large the hyperbolic band can be substantially better than the constant width band.

A primary objective in quantitative risk assessment is the characterization of risk, which is defined to be the likelihood of an adverse effect caused by an environmental toxin or chemical agent. In modern risk-benchmark analysis, attention centers on the "benchmark dose" at which a fixed benchmark level of risk is achieved, with lower confidence limits on this dose being of primary interest. In practice, a range of benchmark risks may be under study, so that the individual lower confidence limits on benchmark dose must be corrected for simultaneity in order to maintain a specified overall level of confidence. For the case of quantal data, simultaneous methods have been constructed that appeal to the large sample normality of parameter estimates. The suitability of these methods for use with small sample sizes will be considered. A new bootstrap technique is proposed as an alternative to the large sample methodology. This technique is evaluated via a simulation study and examples from environmental toxicology.

In modern environmental risk analysis, inferences are often desired on those low dose levels at which a fixed benchmark risk is achieved. In this paper, we study the use of confidence limits on parameters from a simple one-stage model of risk historically popular in benchmark analysis with quantal data. Based on these confidence bounds, we present methods for deriving upper confidence limits on extra risk and lower bounds on the benchmark dose. The methods are seen to extend automatically to the case where simultaneous inferences are desired at multiple doses. Monte Carlo evaluations explore characteristics of the parameter estimates and the confidence limits under this setting.
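
The link between parameter confidence limits and benchmark-dose bounds can be written out as follows. This is a sketch that assumes the common one-stage form of the risk function; the abstract does not state the parameterization, and the symbols \beta_0, \beta_1, and \beta_1^{U} are introduced here for illustration.

```latex
% Assumed one-stage risk function, with \beta_0 \ge 0 and \beta_1 \ge 0:
R(d) = 1 - \exp(-\beta_0 - \beta_1 d)

% Extra risk removes the background response R(0):
R_E(d) = \frac{R(d) - R(0)}{1 - R(0)} = 1 - \exp(-\beta_1 d)

% Setting R_E(d) = \mathrm{BMR} and solving for d gives the benchmark dose:
\mathrm{BMD} = \frac{-\log(1 - \mathrm{BMR})}{\beta_1}

% Since R_E(d) is increasing in \beta_1 and the BMD is decreasing in \beta_1,
% an upper confidence limit \beta_1^{U} yields both bounds at once:
R_E^{U}(d) = 1 - \exp(-\beta_1^{U} d), \qquad
\mathrm{BMDL} = \frac{-\log(1 - \mathrm{BMR})}{\beta_1^{U}}
```

Because both bounds depend on the data only through \beta_1^{U}, the same limit applies at every dose, which is consistent with the statement that the methods extend automatically to simultaneous inferences at multiple doses.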

We study use of a Scheffé-style simultaneous confidence band as applied to low-dose risk estimation with quantal response data. We consider two formulations for the dose-response risk function, an Abbott-adjusted Weibull model and an Abbott-adjusted log-logistic model. Using the simultaneous construction, we derive methods for estimating upper confidence limits on predicted extra risk and, by inverting the upper bands on risk, lower bounds on the benchmark dose, or BMD, at which a specific level of 'benchmark risk' is attained. Monte Carlo evaluations explore the operating characteristics of the simultaneous limits.
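
For the two dose-response forms named in this abstract, the point-estimate BMD follows from inverting the extra-risk function. The sketch below shows only that inversion, not the Scheffé-style simultaneous bands. The parameterizations and the names `gamma`, `alpha`, `beta`, `b0`, and `b1` are assumptions made for illustration, since the abstract does not give them.

```python
import math


def weibull_risk(d, gamma, alpha, beta):
    """Abbott-adjusted Weibull: background gamma plus a Weibull dose effect."""
    return gamma + (1 - gamma) * (1 - math.exp(-beta * d ** alpha))


def loglogistic_risk(d, gamma, b0, b1):
    """Abbott-adjusted log-logistic; the risk at d = 0 is the background gamma."""
    if d == 0:
        return gamma
    return gamma + (1 - gamma) / (1 + math.exp(-(b0 + b1 * math.log(d))))


def extra_risk(risk_at_d, risk_at_zero):
    """Extra risk: added risk relative to the non-responding background."""
    return (risk_at_d - risk_at_zero) / (1 - risk_at_zero)


def weibull_bmd(bmr, alpha, beta):
    """Solve 1 - exp(-beta * d**alpha) = bmr for d."""
    return (-math.log(1 - bmr) / beta) ** (1 / alpha)


def loglogistic_bmd(bmr, b0, b1):
    """Solve 1 / (1 + exp(-(b0 + b1 * log d))) = bmr for d."""
    return math.exp((math.log(bmr / (1 - bmr)) - b0) / b1)
```

Under both assumed forms the Abbott adjustment cancels out of the extra risk, so the BMD depends only on the dose-effect parameters and not on `gamma`. Plugging a computed BMD back into the risk function and taking the extra risk recovers the BMR.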

The Social Vulnerability Index (SoVI), created by Cutter et al. (2003), examined the spatial patterns of social vulnerability to natural hazards at the county level in the United States in order to describe and understand the social burdens of risk. The purpose of this article is to examine the sensitivity of quantitative features underlying the SoVI approach to changes in its construction, the scale at which it is applied, the set of variables used, and to various geographic contexts. First, the SoVI was calculated for multiple aggregation levels in the State of South Carolina and with a subset of the original variables to determine the impact of scalar and variable changes on index construction. Second, to test the sensitivity of the algorithm to changes in construction, and to determine if that sensitivity was constant in various geographic contexts, census data were collected at a submetropolitan level for three study sites: Charleston, SC; Los Angeles, CA; and New Orleans, LA. Fifty-four unique variations of the SoVI were calculated for each study area and evaluated using factorial analysis. These results were then compared across study areas to evaluate the impact of changing geographic context. While decreases in the scale of aggregation were found to result in decreases in the variance explained by principal components analysis (PCA), and in increases in the variance of the resulting index values, the subjective interpretations yielded from the SoVI remained fairly stable. The algorithm's sensitivity to certain changes in index construction differed somewhat among the study areas. Understanding the impacts of changes in index construction and scale is crucial in increasing user confidence in metrics designed to represent the extremely complex phenomenon of social vulnerability.

Issues surrounding the wide spectrum of (perceived) risks and possible benefits associated with the rapid advance of modern nanotechnology are deliberated. These include the current realities of nanotechnological hazards, their impact vis-à-vis perceived nanotech-risks and perceived nanotech-benefits, and the consequent repercussions on the public and society. It is argued that both the risks and the benefits of nanoscientific advances must be properly communicated if the public is to support this emerging technology.