Statistician and R programmer

Availability:	Immediate

Summary

I am a statistician and R programmer with an interest in data science. I am looking for a home-based statistician or R programmer position.

Education

Ph.D. in Bioinformatics and Computational Biology, with a Ph.D. minor in Computer Science, Iowa State University (2008), Ames, IA

MS in Statistics, Iowa State University (2007), Ames, IA

Computer science coursework, Iowa State University (2002), Ames, IA

Work Experience

Statistician 11/2015-present

Sylvester Comprehensive Cancer Center

Biostatistics and Bioinformatics Core Miami, FL

Projects I am working on

Develop an R package for constructing enrichment networks that adjusts for bias due to the number of exons and/or splice junctions in gene set enrichment analysis of RNA-Seq data
Develop an R package for processing and analyzing 5' UTR, 3' UTR and downstream of gene (DoGs) sequencing data
Develop pipelines for processing RNA-Seq, ChIP-Seq and ATAC-Seq data
Develop a pipeline for processing, aligning, comparing and visualizing predicted protein structure models
Develop a method for identifying regulatory elements of genes using ChIP-Seq data
Develop a pipeline for identifying and annotating somatic mutations in whole-exome sequencing data

Statistician 9/2013-11/2015

Cornell University Ithaca, NY

Projects I worked on

Participate actively in software development and database administration for the Sol Genomics Network project (https://solgenomics.net)
Develop the yambase (https://yambase.org) and zeabase databases, and contribute to the development of the cassavabase database (https://www.cassavabase.org)
Develop parsers for raw phenotype, pedigree and Genotyping-By-Sequencing data from different breeding trials
Develop loaders for loading phenotype, pedigree and Genotyping-By-Sequencing data into the database
Develop an Identity-By-Descent (IBD) based General Combining Ability (GCA) model for genomic prediction
Help implement a modified augmented design and integrate it, along with other experimental designs, into a database server
Instruct biologists in using the R statistical language and applying statistical methods to data analysis
Process RNA-Seq data and perform de novo transcriptome assembly with Trinity

Statistician 3/2012-8/2012

Program of Biostatistics and Biomathematics

Fred Hutchinson Cancer Research Center Seattle, WA

Projects I worked on, providing statistical and informatics support for identifying biomarkers for early cancer detection

Combine machine learning methods with search algorithms to identify biomarkers for early cancer detection
Apply a regularized multivariate regression method to study the relationship between protein expression profiles and gene expression profiles
Apply Procrustes analysis to compare protein expression profiles with gene expression profiles
Combine principal component analysis with gene ontology and pathway information to identify the set of biomarkers related to cancer status
Perform data management and quality control for clinical data from studies of different cancer types

Statistician 12/2010-7/2011

AVEO Pharmaceuticals, Inc. Cambridge, MA

Projects I worked on, providing statistical and bioinformatics support for identifying biomarkers in high-dimensional data for anti-cancer drug discovery research

Develop Bayesian statistical methods to identify biomarkers and gene regulatory networks related to drug response in high-dimensional data
Apply penalized Cox regression models to identify biomarkers in high-dimensional data by integrating gene expression data with progression-free survival outcomes from phase I and phase II clinical trials of two drug candidates
Apply and implement a Procrustes analysis procedure to compare gene expression profiles between different data sets
Implement reference-sample-based batch effect adjustment for microarray expression data
Develop a pipeline bridging a public gene expression database with an in-house Postgres relational database
Provide statistical theory support for biostatistics and bioinformatics methods used in data analysis for anti-cancer drug discovery

Statistician 2/2010-12/2010

Department of Biostatistics and Computational Biology

Dana-Farber Cancer Institute

Department of Biostatistics

Harvard School of Public Health Boston, MA

Projects I worked on, developing statistical methods and informatics pipelines and constructing a database for SNP array and next-generation sequencing data in cancer research

Identify rare genetic variants using a Bayesian regression approach
Identify SNP markers for quantitative traits underlying WM blood cancer using a Bayesian regression method
Apply a Bayesian regression method to perform eQTL mapping using RNA-Seq data
Provide statistical and bioinformatics support for a family-based deep sequencing project related to WM blood cancer
Implement the DFCI-Genolytics project (a database project based on the LAMP stack: Linux, Apache HTTP Server, MySQL, PHP)

Statistician 9/2008-12/2009

Center for Integrated Animal Genomics and Department of Animal Science, Iowa State University Ames, IA

Projects I worked on, applying and developing statistical methods to identify SNP genetic markers, construct predictive models and build a database for managing different types of phenotype data in animal genomic selection

Moobase: a relational database for managing data in a genomic selection project on energy balance traits in dairy cattle
Rmoo: an R package for processing phenotype data in the same genomic selection project on energy balance traits in dairy cattle. I wrote a number of R functions for processing the raw data and imputed missing values for several traits (body weight, body condition score, feed intake, and milk protein, fat and lactose content) using natural cubic splines in a two-year longitudinal study

Identify the Markov boundary of SNP sets to investigate epistasis
Study SNP-SNP interactions using a filter-wrapper approach
Use high-density SNP markers to predict the genetic value of several traits with dimension reduction and machine learning methods, and compare the predictions with those from a Bayesian method

Statistician 6/2002-8/2008

Baker Center for Bioinformatics and Biological Statistics, Iowa State University Ames, IA

Projects I worked on, applying statistical methods and developing computational approaches for the prediction, modeling and molecular dynamics of protein structures, and building a web server for a distance-weighted elastic network model

Develop a web server for B-factor calculation using a distance-weighted elastic network model
Study the effects of different superposition methods on the correspondence between experimental conformational changes and the motions generated by the elastic network model
Apply the principal component shaving method to cluster protein structures
Develop a tool for visualizing protein structure data
Develop a novel knowledge-based side chain orientation potential for protein fold recognition

Skills

Solid knowledge of multivariate statistics, machine learning, Bayesian statistics, design of experiments, clinical statistics, survival analysis and statistical methods in bioinformatics
More than 10 years of experience in predictive modeling using logistic regression, naive Bayes classifiers, tree-based methods, neural networks and support vector machines
Extensive experience providing statistical and bioinformatics expertise for presentations, manuscripts and grant proposals
Fluent in Matlab, R/Bioconductor, SAS and S-PLUS
Experienced in relational database design (MySQL, PostgreSQL)
Many years of experience with C++
Experienced in using VB, Java, FORTRAN (LAPACK, BLZPACK), OpenGL, Ajax, jQuery and JavaScript
Skilled in bash shell scripting, Perl, HTML, PHP (LAMP), Python, Conda and Git version control
Experienced working on Windows, Linux/HPC and SGI systems
Many years of experience in applying and developing statistical methods for genomic prediction, genome-wide association studies and biomarker discovery
Many years of experience in developing scoring functions based on knowledge-based potentials for native-like protein structure discrimination
Familiar with BWA, STAR, SAMtools, Picard, GATK and Trinity for analyzing next-generation sequencing (NGS) data
Familiar with statistical genetics and bioinformatics tools such as rrBLUP, GenSel, Beagle FastIBD, PLINK, Haploview, SeqPup, Phylip, VMD and ProFit

Jobs for R-users A job board for people and companies looking to hire R users


Contact aiminy
