Data scientist / bioinformatician / statistician
Career Status: Actively looking
Willing to relocate:
Willingness to travel: Not very willing to travel
Computational biologist with over 10 years of experience. Comprehensive skills ranging from wet-lab experiments to large-scale genomic data mining, in particular analysis of next-generation sequencing data. Proficient in R and Perl programming. Strong in multitasking and self-learning.
• Cornell University, Ithaca, NY
Ph.D. in Plant Cellular and Molecular Biology (minors in Statistics & Genetics)
• Peking University, Beijing, P. R. China
B.S. in Plant Molecular and Developmental Biology
2011-present Associate Research Scientist (promoted from Postdoctoral Associate in 2013) in Cell Biology, Yale University School of Medicine, New Haven, CT
• Discovered novel transcriptional and epigenetic features of nuclear lamina-associated genes in a number of mouse and human cell lines by cross-analyzing multiple large-scale sequence datasets.
• Improved the microarray-based DamID assay by integrating next-generation sequencing and published a video method article.
• Supervised a laboratory technician and a graduate student in DamID-seq experiments.
2006-2009 Postdoctoral Associate in Plant Breeding and Genetics, Cornell University, Ithaca, NY
• Significantly reduced cost, time and manpower in multiple scientific research and plant breeding projects by developing a large set of PCR-based universal markers from multiple Unigene databases and a fully sequenced genome.
• Took the primary responsibility for multiple comparative genetic mapping projects in the plant family Solanaceae (tomato, potato, eggplant, pepper and tobacco) and delivered peer-reviewed publications.
• Collaborated with or served as consultant for research groups and companies on applying universal markers.
- Programming: R (RStudio, Bioconductor, ggplot2, shiny), Perl (BioPerl), SQL, MATLAB
- Computers & software: Linux (terminal commands and shell scripting, VMware, Emacs, OpenOffice), Windows (Microsoft Office, VMware, Notepad++, Adobe Illustrator, Adobe Photoshop, EndNote, FileZilla), Mac
- Next-generation sequencing data analyses: R (Bioconductor), Bowtie/Bowtie2, SAMtools, BEDtools, TopHat, Cufflinks, SICER, qeseq, HOMER, ChromHMM, IGV, dREG
- Statistical analyses: Principal Component Analysis, K-means clustering
- Gene enrichment analyses: DAVID, WebGestalt (WEB-based GEne SeT AnaLysis Toolkit)
- Public databases: Gene Expression Omnibus (retrieve and submit data), UCSC Genome Browser (retrieve data and graphs, create custom tracks and process data with UCSC Genome Browser software), NIH Roadmap Epigenomics Mapping project (retrieve data)
- Molecular biology: PCR, qPCR, gel electrophoresis, DNA purification, library preparation for next-generation sequencing
- Multitasking, self-learning
- English (Fluent), Mandarin Chinese