Data Scientist / Statistician
An experienced statistician and programmer with a particular talent for automating repeated analyses and code optimization. I’m passionate about data and uncovering the stories behind the numbers. I am a skilled programmer, in particular with R, which is my main statistical tool. I also have experience with Python, SQL, C/C++, SAS, and Julia. I’m willing to learn any tool that helps get the job done.
Master of Science, Statistics, May 2009
UNIVERSITY OF VIRGINIA, Charlottesville, VA
Bachelor of Science, Mechanical Engineering, August 1996
TEXAS A&M UNIVERSITY, College Station, TX
UNIVERSITY OF TEXAS SCHOOL OF PUBLIC HEALTH
Houston, TX 2010-Present
Performed statistical analysis, modeling, and bioinformatics in next-generation DNA sequence data, epidemiology, and clinical trials as an adjunct to various faculty research groups. Responsible for designing statistically valid research studies, applying advanced statistical methods to conduct analyses, and preparing reports.
- Researched and presented potential solutions to our data storage and data structure needs with a focus on reducing analysis time and memory utilization. This resulted in a more than 90% reduction in memory usage while simultaneously reducing the computation time by 75%.
- Performed statistical analysis and modeling of bioinformatics in next-generation DNA sequence data using a variety of statistical techniques (linear regression, multiple regression, logistic regression, survival models, mixed effects models, principal component analysis, permutation testing, etc.).
- Created a customizable base seqMeta pipeline for a cloud computing application hosted on DNAnexus.
- Maintainer of the R package seqMeta on the Comprehensive R Archive Network (CRAN). Moved the code base to a version control and issue tracking system (Git), and implemented an automated build system (Travis CI and AppVeyor) and a regression testing platform.
- Created an integrated R and Python program to translate data between the Variant Call Format (VCF) and the genotype matrix / SNP information format.
- Developed a quality control and automated reporting system for next-generation sequence data. This resulted in early detection of errors from the sequencing centers, and identification of outliers in the data.
- Collaborated with investigators in grant development including statistical design of studies, sample size requirements, power calculations, and statistical methodology. This included writing the bioinformatics and statistical methods portion of grant proposals.
- Mentored statistical analysts and students in the application of statistical methodologies, interpretation of results and programming techniques.
- Served as the statistical and computational representative on cross-functional teams for the strategic planning, development, and execution of new methodologies and algorithms.
Association of Low-Frequency and Rare Coding-Sequence Variants with Blood Lipids and Coronary Heart Disease in 56,000 Whites and Blacks, The American Journal of Human Genetics (2014)
Strategies to design and analyze targeted sequencing data: The Cohorts for Heart and Aging Research in Genomic Epidemiology (CHARGE) targeted sequencing study. Circulation: Cardiovascular Genetics (2014) 7(3):335–343
Associations between lipoprotein(a) levels and cardiovascular outcomes in black and white subjects: the Atherosclerosis Risk in Communities (ARIC) Study. Circulation. 2012;125:241–249
UNIVERSITY OF VIRGINIA
Charlottesville, VA 2007-2009
- Introduction to Statistical Analysis (STAT 212) – Introduction to the probability and statistical theory underlying the estimation of parameters and testing of statistical hypotheses, including those arising in the context of simple and multiple regression models.
- Statistical Laboratory (STAT 398/598) – This undergraduate/graduate course is part of the applied statistics curriculum. Topics include statistical software usage in data analysis and general programming methodologies. Primarily focused on R, S-Plus, and SAS.
- Applied Linear Models (STAT 512) – Topics include linear regression models, inference in regression analysis, model validation, selection of independent variables, multicollinearity, influential observations, autocorrelation in time series data, polynomial regression, and nonlinear regression.
- Sample Surveys (STAT 518) – Discussion of the main design and estimation techniques used in sample surveys: simple random sampling, stratification, cluster sampling, double sampling, post-stratification, and ratio estimation.
CAPITAL ONE FINANCIAL
Richmond, VA Summer Session 2008
- Modeled historical credit card transaction data for a real-time fraud detection system.
- Aggregated data from several Oracle databases into SAS for data cleaning, model fitting, model selection, and forecasting.
- Determined statistical methodology and performed analysis for detecting transaction fraud.
- Created automated reports of important historical factors which could be monitored by various business units.
- Wrote and modified SAS programs for the purpose of reporting, summarizing, analyzing, and validating financial data.
MEDICAL AUTOMATION SYSTEMS
Charlottesville, VA 2006-2008
Test Systems Software Engineer / Quality Analyst
- Architected and implemented a new software quality tracking system to ensure compliance with FDA standards.
- Developed standardized procedures and best practices for software testers to follow when generating test cases, test plans and test reports.
- Developed SQL queries to automate verification of medical software against the medical device specification.
- Created training material for new quality assurance system.
- Developed an Excel report that queried a SQL database to generate pre/post testing reports.
- Customized Mercury Software quality tracking system to ensure compliance with business processes and applicable government regulations.
UNIVERSITY OF VIRGINIA MEDICAL CENTER
Charlottesville, VA 2005-2006
- Administered Microsoft SQL Servers.
- Installed new servers and maintained existing servers.
- Project lead on database password security audit.
- Assisted in implementing a new patient monitoring and tracking system.
Houston, TX 1999-2005
Systems Software Engineer
- Performed automated testing and manual verification of bug fixes on new and existing HP ProLiant server ROMs.
- Designed, developed, and maintained over 40 test utilities and test tools for HP ProLiant system ROM (BIOS) testing.
- Developed and executed test plans, reported and tracked problems and coordinated problem resolution.
- Project lead on HP ProLiant ML330 G3, ML350 G4, and ML580 G3.
- Developed data protection and disaster recovery solutions with all Compaq StorageWorks equipment in a heterogeneous SAN environment.
- Researched and developed testing strategies and test plans for Enterprise Backup Solutions (EBS).
- Responsible for hiring, mentoring, assigning and reviewing work of interns.
- Trained “Must-Win” pre-sales engineering team and 3rd level support.
- Presented to over 1,000 CIOs and industry leaders at Compaq’s National Storage Days conference.
- Authored EBS with VERITAS Backup Exec Mixed Media Whitepaper, EBS with VERITAS Backup Exec User Guide, and EBS with VERITAS Backup Exec for Microsoft and Novell Clusters Tech Note.
ABB VETCO GRAY
Houston, TX 1997-1999
Mechanical Design Engineer
- Designed, tested, analyzed, and documented risers, riser handling tools, and guideline winch skid.
- Supplied shop and assembly drawings of various pressure-containing sub-sea equipment on Pro/E and AutoCAD systems.
- Supplied analysis and design documentation for multiple large-scale projects.
- Wrote design and test reports and purchase specifications.
STEWART & STEVENSON SERVICES
Houston, TX 1996-1997
Mechanical Product Engineer
- Designed risers, sub-sea blowout preventers, gate valves, actuators, and sub-sea Christmas trees.
- Supplied interface, bid, shop, and assembly drawings on a Pro/E system.
- Supplied technical analysis and design documentation for various certifying organizations.
- Statistical modeling
- linear regression
- survival analysis
- categorical data analysis
- multivariate analysis
- principal component analysis
- Monte Carlo simulation
- parallel computing
- high-performance computing (HPC)
- reproducible research