Predictive Analyst/Machine Learning Expert
I work on probabilistic modeling, multilevel models, time series, mixture models, Gaussian processes, and machine learning.
MSc Statistics, University of Nairobi (2015)
BSc Applied Statistics, Maseno University (2011)
Jan 2017 – Jan 2018, Predictive Modeler at M-KOPA Solar
=>Sentiment Analysis using Naive Bayes and Multinomial Mixtures: I profiled and extracted features that may predispose a user to react a certain way to M-KOPA's services. A dictionary of consumers' reaction statements describing degrees of satisfaction was used to learn and predict consumers' attitudes towards M-KOPA products and services.
=>Implemented a Hidden Markov Model to learn the progression of consumers' credit rating transitions given their payment history for costly M-KOPA energy products, and also the exposure of certain industries to market risks.
=>Trend analysis to account for temporally varying sources of price volatility. A structural time series model with Kalman filter computations (to regularize between sources of variation and account for uncertainty) was implemented to forecast price movements given contemporaneous variables.
=>Time series forecasting for very large datasets using LSTM networks, implemented in Python's Keras library with a TensorFlow backend. Anomaly detection using data mining techniques to detect and flag outliers in the inventory-change datasets for further investigation.
May 2015 – Jan 2017, Statistician at KEMRI-Wellcome Trust Research Program
=>Feature Selection & Model Validation: Determined the relevance of variables to a regression problem in multidimensional datasets based on their BIC/AIC scores, and also using sparse learning models.
=>Predicted a continuous surface of malaria parasite rate and classified it into WHO-designated endemicity classes across and within administrative boundaries with stable transmission. Spatial heterogeneity of malaria prevalence was assessed using Gaussian Process Regression, and classification into endemicity levels was done with classification and regression trees.
=>Small Area Estimation of malaria indicators derived from country-specific MIS datasets. Areal spatial dependency for these indicators was modeled as a GMRF, and parameters were estimated with both approximate Bayes (INLA) and full Bayes (HMC in Stan).
=>Used causal inference techniques to assess the effects of contemporaneous socio-economic interventions on malnutrition risk indicators.
=>I implemented corrections for measurement and sampling biases in parasite rate data, embedded in our spatial regression models. Post-stratification was done using the Horvitz-Thompson estimator, while the age-structured transmission inherent in malaria prevalence rates was adjusted using a catalytic/4PL curve.
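As a rough illustration of the Horvitz-Thompson estimator named above, the sketch below weights each sampled observation by the inverse of its inclusion probability. It is a minimal Python sketch, not the code used in the project; the function names and the toy numbers are assumptions made up for illustration.

```python
def horvitz_thompson_total(values, inclusion_probs):
    """Estimate a population total as the sum of y_i / pi_i over the sample."""
    if len(values) != len(inclusion_probs):
        raise ValueError("values and inclusion_probs must have the same length")
    total = 0.0
    for y, pi in zip(values, inclusion_probs):
        if not 0.0 < pi <= 1.0:
            raise ValueError("inclusion probabilities must lie in (0, 1]")
        total += y / pi  # each unit stands in for 1/pi population units
    return total


def horvitz_thompson_mean(values, inclusion_probs, population_size):
    """Estimate a population mean (e.g. a prevalence) from the HT total."""
    return horvitz_thompson_total(values, inclusion_probs) / population_size


# Toy data (hypothetical): 1 = parasite-positive, 0 = negative.
positives = [1, 0, 1, 1]
probs = [0.5, 0.25, 0.5, 0.25]
estimated_total = horvitz_thompson_total(positives, probs)
estimated_prevalence = horvitz_thompson_mean(positives, probs, population_size=12)
```

Units sampled with low probability receive large weights, which is how the estimator compensates for unequal sampling.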
Mar 2014 – Jan 2015, Research Associate (Data Analyst) at Innovations for Poverty Action
=>Performed sample-weighting procedures. Tested survey designs for effective sample sizes, power analysis, post-stratification and calibration to get bias-corrected estimates of predicted vs. observed values.
=>Developed R routines to detect and flag outliers, and to impute missing values.
=>Implemented variable selection and dimension-reduction routines to flag the most relevant predictors for regression models, and to derive metrics that can be scored as single variables. PCA and GLM methods were used in this process.
=>Data Management Roles: programming survey questionnaires into ODK/SurveyCTO mobile data collection apps; ensuring the validity and security of data keyed into the forms; writing data cleaning scripts in Stata; turning survey instruments into MySQL database schemas; writing SQL data retrieval and collation queries and transferring the results into different formats for analysis; data wrangling.
=>Ran regression models to assess the effects of nutritional supplements on children's immunity with regard to specific malnutrition disorders.
Sep 2012 – Feb 2014, Assistant Statistician at World Agroforestry Center (ICRAF)
=>Data Management Tasks: Programming survey instruments into CSPro and ODK data collection applications.
=>Wrote data cleaning routines in Stata and R.
=>Biodiversity Analysis: Computed tree diversity and richness indicators from agroforestry ecosystem data collected from farming districts in African and South Asian countries. Diversity indicators such as the Renyi, Shannon and Simpson indices were computed using the BiodiversityR package.
=>Multivariate analysis to reduce the dimensionality of data, using PCA and cluster analysis.
=>Multi-level regression analysis to assess cross-zonal variations in tree diversity and richness under the influence of socio-economic and environmental factors. Zone-level and household-level covariates were incorporated in these models.
=>Performed exploratory analysis to assess the spatial distribution of landscape features in agro-ecological zones. Spatial correlation over continuously indexed locations was computed using variogram methods, and kriging (interpolation) was done based on the resulting estimates.
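For readers unfamiliar with the Shannon and Simpson indices mentioned in the biodiversity work above, the following is a minimal Python sketch of how they can be computed from per-species counts. It is an illustration only: the original analysis used the BiodiversityR package in R, the function names here are hypothetical, and the Simpson index is shown in its Gini-Simpson form (1 minus the sum of squared proportions), which is one of several conventions.

```python
import math


def _proportions(counts):
    """Convert per-species counts into proportions, dropping absent species."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must contain at least one individual")
    return [c / total for c in counts if c > 0]


def shannon_index(counts):
    """Shannon diversity H = -sum(p_i * ln(p_i))."""
    return -sum(p * math.log(p) for p in _proportions(counts))


def simpson_index(counts):
    """Gini-Simpson diversity 1 - sum(p_i ** 2)."""
    return 1.0 - sum(p * p for p in _proportions(counts))


# Toy plot (hypothetical): four tree species, ten individuals each.
plot_counts = [10, 10, 10, 10]
h = shannon_index(plot_counts)  # equals ln(4) for four equally common species
d = simpson_index(plot_counts)  # equals 1 - 4 * (1/4)**2 = 0.75
```

Both indices increase as individuals are spread more evenly across more species, and both are zero for a plot with a single species.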
- • Statistical Programming: R, Stan Probabilistic Programming, Python (NumPy, pandas, scikit-learn, Keras on TensorFlow)
- • Machine Learning (Sequence Models, LSTM Networks, Regression & Classification Metrics, Spatial Models)
- • Statistical Models: State Space Models by Kalman Filtering, Linear/Non-linear Time Series
- • Code Version Control in Git
- • Programming Mobile Data Collection Systems in ODK
- • Linux, Slurm Server Management
- • Database Development (MariaDB, MySQL)
- • Deep Learning/Neural Networks
- • ArcGIS
- • C++
- • Financial Risk Analysis