Freelance (lead) data scientist
Freelance (lead) data scientist with extensive experience getting data products into production at large companies.
If your data science team struggles to get their products into production, I can help out.
At CZ I was part of the Architecture Board as a Data Science Architect, to ensure that data science was taken into account early in the software development life cycle.
Cloudera Developer training for Spark & Hadoop 2019
Cloudera Administrator training for Apache Hadoop 2018
Next-Generation Sequencing PhD course 2011 (NBIC The Netherlands)
PhD Course Pattern Recognition 2009 (NBIC The Netherlands)
PhD Course Biostatistics 2008 (Eindhoven University of Technology)
BioMedical Engineering 2000‐2006 (Eindhoven University of Technology)
Freelancer @ 42Analytics (Loon op Zand, The Netherlands) November 2019 – now
As the most experienced data scientist in the team, I have helped other team members. This ranged from reviewing code and designing the structure
of the code to educating team members on the proper way of training models.
• Helped build a data ingestion pipeline for agricultural IoT devices and a Shiny dashboard that displays the data. Helped them move
from the proof-of-concept phase to the minimum viable product phase, in order to have a commercially viable product at the end of 2020. Explained
the value of automated testing, especially with regard to that commercial launch.
Senior Data Scientist @ CZ (Tilburg, The Netherlands) October 2017 – November 2019
As a Senior Data Scientist in the data science team of CZ, I increased the maturity of the team. I introduced the practice of having a Docker
container as a deliverable. Docker containers were new for CZ, so this also included convincing the IT department of the value of containers.
• To ensure models can be updated easily and with confidence that they work correctly, I also introduced continuous integration and automated
testing, concepts that are new for many data scientists. Since the IT department was not creating Docker containers yet, this also meant that the
data science team wrote APIs to interface with the models, yet another element that is new for most data scientists.
• I also worked to get a small (5-node) cluster for the data science team, which allows the team to tackle larger problems and to streamline the data
analysis process. This also included a lot of work that is typically handled by a data engineer. One example that I am proud of is that
we used Sqoop to import data from the SAS data warehouse into Parquet files on HDFS, which required a wrapper around the SAS JDBC drivers and
a patch to Sqoop in order to make it work. We then automated the ingest process using Apache Airflow.
• To ensure data science was taken into account early in the process, I also became the Data Science Domain Architect on the Architecture Board. In
this position I was consulted on how to make R/Python available to CZ.
Data Scientist @ Dr. Reddy’s (Leiden, The Netherlands) November 2016 – October 2017
As a Data Scientist for Dr. Reddy’s Leiden, I improved their data analysis capacity by implementing their workflow in R. In this capacity I
developed two R packages that implemented specific workflows and that could be used by R novices.
• Worked on an R/Shiny application that performs scale-up calculations from small-scale freeze dryers to manufacturing-scale freeze dryers.
• Taught how to use R for data analysis to lab technicians, making them more efficient by using the aforementioned R packages.
Statistical Consultant @ Open Analytics (Antwerp, Belgium) September 2010 – November 2016
As a statistical consultant (data science consultant) I have worked for different clients, mostly in the pharmaceutical and biotechnology sector. In
this position my work ranged from analyzing different types of experiments (e.g. microarrays and protein assays) and automating statistical analyses
using web applications I developed, to managing a small group of developers to implement solutions for clients.
• An analysis consisted of both the statistical analysis and visualizations to allow for easy and correct interpretation of the results by the scientists.
The interpretation of the results was done in conjunction with the scientists. For these analyses I used R and Bioconductor extensively.
• For other clients I have developed custom R packages wrapping around C++ code, to ease integration of these functions in their analysis work
• Landed a large contract to implement a database to store different types of data. This database enables researchers to search for mutational
status, gene expression etc. across different studies. To enable easier interaction with the PostgreSQL database, I developed
Rango, an Object Relational Mapper (ORM) for R, which I demonstrated at the useR! 2015 conference.
• My responsibility grew over the years. In the end I managed the development of multiple projects, worked with the clients to define
solutions for their problems, and managed a small team of developers to implement our proposed solutions.
Unfinished PhD on the subject of Network Theory in Biology @ Eindhoven University of Technology February 2006 – September 2010
During my research I developed algorithms to reverse engineer gene regulation networks from experimental data; in each experiment, one
gene was turned off. For this research I also took the Pattern Recognition and Biostatistics courses. During the development many different
statistical features and machine learning techniques were applied and evaluated.
- R 14 years of experience
- Python 15 years of experience
- Machine learning 11 years of experience
- Docker 7 years of experience
- R package development
- Ubuntu (Linux) 15 years of experience
- Bringing data science teams to the next level
- Dutch, English (Fluent), German (Basic)