Data Scientist

Desired position type: Any
Location: Cambridge, England, United Kingdom

Contact santosh26a


I currently work as a researcher at Cambridge University in the area of big data analytics. I have over 6 years of work experience in various industries in the USA, India, France, and the UK.


Master in Systems Engineering, University of Arizona

Master of Technology, Indian Institute of Technology Kharagpur

Bachelor of Engineering, Amravati University


Employer: Computer Laboratory, University of Cambridge, UK
Job Title: Researcher
Duration: May 2015 – Present
Responsibilities:
  • Participating in the Digital Whistle Blower project, which aims to improve trust in governments and the efficiency of public spending through big data analysis. Digital Whistle Blower is a 3 million project funded by the European Union.
  • Responsible for collecting, cleaning, and validating public procurement data from 35 countries in Europe.
  • Validating public procurement data using semantic analysis in Python.
  • Developing a learning algorithm in Python to find structured patterns of tender labels in tender documents.
  • Conducting a research review on public procurement data mining.
  • Transforming unstructured data into structured data.


Employer: Arizona Department of Education, USA
Job Title: Data Architect
Duration: May 2014 – Apr 2015
Responsibilities:
  • Implemented a new state educational system data warehouse in SQL Server.
  • Assisted application development teams by creating effective database designs from business requirements using best-practice modeling techniques.
  • Developed ETL rules for the ETL developer for source-to-target data mapping.
  • Developed source and target data documentation.
  • Performed logical data modeling and data integration for post-secondary education data in Arizona.


Employer: CVS Caremark, USA
Job Title: Data Analytic Consultant
Duration: Sep 2013 – Apr 2014
Responsibilities:
  • Synthesized large amounts of relational data to optimize complex member communications products for key audiences.
  • Analyzed health care claim eligibility and performed medical claim analysis for a client company of CVS Caremark using SAS programming.
  • Extracted and transformed data from DBMSs such as Teradata and Oracle using SAS.
  • Developed statistical models such as ARIMA and regression in SAS.
  • Built structured SAS-MACRO and SAS-SQL code to synthesize large amounts of relational data.
  • Performed timely and accurate data quality assurance to avoid delays in business operations.
  • Delivered data for more than 150 drug claim analysis requests to several CVS Caremark clients before their deadlines.


Employer: University of Arizona, USA
Job Title: Researcher
Duration: Aug 2011 – May 2013
Responsibilities:
  • Discrete event simulation (Arena) modeling for the analysis of aircraft design change propagation (funded by The Boeing Company, USA).
  • Agent-based simulation modeling for the driver route choice problem (funded by Federal Department of Transportation, USA).
  • Agent-based simulation modeling for toll pricing of hazardous material transportation in Albany, NY (funded by NSF).
  • Multi-criteria decision making for modeling a dual-toll pricing policy for regulating hazardous material transportation (funded by NSF).


Employer: INRIA, France
Job Title: Visiting Research Engineer
Duration: Apr 2011 – Jul 2011
Responsibilities:
  • Developed computationally efficient machine learning and optimization techniques to solve a computationally complex project scheduling problem for large infrastructure projects.
  • Developed clustering and genetic algorithms in Matlab that solved an NP-hard problem near-optimally within a reasonable time frame.
  • Volunteered at international conferences organized by the French Institute for Research in Computer Science and Automation in Metz, France.


Employer: Procter & Gamble, India
Job Title: Operations Consultant
Duration: Jun 2010 – Mar 2011
Responsibilities:
  • Successfully headed a project to develop a data-driven IT application for logistics strategy.
  • Analyzed logistics data to devise a new logistics strategy, particularly for loading and unloading of capacitated trucks and for vehicle routing.
  • Developed a decision tree for forecasting the transportation budget and customer demand.
  • Guided a team of several programmers that successfully developed and implemented an optimization algorithm for a business application.
  • Extracted and transformed data in SQL Server using T-SQL and SQL Server Integration Services (SSIS).
  • Performed statistical analysis by developing several statistical models in SAS.
  • Developed a cost estimation algorithm in SAS to determine logistics cost.
  • Led a group of 5 people, including engineers and consultants, on a data analysis project.
  • Led the team that developed a graphical user interface (GUI) to solve the multi-item multi-vehicle problem in the transportation division of Procter & Gamble.
  • Worked with senior management and helped them allocate the transportation budget for a fiscal year.


Employer: IIT Kharagpur/NHAI, India
Job Title: Analytic Consultant
Duration: July 2008 – May 2010
Responsibilities:
  • Reported directly to the NHAI head and the main project coordinator for the traffic analytics project.
  • Used several data analytics tools, such as SQL, SAS, and Matlab, to analyze traffic jams in urban cities.
  • Extensively used SQL Server, MS Excel, MS Access, and SAS for data modeling, statistical modeling, and analysis.
  • Identified problems with the data and produced derived data sets, tables, listings, and figures.
  • Performed preliminary data integration and transformation using T-SQL queries and SSIS in SQL Server.
  • Ran the ETL process for annual traffic data using SQL Server, SSIS, and T-SQL.
  • Generated graphs and reports using SAS/GRAPH and SAS PROC REPORT.
  • Provided recommendations on traffic strategy to senior decision makers.


Employer: HDFC Bank, India
Job Title: Data Analyst
Duration: Aug 2007 – Jun 2008
Responsibilities:
  • Participated in meetings and calls with the client to understand requirements at both the functional and technical level.
  • Developed SQL Server Integration Services ETL packages to extract, cleanse, validate, and consolidate data from SharePoint lists and external database back ends.
  • Performed daily, weekly, and monthly ETL loads from multiple databases and flat files into the SQL Server relational database.
  • Developed, maintained, and managed numerous daily reports to support management reporting requirements and to improve the timeliness and accuracy of reporting.
  • Developed ad-hoc reports and dashboards using Tableau.
  • Ran complex ad-hoc SQL queries when immediate access to data was required.
  • Guided team members in their approach to solving problems on both the business and technical sides.
  • Generated ad-hoc reports based on client requirements.


  • Data Analytics (6 years of experience)
  • R Programming
  • SAS Programming
  • Python
  • Matlab
  • Statistics
  • Machine Learning