Data Scientist

Resume posted by Him4u324 in Other.
Desired salary: $1,850,000.00
Desired position type: Any
Location: Bengaluru Karnataka, India


Statistical model building with visualization.


Business Analytics & Intelligence, IIM Bangalore, 2016
Bachelor of Science in Mathematics (Honors), University of Delhi, 2007
10+2 (Mathematics & Science), UP Board, 2004


3 years' experience in Data Science (TCS & Tech Mahindra)
7 years' experience in Reporting and Compliance (Genpact & TCS)

Oct 2016 – Present | TECH MAHINDRA | Data Scientist

  • Supply Chain Incident Ticket Analysis – using an interactive R Markdown dashboard
  • NLP Tool on R Shiny – an interactive visualization of free text from ticket management data: topic modelling (LDA), word network graph, periodic word cloud comparison, knowledge search (self-help using document similarity)
  • R replication in SQL (FMCG) – created a workflow of a complex R script with multiple nested loops so that an SQL data modeler could understand it and replicate the R logic in SQL for the Out of Stock flag
  • Knowledge search – Recommendation Engine (Electronics):
    • Developed a knowledge search engine that automatically provides the top 3 solutions for a query searched by a user
    • Used similarity and latent analysis algorithms to develop the solution
  • Data Envelopment analysis (DEA) – Optimization model (IT)
    • Implemented data envelopment analysis to measure the productive efficiency of decision-making units (DMUs)
    • Used DEA to compare various teams and provide the optimal value needed to attain efficiency. The model was used for a leading IT company
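A minimal sketch of the knowledge search idea above (returning the top 3 solutions for a query by document similarity). It is written in Python rather than R, uses plain TF-IDF with cosine similarity rather than the latent analysis step mentioned above, and the function names and sample solution texts are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_tfidf(docs):
    """Return (idf weights, one sparse TF-IDF vector per document)."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    # Smoothed inverse document frequency.
    idf = {t: math.log((1 + n_docs) / (1 + c)) + 1.0 for t, c in df.items()}
    vectors = [
        {t: tf * idf[t] for t, tf in Counter(tokens).items()}
        for tokens in tokenized
    ]
    return idf, vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_solutions(query, docs, k=3):
    """Rank documents by similarity to the query and return the top k."""
    idf, vectors = build_tfidf(docs)
    # Terms unseen in the corpus carry no weight and are dropped.
    q = {t: tf * idf[t] for t, tf in Counter(tokenize(query)).items() if t in idf}
    scored = sorted(
        ((cosine(q, v), i) for i, v in enumerate(vectors)),
        key=lambda pair: (-pair[0], pair[1]),
    )
    return [(docs[i], score) for score, i in scored[:k]]


# Hypothetical knowledge-base entries, for illustration only.
solutions = [
    "Restart the print spooler service when the printer is not printing",
    "Reset the VPN password from the self-service portal",
    "Clear browser cache if the web page fails to load",
    "Reinstall the printer driver after a Windows update",
]

for text, score in top_solutions("printer not printing", solutions):
    print(f"{score:.2f}  {text}")
```

Ties are broken by document order, so results are deterministic for a fixed knowledge base.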

Jul 2011 – Sep 2016 | TATA Consultancy Services | Data Scientist

  • Service ticket classification – (R, SVM, Neural Network) (IT)
    • A leading BI developer wanted to categorize the service tickets logged in the system based on the problem/request descriptions so that they could be prioritized accordingly
    • Applied machine learning techniques such as SVM and Neural Network in the R programming environment, classifying the tickets correctly 80-84% of the time
    • This helps the organization improve its service level and customer satisfaction.
  • Lead contact profiling (Marketing Analytics) – R, SQL, NLP (IT)
    • The Sales & Marketing team of a leading software & application firm wanted to validate the lead information collected from various channels (social media campaigns, surveys, etc.) before pitching their product. The objective of the project was to validate the existing lead information against the details available on a social media platform for professionals, profile the leads, and suggest a suitable product to market and the decision-making authority for each lead based on predefined business rules.
    • Used R’s text processing capability and the Levenshtein Distance algorithm to match the leads’ information (Organization and Title) with a more reliable secondary source of information (such as LinkedIn). 52% of total leads were validated successfully.
    • This exercise helped the client filter out invalid leads, which saved the S&M team time and effort.
    • Improved product recommendation and deal conversion ratio.
  • Promotions effectiveness – R, NLP (FMCG, Retail)
    • A multinational beverage company’s objective was to measure the performance of their promotional offers, but the data was recorded at 2 levels – Distributor (campaign details) and Sales (sales figures). The data at these levels was not normalized, and there was no common key to merge the files
    • Project objective was to remove this bottleneck and merge the files in order to measure promotion effectiveness; the solution was to approximate the match using the Levenshtein Distance algorithm in R
    • Fields used to merge the 2 files – Client name, location, postal codes, and segments.
    • Achieved a 50% match rate, which was 10% more than the client’s expectation.
  • Kaggle Competition – R, Random Forest (Telecom)
    • Goal – To predict the severity of service disruptions on the Telstra network: using a dataset of features from their service logs, predict whether a disruption is a momentary glitch or a total interruption of connectivity.
    • Process – Applied the Random Forest technique to classify disruption severity. Achieved accuracy was 79%
    • Scored 6.13 against the cap of 25.8 (sample submission benchmark – Multi log-loss score)
  • HR Analytics – R, Shiny Dashboard, Logistic Regression
    • High employee attrition has been one of the major concerns for organizations. The idea was to find the top 4-5 reasons responsible for attrition and to predict the chance that an employee will leave in the near future.
    • Built a demo model to predict the likelihood of attrition using logistic regression and identify the significant reasons.
    • Attached this model to an interactive R Shiny dashboard, which anyone can use to obtain the attrition likelihood of any associate by entering his/her details.
  • Load forecasting – R, Time Series (Power, Utility)
    • Load and Price forecasting for a US Power major with double seasonality
    • Objective is to minimize the gap between demand and power generation; maximize the revenue; remain price competitive
  • BWSSB (IIMB Analytics course project) (WIP) – R, Shiny, Markov, Visualization
    • Working with Bangalore Water Supply and Sewerage Board to reduce the Unaccounted-for Water (UFW) and identify the likely sources.
    • Analyzing the pattern of consumption in Liters Per Capita per Day (LPCD) and assessing the connections falling below the limits of control charts (acceptable variation in usage, possible case of default)
    • Using existing default records to predict defaulters.
    • Created a heat map from Bangalore ward populations (as per the 2011 census), water supply information, and ward polygon data to visualize the demand-supply gap, if any.
  • Employee Satisfaction Survey (Internal HR) – Python, NLP
    • Objective of the project was to analyze the Employee Satisfaction Survey for the units that employees considered the best and the worst performing.
    • Frequency analysis on bag of words using Python’s NLTK package
  • Internet Service Provider (EDA) – Excel
    • Performed exploratory data analysis for a multinational ISP company using analytical tools such as SAS and Excel
    • The objective was to provide insights on the usage of data in different locations to recommend a suitable data plan for increased customer satisfaction
    • Cluster analysis on top user hotspots
    • Insights on cluster-level and user-level usage (most frequent users by location and vice versa)
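The Levenshtein-based matching used in the lead-profiling and promotions-effectiveness projects above can be illustrated with a minimal sketch. It is in Python rather than the R used in those projects, and the normalization, the 0.8 threshold, and the sample names are assumptions chosen for illustration, not the values used on the projects.

```python
def levenshtein(a, b):
    """Edit distance: minimum insertions, deletions, and substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(
                min(
                    previous[j] + 1,         # deletion
                    current[j - 1] + 1,      # insertion
                    previous[j - 1] + cost,  # substitution or match
                )
            )
        previous = current
    return previous[-1]


def similarity(a, b):
    """Distance scaled to [0, 1] after trimming and lowercasing."""
    a, b = a.strip().lower(), b.strip().lower()
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


def best_match(name, candidates, threshold=0.8):
    """Return the closest candidate, or None if nothing is close enough."""
    best = max(candidates, key=lambda c: similarity(name, c), default=None)
    if best is None or similarity(name, best) < threshold:
        return None
    return best


# Hypothetical client names from two files that share no common key.
sales_names = ["ACME Beverages Ltd.", "Apex Foods"]
print(best_match("Acme Beverages Ltd", sales_names))
print(best_match("Zenith Corp", sales_names))
```

A threshold keeps weak matches from being merged; in practice it would be tuned against a manually checked sample, and several fields (for example name plus postal code) could be scored together.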

Jul 2009 – Apr 2011 | TATA Consultancy Services | Business Financial Analyst (PMO)

Managed all the financial and administrative aspects of the business, with a scope of 450 associates, for one of the largest insurance accounts

  • Project Start Up: Discussed all vital information (e.g. Master Service Agreement, Statement of Work, Total Project Value, billing type, resource requirements) with the legal team and coordinated accordingly with clients and TCS stakeholders to finalize the documents.
  • Monitoring and controlling:
    • Identify major & ambiguous cost heads and thereby devise controls
    • Analyze effort vs revenue and present it to stakeholders with probable causes in case of any anomaly
  • Project Resource Management:
    • Set up an entry-exit process to adhere to the company’s SLA and avoid revenue leakage
    • Resource utilization and effort reporting
  • Project Demand and Supply:
    • Monitoring resource requirements against a real-time headcount database and raising hiring indents after discussing them with project managers. Coordinating with the recruitment team on hiring and timelines
    • Coordinating with internal and external vendors for any logistics requirements, change requests for all system changes, and procurement requests for asset requirements
  • Project Finance Management:
    • Ensure on-time monthly billing
    • Responsible for revenue projection for units by liaising with onshore and offshore stakeholders
    • Analyze revenue and cost to identify gaps

Oct 2007 – Jul 2009 | GENPACT | Compliance Analyst

Worked as a Compliance Analyst for one of the USA’s leading Capital Markets and Treasury organizations. Was responsible for:

  • Detecting Fraud in trade orders and deals
  • Reporting violations of industry norms in electronic communications
  • Anti-Money Laundering: As part of the Global Compliance Team, providing KYC information to its businesses to ensure the organization is not dealing with defaulters/criminals
  • MIS reporting and Ad hoc analysis:
    • Helping clients with their project needs, analyzing the process thoroughly along with clients to identify bottlenecks and improve quality by filing/suggesting improvement ideas
    • Preparing and publishing dashboards for internal management review
    • Restructuring process flows; establishing and updating FMEA and BCP for the business


  • R (3 years)
  • R Shiny (2.5 years)
  • R Markdown (6 months)
  • Python (3 months)


    Data visualization

Spoken Languages

    English, Hindi