Shashwat Khanna, Developer in Delhi, India

Shashwat Khanna

Verified Expert in Engineering

Data Scientist and Software Developer

Location
Delhi, India
Toptal Member Since
May 19, 2022

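Shashwat is a seasoned professional with nearly a decade of experience in core data science. He has rich experience designing, developing, and deploying machine learning models for clients across the banking, financial services, insurance, retail, eCommerce, and healthcare sectors. Shashwat currently handles end-to-end product analytics for Shopify's recently launched products.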

Portfolio

Freelancer
Python, Spark, OpenAI GPT-3 API, OpenAI GPT-4 API...
Shopify
SQL, Python 3, PySpark, Dimensional Modeling, Kimball Methodology, Data Science...
Clara Analytics
Predictive Analytics, Natural Language Processing (NLP)...

Experience

Availability

Part-time

Preferred Environment

Amazon Web Services (AWS), Spark, Windows, Ubuntu, R, Python 3

The most amazing...

...I've developed is a large-scale ad keyword revenue prediction model, which helped an aggregator achieve a profit lift of around 20%.

Work Experience

Data Scientist (Freelance)

2022 - PRESENT
Freelancer
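  • Worked on multiple cross-functional ML and BI projects. Created several complex models using ML techniques, advanced LLMs, NLP embeddings, and more. Handled multiple clients in parallel.
  • Created several POCs. Experienced with both paid and open-source LLMs. Currently exploring LLMs for different use cases.
  • Successfully managed and improved large-scale deployed models. Served as both a data engineer and a data scientist.
  • Designed and implemented a large pipeline to process several years of mobile data (around 500 GB per day) and draw inferences using geospatial intelligence methods.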
Technologies: Python, Spark, OpenAI GPT-3 API, OpenAI GPT-4 API, Natural Language Processing (NLP), Google AdWords, Web Marketing, Generative Pre-trained Transformer 3 (GPT-3), Regular Expressions, Language Models, Version Control, Git, OpenAI, Custom Models, Front-end, Data Scraping, Recommendation Systems, Web Scraping, ChatGPT, Architecture, Integration, SQL, Team Mentoring, Supervised Learning, Deep Learning, Unsupervised Learning, Prompt Engineering

Senior Data Scientist

2021 - PRESENT
Shopify
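  • Handled product analytics for new products and owned data event instrumentation, data models, analytics dashboards for internal users, and user-facing analytics.
  • Used a combination of PySpark, DBT/SQL, and data visualization tools. Worked closely with multidisciplinary teams, including product managers, UI/UX experts, developers, and senior leadership, to define the data roadmap.
  • Oversaw product testing and the go-to-market (GTM) launch, provided internal stakeholders with key insights into product usage and adoption, and drove the product roadmap and prioritization.
  • Defined experiments, KPIs, and guardrail metrics for newly launched Shopify plans.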
Technologies: SQL, Python 3, PySpark, Dimensional Modeling, Kimball Methodology, Data Science, Data Reporting, ETL Tools, Google BigQuery, A/B Testing, Product Analytics, Key Performance Metrics, Dashboards, Streamlit, Machine Learning, Key Performance Indicators (KPIs), Exploratory Data Analysis, Python, Pandas, Data Analysis, Data Visualization, Reports, Data Analytics, Data Mining, Data Modeling, Google Analytics, REST APIs, Big Data, Scikit-learn, Google Sheets, Office 365, APIs, API Integration, Project Management, eCommerce, Analytics, Business Analysis, Marketplaces, Statistics, Predictive Learning, Google Cloud Platform (GCP), MySQL, Product Development, ETL, Data Pipelines, Data Cleaning, Large Data Sets, Artificial Intelligence (AI), Data Engineering, BigQuery, Business Intelligence (BI), Regular Expressions, Version Control, Git, Data Scraping, Team Mentoring, Supervised Learning, Unsupervised Learning

Senior Data Scientist

2019 - 2021
Clara Analytics
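  • Led a team of data scientists and engineers to develop products focused on NLP, using rule-based methods, deep learning, and neural networks such as RNNs, LSTMs, and autoencoders built with Keras.
  • Managed the architecture and development of the organization-wide NLP stack using Spark and Spark NLP.
  • Played a key role in creating methods to quantify the value delivered to clients.
  • Created a document processing pipeline that extracts key information to help insurance adjusters analyze medical histories.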
Technologies: Predictive Analytics, Natural Language Processing (NLP), Generative Pre-trained Transformers (GPT), GPT, LSTM Networks, RStudio Shiny, Spark, R, Python 3, Data Science, Data Reporting, Key Performance Indicators (KPIs), PySpark, Amazon Web Services (AWS), Spark ML, Machine Learning, Key Performance Metrics, Product Analytics, ETL Tools, Flask-RESTful, Logistic Regression, Linear Regression, Exploratory Data Analysis, Forecasting, Python, Pandas, RStudio, Data Analysis, Data Visualization, Reports, Data Analytics, Data Mining, REST APIs, Big Data, Scikit-learn, Google Sheets, Office 365, APIs, API Integration, Project Management, Data Extraction, Analytics, Business Analysis, Statistics, Predictive Learning, Data Pipelines, Data Cleaning, OCR, Artificial Intelligence (AI), Data Engineering, Business Intelligence (BI), Regular Expressions, Language Models, Version Control, Git, Data Scraping, Architecture, Integration, SQL, Team Mentoring, Supervised Learning, Deep Learning, Unsupervised Learning

Senior Data Scientist

2013 - 2019
64 Squares Private Limited
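  • Led and delivered ML and analytics engagements for more than 20 clients across industries such as banking, insurance, retail, and eCommerce, and across geographies including the US, UK, and Australia.
  • Worked extensively with techniques ranging from traditional statistical models, such as linear and logistic regression, to advanced predictive models, such as random forests and gradient boosting.
  • Gained extensive experience productionizing ML models through RESTful APIs, batch processes, and database integration.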
Technologies: Forecasting, Exploratory Data Analysis, Predictive Analytics, Chatbots, Linear Regression, Logistic Regression, Random Forests, Gradient Boosting, XGBoost, R, Python 3, Data Science, Machine Learning, Data Reporting, Flask-RESTful, Predictive Modeling, Python, Pandas, RStudio, Data Analysis, Data Visualization, Reports, Data Analytics, Data Mining, Data Modeling, REST APIs, Big Data, Scikit-learn, Google Sheets, Office 365, APIs, API Integration, Time Series Analysis, Project Management, Data Extraction, eCommerce, Analytics, Business Analysis, Statistics, Predictive Learning, Docker, MySQL, ETL, Data Pipelines, Data Cleaning, Large Data Sets, Artificial Intelligence (AI), Data Engineering, BigQuery, Business Intelligence (BI), Google Data Studio, Regular Expressions, nbdev, Version Control, Git, Data Scraping, Recommendation Systems, Architecture, Integration, SQL, Team Mentoring, Supervised Learning, Deep Learning, Unsupervised Learning

Senior Consultant

2012 - 2012
Deloitte
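  • Worked in the Strategy and Operations practice, focusing on the healthcare and life sciences sector.
  • Worked on growth opportunity and expansion assessments, business plans, impact evaluations, and feasibility studies for various clients.
  • Worked with a variety of clients, including state and central governments, hospital chains, large business conglomerates, and bilateral funding and donor agencies.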
Technologies: Consulting, Microsoft PowerPoint, Financial Modeling, Strategy, Public Health, Public Policy, Office 365

Large-scale Keyword Revenue Prediction Model

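I worked on a large-scale, real-time keyword revenue prediction model that served as a key input to the client's Google AdWords bidding system. I was responsible for the design, development, and implementation of the entire system over a project that lasted about a year.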

Towards the end of the project, the client achieved a 20% lift compared to the existing model.

Large-scale Medical Records Processing Engine

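I designed and developed a large-scale engine to process medical records and extract information and insights from them. This information was presented to insurance adjusters, reducing their review time by 50%. The engine was built using Spark NLP.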

Generic Forecasting Model for Brick-and-mortar Retail Stores

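I created a generic model to forecast sales of products sold in brick-and-mortar retail stores. The model was built and tested on data from two stores, but it is scalable and generic enough to be trained on thousands of stores across various retail chains.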

Chatbot for a Large B2B Aggregator

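I created a buyer-seller messaging assistant and chatbot that suggests follow-up messages, increasing user engagement on the platform by around 10%. The RESTful API's latency was under 100 milliseconds.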

Languages

SQL, Python, Python 3, R

Libraries/APIs

Pandas, REST APIs, Scikit-learn, XGBoost, Spark ML, Google AdWords, PySpark, Flask-RESTful, TensorFlow, Keras

Tools

Git, Google Sheets, BigQuery, Microsoft PowerPoint, Google Analytics

Paradigms

Data Science, Key Performance Metrics, Business Intelligence (BI), ETL, Dimensional Modeling, Kimball Methodology, Distributed Computing

Platforms

RStudio, Jupyter Notebook, Amazon Web Services (AWS), Linux, Ubuntu, Windows, Google Cloud Platform (GCP), Docker

Industry Expertise

Project Management

Storage

MySQL, Data Pipelines

Other

Natural Language Processing (NLP), Predictive Analytics, Exploratory Data Analysis, Forecasting, Data Reporting, Google BigQuery, Product Analytics, Dashboards, Key Performance Indicators (KPIs), Machine Learning, Unstructured Data Analysis, Predictive Modeling, eCommerce, Analytics, Data Analysis, Data Visualization, Reports, Data Analytics, Data Mining, Office 365, APIs, API Integration, Data Extraction, Business Analysis, Statistics, Predictive Learning, Classification, Regression, Data Cleaning, Artificial Intelligence (AI), OpenAI GPT-3 API, OpenAI GPT-4 API, Language Models, nbdev, Version Control, ChatGPT, Team Mentoring, Supervised Learning, Unsupervised Learning, Econometrics, Finance, Gradient Boosting, Random Forests, Chatbots, ETL Tools, A/B Testing, Streamlit, Data Modeling, Big Data, Time Series Analysis, Deep Learning, Product Development, GPT, Generative Pre-trained Transformers (GPT), Large Data Sets, Generative Pre-trained Transformer 3 (GPT-3), Regular Expressions, OpenAI, Data Scraping, Architecture, Integration, Prompt Engineering, Time Series, LSTM Networks, Logistic Regression, Linear Regression, Feature Engineering, Clustering, Pipelines, OCR, Monitoring, Consulting, Financial Modeling, Strategy, Public Health, Public Policy, Marketplaces, Neural Networks, Modeling, Data Engineering, Google Data Studio, Web Marketing, Custom Models, Front-end, Recommendation Systems, Web Scraping

Frameworks

Spark, RStudio Shiny, Flask

Education

2009 - 2011

Master's Degree in Economics

Indira Gandhi Institute of Development Research - Mumbai, India

2004 - 2007

Bachelor's Degree in Physics

University of Delhi - Delhi, India