Ayoub Abraich
Data Scientist | AI Research Engineer
Passionate Data Scientist and Python expert with a Master's in Data Science and a Bachelor's in Applied Mathematics. Over 4 years of experience in data analysis, machine learning, deep learning, and causal inference, specializing in healthcare and finance. Proficient in Python (PyTorch, TensorFlow) and experienced in code reviews. Eager to contribute to impactful projects and open to collaboration and continuous learning.
About Me
I am a Data Scientist and AI Research Engineer with a track record of delivering solutions in Natural Language Processing (NLP) and Deep Learning. I hold a Master's degree in Data Science and have over four years of experience, specializing in causal inference, domain adaptation, and applied research. I work primarily in Python with PyTorch and TensorFlow, and I aim to contribute to research-driven projects that advance the state of AI.
Professional Experience
AI Research Scientist & PhD Candidate @ LaMME, Evry, France
04/2020 - 2023
- Led the creation and deployment of deep learning models for causal effect estimation using PyTorch and TensorFlow. Conducted research in causal inference aimed at improving predictive accuracy on complex data. Implemented domain adaptation strategies to improve model performance across domains.
- Collaborated across teams to design experiments, analyze outcomes, and provide actionable insights. Developed methods for integrating causality into deep learning models for industrial applications.
- Published research in journals and as preprints, and presented findings at conferences. Kept up with advances in deep learning, causality, and domain adaptation and applied them in my work. Mentored junior team members, fostering continuous learning and knowledge sharing. Deployed end-to-end solutions using FastAPI and Flask.
Freelance Data Scientist @ Malt | Upwork, Paris, France
09/2019 - Present
- Data Analysis and Visualization: Applied statistical techniques to extract insights from complex datasets, and communicated the findings through clear visualizations that support decisions.
- Machine Learning and Predictive Modeling: Designed and implemented classification, regression, and clustering models to support decision-making and solve business problems.
- Python Programming: Used Python for data manipulation, analysis, and model implementation, with libraries including Pandas, NumPy, scikit-learn, PyTorch, and TensorFlow.
- End-to-End Solution Development: Built end-to-end solutions with frameworks such as Flask, covering the integration and deployment of data science applications.
- Client Collaboration: Worked directly with clients to understand their requirements and deliver tailored solutions aligned with their business objectives.
- Project Management: Managed full project lifecycles, from scoping and planning to execution and delivery, with a focus on timely, high-quality results.
Research Assistant @ CMAP, École Polytechnique, Palaiseau, France
04/2019 - 07/2019
- Worked with Professor Eric Moulines on research in Visually Grounded Question Answering (VGQA), implementing deep reinforcement learning algorithms. Focused on improving dialogue generation by combining visual information with natural language understanding to produce coherent, contextually relevant answers to visually grounded questions.
- Contributed insights and methods to research at the intersection of deep learning, reinforcement learning, and natural language processing.
Education
PhD in Data Science: Deep Learning Applications for Causal Treatment Effect Estimation in Longitudinal Context
Paris-Saclay University • 09/2020 - 2023
- Research Area: Causal Inference, Survival Analysis, Representation Balancing
- Relevant Publications:
  - "Theoretical Guarantees for Representation Balancing in Survival and Classification Causal Inference with Multiple Treatment Lines" (Preprint, November 2023)
  - "SurvCaus: Representation Balancing for Survival Causal Inference" (Preprint, March 2022, 10+ citations as of April 2024)
- Scholarships and Funding:
  - PhD Scholarship (FMJH and EDMH): €57,000 net over 3 years (2020-2023)
  - FMJH (Fondation Mathématique Jacques Hadamard) Excellence Scholarship: €14,000 (2018-2020)
  - OCP Excellence Scholarship: €6,000 (2016)
Master's degree (M1-M2) in Data Science - Finance
Paris-Saclay University • Ranked first in class • 09/2018 - 10/2020
- Completed AI coursework in Reinforcement Learning, Deep Learning, and Computational Statistics.
- Applied Generative Adversarial Networks (GANs) to practical data problems.
- Developed skills in scientific Python, R, and GPU computing for AI applications.
Bachelor's degree in Applied Mathematics (3 years)
Université d'Evry-Val d'Essonne • Ranked first in class • 08/2015 - 09/2018
- Completed a 3-year Bachelor's program in Applied Mathematics, including 2 years of CPGE (Classes Préparatoires aux Grandes Écoles, intensive preparatory classes in Mathematics and Physics).
- Built a strong foundation in mathematics, advanced probability, and statistics, along with programming skills in C, Python, and R.
- Specialized in advanced mathematics.
Projects
SCRIBDOWN: A Scribd PDF Downloader
A PDF downloader for Scribd, built with Python and FastAPI.
MoroccoHub API
An API for accessing data and resources related to Morocco.
API Docs (in development)
Example: Discours et Activités Royales (royal speeches and activities)
PYCAUS
Python package for causal survival analysis and counterfactual classification prediction with PyTorch, built on the torchtuples package for training PyTorch models.
Publications
SurvCaus: Representation Balancing for Survival Causal Inference
Individual Treatment Effects (ITE) estimation methods have risen in popularity in recent years. Most of the time, individual effects are better presented as Conditional Average Treatment Effects (CATE). Recently, representation balancing techniques have gained considerable momentum in causal inference from observational data, but they remain limited to continuous (and binary) outcomes. However, in numerous pathologies, the outcome of interest is a (possibly censored) survival time. Our paper proposes theoretical guarantees for a representation balancing framework applied to counterfactual inference in a survival setting, using a neural network capable of predicting the factual and counterfactual survival functions (and then the CATE), in the presence of censoring, at the individual level. We also present extensive experiments on synthetic and semisynthetic datasets showing that the proposed extensions outperform baseline methods.
Representation Balancing with Theoretical Guarantees for Survival and Classification Causal Inference with Multiple Treatments
In recent years, there has been a surge of interest in individual treatment effect (ITE) estimation methods, with Conditional Average Treatment Effects (CATE) being a popular way to express these effects. Although representation balancing techniques for causal inference from observational data have gained attention, they are limited to continuous (and binary) outcomes, even though survival time is the outcome of interest in many pathologies. Our research paper offers theoretical guarantees for a representation balancing framework that can perform counterfactual inference in survival or classification contexts, for binary or multiple treatments, by utilizing a neural network that predicts both factual and counterfactual survival functions (respectively, class probabilities) and subsequently the CATE, even in the presence of censoring. Our experimental results demonstrate that the proposed extensions outperform baseline approaches on synthetic and semisynthetic datasets. This research extends and generalizes our previous work, SurvCaus (2022).
Skills
Programming Languages
- C++
- JavaScript
- Python
- R
- Java
- Flutter
- HTML, CSS
- SQL
Tools and Frameworks
- AWS SageMaker
- Azure
- Django
- Docker
- FastAPI
- Flask
- Heroku
- Linux
- PySpark
- PyTorch
- Streamlit
- TensorFlow
- Git
Advanced Modeling and MLOps
- Automating MLOps for Deep Learning
- Designing Experiments
- Implementing Advanced Statistical Modeling
Knowledge Areas
- Algorithmic Trading
- Causal Inference
- Computer Vision
- Deep Learning
- Generative Adversarial Networks (GANs)
- Machine Learning
- Natural Language Processing (NLP)
- Optimization
- Reinforcement Learning
- Statistics
- Time Series Analysis
- LLMs and Prompt Engineering
Data-related Skills
- BeautifulSoup (BSoup)
- Data Scraping: Scrapy, Playwright
- Postman
- Burp Suite - API Reverse Engineering
- PySpark
- REST APIs
Business Problem Solving
- Data Mining
- Solving Business Problems through ML
- Statistical Algorithms
Large Data Sets and Distributed Computing
- Distributed Computing
- Optimization
- Working with Large Data Sets
- Simulation Languages
Languages
- English (Proficient)
- French (Fluent)
- Arabic (Native)