Senior Data Scientist

Nov 20, 2023



The Computational Biology Cluster is part of the Precision Medicine & Computational Biology (PMCB) global research function at Sanofi. We are looking for a Senior Scientist (genAI and LLMs for Precision Medicine research) with a passion for building software and data products for the pharmaceutical, life science, or healthcare verticals. The post holder will be part of the Data Science & Artificial Intelligence Lab in the Computational Biology cluster and will help index, integrate, and infer new biomedical insights from massive-scale biomedical data.

The Data Science lab is an innovation-driven team that uses the full spectrum of machine learning methods to address growing needs in precision medicine research. Within the lab, the successful candidate will specialize in methods related to generative AI (genAI) and large language models (LLMs) to accelerate drug target discovery, development, and repositioning.  

The Sanofi Research Dataset is poised to be one of the largest human disease datasets in the pharmaceutical industry. The successful candidate will have access to the data, collaborate with a multidisciplinary group of talented scientists, and lead the development and implementation of state-of-the-art genAI, LLM, and machine learning methods, focusing on training and fine-tuning large models on text, images, and custom experimental data. The candidate will work in an exciting, interdisciplinary environment spanning the different stages of the discovery pipeline and will interact with multiple internal and external organizations. Close interaction and synergy with all PMCB clusters and various therapeutic area functions will be expected.


  • Use AI / ML to impact precision medicine and drug discovery research.

  • Develop, implement, and apply state-of-the-art ML-based methods to analyze large and/or complex collections of datasets.

  • Clearly communicate results and methodologies to multidisciplinary and international project teams.

  • Document and follow good coding practices.

  • Execute work plans on time, and report relevant results and updates to project teams and stakeholders.

  • Maintain close collaborations with other data scientists as well as with scientists from different backgrounds.

  • Constantly monitor literature to maintain in-depth knowledge of the most recent developments in data science, bioinformatics, and cutting-edge AI/ML/DL algorithms as well as the latest applications in the field of drug discovery.

  • Actively engage in evaluation and coordination of both academic and startup collaborations.


Education and Professional Experience

  • A PhD degree in Artificial Intelligence, Data Science, Computational Biology, Computer Science, Machine Learning or Bioinformatics.

  • 0-3 years of post-PhD industry or academic experience with a strong track record of publications, accomplishments, and project experience in applications of generative AI and large language models.

Soft skills

  • Excellent attention to detail, strong problem-solving skills, and dedication to addressing complex problems in biomedicine using an AI-first mindset.

  • Strong written, oral, and interpersonal communication skills.

  • Strong aptitude for working within a multidisciplinary team environment.

  • Strong project management skills, including organization, time management, prioritization, and follow-up.

Technical skills

  • Experience with building and fine-tuning foundation models trained on text, image, genomic, clinical, healthcare and/or other data types.

  • Experience and demonstrated skills in a core machine learning area: computer vision, natural language processing, or multimodal learning.

  • Experience in accessing, customizing, and internalizing models from model zoos, including TensorFlow Hub, PyTorch Hub, Hugging Face Transformers Hub, Model Zoo by Apache MXNet, Caffe Model Zoo, ONNX Model Zoo, ModelDepot, Fastai Model Zoo, TorchVision Models, and Facebook AI Research (FAIR) Models.

  • Experience with large language models, particularly GPT (Generative Pre-trained Transformer) variants, XLNet, BLOOM, BERT, LaMDA, Falcon, and Llama.

  • Experience with advanced NLP techniques, software packages, and algorithm development.

  • Experience in AI/ML Ops and Data Ops for petabytes of data, and in database optimization.

  • Knowledge of large language models, graph learning, deep learning, and generative AI algorithms.

  • Proficiency in Python and/or R.

  • Experience with some of the leading AI/ML frameworks and tools, such as TensorFlow, PyTorch, OpenCV, OpenSlide, scikit-learn, scikit-image, scikit-LLM, LangChain, OpenAI, Hugging Face, llm, and lamini.

  • Experience with various database technologies, including SQL, NoSQL, graph databases, and vector databases.

  • Familiarity with good coding practices (documentation, version control) and modern environments (cloud, high performance computing).

  • Experience with a pharma / biotech environment or with translational research problems is a plus.

  • Experience in deploying models into production environments by leveraging cloud (AWS, Azure, or GCP), local, or hybrid computing environments.

  • Fluency in English (spoken and written).

Sanofi Inc. and its U.S. affiliates are Equal Opportunity and Affirmative Action employers committed to a culturally diverse workforce. All qualified applicants will receive consideration for employment without regard to race; color; creed; religion; national origin; age; ancestry; nationality; marital, domestic partnership or civil union status; sex, gender, gender identity or expression; affectional or sexual orientation; disability; veteran or military status or liability for military status; domestic violence victim status; atypical cellular or blood trait; genetic information (including the refusal to submit to genetic testing) or any other characteristic protected by law.



At Sanofi, diversity and inclusion are foundational to how we operate and embedded in our Core Values. We recognize that to truly tap into the richness diversity brings, we must lead with inclusion and have a workplace where those differences can thrive and be leveraged to empower the lives of our colleagues, patients, and customers. We respect and celebrate the diversity of our people, their backgrounds, and their experiences, and provide equal opportunity for all.