Data Engineer (Observability) F/M

Sept. 1, 2022


We are looking for an experienced Data Engineer to develop DataGalaxy's premium connectivity features on top of modern data stack services, starting with cloud data lake and data warehouse solutions.

💡 The product

The product you will be working on aims at delivering insights at different metadata levels, such as:

- Urbanization: data should be organized according to urbanization rules. This scope measures how well these rules are enforced and drives how data is aggregated for all other insights.
- Storage: metrics on storage capacity and distribution, with drill-down capabilities along the urbanization model.
- Access: a global access map, including recursive permissions granted through inheritance.
- Usage: metrics on usage and actions taken by users and applications.
- Cost: global cost metrics and their distribution along the urbanization model, which can be used to rebuild service usage internally.

On top of these scopes, data science algorithms and features are planned to identify, anticipate, and leverage usage behaviors. The end goal is to feed our data governance platform so that all this information is organized and surfaced for our users.

🌟 Your team and colleagues

Your role is to lead data engineering developments within the team. You'll work in close collaboration with a dedicated Python Dev and Ops Lead who will help you build reliable artifacts with automated pipelines. Other data engineers and scientists will be involved: the ability to collaborate across teams is a key skill for this position.

We use the Agile Scrum methodology; you'll be accountable to your team's Product Owner and receive product guidance from your Product Manager.

Requirements

The ideal candidate is an experienced data pipeline builder and data wrangler who enjoys optimizing data systems and building them from the ground up.
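To give a feel for the access scope described above, here is a minimal sketch of resolving permissions inherited recursively through a container hierarchy. All names (`Container`, `effective_permissions`, the example principals) are hypothetical illustrations, not part of the DataGalaxy product:

```python
from dataclasses import dataclass, field
from typing import Optional, Set

# Hypothetical storage hierarchy: each container may grant access
# directly, and also inherits every grant from its ancestors.
@dataclass
class Container:
    name: str
    grants: Set[str] = field(default_factory=set)  # principals granted here
    parent: Optional["Container"] = None

def effective_permissions(node: Container) -> Set[str]:
    """Walk up the hierarchy, accumulating grants inherited from ancestors."""
    acc = set(node.grants)
    while node.parent is not None:
        node = node.parent
        acc |= node.grants
    return acc

# Example hierarchy: lake -> zone -> table
lake = Container("lake", {"data-platform-team"})
zone = Container("raw-zone", {"ingest-svc"}, parent=lake)
table = Container("events", {"analyst-group"}, parent=zone)

# The table's effective access map includes all inherited principals.
print(sorted(effective_permissions(table)))
```

A real access map would be built at scale from cloud IAM metadata rather than in-memory objects, but the recursive-inheritance idea is the same.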
The Data Engineer will support our software developers, database architects, data analysts, and data scientists on data initiatives, and will ensure that an optimal data delivery architecture is consistent across ongoing projects.

- Build reliable and scalable algorithms
- Strong Python and Spark competencies, with relevant experience
- Cloud vendor experience (Azure, AWS, GCP); Databricks experience
- Create and maintain optimal data systems and pipeline architecture
- Build processes supporting data transformation, data structures, metadata, dependency, and workload management
- Conduct complex data analyses and report on the results
- Identify opportunities for data acquisition
- Design and evaluate open-source and vendor tools for data acquisition and data lineage
- Collaborate with data scientists and architects on several projects

Benefits

By joining DataGalaxy, you become part of a vibrant team evolving in a fast-paced environment. Our international expansion continues, so we're looking for curious and passionate individuals who are aligned with our values: "Be Intentional: Be Clear, Be Bold, Be Humble". Ready to start your new adventure? 🚀

What you can expect:

- Offices in the heart of Lyon and Paris
- Flexible working hours ("forfait jour")
- Remote work at will
- Health insurance (Apicil)
- Meal vouchers (Swile)
- Daily coffee & snacks
- Mid-annual team events

Built with ❤️ for the ML Community by Dom © 2022 RemoteML