Sr. Engineering Manager - ML Platform

Sep 05, 2023

Mountain View, California

At Databricks, we are passionate about enabling data teams to solve the world's toughest problems — from making the next mode of transportation a reality to accelerating the development of medical breakthroughs. We do this by building and running the world's best data and AI infrastructure platform so our customers can use deep data insights to improve their business. 

Databricks Model Serving provides customers with a robust, reliable system to serve ML models at high QPS and low latency using GPU acceleration. We leverage our serverless architecture to provide better out-of-the-box performance for LLMs than competitors. Customers also gain greater agility and operability from the integrated data lakehouse, which lets them iterate on feature engineering and easily monitor model performance.

We are seeking a dedicated Senior Engineering Manager to lead our initiatives around LLM performance, data plane reliability and scalability, and overall margins. You will also be one of the senior leaders of the ML Platform and will craft the overall strategy for building integrations with model monitoring, feature stores, vector databases, and more.

Key responsibilities of the position include:

  • Lead a talented engineering team to deliver amazing performance and reliability at low cost
  • Recruit top-tier talent, build functional team structures, and coach the team to be a world-class organization
  • Evolve processes to improve operational excellence and deliver on the roadmap
  • In collaboration with product management and IC leaders, create and iterate on a roadmap to align product goals with the Lakehouse strategy
  • Work closely with platform teams to build rock-solid infrastructure that can be leveraged by all serverless products at Databricks
  • Interact frequently with customers for support, sales, and product planning purposes

The impact you will have:

  • Lead development for the first real-time product at Databricks to serve over 1M QPS
  • Drive company-wide impact by making Databricks the best place to create and deploy enterprise LLMs
  • Complete the Lakehouse AI story by making it super easy for customers to iterate on features and debug production issues
  • Grow a world-class team of software engineers working on our data plane from 10 to 20 over the next 18 months, hiring top-notch talent up to and including the Staff+ level
  • Ensure consistent delivery against milestones and strong alignment with the field, working "two-in-a-box" with product leadership
  • Evolve organizational structure to align with long-term initiatives, and build strong "5 ingredient" teams with good comms architecture
  • Manage technical debt, including long-term technical architecture decisions, and balance it against the product roadmap

Minimum requirements for the position include:

  • 3+ years of technical management experience, including managing other managers and engineers at the Staff+ level
  • 8+ years of experience working on highly-available multi-tenant systems with a focus on reliability and efficiency
  • Ability to attract, hire, and coach engineers who meet the Databricks hiring standards. Can up-level the existing team by hiring top-notch senior talent, growing leaders, and helping struggling members. Can gain the trust of the team and guide their careers
  • Comfort working cross-functionally with product management and directly with customers; ability to deeply understand product and customer personas

An ideal candidate will also have:

  • Experience working with techniques like quantization, pruning, interleaving, layer fusion, and writing custom CUDA kernels to improve model performance
  • Experience building products for real-time serving infrastructure for models, containers, functions, or similar
  • Experience building or supporting ML systems
  • Experience operating Kubernetes in production environments
  • Experience working with an ML framework like PyTorch, TensorFlow, or similar

Pay Range Transparency

Databricks is committed to fair and equitable compensation practices. The pay range(s) for this role are listed below and represent the base salary range for non-commissionable roles or on-target earnings for commissionable roles. Actual compensation packages are based on several factors that are unique to each candidate, including but not limited to job-related skills, depth of experience, relevant certifications and training, and specific work location. Based on the factors above, Databricks utilizes the full width of the range. The total compensation package for this position may also include eligibility for annual performance bonus, equity, and the benefits listed above. For more information regarding which range your location is in, visit our page here.

Local Pay Range

$222,000—$300,000 USD

About Databricks

Databricks is the data and AI company. More than 9,000 organizations worldwide — including Comcast, Condé Nast, and over 50% of the Fortune 500 — rely on the Databricks Lakehouse Platform to unify their data, analytics and AI. Databricks is headquartered in San Francisco, with offices around the globe. Founded by the original creators of Apache Spark™, Delta Lake and MLflow, Databricks is on a mission to help data teams solve the world’s toughest problems. To learn more, follow Databricks on Twitter, LinkedIn and Facebook.

Our Commitment to Diversity and Inclusion

At Databricks, we are committed to fostering a diverse and inclusive culture where everyone can excel. We take great care to ensure that our hiring practices are inclusive and meet equal employment opportunity standards. Individuals looking for employment at Databricks are considered without regard to age, color, disability, ethnicity, family or marital status, gender identity or expression, language, national origin, physical and mental ability, political affiliation, race, religion, sexual orientation, socio-economic status, veteran status, and other protected characteristics.


If access to export-controlled technology or source code is required for performance of job duties, it is within Employer's discretion whether to apply for a U.S. government license for such positions, and Employer may decline to proceed with an applicant on this basis alone.
