Data Engineer

May 19, 2022


Labelbox’s mission is to build the best products for humans to advance artificial intelligence. Real breakthroughs in AI depend on the quality of the training data. Our training data platform enables organizations to improve their machine learning models faster and more accurately. We are determined to build software that is more open, easier to use, and singularly focused on getting our customers to performant ML faster.

Current Labelbox customers are transforming industries including insurance, retail, manufacturing/robotics, healthcare, and beyond. Our platform is used by Fortune 500 enterprises including Allstate, Black + Decker, Bayer, and Warner Brothers, and by leading AI-focused companies including FLIR Systems and Caption Health. We are backed by leading investors including SoftBank, Andreessen Horowitz, B Capital, Gradient Ventures (Google's AI-focused fund), Databricks Ventures, Snowpoint Ventures, and Kleiner Perkins.

About the Role

Labelbox is hiring a Data Engineer to build new data pipelines and scale existing ones. As our company grows, this person will build data infrastructure that brings together tech, product, and operational functions and informs strategic decision-making at the executive level. You will be responsible for transforming raw data in the data warehouse into clean, reliable, organized data models that allow our organization to make informed, data-driven decisions. Our tech stack currently consists of BigQuery, dbt, and Looker, along with other tools that replicate all of our data to our data warehouse.
What You'll Do

- Develop and optimize large-scale batch and real-time data pipelines that ingest structured and unstructured data from a variety of sources using a combination of dbt, Fivetran, and other tools
- Build, rebuild, and performance-tune data transformation tasks within the central data store
- Take over and scale our dbt and Looker setup
- Manage incoming data requests and prioritize the highest-value projects in an organized fashion
- Communicate data-backed findings to a diverse constituency of internal and external stakeholders
- Help create best practices and standards for data modeling, documentation, and testing
- Design and implement operationally excellent data interfaces with strong autonomy
- Rigorously design data warehouse schemas to allow for performant access to digestible datasets
- Become the analytics infrastructure and tooling expert, supporting business-focused pipelines and data interfaces
- Data modeling, data warehouse management, and data orchestration

About You

- Expert-level SQL skills
- Experience in a role performing data warehouse and analytics solution design and development, using techniques such as clustering and partitioning on tables of over 1B rows
- Understanding of data architecture design, data modeling, and physical database design and tuning
- Hands-on experience implementing cloud data warehouses and databases using BigQuery, Postgres, and MySQL
- Experience using dbt
- Knowledge of data visualization tools such as Looker
- Hands-on coding experience in Python

Technology You’ll Use

- BigQuery/GCS, MySQL, Postgres
- dbt
- Fivetran
- Looker
- GitHub
- Jira

Do great work. From anywhere.

We hire great people regardless of where they live. Work wherever you’d like; reliable internet access is our only requirement. We communicate asynchronously, work autonomously, and take ownership of our work.
