Machine Learning Data Engineer

DESCRIPTION

At PDF Solutions, we’re transforming the semiconductor and electronics manufacturing industry with our AI platform that improves yield and lowers manufacturing costs at some of the largest chip makers in the world. Not just machine learning, but AI. We’re seeking an experienced Machine Learning Data Engineer to join our team and take responsibility for developing Data-Centric AI pipelines that enable and drive production of the world’s most advanced chips. We look for people who are self-motivated and passionate about the transformative potential of AI. You’ll be able to hone your skills while working side by side with industry experts who have decades of experience. The candidate must be an organized and highly motivated team player with strong initiative and communication skills, and must possess the drive to deliver quality results on time in a complex, intensive, and highly productive environment.

RESPONSIBILITIES

● Help design, implement, and validate data pipelines while collaborating with data scientists and engineers. This includes building pipelines for data retrieval, ETL, data cleansing, and imputation, as well as building solutions in the emerging field of Data-Centric AI.

● Coordinate and collaborate with other Software Development groups so that the data pipelines fit well with the rest of PDF Solutions’ software applications.

● Balance adding new features with the need for stability and performance.

● Grow development capabilities to align with the pace of business needs.

QUALIFICATIONS AND SKILLS

● Bachelor's degree or higher in Computer Science, Computer Engineering, Electrical Engineering, or a similar discipline, with industry experience in software development

● 3+ years of experience with Python coding

● 3+ years of recent experience working as a Data Engineer or similar in industry

● Experience with developing production-grade code, preferably in Python

● Experience with both relational and NoSQL databases, such as Oracle, Cassandra, Redis, or similar

● Experience with query optimization to improve data retrieval and merging.

● Understanding of the data-centric aspects of machine learning pipelines including data retrieval, data cleansing, feature engineering, and imputation

● Understanding of test automation frameworks such as PyUnit and continuous integration tools such as Jenkins

● Strong professional written and verbal communication skills

● Track record of shipping products in a fast-paced environment.

● Ability to pass a Python skills-based test

● Ability to create model-ready data from raw data, at scale

● Ability to translate business problems into data science pipelines

● Comfort with ML theory to recommend solutions beyond the standard libraries

● Must be able to work independently and as part of a diverse interdisciplinary and international team

● Communicates clearly to technical and non-technical audiences

● Empathy with customer business challenges

● Understanding of hypervisors/containers, especially Docker
