The Machine Learning Optimization and Deployment Researcher is responsible for designing and implementing optimization strategies for the refinement and deployment of deep learning models. The researcher will investigate and implement model compression techniques such as quantization, pruning and distillation. The job scope also includes developing custom deployment techniques for specific inference engines and optimizing model performance in situ in a deployed product environment. The candidate will cooperate with other teams to integrate the optimized models into production systems and to refine and scale the methods used for assessing the functional performance of models.
Key Responsibilities:
Optimize AI models for deployment on a wide range of target architectures, including desktop, cloud, browsers and mobile devices
Develop and implement algorithms and software for efficient real-time and offline inference
Monitor and evaluate the performance of models in a production context and optimize them for accuracy, speed and compute resource efficiency
Design and implement custom tooling and strategies for the development and deployment of optimized deep learning models
Research and implement appropriate techniques to optimize deployment in products
Work closely with cross-functional teams, including product managers, engineers and researchers, to understand their workflows and design and implement optimized model deployment techniques
Desired Background
Ph.D. or master's degree with 4 years of experience in Computer Science or a similar field, with a focus on deep learning.
Strong publication record, with publications in major machine learning conferences (e.g. NeurIPS, ICLR, ICML).
Strong theoretical and practical background in AI technologies.
Experience exploring new technologies: researching and staying up to date with the latest machine learning technologies, frameworks and inference engines.
Experience optimizing algorithms and software architectures in constrained environments.
3 years of experience with frameworks such as PyTorch, ONNX or TensorFlow
3 years of experience with programming languages such as Python, C/C++ or MATLAB
Experience with embedded systems, computer architecture and high-performance computing
Experience optimizing ML models for inference using hardware acceleration is a plus
Experience working in a software development team and following good software practices, for example version control (VCS) and continuous integration (CI).
Desired: Background in web technologies for real-time processing. Basic knowledge of audio/video formats.
Strong analytical and problem-solving skills
Strong communication skills and ability to work well in a team environment
Key Skills
About Company