Job Description
We’re looking for a Machine Learning Engineer with a deep focus on optimizing models for real-world performance and deploying them in resource-constrained environments. You won’t just train models—you’ll squeeze every drop of performance out of them through quantization, pruning, and related techniques for low-latency inference and edge deployment. This role is ideal for someone who understands both theory and systems, and can bridge the gap between cutting-edge research and stable, scalable production.
What You’ll Do
- Optimize model architectures for inference efficiency (e.g., using quantization-aware training, structured pruning, tensor decomposition techniques).
- Build pipelines for automatic benchmarking of models across CPU, GPU, and mobile chipsets (e.g., Qualcomm, Apple Neural Engine).
- Develop conversion workflows using tools like ONNX, TensorRT, Core ML, and TFLite for cross-platform compatibility.
- Collaborate closely with embedded systems engineers and data pipeline teams to make ML models production-ready.
- Profile inference bottlenecks using tools like Nsight Systems, PyTorch Profiler, and Perfetto, then implement strategies to minimize compute and memory overhead.
- Evaluate and implement distillation strategies for large language models and vision models under tight performance constraints.
- Stay up-to-date with hardware-aware neural architecture search (NAS) techniques and apply them when appropriate.
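To give a flavor of the kind of work the quantization bullets describe, here is a minimal illustrative sketch (not part of the role's actual codebase): symmetric per-tensor int8 quantization, the core transform that quantization-aware training simulates and that edge runtimes like TFLite execute. All names below are hypothetical.

```python
# Illustrative sketch: symmetric per-tensor int8 quantization.
# Real deployments use framework tooling (e.g., PyTorch/TFLite quantizers);
# this shows only the underlying arithmetic.

def quantize_int8(weights):
    """Map float weights to int8 values with a single per-tensor scale."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127.0 if max_abs else 1.0
    # Clamp to the signed 8-bit range after rounding.
    q = [max(-128, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize_int8(q, scale):
    """Recover approximate float weights from int8 values and the scale."""
    return [v * scale for v in q]

weights = [0.5, -1.27, 0.02, 1.0]
q, scale = quantize_int8(weights)
restored = dequantize_int8(q, scale)
# Per-element round-trip error is bounded by scale / 2.
```

The per-tensor scale is the simplest scheme; per-channel scales and asymmetric (zero-point) quantization trade a little metadata for noticeably lower error on real weight distributions.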
What You’ll Need
Required
- Demonstrated experience in model compression, quantization, or low-rank approximations (show us your projects or papers).
- Strong coding skills in Python and C++, with an emphasis on writing production-grade inference code.
- Familiarity with TensorFlow Lite, ONNX Runtime, TensorRT, or other edge deployment frameworks.
- Solid grasp of computer architecture concepts (cache hierarchy, vectorization, parallel execution).
- Experience profiling ML workloads on devices with limited compute (e.g., Raspberry Pi, Jetson Nano, Android/iOS phones).
Preferred
- Prior experience shipping ML models on consumer hardware (e.g., cameras, wearables, phones).
- Contributions to open-source libraries or tools in the ML inference ecosystem.
- Background in training custom quantization-aware or sparsity-aware models.
- Working knowledge of compiler-based optimization for ML (e.g., XLA, TVM, Glow, MLIR).
Are you interested in this position?
Apply by clicking the “Apply Now” button below!