Machine Learning Engineer

Application ends: August 19, 2025

Job Description

We’re looking for a Machine Learning Engineer with a deep focus on optimizing models for real-world performance and deploying them in resource-constrained environments. You won’t just train models: you’ll apply techniques such as quantization and pruning to squeeze every drop of performance out of them for low-latency inference and edge deployment. This role is ideal for someone who understands both theory and systems, and can bridge the gap between cutting-edge research and stable, scalable production.


What You’ll Do

  • Optimize model architectures for inference efficiency (e.g., using quantization-aware training, structured pruning, tensor decomposition techniques).
  • Build pipelines for automatic benchmarking of models across CPU, GPU, and mobile chipsets (e.g., Qualcomm, Apple Neural Engine).
  • Develop conversion workflows using tools like ONNX, TensorRT, Core ML, and TFLite for cross-platform compatibility.
  • Collaborate closely with embedded systems engineers and data pipeline teams to make ML models production-ready.
  • Profile inference bottlenecks using tools like Nsight Systems, PyTorch Profiler, and Perfetto, then implement strategies to minimize compute and memory overhead.
  • Evaluate and implement distillation strategies for large language models and vision models under tight performance constraints.
  • Stay up-to-date with hardware-aware neural architecture search (NAS) techniques and apply them when appropriate.

What You’ll Need

Required

  • Demonstrated experience in model compression, quantization, or low-rank approximations (show us your projects or papers).
  • Strong coding skills in Python and C++, with an emphasis on writing production-grade inference code.
  • Familiarity with TensorFlow Lite, ONNX Runtime, TensorRT, or other edge deployment frameworks.
  • Solid grasp of computer architecture concepts (cache hierarchy, vectorization, parallel execution).
  • Experience profiling ML workloads on devices with limited compute (e.g., Raspberry Pi, Jetson Nano, Android/iOS phones).

Preferred

  • Prior experience shipping ML models on consumer hardware (e.g., cameras, wearables, phones).
  • Contributions to open-source libraries or tools in the ML inference ecosystem.
  • Background in training custom quantization-aware or sparsity-aware models.
  • Working knowledge of compiler-based optimization for ML (e.g., XLA, TVM, Glow, MLIR).

Are you interested in this position?

Apply by clicking on the “Apply Now” button below!
