Job Description
About the Role
You will be building machine learning systems that don’t just work in notebooks, but continue to work after 10 million user interactions, 30 model updates, and a few unexpected API failures. We’re looking for someone who understands that shipping a model is the start of the work, not the end.
You’ll work at the intersection of modeling, system design, and product impact — owning ML solutions from experimentation through deployment, including metrics monitoring, retraining strategy, and debugging in production.
Responsibilities
- Design and implement robust pipelines for training, evaluation, and deployment — with an emphasis on modularity, reproducibility, and auditability.
- Develop models (not necessarily deep learning) that solve real-world problems with measurable business impact — using structured, semi-structured, and unstructured data.
- Build model observability tooling: think feature drift detection, failure mode tracking, and latency profiling.
- Collaborate cross-functionally with product managers and backend engineers — translating ambiguous problems into clearly scoped ML solutions.
- Set clear criteria for model promotion: not just accuracy, but stability, latency, interpretability, and retrain cost.
- Participate in post-mortems for model failures — and help design systems that make similar issues less likely in the future.
Minimum Qualifications
- 3–6 years of experience shipping machine learning models into production (not just PoCs or academic work).
- Solid Python engineering practices (type annotations, testing, CI/CD) and experience with one of: Ray, Dask, Airflow, Prefect.
- Demonstrated experience with model lifecycle management — training → validation → deployment → rollback/retraining.
- Familiarity with real-world data issues: leakage, concept drift, overfitting to proxy metrics, and label noise.
- Ability to explain tradeoffs between models suited to experimentation (e.g., XGBoost) and those suited to deployment (e.g., quantized models or ONNX-exported architectures).
Nice to Have
- Experience with model serving frameworks: BentoML, Triton, TorchServe, or custom Flask/FastAPI-based setups.
- Experience implementing online learning or bandit-based experimentation systems.
- Prior work in one of: recommendation systems, fraud detection, time series forecasting, or user personalization.
- Knowledge of data versioning practices and tools like DVC or LakeFS.
Interested in this position?
Apply by clicking the “Apply Now” button below!