Job Description
We are looking for a Senior Data Engineer to own, extend, and harden the data infrastructure across the organisation. You will work directly with the Head of Data, building and maintaining production-grade pipelines across AWS, Python, and cloud-native data services. This role is broad by design: you will touch ASR evaluation pipelines, blockchain data ingestion, API integrations, and data platform tooling.
What You Will Work On:
– Benchmark pipeline: maintain and extend the multi-provider ASR transcription system; own audio preprocessing, chunking logic, retry/error handling, and metrics computation (WER, CER, BERTScore, PIER, DER, CS Precision/Recall)
– AWS data lake: manage and extend the KGen data lake, including Athena query optimisation, Glue crawlers and cataloguing, Apache Hudi table management, Lake Formation column-level permissions, and S3 lifecycle policies
– ETL and ingestion: build and maintain data ingestion pipelines from Google Forms, the Twitch API, on-chain blockchain events (Aptos, BSC, Ethereum, Polygon), and third-party gaming analytics APIs into DynamoDB and PostgreSQL
– Airflow DAG management: author, debug, and monitor Airflow DAGs for scheduled processing and pipeline orchestration
– Cloud data transfers: manage large-scale S3-to-Google Drive transfers (rclone), cross-region data movement, and vendor data sharing infrastructure
– Infrastructure and access management: maintain AWS IAM, Lake Formation, and S3 bucket policies; manage data engineer access controls; troubleshoot Superset permissions and connectivity
– QC and annotation tooling: support the FastAPI-backed audio QC portal used by annotation workers; extend data validation and quality-check scripts across egocentric video and audio datasets
– Schema design: contribute to the Universal Data Schema (UDS) for audio, image, and code modalities in the Humyn Labs dataset marketplace
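To give a flavour of the metrics work in the benchmark bullet above, here is a minimal word error rate (WER) sketch: word-level Levenshtein edit distance divided by reference length. This is an illustrative toy implementation, not the team's actual benchmark code.

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance / reference word count."""
    ref, hyp = reference.split(), hypothesis.split()
    # Standard Levenshtein dynamic programme over words
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # deletions
    for j in range(len(hyp) + 1):
        d[0][j] = j  # insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + sub)  # substitution or match
    return d[-1][-1] / max(len(ref), 1)

print(wer("the cat sat", "the bat sat"))  # 0.3333333333333333
```

CER is the same computation at the character level; the other listed metrics (BERTScore, PIER, DER, CS Precision/Recall) need model-based or diarisation-aware scoring.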
You Should Have:
– 4+ years in a data engineering role with end-to-end pipeline ownership
– Strong Python: async patterns, subprocess management, API clients, data processing at scale
– AWS: Athena, Glue, S3, DynamoDB, and Lake Formation (hands-on, not just familiarity)
– Apache Hudi or Delta Lake experience; understanding of schema evolution and partition strategies
– SQL proficiency: able to write and optimise complex analytical queries
– Experience with Airflow or an equivalent workflow orchestrator
– Comfort working with audio/media data pipelines (format conversion, metadata extraction, chunking) is a strong plus
– Familiarity with blockchain data structures (on-chain events, wallet transactions, DEX swaps) is a plus
– Experience with rclone, large-scale file transfer, or cloud-to-cloud sync pipelines is a plus
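As an example of the "Strong Python" requirement above, a common async API-client pattern is bounded-concurrency fan-out with `asyncio.Semaphore`. The function names and the simulated I/O below are placeholders for a real HTTP client call.

```python
import asyncio


async def fetch(item: int, sem: asyncio.Semaphore) -> int:
    """Stand-in for one API request; the semaphore caps in-flight calls."""
    async with sem:
        await asyncio.sleep(0)  # placeholder for real network I/O
        return item * 2


async def fetch_all(items, max_concurrency: int = 8) -> list[int]:
    # gather() preserves input order even though requests run concurrently
    sem = asyncio.Semaphore(max_concurrency)
    return await asyncio.gather(*(fetch(i, sem) for i in items))


results = asyncio.run(fetch_all(range(5)))
print(results)  # [0, 2, 4, 6, 8]
```

The same shape (semaphore-bounded `gather`, plus per-call retry/backoff in `fetch`) applies to paginated API ingestion and the multi-provider transcription calls described earlier.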
Are you interested in this position?
Apply by clicking on the “Apply Now” button below!
Apply Now