Senior Data Engineer

April 21, 2026
Application ends: July 20, 2026
Apply Now

Job Description

We are looking for a Senior Data Engineer to own, extend, and harden the data infrastructure across the organisation. You will work directly with the Head of Data, building and maintaining production-grade pipelines across AWS, Python, and cloud-native data services. This role is broad by design: you will touch ASR evaluation pipelines, blockchain data ingestion, API integrations, and data platform tooling.

What You Will Work On:

– Benchmark pipeline: maintain and extend the multi-provider ASR transcription system; own audio preprocessing, chunking logic, retry/error handling, and metrics computation (WER, CER, BERTScore, PIER, DER, CS Precision/Recall)

– AWS data lake: manage and extend the KGen data lake, covering Athena query optimisation, Glue crawlers and cataloguing, Apache Hudi table management, Lake Formation column-level permissions, and S3 lifecycle policies

– ETL and ingestion: build and maintain data ingestion pipelines from Google Forms, the Twitch API, on-chain blockchain events (Aptos, BSC, Ethereum, Polygon), and third-party gaming analytics APIs into DynamoDB and PostgreSQL

– Airflow DAG management: author, debug, and monitor Airflow DAGs for scheduled processing and pipeline orchestration

– Cloud data transfers: manage large-scale S3-to-Google Drive transfers (rclone), cross-region data movement, and vendor data sharing infrastructure

– Infrastructure and access management: maintain AWS IAM, Lake Formation, and S3 bucket policies; manage data engineer access controls; troubleshoot Superset permissions and connectivity

– QC and annotation tooling: support the FastAPI-backed audio QC portal used by annotation workers; extend data validation and quality-check scripts across egocentric video and audio datasets

– Schema design: contribute to the Universal Data Schema (UDS) for audio, image, and code modalities in the Humyn Labs dataset marketplace
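To give a flavour of the metrics work in the benchmark pipeline above: WER (word error rate) compares a hypothesis transcript to a reference via word-level edit distance. A minimal Python sketch (illustrative only, not the team's actual implementation):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance / reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    # d[i][j] = edits to turn the first i reference words
    # into the first j hypothesis words
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # i deletions
    for j in range(len(hyp) + 1):
        d[0][j] = j  # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,        # deletion
                d[i][j - 1] + 1,        # insertion
                d[i - 1][j - 1] + sub,  # substitution or match
            )
    return d[len(ref)][len(hyp)] / max(len(ref), 1)
```

CER is the same computation over characters instead of words; production pipelines typically add text normalisation (casing, punctuation) before scoring.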

You Should Have:

– 4+ years in a data engineering role with end-to-end pipeline ownership

– Strong Python: async patterns, subprocess management, API clients, data processing at scale

– AWS: hands-on experience (not just familiarity) with Athena, Glue, S3, DynamoDB, and Lake Formation

– Apache Hudi or Delta Lake experience; understanding of schema evolution and partition strategies

– SQL proficiency: able to write and optimise complex analytical queries

– Experience with Airflow or an equivalent workflow orchestrator

– Comfort working with audio/media data pipelines (format conversion, metadata extraction, chunking) is a strong plus

– Familiarity with blockchain data structures (on-chain events, wallet transactions, DEX swaps) is a plus

– Experience with rclone, large-scale file transfer, or cloud-to-cloud sync pipelines is a plus
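For the rclone bullet above, a cloud-to-cloud sync of the kind described typically looks like the following (remote and path names here are hypothetical placeholders, not the actual infrastructure):

```shell
# Sync an S3 prefix to a Google Drive folder using preconfigured
# rclone remotes ("s3" and "gdrive" are assumed remote names).
rclone sync s3:example-bucket/exports gdrive:shared/exports \
  --transfers 16 \
  --checkers 32 \
  --progress
```

At large scale, tuning `--transfers` and `--checkers` against API rate limits is usually the main lever.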

Are you interested in this position?

Apply by clicking on the “Apply Now” button below!
