Position: Data Platform & Reliability Lead – Client’s Product Family
Location: Mumbai (Onsite)
Experience: 10-18 Years
About our client:
Our client develops enterprise-scale capital market and wealth management platforms handling real-time trading, order management, market data distribution, portfolio management and back-office processing.
Requirement:
Our client is seeking a hands-on Data Platform & Reliability Lead to own the design, performance, scalability and reliability of its stateful platform services.
The platform is expected to support:
- 10+ crore (100+ million) trades per day
- Large-scale event streaming
- High-volume transactional workloads
- Multi-terabyte data growth
- Mission-critical financial operations
Role Mission
Own the architecture, operations and reliability of PostgreSQL, Kafka and Redis platforms that form the backbone of our client’s transaction processing and event-driven architecture.
Key Responsibilities
PostgreSQL Platform Engineering
- Design and operate highly available PostgreSQL platforms.
- Build and manage Patroni-based PostgreSQL clusters.
- Implement replication, failover and disaster recovery mechanisms.
- Define backup, restore and PITR strategies.
- Optimize database performance for high-volume trading workloads.
Kafka Platform Engineering
- Design and operate enterprise Kafka platforms.
- Define topic, partition and replication strategies.
- Manage broker capacity and cluster scaling.
- Optimize producer and consumer performance.
- Support market-data, order-routing and event-streaming workloads.
Redis Platform Engineering
- Design and operate Redis HA architectures.
- Implement Redis Cluster and Sentinel configurations.
- Optimize caching strategies and memory utilization.
- Improve latency and throughput for critical services.
Reliability Engineering
- Define availability targets, recovery objectives and resiliency standards.
- Lead failover testing and disaster recovery validation.
- Conduct performance benchmarking and capacity planning.
- Drive platform hardening and operational excellence.
Monitoring & Observability
- Define database and messaging platform observability.
- Create dashboards covering:
  - Replication lag
  - Consumer lag
  - Query performance
  - Cache utilization
  - Cluster health
- Build proactive alerting and anomaly detection mechanisms.
Performance Engineering
- Support workloads involving:
  - Real-time trading
  - Market data distribution
  - Portfolio processing
  - Historical reporting
- Design systems capable of scaling to multi-terabyte deployments.
Required Skills
PostgreSQL
Must Have:
- PostgreSQL Administration
- Patroni
- PgBouncer
- Replication
- WAL Management
- PITR
- Backup & Recovery
- Partitioning
- Query Optimization
Kafka
Must Have:
- Kafka Administration
- KRaft Architecture
- Broker Management
- Topic Design
- Replication
- Consumer Groups
- Performance Tuning
- Capacity Planning
Redis
Must Have:
- Redis Cluster
- Redis Sentinel
- High Availability
- Persistence
- Memory Optimization
- Performance Tuning
Reliability Engineering
- Capacity Planning
- Disaster Recovery
- Multi-DC Architectures
- Production Operations
- Root Cause Analysis
- Performance Benchmarking
Strongly Preferred
- Financial Markets / Trading Systems Experience
- Event-Driven Architectures
- Large Scale Data Platforms
- Time-Series Workloads
- Cloud and On-Prem Hybrid Deployments
Success Metrics
- Achieve platform availability targets for PostgreSQL, Kafka and Redis.
- Meet RPO/RTO objectives for critical services.
- Scale the platform to support future growth in transaction volumes and data size.
- Reduce operational incidents and recovery times.
- Establish repeatable and validated disaster recovery processes.
NOTE: Interested professionals can send their updated CV, along with current and expected salary details and notice period, to: aanchal@teamrecruiter.in
