Systems Engineer Infrastructure DevOps
IN
At Avolta (SIX: AVOL), our people are the driving force behind our success. With a team of over 76,000 individuals representing more than 150 nationalities, we are a truly global company driven by passion, innovation, and excellence.
Born from the combination of Dufry and Autogrill, Avolta is redefining the travel experience through the dedication and expertise of our diverse workforce. Across 73 countries and 1,000 locations, our teams bring energy, creativity, and commitment to delivering world-class travel retail and food & beverage experiences.
We operate across multiple channels - including airports, motorways, cruise ships, ports, railways, and more - offering endless opportunities for collaboration and growth. Our people are empowered to make an impact, supported by a culture that values teamwork, development, and innovation.
Sustainability and social responsibility are embedded in our strategy, ensuring we grow in a way that benefits both our employees and the communities we serve.
Are you looking for a dynamic, international career where your contributions truly matter? Join Avolta and be part of a team that’s shaping the future of travel - together.
ROLE SUMMARY
The Systems Engineer is responsible for the infrastructure layer that keeps Avolta's competitive intelligence scraping platform running reliably at scale. Where the Lead Scraping Engineer builds the data extraction engine, this role ensures that engine runs with high availability, scales cost-efficiently from 250 to 1,000+ daily jobs, and is fully observable at all times. You will own orchestration, cloud infrastructure, CI/CD, monitoring and proxy infrastructure management.
This is a platform/DevOps engineering role, not a scraping development role — you will write Python and infrastructure-as-code extensively, but your output is operational excellence: uptime, cost efficiency, deployment velocity and mean time to recovery. You must be comfortable owning a production system and being accountable for its performance.
KEY RESPONSIBILITIES
Orchestration & Job Management (Primary)
- Own and operate the workflow orchestration platform (Apache Airflow, Azure Data Factory, or Prefect): DAG/pipeline design standards, dependency management, SLA monitoring, dynamic task mapping for parallelism.
- Design the scheduling architecture for 5,000+ daily scraping jobs: ensure all jobs complete within agreed windows (typically 6-hour batch cycles), handle dependencies, and implement priority queuing for high-value targets.
- Implement dynamic autoscaling for scraping workers: scale-out for peak crawl windows, scale-in during idle periods. Target: cloud spend optimised to <€2,500/month at 1,000-competitor scale.
- Build and maintain the job retry and failure recovery system: automatic retries with backoff, dead-letter queues for persistent failures, alerting for SLA breaches.
Cloud Infrastructure & Cost Management
- Own the cloud environment (Azure primary): Azure VMs / ACI for scraping workers, Azure SQL for operational database, Azure Blob Storage for raw data storage, Azure Monitor for logging and metrics.
- Write and maintain infrastructure-as-code using Terraform (preferred) or Azure Bicep: all infrastructure reproducible, version-controlled and deployable from scratch in <30 minutes.
- Implement spot instance and preemptible VM strategies for scraping workers to reduce compute costs by 60-70%.
- Define and enforce storage lifecycle policies: raw HTML to cold storage after 7 days, structured data tiering, data retention policies aligned with legal requirements.
- Track and report cost-per-crawl metrics; identify and implement optimisation levers on a monthly basis.
CI/CD & Developer Productivity
- Build and maintain CI/CD pipelines (Azure DevOps Pipelines primary; GitHub Actions or GitLab CI also supported): automated linting, testing, Docker image build, and deployment for scraper code changes.
- Implement blue-green or rolling deployment strategies for scraper updates to avoid production downtime.
- Manage Docker container registry: image versioning, vulnerability scanning, base image maintenance.
- Own the local development environment setup: docker-compose environments that mirror production, onboarding scripts, developer documentation.
Proxy Infrastructure & Network Management
- Manage proxy provider relationships and contracts (Bright Data, Oxylabs, Smartproxy or equivalent): negotiate volume pricing, monitor pool health, track cost-per-GB and cost-per-request.
- Implement proxy rotation logic at the infrastructure level: IP pool management, geolocation targeting, session persistence for stateful scrapers.
- Monitor proxy performance: success rate by IP type (residential vs. datacenter), latency, ban rate by domain.
- Implement a proxy abstraction layer that allows the scraping team to swap providers without code changes.
Observability & Incident Response
- Build and maintain the full observability stack: metrics collection (Prometheus), dashboards (Grafana), log aggregation (ELK stack or Azure Monitor Logs), distributed tracing (Jaeger or Azure Application Insights).
- Define and implement alerting: PagerDuty or OpsGenie integration, escalation policies, on-call runbooks.
- Own incident response for infrastructure failures: diagnose, restore service, conduct post-mortem, implement preventive measures.
- Track and report on SLI/SLO metrics monthly: scraper success rate, data freshness, job completion rate, infrastructure availability.
TECHNICAL SKILLS — REQUIRED
- Azure (VMs, ACI/AKS, Blob Storage, Azure SQL, Azure Monitor, IAM, VNet)
- Python 3.8+ scripting
- Docker + Kubernetes or AKS (container orchestration)
- Experience with infrastructure-as-code tools (e.g. Terraform, Azure Bicep)
- Bash scripting (automation, deployment scripts)
- Familiarity with observability and monitoring tools (e.g. Prometheus + Grafana, Datadog, New Relic)
- Azure DevOps Pipelines; GitHub Actions or GitLab CI/CD
- Linux systems administration (Ubuntu/Debian, systemd, networking)
- Proxy infrastructure management for web scraping (IP rotation, rate limiting)
- Databricks fundamentals (workspace navigation, job execution, notebook workflows)
TECHNICAL SKILLS — STRONG ADVANTAGE
- Scrapy familiarity: enough to deploy and monitor scrapers, configure settings, understand concurrency parameters
- Familiarity with in-memory data stores for caching, queuing, and deduplication (e.g. Redis, Memcached)
- Familiarity with workflow orchestration tools (e.g. Apache Airflow, Prefect, Dagster)
- Database performance tuning: PostgreSQL query optimisation, connection pooling (PgBouncer), read replicas
- Spot instance / preemptible VM cost optimisation
- Log aggregation and analysis tools (e.g. ELK Stack, Azure Monitor Logs, Splunk)
EXPERIENCE & QUALIFICATIONS
- 4-7 years in a DevOps, Platform Engineering, SRE or Infrastructure Engineering role.
- Demonstrable experience operating data-intensive workloads in production at scale (thousands of daily jobs).
- Hands-on experience with a workflow orchestration tool in production (Apache Airflow, Azure Data Factory, or equivalent) — not just familiarity, but ownership of an orchestrated pipeline system.
- Strong IaC discipline: you should be able to describe your infrastructure entirely through code, not click-ops.
- Experience with cost optimisation on cloud: at least one example of measurably reducing infrastructure spend.
- Bachelor's degree in Computer Science, Systems Engineering or equivalent.

Due to certain email system settings, some of our messages may occasionally land in your junk or spam folder. To ensure you don’t miss any important updates regarding your application, please check these folders regularly and mark our emails as ‘Not Spam’ if needed.
We look forward to connecting with you soon!