Lead Scraping Engineer

Brand: Avolta

Country: India

Location: Bangalore Office

Job Type: Indefinite

At Avolta (SIX: AVOL), our people are the driving force behind our success. With a team of over 76,000 individuals representing more than 150 nationalities, we are a truly global company driven by passion, innovation, and excellence.

Born from the combination of Dufry and Autogrill, Avolta is redefining the travel experience through the dedication and expertise of our diverse workforce. Across 73 countries and 1,000 locations, our teams bring energy, creativity, and commitment to delivering world-class travel retail and food & beverage experiences.

We operate across multiple channels - including airports, motorways, cruise ships, ports, railways, and more - offering endless opportunities for collaboration and growth. Our people are empowered to make an impact, supported by a culture that values teamwork, development, and innovation.

Sustainability and social responsibility are embedded in our strategy, ensuring we grow in a way that benefits both our employees and the communities we serve.

Are you looking for a dynamic, international career where your contributions truly matter? Join Avolta and be part of a team that’s shaping the future of travel - together.

ROLE SUMMARY

The Lead Scraping Engineer is the most senior technical individual contributor in the Bangalore competitive intelligence team and the de facto architect of Avolta's web data extraction platform. This person will design and own the core scraping framework, set engineering standards for the team, and personally implement the most complex and technically challenging scrapers. The role combines hands-on engineering (expect to write code 50-60% of the time) with technical leadership: architecture decisions, code reviews, mentoring, and input to the engineering roadmap.

The platform must be designed from day one to scale from 250 to 1,000+ competitors without linear growth in headcount — this requires rigorous architecture, template-driven extensibility, and a relentless focus on automated resilience. This is not a management role; it is a player-coach position where technical credibility and output quality are paramount.

KEY RESPONSIBILITIES

Platform Architecture & Core Development (50-60% of time)

Design and build the core scraping platform: a modular, extensible Scrapy-based framework with standardised spider templates, configurable middleware stack, and plugin architecture that allows mid-level engineers to implement new competitors with minimal bespoke code.
Architect the anti-blocking layer: proxy rotation middleware, user-agent management, request fingerprint randomisation, ban detection and automatic IP rotation, retry logic with exponential backoff and jitter, and intelligent rate limiting per domain.
Implement headless browser automation infrastructure using Playwright (preferred), with Selenium as a fallback: browser pool management, stealth configurations, resource blocking for performance, and session persistence.
Design and implement the request scheduling architecture: Scrapy-Redis for distributed crawling, priority queuing, domain-level throttling, and deduplication via Bloom filters or Redis sets.
Build internal tooling: a spider configuration registry (YAML-driven), a live spider status dashboard, and a self-service onboarding tool that allows new competitors to be added via config rather than code where possible.
Own the data extraction quality layer: define output schema standards, implement validation hooks within Scrapy pipelines, and write reusable field normalisation utilities (price parsing across 30+ currency formats, date normalisation, brand standardisation).
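
For illustration, the "retry logic with exponential backoff and jitter" named in the anti-blocking item above could look roughly like the Python sketch below. It is a minimal example under stated assumptions, not Avolta's implementation: the function names (backoff_delay, call_with_retries), the default base and cap values, and the choice of retryable exceptions are all hypothetical, and the "full jitter" variant shown is only one of several common strategies.

```python
import random
import time


def backoff_delay(attempt, base=1.0, cap=60.0, rng=random):
    """Return a sleep time in seconds for the given retry attempt (0-based).

    "Full jitter": the delay is drawn uniformly from
    [0, min(cap, base * 2**attempt)], so concurrent workers that fail
    together do not all retry at the same instant.
    """
    ceiling = min(cap, base * (2 ** attempt))
    return rng.uniform(0.0, ceiling)


def call_with_retries(func, max_retries=5, base=1.0, cap=60.0,
                      retry_on=(ConnectionError, TimeoutError),
                      sleep=time.sleep):
    """Call func(), retrying on the given exceptions with jittered backoff.

    `sleep` is injectable so the behaviour can be tested without waiting.
    The last exception is re-raised once max_retries is exhausted.
    """
    for attempt in range(max_retries + 1):
        try:
            return func()
        except retry_on:
            if attempt == max_retries:
                raise
            sleep(backoff_delay(attempt, base, cap))
```

In a Scrapy-based platform this logic would more likely live in a downloader middleware than in a free function; the standalone form is used here only to keep the sketch self-contained.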

Technical Leadership (30% of time)

Set and enforce engineering standards: code review guidelines, testing requirements (minimum coverage thresholds), and documentation standards (every spider must have a runbook).
Conduct thorough code reviews for all Backend Engineers and Scraper Developers; provide specific, constructive feedback that improves the team's technical level over time.
Mentor Backend Engineers in advanced scraping techniques: TLS fingerprinting, JavaScript deobfuscation, API reverse-engineering, browser automation optimisation.
Own architectural decision records (ADRs): document significant technical decisions with rationale, alternatives considered and trade-offs.
Lead incident response for critical scraping failures: diagnose root cause, implement fix, and conduct post-mortem.

Complex Scraper Implementation (10-20% of time)

Personally implement scrapers for the most technically challenging competitors: sites using advanced bot detection, mobile-only endpoints, binary protocol APIs, or complex authentication flows.
Reverse-engineer private APIs: intercept and analyse mobile application traffic using mitmproxy or Charles Proxy; analyse obfuscated JavaScript using browser debugging tools; replicate API calls programmatically.
Implement CAPTCHA-solving integrations where necessary: 2Captcha, CapSolver; evaluate when CAPTCHA solving is operationally justified vs. alternative extraction routes.

TECHNICAL SKILLS — REQUIRED (MUST HAVE)

▪ Python 3.9+ at senior level (async, OOP, design patterns)
▪ Scrapy (advanced: custom middlewares, extensions, signals, crawler process)
▪ Playwright for Python (stealth mode, browser contexts, CDP)
▪ Selenium 4 WebDriver + Grid
▪ BeautifulSoup4 + lxml (advanced parsing, XPath 1.0/2.0)
▪ aiohttp / asyncio (high-concurrency async HTTP)
▪ Scrapy-Redis (distributed crawling)
▪ Redis (queues, sets, deduplication, TTL management)
▪ Rotating proxy integration (Bright Data / Oxylabs / Smartproxy)
▪ Chrome DevTools Protocol (CDP): programmatic browser control
▪ HTTP internals: TLS, HTTP/2, headers, fingerprinting
▪ Anti-bot evasion: Cloudflare, Akamai, PerimeterX, DataDome
▪ mitmproxy / Charles Proxy (traffic interception)
▪ JavaScript (reading, deobfuscation, light Node.js automation)
▪ Docker + docker-compose
▪ pytest + pytest-asyncio (test framework)
▪ Git (advanced: rebasing, bisect, blame, hooks)
▪ SQL: intermediate to advanced

TECHNICAL SKILLS — STRONG ADVANTAGE

Scrapy-Splash or equivalent JavaScript rendering middleware.
Experience with fingerprint evasion libraries: undetected-chromedriver, playwright-stealth, curl-impersonate.
AWS Lambda or Cloud Run for serverless scraper execution; S3/GCS for raw data storage.
Apache Kafka or SQS for high-throughput event-driven pipeline design.
Experience with mobile API reverse engineering: APK decompilation (jadx), iOS traffic analysis.
Knowledge of legal and ethical boundaries of web scraping: robots.txt interpretation, ToS analysis, GDPR considerations for scraped data.
Open-source contributions to scraping tools (Scrapy, Playwright, etc.).

EXPERIENCE & QUALIFICATIONS

6-9 years of total software engineering experience.
Minimum 3 years of specialised, production-grade web scraping experience: managing a portfolio of 50+ live scrapers, handling anti-bot at scale, operating in a commercial data collection context.
Demonstrated experience making architectural decisions for a scraping platform, not just implementing individual scrapers.
Prior experience mentoring engineers and conducting technical code reviews.
Strong portfolio: GitHub repositories, open-source contributions, or ability to present a detailed technical architecture walkthrough during interview.
Bachelor's or Master's degree in Computer Science, Software Engineering, or a related field; exceptional practical experience will be considered in lieu of a degree.

Due to certain email system settings, some of our messages may occasionally land in your junk or spam folder. To ensure you don’t miss any important updates regarding your application, please check these folders regularly and mark our emails as ‘Not Spam’ if needed.

We look forward to connecting with you soon!
