Engineering Manager, Infrastructure Foundations

The Wikimedia Foundation
How to Apply: Apply through the Greenhouse link

Location: Remote; EMEA, South America, or Eastern United States preferred

Wikimedia’s Site Reliability Engineering (SRE) team is principally responsible for ensuring that Wikimedia’s global top-15 website, its public-facing services, and the underlying infrastructure are healthy and continue to develop in support of Wikimedia’s mission. The SRE Infrastructure Foundations team builds the pillars of Wikimedia’s bare-metal infrastructure and global network, along with the platform, configuration management, and automation tools that the other SRE and development teams rely upon. The SRE team as a whole comprises over 30 creative and talented staff members who are globally distributed, and the Infrastructure Foundations sub-team currently comprises 6 engineers, with some additional growth expected soon.

As the Engineering Manager of the Infrastructure Foundations team, you will support the engineers who develop Wikimedia’s infrastructure and support the services that depend on it, which are used by hundreds of millions of people around the world. This is an opportunity to do good while improving one of the best-known sites in the world.

You will be responsible for:
Managing one to two globally distributed teams within Wikimedia’s Site Reliability Engineering organization
Recruiting, hiring, and helping onboard new team members
Working with team members to set individual performance goals, and supporting them in meeting and evolving their goals and career path
Triaging incoming workload, maintaining focus on priorities, and setting realistic expectations for both peers and team members
Coordinating and communicating with other members of the Wikimedia engineering teams on relevant projects, and contributing to the organizational strategy
Continuously developing the roadmap of the team in alignment with other SRE and Technology teams, and helping to draft and execute the team’s annual and quarterly plans
Project managing new and existing initiatives
Leading the definition, refinement, and execution of the processes through which the team manages and performs work
Leading incident response, diagnosis, and follow-up on system alerts and outages across Wikimedia’s production infrastructure
Facilitating the definition and establishment of Service Level Indicators and Objectives with service owners and stakeholders
Skills & Experience:
Prior experience managing teams
Prior hands-on experience with software or reliability engineering (within the last 3 years preferred)
Aptitude for automation and streamlining of tasks
Ability to communicate effectively in both spoken and written English
Ability to work independently and as an effective part of a globally distributed team
Willingness and ability to travel several times a year for in-person meetings
B.S. or M.S. in Computer Science or the equivalent in related work experience
Qualities that are important to us:
Commitment to the mission of the organization and our values
Commitment to our guiding principles
Ability to disagree respectfully and still work towards a solution
Good at asynchronous communication
Solutions-focused. The Wikimedia ecosystem is complex, resources are limited, and our guiding principles are ambitious. We want you to find solutions that embrace these factors.
Self-motivated, with an ability to navigate ambiguity and bring a project to completion with limited direction
Curiosity and commitment to learn
Additionally, we would love it if you have:
Experience working in a distributed, largely remote environment
Experience contributing to open source projects