Site Reliability Engineer
Role: Site Reliability Engineer
Location: Redmond, WA (Locals only)
Start date: ASAP, once the background verification (BGV) is cleared.
Project duration: 6 months (standard)
Senior Site Reliability Engineer:
Intelligent Conversations and Communications Cloud (IC3) Carrier Operations Team
Intelligent Conversations and Communications Cloud (IC3) powers billions of real-time customer conversations across Microsoft’s first-party (Teams, Skype) and second-party (Dynamics) solutions. IC3 enables reliable, high-quality audio/video calling, meeting, and messaging services that work every time, from anywhere, seamlessly across all customer touchpoints. IC3 makes conversations on our platform more intelligent in real time, empowering best-in-class productivity tools for the modern workplace, where every call, meeting, or chat makes the next one better.
As part of the IC3 Carrier Operations SRE team, our mission is to operate the IC3 PSTN services with end-to-end high availability, performance, and reliability so that customer objectives are consistently met or exceeded. To achieve this, we work closely with our product and engineering teams and use a variety of home-grown toolsets aimed at aggressive automation for reliability. We are also a service-engineering-focused team that operates at scale while supporting deployments for new carriers across the globe.
Responsibilities:
· Work with a team of engineers focused on improving the reliability, scalability, latency, and efficiency of PSTN services powering cloud communications.
· Manage problem resolution with service providers.
· Learn existing tools and enhance them to meet new scale and feature requirements, with the aim of reducing manual intervention and improving the prevention, detection, and mitigation of service impacts.
· Participate in the on-call rotation of the local follow-the-sun team.
· Manage incident response and perform root cause analysis investigations.
· Review existing processes and drive improvements to support the scale and operational excellence of PSTN services.
· Analyze data and provide the Design and Product teams with operational insights into service reliability and customer experience.
· Participate in recruiting and developing a team of experienced site reliability engineers.
Qualifications:
· 7+ years of experience as a software engineer or site reliability engineer directly supporting development and quality in a product engineering team environment.
· 5+ years of experience shipping distributed systems, services, and highly available infrastructure.
· 5+ years of experience scripting/coding in one or more of the following: PowerShell, C#, Python.
· Expertise with Power BI: creating data models, writing queries, and building powerful visualizations.
· Experience with T-SQL, Kusto Query Language (KQL), Azure Log Analytics, and Cosmos.
· Experience with Microsoft Azure, Azure DevOps, ServiceNow, Microsoft Dynamics, or Microsoft Flow.
· Passionate about site reliability engineering practices.
· Knowledge of or experience with cloud-based distributed systems and microservices architecture.
· Knowledge of or experience with Internet network architecture and its operating principles.
· Experience with Voice over IP (VoIP) is highly desirable.
· Experience analyzing network packet captures and signaling traces.
· Experience working with session border controllers (SBCs), media gateways, circuit-switched telephony, SS7, and ISDN/ISUP.