Dallas, Texas
Principal System Engineer - Production Operations

Job Description:

Join AT&T and reimagine the communications and technologies that connect the world. Our Consumer Technology Experience team delivers innovative and reliable technology solutions that power differentiated, simplified customer experiences. Bring your bold ideas and fearless risk-taking to redefine connectivity and transform how the world shares stories and experiences that matter. When you step into a career with AT&T, you won't just imagine the future; you'll create it.

The Principal System Engineer for Tier 1 Operations helps lead a team dedicated to proactively ensuring the high availability, reliability and resiliency of AT&T's customer- and agent-facing experiences and shared omnichannel platforms.

Responsibilities

Provide 24x7 Tier 1 support for customer- and agent-facing applications across eCommerce, Care and Retail platforms. These platforms are built on a microservices-based architecture running on-premises and in the cloud, including SaaS offerings such as Salesforce, Salesforce Marketing Cloud and MuleSoft.

Manage escalated issues, incidents and outages: triage them and drive them to prompt resolution.

Provide prompt visibility into the status of escalated issues, incidents and outages to leadership, business partners and other key stakeholders.

Own Site Reliability Engineering responsibilities, including the following: building a functional and technical knowledge base for each application, creating runbooks, developing application observability (alerts, monitoring and dashboards) that enables proactive incident and problem detection, triaging incidents, and helping Tier 2 conduct blameless post-mortems (after-action reviews).

Oversee daily T1 operations of on-premises and hosted applications and experiences, including data centers, compute, storage, data networks, monitoring and the NOC.

Work with Release Management on upcoming production changes to identify and mitigate risks.

Work closely with Product Development and Tier 2 SRE teams so that knowledge transfer on system changes happens well before those changes are operationalized.

Optimize the overall T1 on-call process and incident response workflow, including managing the team’s on-call rotation, alert rules, communication methods and incident response plans.

Provide metrics and status reports and review them with leadership and stakeholder communities. Establish processes for gathering, reporting and communicating metrics.

Stay current on feature development and on how it could affect the system's overall reliability.

Assist in developing, publishing and continually updating Standard Operating Procedures for technology operations and support, along with detailed T1 documentation, based on industry best practices.

Provide technical leadership backed by strong communication skills. Build and organize a self-motivated team.

Conduct rigorous due diligence on all plans.

Drive team engagement. Motivate individuals and teams beyond current scope of influence.

Champion and facilitate breakthrough solutions. Take appropriate, intelligent risks.

Create, enable and cultivate a culture of responsibility and accountability.

Lead by example and operate with transparency, integrity and respect.

Qualifications:

A suitable candidate for this position must possess the following applicable knowledge, skills and abilities. Candidates must also be able to demonstrate these competencies and provide applicable examples that support them.

Bachelor's degree in Computer Science or Engineering, or a related field

10+ years of demonstrated leadership experience building cross-organizational consensus

10+ years of demonstrated experience building and managing high-performing teams

10+ years of demonstrated experience with Incident Management and Incident Response, and with managing a Tier 1 Production Operations team

10+ years of experience supporting large-scale eCommerce, Care and Retail POS platforms, and their supporting applications, in production in a leadership capacity

Solid understanding and experience in Application Performance Monitoring tools like Dynatrace, AppDynamics, Introscope, etc.

Hands-on experience with Customer Experience Analytics & Session Based tools like Quantum Metric or Tealeaf

Hands-on experience with Synthetic Monitoring tools like Catchpoint

Experience working within a scaled agile development team

Experience developing and implementing customer journey dashboards to enable proactive monitoring of customer experience availability.

Experience designing and managing a world-class technical operations organization including 24x7 support and outage/incident management.

Solid knowledge of Operations practices and demonstrated experience increasing Operational capability maturity within an organization.

Excellent communication and presentation skills; the ability to present complex technical information in a clear and concise manner.

Proficient at analyzing and interpreting large amounts of data with the capacity to synthesize information and translate into effective and actionable insights.

Exceptional organization and planning skills, strong analytical abilities, and process-driven orientation

Unrelenting sense of customer-focus, urgency and accuracy with an execution mindset

Self-starter with a creative, enthusiastic, innovative and collaborative outlook

Primary technical skills should include:

Java, Spring, WebLogic, AKS, CI/CD tools and PL/SQL

Microservices-based architecture using Java, J2EE, Jenkins, Maven, Linux and K8s, both on-premises and in the cloud

Docker, Kubernetes, Microsoft Azure Cloud and Unix

Relational & NoSQL databases like Oracle & Cassandra

Experience with visualization tools like Kibana and Grafana (hands-on experience is a must); EFK stack experience preferred

Creation of dashboards in Dynatrace, ELK and Grafana (hands-on experience is a must)

Secondary technical skills (optional, yet highly desirable):

Salesforce development (Apex, Visualforce, Lightning), Salesforce Sales Cloud and Service Cloud, MuleSoft, and Dynatrace and ELK (Elasticsearch, Logstash, Kibana) for monitoring and logging

Hands-on experience supporting Salesforce applications, including Sales Cloud and Service Cloud

Experience with Marketing Cloud

Experience within high tech, software and/or wireless/telecom industry highly desired

Understanding of integration technologies, API gateways, and mobile and iOS technology stacks; experience with MuleSoft desired

Solid technical background with understanding and/or experience in software development, web technologies and customer communications such as email, SMS and push notification.

Our Principal System Engineers earn between $158,200 and $237,400. Not to mention all the other amazing rewards that working at AT&T offers. An individual's starting salary within this range may depend on geography, experience, expertise, and education/training.

Joining our team comes with amazing perks and benefits:

Medical/Dental/Vision coverage

401(k) plan

Tuition reimbursement program

Paid Time Off and Holidays (based on date of hire, at least 23 days of vacation each year and 9 company-designated holidays)

Paid Parental Leave

Paid Caregiver Leave

Additional sick leave beyond what state and local law require may be available but is unprotected

Adoption Reimbursement

Disability Benefits (short term and long term)

Life and Accidental Death Insurance

Supplemental benefit programs: critical illness/accident hospital indemnity/group legal

Employee Assistance Programs (EAP)

Extensive employee wellness programs

Employee discounts of up to 50% on eligible AT&T mobility plans and accessories, AT&T internet (and fiber where available) and AT&T phone

A career with us, a global leader in communications and technology, comes with big rewards. As part of our team, you'll lead transformation surrounded by trailblazing industry leaders like you. You'll be empowered to go above and beyond, making a difference through company-sponsored initiatives or by connecting and networking through one of our many employee groups.

And wherever you are in your career, you'll be rewarded by the impact of making a difference in the lives of millions.

With AT&T, you’ll be a part of something greater, do incredible things and be rewarded with a chance to change the world.

AT&T will consider qualified applicants for employment in a manner consistent with the requirements of federal, state, and local laws.

Ready to close the deal on a career with AT&T?

#ConsumerTechnologyeXperience

Weekly Hours:

40

Time Type:

Regular

Location:

Plano, Texas

Salary Range:

$158,200.00 - $237,400.00

It is the policy of AT&T to provide equal employment opportunity (EEO) to all persons regardless of age, color, national origin, citizenship status, physical or mental disability, race, religion, creed, gender, sex, sexual orientation, gender identity and/or expression, genetic information, marital status, status with regard to public assistance, veteran status, or any other characteristic protected by federal, state or local law. In addition, AT&T will provide reasonable accommodations for qualified individuals with disabilities.

Job ID R-34783-1 Date posted 11/06/2024