Sr. Site Reliability Engineer - R1011025.01
Remote, TX 75001 US
Connexion’s mission is to provide "best in class" services to job seekers. We strive to achieve excellence in job placement, staffing, and recruiting services while treating candidates with the professionalism and respect they deserve.
Title: Site Reliability/DevOps Engineer
Hiring Organization: Connexion Systems & Engineering
- Duration: Temp to Perm
- Pay rate: $60.00-70.00/hr
- Job Location: Remote (TX)
- Job# 15039
Site Reliability & DevOps Engineer
The Site Reliability & DevOps Engineer is accountable for the availability, reliability, and performance of the services and platforms in a highly transactional 24x7 environment. When error budget consumption is below the threshold (within tolerance limits), the SRE works on application development and bug-fix activities as part of DevOps responsibilities.
Role & Responsibilities:
- Help build a Site Reliability Engineering culture by sharing best practices, approaches, documentation, and code with other engineering teams
- Define and set up KPIs to monitor Error Budgets
- Implement strategies to ensure Error Budgets stay above the defined acceptance levels
- Define and implement response mechanisms when Error Budget thresholds are breached
- Apply automation and software to any tasks or parts of the system that would benefit from it or are performed manually
- Troubleshoot complicated issues involving OS, networking, and databases in a cloud-based SaaS environment; handle live production incidents; and debug/troubleshoot infrastructure and application issues, including development and testing
- Monitor application performance, take steps to improve overall application performance and stability, and follow through with implementation (design, develop, and test)
- Conduct system analysis and configuration management, and develop improvements for system software performance, availability, and reliability
- Design, write, ship, and motivate the creation of software and systems to increase observability, product reliability, and organizational efficiency
- Work closely with software engineers and QAs to ensure the system meets non-functional requirements such as performance, security, and availability
- Document your system knowledge as you acquire it over time, create runbooks, and ensure critical system information is readily available to those who need it
- Maintain and monitor the deployment and orchestration of servers, Docker containers, databases, and general backend infrastructure
- Keep up to date with security practices and proactively identify, diagnose, and solve complex security issues
- Design, develop, and test Java, Spring Boot, and GraphQL-based REST/JSON web services deployed on AWS ECS Fargate
- Design, develop, and test TypeScript and Node.js-based REST/JSON web services deployed on AWS Lambda
- Design, develop, and test AWS AppSync-based GraphQL services
- Design, develop, and test Terraform-based Infrastructure as Code scripts to automate AWS infrastructure setup
Qualifications:
- Bachelor’s Degree in Computer Science or a related field, or an equivalent combination of education and experience
- 10+ yrs experience in full-stack application development & maintenance in a DevOps/SRE role
- 3+ yrs experience using the above-mentioned AWS services for troubleshooting and development activities for platform/application enhancements
- Proficient in scripting languages such as PowerShell and/or Python
- Troubleshooting utilizing built-in browser tools
- Ability to distill technical and complex principles or scenarios for all levels of our organization
- Knowledge of DevOps methodologies and the tools involved, such as CI/CD concepts, CI/CD tools (Jenkins, CodePipeline, etc.), and automation and configuration tools (Puppet, Ansible, etc.), a plus
- Knowledge of public clouds (GCP, AWS, Azure), inclusive of implementing projects on public clouds, a plus
- Ability to self-govern workload and show discipline around priority and time management, even while working remotely or in the absence of direct management for an extended period of time
- Ability and willingness to adapt to new application stacks and new technology concepts as the business evolves over time
- Excellent communication skills, both verbal and written
- Ability to collaborate with local and remote teams in different time zones
- Ability to present/lead technical discussions
1. SRE practice setup, including standards, guidelines, metrics, etc.
2. Solution design
3. Issue resolution, including documentation, code development, testing, and deployment
Solutions and code are to be reviewed and signed off by the Product Owner and App Engineer.
Please use the apply button to submit your resume for consideration. A Connexion Representative will contact you immediately.
When responding to this job posting, you MUST include the Job# and Job Title in your subject line.
If you are active in a job search but this job is not for you, please reach out to . We would be glad to help you find the perfect job!