Introduction
Observing and tracking the correct functioning of workloads running both in the cloud and on-premises can be a challenge. The scale, distribution and diversity of systems add complexity to day-to-day operations. Common tasks such as logging, compliance checking, troubleshooting, patching and upgrades can become time-consuming and tedious, particularly when conducted manually.
AWS Systems Manager (formerly Simple Systems Manager, or SSM) is a remedy for this common hybrid infrastructure problem. Systems Manager is an AWS service that allows for the automated monitoring and control of a wide variety of supported AWS and local infrastructure instances. Accessible via a central console, it provides a variety of tools for operations, application, change control and node management.

This article is a beginner’s guide to AWS Systems Manager, where we will explore its capabilities, provide guidance on getting started and give a high-level SSM process flow. Then we will describe the installation and management of the SSM agent, discuss automation and provide a practical troubleshooting scenario, highlighting the power and utility of AWS Systems Manager. To conclude, we will look at AWS Systems Manager’s limitations and alternatives.

Capabilities
AWS Systems Manager contains several tools, which are organized into the five ‘Capability Categories’ shown below. These tools allow us to perform operational tasks swiftly against various resource objects. Resources such as EC2 instances, Amazon S3 buckets and even on-premises servers can be associated with resource tags. These tags define membership of a Resource Group, against which we can then view operational and troubleshooting data.

AWS Systems Manager Features – Figure 1.1

The following table highlights the elements and features of AWS Systems Manager, placed into their Capability Categories.
Getting Started
From a high level, using Systems Manager breaks down into three stages: Group, Visualize, and Action.

Three steps to using AWS Systems Manager – Figure 1.2

1. Group: We can create logical groupings of AWS resources. Grouping is a foundational precursor to performing operations (such as compliance management, patching, and automation) on AWS resources.

Create Resource Group (AWS Management Console) – Figure 1.3 (source)

2. Insights: AWS Systems Manager automatically displays aggregated operational data for each resource group via a dashboard. You can also integrate CloudWatch Dashboards, AWS Config rules, AWS CloudTrail, and AWS Personal Health Dashboard (PHD).

Inventory Insights (AWS Management Console) – Figure 1.4 (source)

3. Actions on Insights: You can act upon insights, or perform administrative actions on the resource groups that were defined in step 1, via the central console.

AWS Systems Manager Process Flow
Now that we have covered the basics of AWS Systems Manager, let’s take a deeper look at the AWS Systems Manager Process Flow: the general set of steps taken to access and use AWS Systems Manager, allowing us to perform actions on AWS EC2 instances, edge devices, and virtual machines. Each Systems Manager capability conveniently follows a similar process, regardless of which one is selected.

AWS Systems Manager Process Flow – Figure 1.5

1. Access AWS Systems Manager: AWS provides three ways to access Systems Manager: the AWS Management Console, the AWS Command Line Interface or the AWS SDK. For example, open the AWS Management Console and type “AWS Systems Manager” into the search bar, as shown in the figure below.

Access AWS Systems Manager (AWS Management Console) – Figure 1.6

2. Select Capabilities: Systems Manager provides a variety of capabilities (see Figure 1.1). Each capability serves a different purpose. For example, you could select Patch Manager to apply patches to a fleet of nodes.
3. Processing: This consists of two stages.
AWS Systems Manager verifies user permissions, and then the SSM agent (discussed later) performs the relevant actions on the selected resource.
4. Reporting: After making any configuration changes, the SSM agent reports the status of the resource to Systems Manager or other configured AWS services.

SSM Agent
SSM Agent (AWS Systems Manager Agent) is a lightweight software agent that allows AWS Systems Manager to update, configure and manage the resource that it is installed on. The concept is similar to the OpsRamp Agent, which can deliver analytics for hybrid asset inventory, incident remediation and OS patching. Many AMIs (Amazon Machine Images), such as Amazon Linux and Windows Server 2019, have the agent preinstalled. Manual installation is possible for images that do not. In the following section we will see how to do this.

Installing SSM Agent on Ubuntu Server
To manually install the SSM Agent on a Linux OS, you can use Debian packages or Snap packages.

Installation with Snap Package:
1. SSH into your EC2 instance with the associated .pem file.
2. Run the following command.
SSM Agent installation Command – Figure 1.7

The output will look like this:

The Output of SSM Agent installation command – Figure 1.8

3. Check the status of the SSM agent with the command below:
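Agent health can also be confirmed from the Systems Manager side, without logging in to the instance. As a minimal Python sketch, assuming the boto3 SDK, configured AWS credentials and a placeholder instance ID, the function below reads the node's PingStatus, which Systems Manager reports as Online while the agent is checking in. It complements the on-instance status command rather than replacing it.

```python
def agent_ping_status(ssm_client, instance_id):
    """Return the SSM Agent PingStatus for one node, or None if the node
    is not registered with Systems Manager.

    ssm_client is expected to be a boto3 SSM client (an assumption of this
    sketch); instance_id is whatever ID the node is registered under.
    """
    response = ssm_client.describe_instance_information(
        Filters=[{"Key": "InstanceIds", "Values": [instance_id]}]
    )
    nodes = response.get("InstanceInformationList", [])
    return nodes[0]["PingStatus"] if nodes else None


# Usage (needs the boto3 package and AWS credentials; the ID is a placeholder):
#   import boto3
#   print(agent_ping_status(boto3.client("ssm"), "i-0123456789abcdef0"))
```

A result of None usually means the agent has never registered, which points to installation, IAM role or network problems rather than a stopped service.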
Check Status command – Figure 1.9

Active Status of the SSM Agent – Figure 1.10

SSM Agent Logs
The SSM agent writes detailed information about state, execution and error status to local log files. These can be examined directly on the resource. We can also send log files to Amazon CloudWatch Logs to aggregate and monitor them in greater detail.

Sending SSM Agent logs to CloudWatch Logs
You can follow this step-by-step procedure to configure SSM Agent log forwarding. AWS Systems Manager also publishes metrics to CloudWatch about the status of Run Command executions, covering whether a command succeeded, failed or timed out on delivery. Additionally, you can configure alarms that fire if a status of ‘success’ is not reported for any specified SSM Command document. Run the following command to view the metrics using the AWS CLI:
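The same metrics can be listed programmatically. The sketch below is a Python illustration rather than the CLI command from the figure; it assumes boto3 and that Run Command metrics are published under the CloudWatch namespace AWS/SSM-RunCommand. It follows NextToken pagination and returns the distinct metric names it finds.

```python
def run_command_metric_names(cloudwatch_client):
    """List the distinct Run Command metric names visible in CloudWatch.

    cloudwatch_client is expected to be a boto3 CloudWatch client.
    The namespace below is an assumption stated in the surrounding text.
    """
    names = set()
    kwargs = {"Namespace": "AWS/SSM-RunCommand"}
    while True:
        page = cloudwatch_client.list_metrics(**kwargs)
        names.update(metric["MetricName"] for metric in page.get("Metrics", []))
        token = page.get("NextToken")
        if not token:
            return sorted(names)
        # More results are available: request the next page.
        kwargs["NextToken"] = token


# Usage (needs the boto3 package and AWS credentials):
#   import boto3
#   print(run_command_metric_names(boto3.client("cloudwatch")))
```

An empty result typically means no commands have run recently in the region, since CloudWatch only lists metrics that have received data.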
Metrics using AWS CLI – Figure 1.11

Further information about Run Command metrics can be found here.

Now that we have covered the basics of AWS Systems Manager and the SSM agent, it is time to look at a more practical example.

Automation with Systems Manager
There are several remediation, maintenance, and deployment tasks common to AWS services, including Amazon Elastic Compute Cloud (Amazon EC2), Amazon Relational Database Service (Amazon RDS), and others. Using Automation (a capability of AWS Systems Manager), we can simplify the deployment and management of AWS resources to achieve operational efficiency and minimize the errors often associated with manual intervention. AWS offers an “automation toolkit” called “Runbooks.” Runbooks are documents that define routine maintenance activities or tasks that respond to events. AWS provides pre-defined runbooks for immediate use, and we can also define our own runbooks to meet any specific need that we may have. The following screenshot shows a sample runbook, written in YAML, that creates an AMI of an instance:
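To make the structure of such a runbook concrete, here is a minimal Python sketch that builds a comparable document as JSON, which Automation accepts alongside YAML. It is not the runbook from the figure: the step name, description and image-name pattern are illustrative choices, and the sketch assumes the aws:createImage action with InstanceId, ImageName and NoReboot inputs.

```python
import json


def build_create_image_runbook():
    """Return an Automation runbook (schema 0.3) that images one instance."""
    return {
        "schemaVersion": "0.3",
        "description": "Create an AMI from an EC2 instance (illustrative sketch).",
        "parameters": {
            "InstanceId": {
                "type": "String",
                "description": "ID of the instance to create the image from.",
            }
        },
        "mainSteps": [
            {
                "name": "createImage",  # step name chosen for this example
                "action": "aws:createImage",
                "inputs": {
                    "InstanceId": "{{ InstanceId }}",
                    # Image names must be unique; the timestamp variable helps.
                    "ImageName": "ami-from-{{ InstanceId }}-{{ global:DATE_TIME }}",
                    "NoReboot": True,
                },
            }
        ],
        "outputs": ["createImage.ImageId"],
    }


def register_runbook(ssm_client, name, runbook):
    """Upload the runbook through a boto3 SSM client (assumed)."""
    return ssm_client.create_document(
        Content=json.dumps(runbook),
        Name=name,
        DocumentType="Automation",
        DocumentFormat="JSON",
    )


if __name__ == "__main__":
    # Print the document so it can be reviewed before anything is uploaded.
    print(json.dumps(build_create_image_runbook(), indent=2))
```

Building the document as a dictionary keeps it easy to validate or template before registration; setting NoReboot to True avoids downtime at the cost of weaker file-system consistency in the image.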
Sample Runbook to create Amazon Machine Image (AMI) of an instance – Figure 1.12 (source)

Troubleshooting Scenario
Occasionally, we may not be able to connect to an AWS Windows (or other) instance. The underlying problem could be a system misconfiguration. But how do we check and change the configuration if we cannot connect to the instance? This is the basis of our scenario.

Problem with a Windows Instance – Figure 1.13 (source)

From the above image, we can see that the Windows EC2 instance is not passing all of its health checks. As a result, we cannot access and investigate the Windows machine directly. Potential causes of this connectivity issue could be:
To aid the troubleshooting process, we could perform a number of proactive and reactive tasks, such as:
However, these investigative steps are manual and require a good understanding of the underlying operating system. They may even carry a degree of risk. For example, the Windows Registry contains many important system parameters, and a wrong action, such as the accidental deletion of a key, could significantly impact the system. Instead, we will make use of AWS Systems Manager.

High-level Solution
AWSSupport-ExecuteEC2Rescue can be used to detect and resolve issues with a few mouse clicks. AWSSupport-ExecuteEC2Rescue is an AWS Systems Manager automation document composed of sequential steps that remediate standard Windows issues. These steps are described and numbered below.

Automated solution with Systems Manager – Figure 1.14 (source)
Low-level Step-by-Step Solution
If you are new to the AWS CLI (Command Line Interface), the AWS CLI guide is available here and is an invaluable reference.

1. As the Windows server is not passing any health checks and is not accessible, we will run EC2Rescue via the CLI. To run this document, you will need to pass the following:
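For readers scripting this step in Python rather than the CLI, starting the runbook might be sketched as follows. This assumes boto3, and that the runbook takes the unreachable instance's ID as UnreachableInstanceId with LogDestination as an optional S3 bucket for collected logs; both names are worth confirming against the runbook's parameter list in the console. The instance ID and bucket name in the usage comment are placeholders.

```python
def start_ec2rescue(ssm_client, unreachable_instance_id, log_bucket=None):
    """Start AWSSupport-ExecuteEC2Rescue and return the execution ID.

    ssm_client is expected to be a boto3 SSM client. Automation parameters
    are passed as a mapping of name -> list of string values.
    """
    parameters = {"UnreachableInstanceId": [unreachable_instance_id]}
    if log_bucket:
        # Optional: S3 bucket for troubleshooting logs (assumed parameter).
        parameters["LogDestination"] = [log_bucket]
    response = ssm_client.start_automation_execution(
        DocumentName="AWSSupport-ExecuteEC2Rescue",
        Parameters=parameters,
    )
    return response["AutomationExecutionId"]


# Usage (needs the boto3 package and AWS credentials; values are placeholders):
#   import boto3
#   execution_id = start_ec2rescue(
#       boto3.client("ssm"), "i-0123456789abcdef0", log_bucket="example-log-bucket"
#   )
```

The returned execution ID is the handle used to monitor the run and inspect its results afterwards.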
Executing EC2Rescue Command – Figure 1.15

2. Next, we look at the results. If you wish to monitor the execution of the command, you can use the command below with the Automation Execution ID.
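Monitoring usually means polling the execution until it reaches a final state. The Python sketch below shows one way to do that, assuming a boto3 SSM client; the set of terminal statuses is a simplification chosen for this example and is not exhaustive, and the polling interval and limit are arbitrary defaults.

```python
import time

# Statuses treated as final in this sketch (assumed, not exhaustive).
TERMINAL_STATUSES = {"Success", "Failed", "TimedOut", "Cancelled"}


def wait_for_automation(ssm_client, execution_id, poll_seconds=15,
                        max_polls=120, sleep=time.sleep):
    """Poll an Automation execution until it finishes; return its status."""
    for _ in range(max_polls):
        execution = ssm_client.get_automation_execution(
            AutomationExecutionId=execution_id
        )["AutomationExecution"]
        status = execution["AutomationExecutionStatus"]
        if status in TERMINAL_STATUSES:
            return status
        sleep(poll_seconds)  # still running: wait before asking again
    raise TimeoutError(
        f"Execution {execution_id} did not finish after {max_polls} polls"
    )


# Usage (needs the boto3 package and AWS credentials):
#   import boto3
#   print(wait_for_automation(boto3.client("ssm"), execution_id))
```

The sleep function is injectable so the loop can be exercised without real delays, for example in unit tests.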
Monitoring the progress of EC2Rescue – Figure 1.16

3. After executing the command, we can see from the AWS Console screenshot below that the Windows instance is back online and passing its health checks.

Status of Windows instance – Figure 1.17 (source)

4. To check where the original problem occurred, run the following command:
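In Python, the equivalent inspection could look like the sketch below. It assumes a boto3 SSM client and reads the per-step status list and the final outputs from the execution record; the step names and output keys returned depend on the runbook, so nothing here presumes what EC2Rescue reports.

```python
def summarize_execution(ssm_client, execution_id):
    """Return ([(step name, step status), ...], outputs) for an execution."""
    execution = ssm_client.get_automation_execution(
        AutomationExecutionId=execution_id
    )["AutomationExecution"]
    steps = [
        (step["StepName"], step["StepStatus"])
        for step in execution.get("StepExecutions", [])
    ]
    # Outputs maps output names to lists of strings; it may be empty.
    return steps, execution.get("Outputs", {})


# Usage (needs the boto3 package and AWS credentials):
#   import boto3
#   steps, outputs = summarize_execution(boto3.client("ssm"), execution_id)
#   for name, status in steps:
#       print(f"{name}: {status}")
```

Reading the steps in order shows which stage of the runbook did the remediation work, and the outputs carry whatever findings the runbook chose to publish.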
Output the execution information – Figure 1.18

Final Result
EC2Rescue has discovered that:
Suggestions by EC2Rescue – Figure 1.19

Limitations
AWS Systems Manager is a useful and powerful tool that allows organizations to operate complex infrastructure at scale, both safely and securely. Despite this, there are still a number of disadvantages.
Final Thoughts
We have discussed the significance of AWS Systems Manager and have used one of its key capabilities: Automation. Although AWS Systems Manager is an excellent tool, offering strong tracking and remediation functions, it also comes with several notable limitations. For a one-stop solution with remediative AI and greater cloud integration, OpsRamp is a viable alternative. It also comes with a free demo.

What are AWS Systems Manager documents?
An AWS Systems Manager document (SSM document) defines the actions that Systems Manager performs on your managed instances. Systems Manager includes more than 100 pre-configured documents that you can use by specifying parameters at runtime.
Which AWS Systems Manager feature can connect to an instance directly from the AWS Management Console?
Fleet Manager lets you connect to your Windows servers over RDP through a few simple steps in its console, providing access to your server or server-based application. You don't need to install additional software, set up additional servers, or open direct inbound access to ports on the instance.
What is AWS Systems Manager Session Manager?
AWS Systems Manager Session Manager is an interactive shell and CLI that helps to provide secure, access-controlled, and audited management of Windows and Linux EC2 instances. Session Manager removes the need to open inbound ports, manage SSH keys, or use bastion hosts.
Which AWS Systems Manager feature enables access to EC2 instances without opening inbound ports in the EC2 security groups?
Session Manager. It is a fully managed Systems Manager capability that allows clients (for example, AWS Cloud9) to connect to an EC2 instance without the need to open inbound ports.