Cloud sprawl happens when development teams spin up new cloud resources, forget about them, then move on to the next urgent task.
Migrating to the cloud offers federal agencies huge advantages in performance and flexibility. Government services can’t effectively scale or adopt new capabilities like big data analytics, artificial intelligence, machine learning and internet of things without migrating to the cloud. But government cloud adoption has empowered an old IT nemesis: shadow IT.
Shadow IT is the use of IT systems, devices, software, apps and services outside the supervision of an organization’s approved IT systems. In the past, shadow IT typically meant a business unit creating its own locally developed applications, or LDAs, because engaging with the office of the chief information officer was judged too onerous. During my time in public service, I saw personnel surreptitiously use Microsoft Access to address an urgent data processing need, and the result inadvertently turned into a mission-critical system. This was discovered only when Microsoft Access reached its scaling limits, which triggered an emergency project to transform the database into a web-based application.
Building LDAs is even easier when using cloud services. This opportunity for shadow IT is exacerbated by government mandates to move to the cloud prior to the development of a governance structure that can monitor and manage such a move. Combine all this with the very human tendency of development teams to experiment with creating cloud resources and not clean up after themselves, and the result is more shadow IT and cloud sprawl.
Cloud sprawl is inefficient use of the cloud: over-provisioned, over-scheduled, underutilized or orphaned cloud assets. It often happens when development teams spin up new cloud resources, forget about them, then move on to the next urgent task. Even when cloud servers are terminated, the servers’ storage volumes—in a sense virtual hard drives—are often left behind. This creates orphaned cloud resources.
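To make the idea of orphaned storage concrete, here is a minimal sketch in Python of how an inventory report might flag volumes that no longer belong to any server. The `Volume` record and its field names are hypothetical, invented for illustration; a real report would read this data from the cloud provider's inventory.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Volume:
    """Hypothetical inventory record for one storage volume."""
    volume_id: str
    size_gb: int
    attached_to: Optional[str]  # server ID, or None if nothing uses the volume


def find_orphans(volumes: List[Volume]) -> List[Volume]:
    """Return volumes that are not attached to any server."""
    return [v for v in volumes if v.attached_to is None]


def orphaned_gb(volumes: List[Volume]) -> int:
    """Total storage, in GB, held by orphaned volumes."""
    return sum(v.size_gb for v in find_orphans(volumes))


# Illustrative inventory: one volume was left behind after its server was terminated.
inventory = [
    Volume("vol-a", 100, "server-1"),
    Volume("vol-b", 500, None),
    Volume("vol-c", 50, "server-2"),
]
print([v.volume_id for v in find_orphans(inventory)], orphaned_gb(inventory))
```

Even a simple report like this gives a hosting team a list of resources to question before the next bill arrives.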
Teams also size cloud resources too large based upon the legacy technical specifications coming from on-prem data centers, instead of starting small and using cloud elasticity for auto-scaling. This results in over-provisioned and underutilized resources. This cloud sprawl increases costs and often leads to overruns in government program budgets.
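As a rough illustration of how over-provisioning can be spotted, the sketch below flags servers whose average CPU utilization falls under a threshold. The 20 percent threshold, the function name and the sample data are all assumptions chosen for the example, not a recommended standard.

```python
from statistics import mean
from typing import Dict, List

# Hypothetical threshold: flag servers that average under 20% CPU.
UNDERUTILIZED_CPU_PERCENT = 20.0


def underutilized(
    cpu_samples_by_server: Dict[str, List[float]],
    threshold: float = UNDERUTILIZED_CPU_PERCENT,
) -> List[str]:
    """Return IDs of servers whose average CPU utilization is below threshold."""
    return sorted(
        server_id
        for server_id, samples in cpu_samples_by_server.items()
        if samples and mean(samples) < threshold  # skip servers with no data
    )


# Illustrative utilization samples, in percent.
samples = {
    "legacy-sized-app": [4.0, 6.5, 5.2, 3.9],
    "busy-api": [62.0, 71.3, 58.8, 66.1],
}
print(underutilized(samples))
```

Servers on this list are candidates for a smaller size combined with auto-scaling, rather than automatic targets for shutdown.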
Cloud sprawl and the related lack of governance can also make agencies more vulnerable to data breaches. When development teams create cloud resources, they may not fully understand the impact of the related configurations, as was the case in the 2019 Capital One data breach, in which a misconfiguration enabled access to sensitive records stored in Amazon Web Services S3 buckets. To mitigate the risk introduced by misconfigured cloud resources, agencies need to define cloud usage standards and implement ways to monitor compliance with those standards.
Effective implementation of AIOps is the answer to modern-day shadow IT and cloud sprawl. Here’s the Gartner definition: “AIOps combines big data and machine learning to automate IT operations processes, including event correlation, anomaly detection and causality determination.”
One cloud-centric AIOps solution is robotic cloud automation, or RCA, a suite of AIOps capabilities that establishes governance guardrails and enforces usage standards across multiple cloud environments. For critical standards compliance issues, it can also remediate the non-compliance findings by bringing cloud resources back into the desired state configuration. This delivers significant cost savings and security improvements through automated monitoring, reporting and remediation of compliance issues.
For all enterprise cloud hosting teams, the first step to regaining control is to define your standards. Rather than inventing their own, agencies should adopt established industry standards. RCA is aligned with some of the most widely respected standards in the industry, including the Center for Internet Security (CIS) Benchmarks, NIST SP 800-53 and AWS Foundational Security Best Practices. These provide a baseline to start from, including hundreds of configuration guidelines to safeguard cloud environments against today’s evolving cyber threats.
As mentioned above, for many agencies the genie is already out of the bottle. Cloud adoption preceded a management structure, and teams have already created cloud sprawl and violated security best practices. In such cases, RCA deployment follows a predictable, iterative pattern. The first step is to enable monitoring and reporting to understand the depth and breadth of the compliance challenges. Agencies then need a communication and change management strategy that engages cloud users, persuades them to adopt the new cloud standards and drives compliance up with each iteration.
Once an environment is fully compliant with a standard, RCA can enable automated remediation, which locks in future compliance by maintaining the desired state configuration of cloud resources in perpetuity. For example, for every new server spun up in the cloud, RCA evaluates compliance with three core configurations: proper tagging, encryption and standardized security group usage. If the server fails any of these tests, it is automatically terminated. Cloud sprawl is nipped in the bud. It’s truly governance as code.
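The three-part server check described above can be sketched in a few lines of Python. This is not RCA's actual implementation; the required tags, approved security group names and the `Server` record are hypothetical values chosen for illustration, and the sketch only returns a decision rather than calling any cloud API.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set

# Hypothetical policy values; each agency would define its own.
REQUIRED_TAGS = {"owner", "project", "environment"}
APPROVED_SECURITY_GROUPS = {"sg-web-standard", "sg-app-standard"}


@dataclass
class Server:
    """Hypothetical description of a newly created cloud server."""
    server_id: str
    tags: Dict[str, str] = field(default_factory=dict)
    storage_encrypted: bool = False
    security_groups: Set[str] = field(default_factory=set)


def compliance_failures(server: Server) -> List[str]:
    """Evaluate the three core checks and return a list of failures."""
    failures = []
    missing = REQUIRED_TAGS - set(server.tags)
    if missing:
        failures.append("missing tags: " + ", ".join(sorted(missing)))
    if not server.storage_encrypted:
        failures.append("storage is not encrypted")
    # A server with no groups, or with any unapproved group, fails the check.
    if not server.security_groups or server.security_groups - APPROVED_SECURITY_GROUPS:
        failures.append("non-standard security groups")
    return failures


def decide(server: Server) -> str:
    """Return 'terminate' if any check fails, otherwise 'keep'."""
    return "terminate" if compliance_failures(server) else "keep"


good = Server(
    "srv-1",
    tags={"owner": "team-a", "project": "portal", "environment": "dev"},
    storage_encrypted=True,
    security_groups={"sg-web-standard"},
)
bad = Server("srv-2", tags={"owner": "team-b"}, security_groups={"sg-open-to-world"})
print(decide(good), decide(bad), compliance_failures(bad))
```

Keeping the decision separate from the action makes it possible to run the same checks in report-only mode first, then switch on termination once teams have had time to comply.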
RCA is a powerful enforcement tool for any CIO managing a multitenant cloud environment. Yet critically, it’s not enforcement in the old, top-down model. RCA provides AIOps that enable teams to own more of the security responsibility because a cloud hygiene baseline is “baked” into the system. Agencies can save millions by embracing AIOps, shutting down existing cloud sprawl and preventing it from happening again.
Gone are the days when one central IT team could support 20, 40, 100 separate development groups. It simply isn’t possible due to the complexity of cloud service offerings, even if government agencies had the budget and the talent pool to attempt it.
I do understand the lingering appeal of the “do it ourselves” approach. I remember wondering 10 years ago if government could truly trust the big cloud service providers to support agency infrastructure and mission. That question has been definitively answered: yes. The cloud provides agencies with incredible capabilities we couldn’t imagine a decade ago. For example, the cloud service providers, or CSPs, have perfected automated database failover in their managed database products, which now deliver reliable and consistent failover in minutes.
Long gone are the days of engineering database synchronization and manual failovers. Now RCA enables AIOps for government to eliminate shadow IT and cloud sprawl and to securely explore the potential of the cloud.