Note: This post is old and the information in it may not be accurate. Please proceed with caution!
Cloud
Strategy Guide

In part one, I discussed many of the myths around cloud use in government. In this article, I will describe critical strategies to address these myths that every organization should embrace before, during, and after moving to the cloud. These strategies are generally intended for civilian Federal agencies of the United States, but the recommendations below apply to any public sector organization - and even some private organizations as well.

Both guides are available to download as a single PDF

  1. Chapter 1 - Migrate Pragmatically
  2. Chapter 2 - Plan to Your Budget & Staff
  3. Chapter 3 - Embrace New Security Models
  4. Chapter 4 - Understand What You’re Buying
  5. Chapter 5 - Build a Family Farm
  6. Epilogue - Getting More Help

Chapter 1 - Migrate Pragmatically

The first thing to accept is that not all projects are appropriate for the cloud, and not all organizations have the skills necessary to fully take advantage of the cloud. With that as a starting point, an organization needs to come up with a way to rationalize its application portfolio, to determine what should stay on-premises and what should be modernized.  As a general rule, “lift-and-shift” - moving an application without rewriting it for the cloud environment - is almost never cost-effective for Infrastructure as a Service (IaaS) offerings unless it’s already a very modern system in the first place. On the other hand, basic websites with mostly static content are ideal for moving into Software as a Service (SaaS) or Platform as a Service (PaaS) offerings.

The CIO Council’s Application Rationalization Playbook (disclaimer: another document I worked on) is a useful starting point for this evaluation. Specifically, an agency should work up a thorough analysis of alternatives between various SaaS, PaaS, and IaaS offerings against the existing on-prem setup, or a hybrid environment. A major consideration here will be the Total Cost of Ownership (TCO), which should take into account not just service costs, but also staffing, support, and training costs. However, the lowest priced option may not always be the best choice (as I’ll be covering below).

Cloud.gov, is an offering from the General Services Administration (GSA) that bundles several Amazon Web Services (AWS) offerings in a government-friendly “procurement wrapper” can make migration even easier for agencies. It’s an excellent platform for small agencies, or for large agencies that just want to prototype a new concept quickly.

When you do start moving applications, it’s important to start tagging your assets - accounts, virtual machines, workflows, etc. - as early as possible to make accounting easier. Always include the project name and the customer organization at a minimum. Some providers also allow you to easily isolate a project or office’s services into a resource group, and this can also simplify this process. This is very important to allow easy payback or showback of funds, but for these models remember to include in these costs the TCO aspects not captured - e.g. staff time and contractor resources.

I strongly recommend agencies take a very cynical stance on so-called low-code/no-code platforms, customer-relationship management tools (CRMs), and workflow management solutions. Many of you may remember the promises of “Business Intelligence” solutions in decades past, where agencies were fleeced for billions of dollars in configuration costs - these solutions are simply using a new buzzword for the same idea. These all promise to reduce costs but are often vastly more expensive than just building a tool from scratch - and the agency becomes completely locked-in to a single vendor until they replace the application entirely. The brilliant Sean Boots of the Canadian Digital Service has presented a “1-day rule” to help identify these boondoggles.

  Checklist
Rationalize the application portfolio
Don’t lift-and-shift
Use cloud.gov
Properly tag cloud assets
Avoid low-code/no-code/crm snake oil

Chapter 2 - Plan to Your Budget & Staff

The easiest way to avoid risks and unexpected costs is to simplify as much as possible. Civilian agencies should not be investing in bleeding-edge technology solutions - they’re too risky and expensive to maintain. Instead, pick the simplest possible solution that can be supported by your staff. The average agency should be aiming to stay well behind the “hype curve” into the “plateau of productivity.”  Since most of the complexity is hidden from the customer, SaaS and commercial-off-the-shelf (COTS) tools are less risky than PaaS and IaaS options overall (provided you follow the 1-day rule above). This goes beyond just cloud, and applies to most anything you’re building. Most agencies, for instance, also should absolutely not be attempting to build a fancy React/Redux/GraphQL single-page application when a plain Wordpress or Drupal website with a few plugins will fulfill the customer’s needs. Building native mobile applications should be completely avoided by most organizations as these can cost millions of dollars a year just for upkeep - instead they should build mobile-friendly, responsive websites. Any custom application or tool may not be a sustainable solution given the high complexity and cost of engineers. This also means that agencies should be simplifying their requirements to the minimum necessary when comparing alternatives, not just the software itself. Avoiding “one-off” projects and special requests will save massive amounts of time and money.

Instead, agencies must be actively investing in their staff. Agencies should allocate two to three times the standard training budget for IT and technology-adjacent staff, including project managers, program managers, and acquisition professionals. Some vendors provide a limited amount complementary training, but inevitably agencies need more than these free offerings. This training should include non-IT topics as well, including diversity awareness training, accessibility, plain language writing, project management, agile development techniques, and budgeting and procurement. GSA offers a variety of programs covering many of these areas.

This must also include hands-on training - sitting through a webinar is no replacement for actual practical engineering experience. These staff need to be given the time and flexibility to practice these skills to develop them - building small test projects and trying out tools. The best teams are constantly changing and learning, so setting aside up to 10% or more of the staff’s time just for practice is not unreasonable - some private sector companies set aside 20%. All of these investments will pay off richly for agencies. Also, make sure your staff is cross-trained and able to fill gaps as they occur.

As your staff begins to understand the new cloud paradigms, it will be important to modify your existing processes to handle the agility the cloud brings. Instead of slow, end-to-end, waterfall process “monorails”, set operational parameters as “guiderails.” Your acquisition process should be modified so that cloud can be purchased like a utility. You should not need to have a Change Control Board meeting anytime someone wants to create, resize, or destroy a virtual server. Plan a cost range that the entire project will fit within and review as needs change, along with monthly or quarterly portfolio reviews to stay on top of the budget. Instead of codified “gold disk” server images maintained by your team, consider template security rules.

  Checklist
Simplify the requirements and architecture
No mobile apps, avoid single-page webapps
Train and cross-train your staff
Allocate time for personal development
Update processes to set guardrails instead of monorails

Chapter 3 - Embrace New Security Models

Agencies must be able to manage the security of everything they run. Going back to the previous strategy, an agency should not deploy anything it cannot manage, and that goes for security as well. This is equally true in on-premises environments, but new operating models require new security models. Both your operations and security teams will need to be familiar with just about every setting that can be changed in your cloud environment - and how to lock them down to prevent exploitation.

Organizations should no longer assume that a solution is secure just because they did an up-front initial review. The Federal government uses a security review process for services and applications known as the Authorization (or Authority) To Operate (ATO), but the implementation varies from agency to agency. Traditionally this is a series of standard security controls that are reviewed, checklist-style, by an agency once every three years. However, agencies that have excelled at cloud security have moved to Continuous Authorization, using monitoring tools to actively verify that the security controls are being met and maintained, twenty-four hours a day and seven days a week.

However, these monitoring checks still must evolve with the products being monitored to make sure new vulnerabilities have not appeared outside the scope of existing checks. As per usual with cybersecurity, vigilance is key. Since attackers are constantly evolving their methods, tools that automate security responses as well should be used whenever practical - especially built-in, native from the large vendors that are constantly evolving to meet these threats.

To help combat this second issue, the Federal government has been moving away from so-called “castle-and-moat” perimeter-based security methods which only monitor network traffic. Instead, an approach known as Zero Trust has appeared, taking a data-first methodology of protecting systems instead of just the perimeter, verifying user identities in real-time, and allowing staff to only have access to the minimum amount of information necessary to fulfill the task at hand. In this way, when the perimeter is inevitably breached, the data assets contained within are still secure.

It also should go without saying that teams should be using multi-factor authentication on all privileged accounts. Whether developers or administrators, using more than just a username and password will dramatically reduce the risk of exploitation. The Federal government has “PIV cards” that are generally used on most devices, but if the vendor does not support them, implementing a token system via any of the commercially-available platforms is fine: Google Authenticator, 1Password, Microsoft Authenticator, and YubiKey are all worth looking at. However, organizations should completely avoid text-message codes sent to phones, as these are easily intercepted.

For public customers that will need to login or prove their identies, all U.S. government agencies should be using Login.gov.

  Checklist
Research all product configuration settings
Implement continuous monitoring, not just compliance
Use security automation tools
Leverage zero-trust practices to protect your data
Use MFA & Login.gov

Chapter 4 - Understand What You’re Buying

Cloud isn’t going to make your teeth whiter or your breath fresher or fix all of your problems, regardless of what the salespeople tell you. You need to know exactly what you’re buying. Before making an investment, make sure you fully understand what capabilities you’re purchasing and what parts you - and the vendor - will be responsible for.

If your evaluation team does not have technical expertise, bring engineers into the conversation early, to sort the truth from the sales pitch. As discussed in the previous article, you may not be getting autoscaling or load balancing or other features you’ve assumed just happen “automatically” - and if available these features definitely will not be free. You may have to build more “glue” between services than you assume, and someone will have to maintain this connective tissue.

Also keep in mind that the government cloud regions (or “govcloud” by some vendors’ naming) provide different versions of these tools than the commercial ones. As a result, not all features or solutions will be available - so again, plan ahead. Though, in most cases, civilian agencies not dealing with highly-sensitive data should consider using the commercial versions whenever possible - the security differences are not so great as to be insurmountable, but the functionality limitations are huge.

Before implementing a service, do careful research on the service limits - maximum traffic or number of virtual machines or emails that can be sent, etc.. Do not just trust what you are told by a vendor’s engineers or customer representatives - most of the time, they also do not know about these limits until you run aground on them. You should estimate your expected usage - number of site visits and/or users and/or emails, etc., and actually spend the time to search through user forums to make sure no one has hit a limit related to what you’re doing.

Customer Experience (CX) is another area where the private sector has been building people-friendly interfaces into their SaaS solutions, and agencies can skip a lot of the hard work and directly benefit from the results. Metrics and feedback-loops are often built-in as well. Maximizing these built-in elements can radically improve an agency’s public satisfaction scores at little or no additional cost.

  Checklist
Validate assumptions; know your responsibilities
Consider commercial cloud instead of govcloud
Research service limits in advance
Leverage built-in CX tools

Chapter 5 - Build a Family Farm

Given that agency IT budgets continue to be cut, and staffing has not increased in 40 years, agencies are largely unprepared to completely rewrite and replace all of their legacy systems.  Moreover, “IT Modernization” as a concept is an unending pursuit, as in Zeno’s paradox of Achilles chasing the Tortoise, software written today is legacy tomorrow. Agencies will need to use all available funding sources to overcome their deep technical debt, prioritizing those that present the greatest risk: those that are unmaintained, frequently used by customers, and lacking in resilience and redundancy. Under this scrutiny, agencies may find that their public websites are a bigger risk than older backend systems.

Also, rather than replacing entire large monolithic systems, they should pull off pieces and replace them independently as resources are available.  This can be done by isolating functions and building microservices, but that approach can often lead to expensive, unnecessary complexity. Agencies should not be afraid to build a newer parallel monolith adjacent to the existing one - again, keep in mind that it’s not the size that’s the concern, but the complexity and sustainability.

That all being said, the government does have major shortcomings in redundancy today, and too many systems have a single point of failure. At a minimum, agencies should be using cloud for data backup of critical systems whenever possible. I also strongly recommend agencies consider creating load balancing and caching layers in the cloud in front of on-premise public-facing systems to deal with unexpected loads.

One final concern is automation. Many organizations begin their cloud journey with unrealistic goals for maturity. The practice of Infrastructure as Code is incredibly popular at the moment, where we talk about treating virtual servers as “cattle, not pets.” An unprepared agency may immediately think that they need to be using all of the most cutting edge tools and technologies at first, but this would be a critical mistake. Instead, following the principles relating to complexity in the sections above, agencies should aim to create a “family farm” - only automating that which they can realistically manage. For instance, there is absolutely nothing wrong with only using a few virtual machines and load balancers instead of a fully configuration-only architecture. The great thing about cloud is you can evolve as your team grows, but it’s incredibly difficult to reduce complexity you’ve invested in if your team shrinks.

  Checklist
Assess technical debt by risk
Replace monoliths a piece at a time
Don’t over-automate
Use cloud backups and load balancing as soon as possible
Build a small “family farm” to start

Epilogue - Getting More Help

These strategies are a starting point towards a successful cloud rollout. If you run into trouble, want to talk shop with your peers, or would like to share your own strategies and experiences, there are several communities to engage with:

  • The Federal CIO Council Cloud and Infrastructure Community of Practice is the main Federal group for discussing these topics. However, they are currently in the process of changing their charter to allow any U.S. government staff to participate: Federal, state, and local. Membership is free.

  • The ATARC Cloud and Infrastructure Working Group is free and open to any government staff, though private sector companies must pay to be members.

  • Cloud & Coffee (presented by ATARC & MorphWorks) is a biweekly podcast hosted by myself and Chris Oglesby. Each episode, we chat with a guest about their personal experience with technology modernization, and there’s a live Q&A open during the chat. Any ATARC member can participate; old episodes are publicly available on Spotify.