How is Salesforce CI/CD different to any other CI/CD?

Short answer

There are several key reasons CI/CD for Salesforce is different:

  • Versioning metadata is different to versioning code
  • Git is not the best way to version your metadata
  • You need more than just git diff a.xml b.xml
  • Tracking changes in environments is a manual process
  • Existing tools create considerable waste in your workflow

Code vs Metadata

Here is an example of Python code:

# Source: https://bit.ly/3aJOcMI
from typing import Tuple

# Extended Euclid
def extended_euclid(a: int, b: int) -> Tuple[int, int]:
    '''
    >>> extended_euclid(10, 6)
    (-1, 2)
    >>> extended_euclid(7, 5)
    (-2, 3)
    '''
    if b == 0:
        return (1, 0)
    (x, y) = extended_euclid(b, a % b)
    k = a // b
    return (y, x - k * y)

Now, here is an example of Salesforce metadata (it contains the definition for two fields):

<?xml version="1.0" encoding="UTF-8"?>
<CustomObject xmlns="http://soap.sforce.com/2006/04/metadata">
    <fields>
        <fullName>MyCustomAccountField__c</fullName>
        <description>A custom field on the Account standard object.</description>
        <externalId>false</externalId>
        <inlineHelpText>Some help text.</inlineHelpText>
        <label>MyCustomAccountField</label>
        <length>100</length>
        <required>false</required>
        <trackFeedHistory>false</trackFeedHistory>
        <trackHistory>false</trackHistory>
        <type>Text</type>
        <unique>false</unique>
    </fields>
    <fields>
        <fullName>Phone</fullName>
        <trackFeedHistory>false</trackFeedHistory>
        <trackHistory>true</trackHistory>
    </fields>
</CustomObject>

The key difference between code and metadata is that code is a human creation. A piece of code is an expression of thought that may have any of the following elements as a point of discussion:

  • Abstractions and objects
  • Functions
  • Variables
  • Logic
  • Conventions and style
  • Quality
  • Security
  • Etc.

Metadata, though, is machine-generated. You likely won't be having amazing discussions with your team about a chunk of metadata. This is important because platforms like GitHub or GitLab, which are at the heart of any CI/CD workflow, are geared towards collaboration on code. If your CI/CD workflow primarily processes metadata, most of those collaboration features go unused.

In Salesforce, you almost never have to write code. When you need to change how your company's Salesforce org looks or what it does, most of the time you can make the changes in the browser, using the point-and-click UI built into Salesforce.

When you make changes in Salesforce, Salesforce stores your configuration in large XML (or JSON) files. Manually adding these files to version control and pushing your commits to a Git repository is a substantial amount of needless, time-consuming work. Why? Metadata is not something you would want to store on a hosting platform like GitLab, GitHub or Bitbucket. These platforms are all geared towards working with code written by humans, not machine-generated metadata.

Storing metadata instead of code also overlooks the core functionality and great features of these tools. If you won't be using the core functionality provided by these platforms, then why use them at all? Why not just store all your metadata in S3 or Dropbox, etc.?

Manual change tracking

Changes made to Salesforce environments are tracked manually. To deploy your changes from a Sandbox environment to a Test or Production environment, you need to:

  • Remember all previous changes
  • Be aware of all dependencies
  • Know how package.xml works
  • Know all 300 Metadata types that go into the package.xml
  • Work with Ant or SFDX to retrieve the changes (retrieving changes could take several minutes depending on the size of your changes)
  • Know how to interpret Salesforce API errors
  • Debug Salesforce API errors
  • Manually cherry-pick the XML lines you want to commit

If you take into account the number of times per day you have to repeat all the steps, you’ll realise how much time is wasted.

No isolation of work

So, the workflow goes something like this:

  1. Make changes in the Salesforce environment using an internet browser (cloud)
  2. Download the changes as metadata to your laptop (laptop)
  3. Add the changes to Version Control (laptop)
  4. Push the changes to the remote repository (laptop)
  5. Create a pull request (cloud)
  6. Run all tests and review the changes (cloud)
  7. Merge the pull request (cloud)
  8. Deploy your changes to a Test environment or Production (cloud)

If you are sharing a sandbox environment with your team, it's likely that several people will all be working on the same type of metadata. Individual work is not isolated. In practice, this means your changes will be interleaved with your team's changes when you download them. Here are several possible scenarios originating from a lack of work isolation:

  1. Someone from your team overwrites your changes without your knowledge.
  2. Extending the previous scenario: because you are unaware of the change and believe your version is the most recent, you deploy it to Production. Not only does this lead to rework, but in the worst case the change contains a bug that is now in Production.
  3. Someone from your team deletes one of the dependencies you're relying on. Then, when you try to deploy your work, strange errors arise that you don’t understand.
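
The scenarios above can be sketched as a three-way comparison. This is only an illustration under simplifying assumptions: each component is reduced to a name and a string value, and review_download, the snapshot dictionaries and the report keys are all hypothetical names, not part of any Salesforce tool.

```python
def review_download(base: dict[str, str],
                    mine: dict[str, str],
                    downloaded: dict[str, str]) -> dict[str, list[str]]:
    """Compare what the shared sandbox returned with what I expected.

    base:       component values when I started working
    mine:       component values as I left them
    downloaded: component values retrieved from the shared sandbox
    """
    report = {"overwritten": [], "missing": [], "foreign": []}
    for name in sorted(base.keys() | mine.keys() | downloaded.keys()):
        b, m, d = base.get(name), mine.get(name), downloaded.get(name)
        if d == m:
            continue  # the sandbox matches my expectation
        if d is None:
            report["missing"].append(name)       # deleted by someone else
        elif m != b:
            report["overwritten"].append(name)   # my change was replaced
        else:
            report["foreign"].append(name)       # a teammate's change I did not make
    return report


base = {
    "Account.Phone": "trackHistory=false",
    "Account.Region__c": "length=50",
    "Account.Tier__c": "type=Text",
}
mine = {**base, "Account.Region__c": "length=100"}  # my only change
downloaded = {
    "Account.Phone": "trackHistory=true",   # teammate's change
    "Account.Region__c": "length=80",       # teammate overwrote mine
}                                           # Account.Tier__c was deleted

report = review_download(base, mine, downloaded)
print(report)
```

Note that this check only works if you kept the base and mine snapshots yourself; a shared sandbox gives you only the downloaded state.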

You are probably thinking that Scratch Orgs are the solution, but they have their own problems too: for example, time-consuming setup, volatile changes, SFDX, etc.

Diff Hell

To see the difference between your last committed version and the latest version of the metadata, a simple git diff a.xml b.xml command won't do. Here is why:

  • Your changes are not isolated, but are interleaved with your team's changes, making the diff result unreadable and collaboration difficult.
  • You’d have to remember all the changes you made, including all dependencies, when you're downloading the changes to your laptop. If you forget to include something in your package.xml, and later try to deploy your metadata, you'll get a deployment error!
  • None of the diff algorithms available in Git (Myers, Minimal, Patience or Histogram) can accurately compare XML documents, because XML documents require specialised algorithms that can handle trees or graphs. As a result, the difference between versions generated by Git is difficult to understand and can lead to subtle bugs that cause weird deployment issues.
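
As a rough illustration of the last point, the sketch below compares two versions of a small metadata file twice: once line by line (using Python's difflib as a stand-in for a line-based diff) and once by field. It assumes that the order of the fields elements carries no meaning; fields_by_name and semantic_diff are hypothetical helpers, not an existing tool.

```python
import difflib
import xml.etree.ElementTree as ET

NS = {"md": "http://soap.sforce.com/2006/04/metadata"}

OLD = """<CustomObject xmlns="http://soap.sforce.com/2006/04/metadata">
    <fields>
        <fullName>MyCustomAccountField__c</fullName>
        <length>100</length>
    </fields>
    <fields>
        <fullName>Phone</fullName>
        <trackHistory>true</trackHistory>
    </fields>
</CustomObject>"""

# Same content, but the fields were emitted in a different order
# and one length changed from 100 to 255.
NEW = """<CustomObject xmlns="http://soap.sforce.com/2006/04/metadata">
    <fields>
        <fullName>Phone</fullName>
        <trackHistory>true</trackHistory>
    </fields>
    <fields>
        <fullName>MyCustomAccountField__c</fullName>
        <length>255</length>
    </fields>
</CustomObject>"""


def fields_by_name(xml_text: str) -> dict[str, dict[str, str]]:
    """Index each <fields> element by its <fullName>."""
    result = {}
    for field in ET.fromstring(xml_text).findall("md:fields", NS):
        props = {child.tag.split("}")[1]: (child.text or "") for child in field}
        result[props.pop("fullName")] = props
    return result


def semantic_diff(old_xml: str, new_xml: str) -> list[str]:
    """Report per-field property changes, ignoring element order."""
    old, new = fields_by_name(old_xml), fields_by_name(new_xml)
    changes = []
    for name in sorted(old.keys() | new.keys()):
        if name not in new:
            changes.append(f"removed {name}")
        elif name not in old:
            changes.append(f"added {name}")
        else:
            for prop in sorted(old[name].keys() | new[name].keys()):
                before, after = old[name].get(prop), new[name].get(prop)
                if before != after:
                    changes.append(f"{name}.{prop}: {before} -> {after}")
    return changes


# Keep only added/removed lines, dropping the file and hunk headers.
line_diff = [
    line
    for line in difflib.unified_diff(OLD.splitlines(), NEW.splitlines(), lineterm="", n=0)
    if line.startswith(("+", "-")) and not line.startswith(("+++", "---"))
]
semantic = semantic_diff(OLD, NEW)

print(f"line-based diff: {len(line_diff)} changed lines")
print(f"field-based diff: {semantic}")
```

The line-based result reports the reordering as a block of removed and added lines, while the field-based comparison reports the single property that actually changed.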

Remember, you might have to generate a diff several times a day, so the cognitive load and the error rate could easily multiply.

Version Control System (VCS)

A VCS is an integral part of any CI/CD workflow. In fact, a CI/CD workflow starts with Version Control: it's the entry point into CI/CD. So, let's examine whether the existing VCS options are a good fit for Salesforce.

When using Google Docs, you don't really think about Version Control and how the document history is created. Everything is done by Google, in the cloud. This is a great feature, as it helps users focus on core tasks and stay productive. Imagine a scenario where Google Docs didn't have Version Control built in. If you wanted to keep track of your change history, you would first have to download the Google document to your computer. Then, create a branch, stage, commit and push your changes to remote. Everyone else in your team would be required to do the same to have a central repository with the history of all of your company's documents. Sounds ridiculous? Well, that's exactly how a majority of companies set up their CI/CD workflow for Salesforce projects.

If you were to start from scratch, and pick a VCS today, you would have four main options:

  • Subversion (written in C; initial release 2000)
  • CVS (written in C; initial release 1990)
  • Git (written in C; initial release 2005)
  • Mercurial (written in Python; initial release 2005)

In 2005, when Git was created, Salesforce was just beginning to gain recognition, and none of the VCS options above were developed with Salesforce in mind. Salesforce's technology budget in 2005 was no bigger than $10M (for reference, Salesforce today has individual customers who each pay it over $10M a year), so it was not going to build its own VCS when it could use what was already out there. Moreover, building a proprietary VCS is no simple task. "Wave took 2 years to write and if we rewrote it today, it would take almost as long to write a second time," said Joseph Gentle, a former Google Wave engineer. The VCS technology behind Wave is what Google Docs uses today! That technology is called Operational Transformation, and its concurrency model is different to Git's.
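
To make the contrast with Git's commit-and-merge model concrete, here is a toy sketch of the core idea behind Operational Transformation, limited to concurrent text insertions. It is a simplified illustration, not how Wave or Google Docs is implemented; the Insert class, the site tie-breaker and the transform function are assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Insert:
    pos: int    # index in the document where text is inserted
    text: str
    site: int   # id of the editing client, used only to break ties


def apply(doc: str, op: Insert) -> str:
    return doc[:op.pos] + op.text + doc[op.pos:]


def transform(op: Insert, against: Insert) -> Insert:
    """Adjust op so it can be applied after `against` has been applied."""
    goes_first = against.pos < op.pos or (
        against.pos == op.pos and against.site < op.site
    )
    if goes_first:
        return Insert(op.pos + len(against.text), op.text, op.site)
    return op


doc = "Salesforce"
a = Insert(0, "My ", site=1)     # client 1 types at the start
b = Insert(10, " org", site=2)   # client 2 types at the end, concurrently

# Each client applies its own edit first, then the transformed remote edit.
left = apply(apply(doc, a), transform(b, a))
right = apply(apply(doc, b), transform(a, b))
print(left, "|", right)
```

Both clients end up with the same document without anyone staging, committing or merging by hand, which is the property the Google Docs comparison relies on.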

Which VCS would you pick today? Probably Git. Git has extensive support and is used by some of the world's top companies. Windows, Linux, Node, React and Python are all versioned using Git.

When Linus Torvalds invented Git in 2005, he needed a fast, distributed, and non-linear Version Control System for the Linux kernel (which is 95.7% written in C). Git wasn’t designed to version tree or graph data structures originating in the cloud (such as Salesforce metadata). When companies start using Git for tasks it was not initially designed for, they start having unexpected problems. The same happens when Git is used on Salesforce projects. A Linux project has little in common with a Salesforce project, but the VCS they use is the same.

Another aspect of Git that's worth mentioning is its complexity. Mastering Git is a challenge for many developers, not just Salesforce Administrators. Because Git is so confusing, the learning curve, onboarding and ongoing training costs for your team can be significant. So, if you use Git, your development costs can increase.

Additionally, because Git commits are created manually, your project history can get chaotic very quickly. In fact, another challenge of using Git is keeping the project history clean and knowing how to search through it efficiently. If your team struggles to keep a clean history, debugging will be harder and take longer.

Overall, having Git as part of Salesforce CI/CD is a problem for many reasons. Using the right kind of VCS, on the other hand, can completely transform how your team works and what kind of results they deliver.

Workflow Waste

Creating a workflow that’s supported by a set of generic tools is not always the best idea. When your project's requirements differ from what those tools were built for, there will be extra costs for setting them up and maintaining them.

Here is a typical CI/CD setup for a Django app (Python):

  1. Atom or Visual Studio Code — your IDE
  2. Git — Version Control
  3. Fork or SourceTree — Git client
  4. Docker — Containerization technology
  5. Docker Hub — Docker image repository
  6. GitHub for your CI/CD
  7. CircleCI — In case you need extra power and advanced configuration

Here is a typical CI/CD setup for a Salesforce project:

  1. Atom or Visual Studio Code — your IDE
  2. Git — Version Control
  3. Fork or SourceTree — Git client
  4. Docker — Containerization technology
  5. GitHub for your CI/CD
  6. Ant or SFDX — Salesforce API client

Django and Salesforce projects are completely different, but still use a very similar CI/CD workflow. What works for a Django project does not necessarily work for Salesforce. Typically, Salesforce projects that use the above setup have numerous manual tasks and excess waste in their workflow.

Finally, we estimate that up to 1/3 of a typical Salesforce project's costs relate to the manual tasks of the CI/CD workflow.