At Gearset, our engineering team are constantly making iterative updates to our solution to make our users’ lives even easier. In this article, one of our engineers, Steve Ballantine, talks through some recent semantic merge improvements they’ve made to Gearset’s platform and their impact.
Merging changes in Gearset
Gearset is a DevOps platform for Salesforce. A core component of our solution is helping users maintain Salesforce metadata within a source control repository, just like any other source code.
As a software engineer, you’re probably familiar with using the git merge command to merge changes from a branch you’ve been working on with some long-lived branch.
Our customers targeting the Salesforce platform need to do the same, so within Gearset we need to perform the same kinds of operations. However, instead of more traditional source code files, we’re often dealing with XML formatted metadata.
If you’re interested in how this looks in practice, then take a look at our documentation on resolving merge conflicts.
Our “semantic merge” solution leverages our knowledge of the structure of Salesforce metadata, so our users see outcomes beyond what’s possible with the standard git merge. With our semantic merge, users can:
- Automatically resolve conflicts that git merge wouldn’t be able to handle.
- Identify cases where merging changes will result in Salesforce validation failures further down the line. These failures wouldn’t be noticed by git merge, but semantic merge will highlight them as conflicts for the user to resolve manually.
It’s important to emphasise that semantic merge isn’t making any decisions about which changes should be ignored. We’re simply using our knowledge of Salesforce metadata to identify cases where there isn’t actually a conflict that needs to be resolved.
Semantic merge process
In order to explain how this works and the improvements we introduced, we first need to define a few terms:
- 3-way merge: A type of merge that uses 3 versions of the same file from different points in source control:
  - Left & Right: The versions of the file with changes to be merged. Typically, one will contain ‘your’ changes, the other contains changes from the long-lived branch.
  - Ancestor: The version of the file before the changes in left or right were made.
- Diff: A specific difference that has been identified between two versions of the file. For example, adding or removing a line of text. There are 4 main types of diff:
  - Insert: Adding an XML element.
  - Delete: Removing an XML element.
  - Move: Moving an XML element.
  - Content change: Changing the inner value of an XML element.
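To make these terms concrete, here is a minimal Python sketch of how the four diff types could be modelled. The class and field names (`DiffKind`, `Diff`, `path`, `destination`, `value`) are illustrative assumptions, not Gearset's actual types.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class DiffKind(Enum):
    INSERT = "insert"
    DELETE = "delete"
    MOVE = "move"
    CONTENT_CHANGE = "content change"


@dataclass(frozen=True)
class Diff:
    """One change between two versions of a file (hypothetical shape)."""
    kind: DiffKind
    path: str                          # element the diff acts on
    destination: Optional[str] = None  # MOVE only: where the element ends up
    value: Optional[str] = None        # INSERT / CONTENT_CHANGE only: new content


def describe(diff: Diff) -> str:
    """Render a diff as a short human-readable line."""
    text = f"{diff.kind.value} {diff.path}"
    if diff.destination is not None:
        text += f" -> {diff.destination}"
    if diff.value is not None:
        text += f" = {diff.value!r}"
    return text
```

For example, `describe(Diff(DiffKind.DELETE, "seasonal/sink/seed[0]"))` gives `delete seasonal/sink/seed[0]`.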
The first part of merging is to identify the diffs between ancestor vs left, and ancestor vs right. Semantic merge does this in a 2-step process for each pair of files being compared:
- XML matcher: Try to match up XML elements we see in the 2 files being compared.
- Diff generator: Figure out all the individual diffs that are needed to change the source file into the destination file.
In an ideal world, we could then simply apply these diffs to a copy of the ancestor to generate a merged version of the file containing changes from both left and right.
In reality, we often find that diffs will conflict with each other (i.e. they make different changes to the same part of the file), so we need to identify any diffs that might be conflicting and then try to resolve them.
If we’re able to resolve them all, then we can generate the final merged version of the file. If not, we’ll have to let the user know about the conflicts we’ve found so that they can resolve them manually.
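The overall flow can be sketched in a few lines of Python. This is a deliberately simplified illustration, not Gearset's implementation: it assumes each file version is flattened to a dictionary mapping an element path to its text value, so moves are not modelled, and the function names `diff` and `three_way_merge` are hypothetical.

```python
def diff(ancestor, side):
    """Return {path: new_value} for every path that differs; None means deleted."""
    changes = {}
    for path in ancestor.keys() | side.keys():
        if ancestor.get(path) != side.get(path):
            changes[path] = side.get(path)
    return changes


def three_way_merge(ancestor, left, right):
    """Return (merged, conflicts); merged is None when manual resolution is needed."""
    left_diffs = diff(ancestor, left)
    right_diffs = diff(ancestor, right)

    # Two diffs conflict when both sides change the same path in different ways.
    conflicts = sorted(
        path
        for path in left_diffs.keys() & right_diffs.keys()
        if left_diffs[path] != right_diffs[path]
    )
    if conflicts:
        return None, conflicts

    # No conflicts: apply both sets of diffs to a copy of the ancestor.
    merged = dict(ancestor)
    for changes in (left_diffs, right_diffs):
        for path, value in changes.items():
            if value is None:
                merged.pop(path, None)
            else:
                merged[path] = value
    return merged, []
```

With an ancestor of `{"a": "1", "b": "2"}`, a left side that changes `b` and a right side that adds `c`, both changes land in the merged result. If both sides change `b` to different values, the path is reported as a conflict instead.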
State of semantic merge
The first iteration of our semantic merge solution worked well for users. However, at Gearset, we take an iterative approach to feature development and we felt there was still room for improvement.
We decided to focus on 2 main areas:
- Adding more automated testing to try and identify weak spots. In particular, we already used tests with randomly generated XML (fuzz tests) to find issues with the diff generation logic and wanted to expand this to conflict identification and resolution.
- Reviewing customer-provided repository metadata to see what common conflicts we might be able to resolve automatically.
Our stats showed that approximately 35% of PRs had merge conflicts. Our goal for this project was to reduce this by a quarter, to around 26% of PRs.
This work surfaced a couple of issues in particular. Here’s how we fixed them.
Missing deletions
In this scenario, we were testing the conflict resolvers by randomly generating a ‘source’ XML document and randomly applying some mutations to create a ‘target’ document.
We could then generate diffs and pass them through the conflict identification and resolution logic, to verify we were getting the expected output.
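A fuzz harness of this kind could look roughly like the Python sketch below. Everything in it is an assumption for illustration: the tag list, the mutation set, and the property being checked (merging a mutated document against an unchanged copy of the source should reproduce the mutated document). `merge_fn` stands in for whatever merge implementation is under test.

```python
import copy
import random
import string
import xml.etree.ElementTree as ET

TAGS = ["sink", "seed", "queue", "copper"]  # arbitrary tag names


def random_text(rng):
    return "".join(rng.choice(string.ascii_uppercase) for _ in range(15))


def _grow(rng, parent, depth):
    for _ in range(rng.randint(1, 3)):
        child = ET.SubElement(parent, rng.choice(TAGS))
        if depth > 1 and rng.random() < 0.5:
            _grow(rng, child, depth - 1)
        else:
            child.text = random_text(rng)


def random_tree(rng, depth=3):
    """Generate a random 'source' document."""
    root = ET.Element("seasonal")
    _grow(rng, root, depth)
    return root


def mutate(rng, source, n_mutations=3):
    """Copy the source and apply random inserts, deletes and content changes."""
    target = copy.deepcopy(source)
    for _ in range(n_mutations):
        parent = rng.choice([e for e in target.iter() if len(e) > 0] or [target])
        action = rng.choice(["insert", "delete", "change"])
        if action == "insert":
            child = ET.Element(rng.choice(TAGS))
            child.text = random_text(rng)
            parent.insert(rng.randint(0, len(parent)), child)
        elif action == "delete" and len(parent) > 0:
            parent.remove(rng.choice(list(parent)))
        else:
            leaves = [e for e in target.iter() if len(e) == 0]
            rng.choice(leaves).text = random_text(rng)
    return target


def fuzz(merge_fn, iterations=100, seed=0):
    """Check merge_fn(ancestor, left, right) against randomly generated cases."""
    rng = random.Random(seed)  # fixed seed so failures can be reproduced
    for i in range(iterations):
        source = random_tree(rng)
        target = mutate(rng, source)
        merged = merge_fn(source, target, copy.deepcopy(source))
        assert ET.tostring(merged) == ET.tostring(target), f"iteration {i} failed"
```

Seeding the random generator is the important design choice here: a failing iteration can be replayed exactly, which is what makes it possible to extract a minimal case like the one that follows.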
When we took a source document:
<seasonal>
  <sink>
    <seed>
      <queue>NDGUNNORQVVEVME</queue>
      <seed>
        <queue>IVPXNHGKBXLDTMK</queue>
        <copper>GZQDVQVKQWBVABC</copper>
      </seed>
    </seed>
    <seed />
  </sink>
</seasonal>
And its expected output:
<seasonal>
  <sink>
    <seed>
      <seed>
        <copper>GZQDVQVKQWBVABC</copper>
      </seed>
    </seed>
  </sink>
  <sink>
    <seed>
      <queue>NDGUNNORQVVEVME</queue>
    </seed>
  </sink>
</seasonal>
We were finding an unexpected element in the output:
<seasonal>
  <sink>
    <seed>
      <seed>
        <queue>IVPXNHGKBXLDTMK</queue> <!-- unexpected element -->
        <copper>GZQDVQVKQWBVABC</copper>
      </seed>
    </seed>
  </sink>
  <sink>
    <seed>
      <queue>NDGUNNORQVVEVME</queue>
    </seed>
  </sink>
</seasonal>
Looking into the details of what happened, we could see that 4 diffs were generated. These are shown below using XPath-style paths (with zero-based indices) to identify elements.
A. Insert a new <sink> containing the expected <seed> and <queue> elements after the end of the existing <sink> element.
B. Move seasonal/sink/seed[0]/seed to within seasonal/sink/seed[1]
C. Delete seasonal/sink/seed[0]
D. Delete seasonal/sink/seed[0]/seed/queue
The conflict identification logic flags diffs B, C and D as potentially conflicting because they act on some of the same elements. Specifically, it generated two sets of conflicts:
- B+C — because the moved element is within the deleted element.
- C+D — because D is happening to a child of the element deleted by C.
B and D are not considered in conflict because no matter which order you apply the diffs in, you get the same result.
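One simple rule that reproduces this grouping is "a delete potentially conflicts with any diff acting on one of its descendants". The Python sketch below applies that rule to the four diffs. The rule itself, the path-prefix check and the path used for diff A (`seasonal/sink[1]`) are assumptions made for illustration; the real conflict identification logic is likely to be more involved.

```python
from itertools import combinations

# (kind, path of the element the diff acts on), keyed by the diff's label.
DIFFS = {
    "A": ("insert", "seasonal/sink[1]"),  # assumed path for the new <sink>
    "B": ("move", "seasonal/sink/seed[0]/seed"),
    "C": ("delete", "seasonal/sink/seed[0]"),
    "D": ("delete", "seasonal/sink/seed[0]/seed/queue"),
}


def is_descendant(path, ancestor):
    """True when path points somewhere inside the element at ancestor."""
    return path.startswith(ancestor + "/")


def potential_conflicts(diffs):
    """Pair up diffs where one deletes an element the other acts within."""
    pairs = []
    for (name1, (kind1, path1)), (name2, (kind2, path2)) in combinations(sorted(diffs.items()), 2):
        first_deletes_second = kind1 == "delete" and is_descendant(path2, path1)
        second_deletes_first = kind2 == "delete" and is_descendant(path1, path2)
        if first_deletes_second or second_deletes_first:
            pairs.append(f"{name1}+{name2}")
    return pairs
```

Calling `potential_conflicts(DIFFS)` returns `["B+C", "C+D"]`. B and D are not paired because B is a move rather than a delete, and D does not delete anything B acts within.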
The first conflict can be resolved by executing B first, then C. For the second conflict, we can ignore D because it’s nested within C and would be deleted anyway.
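However, combining these two resolutions breaks the assumption behind the second one. Because B now runs before C, the inner <seed> has already been moved out of seed[0] by the time C deletes it, so the <queue> targeted by D is no longer within the element deleted by C. With D ignored, nothing removes that <queue>, and it survives into the output.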
The solution is to always perform deletions, even when they are nested like this. If the element really has already been deleted by the time we get to it, then we ignore it at that point instead.
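The effect can be reproduced with Python's standard `xml.etree.ElementTree`. The sketch below hand-applies diffs A to D to the source document in the resolved order, once with the nested delete skipped and once with it always attempted. The helper names and the `skip_nested_deletes` flag are hypothetical; this illustrates the behaviour described above rather than Gearset's code.

```python
import xml.etree.ElementTree as ET

SOURCE = (
    "<seasonal><sink><seed><queue>NDGUNNORQVVEVME</queue>"
    "<seed><queue>IVPXNHGKBXLDTMK</queue><copper>GZQDVQVKQWBVABC</copper></seed>"
    "</seed><seed /></sink></seasonal>"
)


def parent_of(root, element):
    """Return the current parent of element, or None if it has left the tree."""
    for candidate in root.iter():
        if any(child is element for child in candidate):
            return candidate
    return None


def delete(root, element):
    """Delete element if it is still present; otherwise do nothing."""
    parent = parent_of(root, element)
    if parent is not None:
        parent.remove(element)


def merge(skip_nested_deletes):
    root = ET.fromstring(SOURCE)
    sink = root.find("sink")
    outer_seed, empty_seed = sink.findall("seed")  # seed[0] and seed[1]
    inner_seed = outer_seed.find("seed")
    inner_queue = inner_seed.find("queue")

    # Diff A: insert a new <sink> after the existing one.
    new_sink = ET.SubElement(root, "sink")
    ET.SubElement(ET.SubElement(new_sink, "seed"), "queue").text = "NDGUNNORQVVEVME"

    # Diff B (resolved to run before C): move the inner <seed> into seed[1].
    outer_seed.remove(inner_seed)
    empty_seed.append(inner_seed)

    # Diff C: delete seed[0].
    delete(root, outer_seed)

    # Diff D: the old behaviour skipped this because it was nested within C.
    if not skip_nested_deletes:
        delete(root, inner_queue)
    return root
```

Here `merge(skip_nested_deletes=True)` leaves the `IVPXNHGKBXLDTMK` queue in the result, matching the unexpected output, while `merge(skip_nested_deletes=False)` produces the expected document. Because `delete` is a no-op for elements that are already gone, always attempting the delete is safe even when the parent really was removed first.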
XML matcher mixups
In some fairly complex 3-way merge scenarios generated by our fuzz tests, the system was finding conflicts when we didn’t expect there to be any.
When we looked at the diffs that were generated, they didn’t quite match what we would have expected. This wasn’t unusual in itself. The XML matcher can often match elements in ways that seem nonsensical to human eyes, leading to counterintuitive diffs. However, as long as the generated diffs lead to the correct result, it doesn’t really matter.
In these new scenarios, we found that it did matter because these counterintuitive diffs were conflicting, whereas the diffs generated from a more intuitive matching of XML elements didn’t conflict.
The primary difficulty is that the XML matcher is very performance sensitive. It’s the most significant contributor to the overall time taken by the semantic merge driver, so we had tightly optimised and tuned it to consider as little information as it needed to make an accurate match. That minimalism was what caused the counterintuitive matches we were seeing.
In the end, we addressed this by adding some extra checks to also consider the next/previous siblings of an element when deciding which element to match it to. For performance reasons, these checks only happen if the element has multiple potential matches.
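As a rough illustration of the idea, the Python sketch below matches a child element by tag and only looks at neighbouring siblings when more than one candidate exists. The scoring scheme and the function names are assumptions; the text above only tells us that next/previous siblings are considered and that the extra checks are reserved for ambiguous cases.

```python
import xml.etree.ElementTree as ET


def signature(el: ET.Element):
    """Cheap fingerprint of an element: its tag and trimmed text."""
    return (el.tag, (el.text or "").strip())


def neighbours(siblings, i):
    """Signatures of the previous and next siblings (None at either end)."""
    prev = signature(siblings[i - 1]) if i > 0 else None
    nxt = signature(siblings[i + 1]) if i + 1 < len(siblings) else None
    return prev, nxt


def match_child(src_siblings, i, dst_siblings):
    """Return the index in dst_siblings that best matches src_siblings[i], or None."""
    tag = src_siblings[i].tag
    candidates = [j for j, el in enumerate(dst_siblings) if el.tag == tag]
    if not candidates:
        return None
    if len(candidates) == 1:
        return candidates[0]  # fast path: no sibling checks needed

    # Ambiguous: prefer the candidate whose neighbours look the same.
    src_prev, src_next = neighbours(src_siblings, i)

    def score(j):
        dst_prev, dst_next = neighbours(dst_siblings, j)
        return (src_prev == dst_prev) + (src_next == dst_next)

    return max(candidates, key=score)
```

Given source children `a, seed, b, seed, c` and destination children `b, seed, c, a, seed`, the first source `seed` (which follows `a`) is matched to the last destination `seed`, and the second (between `b` and `c`) to the first destination `seed`, rather than pairing them purely by order of appearance.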
Results
In total, we identified at least 12 different scenarios where we could improve on the existing behaviour. Addressing these reduced the percentage of PRs with conflicts by around 20%. That’s not quite the 25% reduction we were aiming for, but still a massive improvement for users.
We also reduced the size and complexity of the diffs that users have to deal with when conflicts do occur.
Future work
These improvements are fantastic, but we believe we can do better and are excited to improve even further in future.
Partial resolution
One thing we’re already looking at is partial resolution.
This change to the underlying systems will allow us to automatically resolve more diffs than we currently do, by accepting some of the diffs involved in a conflict as they are while leaving others as conflicts that’ll require manual input to resolve. This won’t help with reducing the total number of conflicts we see, but will reduce the size and complexity of the diffs that users need to manually resolve when conflicts occur.
At present, we don’t actually use partial resolution very much, but there are many situations where we might be able to apply it. More work is needed to confirm the best way to identify these scenarios and make sure that we handle them in the correct way.
Targeted investigations
During this work we also improved our logging, to give more context about what types of conflicts we’re seeing and which metadata types they occur in. Using this data, we should be able to focus our efforts on the biggest real-world problem areas.