Description
Catch up on this talk from Richard Clark (Chief Strategy Officer at Provar) at DevOps Dreamin’, where he explains why quality in software development is a journey, not a destination.
Richard discusses:
- Why testing matters for your deployments
- How to integrate testing into your CI/CD pipeline
- How you can measure the quality of your DevOps solutions
Learn more:
- Gearset’s integration with Provar
- Essential Salesforce testing best practices for DevOps success
- Salesforce test automation: A complete overview
- How to measure your Salesforce DevOps performance using the DORA Metrics
- Gearset’s Salesforce DevOps solution
Transcript
Next in the lineup. So we've got one more talk for you before a short afternoon break. It's a pleasure to introduce somebody I've known in the ecosystem for a long time, Mr. Richard Clark. He's gonna be talking to us about quality being a journey, and hopefully you can see the evidence, the spectacularness of this shirt, and, yeah, exactly.
Exactly. Give us a twirl. We need a Gearset and Provar one, proof of our collaboration there, I think. I think that would be quite good.
Richard, take it away. Thank you so much. Thank you, Jack. Thank you, everyone, for coming along.
So quality is a journey, not a destination. What do I mean by that? Like all things in life, quality isn't something you finish doing. It's something you do continuously.
So we go on multiple journeys in our lives. There are certain stages we reach, we go places, we do things, we have certain goals, but you're never finished. It's not about the destination.
Click would help.
So, I'm Richard Clark. I'm Chief Strategy Officer at Provar Limited. Firstly, don't let that worry you. Okay?
I'm technical as well. Okay. I was doing DevOps back in 2012, quite early on. I've been in your shoes.
I've done those 2 a.m. deployments and rollbacks.
I've lived it all as a system integrator, an end customer, and a partner of Salesforce.
So at Provar, we produce integrated quality lifecycle management solutions, and the most robust test automation solution designed for Salesforce and the other applications you test alongside it.
So today, I want to give you an idea of why testing matters and why it matters for DevOps, how you can integrate it into your CI/CD pipelines, and how you can measure the quality of your DevOps solutions. It's not just about testing; that's just one measure. So we're gonna look at measures. We'll look at some of the common techniques and metrics people use, and how you can collate that data together into one quality hub. And when I talk about quality, I'm not just talking about testing and quality assurance.
That's only one part of it. If you're testing something, that's great, you're observing the results. But you improve quality at the start.
So we're gonna talk about that too.
So hands up if you know what DORA metrics are.
Okay. Keep your hands up if you actively use them.
Not one. One at the back, too far away to shout, Rob. Okay.
So DORA metrics are important, but they are not the goal. For everyone else, and everyone online, and I've had this ready all day upstairs.
This is not DORA. And yes, it's Dora. Okay? Dora the Explorer. That's not what I'm talking about.
Okay? But kind of, she goes on adventures, she goes on journeys. So, in a way, yes, it is.
I mean DORA, DevOps Research and Assessment, okay, acquired by Google in 2018, used by Google, and it still looks fairly independent. They have sponsors, and they publish an annual report, and that annual report is made up of a survey of over 36,000 customers, or end users I should say, companies.
So they survey them and they find out what's really made a difference. Is AI really transforming the way we do DevOps, like Jack said this morning? Actually, it's not. If anything, it's holding DevOps back in some cases. So the DORA report is much more than that.
However, most people who know DORA only think of the DORA metrics.
So starting at the top there. Have I got a laser? Yes. We've got the mean lead time for changes.
So from a commit of a change, how long does it take for that to reach production? We can measure that time in minutes, hours, days, unfortunately weeks in some cases. Then, how often do you deploy to production? Okay?
You could deploy to production quite quickly if you make all your changes in production; you could do it several hundred times a day if you wanted.
Then, what is your change failure rate? So how often does a change in production mean you have to roll it back? Not how many bugs do you get, but how often does it cause remedial action to occur?
And then the mean time to restore, which has been reclarified as failed deployment recovery time. So basically, when there is a failure, how long does it take you to get back to the working state you were in?
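To make those four measurements concrete, here is a minimal sketch (not from the talk) of how you might compute them from a set of deployment records; the Deployment fields and the 30-day window are illustrative assumptions rather than any particular tool's data model.

```python
from dataclasses import dataclass
from datetime import datetime
from statistics import mean
from typing import Optional

@dataclass
class Deployment:
    committed_at: datetime                   # when the change was committed
    deployed_at: datetime                    # when it reached production
    failed: bool = False                     # did it need remedial action?
    restored_at: Optional[datetime] = None   # when the working state was recovered

def dora_metrics(deployments: list[Deployment], window_days: int = 30) -> dict:
    """Compute the four DORA metrics over one reporting window of deployments."""
    lead_hours = [(d.deployed_at - d.committed_at).total_seconds() / 3600 for d in deployments]
    failures = [d for d in deployments if d.failed]
    recovery_hours = [(d.restored_at - d.deployed_at).total_seconds() / 3600
                      for d in failures if d.restored_at]
    return {
        "lead_time_for_changes_hours": mean(lead_hours) if lead_hours else None,
        "deployments_per_day": len(deployments) / window_days,
        "change_failure_rate": len(failures) / len(deployments) if deployments else 0.0,
        "failed_deployment_recovery_hours": mean(recovery_hours) if recovery_hours else None,
    }
```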
So each of those metrics, you can improve individually if you wanted to.
But if I decrease my change lead time by making those changes directly in production, it's gonna improve my deployment frequency, yes, but it's gonna be at the expense of the change failure rate. Okay? I can put test automation on my application, integrated into my DevOps process, and it's gonna reduce my change failure rate. But if I use the wrong product or do it badly, I'm gonna increase my lead time for changes, how long it takes a commit to reach production, and I'm gonna reduce my deployment frequency.
So we need to think about the metrics and what we mean by improving them.
We also need to think about the human cost. Okay?
If I'm making changes twenty times a day to my production system, what's the impact on our users? How do they know how to use the application when, every time they enter a transaction, it's changed on them? So more is not always better. Okay.
So because of issues like this, DORA started to talk about a fifth element. It's not a trick, but it's an element, something in the middle.
Hands up if you know what the fifth element I'm going to talk about is, other than Rob. He probably does. No? Good.
The fifth element is reliability. We can bring the four metrics together by measuring the reliability of making those changes.
So by measuring reliability, we can tell if those changes to the DORA metrics made a positive or a negative impact on our overall systems.
So let's take a closer look at DORA. When I talk about DORA being a good thing, I'm not talking about the metrics. I'm talking about DORA as a methodology.
So DORA puts it like this. If you look across the top, capabilities predict performance, which predicts outcomes. DORA isn't just about the software delivery performance we just saw. It's about commercial and non-commercial outcomes.
It's about you as teams, your well-being, less deployment pain, less rework, less burnout.
This is what we should be measuring. These are the outcomes we wanna see. And in order to get that, by all means, we have to deliver all the capabilities, but I've highlighted the ones that apply to me: test automation, code maintainability. I can measure the quality of that.
The monitoring and observability of our systems, including production. Test automation is important; you can't do DevOps without test automation.
Test data management: reusing your test cases, having test data drive the iterations. You don't maintain lots of test cases; you maintain lots of scenarios for your test data. And even trunk-based development. A lot of us did a lot of work in the last ten years on Git flow, Git flow, Git flow, but actually, if we want to do CI, we should be looking at trunk-based development.
And all the other things: shift-left on security, streamlined change approval. These are all important.
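Picking up the test data management point above: one way to maintain scenarios rather than test cases is data-driven testing. This is a minimal sketch assuming a pytest-style parametrized test with made-up lead scenarios and a stand-in conversion function, not Provar's actual tooling.

```python
import pytest

# Hypothetical scenario data: one test case, many data-driven iterations.
LEAD_SCENARIOS = [
    {"email": "a@example.com", "company": "Acme", "expect_converted": True},
    {"email": "",              "company": "Acme", "expect_converted": False},  # missing email
    {"email": "b@example.com", "company": "",     "expect_converted": False},  # missing company
]

def convert_lead(lead: dict) -> bool:
    """Stand-in for the real conversion logic under test."""
    return bool(lead["email"] and lead["company"])

# One maintained test; adding coverage means adding a scenario, not another test case.
@pytest.mark.parametrize("lead", LEAD_SCENARIOS)
def test_lead_conversion(lead):
    assert convert_lead(lead) is lead["expect_converted"]
```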
So I did some research, and I found GitLab. Please don't boo, Gearset. GitLab publish their DORA metrics, okay, and you could look at a point in time if you had really good eyesight, but the metrics, the numbers, don't matter.
They're their metrics. You can't really compare a Salesforce project to a GitLab project. But what we can look at is their trends. So we can look across the history over time, and we can see if things got better or worse. Again, those numbers tell us a trend, and we can correlate across them, so we can see how changing one metric moves another, and we can visualize that. It's very powerful.
But that's all quantitative analysis. It's all numbers-based. What about some context? We've got a change freeze for December; that's gonna hit my deployment frequency.
Have we just hired a bunch of new junior developers? That hits the number of failures in production. Did we just lay off half our QA team? These are gonna change the metrics. So the numbers alone give you something to look at, but you need the qualitative data as well to understand what would have caused them to change. Your team may be performing just as well as before, but you've now got an issue with other things that are affecting them: outside changes.
So the 2023 DORA report is worth reading. I've put a link at the end, and it's also worth reading Gearset's State of DevOps report as well, for some context about what's normal for Salesforce.
We see that top teams are balanced teams. They're high performing, they have low burnout, they have high job satisfaction. These aren't DORA metrics. These are personal, human metrics we want to understand. And as a result of that, the top performers deploy on demand. It's not a number of times per day; when they need to deploy, they can deploy, like we heard earlier from Jack.
Lead time for changes is less than one day. We've got an idea, we need to do this, I've prepared it, let's ship it into production. Okay?
Their change failure rate is less than five percent. I think that's quite high. There we go. And their failed deployment recovery time is less than one hour. Pretty good. Pretty good.
So teams that focus on the user rather than the metrics have two percent higher organizational performance and twenty percent higher job satisfaction.
These are things I'm interested in.
So apologies to the people from Salesforce in the room. I'm going to do a bit of Salesforce bashing. Okay.
Two weeks ago, Salesforce did a patch on production and sandboxes on a Thursday. Has anyone experienced this? When you hit the New button on a related list and tried to save, and there was a mandatory relationship, a master-detail, the save was failing on the related record because it hadn't automatically copied across the primary, say the account or the opportunity, whatever the primary object was.
I don't know how on earth that would reach production, but the impact was up to four hours for Salesforce to roll back those changes. It shouldn't have happened. Okay?
It just shouldn't have happened. The coverage should have been there to protect you against that. You would have thought you don't need to test for that. Unfortunately, you do.
Unfortunately, on the same day, there were multiple other unrelated outages: a data center's power went down with the storms over Europe, and in Asia-Pac some servers went down and people were doing site switching. So it looks like these things are related, but they weren't. You would never have thought to test for all these things.
Now, if you were using DORA metrics, you would need this context, because this would affect the availability of your system. This would affect your ability to deploy changes, or roll them back if you needed to do that as well. So it's important to understand that these other events happen and could influence your DORA metrics.
Just a time check.
So this led me to look at something called site reliability engineering, and this is really cool. And I've seen a few ex-colleagues from Provar go on to do site reliability engineering instead of being QA people.
So SRE has been adopted by a lot of big tech companies, and what they're investing in is preventative maintenance for software.
So rather than fix things that are broken, design them from the start to be reliable; think about that in the early stages.
So the definitions vary, but the things I'd call out: automation. Again, automation of processes, of deployments, of scripting your deployments, of testing. Important.
Only engineer, don't over-engineer; solutions should focus on the necessary level of reliability.
Observe what's happening in production through continuous testing. Don't just think that once you've deployed, you're finished. Use log monitoring as well, not just UI testing.
And another thing I found really interesting was chaos engineering. Anyone heard of chaos engineering before?
Two or three. I expected a lot of people. That's good.
I was really intrigued by this. So back in 1983, Apple were doing the Macintosh computer, you know, the one with the old built-in screen. That was perfect for looking at the page. And they wanted to test it, and they couldn't use their automated testing software.
The reason they couldn't do it was that there wasn't enough memory on the machine to run it. So someone built a tool to basically hammer the keyboard randomly to test it, and it worked. They were doing chaos engineering: things you would not predict, random events happening at the keyboard, to see if the applications still worked.
Twenty years later, we see Amazon create a thing called GameDay. So Amazon, I think, was still mainly a bookshop at that time. They wanted to improve their website reliability, and they did that by deliberately creating major failures in their production environment to see how they could respond. And they were motivated by watching firefighters at work. So firefighter teams practice on burning planes and burning buildings; I don't think they practice on cats up trees, but maybe that as well. They saw how firefighters were getting better at recovering from incidents by practicing, so they thought, we need to practice this as well. We'll deliberately introduce problems so we can practice.
In 2006, Google started disaster recovery testing, DiRT. So again, they looked at catastrophes, natural disasters, a data center going down.
Network connections, undersea cables being cut by the anchors of ships dragging across them, before we had satellites. They looked at all these things. Then in 2011, Netflix created a tool called Chaos Monkey, clearly influenced by Apple's Monkey. They created software that would take their production instances and randomly shut one down, without their engineering team knowing which server would be taken down, when, or how many of them.
And that developed into a thing called the Simian Army. They've got a whole suite of tools around making sure they evaluate their platform. If the platform's not available, it's not available to their users.
They lose subscribers. It's their business. But more recently, we've seen something called failure as a service. This was quite interesting.
So there are tools such as Proofdock and Gremlin, where you can actually test the reliability not only of your applications but of your people. Bob doesn't turn up to work: what happens? Are we still able to start the business, or, you know, to unlock the office? There are other things to test than just software and processes. You've got your happy-path process; what happens when that process deviates?
What happens if a step doesn't happen? We can look at those things.
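To show the idea rather than any particular product, here is a minimal chaos-monkey-style sketch; the fleet listing, the terminate call, and the business-hours window are all illustrative assumptions, not Netflix's or Gremlin's actual APIs.

```python
import random
from datetime import datetime

# Hypothetical fleet API; in a real setup this would call your cloud provider.
def list_instances() -> list[str]:
    return ["web-1", "web-2", "web-3", "worker-1"]

def terminate(instance_id: str) -> None:
    print(f"[chaos] terminating {instance_id} at {datetime.now():%H:%M}")

def chaos_monkey(probability: float = 0.2, business_hours: range = range(9, 17)) -> None:
    """Randomly kill one instance, only during working hours so engineers can respond."""
    if datetime.now().hour not in business_hours:
        return
    if random.random() < probability:
        terminate(random.choice(list_instances()))

if __name__ == "__main__":
    chaos_monkey()
```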
So we looked at reliability earlier.
Reliability is only one measure of quality.
I usually list these as the five dimensions of quality. Other people choose some slight variations.
So we think about accuracy.
So our application can work without any bugs, okay?
But if the data we're capturing is duplicated, or the data we capture is incorrect, then we're gonna be making bad decisions on that data. We're also potentially liable for things like GDPR fines if we've got bad data.
So we know duplicate data is bad, we know we can measure duplication of data, and therefore we know we can measure data accuracy across all the different ways data could be bad.
Completeness.
So my marketing team often say things like, oh, we got this many leads from this event. At the end of an event recently, they said, oh, we've got a hundred leads from that event.
Oh, can I have the company names, the email addresses? Oh, no, I can't give you that. Sorry.
The responses? Yeah, I can't give you that. Why is that? GDPR. Can't give it to you, they say.
No. That's not right.
Completeness of data. If I don't have an email address, a company name, or a phone number, that data is useless to me. Instead of meeting GDPR by excluding the data, they should have stored it securely and got the attendee to opt in, to give permission for their data to be used. So again, the solution we choose can impact data completeness if we choose the wrong solution.
So we want our data to be reliable. That isn't just about bugs. We want to make sure there are no website outages. We want to make sure that if an integration is failing, or failing silently, we know about it.
There may not be a bug; there may not be a Jira ticket for it. But if the save button doesn't work, we want to know about that.
It may look like it works, but there's no data in the table. These are common scenarios.
One I often see is people asking for a new field to be added to the Lead object, but they never tell our admins where they want that data to go on conversion. For me, that's a reliability issue. I've captured some data, and I just throw it away when I convert the lead. That's not good.
Relevancy. So unless you're a data scientist doing AI, you could store all the data in the world, but we only want what's relevant to us.
So at Provar, I want to know what DevOps solution you're using because it makes a difference in terms of my recommendations and what tools to integrate.
At Pret, they sell coffee and sandwiches.
I don't need to know what food allergies you have; they don't need to know what DevOps tool you're using. So data needs to be relevant to the business and the process that you're running. And then timeliness.
So some people say data currency. We don't mean pounds, shillings, and pence, or dried fruit. The data we're storing needs to be timely; it needs to be current. If we're acting on stale data that's changing fast, again, it's inaccurate data.
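Several of these dimensions are easy to turn into numbers. The sketch below, which is illustrative rather than anything Provar ships, measures duplication, completeness, and staleness over some made-up lead records; the field names and the 90-day threshold are assumptions.

```python
from datetime import datetime, timedelta

# Hypothetical lead records; field names are illustrative, not a real schema.
leads = [
    {"email": "a@example.com", "company": "Acme", "phone": None,
     "updated": datetime(2024, 1, 5)},
    {"email": "a@example.com", "company": "Acme", "phone": "555-0100",
     "updated": datetime(2023, 6, 1)},
]

required = ("email", "company", "phone")

def duplicate_rate(records, key="email"):
    values = [r[key] for r in records if r.get(key)]
    return 1 - len(set(values)) / len(values) if values else 0.0

def completeness(records):
    filled = sum(all(r.get(f) for f in required) for r in records)
    return filled / len(records)

def stale_fraction(records, max_age_days=90, today=datetime(2024, 3, 1)):
    stale = sum((today - r["updated"]) > timedelta(days=max_age_days) for r in records)
    return stale / len(records)

print(duplicate_rate(leads), completeness(leads), stale_fraction(leads))
```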
So, the timeliness with which you can get that data, and the timeliness with which you can respond to it. On that point, one of the things I often come across is that people during UAT on a project will say things like, oh, this is a bug; no, it's a change in requirements. And I always say to them, when did you capture the requirement?
And if they say three months ago, I say, well, you're building what they wanted three months ago; you should be building what they need today. So it's your job as a consultant to check with them as you go through the process, when you start building, is this still the right requirement? It's not their fault if the requirement or the business has moved on. But also, people don't tell you everything, so you get those too.
So we looked at DORA. We talked about capabilities briefly.
We can see how we can influence capabilities through things like test automation, and we've talked about quality measures. So how can we integrate that into our DevOps pipeline? Salesforce now talk about testing.
So John's here. Thank you, John. DevOps Center has its suite of tools, and these tools are helping you with quality, the quality of your applications. They're helping with things like code analyzers and static code analysis, and they recognize there are multiple stages to deploying something to production. This is good.
At Provar, we think of this as two split infinity strips. So you've got your normal DevOps process, where you've got your functional testing in one place. But actually, we say you should be testing throughout that process; it's just that the types of tests you do are different.
Here, I'm testing that the functionality I'm delivering hasn't broken anything, that the things I'm delivering work. Over here, I'm doing static code analysis, and over here I'm deploying to production and maybe looking at logs and maybe looking at performance: what's the performance of the system after I deploy it?
Even in production, you're still looking at what bugs and how many issues are being created. So there are other ways we can measure quality, other ways we can test quality throughout our lifecycle.
So on our stand upstairs, we've got this banner. I'm going to talk you through it, how much time have I got, quickly.
So the left and the right sides are indicative.
Okay? You can use Gearset in here. You can use other tools. You can use Azure DevOps.
Yes, Agile Accelerator. It doesn't matter what you're using, but we have a system where we're capturing our user stories. Those user stories are being built either declaratively or coded by developers.
In parallel to that, our QA team are looking at that story as well and thinking about what it is they need to test; they're planning their testing.
Unfortunately, unless you're using Kanban, and if you're using Scrum, you're probably thinking about that at sprint plus one. You typically execute the tests after the developers have committed work, because in my experience you only get those commits three days before the end of the sprint, and that's not enough time to do your testing. But if you're fortunate and you're more mature than that, by all means do in-sprint testing. We build tests using Provar Automation.
We can run those against the development environments. We can give developers and admins tools through the Salesforce CLI to run the tests where we already have a regression pack.
And then, when we're in a release pipeline, we can do deeper levels of testing. We can do things like code coverage; we can do test coverage as well as code coverage. We can integrate with Gearset. We can do AI test creation: we can create tests for you automatically based on the metadata that's being changed.
We can do static code analysis. We can do AI orchestration. So we can look at the tests and say, which tests do you have that are new?
Because they're probably gonna break first; they may not be good tests. Which tests always go wrong? Run those first. So if we optimize our execution of tests, then instead of waiting six hours for your results, we can tell you in fifteen minutes that this is a bad release, or that these are bad tests in some cases.
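To illustrate that ordering idea in general terms (this is a sketch of the approach, not Provar's actual orchestration), you could rank tests so that brand-new ones and historically failure-prone ones run first; the TestCase fields and sample data below are made up.

```python
from dataclasses import dataclass

@dataclass
class TestCase:
    name: str
    runs: int            # how many times it has executed historically
    failures: int        # how many of those runs failed

    @property
    def failure_rate(self) -> float:
        return self.failures / self.runs if self.runs else 1.0  # brand-new tests sort first

def prioritize(tests: list[TestCase]) -> list[TestCase]:
    """Run likely-to-fail tests first so a bad release surfaces in minutes, not hours."""
    return sorted(tests, key=lambda t: (t.runs > 0, -t.failure_rate))

suite = [
    TestCase("opportunity_save", runs=200, failures=2),
    TestCase("lead_conversion", runs=150, failures=30),
    TestCase("new_quote_flow", runs=0, failures=0),   # new test, unproven
]
print([t.name for t in prioritize(suite)])
# new_quote_flow first, then lead_conversion, then opportunity_save
```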
Now, as we move down the pipeline, at each stage we do more regression testing. We can capture manual test results during your UAT, so people can see the results of what you've already tested, against which user stories, and why, and what the results were, and the performance of those tests.
But you can also log what it was they tested, what they were trying to do, and what they broke when they were testing. And ultimately, when you then deploy to production, you'd also do some monitoring, some non-destructive testing of your production environment.
Okay. Quick break for a drink.
So in the testing world, we talk about a software test life cycle. So unit testing, integration, system, acceptance testing.
Now, when we measure this, when we put metrics on this, it isn't just about the number of Provar tests that passed. How many commits did it take to make that change? You might wanna measure that. It might tell you something quite interesting.
When you start looking at that by developer, you get some very interesting results. You might need to do some retraining.
How many unit test failures were there as part of that? How many actions came out of the retrospective meeting at the end of the sprint? That's quite interesting. Is that going up? Do we have more retrospective actions, or do we have fewer? Are they different?
Integration test failures? How often are we changing metadata? Are we changing the same bit of metadata over and over and over? If so, why are we not thinking more holistically about those changes?
Merge conflicts, everyone's favorite thing. How many times do you get a merge conflict? Are we using the right branching strategy? Should we move from Git flow to trunk-based development to avoid that?
At the system level, is our business process failing, okay, and how many times does it fail?
What is the relative performance of the application? So when I run my tests, I get a load of timings about how long the save button on Opportunity took. If it's taking ten seconds on an Opportunity save, your users are gonna be unhappy. Okay? Masses of internal users. So you can measure that. You can look back over time, like we saw with DORA, and you can measure relative performance.
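As a small worked example of what relative performance over time can mean (the timings and dates here are invented, and this isn't tied to any particular tool), you can normalise each run's average save time against a chosen baseline run:

```python
from statistics import mean

# Hypothetical timings (seconds) for "Opportunity save" captured by nightly test runs.
history = {
    "2024-03-01": [1.1, 1.3, 1.2],
    "2024-03-08": [1.2, 1.4, 1.3],
    "2024-03-15": [2.6, 2.9, 2.7],   # a regression after a deployment
}

def relative_performance(history, baseline_key):
    """Compare each run's average save time against the baseline run."""
    baseline = mean(history[baseline_key])
    return {day: round(mean(times) / baseline, 2) for day, times in history.items()}

print(relative_performance(history, "2024-03-01"))
# {'2024-03-01': 1.0, '2024-03-08': 1.08, '2024-03-15': 2.28}
```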
We can automatically run tests when those metadata changes occur. At acceptance, how many UAT defects do they find? What are our DORA metrics for getting to production?
What's our customer satisfaction as well? So what impact has that had on our users?
So we can bring this together, and we bring it together through test analytics.
So we can take all that data and put it in a Salesforce org, or we can put it in Tableau CRM, or whatever they've changed the name to now, okay, or you can put it in a completely different application. We can take that and report across it; Salesforce is quite a good reporting tool. We can put quite a lot of data in it. We can actually start looking for correlations between the data.
So this is the Provar managed application, available on the AppExchange, free trial, thirty days, yada yada yada. But in it, you can slice and dice your information. You can look at things like what's changing in terms of which environments, which systems, how often, how frequently things break, how frequently things work, and what's the relative performance over time: is our code coverage and our quality coverage getting better? So we can look at all these things in Salesforce.
And this helps us understand that quantitative data. But because it's Salesforce, I can also add notes, I can Chatter on records, I can add qualitative information to this too.
So the reason I would do that is because in a previous career I used to go around companies doing CMMI, capability maturity model assessments. So we've put together something called the quality maturity model. CMMI, and maturity models in general, have five levels. I always add a level zero, because I keep coming across level zero.
Level zero: we've got nothing. But the five levels are: initial, which in our case I've defined as, it worked, but I've got no evidence it worked.
I can tell you what we did, but I can't tell you that it worked; and there's that level zero below it. Number two, defined. So we've scripted what it is we're gonna test, in this case, or what we're gonna observe, we've got the evidence, and I can repeat it. That's good.
A lot of people are still working towards this, still trying to get to defined.
Integrated: I can automatically execute tests as part of my CI/CD pipeline.
I can schedule regression testing as well as part of that. I understand that when I'm doing regression, I'm testing different things to when I'm testing an individual change.
I'm starting to look at DORA metrics. I'm starting to collect the data. I may not be doing anything with it yet, but I'm collecting it. And I've got a single solution for looking at my DevOps results, my manual tests, and my automated tests in one place.
Quantitatively managed, or just managed: we've started to think about automatic defect creation. So we observe something's gone wrong, we automatically create a defect, and it automatically gets routed to the right engineer based on the type of defect.
We automatically create test plans based on those changes I talked about. And we can look back historically at the quantitative data and understand where we may have a problem or a repeating issue occurring. So I can do something in a very managed way. And then the optimized level. I've never seen it, so don't worry.
Maybe it's aspirational. I can start to see evidence of continuous improvement, because that's what it's about. This is a journey we're on, continuously improving, using site reliability engineering. I've shifted left in my process to avoid finding bugs and focus on not having the bugs in the first place.
I'm looking at the whole DORA methodology. I've got the outcomes we saw on the right. I'm thinking about my staff and employee welfare, thinking about commercial and non-commercial outcomes.
So this is a thing you can measure. And when you do a CMMI, or a quality maturity model in this case, you'll measure multiple factors. So the factors I would choose for my industry: process, infrastructure, planning, scope.
So, how mature are the processes? I'd ask the questions: how mature are your processes for finding bugs and executing tests?
Is it ad hoc? Who does it? How often? Have you got scripts? Show me the scripts.
These are the questions I would ask, and I would rate between nought and five on that score. Then I would look at things like environments and tooling. Do they have enough sandboxes? Are they trying to do everything with one dev sandbox?
Okay?
Do they have the right tools? Do they have a proper DevOps product? Are they still doing things using the Metadata API or change sets?
How mature is their planning? Are they thinking ahead? Are they proactive or reactive in their test efforts?
How quickly can they adapt when someone says, we need this tomorrow?
What's the scope of the testing they're doing? So I look at things like, you know, are they just checking happy-day scenarios? Are they looking at performance? Are they looking at data quality? Are they looking at code quality? Are they thinking about user experience? Okay?
Are the QA team recognized as team members? I see this a lot. In a lot of companies, the people doing QA may be in the team doing other jobs as well, and they're potentially being overloaded with work.
So if people are doing testing, have they had the right training? Do they have a career path? Are they recognized?
Because we see a lot of devs, admins, and BAs doing the testing, because having QAs is still seen as a luxury in many companies. It's not easy to justify in a budget.
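As a rough illustration of how an assessment like that can be turned into numbers (the factor names follow the talk, but the scores and the summary logic here are invented), you might record a nought-to-five score per factor and summarise it:

```python
# Hypothetical QMM-style scorecard; scores are made up for illustration.
factors = {
    "process": 2,          # scripted tests exist but are run ad hoc
    "infrastructure": 3,   # enough sandboxes, proper DevOps tooling
    "planning": 1,         # mostly reactive test effort
    "scope": 2,            # happy-path only, little performance or data-quality testing
}

def maturity_summary(scores: dict[str, int]) -> str:
    average = sum(scores.values()) / len(scores)
    weakest = min(scores, key=scores.get)
    return f"average maturity {average:.1f}/5, weakest factor: {weakest}"

print(maturity_summary(factors))  # average maturity 2.0/5, weakest factor: planning
```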
So hopefully, from today, you've learned that quality is a journey. I hope you agree.
DORA metrics are only comparative measures, and DORA is a methodology. It's more than just the numbers; it's more than just metrics. Please go and have a look at DORA and learn about it.
Test automation is also integral to DevOps, platform engineering (which I forgot to speak about), and site reliability engineering.
Quick one on... nope, I haven't got time.
So quality is your journey, and it's unique to you.
Some links and QR codes there: the latest DORA report. You can go and have a look at that and download it. You don't have to pay anything.
They don't spam you too much. We've also got our own survey, so if you'd like to do a survey with Provar, use the bit.ly link or take a picture of the QR code. It's just a Google Form survey where you can find out more and submit your answers, and we'll come back to you with what we think about where you are in that quality maturity model. And then lastly, Gearset, who let me do this talk, have their State of DevOps report as well, specific to the Salesforce industry: what's considered good, what's considered elite performers.
If you'd like to know more about Provar, please come and see us upstairs, contact us on our website, or use our free training LMS, Launchpad, at provar.me; you can just sign up, get access, and start using all our content. Or connect with me on LinkedIn or Twitter.
I refuse to call it X. Any questions? Is there time for questions?
There must be one. We forgot to plant a question.
No.
Thank you very much.