Description
Agentforce can be a game-changer for AI-powered Salesforce interactions, but AI hallucinations can erode trust and impact business outcomes. In this session, you’ll learn the five most common causes of Agentforce hallucinations, how to assess your data to identify risks, and actionable strategies to ensure data readiness for effective agent interactions.
Mehmet Orun is GM and Data Strategist at PeerNova. He is a Salesforce veteran and data management subject matter expert with ~20 years in the ecosystem as a customer, consultant, Salesforce product and practice lead, ISV partner, and most recently, co-founder of the Data Matters Trailblazer Community. His mission is to empower organizations to make confident decisions with reliable data through process excellence, product innovation, and community collaboration.
Transcript
Excellent. Hello, everyone. Welcome to the last session of DevOps streaming. My name is Mehmet Orun. My day job is being the GM and data strategist for PeerNova. We are a Salesforce ISV partner and we work on data management challenges.
Today, we are here to talk about the five reasons for enterprise AI hallucinations, specific techniques on how to detect and mitigate them, and thinking about data operations as an extension of DevOps as a skill set.
Beyond my day job, I am a Salesforce data topics community leader. I talk and write about data management a lot. I invite any of you to reach out and connect if you want to continue the conversation.
The goal for today's session is to leave you with three bits of knowledge. What are the five causes of enterprise AI hallucinations?
What are the key components of enterprise AI solutions that give you the ability to mitigate them? And how can you assess your data, or if you're a consultant your clients' data, against the risk so you can mitigate it?
To be realistic, though, an event this full of content is not going to let you remember everything you saw. What I'm hoping is that you're going to feel better equipped based on the specifics in this presentation.
So you're going to choose to either watch the recording, or reach out and request the material. I sought to provide you with as many details as possible to guide you along your journey.
So, the five reasons in the presentation title for enterprise AI hallucination are: number one, your understanding of your customer, based on the available data in your systems, is incomplete.
We'll talk about some examples of this shortly.
Number two, you don't actually have the data in your system that you need, especially if people did not enter the data due to usability challenges, or interfaces had broken and some of the feeds were off. This is going to impact what you do.
Incomplete metadata is number three especially when we deal with reasoning engines.
Outdated data providing incorrect, out-of-date results is number four, and insufficient post-deployment monitoring of your data sources and your content is number five, and this is one of the items that is done the least.
Now, there's a lot of material out there that talks about the what. This is the what; the details are at the URL in a guidance article.
Given we're all that stands between you and the happy hour, if you want to say "this is what I was looking for," awesome, I won't mind. If you want to know why these are the top reasons and how you can actually deal with them, please stay with me for the next thirty minutes.
Thank you for not leaving.
So before we talk about AI hallucination risk, first we need to agree to what AI is.
So let's start with a conversation on what makes enterprise AI different from Gemini, ChatGPT, and what we are using as consumers. Can anyone shout out how you would define AI?
Volunteers? Anyone?
Okay.
Look, the simplest definition of AI, which is universally true regardless of technology, whether we go back to the nineteen eighties or now, is that it is the simulation of human processes by machines.
That's it.
Do you accept this definition? Any disagreements?
So, if we accept this definition here is a proposal for you that may be a little different than what you think about AI technology adoption today.
If AI is the simulation of human processes, then analytics engines and automation engines are versions of AI. Predictive AI, which we have known in the Salesforce ecosystem through Einstein Discovery and some of the prediction models, is AI. Summarization and content generation are a type of AI, and so is the ability to interact and reason.
So if you're thinking about enterprise AI solutions to support your organization, you need to realize all of these are part of the enterprise AI solution, and the good news is that whatever allows you to provide reliable data to any one of these solutions will help every other solution as well. So, a little bit of a foundation.
So to apply this to a use case, let's imagine that you work for a nonprofit and you want to engage your constituents in the most effective way possible. You get a donation.
The first thing you want to know for that donation is: is this a new donor, or is this someone we have seen before?
Now, the fact that your match rules may have a probabilistic nature does not change the fact that you're running an automated process. You want consistent, predictable outcomes every time you do this, and an indication of: do we think, with high confidence, we've seen this person before; do we think we might have seen this person before; or do they appear brand new?
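Those three tiers of match confidence can be sketched in a few lines. This is a minimal illustration, not any vendor's actual match-rule engine; the field names and thresholds are assumptions for the example.

```python
from dataclasses import dataclass

@dataclass
class Donor:
    name: str
    email: str

def normalize(s: str) -> str:
    """Lowercase and strip non-alphanumerics so comparisons are consistent."""
    return "".join(ch for ch in s.lower() if ch.isalnum())

def match_confidence(incoming: Donor, existing: Donor) -> str:
    """Return a tiered, repeatable match verdict: 'high', 'possible', or 'new'."""
    if normalize(incoming.email) and normalize(incoming.email) == normalize(existing.email):
        return "high"      # exact contact-point match: high confidence we've seen them
    if normalize(incoming.name) == normalize(existing.name):
        return "possible"  # same name only: route to review rather than auto-merge
    return "new"

on_file = Donor("Pat Jones", "pat@example.org")
print(match_confidence(Donor("Pat Jones", "pat@example.org"), on_file))    # high
print(match_confidence(Donor("pat jones", "p.jones@example.org"), on_file))  # possible
```

The point is the predictability: even a probabilistic engine should hand automation a stable verdict ("high", "possible", "new") rather than a raw score interpreted differently each time.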
Then, based on what you know about that person, you want to put them in their engagement journey. You want to figure out the next best action. Predictive AI solutions can tell you, not just based on the donation date but based on the person, their demographics, and which emails they have opted into in the past, what you should include in your message, as opposed to a generic "thank you, here is our tax ID for your deductions."
That is predictive AI.
The fact that you want to decide when to send out that message is something you are going to automate in your system, probably adjusted for the time zone based on where the person is, optimum time to open messages, and then you can use generative AI techniques to create the content that's going to speak the most based on what you know about the individual.
Multiple types of AI, all in one flow, for the single business purpose of making the person feel good about their donation, so they come back, donate, and engage more.
Behind every enterprise AI solution, we have three types of data that need to come together.
And this is one of the differences in the way it needs to be treated compared to, you know, putting your content into ChatGPT or any other generative LLM solution.
First is data about your customers or partners, suppliers, candidates. This tends to be structured data for the most part and this data is going to be distributed across multiple records, multiple objects, intentional and unintentional duplicates.
It exists and your goal is to have a complete understanding of that interaction in a contextual and compliant way.
Part of the context comes from how are they interacting with you. This is the transactional data.
If you have inside and outside sales teams, not every piece of data should be visible to every user group. So the permission set model idea we have in Salesforce CRM, between internal users and Experience Cloud users, has to carry forward when you are taking advantage of AI solutions and you are bringing unstructured data into the mix.
Your unstructured data can be the knowledge articles that are powering your chatbots, but it is also your emails, your Slack messages, the marketing messages that they might have responded to. This is what is going to give you the holistic insight.
And not just field metadata or object metadata: metadata about your unstructured sources is also incredibly important for enterprise AI.
If you want to associate meeting transcripts with an individual, simply put, you have two choices: I can copy and paste or save the file and load the file, or I can associate my meeting with the attendees of the meeting and the transcript as I feed it in. The first one gets me the content; the second one preserves the context.
So, an enterprise solution has to be mindful of all three.
If the AI technology you are currently working with does not allow you to ensure complete, consistent, current, contextual, compliant, and when possible correct (or at least not obviously incorrect) information, it's simply a journey on your roadmap. You do not have a complete stack yet.
Look, there are new AI vendors showing up every single day. It is hard to predict the future. However, I do believe that we can learn from the past and having been around a while, I have a point of view on what type of solutions are going to be easier to implement, easier to maintain especially from a DevOps perspective based on what has been happening historically.
But depending on where you are in your journey, you are probably starting with unstructured data and chatbots. This makes people familiar with the technology. You may be dealing with metadata, trying to assess technical debt. There are a number of solutions in the ecosystem.
These help with productivity. The efficiency we are seeking is going to come when we are acting on unified customer insights, which only sophisticated solutions can help deliver.
So, let's take a little bit of time travel through the journey for anyone that noticed the Tardis image at the beginning of the deck.
In the nineteen eighties we had batch integration tools. We built data warehouses. Everything was custom, we were working with relational databases, and we would deliver operational reports with year-long projects.
As the technology patterns evolved, the pattern you see is that functionality gets bundled together.
So, in the two thousands you weren't really buying a batch ETL tool anymore; you were buying an integration suite that does batch, messaging, and real time.
Going forward, the fact that you might need to select, implement, and integrate an MDM with a data warehouse and a varied set of integration tools was also too slow and too complicated, so we started seeing marketing CDPs.
Instead of having standalone OLAP and reporting engines we started seeing these come together. Tableau and CRMA getting closer and closer is a very good example of this.
Where we are now is the idea of modern data platforms also providing capabilities for unstructured data, while analytics platforms are not just providing a graphical interface but natural language processing interfaces and even voice interfaces.
Where I believe we are moving to today, and this is where companies like Salesforce are betting, is the idea of an enterprise AI and data platform that gives you access to all of these while giving you the choice of which features you want to use, which data sources you want to bring in, and how to leverage existing infrastructure in API gateways, data lakes, and other LLMs if you already have one that works, with the necessary security and stewardship controls.
One of the interesting things about this is if you look at this as an IT professional who has never seen a single Salesforce presentation, and then you compare it to the picture of Data Cloud, the functionality, not surprisingly, overlaps.
And for any of you that might have seen the earlier presentation on all of the things you need to worry about for Data Cloud, sandbox, and DevOps management:
if you've been in enterprise IT, you know that when you just need one set of metadata for all of the functionality in the same box, it is so much easier than figuring out how to do deployments for your MDM and data lake and ETL and federation and reports, because these are at least bundled.
So, my point of view: enterprise AI solutions are going to be led by platforms and extended through additional functionality, rather than bring-your-own.
One of the key capabilities you need in any enterprise AI platform is the ability to understand the data within it. So, a very quick intermission on a key data skill you should have if you don't have it already: data profiling.
Data profiling focuses primarily on structured data, and it will tell you the fill rates of your fields, distinct value counts, top and bottom value frequencies, and give you the ability to compare the data content in your source versus the configuration.
This takes observability to the next level where SQL queries are going to be insufficient, and it allows you to do this for a given business scenario, as opposed to "tell me what's happening in this database object" without context.
The reason this is incredibly important is that your user adoption assessment, your data reliability assessment, your scoping, and whether you have all the data that may impact the results all start with an understanding of your data.
Otherwise, you are taking bets that you don't know whether you are going to achieve business results or not.
There is a link to an article that talks about what data profiling is, and different techniques to select a tool if you don't have one already on Salesforce, for your post-conference reference.
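The basic profile the talk describes (fill rate, distinct count, top and bottom value frequencies) can be sketched with the standard library alone. This is a minimal illustration of the technique, not any particular profiling product; the sample data is made up.

```python
from collections import Counter

def profile_field(values):
    """Minimal field profile: fill rate, distinct count, top/bottom value frequencies."""
    filled = [v for v in values if v not in (None, "")]
    freq = Counter(filled)
    ranked = freq.most_common()
    return {
        "fill_rate": len(filled) / len(values) if values else 0.0,
        "distinct": len(freq),
        "top_values": ranked[:3],       # most frequent values
        "bottom_values": ranked[-3:],   # least frequent values
    }

# A phone field where a single dash masquerades as data, as in the talk's example.
phones = ["555-0100", "555-0100", "-", "-", "-", None, "555-0199", ""]
p = profile_field(phones)
print(p["fill_rate"])       # 0.75 ("-" counts as filled; the content check comes later)
print(p["top_values"][0])   # ('-', 3): the most common "value" is a single dash
```

Notice that the fill rate alone looks healthy; only the value-frequency view exposes that the most common entry is junk, which is exactly why fill rate and frequency belong in the same profile.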
So now that we have talked about the key capabilities of an enterprise AI and data platform, and we know we are going to need to understand our data to see if we have a hallucination risk, let's look at the same list again and talk about how you can assess whether you have a risk and what you can do to mitigate it.
Risk number one is incomplete understanding of the customer.
So, like the definition of AI, let's start by defining what it means to have a good understanding of the customer.
Would you say, based on what you see on the screen, a super simple data model, are these duplicates or not?
Any opinion welcome?
Don't be shy.
I see a shaking of head no.
I see that would be a duplicate.
Any consultants in the room to say it depends? Yes sir?
So, the statement, for everyone's benefit, is that they may not be duplicates, because the person is at two different companies.
Most of the conversations around data quality, or CRM data quality, seem to revolve around: do we have duplicates or not? And people make decisions based on what they think should happen, without business outcome context.
But "do we have duplicates or not" is the wrong question. The right question is: how do we want to understand and engage the people we are interacting with? So, in this example, these are two distinct business contacts that relate to a single individual, which means the way we treat these two records should allow us to both recognize the individual in one context and recognize the business contact in a different context. So whether you are doing automation workflows, generative AI, or agentic interactions, you have to actually know what data you are allowed to see and whether you are seeing what you should see. So when we talk about security rules, your identity resolution logic and the data security rules that go into it are a key part of that consideration.
If you're just dealing with individual records with these disconnects, it's not going to be great. You're going to have hallucinations. People are going to say, "what do you mean?" I had one of these with a hotel chain, not Marriott, yesterday. They said, "oh, we don't see a reservation for you, it must be a different department," even though the app was the one telling me, you know, you need to act on your reservation.
Good is when you at least have consolidation around business contacts.
Great is when you know your interaction with the individual across companies while being compliant and contextual with security mandates.
I have never seen a CRM org or any data source that is bereft of duplicates, intentional or unintentional. How do you find out what the problem is and what you can do about it? Well, you analyze your data and let your data tell the story, rather than discussion and debate. Data profiling is way faster in terms of producing outcomes, and you may not like the results, but that's what the data is telling you.
What I like to look at, because I don't know that a phone number field is going to have the data type phone or an email field the data type email, I don't assume, is contact point and string fields, checking whether they are less than one hundred percent unique but more than seventy or seventy-five percent. That's my heuristic.
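That heuristic, flag string fields whose uniqueness falls between roughly seventy-five percent and one hundred percent, is easy to sketch. The field names and sample values below are made up for illustration; the thresholds are the speaker's rule of thumb, not a standard.

```python
def uniqueness(values):
    """Share of filled values that are distinct; 1.0 means every value is unique."""
    filled = [v for v in values if v]
    return len(set(filled)) / len(filled) if filled else 0.0

def flag_contact_point_candidates(fields, low=0.75, high=1.0):
    """Flag string fields that are mostly-but-not-fully unique: likely contact
    points with duplication worth investigating."""
    return [name for name, vals in fields.items() if low < uniqueness(vals) < high]

fields = {
    "Email":  ["a@x.com", "b@x.com", "c@x.com", "a@x.com"],  # 3/4 = 0.75, on the boundary
    "Mobile": ["1111", "2222", "3333", "4444", "1111"],      # 4/5 = 0.80, flagged
    "Status": ["Open", "Open", "Open", "Closed"],            # 0.50, clearly categorical
}
print(flag_contact_point_candidates(fields))  # ['Mobile']
```

A fully unique field is probably an identifier; a half-unique field is probably a picklist or status; the interesting band in between is where contact points with duplicate or defaulted values tend to live.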
And then I see what it could tell me. Here is a subset of fields from a customer I had worked with, used with permission.
What we found out is the primary phone field was unique only forty percent of the time.
If you know how phone number or country fields get defaulted when you convert a lead to a contact, the contact is going to inherit the attributes of the account.
Most of the time this was the corporate phone number, and since you can have more than one person with the same first name and last name in a company of reasonable size, knowing this would mean that we may have incorrect matches. Raise your hand if you watched a recent Agentforce demo in the last nine months at a Salesforce event.
The demo starts with a phone call, and the agent says, hello, I know who you are. Well, in this case they wouldn't be able to tell who the caller was sixty percent of the time, right? Because if you use the phone field for that interaction, it's not going to work.
But when we look at the contact data fields, whatever they may be (insufficient metadata, number three), when we look at which email we can use, I can quickly get a sense that I have a risk. My minimum exposure for data completeness is fifteen percent; I take the lowest number in this particular range. What we should do is see, if we brought this data together, how much we could improve it.
It is, however, not enough just to look at field names, because fill rate is a metadata property.
What you also need to look at is: is the data behind those fields reliable?
Another real example used with permission.
For one phone number field, in twenty-four thousand plus of the cases the phone number was equal to a single dash.
It's going to impact your match rules; it may not impact an automation use case; but you need to know which data needs to be scrubbed and what data may need to be excluded in order to build effective solutions, and these all need to be part of the methodology in data operations.
Once you do the analysis, the good news is that the amount of time it takes to normalize the data, remove known bad values, and retain the context using a solution like Data Cloud data transforms, if you know what you are doing, is like one to five days.
And I've done this enough times. We give workshops on this at community events. This is a subset of the exact same flow you would set up whether you have two or twenty contact points that you are seeking to bring together.
And if you do that you now have the ability to unify your data to have a complete understanding.
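The scrub-and-normalize step described above can be sketched as a tiny transform. This is an illustration of the idea, not Data Cloud's actual transform syntax; the list of bad placeholder values is a hypothetical example of what profiling might surface.

```python
import re

# Hypothetical junk placeholders discovered during profiling (like the single dash).
KNOWN_BAD = {"-", "n/a", "none", "unknown", "000-000-0000"}

def scrub_phone(raw):
    """Normalize a phone value and drop known bad placeholders.
    Returns digits only, or None meaning 'exclude from matching'."""
    if raw is None:
        return None
    value = raw.strip().lower()
    if value in KNOWN_BAD:
        return None                    # scrub the single-dash style junk
    digits = re.sub(r"\D", "", value)  # keep digits only so formats compare equal
    return digits if len(digits) >= 7 else None

print(scrub_phone("  (555) 010-0199 "))  # 5550100199
print(scrub_phone("-"))                  # None
print(scrub_phone("555-0100"))           # 5550100
```

The payoff is that "(555) 010-0199" and "555.010.0199" now match each other, and the twenty-four thousand dashes no longer match anything at all.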
But what data do you have within that complete understanding? That is when we talk about risk number two: missing data at either the record or the field level.
One of the items I want to leave you with is that field usage decreases over time, as processes change, as people change, as requirements change.
So, if you are working on any type of Agentforce deployment, you need to know which fields are reliably populated today that power the business outcome.
You would look at fill rates. That doesn't mean ignore everything that is populated less, because those exceptions may be valuable for, like, the most impactful cases, but be mindful of your field fill rates when you do your prompt design and guardrail design within the prompts.
Also, just because a field is populated, do not assume that it is reliably populated. My favorite example is a field that is a hundred percent populated with the default value, because it's a required field with a validation rule.
These are going to be worthless and you do not want to include them in your design.
And if the agents do not notice, they may try to leverage data content based on the field name or description alone. So be explicit, sometimes, about the fields you want or don't want to use.
The other thing you can look at, which may explain why you are not capturing certain data elements, is whether what was configured was exposed well in the user interface. When you compare the available picklists you configured, for example, to what is actually being captured by end users, or what was captured by a migration process, you can quickly see when something appears to be wrong, and this helps you identify a risk, see if it is an issue, and come up with a mitigation plan.
If you go through this process, you can see the payoff. When I worked with one particular customer, we started with five hundred eighty-nine fields in the contact object alone and ended up with only seventy-nine fields that mattered.
What this helped us with is identifying the fields that matter. You either compare successful versus unsuccessful outcomes, or you track what data gets captured across the stages of a transaction, you know, prospecting, discovery, proposal, or it's a new case, it's being worked on, it's been escalated. That is going to tell you the type of information that seems to be important for this business function. Then you can use it to optimize your UI design, you can use it to optimize your help text, and you can discover which data management rules are actually backfiring and producing bad results. This is all about getting your data ready for any type of AI solution.
With profiling tools, depending on what you're using, this is a comparative profiling feature.
You look at how distinctly populated two fields are between two different scenarios.
The recommendation I make is: whenever the difference is more than five to fifteen percent, interesting things are happening.
Look at that list and use your knowledge to create a series of formula fields, which Salesforce data quality dashboards have talked about having for fifteen years. This is how you guide what should be in your formula. Because the moment you define something like this, not only are you going to know, at a record level, what important field completeness is, and can tell your prompt evaluator, hey, if I don't have good enough data on this record, do not run an automated flow, send it to a human; you can also guide the human to say, hey, can you make sure we are asking for these data points so we can use automation or agentic AI solutions in the future.
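The record-level completeness score and the automate-or-escalate routing described above can be sketched like this. The key fields, weights, and threshold are all illustrative assumptions; in practice they would come from the field analysis just described.

```python
# Hypothetical "fields that matter" for one use case, with illustrative weights.
KEY_FIELDS = {"email": 3, "phone": 2, "industry": 1}

def completeness_score(record):
    """Weighted share of key fields actually populated on a record (0.0 to 1.0)."""
    total = sum(KEY_FIELDS.values())
    have = sum(w for f, w in KEY_FIELDS.items() if record.get(f))
    return have / total

def route(record, threshold=0.7):
    """Send well-populated records to automation, the rest to a human with guidance."""
    if completeness_score(record) >= threshold:
        return "automate"
    missing = [f for f in KEY_FIELDS if not record.get(f)]
    return "human review: ask for " + ", ".join(missing)

print(route({"email": "a@x.com", "phone": "5550100", "industry": ""}))  # automate
print(route({"email": "", "phone": "5550100", "industry": "Retail"}))   # human review: ask for email
```

The same score serves both audiences the talk mentions: the prompt guardrail gets a single number to gate on, and the human gets told exactly which data points to ask for.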
The other benefit of going from five hundred eighty-nine fields to seventy-nine for your use case: how many of you have completed and documented your data dictionary, including data owner and classifications?
I see zero hands. Okay. Is it easier to do that for seventy-nine fields or five hundred eighty-nine fields?
Especially if you know those fields are going to be used in a new business solution.
Metadata is essential in agentic AI because ultimately you're leveraging a reasoning engine. When you have a field name such as "score" and you don't have a definition behind it, using it incorrectly is as bad as not using it at all.
So for the fields indicative of business outcomes, you want to make sure that you have a business data owner to verify that you have the right definitions and to evaluate whether this is the data you meant to capture.
You also want to make sure that this is your starting point to be able to maintain it.
Number four, outdated data skewing results. Now, this is a structured data example, but you also need content curation processes for unstructured data, where the problem may not be as easy to spot.
I love knowing how much of my data has not been touched in more than three years. In this case the example is ninety-one percent. If I bring this data in, is it going to be irrelevant? If I bring it in, I'm definitely paying for it, because it's a consumption model with any technology these days.
And if it is not necessary, should I just archive it? Which, by the way, frees up my no-longer-used fields, reduces my overall cost, increases performance in my CRM solution, and gives me only the more relevant data in my AI platform.
Because I archived it, if I ever need to go back and bring in, let's say, three-year-old data, because two years was not enough, I can always do so. I haven't lost it, and maybe I experiment with it using zero copy.
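A staleness check like the "not touched in more than three years" analysis above is a one-function job. The records, dates, and three-year cutoff are illustrative; the cutoff in particular is a policy choice, not a standard.

```python
from datetime import date, timedelta

def archive_candidates(records, today, years=3):
    """Return the ids of records untouched for `years`, plus the stale ratio."""
    cutoff = today - timedelta(days=365 * years)
    stale = [r["id"] for r in records if r["last_modified"] < cutoff]
    ratio = len(stale) / len(records) if records else 0.0
    return stale, ratio

records = [
    {"id": "001", "last_modified": date(2018, 5, 1)},
    {"id": "002", "last_modified": date(2024, 2, 1)},
    {"id": "003", "last_modified": date(2017, 9, 9)},
]
ids, ratio = archive_candidates(records, today=date(2025, 6, 1))
print(ids)               # ['001', '003']
print(round(ratio, 2))   # 0.67
```

Running this before ingestion tells you both what to archive and how much consumption-based cost you avoid by leaving it out.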
And finally, number five. Let's envision you followed the guidelines, you chose a platform, you configured everything right, your data model is beautiful, you go live, you go on vacation.
We monitor for things in DevOps, right? You also need to monitor things in DataOps.
Could you find out if the data volume in your system, which had been growing steadily, all of a sudden had an unexpected drop or a random spike?
Maybe it's not about data volume. What if you used to capture five distinct values for the last several years and now you have fifteen distinct values? Was that a data migration error where they forgot to standardize?
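Both monitoring checks just described, volume anomalies and distinct-value drift, can be sketched in a few lines. The three-sigma rule and the sample numbers are illustrative assumptions, not a prescription.

```python
from statistics import mean, stdev

def volume_alert(daily_counts, latest, sigmas=3.0):
    """Flag an unexpected drop or spike: the latest count sits far outside
    the recent trend (a simple three-sigma rule)."""
    mu, sd = mean(daily_counts), stdev(daily_counts)
    return abs(latest - mu) > sigmas * sd

def distinct_count_alert(baseline_distinct, current_distinct):
    """Flag when a field suddenly has more distinct values than the baseline,
    e.g. an unstandardized migration load."""
    return current_distinct > baseline_distinct

history = [100, 104, 98, 102, 101, 99, 103]
print(volume_alert(history, latest=55))    # True: a sudden drop
print(volume_alert(history, latest=101))   # False: within the trend
print(distinct_count_alert(baseline_distinct=5, current_distinct=15))  # True
```

The point is that both signals are cheap to compute from profiling output you already have, so there is little excuse for skipping post-deployment monitoring.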
When we are doing deployments and release management, especially when we are testing in a sandbox, being able to test data content before and after migrations is something every organization should do. And the nice thing is, if you're looking at the UI, there's nothing custom about this. This is a standard out-of-the-box feature of an ISV solution.
So, I want to make sure we have a few minutes left for questions, and we are on time.
My goal was to help you get a sense of the five causes of AI hallucinations. Thumbs up if we covered this.
Feedback?
Do you feel like you can think about enterprise AI solutions differently from what Gemini or ChatGPT does?
Have you learned a new skill, and how to use that skill to assess AI hallucination risk and augment your DevOps skills?
If you don't remember this tomorrow don't worry it's been a long productive two days. I hope you feel this was impactful so when DevOps streaming shares the video you will take a look at it.
Here is my LinkedIn profile QR code if you want to reach out. I'll send you the full deck. It has links to the guides underneath it and if you want to have a chat about this I'm always here for it as well. With that, that's the end of the presentation and I'm here for any questions you may have. Thank you.
Yes sir.
So the question is how is the C suite responding to the question of enterprise AI?
And what I'm seeing is executives are feeling a sense of urgency that they need to embrace AI and have an AI strategy because they are afraid of being left behind.
They are feeling the pressure but because of the unknown cost and because technology is changing so much there is significant hesitance in how to go through the journey.
Doing experiments on the left hand side is incredibly easy, right? You do LLMs, you do chatbots, the cost factor is lower.
If any of you were around in the nineteen nineties in IT, this is like people starting to build their own Access databases, which ends up increasing risk and increasing uncertainty, but at a business unit level people felt like it got the job done.
I'm seeing a lot of appetite for experimentation.
According to Salesforce, the number one reason AI agents did not go to production or get used in production last year (and the ratio was that a little over eighty percent of what was developed did not get used) was mistrust in the underlying data reliability.
Now, mistrust is interesting, because if you never measure it, you cannot prove the data is going to be reliable.
Many of these projects did not have a data readiness assessment component, which is now being incorporated: if you look at the latest trail, Salesforce is recommending that you assess your data and focus on what's reliable.
The second part: those that did assess it discovered there were challenges, and that is what the holistic roadmap is going to address.
Another prediction: by the time Dreamforce comes this year in October, you're going to be seeing stories not just of agents providing summaries, but of agents acting on unified insights, where that data came from multiple instances or multiple data sources, combining structured and unstructured data. And these sessions need to be not just a vision but practical implementation stories, because we as practitioners are getting more and more reps in this space.
As executives are presented with use cases that give them a higher degree of confidence, I think we are going to see broader adoption of these capabilities.
Let's see.
So the question is, do you need to sell the whole stack? And this, to me, is what the Salesforce play is, right? They already have reports and analytics, they already have workflow, they have some of your business data, and they provided the capability, with Data Cloud, to bring in and unify the data if you do not have an effective data lake or data warehouse, and/or augment it with data from your data lake or data warehouse using those data capabilities, so you can act on it faster.
There are a number of features that are still needed, and we will keep seeing more and more items on the roadmap, but the reason the platform play is important to me (and my day job is being a P&L leader) is that I don't want to invest resources in learning lots of different technologies, then in how to integrate those lots of different technologies, and then in continuously checking their data processing agreement changes to make sure we remain compliant, because I decided on a BYO AI approach.
As an executive, my perspective on this is that I want to go forward confidently. I want to run multiple experiments, but what I am going to use in production has to be based on verifiably reliable data that is continuously monitored and maintained, which is why I think DevOps teams are going to be asked to take on additional DataOps-type capabilities, also running on the platform.
I do think we are out of time just for being able to manage the room but I'm happy to stay to take any more questions or chat in general. I want to say thank you once again and let's continue the conversation.