4 variables that make AI visibility scores differ across tools
A complete answer to "Why do two AI visibility tools never give your brand the same score?"
AI search visibility scores have become a North Star metric for many teams. You will probably hear this question in 2 cases:
You or your customer decided to use 2 or more tools to monitor brand performance in AI search (to generate more insights).
You are choosing a tool and comparing several of them over a couple of months to determine which one best fits your needs.
In both cases, keep in mind the 4 variables that influence these scores.
1. Which prompts do you track?
AI search tracking tools often use AI to generate prompts for your brand. Even if you enter the same keywords to get prompt ideas, each tool will suggest a different list of prompts.



The only way to have the same list of prompts in different tools is to enter all of them manually and ignore suggestions.
However, this is hard to maintain: as you work with the tools, new ideas come up, you add new prompts to one tool, and you forget to add them to the other.
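One way to catch this drift is to export the prompt list from each tool from time to time and compare them. Below is a minimal sketch, assuming each tool lets you export prompts as plain text. The function names `normalize` and `prompt_drift` are hypothetical and not part of any tool's API.

```python
def normalize(prompt: str) -> str:
    """Lowercase and collapse whitespace so trivial differences don't count."""
    return " ".join(prompt.lower().split())


def prompt_drift(tool_a: list[str], tool_b: list[str]) -> dict[str, set[str]]:
    """Return prompts tracked in only one tool, plus the shared ones."""
    a = {normalize(p) for p in tool_a}
    b = {normalize(p) for p in tool_b}
    return {
        "only_in_a": a - b,  # added to tool A, forgotten in tool B
        "only_in_b": b - a,  # added to tool B, forgotten in tool A
        "shared": a & b,     # safe to compare scores on these
    }


if __name__ == "__main__":
    # Made-up prompt lists for illustration only.
    drift = prompt_drift(
        ["Best SEO tool", "best  rank tracker"],
        ["best seo tool", "top site audit tools"],
    )
    for bucket, prompts in drift.items():
        print(bucket, sorted(prompts))
```

Comparing scores only on the "shared" prompts removes this first variable from the comparison.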
2. Which AI chats do you track?
In the old world of keyword rank tracking, you needed to track only Google in most cases. And Google core updates didn’t change anything in the tracking process.
Now we have:
multiple chats;
multiple models within each chat.
It’s hard to run a fair test in different tools because:
1) The pricing structure varies a lot. For example, Promptwatch includes all LLMs in each plan; Profound includes only 3 LLMs even in the $399/mo plan; Peec includes Claude only in the custom plan, and so on.



Buying 3 enterprise plans just to run a complete comparison doesn’t make sense.
2) It’s hard to find out which specific LLM version each tool uses. This variable is the most underrated.

Over the last 2 years, Anthropic and OpenAI have released around 5-10 model updates. Different models produce different outputs, so two tools that query different versions of the same chat can report different scores.
3. How does an AI visibility tracker actually track results?
Here is a good explanation of 4 methods of prompt tracking from Ethan Smith. In our case, most of the differences between tools come down to the 3rd and 4th methods, because I don’t know of any tools that use the first two.
And again, the problem is that AI tracking tools often don’t share which method they use. Spend time on the home page and the pricing page to find out. Tools that query the real UI rather than an API are proud of it and use it as a differentiator.
4. How does the AI visibility formula work?
The basic visibility score is easy to calculate: (number of answers that mention the brand / number of all answers) × 100%. This is the most popular approach to tracking visibility across tools, and if the previous variables are similar, you’ll see similar scores across the tools.
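The basic formula can be sketched in a few lines. This is an illustration only: the function name `basic_visibility` is hypothetical, and a plain case-insensitive substring match stands in for whatever brand detection a real tool uses.

```python
def basic_visibility(answers: list[str], brand: str) -> float:
    """Share of answers that mention the brand, as a percentage."""
    if not answers:
        return 0.0
    # Assumption: a case-insensitive substring match counts as a mention.
    mentions = sum(1 for answer in answers if brand.lower() in answer.lower())
    return 100.0 * mentions / len(answers)


if __name__ == "__main__":
    # Made-up answers with placeholder brand names.
    answers = [
        "Try Acme or Globex.",
        "Globex is popular.",
        "acme is a solid pick.",
        "No clear winner.",
    ]
    print(basic_visibility(answers, "Acme"))  # 2 of 4 answers -> 50.0
```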
However, the problem is that this formula doesn’t represent the real world. People delegate a lot of decision-making to AI chats, and being mentioned is only the first step.
The second step is to be mentioned first and as the best solution. That is what most brands aim for, and that’s why I believe AI tracking tools will implement more sophisticated formulas that include the brand position and context.
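To show what a position-aware formula could look like, here is a rough sketch. It is my own assumption, not the formula of any existing tool: a brand mentioned first in an answer gets a weight of 1, second gets 1/2, third gets 1/3, and an answer with no mention gets 0. The function name and the 1/rank weighting are hypothetical, and the sketch ignores context (whether the brand is recommended or criticized).

```python
def position_weighted_visibility(
    answers: list[str], brand: str, competitors: list[str]
) -> float:
    """Average of 1/rank across answers, as a percentage (hypothetical formula)."""
    if not answers:
        return 0.0
    brands = [brand, *competitors]
    total = 0.0
    for answer in answers:
        text = answer.lower()
        # Position of the first occurrence of each brand; -1 means not mentioned.
        found = [(text.find(b.lower()), b) for b in brands]
        order = [b for pos, b in sorted(found) if pos >= 0]
        if brand in order:
            total += 1.0 / (order.index(brand) + 1)  # rank 1 -> 1.0, rank 2 -> 0.5
    return 100.0 * total / len(answers)


if __name__ == "__main__":
    # Made-up answers with placeholder brand names.
    answers = [
        "Acme is the best, Globex is second.",  # Acme ranked 1st -> 1.0
        "Globex leads, then Acme.",             # Acme ranked 2nd -> 0.5
        "Only Globex here.",                    # Acme not mentioned -> 0.0
    ]
    print(position_weighted_visibility(answers, "Acme", ["Globex"]))
```

With the basic formula, Acme would score about 67% on these three answers. The position-aware sketch gives a lower number because one of the two mentions is in second place.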
The pitfalls of focusing on only one metric
Given the 4 variables above, setting a goal to increase your AI search visibility score is tricky.
You can say "let’s grow from X (the baseline) to 100%", but you should remember that:
1) Each time we add new prompts, the Visibility Score changes, because the set of prompts it is calculated over has changed.
If we focus too much on improving the score, we’ll be tempted to avoid adding new prompts where we know we have low visibility.
2) Prompts also have different values for us.
We have to keep an eye on whether we are growing visibility for the most valuable prompts. That’s why I believe each visibility report by prompt should contain some metrics from GSC and Google Ads for the top keywords used in that prompt (we’ll add it to the AI search visibility checker at Sitechecker soon).
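Both pitfalls are easy to see with numbers. The sketch below uses made-up figures. The `PromptResult` type and the `value` field are hypothetical: the value could be derived from GSC or Google Ads data, but how to derive it is left open here.

```python
from dataclasses import dataclass


@dataclass
class PromptResult:
    prompt: str
    visibility: float  # % of answers mentioning the brand for this prompt
    value: float       # hypothetical business weight of the prompt


def average_visibility(results: list[PromptResult]) -> float:
    """Plain average: every prompt counts the same."""
    if not results:
        return 0.0
    return sum(r.visibility for r in results) / len(results)


def value_weighted_visibility(results: list[PromptResult]) -> float:
    """Average weighted by prompt value: valuable prompts count more."""
    total_value = sum(r.value for r in results)
    if total_value == 0:
        return 0.0
    return sum(r.visibility * r.value for r in results) / total_value


if __name__ == "__main__":
    baseline = [
        PromptResult("high-value prompt A", visibility=80.0, value=10.0),
        PromptResult("high-value prompt B", visibility=60.0, value=10.0),
    ]
    # Adding one low-visibility, low-value prompt to the tracked set.
    expanded = baseline + [PromptResult("new niche prompt", visibility=10.0, value=1.0)]

    print(average_visibility(baseline))   # 70.0
    print(average_visibility(expanded))   # 50.0
    print(round(value_weighted_visibility(expanded), 1))
```

The plain average drops from 70 to 50 even though nothing got worse for the brand, while the value-weighted version moves much less because the new prompt carries little weight.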
Let me know what I missed.