Visual attention and saliency

Eye tracking is a well-known method used in high-budget marketing campaigns to assess how visuals drive the observer's attention. However, eye tracking is time-consuming and costly, requiring many people to take part in viewing experiments. Predictive models of how humans perceive and observe visuals make it possible to analyse designs and compositions instantly.

Computational saliency models have demonstrated high accuracy in predicting the first seconds of eye-tracking experiments using only the visual information in the image. They perform a purely graphical analysis to predict eye-catching elements, without taking any observer intention into account.

[Figure: Saliency prediction for a Reebok advertisement.]
[Figure: Eye-tracking heatmap from the advertising assessment.]

Abraia implements one of the most advanced saliency models available: AWS (Adaptive Whitening Saliency), by Antón García-Díaz [1]. Its accuracy is over 90% of that achieved in an eye-tracking audit experiment. It has demonstrated top performance in predicting visual fixations on large-scale datasets covering a great variety of scenes (MIT saliency benchmark). Recent third-party reviews by top researchers in the field also support this statement [2][3].

[Figure: Adaptive Whitening Saliency model diagram.]

Prediction of visual attention

Visual attention is a complex preprocessing step that enables biological systems to select the most relevant regions from a scene, while higher-level cognitive areas perform complex processes such as scene understanding, action selection and decision making.

Accordingly, visual attention is categorized into two distinct functions:

  • Bottom-up attention refers to externally driven factors - the scene - that highlight salient image regions that are different from their surroundings.

  • Top-down attention refers to task driven factors, based on prior knowledge and intentions of the observer.

Abraia focuses on raw saliency due to the physics of the scene (bottom-up attention, free of biases). This factor is assumed to be universal (invariant across individuals) and determined by image content alone. This makes it clearly the most interesting attention driver for design purposes: you cannot modify the biases of the human visual system, and you usually cannot be sure of the intentions and experience of your observer, but you can modify a color or a texture to catch attention by boosting saliency at the desired point.

Based on state-of-the-art computer models for saliency prediction, it reproduces different adaptation, aggregation, and pop-out mechanisms observed in psychophysical experiments (i.e. decomposing images in a manner similar to the neural responses observed in the visual cortex and applying operations that take place all along the visual pathway).

This makes Abraia a suitable tool for working with general purpose visuals and compositions, since it does not make any assumption on scene category or observer intentions.

Comparison to black box approaches

Models that resort to extensive machine learning usually omit any reference to psychophysical validation beyond prediction of fixations. They are black boxes that learn where fixations go on a specific dataset of images. As a result, they learn mostly biases of the human visual system (e.g. the tendency to fixate at the center) or biases of the data used for training.

This is important because biases associated with the "gist of the scene" do not generalize to different scenes, so the performance of machine learning approaches degrades as scene variety increases. These factors limit the true accuracy and applicability of those approaches for general design optimization purposes.

These models usually report their accuracy through raw AUC (or NSS) metrics, which do not take these biases into account. For example, a generic map representing center bias (maximum "saliency" in the center, decreasing towards the borders) achieves very good AUC values without taking anything about the image content into account. However, it is widely accepted that sAUC, a modified metric that discounts these biases, is the best approach to benchmarking saliency models. Under sAUC, the same center-bias map performs no better than a random map.
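The effect can be reproduced with a small synthetic experiment. The sketch below is not the benchmark code: it reads sAUC as the shuffled AUC, where negative samples are taken at fixation locations recorded on other images, and it replaces real eye-tracking data with Gaussian, center-biased fixations. The function names, image size and sample counts are all illustrative assumptions.

```python
import math
import random


def auc(positives, negatives):
    """Chance that a positive scores above a negative (ties count as half)."""
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def center_bias_map(width, height):
    """Content-free 'saliency': highest at the center, lower towards the borders."""
    cx, cy = (width - 1) / 2, (height - 1) / 2
    return [[-math.hypot(x - cx, y - cy) for x in range(width)]
            for y in range(height)]


def sample_fixations(rng, count, width, height, sigma):
    """Synthetic fixations (assumption: Gaussian spread around the image center)."""
    points = []
    while len(points) < count:
        x = round(rng.gauss((width - 1) / 2, sigma))
        y = round(rng.gauss((height - 1) / 2, sigma))
        if 0 <= x < width and 0 <= y < height:
            points.append((x, y))
    return points


rng = random.Random(0)
WIDTH, HEIGHT = 64, 64
saliency = center_bias_map(WIDTH, HEIGHT)
fixations = sample_fixations(rng, 200, WIDTH, HEIGHT, sigma=8)         # this image
other_fixations = sample_fixations(rng, 2000, WIDTH, HEIGHT, sigma=8)  # other images

positives = [saliency[y][x] for x, y in fixations]
# Raw AUC: negatives are all pixels of the image (a simplification).
uniform_negatives = [value for row in saliency for value in row]
# Shuffled AUC: negatives share the same center bias as the positives.
shuffled_negatives = [saliency[y][x] for x, y in other_fixations]

raw_auc = auc(positives, uniform_negatives)
shuffled_auc = auc(positives, shuffled_negatives)
print(f"raw AUC: {raw_auc:.2f}  shuffled AUC: {shuffled_auc:.2f}")
```

Because the positives and the shuffled negatives come from the same center-biased distribution, the map gains nothing from its central peak and the shuffled score falls to around chance level, while the raw score stays high.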

[Figure: AWS performance with sAUC in the MIT saliency benchmark.]

In contrast, we provide you with a measure of saliency free of bias. Moreover, since our model is not a black box we can trace back which features are firing (or failing to fire) attention and give you visual advice on how to modify your design to meet your goals.

User attention and design

The first few seconds of exposure to a scene have a significant impact. Designers must constantly analyze their compositions to know where and how they are directing the user's attention, because this determines the message the user perceives at first glance.

But when you create a design, it is very difficult to view and analyze it honestly. Your eyes will tend to look at your favorite components: that logo that you spent hours perfecting, the texture that you are so proud to have created from scratch, or the headline that you carefully crafted with pixel-perfect kerning. So you may need an objective tool to be sure that your design is viewed the way you intend.

Because it is so important to intentionally direct user attention to the parts of the design that merit it, objective analysis is critical.

Abraia is a powerful tool that can drive the design process: refining layouts, placing components, defining the key message, and so on. Use the Abraia console to confirm that you are achieving your visual goals or to guide revisions, uploading your images to easily optimize the composition. You may also use it simply to demonstrate to your customers that the design you are proposing is a good choice.

Abraia implements state-of-the-art perception models to provide instant feedback about how your compositions and designs drive the user's attention at first glance.

Introducing visual attention predictions into your standard workflow will help you critically analyze a design to see whether it is meeting your goals. Of course, software will never be as good as a focus group of fifty users, but it is definitely a good alternative for those short on time and money, and it is always valuable to see an objective viewpoint.

Visual attention optimization

Visual attention is a broad concept generally divided between bottom-up and top-down factors. Top-down attention deals with high-level cognitive factors that make image regions relevant, such as task demands, emotions, and expectations. Saliency mainly refers to bottom-up processes that render certain image regions more conspicuous: for instance, image regions whose features differ from their surroundings.

An image detail appears salient when one or more of its low-level features (e.g. size, shape, luminance, color, texture, or motion) differs significantly from its variation in the background.
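To make the idea of feature contrast concrete, the toy sketch below scores each pixel by how far its luminance deviates from the image mean. It is a deliberately crude stand-in that looks at a single feature and a global background estimate; it is not the AWS model, and the function name and input layout (rows of luminance floats) are assumptions made for illustration.

```python
def contrast_saliency(luminance):
    """Score each pixel by its absolute deviation from the mean luminance.

    luminance: list of rows of floats. Returns rows of floats in [0, 1].
    """
    values = [v for row in luminance for v in row]
    mean = sum(values) / len(values)
    deviation = [[abs(v - mean) for v in row] for row in luminance]
    peak = max(max(row) for row in deviation)
    if peak == 0:
        # Perfectly uniform image: nothing stands out.
        return [[0.0 for _ in row] for row in luminance]
    return [[d / peak for d in row] for row in deviation]


# A dark, uniform background with one bright detail in the middle.
image = [
    [0.2, 0.2, 0.2],
    [0.2, 0.9, 0.2],
    [0.2, 0.2, 0.2],
]
saliency = contrast_saliency(image)
```

The bright detail receives the highest score because it is the pixel that differs most from the rest of the image, which is the pop-out behaviour described above in its simplest form.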

Correspondingly, saliency models provide purely image-based visual attention predictions, and the visual attention heatmap shows you the eye-catching elements. In this view, areas with a deep red overlay are most likely to be seen, while areas with no coloring are likely to be ignored. Note that this map predicts fixation volume, not fixation duration.
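A heatmap view of this kind can be approximated by tinting each pixel red in proportion to its predicted saliency. The sketch below assumes the image is given as rows of (r, g, b) tuples and the saliency map as rows of floats in [0, 1]; the function name and the `max_alpha` parameter are hypothetical and do not describe how the Abraia console renders its heatmaps.

```python
def overlay_heatmap(image, saliency, max_alpha=0.7):
    """Blend a red overlay over the image; opacity grows with saliency."""
    out = []
    for image_row, saliency_row in zip(image, saliency):
        out_row = []
        for (r, g, b), s in zip(image_row, saliency_row):
            alpha = max_alpha * min(max(s, 0.0), 1.0)  # clamp saliency to [0, 1]
            out_row.append((
                round((1 - alpha) * r + alpha * 255),  # pull red towards 255
                round((1 - alpha) * g),                # fade green
                round((1 - alpha) * b),                # fade blue
            ))
        out.append(out_row)
    return out


grey = [[(100, 100, 100), (100, 100, 100)]]
tinted = overlay_heatmap(grey, [[0.0, 1.0]], max_alpha=1.0)
```

With zero saliency the pixel is left untouched (no coloring), and with maximum saliency it takes the full red overlay.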

[Figure: Abraia heatmap for a Reebok spot.]

Thus, visual attention is mainly a subconscious process guided by the composition of the design. Abraia provides a fast and accurate way to predict where viewers will focus their attention in a subconscious process. The software instantly creates an “attention heatmap” of your image that predicts where someone would look during the first few seconds of their view (replacing eye-tracking studies).

Visual clutter score

The visual clutter score is an instant metric that tells you how a design is perceived. It answers the question of how clean and clear your designs are in an objective way, rating any design from 0 to 10.

A score of 0 indicates an extremely clean and clear design whereas a score of 10 means your design is extremely cluttered (this score is strictly evaluating the visual clutter of the design, not the clarity of your messaging).

Cleaner designs tend to provide more pleasant experiences for users, and achieve higher engagement and conversion.

In most cases, a low score is desirable. However, clutter needs to be evaluated in context. Some designs are more cluttered than others because they contain more content.

Optimizing designs and mockups

With the attention heatmap you get a prediction of which content people will see or miss during the first 3 seconds after they arrive on the page. This makes it a great way to determine whether or not your page effectively directs user attention to conversion-critical content.

Heatmaps provide valuable information about what attracts attention and what distracts viewers from the core content of your image or design.

With the opacity map view you get illuminated areas which represent content that is likely to be within the “foveal view” or direct sight (not in peripheral vision) within the first 3 seconds of a user landing on the page. Shaded areas of the image are “blind spots”.
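One simple way to picture the opacity map is to scale the brightness of each pixel by its predicted saliency, so that salient content stays lit and the rest falls into shade. This is a sketch under assumptions: the image is a list of rows of (r, g, b) tuples, saliency values lie in [0, 1], and the `shade` parameter (the brightness kept where saliency is zero) is invented for the example.

```python
def opacity_map(image, saliency, shade=0.25):
    """Keep salient pixels bright and darken the rest ("blind spots")."""
    out = []
    for image_row, saliency_row in zip(image, saliency):
        out_row = []
        for (r, g, b), s in zip(image_row, saliency_row):
            s = min(max(s, 0.0), 1.0)           # clamp saliency to [0, 1]
            keep = shade + (1.0 - shade) * s    # fraction of brightness kept
            out_row.append((round(r * keep), round(g * keep), round(b * keep)))
        out.append(out_row)
    return out


pixels = [[(200, 100, 40), (200, 100, 40)]]
shaded = opacity_map(pixels, [[0.0, 1.0]])
```

Here the first pixel (saliency 0) is reduced to a quarter of its brightness, while the second (saliency 1) is left as it is.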

With the region of interest (also based on the saliency prediction) you get a numerical score that indicates how eye-catching a particular piece of content is on a page. When you highlight a region of interest, the mean saliency of the pixels inside the selected region is shown as a percentage.

The absolute ROI score for any given region is not particularly meaningful (there's no rule of thumb for what is a “good” score), but this feature becomes extremely useful when comparing different versions of a design, or 2 different elements on the page.
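The region-of-interest score can be sketched as a plain average. The example assumes the saliency map is normalized to [0, 1], so that the mean inside the rectangle reads directly as a percentage; the function name, the rectangle convention and the two tiny maps are hypothetical, and the exact normalization used by the console is not stated here.

```python
def roi_score(saliency, left, top, right, bottom):
    """Mean saliency inside the rectangle [left, right) x [top, bottom), in percent."""
    values = [saliency[y][x]
              for y in range(top, bottom)
              for x in range(left, right)]
    if not values:
        raise ValueError("the region of interest is empty")
    return 100.0 * sum(values) / len(values)


# Two versions of the same layout; the call-to-action sits in the right column.
version_a = [
    [0.0, 0.5],
    [1.0, 0.5],
]
version_b = [
    [0.0, 1.0],
    [0.5, 1.0],
]
score_a = roi_score(version_a, left=1, top=0, right=2, bottom=2)
score_b = roi_score(version_b, left=1, top=0, right=2, bottom=2)
best = "B" if score_b > score_a else "A"
```

As noted above, neither number means much on its own; the useful signal is that the same region scores higher in one version than in the other.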

Modify and test your visual until it meets your expectations, analyzing several options (e.g. with a different background or rearranged elements) to help you decide on the best choice. This is a quicker route from draft concept to approved design.

How can it be used?

It can be used to refine designs by improving how they are perceived subconsciously. It can identify which elements are being looked at and which are being ignored. This allows the designer to focus attention on the right parts of the design, increasing the likelihood of success.

Heatmaps identify the most salient elements in the image, so you can use them as an objective indicator when presenting a new design or discussing different options with a customer or a boss. Attention heatmaps can be created several times during the design process to ensure that the refinements are having the intended effect.

How can I compare different designs?

The areas of interest help you compare regions inside the image and determine the percentage of attention each area attracts (the attention score of your marked target). The percentage expresses the attention on that area relative to the rest of the image: 0% means no attention at all, and 100% means the area is very likely to be noticed.

Distracting elements in a design

A distracting element in a composition can be easily detected using heatmaps. You can learn where to place the most important content simply by comparing designs and compositions.

You can predict the best creative by comparing several designs at a glance, for example various banners shown in place on the page. Analyze each screenshot to see what draws the most eyes, and you may find the ad that attracts the most visual attention.

Improving the clutter score

The most powerful way to improve your clutter score is to reduce unnecessary text. Only include content that is 100% necessary for users to have a good experience.

A few other ways you can lower your score:

  • Favor imagery over text where you can: human perception of clutter is much more forgiving of imagery than of text.
  • Increase the amount of whitespace or padding around content.
  • Use imagery that has less texture and fewer lines.
  • Organize the page into easily distinguishable content blocks.

[1] Garcia-Diaz, A., Leboran, V., Fdez-Vidal, X. R., & Pardo, X. M. (2012). On the relationship between optical variability, visual saliency, and eye fixations: A computational approach. Journal of Vision, 12(6).

[2] Borji, A., Sihite, D. N., & Itti, L. (2013). Quantitative analysis of human-model agreement in visual saliency modeling: A comparative study. IEEE Transactions on Image Processing, 22(1).

[3] Rahman, S., & Bruce, N. (2015). Visual saliency prediction and evaluation across different perceptual tasks. PLoS ONE, 10(9).