Sample Size in Neuromarketing
A common topic of discussion in neuromarketing is sample size. There is no such thing as the “archetypal neuromarketing study”: a variety of technologies, approaches, and combinations are used, and the design of the study also affects the sample size. We asked a few experts to elaborate on this topic.
Elissa Moses: “Many people do not understand the rules of neuro statistics”
Technologies used: Ipsos uses a full array of tools, depending upon the objectives and needs of each type of study. Most in demand right now are implicit reaction time and facial coding, but a fair amount of eye-tracking, biometrics, and EEG is also used.
“There is confusion in the industry regarding what neuro samples are required, because each tool has a different point of conversion for statistical stability. What many people do not understand is that the rules of statistical power, representativeness, and reliability do not change because a measure is ‘neuro’ as opposed to ‘cognitive’ or a traditional question. The number of subjects varies by method and by whether we are conducting a study online or in person. For online studies we typically look for a complete minimum sample of about n=100 consumers, but depending upon the need for subgroup analysis this can vary upwards. For in-person CLT studies, where controls are greater, we find statistical stability with smaller samples, such as n=65 for most tools.
We explain guidelines for sample size to clients by discussing how neurometrics are governed by the same laws of statistics as survey research, and by showing how sample sizes are derived for each method. Ultimately, we give our clients assurance because Ipsos adheres to strict power requirements for statistical standards.”
Elissa Moses is CEO at BrainGroup Global & Managing Partner at HARK Connect™
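Moses’s point that neurometrics follow the same power rules as survey research can be made concrete with a standard power calculation. The sketch below is purely illustrative (Python with statsmodels; the effect size, alpha, and power are assumed values, not Ipsos’s actual parameters), but it shows how a per-cell sample size in the same range as the figures she quotes falls out of conventional assumptions.

```python
# A minimal power-analysis sketch: required n per cell for a two-sample
# comparison, given an assumed effect size, alpha, and power.
# These values are illustrative assumptions, not Ipsos's parameters.
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()
n_per_cell = analysis.solve_power(
    effect_size=0.5,  # assumed standardized mean difference (Cohen's d)
    alpha=0.05,       # Type I error rate
    power=0.8,        # probability of detecting a true effect
)
print(f"Required sample size per cell: {n_per_cell:.0f}")  # about 64
```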
Professor Dr. Bernd Weber: “Sample size greatly depends on the question of the client.”
Technologies used: functional MRI methodology complemented by simultaneous eye-tracking.
“The sample size we usually use is about 30-40 subjects, but this greatly depends on the question and the client. If the client is interested in individual differences or specific customers, the sample size can also increase.
We advise our clients on the adequate sample size. If a client is only interested in an “average reaction” of subjects, the sample size can be smaller than if inter-individual differences need to be taken into account.
We explain to our clients that they have to be clear about what they want to get out of the study. In our discussions with clients, we usually close in on the specific question and needs in an iterative process. We discuss in detail which methods could reasonably be applied and how this leads to differences in the sample size needed. It is usually recommended not to analyze samples below 20 subjects, because the influence of individuals on the observed overall effect then becomes too big.”
Bernd Weber is a Professor at the University of Bonn / Life&Brain GmbH
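Weber’s rule of thumb about samples below 20 subjects can be illustrated with a quick leave-one-out simulation: in a small sample, one extreme subject shifts the group mean noticeably, while in a larger sample the same subject matters far less. The numbers below are made up for illustration.

```python
# Illustrative leave-one-out simulation of Weber's point: how much can
# a single subject move the observed group mean? All data are simulated.
import numpy as np

rng = np.random.default_rng(0)
for n in (10, 40):
    scores = rng.normal(loc=0.0, scale=1.0, size=n)
    full_mean = scores.mean()
    # Largest shift in the mean caused by dropping any one subject
    shifts = [abs(np.delete(scores, i).mean() - full_mean) for i in range(n)]
    print(f"n={n}: max influence of one subject on the mean = {max(shifts):.3f}")
```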
Dr. Thomas Ramsøy: “We rarely find clients asking about the variance of a score”
Technologies used: mobile or stationary eye-tracking and EEG. Other technologies: NeuroVision (a computational neuroscience cloud-based system), facial coding, fMRI, and physiological measures.
“Our regular omnibus copy-test solution typically tests 100+ participants. Smaller and more dedicated studies, such as in-store or packaging studies, usually test something like 40-50 people. This all depends on the scope of the study, how well defined the group is, and what the desired representativeness of the sample is.
The question we see most often is about statistical power, and whether our power is sufficiently strong. Neuroscience methods often have incredible statistical power; with EEG, we’re generating an enormous amount of data every millisecond. Clients should rather focus on how representative the sample is, and on whether their basic questions are answered with 40, 100, or 200 people. Another question we often see is related to benchmarks: how do these responses fit with some normative material? What constitutes a ‘good’ or a ‘bad’ ad? The answer here is to focus on building a good database of representative samples and sub-samples.
It is interesting to note what clients do not address. We rarely find clients asking about the variance of a score, that is, how consistently a group of people respond in a particular way. For example, one group may show a mean score of x but a huge variance (some score high, some low). Another group may show the same mean score of x, but here the variance is very small. In the first group, you should be very careful about interpreting much from the mean score; in the second group, you can be pretty sure that the score you observe is reliable. It is important to provide an understanding of the consequences of how clients’ questions are operationalized.”
Thomas Ramsøy is the Founder & CEO of Neurons Inc
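Ramsøy’s example of two groups with the same mean but different variances is easy to demonstrate with simulated data. In the sketch below (all values are hypothetical), the high-variance group produces a much wider 95% confidence interval around the same mean score, which is exactly why its mean should be interpreted with caution.

```python
# Two simulated groups with roughly the same mean score but very
# different variances, as in Ramsøy's example. Data are made up.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
group_a = rng.normal(loc=50, scale=25, size=40)  # same mean, huge variance
group_b = rng.normal(loc=50, scale=3, size=40)   # same mean, small variance

for name, g in (("A (high variance)", group_a), ("B (low variance)", group_b)):
    sem = stats.sem(g)
    lo, hi = stats.t.interval(0.95, df=len(g) - 1, loc=g.mean(), scale=sem)
    print(f"Group {name}: mean={g.mean():.1f}, sd={g.std(ddof=1):.1f}, "
          f"95% CI = ({lo:.1f}, {hi:.1f})")
```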
Professor Richard Silberstein: “Neuroscience doesn’t change the laws of statistics”
Technologies used: Steady State Topography (SST). This technology measures the speed of neural information processing in different parts of the brain to determine the psychological responses to communications.
“Our minimum cell size is 50 people. A typical two-cell study involving, say, customers and non-customers of a particular brand would thus involve 100 people. We don’t get many questions on this topic, but a common one we encountered with clients who had previously used other neuro suppliers was ‘Why so many people?’ The number of people you need depends on the question you are asking and the statistical strength of the effect you are examining. Sampling theory is employed to ensure that the population used in our research projects (and the resulting insights) accurately reflects the consumer group(s) targeted by our clients.
As a simple example, for a client in the automotive sector, we might look at a group of 50 respondents who are in-market for a new vehicle. This is a fairly simple criterion to draw a sample from. Even so, it doesn’t guarantee that the in-market group is a homogeneous one. Whilst the in-market sample may ‘match’ according to their future purchase intentions, they may well mismatch on many other criteria (such as brand preference, vehicle preference, or socio-economic status). Again, we can control for these additional criteria through careful recruiting, but there will always be other unknown biological and cognitive factors that can make an apparently homogeneous group quite heterogeneous. These other factors are generally quite diverse across individuals in the sample, and this is where the sample size is important. By using a sample that is sufficiently large, the variance due to these diverse and unrelated factors tends to average out, diminishing their impact and leaving the results a much more accurate reflection of the client’s true target group. As we say, neuroscience doesn’t change the laws of statistics.”
Richard Silberstein is Chairman at Neuro-Insight
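The “averaging out” Silberstein describes is the familiar fact that the standard error of a sample mean shrinks as 1/√n. A minimal sketch, using a hypothetical spread:

```python
# Silberstein's "averaging out": unrelated individual differences act as
# noise around the group's true response, and the uncertainty of the
# group mean shrinks as 1/sqrt(n). The spread value is hypothetical.
import math

individual_sd = 12.0  # assumed spread due to unrelated personal factors
for n in (10, 25, 50, 100):
    se = individual_sd / math.sqrt(n)
    print(f"n={n:3d}: standard error of the group mean = {se:.2f}")
```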
Dr. Jaime Romano: “Questions about the statistical error are difficult to address if the client is unfamiliar with inferential statistics”
Technologies used: Event-related potentials, quantitative electroencephalography, electrooculography, galvanic skin response, electrocardiography, blood volume pulse, neuropsychological assessment
“We justify the sample size on the power of the statistical tools used to test the experiment’s hypotheses. If we are searching for a correlation r>0.85 in a P300 experiment, the sample size can be as small as 18 subjects. On the other hand, if we want to compare the purchase intent between four different ad campaigns and also perform an analysis of variance with age, the sample can be as large as 96 participants.
Client questions regarding study design are explained through the difference between inferential statistics and descriptive statistics. The first is used by NMKT to ‘reject a claim’ about the population, whereas the latter is used in a survey to ‘describe a claim’ about the population. A client’s questions about the statistical error are difficult to address if the client is unfamiliar with inferential statistics. We usually explain that the 0.05 Type I error used at NMKT means that we have a 5% chance of getting the wrong answer, whereas a 5% error in a survey means that we are under- or over-representing 5% of the population.
We have a statistical FAQ for our customers that addresses the most common concerns using formulas, and we have Java applets for clients who are more advanced in mathematics. The most famous example we use is the problem ‘how many people do I need to accept the claim that Substance X kills people?’ A client’s intuitive answer is ‘well, just one’, but when we revisit the problem, the client understands the role statistics play and recognizes the relevance of a professional and accurate sample calculation.”
Jaime Romano is Director at NEUROMARKETING MX
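A standard way to size a study around a target correlation, as in Romano’s P300 example, is the Fisher z approximation. The sketch below shows the textbook formula; note that the resulting n depends on the alpha and power one assumes, so with the common defaults used here it will not necessarily reproduce the 18 subjects Romano quotes.

```python
# Sample size for detecting a correlation of at least r, via the Fisher
# z approximation: n = ((z_alpha + z_beta) / atanh(r))^2 + 3.
# Alpha and power are common defaults, not necessarily Romano's.
import math
from scipy.stats import norm

def n_for_correlation(r, alpha=0.05, power=0.80):
    z_alpha = norm.ppf(1 - alpha / 2)  # two-sided test
    z_beta = norm.ppf(power)
    c = math.atanh(r)                  # Fisher z transform of r
    return math.ceil(((z_alpha + z_beta) / c) ** 2 + 3)

print(n_for_correlation(0.85))  # strong correlations need few subjects
```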
Dr. Hirak Parikh: “Brain scans of 25 to 40 subjects provide significant results”
Technologies used: a 64-channel Biosemi EEG system (at 8 kHz). Additional expertise in eye-tracking, galvanic skin response (GSR), respiration rates, and hormone and enzyme analyses. In addition, fMRI is provided in partnership with Neurensics.
“From our experience in the field and from several studies, brain scans of 25 to 40 subjects (depending on the specific question, as well as how diverse the target group is) provide significant results that predict the behavior of a much bigger population. It is always our top priority to scan the right number of subjects to deliver statistically significant and reliable results to our clients. In most cases, we also require validation data from clients.
As has been shown in many recent scientific studies, such small samples can predict the behavior of the entire population if you use brain scans instead of questionnaires. A study by Berns and Moore showed that brain scans of 27 teenagers were better able to predict the market success (sales) of pop songs across the general population, and correlated more closely with sales, than data collected from questionnaires (Berns & Moore, ‘A neural predictor of cultural popularity’, Journal of Consumer Psychology, 2011). In a 2012 study conducted by The Neuromarketing Labs using a latte macchiato coffee vending machine, it was shown that brain scans of 35 college students were sufficient to predict the buying behavior of the entire college population. These predictions matched actual buying behavior better than self-reports from questionnaires (Baldo & Müller, ‘Latte macchiato study’, The Neuromarketing Labs, 2013).”
Hirak Parikh is an Early-stage Venture Fund Manager & Entrepreneur
Dr. Christophe Morin: “We determine the level of accuracy and risk our customers are willing to take with the data”
Technologies used: voice analysis, biometric studies and facial imaging
“We use voice analysis for qualitative exploration of emotions evoked during in-depth interviews (IDI); samples typically vary from 6 to 24 subjects. We use facial imaging for qualitative explorations as well (24 subjects) and for quantitative studies (200+). We conduct only quantitative biometric studies (starting at 72 subjects; 200+ is better).
Questions about sample size are always related to the statistical significance of the data we collect. Some of the techniques we use involve clustering data, which is a way of revealing subgroups of subjects (clusters) that may share common characteristics. When we do that, the size of each cluster is important; otherwise the significance of the variance between clusters is compromised.
We determine the level of accuracy and risk our customers are willing to take with the data, and then we make the best recommendation. We have four PhDs on our scientific team, so we feel pretty confident we can make a good call between sample size, statistical validity, and cost.”
Dr. Christophe Morin is CEO and Co-founder at SalesBrain
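Morin’s remark about cluster sizes can be sketched with a basic k-means example. The “biometric” features below are hypothetical random data; the point is simply that after clustering, each subgroup must still contain enough subjects for between-cluster comparisons to mean anything.

```python
# A minimal k-means sketch of the clustering Morin describes: group
# subjects into subgroups and inspect the cluster sizes. The "biometric"
# features are hypothetical random data.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)
features = rng.normal(size=(72, 3))  # 72 subjects x 3 biometric measures

kmeans = KMeans(n_clusters=3, n_init=10, random_state=1).fit(features)
print("Cluster sizes:", np.bincount(kmeans.labels_))
# Very small clusters make between-cluster comparisons unreliable,
# which is why the overall sample size matters when clustering.
```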
Martin de Munnik: “The appropriate size of the target group for MRI-based research is significantly lower than for traditional research”
Technologies used: functional MRI methodology complemented by simultaneous eye-tracking
“One of the unique features of neuroscience research is that reliable results about the quality of a product, brand, advertisement, packaging design, or sales promotion can be obtained with relatively few subjects. The simple reason is that the true opinion of a group as a whole resides in the brains of just a few subjects representative of that group.
Emily Falk (2012) predicted the effectiveness of anti-smoking campaigns in the US by scanning the brain activation of only 30 smokers. Neurensics has predicted national TV-commercial success with up to 88% accuracy, using a benchmark of proven effective ads called the Effies (Lamme & Scholte, 2013). Highly significant differences in TV-commercial effectiveness were obtained with just ~20 subjects. MRI-based research can use such small target groups because the difference in brain activations between subjects is much smaller than the variation in subjective survey scores.”
Martin de Munnik is Founding partner at Neurensics
Professor Arnaud Petre: “The setup of the experiment following a random effects design is most important to us.”
Technologies used: fMRI (3, 7 and 9.4 Tesla). Sometimes combined with embedded eye-tracker, embedded EEG, heart rate measurement, skin conductance and/or an olfactometer to smell fragrances.
“We don’t need a lot of people, but at least 12 per tested segment. That is one of the plus points of MRI. With EEG-ERP, for example, you need more subjects and more repetitions because the signal is noisy.
Our design is always event-related and our analyses always follow the random effects model. There are two common assumptions made about the individual-specific effect: the random effects assumption and the fixed effects assumption. The random effects assumption we use in our research is that the individual-specific effects are uncorrelated with the independent variables. The fixed effects assumption instead assumes that the individual-specific effects are correlated with the independent variables.
We insist a lot on the random effects model. For us, this is far more important than the sample size itself. Results in the random effects setup don’t change much when we add more people to the sample: in a large-scale research project we conducted earlier, the difference between a sample of 12 and a sample of 48 people was less than 2%.
We multiply the number of segments by 12 to arrive at the minimal sample size allowing a split between the data analyses. Generally, our samples are between 36 and 72 subjects.”
Arnaud Petre is CEO at Brain Impact - Consumer Neuroscience
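Petre’s random-effects analyses are fMRI group analyses, but the underlying idea, a subject-specific effect assumed uncorrelated with the experimental conditions, can be sketched with a generic mixed model. The sketch below uses statsmodels on simulated data; the variable names and long-format layout are assumptions for illustration, not his actual pipeline.

```python
# A minimal random-effects (mixed-model) sketch in the spirit of Petre's
# design: a random intercept per subject, assumed uncorrelated with the
# experimental condition. Data, names, and layout are made up.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(7)
n_subjects, n_trials = 12, 20
data = pd.DataFrame({
    "subject": np.repeat(np.arange(n_subjects), n_trials),
    "condition": np.tile([0, 1], n_subjects * n_trials // 2),
})
subject_effect = rng.normal(0, 1, n_subjects)[data["subject"]]
data["response"] = (0.5 * data["condition"] + subject_effect
                    + rng.normal(0, 1, len(data)))

# Random intercept per subject; condition is the fixed effect of interest.
model = smf.mixedlm("response ~ condition", data, groups=data["subject"])
print(model.fit().summary())
```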
Professor Gemma Calvert: “Explaining sample and effect sizes can be challenging if clients do not have previous experience with statistics”
Technologies used: implicit priming paradigms and fMRI
“For implicit studies, we typically require 400 respondents (or 100 per cell or data cut) to achieve a 95% confidence level and an effect size of 0.8. For fMRI, it depends on the paradigm. Ideally, if a study is similar to one run previously, a power calculation can be carried out on the prior data to establish how many subjects were required to obtain the results in the initial study. This yields an estimate of the effect size and will inform the sample size for subsequent studies.
The sample sizes we use in implicit studies are typically consistent with those used in the standard quantitative online panel which clients are familiar with.
With fMRI, the question that comes up most often is how we can generalize to the normal population with so few subjects. This is a complicated challenge to explain if a client does not have experience with statistics. However, we explain that brain responses are far less variable than explicit measures, and this being the case, you need fewer subjects to detect an effect. This is because the detection of statistical differences between groups or conditions depends on the variance of the responses. If the variance is large (people’s explicit responses can vary widely), then large numbers in each group are required to detect a clear group difference. If the variance is small (as is often the case in fMRI BOLD signal patterns), then simply adding more subjects does not change the detection of the activated brain areas, nor the size of the fMRI signal against the baseline noise.”
Gemma Calvert is Professor of Consumer Neuroscience, Nanyang Business School
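Calvert’s variance argument can be made concrete with the same power calculation used for any group comparison: the standardized effect size is the raw difference divided by the response variability, so lower variance means fewer subjects for the same power. The values below are illustrative only.

```python
# Calvert's variance argument: d = (raw difference) / (response SD), so
# less variable measures yield larger standardized effects and need
# fewer subjects for the same power. Values are illustrative.
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()
raw_difference = 1.0
for sd in (2.0, 1.0, 0.5):  # from noisy explicit measures to stabler ones
    d = raw_difference / sd
    n = analysis.solve_power(effect_size=d, alpha=0.05, power=0.8)
    print(f"response SD={sd}: d={d:.1f}, required n per group = {n:.0f}")
```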
This article was originally published in Insights.