The Importance of Sample Size in Social Engineering Tests

Wednesday, January 16, 2013

Matt Neely


We Have a Problem

Information security has a problem. We make far too many decisions without reliable data to support them. As a result, far too many information security professionals rely on what I call Gut 1.0: decisions based on gut feel. To address this, security professionals need to start gathering reliable and accurate data they can use in the decision-making process. For that data to be useful, a number of factors need to be considered. Today we’re going to talk about selecting sample sizes that are statistically significant, so that data gathered from a small subset of the population can be applied to the population as a whole. I want to dive into this specifically because I see a lot of companies perform social engineering tests on a handful of users and then wrongly apply the results to the company’s entire user population.

Ways to Determine Sample Size

There are a number of different methods and online calculators that can be used to select a statistically significant sample size. Today we’re going to be using an online tool provided by Creative Research Systems. In this blog post we won’t go into the math behind this method, but if you are interested you can learn more here. Instead we’ll focus on a practical approach to using this method when performing a social engineering test geared towards measuring the effectiveness of a security awareness training program.

As a general rule, the larger the sample size, the more accurately it will represent the larger population. It’s also important to note that the relationship between sample size and overall user population is not linear. Before we get into some examples, there are two terms, Confidence Interval (CI) and Confidence Level (CL), which are important to understand when calculating sample sizes.

Key Definitions

The Confidence Interval is what we commonly refer to as the margin of error. The Confidence Interval says the data is accurate plus or minus the CI. For example, if you have a Confidence Interval of 5 and your social engineering test shows 40% of the sample population fell prey to the attack, you can say the attack would succeed against 35% to 45% of the entire population.

The Confidence Level is basically how sure you are in the reliability of the data. For this article we’ll be running all calculations with a CL of 95%. We chose 95% because that’s a confidence level used by most researchers.
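
To make the 95% figure concrete: sample size formulas typically turn the Confidence Level into a z-score taken from the standard normal distribution. The short Python sketch below shows that conversion using only the standard library. The function name `z_score` is illustrative and is not taken from any tool mentioned in this post.

```python
from statistics import NormalDist

def z_score(confidence_level: float) -> float:
    """Two-sided z-score for a confidence level given as a fraction (0.95 for 95%)."""
    if not 0.0 < confidence_level < 1.0:
        raise ValueError("confidence_level must be between 0 and 1")
    # Split the leftover probability evenly between the two tails.
    upper_tail = (1.0 - confidence_level) / 2.0
    return NormalDist().inv_cdf(1.0 - upper_tail)

print(round(z_score(0.95), 2))  # 1.96
```

A higher Confidence Level produces a larger z-score, which in turn demands a larger sample for the same Confidence Interval.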

How the Numbers Break Down

In the chart below we show the sample size needed to achieve various CI values for different user populations. For example, if you have a 500 user company and want data with a CI of 2, you would need to test 414 users, which is pretty close to the entire population. If you have a 10,000 person company and want test results with the same CI, you need a sample size of 1,936. However, if you accept a larger CI of 5, the numbers change pretty drastically: you need only 217 users from a 500 person company or 370 users from a 10,000 person company. You can explore the chart below to see the sample size needed for different user populations and CI values.
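
The figures above can be reproduced with a few lines of Python. This is a minimal sketch, assuming the standard sample size formula for a proportion with a finite population correction, a worst-case response rate of 50%, and a z-score of 1.96 for a 95% CL. Those assumptions reproduce the four figures quoted above, but they are not confirmed to be exactly what the online calculator does, and the function name `sample_size` is illustrative.

```python
Z_95 = 1.96  # z-score for a 95% Confidence Level

def sample_size(population: int, ci_points: float, z: float = Z_95, p: float = 0.5) -> int:
    """Sample size for a margin of error of `ci_points` percentage points."""
    margin = ci_points / 100.0
    # Sample size for an effectively unlimited population.
    unlimited = (z ** 2) * p * (1.0 - p) / (margin ** 2)
    # Finite population correction shrinks it for smaller populations.
    corrected = unlimited / (1.0 + (unlimited - 1.0) / population)
    return round(corrected)

# Print one row per CI value and one column per user population.
populations = (100, 500, 1_000, 2_000, 10_000)
for ci in (2, 5, 7, 10):
    row = "  ".join(f"{sample_size(n, ci):>6}" for n in populations)
    print(f"CI {ci:>2}: {row}")
```

For a 500 user company this gives 414 at a CI of 2 and 217 at a CI of 5, and for a 10,000 user company it gives 1,936 and 370.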


[Chart: sample size needed at a CL of 95% to achieve a CI of 2, 5, 7, or 10 for populations of 100, 500, 1,000, 2,000, and 10,000 users]

What to Aim For?

What is the correct CI value to aim for? That depends on how you plan to use the data. The important thing is to know the CI when interpreting data you have gathered or when choosing a sample size for a test whose results will be applied to a larger population.

Now that we have a general idea of how the numbers break down, let’s apply this to a traditional social engineering test where only a handful of users are tested.

Example: Interpreting Test Results

Bob’s Aerospace Widgets has 300 users and wants to run a social engineering test to determine how well users can apply what they learned in the security awareness training program. The company sends phishing emails to 10 employees to see whether they recognize the email as malicious and avoid clicking the link. The test finds that 80% of the 10 users clicked on the link, and the company now wants to know how well this result applies to its overall user base.

Running the calculation for a population of 300 and a sample of 10 users gives a CI of 30.52 at a CL of 95%. So if the same phishing email were sent to the general population, somewhere between 49.48% and 100% of users would be expected to click on it. A range that large isn’t helpful for judging the effectiveness of the program.

Is the fact that 8 out of the 10 users clicked on the link a problem that needs to be addressed? Yes. But can this same percentage be applied to the general population? No. In this case the company should determine what a useful CI is and rerun the test with a sample of the proper size to give meaningful data.
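
The 30.52 figure can be checked by working backwards from sample size to CI. The Python sketch below assumes a worst-case response rate of 50%, a z-score of 1.96 for a 95% CL, and a finite population correction. Those assumptions reproduce the number in this example, but they are not confirmed to be the calculator's exact method, and both function names are hypothetical.

```python
import math

def confidence_interval(population: int, sample: int, z: float = 1.96, p: float = 0.5) -> float:
    """Margin of error, in percentage points, for a sample drawn from a finite population."""
    standard_error = math.sqrt(p * (1.0 - p) / sample)
    # Finite population correction: sampling more of a small population narrows the interval.
    correction = math.sqrt((population - sample) / (population - 1))
    return 100.0 * z * standard_error * correction

def click_rate_range(observed_pct: float, ci_points: float) -> tuple[float, float]:
    """Observed rate plus or minus the CI, clamped to the 0-100% range."""
    return max(0.0, observed_pct - ci_points), min(100.0, observed_pct + ci_points)

ci = confidence_interval(population=300, sample=10)
low, high = click_rate_range(80.0, ci)
print(f"CI: {ci:.2f}")                      # CI: 30.52
print(f"Range: {low:.2f}% to {high:.2f}%")  # Range: 49.48% to 100.00%
```

Calling `confidence_interval` with larger sample values shows how quickly the range tightens as more of the 300 users are tested.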

SecureState uses this approach when performing social engineering assessments designed to test the effectiveness of a company’s security awareness training program.
