A/B testing got you elected, Mr. President
This proposal by <a href="/users/samphippen">Sam Phippen</a> has been chosen by the community to be given at Ruby Manor 4.
The Obama campaign this year ran what could fairly be described as an insurgency. What most people don't know is that it used a simple statistical technique, A/B testing, to drive up donations and volunteers, which inevitably led to more votes. I'd like to give a talk explaining how A/B testing works, how to do it in Ruby (Rails & Sinatra), and some pro tips, illustrated by real-world examples.
The structure of my talk will be something roughly like:
- Examples of A/B testing
- How to do it (demo simple sinatra or rails app to add A/B testing)
- How to interpret the results (warning: this will involve statistics)
- What not to do
- What to measure
- Changes suitable for A/B testing
(Outline updated to reflect recommendations from James Adam.)
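To make the "how to do it" step concrete, here is a minimal, framework-agnostic sketch of the assignment half of an A/B test (the names and scheme here are illustrative, not taken from any particular gem): deterministically bucket each visitor into a variant by hashing their ID, so the same visitor always sees the same variant on every request.

```ruby
require 'digest'

# Deterministically assign a visitor to a variant. Hashing the
# experiment name together with the visitor ID means repeat visits
# always land in the same bucket, with no per-visitor state to store.
def variant_for(visitor_id, experiment, variants = %w[a b])
  digest = Digest::SHA256.hexdigest("#{experiment}:#{visitor_id}")
  variants[digest.to_i(16) % variants.size]
end
```

In a Sinatra or Rails app the `visitor_id` would typically come from a session or a long-lived cookie; the recording and analysis half is sketched further down.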
This sounds great. I'd like to echo James's suggestion that you definitely include something on how to interpret the results of A/B tests, even if it's just a high-level look at the principles of statistical hypothesis testing and some of the more common errors to avoid.
Please cover caching errors! Oh, these hurt so much…
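A hypothetical illustration of the kind of caching error being asked about: if the variant isn't part of the cache key, whichever variant renders first gets cached and served to everyone, silently collapsing the experiment into a single version.

```ruby
CACHE = {}

# BUGGY: the key ignores the variant, so the first variant rendered
# is cached and served to every subsequent visitor.
def render_cached_buggy(page, variant)
  CACHE[page] ||= "#{page} rendered as variant #{variant}"
end

# FIXED: include the variant in the cache key so each variant is
# cached and served independently.
def render_cached(page, variant)
  CACHE[[page, variant]] ||= "#{page} rendered as variant #{variant}"
end
```

The same principle applies to Rails fragment caching or any HTTP cache in front of the app: the variant has to be part of the cache key (or the `Vary` logic), or the test is measuring nothing.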
As an obsessed follower of the 2012 election, I find the title an unnecessary stretch. Willing to let it go if "47%" finds its way into the statistical examples.
Agreed with the others that touching on interpretation would be good. Also helpful would be discussion of the costs of managing tests (in time, effort, and attention), and consideration of when one should and should not A/B.
The statistics side of things would be pretty useful. I suspect a lot of developers would have difficulty with calculating the duration of the test from the sample size needed to reach a certain confidence level.
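For reference, the usual back-of-the-envelope calculation for that duration question uses the standard normal approximation for comparing two proportions; a sketch (the helper name and defaults are illustrative):

```ruby
# Rough per-variant sample size needed to detect a difference between
# two conversion rates p1 and p2, using the normal approximation.
# Defaults: z_alpha = 1.96 (95% confidence, two-sided),
# z_beta = 0.84 (80% power).
def sample_size_per_variant(p1, p2, z_alpha: 1.96, z_beta: 0.84)
  variance = p1 * (1 - p1) + p2 * (1 - p2)
  ((z_alpha + z_beta)**2 * variance / (p1 - p2)**2).ceil
end
```

For example, detecting a lift from a 10% to a 12% conversion rate at those defaults needs roughly 3,800 visitors per variant; divide by daily traffic per variant to estimate the test's duration.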
Would be glad to hear discussion on how to decide when to interpret results. Lots of frameworks/tools I've used show results while the test is still in progress. Should the sample size be pre-determined? Why, or why not? How? What value can be gleaned from experiments that are in progress? How does this play into the use of bandit algorithms?
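For anyone unfamiliar with the bandit algorithms mentioned above, an epsilon-greedy bandit is the simplest example: instead of splitting traffic evenly until a pre-determined sample size is reached, it mostly serves the best-performing variant so far and explores a random one a small fraction of the time. A sketch (class and method names are illustrative):

```ruby
# Epsilon-greedy bandit: serve the current best variant most of the
# time, but pick a random variant with probability epsilon.
class EpsilonGreedy
  def initialize(variants, epsilon: 0.1)
    @epsilon = epsilon
    @stats = variants.to_h { |v| [v, { shows: 0, wins: 0 }] }
  end

  def choose
    return @stats.keys.sample if rand < @epsilon
    # Unseen variants get an optimistic rate of 1.0 so they are tried.
    @stats.max_by { |_, s| s[:shows].zero? ? 1.0 : s[:wins].fdiv(s[:shows]) }.first
  end

  def record(variant, converted)
    @stats[variant][:shows] += 1
    @stats[variant][:wins] += 1 if converted
  end
end
```

The trade-off versus a fixed-horizon A/B test: a bandit earns more conversions during the experiment, but its adaptive allocation makes classical significance testing on the results harder to interpret.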
Would this presentation talk about how to gather and interpret the results of A/B testing as well? Does that involve/require particular analytics systems? I would definitely be more interested if there was at least one concrete example of how to interpret the results for an A/B test.
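As one concrete example of the kind of interpretation being asked for here: a two-proportion z-test is the textbook way to check whether variant B's conversion rate differs significantly from variant A's. A sketch (the helper name is illustrative):

```ruby
# Two-proportion z-test using the pooled conversion rate.
# Under the normal approximation, |z| > 1.96 corresponds to
# p < 0.05 (two-sided).
def z_score(conversions_a, visitors_a, conversions_b, visitors_b)
  p_a = conversions_a.fdiv(visitors_a)
  p_b = conversions_b.fdiv(visitors_b)
  pooled = (conversions_a + conversions_b).fdiv(visitors_a + visitors_b)
  se = Math.sqrt(pooled * (1 - pooled) * (1.0 / visitors_a + 1.0 / visitors_b))
  (p_b - p_a) / se
end
```

So 100 conversions from 1,000 visitors on A versus 150 from 1,000 on B gives |z| well above 1.96 (significant), while 100 versus 105 does not; no particular analytics system is required, only the raw counts per variant.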
Also, are there any "wrong" ways of using A/B testing? Are there situations where A/B tests aren't suitable, or where the results can be misleading or inconclusive? I think this would be really useful guidance too.