Nicole Romano is a Data Scientist at RadiumOne and was a participant in Insight's Visualization Lab in May 2015. Here Nicole describes the tool she built during the two-day workshop.
A Sankey diagram depicting one week of testing for a single campaign. The nodes of the Sankey diagram show membership: a daily snapshot of the web domains in the testing, holding, and highly targeted (performant) strategies. The links between nodes show how these web domains move among the testing, holding, and performant strategies as we gather more information about their observed performance.
How do you use data to decide where to advertise? At RadiumOne, we spend a lot of time on that question. Given the vast number of websites that sell advertising space, and the fact that different ads perform better with different audiences, it can get pretty complicated. We recently developed a new visualization tool to help make sense of advertising performance data. As the chart above shows, we can easily track our ad testing platform as it shuffles advertising domains among three buckets: Test, Performant, and Holding. This tool came out of our participation in Insight’s Data Visualization Lab.
Dynamic testing pipeline
At RadiumOne, we use proprietary data sources to identify high-performance advertisement opportunities in the Real-Time Bidding (RTB) Marketplace. Our Data Science team leverages this data to build algorithms that identify users who are likely to convert for a particular advertiser. Because RadiumOne evaluates over 800,000 advertisement opportunities per second, we also build operational tools that provide automated feedback on the success of our algorithms and engineered features in real time.
When we participated in the Visualization Lab in May, our team had just finished building a dynamic testing pipeline that evaluates the performance of our web placement algorithms. These algorithms comb through millions of web domains to find contextually relevant places to display advertisements for a particular brand. At full scale, a false positive prediction (predicting that an ad will perform well on a domain where it will not) for even one web domain could waste thousands of advertising dollars. Our new testing platform evaluates the actual performance of these domains, using statistical tests to identify false positives with high certainty while wasting as few ad impressions as possible.
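The post does not say which statistical tests the platform uses. As one illustration only, a one-sided exact binomial test can flag a domain whose conversions fall significantly below a target rate after a fixed number of impressions. The function names, the target rate, and the default `alpha = 0.05` below are hypothetical choices for this sketch, not RadiumOne's method.

```python
from math import comb


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p), summed exactly term by term."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def is_false_positive(
    conversions: int, impressions: int, target_rate: float, alpha: float = 0.05
) -> bool:
    """Hypothetical check: does this domain significantly underperform target_rate?

    One-sided exact binomial test of H0: true rate >= target_rate.
    A lower-tail p-value below alpha means we reject H0, i.e. the domain
    was very likely a false positive of the placement algorithm.
    """
    p_value = binom_cdf(conversions, impressions, target_rate)
    return p_value < alpha
```

With a hypothetical target of 0.5% conversions, zero conversions in 1,000 impressions is strong evidence of a false positive (p is roughly 0.007), while 2 conversions is not yet conclusive (p is roughly 0.12). That inconclusive case is where a platform like the one described would need more impressions before deciding.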
Our new testing platform targets the most promising web domains for a limited number of ad impressions. Once the number of testing impressions is reached, the domain is shuffled to either a holding area to await feedback (‘Holding’), or a highly targeted list (‘Performant’) if the feedback was positive. Domains can be shuffled back to the testing area (‘Test’) if more information is statistically necessary. Over the course of an ad campaign, a domain may be shuffled hundreds of times.
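The shuffling rules above can be sketched as a small state machine. This is a minimal sketch under stated assumptions: the testing cap, the three feedback outcomes, and the extra `Dropped` state for domains with negative feedback are all hypothetical, since the post describes only the Test, Holding, and Performant buckets and the moves between them.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Bucket(Enum):
    TEST = "Test"
    HOLDING = "Holding"
    PERFORMANT = "Performant"
    DROPPED = "Dropped"  # assumption: the post names only the first three buckets


class Feedback(Enum):
    POSITIVE = "positive"
    NEGATIVE = "negative"
    INCONCLUSIVE = "inconclusive"  # more data is statistically necessary


TEST_IMPRESSION_CAP = 1_000  # hypothetical per-domain testing budget


@dataclass
class Domain:
    name: str
    bucket: Bucket = Bucket.TEST
    test_impressions: int = 0


def next_bucket(domain: Domain, feedback: Optional[Feedback]) -> Bucket:
    """Pick the next bucket; feedback is None while results are still pending."""
    if domain.bucket is Bucket.TEST:
        if domain.test_impressions < TEST_IMPRESSION_CAP:
            return Bucket.TEST  # testing budget not yet spent
        if feedback is Feedback.POSITIVE:
            return Bucket.PERFORMANT
        if feedback is Feedback.NEGATIVE:
            return Bucket.DROPPED
        return Bucket.HOLDING  # budget spent: wait for feedback
    if domain.bucket is Bucket.HOLDING:
        if feedback is Feedback.POSITIVE:
            return Bucket.PERFORMANT
        if feedback is Feedback.NEGATIVE:
            return Bucket.DROPPED
        if feedback is Feedback.INCONCLUSIVE:
            return Bucket.TEST  # shuffle back for more impressions
        return Bucket.HOLDING  # still waiting on feedback
    if domain.bucket is Bucket.PERFORMANT and feedback in (
        Feedback.NEGATIVE,
        Feedback.INCONCLUSIVE,
    ):
        return Bucket.TEST  # re-verify before trusting the domain again
    return domain.bucket  # Performant with good/pending feedback, or Dropped


def advance(domain: Domain, feedback: Optional[Feedback]) -> Bucket:
    """Move the domain one step, granting a fresh test budget on re-entry to Test."""
    new_bucket = next_bucket(domain, feedback)
    if new_bucket is Bucket.TEST and domain.bucket is not Bucket.TEST:
        domain.test_impressions = 0
    domain.bucket = new_bucket
    return new_bucket
```

Calling `advance` once per domain per day and logging each domain's bucket yields exactly the kind of daily membership snapshots that the Sankey diagram's nodes depict.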
We chose to visualize the activity on this testing platform to see if any large-scale patterns were emerging that could inform future web domain algorithms. The platform was shuffling thousands of web domains amongst the testing environment, a holding area, and a highly targeted environment for performant domains (above). Each web domain could be shuffled hundreds of times in one test, its path regulated by feedback latency, performance goals, and statistical power. Visualization of such a complex and unique system was a daunting task, so we turned to Insight’s Data Visualization Lab.
After a series of talks and exercises on effective visualization, I was introduced to Silvia, a data engineer from SVDS who holds a Ph.D. and has a background in data visualization. Silvia looked at my use case and recommended the Sankey diagram, a type of flow diagram typically used to depict flows of energy or money through a system. This suggestion was the first in a series of breakthroughs during the Lab that showed our team the practical value of the Visualization Lab.
Using the Lab’s visualization resources wiki, I quickly identified several open-source tools that could build a Sankey diagram. The expert talks had introduced me to concepts such as tooltips and cross-filters, which helped me build a visualization interface that was both simple and powerful. After consulting with the Visualization Lab experts, I chose to work with Google Charts’ wrapper for d3.js.
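Feeding a Sankey chart means turning daily bucket snapshots into weighted links. The sketch below assumes each day's snapshot is a hypothetical `{domain: bucket}` dictionary and emits rows of (source node, target node, count), the row shape that Google Charts' Sankey chart consumes. Because that chart, to our understanding, does not draw cycles, each node label carries its day, so a domain staying in Test from one day to the next becomes a link between two distinct nodes rather than a self-loop.

```python
from collections import Counter
from typing import Dict, List, Tuple

Snapshot = Dict[str, str]  # hypothetical format: domain -> bucket name on one day


def sankey_rows(days: List[Snapshot]) -> List[Tuple[str, str, int]]:
    """Count how many domains moved between each pair of buckets on consecutive days.

    Node labels include the day number so that, e.g., Test -> Test is not a cycle.
    Domains missing from either day of a pair are skipped.
    """
    rows: List[Tuple[str, str, int]] = []
    for d in range(len(days) - 1):
        today, tomorrow = days[d], days[d + 1]
        moves = Counter(
            (today[dom], tomorrow[dom]) for dom in today if dom in tomorrow
        )
        for (src, dst), count in sorted(moves.items()):
            rows.append((f"{src} (day {d + 1})", f"{dst} (day {d + 2})", count))
    return rows
```

Passing the result through `json.dumps` produces nested arrays that can be handed to a Google Charts `DataTable` via `addRows`, with the three columns declared as string, string, and number.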
I returned to my team with a visualization product, a new familiarity with open source tools for visualization, and a catalog of code blocks that I could use for other visualization products in the future. Perhaps more importantly, I returned with new insights into our testing platform, which have informed a new model for ad placement algorithms. With every iteration of this algorithm, we can easily visualize emergent patterns using the Sankey tool built in Insight’s Visualization Lab.