Yahoo!
Yahoo! is the premier digital media company. Its business focus is on creating a content, communications, and community platform that delivers rich consumer experiences and advertising solutions across all digital screens. Yahoo! attracts more than half a billion consumers every month in more than 30 languages, making it one of the most visited and trusted Internet destinations.
Overview
Setting the prices for display ads is the responsibility of Yahoo!'s Pricing and Yield Management (PYM) team. They are also responsible for pricing analytics and pricing-yield business operations for all of Yahoo!'s display advertising business. This involves sophisticated analysis of very large volumes of impression-level Web data by highly skilled analysts within the PYM team. With the move toward finer and finer ad targeting (using more attributes to determine which ad to display), the process of pricing has become more complex, as the team must price each specific set of attributes. In addition to setting the right price in the guaranteed display ad market, the PYM team must also determine how impressions are monetizing in their secondary marketplace, which is required for establishing appropriate pricing and deal evaluation.
Challenge
Not surprisingly, as more attributes are captured for analysis, the volume of data collected every day continues to increase. For the PYM pricing analysis application, that amounts to 20 to 30 million records per day. Previously, Yahoo! used a traditional row-based database to capture and store the data. However, its high cost and the administrative effort it required were a barrier to storing all of the data the PYM team needed. Only summary data of user behavior could be stored, and the data history was limited to 30 days. This severely limited the number of attributes that could be considered in the pricing analysis, which restricted the analysis the PYM team could do and made it less accurate. The PYM team decided they needed a new database that would:
- Allow the storage of all of the detailed impression data instead of only summary data
- Extend data history to 6 months versus the 30 days they could keep previously
- Provide fast queries and flexible ad-hoc analytics
- Reduce the amount of hardware required
- Reduce the administrative effort involved in managing and tuning the database
Using Infobright allows us to do pricing analyses that would not have been possible before.
- Arvind Hariharan, Senior Director Pricing and Yield Management
Solution
Now in production, the Infobright database has allowed the Yahoo! PYM team to meet the requirements listed above:
- Store all detailed data: Yahoo! is currently loading close to 30 million records every day. This now includes all of the detailed data that the team wanted to have access to, rather than just the summary information they were able to store previously.
- Keep much more history, in much less space: Yahoo! now has the ability to store the six months of data they need for the most accurate analysis. This means that there are about 6 billion records in the database. What's more, as Infobright provides outstanding levels of data compression, the approximately 6TB of raw data that has been loaded only uses 600GB on disk (10:1 compression).
- Faster queries, flexible ad-hoc analytics: Queries that took several minutes previously now run in seconds with Infobright.
- Easier to maintain and support: The reduced administrative effort allows the PYM team to focus on their primary responsibilities rather than devoting effort to database maintenance.
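The figures quoted in the list above can be cross-checked with some simple arithmetic. The following is only a rough sketch: the function names are hypothetical, six months is assumed to mean 180 days, and decimal units (1 TB = 1,000 GB) are assumed for the storage sizes.

```python
# Rough sanity check of the figures quoted in this case study.
# All names are illustrative; none of this comes from Yahoo! or Infobright.

RECORDS_PER_DAY = 30_000_000   # "close to 30 million records every day"
RETENTION_DAYS = 180           # assumption: six months taken as 6 x 30 days
PREVIOUS_HISTORY_DAYS = 30     # history limit of the earlier database
RAW_BYTES = 6 * 10**12         # approximately 6 TB of raw data (decimal units assumed)
DISK_BYTES = 600 * 10**9       # 600 GB used on disk (decimal units assumed)


def retained_records(records_per_day: int, retention_days: int) -> int:
    """Total records held when every day's load is kept for the full window."""
    return records_per_day * retention_days


def compression_ratio(raw_bytes: int, disk_bytes: int) -> float:
    """Raw size divided by on-disk size, e.g. 10.0 means 10:1."""
    return raw_bytes / disk_bytes


def history_multiple(new_days: int, old_days: int) -> float:
    """How many times longer the new retention window is than the old one."""
    return new_days / old_days


if __name__ == "__main__":
    total = retained_records(RECORDS_PER_DAY, RETENTION_DAYS)
    print(f"records retained: {total:,}")  # 5,400,000,000
    print(f"compression: {compression_ratio(RAW_BYTES, DISK_BYTES)}:1")  # 10.0:1
    print(f"history: {history_multiple(RETENTION_DAYS, PREVIOUS_HISTORY_DAYS)}x")  # 6.0x
```

Under these assumptions, the daily load works out to roughly 5.4 billion retained records, in line with the "about 6 billion" quoted above, and the storage figures give the stated 10:1 compression.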
We now have access to all of our detailed Web impression data, and we can keep 6x the amount of data history we could previously. This enables us to quickly determine the value of certain combinations of attributes and price accordingly.
- Arvind Hariharan, Senior Director Pricing and Yield Management


