Project Dreamcatcher
How cutting-edge text analytics can help the Obama campaign determine voters’ hopes and fears.
“Share your story,” Barack Obama’s Pennsylvania website encouraged
voters just before the holidays, above a text field roomy enough for
even one of the president’s own discursive answers. “Tell us why you
want to be involved in this campaign,” read the instructions. “How has
the work President Obama has done benefited you? Why are you once again
standing for change?” In Obama’s world, this is almost a tic. His
transition committee solicited
“[a]n American moment: your story” on the occasion of his inauguration.
The Democratic National Committee later asked people to “[s]hare your
story about the economic crisis.” It’s easy to see where this approach
fits into the culture of Obama’s politicking: His own career is founded
on the value of personal narratives and much of his field staff takes
inspiration from Marshall Ganz, the former labor tactician who famously
built solidarity in his organizing sessions by asking participants to
talk about their backgrounds. But might a presidential campaign have
another use for tens of thousands of mini-memoirs?
That’s the central thrust of a project under way in Chicago known by
the code name Dreamcatcher and led by Rayid Ghani, the man who has been
named Obama’s “chief scientist.” Veterans of the 2008 campaign snicker
at the new set of job titles, like Ghani’s, which have been conjured to
describe roles on the re-election staff, suggesting that they sound
better suited to corporate life than a political operation priding
itself on a grassroots sensibility. Indeed, Ghani last held the
chief-scientist title at Accenture Technology Labs, just across the
Chicago River from Obama’s headquarters. It was there that he developed
the expertise Obama’s campaign hopes will help it turn feel-good
projects like “share your story” into a source of valuable data for
sorting through the electorate.
At Accenture, Ghani mined the mountains of private consumer data that
accumulate on corporate servers to find statistical patterns that could
forecast the future. In one case, he developed a system to replace
health insurers’ random audits by deploying an algorithm able to
anticipate which of 50,000 daily claims are most likely to require
individual attention. (Up to 30 percent of an insurer’s resources can be
devoted to reprocessing claims.) To help set the terms of price
insurance marketed to eBay sellers, Ghani developed a model to estimate
the end-price for auctions, based on each sale item’s unique characteristics.
Often, Ghani found himself trying to help businesses find patterns in consumer behavior so that his clients could develop different strategies for different individuals. (In the corporate world, this is known as “CRM,” for customer-relationship management.) To help grocery stores design personalized sale promotions that would maximize total revenue, Ghani needed to understand how shoppers interacted with different products in relation to one another. The typical store had 60,000 products on its shelves, and Ghani coded each into one of 551 categories (like dog food, laundry detergent, orange juice) that allowed him to develop statistical models of how people build a shopping list and manage their baskets.
Ghani’s algorithms assigned shoppers scores to rate their individual
propensities for particular behaviors, like the “opportunistic index”
(“how ‘savvy’ the customer is about getting better prices than the rest
of the population”), and to see whether they had distinctive habits
(like “pantry-loading”) when faced with a price drop. If there was a
two-for-one deal on a certain brand of orange juice, Ghani’s models
could predict who would double their purchase, who would keep buying the
same amount, and who would switch from grapefruit for the week.
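The scoring idea can be sketched in a few lines of Python. This is an illustrative toy, not Ghani's actual models: the field names, the threshold, and the two behavioral buckets are all invented for the example.

```python
# Toy sketch of a shopper "opportunistic index": the fraction of a
# shopper's purchases made below list price, used here to guess how
# the shopper will respond to a two-for-one promotion.
# (Field names and the 0.5 threshold are invented for illustration.)

def opportunistic_index(purchases):
    """Fraction of a shopper's purchases made at a discounted price."""
    if not purchases:
        return 0.0
    discounted = sum(1 for p in purchases if p["paid"] < p["list_price"])
    return discounted / len(purchases)

def predict_promo_response(purchases, threshold=0.5):
    """Crude behavioral bucket for a two-for-one orange-juice deal."""
    score = opportunistic_index(purchases)
    if score >= threshold:
        return "stock up"        # likely to pantry-load on the deal
    return "buy as usual"        # relatively price-insensitive shopper

history = [
    {"item": "orange juice", "list_price": 3.00, "paid": 1.50},
    {"item": "dog food",     "list_price": 8.00, "paid": 8.00},
    {"item": "detergent",    "list_price": 6.00, "paid": 3.00},
]
print(opportunistic_index(history))      # 2 of 3 purchases were discounted
print(predict_promo_response(history))
```

A production model would of course learn its thresholds from millions of transactions rather than hard-coding them, but the shape of the computation, a per-shopper score feeding a per-shopper prediction, is the same.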
But Ghani realized that customers didn’t see the supermarket as a
collection of 551 product categories, or even 60,000 unique items. He
points to the example of a 1-liter plastic jug of Tropicana Low Pulp
Vitamin-D Fortified Orange Juice. To capture how that juice actually
interacted with other products in a shopper’s basket, Ghani knew the
product needed to be seen as more than just an item in the “orange juice”
category. So he reduced it to a series of attributes—Brand: Tropicana, Pulp: low, Fortified with: Vitamin-D, Size: 1 liter, Bottle type: plastic
—that could be weighed by the algorithms. Now a retailer’s models could
get closer to calculating shopping decisions as customers actually made
them. A sale on low-pulp Tropicana might lure people who usually
purchased a pulpier juice, but would Florida’s Natural drinkers shift to
a rival brand? Would a two-for-one deal get those who typically looked
for their juice in a carton to stock up on plastic?
The challenge was, in essence, semantic: teaching computers to decode complex product descriptions and isolate their essential attributes. For another client, Ghani, along with four Accenture colleagues and a Carnegie Mellon computer scientist, used a Web crawler to pull product names and descriptions from online clothes stores and built an algorithm that could assess products based on eight different attributes, including “age group,” “formality,” “price point,” and “degree of sportiness.” Once the products had been assigned values in each of those categories, they could be manipulated numerically—the same way that Ghani’s predictive models had tried to make sense of the grocery shopping list. By reducing it to its basic attributes—lightweight mesh nylon material, low profile sole, standard lacing system—a retailer could predict sales for shoes it had never sold before by comparing them to ones it had.
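A minimal sketch of that idea: reduce a free-text product description to a handful of attributes, then compare a never-before-sold item to past items by attribute overlap. The attribute vocabulary and keyword-matching rules below are invented for illustration; the Accenture system used trained statistical classifiers, not keyword lookup.

```python
# Reduce product descriptions to attribute: value pairs, then score
# similarity between products by the fraction of attributes they share.
# (The vocabulary here is a made-up stand-in for a learned classifier.)

ATTRIBUTE_KEYWORDS = {
    "material": ["mesh", "nylon", "leather", "canvas"],
    "formality": ["dress", "casual", "athletic"],
    "sole": ["low profile", "platform"],
}

def extract_attributes(description):
    """Map a product description to attribute: value pairs by keyword match."""
    text = description.lower()
    found = {}
    for attribute, values in ATTRIBUTE_KEYWORDS.items():
        for value in values:
            if value in text:
                found[attribute] = value
                break
    return found

def similarity(attrs_a, attrs_b):
    """Fraction of attributes on which two products agree."""
    keys = set(attrs_a) | set(attrs_b)
    if not keys:
        return 0.0
    matches = sum(1 for k in keys if attrs_a.get(k) == attrs_b.get(k))
    return matches / len(keys)

new_shoe = extract_attributes("Lightweight mesh running shoe, low profile sole")
old_shoe = extract_attributes("Athletic mesh trainer with low profile sole")
print(similarity(new_shoe, old_shoe))
```

Once every product is a vector of attributes like this, sales for a brand-new shoe can be estimated from the sales histories of its nearest attribute-space neighbors.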
Ghani’s clients in the corporate world were companies that “analyze
large amounts of transactional data but are unable to systematically
‘understand’ their products,” as his team wrote.
Political campaigns struggle with much the same problem. In 2008,
Obama’s campaign successfully hoarded hard data available from large
commercial databases, voter files, boutique lists, and an unprecedented
quantity of voter interviews it regularly conducted using paid phone
banks and volunteer canvassers. Obama’s analysts used the data to build
sophisticated statistical models that allowed them to sort voters by
their relative likelihoods of supporting Obama (and of voting at all).
The algorithms could also be programmed to predict views on particular
issues, and Obama’s targeters developed a few flags that predicted
binary positions on discrete, sensitive topics—like whether someone was
likely pro-choice or pro-life.
But the algorithms the Obama campaign used in 2008—and that Mitt Romney has used so far this year—have
trouble picking up voter positions, or the intensity around those
positions, with much nuance. In other words, the analysts were getting
pretty good at sorting the orange juice drinkers from the grapefruit
juice drinkers. But they still didn’t have a great sense of why
a given voter preferred grapefruit to O.J.—and how to change his mind.
Polls seemed unable to get at an honest hierarchy of personal priorities
in a way that could help target messages. Before the 2008 Iowa
caucuses, every Democrat’s top concern seemed to be opposition to the
Iraq war; once Lehman Bros. collapsed not long after the conventions,
the economy became the leading issue across demographic and ideological
groups. But microtargeting surveys were unable to burrow beneath that
surface unanimity to separate individual differences in attitudes toward
the war or the economy. If a voter writes in a Web form that her top
concern is the war in Afghanistan, should she be asked to enlist
as a “Veterans for Obama” volunteer, or sent direct mail written to
placate foreign-policy critics?
Campaigns do, however, take in plenty of information about what
voters believe, information that is not gathered in the form of a poll.
It comes in voters’ own words, often registered onto the clipboards of
canvassers, during a call-center phone conversation, in an online signup
sequence or a stunt like “share your story.” As part of the
Dreamcatcher project, Obama campaign officials have already set out to
redesign the “notes” field on individual records in the database they
use to track voters so that it sits visibly at the top of the
screen—encouraging volunteers to gather and enter that information. And
they’ve made the field large enough to include the “stories” submitted
online. (One story was 60,000 text characters long.)
What can the campaign do with this blizzard of text snippets?
Theoretically, Ghani could isolate keywords and context, then use
statistical patterns gleaned from the examples of millions of voters to
discern meaning. Say someone prattles on about “the auto bailout” to a
volunteer canvasser: Is he lauding a signature domestic-policy
achievement or is he a Tea Party sympathizer who should be excluded from
Obama’s future outreach efforts? An algorithm able to interpret that
voter’s actual words and sort them into categories might be able to make
an educated guess. “They’re trying to tease out a lot more nuanced
inferences about what people care about,” says a Democratic consultant
who worked closely with Obama’s data team in 2008.
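The kind of educated guess described above can be caricatured in a few lines. This is a hedged sketch, not the campaign's system: a real classifier would learn its word weights from millions of labeled examples, whereas the word lists below are invented for illustration.

```python
# Toy polarity classifier for voter free-text: guess whether a mention
# of "the auto bailout" reads as supportive or hostile by counting
# matches against small word lists. (Word lists are invented examples.)

SUPPORT_WORDS = {"saved", "jobs", "proud", "grateful", "worked"}
OPPOSE_WORDS = {"wasteful", "socialism", "handout", "disgrace", "taxes"}

def classify_mention(text):
    """Return 'supportive', 'hostile', or 'unclear' for a text snippet."""
    words = set(text.lower().replace(",", " ").replace(".", " ").split())
    score = len(words & SUPPORT_WORDS) - len(words & OPPOSE_WORDS)
    if score > 0:
        return "supportive"
    if score < 0:
        return "hostile"
    return "unclear"

print(classify_mention("The auto bailout saved my family's jobs."))
print(classify_mention("The auto bailout was a wasteful handout."))
```

The output of such a classifier, even an uncertain one, is exactly the kind of structured flag a targeting database can act on: route the first voter to volunteer recruitment, drop the second from outreach lists.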
Obama’s campaign has boasted that one of its priorities this year is
something it has described only as “microlistening,” but officials would
not discuss how they intend to deploy insights gleaned from their new
research into text analytics. “We have no plans
to read out our data/analytics/voter contact strategy,” spokesman Ben
LaBolt writes by email. “That just telegraphs to the other guys what
we're up to.”
Yet those familiar with Dreamcatcher describe it as a bet on text
analytics to make sense of a whole genre of personal information that no
one has ever systematically collected or put to use in politics.
Obama’s targeters hope the project will allow them to make more
sophisticated decisions about which voters to approach and what to say
to them. “It’s not about us trying to leverage the information we have
to better predict what people are doing. It’s about us being better
listeners,” says a campaign official. “When a million people are talking
to you at once it’s hard to listen to everything, and we need text
analytics and other tools to make sense of what everyone is saying in a
structured way.”