China has some of the world's worst pollution. But tracking it in all but the biggest cities can be impossible since local governments don't release any data to the public.
So researchers at the University of Wisconsin-Madison have come up with an innovative solution: If you can't follow the pollution itself, follow the complaints about it on social media.
"There's not enough information about pollution, and sometimes people suffer from heavier air pollution. We wondered, 'How can we use a new information source to help people understand [the severity of] the pollution around?'" said graduate student Shike Mei, who, along with Han Li and Jing Fan Mei, conducted a study published in the IEEE/ACM International Conference on Advances in Social Network Analysis and Mining in August.
To map pollution in areas where information is lacking, they mined the Chinese Twitter-like site Sina Weibo for posts related to air quality. The team developed a machine-learning model that would recognize posts that contained terms suggesting a bad air day -- words like "haze," "indoors" or "pollution" -- and those, such as "sunshine," that indicated clearer conditions.
The model uses the word choices in the posts and the locations of their authors to estimate the air quality of a given city or region. It also factors in time-and-space correlation among cities and days, since pollution flare-ups typically cover large areas and can last for days.
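The idea can be illustrated with a minimal Python sketch. It is not the researchers' model, which is a trained machine-learning system working on Chinese-language posts. The English term lists below contain only the example words quoted in this article, and the function names, the scoring rule and the smoothing weight are assumptions made for illustration.

```python
import re
from collections import defaultdict

# Assumed term lists: only the example words quoted in the article.
BAD_AIR_TERMS = {"haze", "indoors", "pollution"}
GOOD_AIR_TERMS = {"sunshine"}


def post_signal(text):
    """Return 1 for a bad-air post, -1 for a clear-air post, 0 otherwise."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    bad = len(words & BAD_AIR_TERMS)
    good = len(words & GOOD_AIR_TERMS)
    return (bad > good) - (bad < good)


def city_scores(posts):
    """Average the per-post signal for each city.

    posts: iterable of (city, text) pairs.
    Returns a dict of city -> score in [-1, 1]; higher means more
    complaints about bad air.
    """
    totals = defaultdict(int)
    counts = defaultdict(int)
    for city, text in posts:
        totals[city] += post_signal(text)
        counts[city] += 1
    return {city: totals[city] / counts[city] for city in counts}


def smooth_over_days(daily_scores, weight=0.5):
    """Exponentially smooth one city's daily scores.

    A crude stand-in for the time correlation described above: each
    day's estimate is blended with the previous day's smoothed value.
    """
    smoothed = []
    for score in daily_scores:
        if smoothed:
            score = weight * score + (1 - weight) * smoothed[-1]
        smoothed.append(score)
    return smoothed
```

A fuller version would also borrow strength from nearby cities, since the article notes that pollution episodes tend to span large areas.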
Between 350,000 and 500,000 Chinese citizens die prematurely each year because of air pollution, according to the medical journal The Lancet. The country also routinely dominates the list of the world's most polluted cities and must take extreme measures just to give the appearance that the air quality is good.
Ahead of the Asia-Pacific Economic Cooperation summit held in Beijing earlier this month, Chinese state media said authorities shut down factories within 125 miles of the city center and halted construction work during the summit. Cars with even- and odd-numbered license plates were allowed on the road only on alternate days. Government workers and students were told to take a six-day holiday.
The Wisconsin researchers tested their system over 30 days in the 108 cities in China that do keep what is called air quality index data. They found the number of posts indicating either bad or good air correlated with the levels of pollution in those cities.
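A check of this kind can be sketched as a Pearson correlation between a city's daily count of bad-air posts and its measured air quality index. The article does not say which statistic the researchers used, so the choice of Pearson's coefficient and the function name below are assumptions.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2 items")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return covariance / math.sqrt(var_x * var_y)
```

Called with one city's daily post counts and the same days' index readings, a value near 1 would mean complaints rise and fall with measured pollution, and a value near 0 would mean the posts carry little signal.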
Jerry Zhu, an associate professor in the university's department of computer sciences and the lead researcher on the project, said the next step is for air pollution experts to take their mathematical model and apply it to the "real world." That is, tracking social media sentiment to gauge air quality in many of China's smaller cities, which are often among the most polluted and the least likely to have the resources for measurements.
"I wouldn't venture to say this will solve any air pollution problem anytime soon," Zhu said. "Social media contains valuable information that compliments other sources of information. This can be used to combat air pollution."
Mark Dredze and his team at Johns Hopkins University are also mining social media data in China to study air pollution. But rather than trying to quantify pollution levels, they are using the same methods to understand how citizens perceive pollution.
"Are they noticing it? Are they changing their behavior? Are they getting sick?" Dredze said of the work that is ongoing.
"The reason we care about perception is that it is what drives policy," he said. "We want to know where people are most sensitive about the problem and, in those cities, there will be more pressure to do something about it."