Zachary Steinert-Threlkeld has long been fascinated by crowd dynamics, especially among those drawn to mass demonstrations. As a Ph.D. candidate in political science, Steinert-Threlkeld knew that social media posts generated at protests were a rich source of data, but he could find few tools to help him analyze them.

Now, in a world awash with popular uprisings and social movements — from Tahrir Square in 2011 to the Women’s March following the 2017 presidential inauguration — the assistant professor of public policy at the UCLA Luskin School of Public Affairs has used data generated by millions of posts on Twitter to learn more about crowd behavior and mass motivation.

Steinert-Threlkeld created a guide for acquiring and working with data sets culled from Twitter, which has more than 320 million global accounts generating more than half a billion messages every day.

His efforts culminated this year with the publication of “Twitter as Data,” the first guide in Cambridge University Press’ new Elements series on Quantitative and Computational Methods for Social Science. The series provides short introductions and hands-on tutorials to new and innovative research methodologies that may not yet appear in textbooks.

“When I was learning this as a graduate student, there was a lot of piecing together this information,” said Steinert-Threlkeld, who relied on sources such as Twitter documentation and online Q&A forums like Stack Overflow. “I was able to do it, but it would have been a lot nicer if I had a textbook to show me the lay of the land.”

Steinert-Threlkeld, whose work combines his interest in computational social science and social networks with his research on protest and subnational conflict, said the book includes an interactive online version that allows users to click on links to download information and even sample data.

“It is differently comprehensive than a book,” Steinert-Threlkeld said. He described it as a “more interactive book experience — the first in social science that does this.”

In the book, Steinert-Threlkeld writes: “The increasing prevalence of digital communications technology — the internet and mobile phones — provides the possibility of analyzing human behavior at a level of detail previously unimaginable.” He compares this to the development of the microscope, which “facilitated the development of the germ theory of disease.”

He adds: “These tools are no more difficult to learn and use than other qualitative and quantitative methods, but they are not commonly taught to social scientists.”

To remedy this, Steinert-Threlkeld provides a systematic introduction to these data sources and to the tools needed to benefit from them.

For example, people always want to know who’s protesting and how that influences others who might protest, Steinert-Threlkeld said. Most information has been restricted to surveys, which have limitations. “And so the researcher either gets lucky and happens to have scheduled a survey that occurs during a protest, but usually it’s after the fact.”

That is what’s exciting about using big data to study crowd behavior. “It’s like people always answering surveys,” he said. “Basically, every second you’re giving me survey data. Now we can tell in real time who’s protesting.”

One application of Twitter data is estimating crowd size, Steinert-Threlkeld said. In the past, he had to rely on reports from organizers, police and the media to gauge the size of protests. “But I’m collecting tweets with GPS coordinates so I can say, ‘Oh, there are these many tweets or these many users from L.A. at this time or Pershing Square at this time,’ and explain whether that’s a reliable estimate or not of actual protesters.”
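To make the idea concrete, here is a minimal Python sketch of that kind of count: how many geotagged tweets, and how many distinct users, fall inside a given area during a given time window. It is an illustration under stated assumptions, not the method from the book. The field names (`user_id`, `lat`, `lon`, `created_at`), the function name, the sample records and the rough bounding box around Pershing Square are all hypothetical.

```python
from datetime import datetime


def count_in_area(tweets, bbox, start, end):
    """Count tweets and distinct users inside a bounding box and time window.

    bbox is (min_lat, min_lon, max_lat, max_lon).
    The window includes `start` and excludes `end`.
    """
    min_lat, min_lon, max_lat, max_lon = bbox
    n_tweets = 0
    users = set()
    for t in tweets:
        in_box = (min_lat <= t["lat"] <= max_lat
                  and min_lon <= t["lon"] <= max_lon)
        in_window = start <= t["created_at"] < end
        if in_box and in_window:
            n_tweets += 1
            users.add(t["user_id"])
    return n_tweets, len(users)


# Assumption: an illustrative box roughly around Pershing Square,
# not a surveyed boundary.
PERSHING_SQ = (34.047, -118.255, 34.050, -118.251)

# Made-up sample records; the field names are hypothetical.
sample = [
    {"user_id": "a", "lat": 34.0484, "lon": -118.2531,
     "created_at": datetime(2017, 1, 21, 10, 5)},
    {"user_id": "a", "lat": 34.0486, "lon": -118.2529,
     "created_at": datetime(2017, 1, 21, 10, 40)},
    {"user_id": "b", "lat": 34.0490, "lon": -118.2535,
     "created_at": datetime(2017, 1, 21, 10, 15)},
    {"user_id": "c", "lat": 34.1000, "lon": -118.3000,   # outside the box
     "created_at": datetime(2017, 1, 21, 10, 20)},
    {"user_id": "d", "lat": 34.0484, "lon": -118.2531,   # outside the window
     "created_at": datetime(2017, 1, 21, 13, 0)},
]

n_tweets, n_users = count_in_area(
    sample,
    PERSHING_SQ,
    datetime(2017, 1, 21, 10, 0),
    datetime(2017, 1, 21, 11, 0),
)
print(n_tweets, n_users)
```

Counting distinct users alongside raw tweets keeps one prolific account from inflating the figure. Neither number is a head count: as Steinert-Threlkeld notes, the researcher still has to explain whether the tally is a reliable estimate of actual protesters.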

Twitter information can also be used to create data based on images shared from protests, Steinert-Threlkeld said. “The work I did before was all text based: What are people saying? Who’s saying it? When are they saying it? That sort of thing. But people share a lot of images online. They share more than they did three or four years ago. It’s really where the space is moving.”

Steinert-Threlkeld said that getting data into a form that a researcher can use requires a different skill set than designing and administering a survey. “But it’s still in some ways survey-like at the end of the day,” he said.

And “it’s fun,” he said. “Now we can tell in real time who’s protesting. We don’t know where the person lives, or their income, or their name. It’s still anonymous. We don’t know if the person who shares the image was there so we’re not incriminating anyone, but we can get a lot of information about protesters that we couldn’t before.”

In the final section of his guide, Steinert-Threlkeld writes: “These data are not a ‘revolution.’ Instead, they represent the next stage in the constant increase in data available to researchers. To stay at the forefront of data analysis, one needs to know some programming in order to interface with websites and data services, download data automatically, algorithmically clean and analyze data, and present these data in low-dimension environments. The skills are modern; the change is eternal.”