In this study, we develop methods to identify verbal expressions in social media streams that refer to real-world activities. Using aggregate daily patterns of Foursquare check-ins, our methods extract similar patterns from Twitter, extending the amount of available content while preserving high relevance. We devise and test several methods to extract such content, using time-series and semantic similarity. Evaluating on key activity categories available from Foursquare (coffee, food, shopping and nightlife), we show that our extraction methods are able to capture equivalent patterns in Twitter. By examining rudimentary categories of activity such as nightlife, food or shopping, we peek at the fundamental rhythm of human behavior and observe when it is disrupted. We use data compiled during the abnormal conditions in New York City throughout Hurricane Sandy to examine the outcome of our methods.
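As a rough illustration of the time-series side of this idea, the sketch below ranks candidate Twitter terms by the Pearson correlation between their hourly volume and the hourly check-in volume of a Foursquare category. The 24-hour binning, the choice of Pearson correlation and the function names are assumptions made for illustration; the methods in the study may measure similarity differently, and the semantic-similarity component is not shown.

```python
import math
from typing import Dict, List, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between two equal-length series (0.0 if either is flat)."""
    if len(x) != len(y):
        raise ValueError("series must have the same length")
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def rank_terms(
    reference: Sequence[float], candidates: Dict[str, Sequence[float]]
) -> List[Tuple[str, float]]:
    """Rank candidate Twitter terms by how closely their hourly profile tracks
    the reference Foursquare check-in profile (hypothetical helper).

    Pearson correlation ignores scale, so raw counts from the two platforms,
    which differ greatly in volume, can be compared directly.
    """
    scores = [(term, pearson(reference, series)) for term, series in candidates.items()]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)
```

In this toy setup a term such as "latte" whose volume peaks in the morning would rank above a term peaking late at night when compared against a coffee check-in profile.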
With the growing interest in how online sedentary activity can mediate offline health practices, we present a study of social media activity related to personal health and fitness. We aim to identify the type of content and motivations for sharing health-related activity in social media outlets. To this end, we performed a qualitative analysis of Twitter posts, as well as an extensive set of interviews with experienced users who post messages on Twitter about exercise, diet, and weight loss activities. The qualitative analysis exposes varying levels of activity actualization and message sentiment. The interviews help us reason about the users' practices and motivations for posting activity related to the pursuit and maintenance of volitional health behaviors. Our findings extend existing theoretical frameworks and can inform the design of technology that uses social media to help initiate and maintain challenging activities like exercise and diet.
As more people tweet, check in and share pictures and videos of their daily experiences in the city, new opportunities arise to understand urban activity. When aggregated, these data can uncover invaluable insights for local stakeholders such as journalists, first responders and city officials. To better understand the needs and requirements for this kind of aggregation tool, we perform an exploratory study that includes interviews with 12 domain experts who use local information on a daily basis. Our results shed light on current practices, existing tools and unfulfilled needs of these professionals. We use these findings to discuss the requirements for hyper-local social media data aggregation tools for the study of cities on a large scale. We outline a list of key features that can better serve the discovery of patterns and insights about both real-time activity and historical perspectives of local communities.
This paper presents ethnographic fieldwork on Foursquare, a location-based social network and recommendations application for mobile phones. Contrary to its intended purpose of location sharing among friends, I discuss how the service allows users to engage in local interactions with strangers, drawing on Stanley Milgram’s concept of the ‘familiar stranger’. By revisiting this theory, I proffer the notion of a ‘Networked Familiar Stranger,’ an updated term influenced by both the virtual sphere and the local physical place, and nuanced by three main factors: shared stranger, enhanced storytelling and virtual filtering. I conclude by discussing the possible implications these services might have for interactions among local communities.
Geo-tagged social media data provide a way to uncover users’ perception of the city, namely, their mental map. In this article we examine the elements that make up an online mental map and discuss the differences between individual and collective mental maps as depicted using these data.
People turn to social media to express their emotions surrounding major life events. Death of a loved one is one scenario in which people share their feelings in the semi-public space of social networking sites. In this paper, we present the results of a two-part investigation of grief and distress in the context of messages posted to the profiles of deceased MySpace users. We present a coding system for identifying emotionally distressed content, followed by a detailed analysis of language use that lays a foundation for natural language processing (NLP) tasks, such as automatic detection of bereavement-related distress. Our findings suggest that in addition to words bearing positive or negative sentiment, linguistic style can be an indicator of messages that demonstrate distress in the space of post-mortem social media content. These results contribute to research in computational linguistics by identifying linguistic features that can be used for automatic classification, as well as to research on death and bereavement by enumerating attributes of distressed self-expression in a post-mortem context.
Social media activity in different geographic regions can expose a varied set of temporal patterns. We study and characterize diurnal patterns in social media data for different urban areas, with the goal of providing context and framing for reasoning about such patterns at different scales. Using one of the largest datasets to date of Twitter content associated with different locations, we examine within-day variability and across-day variability of diurnal keyword patterns for different locations. We show that only a few cities currently provide the magnitude of content needed to support such across-day variability analysis for more than a few keywords. Nevertheless, within-day diurnal variability can help in comparing activities and finding similarities between cities.
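A minimal sketch of the two kinds of variability, assuming each location's keyword counts are stored as one 24-bin list per day. Pooling the days gives a within-day profile on which two cities can be compared; across-day variability is measured here as the mean L1 distance between each day's profile and the average daily profile. The `min_count` cut-off, the L1 distance and all function names are illustrative assumptions rather than the study's definitions, but the cut-off shows why only high-volume cities can support the across-day analysis.

```python
from typing import List, Optional, Sequence


def hourly_profile(counts: Sequence[int]) -> List[float]:
    """Fraction of the total volume that falls in each hourly bin."""
    total = sum(counts)
    if total == 0:
        raise ValueError("no messages in this series")
    return [c / total for c in counts]


def l1_distance(p: Sequence[float], q: Sequence[float]) -> float:
    """Sum of absolute differences between two profiles."""
    return sum(abs(a - b) for a, b in zip(p, q))


def pooled_profile(days: List[Sequence[int]]) -> List[float]:
    """Within-day pattern: sum each hour's counts across days, then normalize."""
    return hourly_profile([sum(hour) for hour in zip(*days)])


def across_day_variability(
    days: List[Sequence[int]], min_count: int = 100
) -> Optional[float]:
    """Mean L1 distance between each day's profile and the average daily profile.

    Days with fewer than `min_count` messages are dropped because their
    profiles are too noisy; returns None when fewer than two days remain.
    """
    usable = [hourly_profile(d) for d in days if sum(d) >= min_count]
    if len(usable) < 2:
        return None
    mean = [sum(hour) / len(usable) for hour in zip(*usable)]
    return sum(l1_distance(p, mean) for p in usable) / len(usable)
```

Comparing two cities' within-day patterns then reduces to `l1_distance(pooled_profile(city_a_days), pooled_profile(city_b_days))`.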
Social media is already a fixture for reporting for many journalists, especially around breaking news events where non-professionals may already be on the scene to share an eyewitness report, photo, or video of the event. At the same time, the huge amount of content posted in conjunction with such events makes it challenging to find interesting and trustworthy sources in the din of the stream. In this paper we develop and investigate new methods for filtering and assessing the verity of sources found through social media by journalists. We take a human-centered design approach to developing a system, SRSR (“Seriously Rapid Source Review”), informed by journalistic practices and knowledge of information production in events. We then use the system, together with a realistic reporting scenario, to evaluate the filtering and visual cue features that we developed. Our evaluation offers insights into social media information sourcing practices and challenges, and highlights the role technology can play in the solution.
In terms of technological change and participatory media, the phenomenon of taking and sharing videos of live music events offers an insightful case study for discussing the individual production of online content and interpersonal interactions on social media sites. We use interviews with YouTube users who post videos of live music events to investigate motivations for the capture of personal video recordings, the protocols for sharing of videos, and the roles videos play in online fan activities. Analysis of interviews identifies key motivations for capture and sharing, and exposes tensions between short- and long-term goals of these activities. Further, the results expose differences in attitudes, motivations and practices between mainstream and ‘indie’ concert goers. These findings have implications for understanding participation on social media sites, as well as broader issues of online communities, fan cultures and individual production of media.
User-contributed Web data contains rich and diverse information about a variety of events in the physical world, such as shows, festivals, conferences and more. This information ranges from known event features (e.g., title, time, location) posted on event aggregation platforms (e.g., Last.fm events, EventBrite, Facebook events) to discussions and reactions related to events shared on different social media sites (e.g., Twitter, YouTube, Flickr). In this paper, we focus on the challenge of automatically identifying user-contributed content for events that are planned and, therefore, known in advance, across different social media sites. We mine event aggregation platforms to extract event features, which are often noisy or missing. We use these features to develop query formulation strategies for retrieving content associated with an event on different social media sites. Further, we explore ways in which event content identified on one social media site can be used to retrieve additional relevant event content on other social media sites. We apply our strategies to a large set of user-contributed events, and analyze their effectiveness in retrieving relevant event content from Twitter, YouTube, and Flickr.
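Query formulation from event features can be pictured as generating a list of queries ordered from most to least specific, so that events whose venue or city fields are missing still yield usable queries. The `Event` fields, the ordering and the quoting convention below are hypothetical choices for illustration, not the strategies evaluated in the paper.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Event:
    """Hypothetical event record mined from an aggregation platform."""
    title: str
    venue: Optional[str] = None
    city: Optional[str] = None


def formulate_queries(event: Event) -> List[str]:
    """Build queries from most specific (exact title plus venue) to least
    specific (title words in any order). Missing features are skipped."""
    title = " ".join(event.title.split())  # collapse noisy whitespace
    candidates: List[str] = []
    if event.venue:
        candidates.append(f'"{title}" {event.venue}')
    if event.city:
        candidates.append(f'"{title}" {event.city}')
    candidates.append(f'"{title}"')  # exact-phrase title
    candidates.append(title)  # title words in any order
    # Remove duplicates (e.g., venue == city) while keeping the order.
    return list(dict.fromkeys(candidates))
```

A retrieval loop could issue these queries in order against each site's search interface, falling back to the broader queries only when the precise ones return too little content.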
By examining the information practices of a punk-rock subculture, we investigate the limits of social media systems, particularly limits exposed by practices of secrecy. Looking at the exchange of information about “underground” shows, we use qualitative interviews to examine uses of social media among fans. This initial analysis centers on understanding tactical uses of information and technology to avoid police detection, particularly by comparing uses of more traditional online forums, such as message boards, with social network sites, such as Facebook. Understanding the uses and preferences for distinct technologies sheds light on how localized social context drives technological use. These findings also have implications for the design of applications sensitive to users' granular needs for secrecy.
Social media platforms such as Twitter garner significant attention from very large audiences in response to real-world events. Automatically establishing who is participating in information production or conversation around events can improve event content consumption, help expose the stakeholders in the event and their varied interests, and even help steer subsequent coverage of an event by journalists. In this paper, we take initial steps towards building an automatic classifier for user types on Twitter, focusing on three core user categories that are reflective of the information production and consumption processes around events: organizations, journalists/media bloggers, and ordinary individuals. Exploration of the user categories on a range of events shows distinctive characteristics in terms of the proportion of each user type, as well as differences in the nature of the content each type shared around the events.
User-contributed messages on social media sites such as Twitter have emerged as a powerful, real-time means of information sharing on the Web. These short messages tend to reflect a variety of events in real time, making Twitter particularly well suited as a source of real-time event content. In this paper, we explore approaches for analyzing the stream of Twitter messages to distinguish between messages about real-world events and non-event messages. Our approach relies on a rich family of aggregate statistics of topically similar message clusters. Large-scale experiments over millions of Twitter messages show the effectiveness of our approach for surfacing real-world event content on Twitter.
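To make "aggregate statistics of topically similar message clusters" concrete, the sketch below computes a few cluster-level features. The specific features, the `Tweet` type and the "RT @" retweet heuristic are illustrative assumptions; the paper uses its own, richer family of statistics. Feature vectors like these would then be passed to a standard supervised classifier that labels each cluster as event or non-event.

```python
import re
from dataclasses import dataclass
from typing import Dict, List

MENTION = re.compile(r"@\w+")


@dataclass
class Tweet:
    user: str
    text: str
    time: float  # hours since the start of the observation window


def cluster_features(tweets: List[Tweet]) -> Dict[str, float]:
    """A few illustrative aggregate statistics for one cluster of similar tweets."""
    if not tweets:
        raise ValueError("empty cluster")
    n = len(tweets)
    times = sorted(t.time for t in tweets)
    return {
        "size": float(n),
        "unique_user_ratio": len({t.user for t in tweets}) / n,
        "retweet_ratio": sum(t.text.startswith("RT @") for t in tweets) / n,
        "hashtag_ratio": sum("#" in t.text for t in tweets) / n,
        "mention_ratio": sum(bool(MENTION.search(t.text)) for t in tweets) / n,
        "time_span_hours": times[-1] - times[0],
    }
```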
Social media sites such as Twitter contain large amounts of user contributed messages for a wide variety of real-world events. While some of these "event messages" might contain interesting and useful information (e.g., event time, location, participants, opinions), others might provide little value (e.g., using heavy slang, incomprehensible language) to people interested in learning about an event. Techniques for effective selection of quality event content may therefore help improve applications such as event browsing and search. In this paper, we explore approaches for finding representative messages among a set of Twitter messages that correspond to the same event, with the goal of identifying high quality, relevant messages that provide useful event information. We evaluate our approaches using a large-scale dataset of Twitter messages, and show that we can automatically select event messages that are both relevant and useful.
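One simple way to pick representative messages is centroid similarity: represent each message as a term-frequency vector, average the vectors of all messages for the event, and rank messages by cosine similarity to that centroid. This is a minimal sketch of one plausible strategy; the tokenizer and the plain term-frequency weighting are assumptions, and the paper's approaches may weight terms or score messages differently.

```python
import math
import re
from collections import Counter
from typing import List

TOKEN = re.compile(r"[a-z0-9#@']+")


def vectorize(text: str) -> Counter:
    """Term-frequency vector over lowercase tokens (hashtags and mentions kept)."""
    return Counter(TOKEN.findall(text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def representative_messages(messages: List[str], k: int = 1) -> List[str]:
    """Return the k messages closest to the centroid of all messages in the event."""
    vectors = [vectorize(m) for m in messages]
    centroid: Counter = Counter()
    for vec in vectors:
        for term, count in vec.items():
            centroid[term] += count / len(vectors)
    order = sorted(range(len(messages)), key=lambda i: cosine(vectors[i], centroid), reverse=True)
    return [messages[i] for i in order[:k]]
```

Short, off-topic messages such as "lol omg" share few terms with the centroid and therefore sink to the bottom of the ranking.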
Twitter, Facebook, and other related systems that we call social awareness streams are rapidly changing the information and communication dynamics of our society. These systems, where hundreds of millions of users share short messages in real time, expose the aggregate interests and attention of global and local communities. In particular, emerging temporal trends in these systems, especially those related to a single geographic area, are a significant and revealing source of information for, and about, a local community. This study makes two essential contributions for interpreting emerging temporal trends in these information systems. First, based on a large dataset of Twitter messages from one geographic area, we develop a taxonomy of the trends present in the data. Second, we identify important dimensions according to which trends can be categorized, as well as the key distinguishing features of trends that can be derived from their associated messages. We quantitatively examine the computed features for different categories of trends, and establish that significant differences can be detected across categories. Our study advances the understanding of trends on Twitter and other social awareness streams, which will enable powerful applications and activities, including user-driven real-time information services for local communities.
We investigate the breaking of ties between individuals in the online social network of Twitter, a hugely popular social media service. Building on sociology concepts such as strength of ties, embeddedness, and status, we explore how network structure alone influences tie breaks, the common phenomenon of an individual ceasing to "follow" another in Twitter's directed social network. We examine these relationships using a dataset of 245,586 Twitter "follow" edges, and the persistence of these edges after nine months. We show that structural properties of individuals and dyads at Time 1 have a significant effect on the existence of edges at Time 2, and connect these findings to the social theories that motivated the study.
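Dyad-level structural measures of this kind can be computed directly from two snapshots of the follow graph. In the sketch below, the embeddedness of a dyad is taken to be the number of accounts connected (in either direction) to both users, and each Time 1 edge is labeled with whether it still exists at Time 2. These operationalizations and all names are assumptions for illustration; the paper's exact feature definitions may differ.

```python
from typing import Dict, List, Set, Tuple

Graph = Dict[str, Set[str]]  # follower -> set of accounts they follow


def neighbors(g: Graph, u: str) -> Set[str]:
    """Accounts linked to u in either direction (followed by u or following u)."""
    following = g.get(u, set())
    followers = {v for v, followees in g.items() if u in followees}
    return (following | followers) - {u}


def embeddedness(g: Graph, u: str, v: str) -> int:
    """Number of accounts connected to both u and v."""
    return len(neighbors(g, u) & neighbors(g, v))


def is_reciprocal(g: Graph, u: str, v: str) -> bool:
    return v in g.get(u, set()) and u in g.get(v, set())


def label_edges(g1: Graph, g2: Graph) -> List[Tuple[str, str, int, bool, bool]]:
    """For each Time 1 edge u -> v: (u, v, embeddedness, reciprocal at T1, still present at T2)."""
    rows = []
    for u in sorted(g1):
        for v in sorted(g1[u]):
            rows.append((u, v, embeddedness(g1, u, v),
                         is_reciprocal(g1, u, v), v in g2.get(u, set())))
    return rows
```

The follower scan in `neighbors` is linear in the number of accounts per call, which is fine for a sketch; a dataset with hundreds of thousands of edges would call for a precomputed reverse index.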
This work explores the intersection between infographics and games by examining how to embed meaningful visual analytic interactions into game mechanics that in turn impact user behavior around a data-driven graphic. In contrast to other methods of narrative visualization, games provide an alternate method for structuring a story, not bound by a linear arrangement but still providing structure via rules, goals, and mechanics of play. We designed two different versions of a game-y infographic, Salubrious Nation, and compared them to a non-game-y version in an online experiment. We assessed the relative merits of the game-y approach of presentation in terms of exploration of the visualization, insights and learning, and enjoyment of the experience. Based on our results, we discuss some of the benefits and drawbacks of our designs. More generally, we identify challenges and opportunities for further exploration of this new design space.
The relationship between social sharing of emotions, social networks and social ties is an ongoing topic of research. Such sharing of emotions occurs frequently in "social awareness streams" platforms like Twitter and Facebook. We use Twitter to address research questions about the association of properties of a user's network, such as size and density, with expression of emotion in the user's Twitter posts. Our analysis suggests that expression of emotion can explain some of the variance in users' Twitter networks, and that the use of emotion in interactions between users is a strong explaining factor.
With the growth in sociality and interaction around online news media, news sites are increasingly becoming places for communities to discuss and address common issues spurred by news articles. The quality of online news comments is of importance to news organizations that want to provide a valuable exchange of community ideas and maintain credibility within the community. In this work we examine the complex interplay between the needs and desires of news commenters and the functioning of different journalistic approaches toward managing comment quality. Drawing primarily on newsroom interviews and reader surveys, we characterize the comment discourse of SacBee.com, discuss the relationship of comment quality to both the consumption and production of news information, and provide a description of both readers' and writers' motivations for usage of news comments. We also examine newsroom strategies for dealing with comment quality as well as explore tensions and opportunities for value-sensitive innovation within such online communities.
We present an automatic method which leverages word lengthening to adapt a sentiment lexicon specifically for Twitter and similar social messaging networks. The contributions of the paper are as follows. First, we call attention to lengthening as a widespread phenomenon in microblogs and social messaging, and demonstrate the importance of handling it correctly. We then show that lengthening is strongly associated with subjectivity and sentiment. Finally, we present an automatic method which leverages this association to detect domain-specific sentiment- and emotion-bearing words. We evaluate our method by comparison to human judgments, and analyze its strengths and weaknesses. Our results are of interest to anyone analyzing sentiment in microblogs and social networks, whether for research or commercial purposes.
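A minimal sketch of the lengthening signal, assuming a token counts as lengthened when a letter repeats three or more times in a row, and that all variants of a word are mapped to one canonical form by collapsing every run of repeated letters. For each canonical form it reports the fraction of occurrences that were lengthened; following the association described above, frequently lengthened words become candidate sentiment-bearing words. The thresholds and names are assumptions, not the paper's exact procedure.

```python
import re
from collections import Counter
from typing import Dict, Iterable

LENGTHENED = re.compile(r"([a-z])\1{2,}")  # a letter repeated 3+ times in a row
RUN = re.compile(r"([a-z])\1+")  # any run of 2+ identical letters


def canonical(word: str) -> str:
    """Collapse every run of repeated letters: 'coooool' and 'cool' -> 'col'."""
    return RUN.sub(r"\1", word.lower())


def is_lengthened(word: str) -> bool:
    return LENGTHENED.search(word.lower()) is not None


def lengthening_ratios(tokens: Iterable[str], min_count: int = 2) -> Dict[str, float]:
    """Fraction of each canonical form's occurrences that were lengthened,
    reported only for forms seen at least `min_count` times."""
    total: Counter = Counter()
    lengthened: Counter = Counter()
    for tok in tokens:
        key = canonical(tok)
        total[key] += 1
        if is_lengthened(tok):
            lengthened[key] += 1
    return {k: lengthened[k] / total[k] for k in total if total[k] >= min_count}
```

Collapsing all runs is deliberately coarse: it unifies "cool" with "coooool", but it can also merge unrelated words that differ only in doubled letters.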
In recent years, various Web-based sharing and community services such as Flickr and YouTube have made a vast and rapidly growing amount of multimedia content available online. Uploaded by individual participants, the content in these immense pools is accompanied by varied types of metadata, such as social network data or descriptive textual information. These collections present, at once, new challenges and exciting opportunities for multimedia research. This article presents an approach for "social multimedia" applications, based on the experience of building a number of successful applications that apply multimedia content analysis within a social multimedia context.
Journalists increasingly turn to social media sources such as Facebook or Twitter to support their coverage of various news events. For large-scale events such as televised debates and speeches, the amount of content on social media can easily become overwhelming, yet still contain information that may aid and augment reporting via individual content items as well as via aggregate information from the crowd's response. In this work we present a visual analytic tool, Vox Civitas, designed to help journalists and media professionals extract news value from large-scale aggregations of social media content around broadcast events. We discuss the design of the tool, present the text analysis techniques used to enable the presentation, and provide details on the visual and interaction design. We provide an exploratory evaluation based on a user study in which journalists interacted with the system to explore and report on a dataset of over one hundred thousand Twitter messages collected during the U.S. State of the Union presidential address in 2010.
In recent years we have witnessed a significant growth of social-computing communities -- online services in which users share information in various forms. As content contributions from participants are critical to the viability of these communities, it is important to understand what drives users to participate and share information with others in such settings. We extend previous literature on user contribution by studying the factors that are associated with various forms of participation in a large online photo-sharing community. Using survey and system data, we examine four different forms of participation and consider the differences between these forms. We build on theories of motivation to examine the relationship between users' participation and their motivations with respect to their tenure in the community. Amongst our findings, we identify individual motivations (both extrinsic and intrinsic) that underpin user participation, and their effects on different forms of information sharing; we show that tenure in the community does affect participation, but that this effect depends on the type of participation activity. Finally, we demonstrate that tenure in the community has a weak moderating effect on a number of motivations with regard to their effect on participation. Directions for future research, as well as implications for theory and practice, are discussed.
Social media sites (e.g., Flickr, YouTube, and Facebook) are a popular distribution outlet for users looking to share their experiences and interests on the Web. These sites host substantial amounts of user-contributed materials (e.g., photographs, videos, and textual content) for a wide variety of real-world events of different types and scale. By automatically identifying these events and their associated user-contributed social media documents, which is the focus of this paper, we can enable event browsing and search in state-of-the-art search engines. To address this problem, we exploit the rich "context" associated with social media content, including user-provided annotations (e.g., title, tags) and automatically generated information (e.g., content creation time). Using this rich context, which includes both textual and non-textual features, we can define appropriate document similarity metrics that will enable online clustering of media to events. As a key contribution of this paper, we explore a variety of techniques for learning multi-feature similarity metrics for social media documents in a principled manner. We evaluate our techniques on large-scale, real-world datasets of event images from Flickr. Our evaluation results suggest that our approach identifies events, and their associated social media documents, more effectively than the state-of-the-art strategies on which we build.
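The online-clustering step can be sketched as a single pass over documents: each new document joins the most similar existing cluster if the combined similarity clears a threshold, and otherwise starts a new cluster. Here the multi-feature similarity is a weighted sum of tag cosine similarity and an exponential time-proximity score. The paper learns how to combine features; the fixed weights, the decay constant `tau` and the threshold below are hypothetical placeholders.

```python
import math
from collections import Counter
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Doc:
    tags: List[str]
    time: float  # capture time, in hours


@dataclass
class Cluster:
    tag_counts: Counter = field(default_factory=Counter)
    times: List[float] = field(default_factory=list)
    members: List[int] = field(default_factory=list)

    def add(self, index: int, doc: Doc) -> None:
        self.tag_counts.update(doc.tags)
        self.times.append(doc.time)
        self.members.append(index)


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(count * b[tag] for tag, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def similarity(doc: Doc, cluster: Cluster, w_text: float = 0.6,
               w_time: float = 0.4, tau: float = 24.0) -> float:
    """Weighted combination of tag similarity and time proximity (hypothetical weights)."""
    text_sim = cosine(Counter(doc.tags), cluster.tag_counts)
    mean_time = sum(cluster.times) / len(cluster.times)
    time_sim = math.exp(-abs(doc.time - mean_time) / tau)
    return w_text * text_sim + w_time * time_sim


def online_cluster(docs: List[Doc], threshold: float = 0.5) -> List[Cluster]:
    clusters: List[Cluster] = []
    for i, doc in enumerate(docs):
        best: Optional[Cluster] = None
        best_sim = threshold
        for cluster in clusters:
            sim = similarity(doc, cluster)
            if sim >= best_sim:
                best, best_sim = cluster, sim
        if best is None:  # nothing similar enough: start a new event cluster
            best = Cluster()
            clusters.append(best)
        best.add(i, doc)
    return clusters
```

In a learned setting, `w_text` and `w_time` would be fit on labeled document pairs rather than fixed by hand.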
In this work we examine the characteristics of social activity and patterns of communication on Twitter, a prominent example of the emerging class of communication systems we call "social awareness streams." We use system data and message content from over 350 Twitter users, applying human coding and quantitative analysis to provide a deeper understanding of the activity of individuals on the Twitter network. In particular, we develop a content-based categorization of the type of messages posted by Twitter users, based on which we examine users' activity. Our analysis shows two common types of user behavior in terms of the content of the posted messages, and exposes differences between users with respect to these activities.
Social media sites such as Flickr, YouTube, and Facebook host substantial amounts of user-contributed materials (e.g., photographs, videos, and textual content) for a wide variety of real-world events. These range from widely known events, such as the presidential inauguration, to smaller, community-specific events, such as annual conventions and local gatherings. By identifying these events and their associated user-contributed social media documents, which is the focus of this paper, we can greatly improve local event browsing and search in state-of-the-art search engines. To address this problem, we exploit the rich "context" associated with social media content, including user-provided annotations (e.g., title, tags) and automatically generated information (e.g., content creation time). We form a variety of representations of social media documents using different context dimensions, and combine these dimensions in a principled way into a single clustering solution, where each document cluster ideally corresponds to one event, using a weighted ensemble approach. We evaluate our approach on a large-scale, real-world dataset of event images, and report promising performance with respect to several baseline approaches. Our preliminary experiments suggest that our ensemble approach identifies events, and their associated images, more effectively than the state-of-the-art strategies on which we build.
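One common way to merge several clusterings into a single solution is a weighted co-association vote: two documents are linked if the clusterings that place them together carry at least a given share of the total weight, and the final clusters are the connected components of those links. The sketch below uses that scheme with hypothetical per-dimension weights; it illustrates the general idea of a weighted ensemble and is not necessarily the paper's exact combination method.

```python
from typing import Dict, List, Sequence


def consensus(clusterings: Dict[str, Sequence[int]], weights: Dict[str, float],
              threshold: float = 0.5) -> List[int]:
    """Combine per-dimension cluster labels into one labeling via weighted voting."""
    names = list(clusterings)
    n = len(clusterings[names[0]])
    total = sum(weights[k] for k in names)
    parent = list(range(n))

    def find(x: int) -> int:
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i in range(n):
        for j in range(i + 1, n):
            agree = sum(weights[k] for k in names if clusterings[k][i] == clusterings[k][j])
            if agree / total >= threshold:
                parent[find(i)] = find(j)  # link i and j (merging is transitive)

    # Relabel components as 0..m-1 in order of first appearance.
    labels: Dict[int, int] = {}
    out: List[int] = []
    for i in range(n):
        root = find(i)
        labels.setdefault(root, len(labels))
        out.append(labels[root])
    return out
```

Because links are merged transitively, a single strong pairwise vote can join two otherwise separate groups; a stricter threshold limits that effect.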
In recent years, we have witnessed a significant growth of "social computing" services, or online communities where users contribute content in various forms, including images, text or video. Content contribution from members is critical to the viability of these online communities. It is therefore important to understand what drives users to share content with others in such settings. We extend previous literature on user contribution by studying the factors that are associated with users' photo sharing in an online community, drawing on motivation theories as well as on analysis of basic structural properties. Our results indicate that photo sharing declines with users' tenure in the community. We also show that users with higher commitment to the community and greater "structural embeddedness" tend to share more content. We demonstrate that the motivation of self-development is negatively related to photo sharing, and that tenure in the community moderates the effect of self-development on photo sharing. Directions for future research, as well as implications for theory and practice, are discussed.