WSTNet Interview: Dhiraj Murthy

Q. Dhiraj – thanks for taking the time to speak to us today. Even though your Lab is our newest WSTNet member, you’ve been analysing media for a long time. Can you talk to us about your journey into social media and Web Science?

A. Of course. My interest started while I was doing postgraduate research in Sociology at Cambridge, where I was looking at how traditional (non-technical) social spaces and interactions were being supplemented, augmented and even replaced by what (at the time) were new technologies such as blogs, newsgroups and forums (all asynchronous technologies). I was particularly interested in how group identities were affected by the technologies that were mediating the interactions. I looked at the way in which musicians collaborated using technology vs. live interactions and how this might impact participants’ sense of identity, say, in terms of race/ethnicity. My work went on to look at impacts and opportunities around natural disasters such as hurricanes, and we’ve gone on to study several US as well as international incidents. All of this work required the collection and analysis of what we could broadly call (in today’s terms) social network data, which could be classified, mapped and visualised to help uncover and understand the observed effects. That process has been going on for me since 2006.

Q. Even those early projects and research questions still sound very relevant and quite contemporary. Would you say that during that time the process of developing research questions has remained relatively stable whilst the types and volumes of data that can be employed have changed rather more substantially? 

A. I witnessed a great deal of competitive system development for social messaging systems while working internationally during the dot-com era. The key changes seemed to be the ubiquity of messaging standards like the SMS text message (the 160-character format from which Twitter later derived its 140-character limit), driven by the growth of mobile networks, and, critically, the rise of social “platforms” like Twitter (and later WhatsApp), which transformed the culture of what had been a private, point-to-point messaging model into high-speed, real-time shared messaging spaces with APIs that (initially) disclosed information (both data and metadata) about the networks of content and networks of users across multiple locations. Twitter was instrumental in developing this model into what became a new opportunity for disciplines like Web Science to do detailed analysis on huge data sets.

Q. So if the availability of data and data types have driven/enabled the research in this way what has been your experience with the transition of Twitter to X and the loss of access to the Twitter API? 

A. The broader issues with social media APIs, data-scraping bans and the resulting legal battles have obviously shaped the way in which data can be gathered and analysed and, arguably, have transformed what it means to do Web Science. But equally we have seen a continuing trend in which the Web/Internet overall has become less overtly text-based and much more visual, with the enormous growth in video platforms. This means that as Web Scientists we have had to innovate and develop new and better techniques around computer vision, video analysis and the currently available data sets to do quality research. We now combine our data archives with new data and new ways to annotate and analyse it, using mixed methods to be able to work with “small data” at a more personal level vs the firehose (i.e., complete) data sets that are no longer available.

Q. If you are looking at more video data will the recent rise of high-quality (deep fake) AI generated video cause you particular difficulties?

A. Well, bots and fake data have been around (on a smaller scale) since the very beginning – there were simple bots to be found in early newsgroups – so fake data and bots are not a new thing at all, though the scale and sophistication of the most recent examples is obviously more concerning, and hence we are also looking at how we might better detect bad data and misinformation.

Q. Is that your main area of interest?

A. Not only that. We continue to look at ways in which the Web may (dis)empower society and how we might identify and promote (or inoculate against) those effects. We continue to look at group social behaviours during natural disasters, where we have followed a number of US and international hurricane events. We’ve studied how cancer is reported on Twitter and how this relates to disease incidence across regions/groups, as well as the enculturation of young people into vaping (i.e., e-cigarettes) and how much impact social media images and messages may have in that process. But we have also been looking at identifying misinformation and at tools (beyond labelling) to help users identify misleading information and understand how it spreads.

Q. Many thanks for spending time to talk about your work and the Computational Media Lab. We have listed some papers and a link to your website below.


Dhiraj Murthy is the head of the WSTNet Computational Media Lab at the University of Texas at Austin.

To read more about Dhiraj’s work and the Austin Lab click below


WSTNet PhD Interview: Sungwon Jung

Q. Thanks for joining me Sungwon – could you tell us a little about yourself?

A. I’m a doctoral student in Journalism & Media at the University of Texas at Austin, with a particular focus on social media. I’m really interested in what social media can tell us about group behaviour.

Q. How did you come to be interested in the Web and Web Science methods?

A. I guess, like a lot of colleagues, it comes from an interdisciplinary background: my Bachelor’s was in Sociology, where I got interested in how people come together to take collective actions (so-called network actions) and in the processes underlying that. I thought computational methods would be really helpful for understanding this, so I took a Master’s in Data Science, which ultimately led me to researching social media data as a proxy for how people act and interact.

Q. What shape does that take?

A. Broadly speaking, I am using computational methods to look at how people behave on social media platforms, where individual actions may become collective actions (via networks), and the extent to which this might predict/explain larger societal actions.

Q. What projects have you been working on?

A. Initially I worked on political polarisation between different Indian groups using TikTok data. The chief focus was on polarisation between Indian diaspora groups and Indian homeland groups, though there were also religious divisions between Hindu and Muslim groups.

Q. So within religious groups there would have been a common cultural background, but differences in social environment coming from local influences in India or overseas. Interesting.

A. We were looking to develop new techniques to study social media data, both in terms of the content of the messages and the metadata from hashtags. This can be quite challenging to interpret as a researcher without an Indian cultural background, as in the case of group hashtags such as #NRI and #Modi (NRI being “Non-Resident Indian” and Modi a leading figure in Indian politics), so we are dealing with a user-developed “folksonomy” rather than a more formal taxonomy.
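
As a concrete illustration of working with such a folksonomy, here is a minimal sketch in Python of how a researcher-built codebook might map user-generated hashtags onto analytical categories. The category names and hashtag lists are illustrative placeholders, not the lab’s actual coding scheme.

```python
# Minimal sketch: mapping a user-generated hashtag "folksonomy" onto
# analyst-defined categories. The codebook below is a hypothetical
# placeholder; building a real one requires cultural knowledge.
from collections import Counter

CODEBOOK = {
    "diaspora": {"#nri", "#desisabroad"},      # hypothetical category
    "homeland_politics": {"#modi", "#bjp"},    # hypothetical category
}

def classify(tags):
    """Return the codebook categories that a post's hashtags fall into."""
    tags = {t.lower() for t in tags}
    return {cat for cat, members in CODEBOOK.items() if tags & members}

posts = [
    ["#NRI", "#Modi"],          # spans diaspora and homeland politics
    ["#desisabroad", "#food"],  # uncoded tags such as #food are ignored
]

print(Counter(cat for post in posts for cat in classify(post)))
# e.g. Counter({'diaspora': 2, 'homeland_politics': 1})
```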

Q. What is your current research focussing on?

A. Now I am working with AI-based vision and data science techniques to study the impact of social media on health, using social media data on vaping and e-cigarettes. We believe social media influences/shapes young people’s understanding of smoking/vaping health outcomes, and at this early stage of understanding vaping health issues, social influence and peer pressure are potentially very important.

Q. In the same way that media depictions (Movies and TV) shaped the perception of tobacco usage for earlier generations of young people?

A. Exactly. Most users in this TikTok group are aged 18-25 and may well be significantly affected by peer pressure on social media – e.g., “vape cloud” competitions, where users earn bragging rights/status for the size of cloud they can produce.

Q. Presumably, whilst we would observe that this is less negative than, say, competitive self-harm or an anorexia support group, it nonetheless involves group behaviour and peer pressure.

A. Exactly. We also observed significant amounts of co-reporting (tacking on) of vaping to other activities: e.g., “I am playing X + vaping” or “I am doing Y + vaping”. So I am also interested in why these groups are reporting vaping in these other contexts.

Q. How are you looking at the data?

A. I’m using TikTok (meta)data around the posting and developing computer vision techniques to look at the images and video. That way we analyse the post itself – via image analysis and video speech-to-text conversion – alongside the user’s text descriptions and tags from the metadata. There is no TikTok API, so we need to scrape manually.
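
As an illustration of the kind of multimodal pipeline described here, the following is a minimal sketch. The specific libraries (OpenCV, Whisper and a Hugging Face image classifier) and file names are assumptions made for the example, not necessarily the lab’s actual toolchain.

```python
# A minimal sketch (not the lab's actual toolchain) of combining image
# analysis, speech-to-text and user text/tags into one record per post.
# Assumed libraries: opencv-python, openai-whisper (requires ffmpeg),
# and transformers; "post_0001.mp4" is a placeholder file name.
import cv2
import whisper
from transformers import pipeline

asr = whisper.load_model("base")             # speech-to-text model
labeler = pipeline("image-classification")   # generic image labeller

def annotate_post(video_path, caption, tags):
    """Bundle multimodal annotations for a single scraped post."""
    # 1. Transcribe the audio track of the video.
    transcript = asr.transcribe(video_path)["text"]

    # 2. Pull the first frame and attach coarse image labels to it.
    cap = cv2.VideoCapture(video_path)
    ok, frame = cap.read()
    cap.release()
    labels = []
    if ok:
        cv2.imwrite("frame.jpg", frame)
        labels = [d["label"] for d in labeler("frame.jpg")]

    # 3. Combine with the user-supplied caption and hashtags.
    return {"transcript": transcript, "image_labels": labels,
            "caption": caption, "tags": tags}

record = annotate_post("post_0001.mp4", "cloud check", ["#vape", "#fyp"])
print(record["image_labels"], record["transcript"][:80])
```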

Q. What are the challenges here?

A. Whilst it is not hard to get data, it may be harder to confirm that it is valid and complete. We may not be looking at all the relevant hashtags (and these may change over time), and posts may include target hashtags such as #vape even when the post is not actually focussed on vaping – perhaps users are including popular hashtags to get more likes. The data itself is largely unstructured, so we have to do more cross-checking, since we know that however good our analytical approach may be, if the source data is flawed then we are going to get unreliable results: garbage in, garbage out. This is especially true for image/video analysis, as we are starting to see challenges from fake data produced by bots and LLMs, and with the current rise of AI video, AI content (deep fakes etc.) is polluting data streams in ways that may distort our research findings. Ultimately we can try to analyse what is happening, but the causes may remain elusive. Why do they vape, and even compete at vaping? What are the underlying models driving the behaviour? Social science research at this scale was previously not possible (i.e., analysing 50 paper questionnaires vs 50 million social media data points). This is the new norm and seems impressive, but whilst it is much easier than ever to gather data, we need to worry more than ever about quality.

Q. What are the future objectives for this research?

A. Understanding vaping as a “normal” activity vs a deviant activity; understanding social bonding and competitive behaviour; looking at the idea of “vape” vs “vape challenge”; and looking at how social rewards correlate with individual behaviour to create larger network (group) behaviours, and the extent to which these behaviours buy group membership, earning the user more attention and higher status.

Q. Thanks for speaking to me today and good luck with the rest of your research.

Sungwon Jung is a doctoral student in Journalism & Media at the University of Texas at Austin.

She is interested in the impacts of social media on health and in studying how individual actions can become collective (network) actions.

Can this approach shed any light on future health trends and the importance of messaging for young people as they form more/less healthy habits as part of social learning? 

WSTNet Interview: Matt Weber

Ian: Matt, it feels strange to welcome you as a more recent Lab Director when I think I’ve known you as part of the Web Science community for at least 10 years.

Matt: Probably longer – I think my interest in Web Science and particularly Web data goes back to the very first Web Science conference in 2010 and perhaps before that.

Ian: So was Web Data your point of entry to Web Science?

Matt: That’s right. I’d spent a lot of time looking at what was thought of as archived web data and trying to render it as large-scale, researchable data collections. We went through a number of iterations, from a system called HUBzero through to Archives Unleashed, and most recently that work was integrated into the Internet Archive’s research services by a team at the University of Waterloo, so that people looking to extract value and sound research conclusions from these data sets can find and access them through well-supported, high-quality platforms and tools.

Ian: How hard is it to get everyone involved?

Matt: Well one of the major challenges is trying to get people to share and engage with these data sets outside of tightly controlled commercial offerings.

Ian: Well we’ve certainly seen Palantir, Recorded Future et al. work to derive interesting conclusions and predictions from large data sets like this.

Matt: I think the difference here is partly that many users (even if they are data rich) are much less interested in creating/curating data sets than they are in using them. We’ve seen humanities, CIS and engineering groups all derive huge benefits from well-curated third-party data. Getting those groups to create and share their own data too is tough without aligning the process with their academic objectives and the academic recognition system.

Ian: Has anyone cracked that problem in this space?

Matt: The Harvard Dataverse is an attractive platform which hosts data sets and generates benefits for both the contributors and the community as a whole by tracking/reporting which datasets are downloaded via a data DOI.

Ian: Which translates to recognisable impact in academe?

Matt: Absolutely. I had a data set which I was able to show had been downloaded more than 35,000 times. That’s significant impact.

Ian: So let’s talk about the NetSci Lab at Rutgers.

Matt: This is a collaboration between a great team of leading academics in Communication, Information Science, and Journalism who are addressing a wider view of Human Networks interacting through Technological Networks as well as other contexts.

Ian: What is your current focus?

Matt: We are looking at the systems of local information that feed and support their communities and how this intersects with the phenomenon of misinformation. We’ve mapped the transition to a more regional news structure and a steady decline in the production of quality local news (critical information, politics, education, disaster/safety) in favour of less substantial/serious content (sports, human interest etc.) which, whilst potentially of interest, does little to support local communities in more serious situations.

Ian: Do users simply live with less local content as a result?

Matt: In fact, this gap in local news coverage tends to increase the use of (local) social media such as Nextdoor and Facebook for news, where stories are largely unverified, not edited by a third party and, in some cases, anonymous. This leads to a greater risk that the information provided may be misinformation or even malicious.

Ian: How serious is the potential impact?

Matt: For example, we have seen a troubling loss of local news connections between communities and infrastructure providers, such that when adverse weather causes power outages there is no longer a trusted, independent local news source to disseminate updates, timetables and disaster-response information from the power company to the community – only whatever potentially poorly informed social media commentators may be saying. We are focused on better understanding the impact of the loss of a robust and trusted connection between physical systems and information systems.

Ian: What could be a potential response to address this disconnect?

Matt: We are considering the process of re-establishing a trust-based relationship between communities and service providers (industrial, government) via trusted intermediaries – a role that quality news/media organisations used to fill.

Ian: This sounds like really interesting work

Matt: We don’t believe we are even close to seeing the full potential impact of misinformation – both inadvertent and the deliberate weaponisation of (dis)information – as it continues to affect local and national news and our understanding of the truth.

Ian: Thanks for speaking to me today and welcome to the WSTNet.

WSTNet Lab Profile: Cardiff HateLab

Cardiff University is the home of a WSTNet lab with two related but distinct groups: Pete Burnap’s Social Data Lab (based on data visualisation and analysis using COSMOS), which makes social media analysis much more accessible for non-coding academics, and Matt Williams’ HateLab, which uses a COSMOS-based dashboard to identify and analyse hate speech structures and trends across a range of social media sources, covering modern forms of on-line hate including racial, political, gender and religious intolerance.

Williams (who holds a chair in Criminology at Cardiff) has been researching the clues left in social media since 2011, but was frustrated by the lack of tools accessible to any but the most skilled coders, and so worked with Prof. Pete Burnap to develop a more user-friendly toolset called COSMOS, which allows researchers to focus on the meanings and interpretations of social media data rather than the underlying technologies.

With the new tools and possibilities delivered by COSMOS, new research questions began to surface, and the “Hate Speech and Social Media” project was launched in 2013. This led to the founding of HateLab, where Matt has been director since 2017 and where his group has attracted more than £3m in funding. He has published a series of papers, and in 2021 he published a summary of more than 20 years of research in his book The Science of Hate.

HateLab could be seen as something of a poster child for Web Science, having been featured widely in the press and media, with HateLab research covered in the LA Times, New York Post, The Guardian (also here), The Times (also here and here), The Financial Times, The Independent, The Telegraph (also here), Tortoise, New Scientist, Politico, BBC News, The Register, ComputerWeekly, Verdict, Sky News, TechWorld and Police Professional. On TV, their research underpinned an episode of BBC One’s Panorama, an episode of ITV’s Exposure and an ITV News special report. HateLab has been used as part of the National Online Hate Crime Hub announced by the UK Home Secretary in 2017.

HateLab collects data from several platforms including Twitter (they have also been highlighted by Twitter as a featured developer partner), 4chan, Telegram and Reddit, and the tools look for trends and patterns using AI techniques, linking the appearance and timing of on-line hate speech with physical acts of violence. Williams has found certain patterns and timings in his work (he calls this the “half-life” of hate speech), and this may be critical in understanding how to manage/calm/delay responses in on-line communities if strong reactions (especially physical reactions to on-line hate speech) are seen to quickly fade and to be much more temporary in nature than other forms of crime.
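
As an illustration of the half-life idea, the sketch below fits an exponential decay to hourly counts of hateful posts following a trigger event and reports the estimated half-life. The data are synthetic and the model is a simplification for illustration; HateLab’s actual methods are not described in this article.

```python
# A minimal sketch of the "half-life" idea: fit an exponential decay to
# hourly counts of hate speech posts after a trigger event. The data are
# synthetic; the true half-life used to generate them is 6 hours.
import numpy as np
from scipy.optimize import curve_fit

def decay(t, n0, half_life):
    """Exponential decay: n0 posts at t=0, halving every half_life hours."""
    return n0 * 0.5 ** (t / half_life)

hours = np.arange(48)                        # hours since the trigger event
rng = np.random.default_rng(0)
counts = decay(hours, 500, 6.0) + rng.normal(0, 5, hours.size)

(n0_hat, hl_hat), _ = curve_fit(decay, hours, counts, p0=(400, 10))
print(f"estimated half-life: {hl_hat:.1f} hours")  # ~6.0 for this series
```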

Whilst it is perhaps clear that real-world “trigger” events (such as Covid, Brexit, Trump speeches, the London Bridge attacks etc.) can and do give rise to waves of on-line reactions (with hate being the least desirable of these), it is perhaps less obvious (and more interesting) to consider that a certain level and timing of hate speech might be associated with, and contribute to, higher levels of physical violence. HateLab is looking at the possibility of developing predictive models which would not only allow non-academic groups to gauge and better manage different types of hate speech and volatile communities on-line but might also help to prevent on-line hate spilling over into physical violence.

The recent case of ex-President Trump and his on-line incitement to “march on the Capitol building” is a chilling example of the need for this sort of model.

We asked Matt about his take on the new owner at Twitter and how Musk’s view on free speech might affect his research and his overall objective to reduce hate-speech …  

“Twitter have been really busy since 2015 trying to manage the whole on-line harm issue and frankly they’ve done a pretty good job – they’ve employed huge numbers of moderators who have ensured that a lot of the more unpleasant material that is ON the platform (and that we have access to via the API for research purposes) is not VISIBLE on the platform where ordinary users can be harmed by it. There is obviously a trade-off between the notion of on-line harm and freedom of speech, and we’ll have to wait and see what effect Elon’s new policies have on the resurgence of what is thought to be harmful content. Certainly we’ve seen a reduction in the amount of hate speech across the Twitter API over recent months/years, but it’s unclear whether users have migrated to more tolerant platforms or whether the Twitter filtering is now being reflected in the API output. Overall we’ve had a very positive relationship with Twitter and we’d obviously like to continue to work with them.”

DISCLOSURE:

I have to admit to being just a tiny bit disappointed that Matt is not also the brains behind HateLab: the London-based cyberpunk band which I stumbled on when googling more about his work 😉

Government agencies are tapping a facial recognition company to prove you’re you – here’s why that raises concerns about privacy, accuracy and fairness

Beginning this summer, you might need to upload a selfie and a photo ID to a private company, ID.me, if you want to file your taxes online.

Oscar Wong/Moment via Getty Images

James Hendler, Rensselaer Polytechnic Institute

The U.S. Internal Revenue Service is planning to require citizens to create accounts with a private facial recognition company in order to file taxes online. The IRS is joining a growing number of federal and state agencies that have contracted with ID.me to authenticate the identities of people accessing services.

The IRS’s move is aimed at cutting down on identity theft, a crime that affects millions of Americans. The IRS, in particular, has reported a number of tax filings from people claiming to be others, and fraud in many of the programs that were administered as part of the American Rescue Plan has been a major concern to the government.

The IRS decision has prompted a backlash, in part over concerns about requiring citizens to use facial recognition technology and in part over difficulties some people have had in using the system, particularly with some state agencies that provide unemployment benefits. The reaction has prompted the IRS to revisit its decision.

Here’s what greets you when you click the link to sign into your IRS account. If current plans remain in place, the blue button will go away in the summer of 2022.
Screenshot, IRS sign-in webpage

As a computer science researcher and the chair of the Global Technology Policy Council of the Association for Computing Machinery, I have been involved in exploring some of the issues with government use of facial recognition technology, both its use and its potential flaws. There have been a great number of concerns raised over the general use of this technology in policing and other government functions, often focused on whether the accuracy of these algorithms can have discriminatory effects. In the case of ID.me, there are other issues involved as well.

ID dot who?

ID.me is a private company that formed as TroopSwap, a site that offered retail discounts to members of the armed forces. As part of that effort, the company created an ID service so that military staff who qualified for discounts at various companies could prove they were, indeed, service members. In 2013, the company renamed itself ID.me and started to market its ID service more broadly. The U.S. Department of Veterans Affairs began using the technology in 2016, the company’s first government use.

To use ID.me, a user loads a mobile phone app and takes a selfie – a photo of their own face. ID.me then compares that image to various IDs that it obtains either through open records or through information that applicants provide through the app. If it finds a match, it creates an account and uses image recognition for ID. If it cannot perform a match, users can contact a “trusted referee” and have a video call to fix the problem.

A number of companies and states have been using ID.me for several years. News reports have documented problems people have had with ID.me failing to authenticate them, and with the company’s customer support in resolving those problems. Also, the system’s technology requirements could widen the digital divide, making it harder for many of the people who need government services the most to access them.

But much of the concern about the IRS and other federal agencies using ID.me revolves around its use of facial recognition technology and collection of biometric data.

Accuracy and bias

To start with, there are a number of general concerns about the accuracy of facial recognition technologies and whether there are discriminatory biases in their accuracy. These have led the Association for Computing Machinery, among other organizations, to call for a moratorium on government use of facial recognition technology.

A study of commercial and academic facial recognition algorithms by the National Institute of Standards and Technology found that U.S. facial-matching algorithms generally have higher false positive rates for Asian and Black faces than for white faces, although recent results have improved. ID.me claims that there is no racial bias in its face-matching verification process.

There are many other conditions that can also cause inaccuracy – physical changes caused by illness or an accident, hair loss due to chemotherapy, color change due to aging, gender conversions and others. How any company, including ID.me, handles such situations is unclear, and this is one issue that has raised concerns. Imagine having a disfiguring accident and not being able to log into your medical insurance company’s website because of damage to your face.

Facial recognition technology is spreading fast. Is the technology – and society – ready?

Data privacy

There are other issues that go beyond the question of just how well the algorithm works. As part of its process, ID.me collects a very large amount of personal information. It has a very long and difficult-to-read privacy policy, but essentially while ID.me doesn’t share most of the personal information, it does share various information about internet use and website visits with other partners. The nature of these exchanges is not immediately apparent.

So one question that arises is what level of information the company shares with the government, and whether the information can be used in tracking U.S. citizens between regulated boundaries that apply to government agencies. Privacy advocates on both the left and right have long opposed any form of a mandatory uniform government identification card. Does handing off the identification to a private company allow the government to essentially achieve this through subterfuge? It’s not difficult to imagine that some states – and maybe eventually the federal government – could insist on an identification from ID.me or one of its competitors to access government services, get medical coverage and even to vote.

As Joy Buolamwini, an MIT AI researcher and founder of the Algorithmic Justice League, argued, beyond accuracy and bias issues is the question of the right not to use biometric technology. “Government pressure on citizens to share their biometric data with the government affects all of us — no matter your race, gender, or political affiliations,” she wrote.

Too many unknowns for comfort

Another issue is who audits ID.me for the security of its applications? While no one is accusing ID.me of bad practices, security researchers are worried about how the company may protect the incredible level of personal information it will end up with. Imagine a security breach that released the IRS information for millions of taxpayers. In the fast-changing world of cybersecurity, with threats ranging from individual hacking to international criminal activities, experts would like assurance that a company provided with so much personal information is using state-of-the-art security and keeping it up to date.


Much of the questioning of the IRS decision comes because these are early days for government use of private companies to provide biometric security, and some of the details are still not fully explained. Even if you grant that the IRS use of the technology is appropriately limited, this is potentially the start of what could quickly snowball to many government agencies using commercial facial recognition companies to get around regulations that were put in place specifically to rein in government powers.

The U.S. stands at the edge of a slippery slope, and while that doesn’t mean facial recognition technology shouldn’t be used at all, I believe it does mean that the government should put a lot more care and due diligence into exploring the terrain ahead before taking those critical first steps.

James Hendler, Professor of Computer, Web and Cognitive Sciences, Rensselaer Polytechnic Institute

This article is republished from The Conversation under a Creative Commons license. Read the original article.


Noshir Contractor elected president of ICA

WST Trustee and Web Science researcher Prof Noshir Contractor has been elected as the next president of the prestigious ICA (International Communication Association).

Click here to see the details of the election.

About Noshir

Noshir S. Contractor is the Jane S. & William J. White Professor of Behavioral Sciences in the School of Engineering, the School of Communication and the Kellogg School of Management at Northwestern University, USA. He is the director of the SONIC Lab and a Trustee of the Web Science Trust.

About the ICA

(from the current president’s introduction)

ICA started 70 years ago as a small organization of U.S.-based researchers. It has expanded to boast more than 6000 members in over 80 countries. Since 2003, we have been officially associated with the United Nations as a nongovernmental organization (NGO).

We publish five internationally renowned, peer-reviewed journals: Communication, Culture, and Critique (CCC), Communication Theory (CT), Human Communication Research (HCR), Journal of Communication (JoC), and the Journal of Computer-Mediated Communication (JCMC). Journal of Communication is the world’s top-ranked communication journal on SCImago, and Communication Theory is ranked #5.