WSTNet Lab Profile: Cardiff HateLab

Cardiff University is home to a WSTNet lab with two related but distinct groups: Pete Burnap’s Social Data Lab (focused on data visualisation and analysis using COSMOS), which makes social media analysis far more accessible for non-coding academics, and Matt Williams’s HateLab, which uses a COSMOS-based dashboard to identify and analyse hate speech structures and trends across a range of social media sources, covering modern forms of on-line hate including racial, political, gender and religious intolerance.

Williams (who holds a chair in Criminology at Cardiff) has been researching the clues left in social media since 2011, but was frustrated by the lack of tools accessible to any but the most skilled coders. He worked with Prof. Pete Burnap to develop a more user-friendly toolset, COSMOS, which allows researchers to focus on the meanings and interpretations of social media data rather than the underlying technologies.

With the new tools and possibilities delivered by COSMOS, new research questions began to surface, and the “Hate Speech and Social Media” project was launched in 2013. This led to the founding of HateLab, where Matt has been director since 2017 and where his group has attracted more than £3m in funding. He has published a series of papers, and in 2021 he summarised more than 20 years of research in his book The Science of Hate.


HateLab could be seen as something of a poster child for Web Science, having been featured widely in the press and the media, with HateLab research being covered in: LA Times, New York Post, The Guardian (also here), The Times (also here and here), The Financial Times, The Independent, Telegraph (also here), Tortoise, New Scientist, Politico, BBC News, The Register, ComputerWeekly, Verdict, Sky News, TechWorld and Police Professional. On TV, their research underpinned an episode of BBC One’s Panorama, an episode of ITV’s Exposure and an ITV NEWS special report. HateLab has been used as part of the National Online Hate Crime Hub announced by the UK Home Secretary in 2017.

HateLab collects data from several platforms including Twitter (they have also been highlighted by Twitter as a featured developer partner), 4chan, Telegram and Reddit, and the tools look for trends and patterns using AI techniques which link the timing, causality and impact of physical acts of violence with the appearance and timing of hate speech. Williams has found characteristic patterns and timings in his work (he calls this the “half-life” of hate speech), and this may be critical in understanding how to manage, calm or delay responses in on-line communities if strong reactions (especially physical reactions to on-line hate speech) are seen to fade quickly and to be much more temporary in nature than other forms of crime.

Whilst it is perhaps clear that real-world “trigger” events (such as Covid, Brexit, Trump speeches, the London Bridge attacks etc.) can and do give rise to waves of on-line reactions (with hate being the least desirable of these), it is perhaps less obvious (and more interesting) to consider that a certain level and timing of hate speech might be associated with, and contribute to, higher levels of physical violence. HateLab is looking at the possibility of developing predictive models which not only show non-academic groups how to gauge and better manage different types of hate speech and volatile communities on-line, but might also help to prevent on-line hate spilling over into physical violence.

The recent case of ex-President Trump and his on-line incitement to “march on the Capitol building” is a chilling example of the need for this sort of model.

We asked Matt about his take on the new owner at Twitter and how Musk’s view on free speech might affect his research and his overall objective to reduce hate-speech …  

 “Twitter have been really busy since 2015 trying to manage the whole on-line harm issue and frankly they’ve done a pretty good job – they’ve employed huge numbers of moderators who have ensured that a lot of the more unpleasant material that is ON the platform (and that we have access to via the API for research purposes) is not VISIBLE on the platform, where ordinary users can be harmed by it. There is obviously a trade-off between the notion of on-line harm and freedom of speech, and we’ll have to wait and see what effect Elon’s new policies have on the resurgence of what is thought to be harmful content. Certainly we’ve seen a reduction in the amount of hate speech across the Twitter API over recent months/years, but it’s unclear whether users have migrated to more tolerant platforms or whether the Twitter filtering is now being reflected in the API output. Overall we’ve had a very positive relationship with Twitter and we’d obviously like to continue to work with them”.


I have to admit to being just a tiny bit disappointed that Matt is not also the brains behind HateLab: the London-based cyberpunk band which I stumbled on when googling more about his work 😉

WSTNet Student Profile: Amir Javed

Amir, thanks for agreeing to be interviewed. Can you tell us where you are based and what your main research interests are?
I’m based at Cardiff University and my focus is on a particular type of cyber attack called “drive-by downloads”, which are typically combined with social media posts on platforms like Twitter.
How are these different from typical viruses or other attacks?
A drive-by download involves one or more malicious scripts which execute without requiring the user to specifically download or click a suspicious object – the act of visiting the URL is enough to infect the host machine.
How does the social media element play out here?
Social media platforms often host and distribute click-bait in the form of a message which provokes interest and/or an emotional reaction in the user and encourages them to follow a (typically shortened, and hence unrecognisable) URL.
So what angle is your research taking on this?
Rather than attempting to cover the vast range of topics and ideas that might prompt a user to follow click-bait, we are looking at types of stimulus such as events (e.g. sports matches) which have a specific date/time around which the click-bait and URLs may be focused. If we can work to specific events as a focus, we may be able to analyse patterns of (social) attack: discovering which users and sites are involved, how these are structured in terms of topics and social vectors, working to dampen the scale of the retweet network that is generated, and ultimately predicting where attacks may happen and finding ways to inoculate against them.
What has your research shown so far?
We analysed tweets from several events and categorised them as malicious or benign; within the malicious group we also categorised the type of emotion (we identified eight) that the tweets were trying to elicit to get a click-through or retweet. We found that fear-provoking tweets were the most likely to be retweeted and persisted longer than tweets evoking other emotions.
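As a purely illustrative sketch (not Amir’s actual analysis – the emotion labels, persistence figures and data below are all invented), comparing how long tweets labelled with each emotion keep being retweeted might look something like this:

```python
# Illustrative only: each tweet is labelled with the emotion it evokes
# and the hours until its last observed retweet. All data is invented.
from collections import defaultdict

tweets = [
    ("fear", 36.0), ("fear", 48.0), ("fear", 30.0),
    ("anger", 12.0), ("anger", 18.0),
    ("joy", 6.0), ("joy", 9.0),
]

def mean_persistence(labelled):
    """Average retweet persistence (hours) per emotion label."""
    by_emotion = defaultdict(list)
    for emotion, hours in labelled:
        by_emotion[emotion].append(hours)
    return {e: sum(h) / len(h) for e, h in by_emotion.items()}

averages = mean_persistence(tweets)
longest_lived = max(averages, key=averages.get)
print(longest_lived)  # → fear
```

With this toy data, fear-labelled tweets show the longest mean persistence, mirroring the finding described above.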
We then analysed the effect of the different drive-by download scripts on the machine state of a test machine, in order to subject these traces to a machine learning process. We were able to identify the activities and patterns that the scripts attempted to execute on visiting an infected site, and to match/recognise these patterns within a short window as a script starts to execute. Success here would facilitate developing a “kill-switch” protocol that could potentially save the machine/network from infection. Our current model identifies malicious URLs about 86% of the time, which is very promising.
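To make the “short window” idea concrete, here is a minimal sketch (not the actual model – the event names, window length and threshold are invented assumptions) of flagging a page visit as malicious when suspicious machine-state events cluster early in execution:

```python
# Illustrative sketch: flag a URL visit as malicious if enough distinct
# suspicious machine-state events occur within a short observation window
# after the page loads. Event names and thresholds are invented.
SUSPICIOUS = {"registry_write", "process_spawn", "outbound_connect"}
WINDOW_SECONDS = 5.0

def classify_visit(events, threshold=2):
    """events: list of (seconds_since_page_load, event_name) tuples.
    Returns "malicious" if at least `threshold` distinct suspicious
    event types occur within the window, else "benign"."""
    seen = {name for t, name in events
            if t <= WINDOW_SECONDS and name in SUSPICIOUS}
    return "malicious" if len(seen) >= threshold else "benign"

# Invented example traces:
benign_trace = [(0.3, "dom_render"), (1.1, "css_fetch")]
attack_trace = [(0.4, "script_eval"), (1.2, "process_spawn"),
                (2.9, "outbound_connect"), (6.0, "registry_write")]

print(classify_visit(benign_trace))  # → benign
print(classify_visit(attack_trace))  # → malicious
```

A real kill-switch would of course learn these patterns from labelled traces rather than hard-code them; this only illustrates the early-window decision.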
Where are you going next with the work?
We are keen to build a better profile of the influential users, the common topics and the infected sites (though these shift), and to create an efficient, scalable method to scan for attacks and attackers using various signals (e.g. tweets from accounts created only hours or minutes before), so that we can weaken/disrupt the scale of an attack and ultimately inoculate users through an efficient combination of blacklisting and real-time detection processes.
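The blacklisting-plus-real-time-detection combination might be sketched as follows (illustrative only – the blacklist entries, account-age threshold and function name are all invented assumptions, not Amir’s implementation):

```python
# Illustrative sketch: cheap two-stage screen combining a known-bad URL
# blacklist with a real-time heuristic (very new accounts are treated as
# more suspicious). All names and thresholds here are invented.
BLACKLIST = {"evil.example/x", "bad.example/y"}
MIN_ACCOUNT_AGE_HOURS = 24

def needs_review(url, account_age_hours):
    """True if the tweet's URL should be blocked or sent for deeper scanning."""
    if url in BLACKLIST:
        return True  # known-bad: block immediately
    # Fresh accounts (created hours/minutes ago) get real-time scrutiny.
    return account_age_hours < MIN_ACCOUNT_AGE_HOURS

print(needs_review("evil.example/x", 1000))   # → True  (blacklisted)
print(needs_review("fresh.example/z", 2))     # → True  (brand-new account)
print(needs_review("ok.example/z", 500))      # → False
```

The point of the two stages is that the blacklist is essentially free to check, while the heuristic narrows down which remaining URLs are worth the more expensive machine-state analysis.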
How useful has the Web Science perspective been on this work?
Traditionally, cybersecurity has focused on machine impacts and technical networks, but whilst the idea of the social exploit is far from new, social media enables social attacks and trust exploits on a scale we’ve never seen before, so understanding how social networks function and how they can be managed/influenced for better security is vital.
Where would you like to see Web Science go next as a discipline?
With a growing war between hackers and cybersecurity specialists there is a need not only to understand each specific attack in terms of machine learning and pattern matching, but also to understand the broader social process of deliberate deception (feinting) used to avoid detection. How do we filter for “noise”, fake data and other methods designed to fool automated detection, and make our models resilient against such noise?
Amir has submitted his thesis at Cardiff University and is shortly to be appointed a lecturer at Cardiff.
Here are links to two of Amir’s related papers.

WSTNet Lab Directors Meet at WebSci16

WSTNet Lab Directors Meeting, Hannover, 22 May 2016.

WSTNet Lab Directors got together at the start of the Web Science Conference this week in Hannover, Germany. Highlights of the meeting include the election of Steffen Staab as Chair and Pete Burnap as Vice-Chair, planning for this year’s Web Science Summer School at the University of Koblenz (30 June to 6 July), and the firming up of arrangements for World Wide Web Week – a global event celebrating 10 years of Web Science to be held later this year.

Who’s who in the photo (from left to right): Thanassis Tiropanis (WSI), Manfred Hauswirth (FOKUS), Steffen Staab (Institute WeST), Noshir Contractor (SONIC), Sung-Hyon Myaeng (KAIST), Les Carr (WSI), John Erickson (RPI), Susan Davies (WST), Hans Akkermans (VU Amsterdam), Dave De Roure (Oxford e-Research), Anni Rowland-Campbell (Intersticia), Pete Burnap (Cardiff University), and Wolfgang Nejdl (L3S).